WorldWideScience

Sample records for base sequence effects

  1. Children's Recall of Script-Based Event Sequences: The Effect of Sequencing.

    Science.gov (United States)

    Catellani, Patrizia

    1991-01-01

    Preschool and first grade children's recall of script-based event sequences was studied in four different instruction conditions. Differences in sequencing ability were observed in relation to age and sequence. Findings indicate that at both ages, the effort involved in sequencing aids semantic processing of the material and enhances recall. (SH)

  2. Studies of base pair sequence effects on DNA solvation based on all ...

    Indian Academy of Sciences (India)

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization ...

  3. Studies of base pair sequence effects on DNA solvation based on all

    Indian Academy of Sciences (India)

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization ...

  4. Studies of base pair sequence effects on DNA solvation based on all ...

    Indian Academy of Sciences (India)

    2012-06-25

    Jun 25, 2012 ... [Dixit SB, Mezei M and Beveridge DL 2012 Studies of base pair sequence effects on DNA solvation based on all-atom molecular dynamics simulations. J. Biosci. 37 399–421] DOI 10.1007/s12038-012-9223-5. 1. Introduction. Solvation plays an integral role in stabilizing the structure of the DNA molecule in ...

  5. Studies of base pair sequence effects on DNA solvation based on all ...

    Indian Academy of Sciences (India)

    2012-06-25

    Jun 25, 2012 ... base pair sequence, both via direct interactions and indirectly via sequence preferences for ... Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/ ..... trapped for fairly long times by current routine simulation lengths (~10 ns) but ...

  6. Studies of base pair sequence effects on DNA solvation based on all-atom molecular dynamics simulations.

    Science.gov (United States)

    Dixit, Surjit B; Mezei, Mihaly; Beveridge, David L

    2012-07-01

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization were observed in these simulations. The results were compared to essentially all known experimental data on the subject. Proximity analysis was employed to highlight the sequence-dependent differences in solvation and ion localization properties in the grooves of DNA. Comparison of the MD-calculated DNA structure with canonical A- and B-forms supports the idea that the G/C-rich sequences are closer to canonical A- than B-form structures, while the reverse is true for the poly A sequences, with the exception of the alternating ATAT sequence. Analysis of hydration density maps reveals that the flexibility of the solute molecule has a significant effect on the nature of observed hydration. Energetic analysis of solute-solvent interactions based on proximity analysis of solvent reveals that the GC or CG base pairs interact more strongly with water molecules in the minor groove of DNA than the AT or TA base pairs, while the interactions of the AT or TA pairs in the major groove are stronger than those of the GC or CG pairs. Computation of solvent-accessible surface area of the nucleotide units in the simulated trajectories reveals that the similarity with results derived from analysis of a database of crystallographic structures is excellent. The MD trajectories tend to follow Manning's counterion condensation theory, presenting a region of condensed counterions within a radius of about 17 Å from the DNA surface independent of sequence. The GC and CG pairs tend to associate with cations in the major groove of the DNA structure to a greater extent than the AT and TA pairs. Cation association is more frequent in the minor groove of AT than of GC pairs. In general, the

  7. Neuromuscular adaptations to water-based concurrent training in postmenopausal women: effects of intrasession exercise sequence.

    Science.gov (United States)

    Pinto, Stephanie S; Alberton, Cristine L; Bagatini, Natália C; Zaffari, Paula; Cadore, Eduardo L; Radaelli, Régis; Baroni, Bruno M; Lanferdini, Fábio J; Ferrari, Rodrigo; Kanitz, Ana Carolina; Pinto, Ronei S; Vaz, Marco Aurélio; Kruel, Luiz Fernando M

    2015-02-01

    This study investigated the effects of different exercise sequences on the neuromuscular adaptations induced by water-based concurrent training in postmenopausal women. Twenty-one healthy postmenopausal women (57.14 ± 2.43 years) were randomly placed into two water-based concurrent training groups: resistance training prior to (RA, n = 10) or after (AR, n = 11) aerobic training. Subjects performed resistance and aerobic training twice a week over 12 weeks, performing both exercise types in the same training session. Upper (elbow flexors) and lower-body (knee extensors) one-repetition maximal test (1RM) and peak torque (PT) (knee extensors) were evaluated. The muscle thickness (MT) of upper (biceps brachii) and lower-body (vastus lateralis) was determined by ultrasonography. Moreover, the maximal and submaximal (neuromuscular economy) electromyographic activity (EMG) of lower-body (vastus lateralis and rectus femoris) was measured. Both RA and AR groups increased the upper- and lower-body 1RM and PT, and the lower-body 1RM increase observed in RA was greater than that in AR (34.62 ± 13.51 vs. 14.16 ± 13.68 %). RA and AR showed similar MT increases in the upper- and lower-body muscles evaluated. In addition, significant improvements in the maximal and submaximal EMG of lower-body muscles in both RA and AR were found, with no differences between groups. Both exercise sequences in water-based concurrent training presented relevant improvements to promote health and physical fitness in postmenopausal women. However, the resistance-aerobic exercise sequence optimizes the strength gains in lower limbs.

  8. Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA Sequencing.

    Science.gov (United States)

    Pope, Bernard J; Hammet, Fleur; Nguyen-Dumont, Tu; Park, Daniel J

    2018-01-01

    Hi-Plex is a suite of methods to enable simple, accurate, and cost-effective highly multiplex PCR-based targeted sequencing (Nguyen-Dumont et al., Biotechniques 58:33-36, 2015). At its core is the principle of using gene-specific primers (GSPs) to "seed" (or target) the reaction and universal primers to "drive" the majority of the reaction. In this manner, effects on amplification efficiencies across the target amplicons can, to a large extent, be restricted to early seeding cycles. Product sizes are defined within a relatively narrow range to enable high-specificity size selection, replication uniformity across target sites (including in the context of fragmented input DNA such as that derived from fixed tumor specimens (Nguyen-Dumont et al., Biotechniques 55:69-74, 2013; Nguyen-Dumont et al., Anal Biochem 470:48-51, 2015)), and application of high-specificity genetic variant calling algorithms (Pope et al., Source Code Biol Med 9:3, 2014; Park et al., BMC Bioinformatics 17:165, 2016). Hi-Plex offers a streamlined workflow that is suitable for testing large numbers of specimens without the need for automation.

  9. A model of human motor sequence learning explains facilitation and interference effects based on spike-timing dependent plasticity.

    Directory of Open Access Journals (Sweden)

    Quan Wang

    2017-08-01

    The ability to learn sequential behaviors is a fundamental property of our brains. Yet a long stream of studies including recent experiments investigating motor sequence learning in adult human subjects have produced a number of puzzling and seemingly contradictory results. In particular, when subjects have to learn multiple action sequences, learning is sometimes impaired by proactive and retroactive interference effects. In other situations, however, learning is accelerated as reflected in facilitation and transfer effects. At present it is unclear what the underlying neural mechanisms are that give rise to these diverse findings. Here we show that a recently developed recurrent neural network model readily reproduces this diverse set of findings. The self-organizing recurrent neural network (SORN) model is a network of recurrently connected threshold units that combines a simplified form of spike-timing dependent plasticity (STDP) with homeostatic plasticity mechanisms ensuring network stability, namely intrinsic plasticity (IP) and synaptic normalization (SN). When trained on sequence learning tasks modeled after recent experiments we find that it reproduces the full range of interference, facilitation, and transfer effects. We show how these effects are rooted in the network's changing internal representation of the different sequences across learning and how they depend on an interaction of training schedule and task similarity. Furthermore, since learning in the model is based on fundamental neuronal plasticity mechanisms, the model reveals how these plasticity mechanisms are ultimately responsible for the network's sequence learning abilities. In particular, we find that all three plasticity mechanisms are essential for the network to learn effective internal models of the different training sequences. This ability to form effective internal models is also the basis for the observed interference and facilitation effects. This suggests that

  10. A model of human motor sequence learning explains facilitation and interference effects based on spike-timing dependent plasticity.

    Science.gov (United States)

    Wang, Quan; Rothkopf, Constantin A; Triesch, Jochen

    2017-08-01

    The ability to learn sequential behaviors is a fundamental property of our brains. Yet a long stream of studies including recent experiments investigating motor sequence learning in adult human subjects have produced a number of puzzling and seemingly contradictory results. In particular, when subjects have to learn multiple action sequences, learning is sometimes impaired by proactive and retroactive interference effects. In other situations, however, learning is accelerated as reflected in facilitation and transfer effects. At present it is unclear what the underlying neural mechanisms are that give rise to these diverse findings. Here we show that a recently developed recurrent neural network model readily reproduces this diverse set of findings. The self-organizing recurrent neural network (SORN) model is a network of recurrently connected threshold units that combines a simplified form of spike-timing dependent plasticity (STDP) with homeostatic plasticity mechanisms ensuring network stability, namely intrinsic plasticity (IP) and synaptic normalization (SN). When trained on sequence learning tasks modeled after recent experiments we find that it reproduces the full range of interference, facilitation, and transfer effects. We show how these effects are rooted in the network's changing internal representation of the different sequences across learning and how they depend on an interaction of training schedule and task similarity. Furthermore, since learning in the model is based on fundamental neuronal plasticity mechanisms, the model reveals how these plasticity mechanisms are ultimately responsible for the network's sequence learning abilities. In particular, we find that all three plasticity mechanisms are essential for the network to learn effective internal models of the different training sequences. This ability to form effective internal models is also the basis for the observed interference and facilitation effects. This suggests that STDP, IP, and SN
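
The three plasticity mechanisms named in this abstract can be illustrated with a small sketch. This is not the authors' code: the update rules below follow a common simplified formulation of STDP, synaptic normalization, and intrinsic plasticity for binary threshold units, and the learning rates, target rate, and function names are hypothetical choices made for illustration only.

```python
def stdp_step(w, x_prev, x_now, eta=0.001):
    """Simplified STDP on a weight matrix w[i][j] (synapse from unit j to unit i).

    A synapse grows when j fired one step before i, and shrinks when the
    order is reversed. Weights are clipped at zero (assumption).
    """
    n = len(w)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-connections in this sketch
            dw = eta * (x_now[i] * x_prev[j] - x_prev[i] * x_now[j])
            w[i][j] = max(0.0, w[i][j] + dw)


def normalize_rows(w):
    """Synaptic normalization: rescale each unit's incoming weights to sum to 1."""
    for row in w:
        total = sum(row)
        if total > 0.0:
            for j in range(len(row)):
                row[j] /= total


def ip_step(thresholds, x_now, target_rate=0.1, eta=0.001):
    """Intrinsic plasticity: raise the threshold of units that fire above the
    target rate and lower it for units that fire below it."""
    for i, x in enumerate(x_now):
        thresholds[i] += eta * (x - target_rate)


# Usage: unit 0 fires, then unit 1 fires, so the 0 -> 1 synapse is strengthened.
w = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
thresholds = [0.5, 0.5, 0.5]
stdp_step(w, x_prev=[1, 0, 0], x_now=[0, 1, 0], eta=0.1)
normalize_rows(w)
ip_step(thresholds, x_now=[0, 1, 0], target_rate=0.1, eta=0.1)
```

Applying normalization after every STDP step keeps the total input to each unit bounded, which is one way homeostatic mechanisms can stabilize a network that would otherwise be driven by Hebbian-style growth alone.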

  11. External control of the stream of consciousness: Stimulus-based effects on involuntary thought sequences.

    Science.gov (United States)

    Merrick, Christina; Farnia, Melika; Jantz, Tiffany K; Gazzaley, Adam; Morsella, Ezequiel

    2015-05-01

    The stream of consciousness often appears whimsical and free from external control. Recent advances, however, reveal that the stream is more susceptible to external influence than previously assumed. Thoughts can be triggered by external stimuli in a manner that is involuntary, systematic, and nontrivial. Based on these advances, our experimental manipulation systematically triggered a sequence of, not one, but two involuntary thoughts. Participants were instructed to (a) not subvocalize the name of visual objects and (b) not count the number of letters comprising object names. On a substantial proportion of trials, participants experienced both kinds of involuntary thoughts. Each thought arose from distinct, high-level processes (naming versus counting). This is the first demonstration of the induction of two involuntary thoughts into the stream of consciousness. Stimulus word length influenced dependent measures systematically. Our findings are relevant to many fields associated with the study of consciousness, including attention, imagery, and action control. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Putting instruction sequences into effect

    NARCIS (Netherlands)

    Bergstra, J.A.

    2011-01-01

    An attempt is made to define the concept of execution of an instruction sequence. It is found to be a special case of directly putting into effect of an instruction sequence. Directly putting into effect of an instruction sequence comprises interpretation as well as execution. Directly putting into

  13. Effect of selection and sequencing of representative wave conditions on process-based predictions of equilibrium embayed beach morphology

    Science.gov (United States)

    Daly, Christopher J.; Bryan, Karin R.; Gonzalez, Mauricio R.; Klein, Antonio H. F.; Winter, Christian

    2014-06-01

    In order to decrease the simulation time of morphodynamic models, often-complex wave climates are reduced to a few representative wave conditions (RWC). When applied to embayed beaches, a test of whether a reduced wave climate is representative or not is to see whether it can recreate the observed equilibrium (long-term averaged) bathymetry of the bay. In this study, the wave climate experienced at Milagro Beach, Tarragona, Spain was discretized into 'average' and 'extreme' RWCs. Process-based morphodynamic simulations were sequenced and merged based on 'persistent' and 'transient' forcing conditions, the results of which were used to estimate the equilibrium bathymetry of the bay. Results show that extreme wave events appeared to have less influence on the equilibrium of the bay than average conditions of longer overall duration. Additionally, the persistent seasonal variation of the wave climate produces pronounced beach rotation and tends to accumulate sediment at the extremities of the beach, rather than in the central sections. It is, therefore, important to account for directional variability and persistence in the selection and sequencing of representative wave conditions, as it is essential for accurately balancing the effects of beach rotation events.

  14. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies.

    Directory of Open Access Journals (Sweden)

    Patrick D Schloss

    Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results
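
To make the point about gap treatments concrete, the sketch below computes an uncorrected pairwise distance between two aligned sequences under three different ways of handling gaps. It is an illustration only, not the method used in the study: the function name and the option names "each", "one", and "ignore" are hypothetical, and a gap run that switches from one sequence to the other is treated as a single run for simplicity.

```python
def pairwise_distance(a, b, gaps="each"):
    """Uncorrected distance between two aligned sequences of equal length.

    gaps="each":   every gap column counts as one mismatch
    gaps="one":    a run of consecutive gap columns counts as one mismatch
    gaps="ignore": gap columns are skipped entirely
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gaps not in ("each", "one", "ignore"):
        raise ValueError("unknown gap treatment: " + gaps)

    mismatches = 0
    compared = 0
    in_gap_run = False  # True while we are inside a run of gap columns
    for x, y in zip(a, b):
        x_gap, y_gap = x == "-", y == "-"
        if x_gap and y_gap:
            continue  # gap in both sequences: the column carries no information
        if x_gap or y_gap:
            if gaps == "ignore":
                continue
            if gaps == "one" and in_gap_run:
                continue  # this run has already been counted once
            in_gap_run = True
            mismatches += 1
            compared += 1
        else:
            in_gap_run = False
            compared += 1
            if x != y:
                mismatches += 1
    return mismatches / compared if compared else 0.0


# Usage: the same pair of sequences yields three different distances.
seq_a = "ACGT--ACGT"
seq_b = "ACGTTTACGA"
for treatment in ("each", "one", "ignore"):
    print(treatment, round(pairwise_distance(seq_a, seq_b, treatment), 3))
```

The longer the gap runs relative to the aligned length, the further apart the three values drift, which is consistent with the abstract's observation that the effect of gap treatment depends on sequence length variation within a region.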

  15. Method for sequencing DNA base pairs

    Science.gov (United States)

    Sessler, Andrew M.; Dawson, John

    1993-01-01

    The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source.

  16. Effects of an Additional Sequence of Color Stimuli on Visuomotor Sequence Learning

    Directory of Open Access Journals (Sweden)

    Kanji Tanaka

    2017-06-01

    Through practice, people are able to integrate a secondary sequence (e.g., a stimulus-based sequence) into a primary sequence (e.g., a response-based sequence), but it is still controversial whether the integrated sequences lead to better learning than only the primary sequence. In the present study, we aimed to investigate the effects of a sequence that integrated space and color sequences on early and late learning phases (corresponding to effector-independent and effector-dependent learning, respectively) and how the effects differed in the integrated and primary sequences in each learning phase. In the task, the participants were required to learn a sequence of button presses using trial-and-error and to perform the sequence successfully for 20 trials (m × n task). First, in the baseline task, all participants learned a non-colored sequence, in which the response button always turned red. Then, in the learning task, the participants were assigned to two groups: a colored sequence group (i.e., space and color) or a non-colored sequence group (i.e., space). In the colored sequence, the response button turned a pre-determined color and the participants were instructed to attend to the sequences of both location and color as much as they could. The results showed that the participants who performed the colored sequence acquired the correct button presses of the sequence earlier, but showed a slower mean performance time than those who performed the non-colored sequence. Moreover, the slower performance time in the colored sequence group remained in a subsequent transfer task in which the spatial configurations of the buttons were vertically mirrored from the learning task. These results indicated that if participants explicitly attended to both the spatial response sequence and color stimulus sequence at the same time, they could develop their spatial representations of the sequence earlier (i.e., early development of the effector

  17. Comparison of sequence-based and structure-based phylogenetic ...

    Indian Academy of Sciences (India)

    Prakash

    Sequence-based and structure-based phylogeny. 85. J. Biosci. 32(1), January 2007 dispersion is independent of phylogenetic tree construction algorithms. 3. ... two groups only. 3.1 Families with high correlation between sequence- based and structure-based phylogenetic trees. There are 27 families with considerably high ...

  18. Comparison of sequence-based and structure-based phylogenetic ...

    Indian Academy of Sciences (India)

    Prakash

    pairs of proteins which are inputs into the construction of phylogenetic trees. We find that correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low ...

  19. The Effect of a Classroom-Based Intensive Robotics and Programming Workshop on Sequencing Ability in Early Childhood

    Science.gov (United States)

    Kazakoff, Elizabeth R.; Sullivan, Amanda; Bers, Marina U.

    2013-01-01

    This paper examines the impact of programming robots on sequencing ability during a 1-week intensive robotics workshop at an early childhood STEM magnet school in the Harlem area of New York City. Children participated in computer programming activities using a developmentally appropriate tangible programming language CHERP, specifically designed…

  20. Deep Illumina-based shotgun sequencing reveals dietary effects on the structure and function of the fecal microbiome of growing kittens.

    Directory of Open Access Journals (Sweden)

    Oliver Deusch

    Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high-protein, low-carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified), demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups that differed (p<0.000007) between the two dietary groups and were associated with 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more (p<0.022) enzyme functions with significant diet effects than expected by chance. Six pathways were related to amino acid biosynthesis and metabolism, linking changes in dietary protein with functional differences of the gut microbiome. These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary

  1. Effects of polymerase, template dilution and cycle number on PCR based 16S rRNA diversity analysis using the deep sequencing method

    Directory of Open Access Journals (Sweden)

    Zou Fei

    2010-10-01

    Background: The primer and amplicon length have been found to affect PCR-based estimates of microbial diversity by pyrosequencing, while other PCR conditions have not been addressed using any deep sequencing method. The present study determined the effects of polymerase, template dilution and PCR cycle number using the Solexa platform. Results: The PfuUltra II Fusion HS DNA Polymerase (Stratagene), with higher fidelity, showed a lower amount of PCR artifacts and determined lower taxa richness than the Ex Taq (Takara). More importantly, the two polymerases showed different efficiencies for amplifying some of the very abundant sequences, and determined significantly different community structures. As expected, the dilution of the DNA template resulted in a reduced estimation of taxa richness, particularly at the 200-fold dilution level, but the community structures were similar for all dilution levels. The 30 cycle group showed more PCR artifacts than the 25 cycle group, but the determined taxa richness was lower than that of the 25 cycle group. The PCR cycle number did not change the microbial community structure significantly. Conclusions: These results highlight that the PCR conditions, particularly the polymerase, have a significant effect on the analysis of microbial diversity with next generation sequencing methods.

  2. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  3. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids.

    Science.gov (United States)

    Jansen, Robert K; Kaittanis, Charalambos; Saski, Christopher; Lee, Seung-Bum; Tomkins, Jeffrey; Alverson, Andrew J; Daniell, Henry

    2006-04-09

    The Vitaceae (grape) is an economically important family of angiosperms whose phylogenetic placement is currently unresolved. Recent phylogenetic analyses based on one to several genes have suggested several alternative placements of this family, including sister to Caryophyllales, asterids, Saxifragales, Dilleniaceae or to rest of rosids, though support for these different results has been weak. There has been a recent interest in using complete chloroplast genome sequences for resolving phylogenetic relationships among angiosperms. These studies have clarified relationships among several major lineages but they have also emphasized the importance of taxon sampling and the effects of different phylogenetic methods for obtaining accurate phylogenies. We sequenced the complete chloroplast genome of Vitis vinifera and used these data to assess relationships among 27 angiosperms, including nine taxa of rosids. The Vitis vinifera chloroplast genome is 160,928 bp in length, including a pair of inverted repeats of 26,358 bp that are separated by small and large single copy regions of 19,065 bp and 89,147 bp, respectively. The gene content and order of Vitis is identical to many other unrearranged angiosperm chloroplast genomes, including tobacco. Phylogenetic analyses using maximum parsimony and maximum likelihood were performed on DNA sequences of 61 protein-coding genes for two datasets with 28 or 29 taxa, including eight or nine taxa from four of the seven currently recognized major clades of rosids. Parsimony and likelihood phylogenies of both data sets provide strong support for the placement of Vitaceae as sister to the remaining rosids. However, the position of the Myrtales and support for the monophyly of the eurosid I clade differs between the two data sets and the two methods of analysis. In parsimony analyses, the inclusion of Gossypium is necessary to obtain trees that support the monophyly of the eurosid I clade. However, maximum likelihood analyses place

  4. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids

    Directory of Open Access Journals (Sweden)

    Alverson Andrew J

    2006-04-01

    Background: The Vitaceae (grape) is an economically important family of angiosperms whose phylogenetic placement is currently unresolved. Recent phylogenetic analyses based on one to several genes have suggested several alternative placements of this family, including sister to Caryophyllales, asterids, Saxifragales, Dilleniaceae or to rest of rosids, though support for these different results has been weak. There has been a recent interest in using complete chloroplast genome sequences for resolving phylogenetic relationships among angiosperms. These studies have clarified relationships among several major lineages but they have also emphasized the importance of taxon sampling and the effects of different phylogenetic methods for obtaining accurate phylogenies. We sequenced the complete chloroplast genome of Vitis vinifera and used these data to assess relationships among 27 angiosperms, including nine taxa of rosids. Results: The Vitis vinifera chloroplast genome is 160,928 bp in length, including a pair of inverted repeats of 26,358 bp that are separated by small and large single copy regions of 19,065 bp and 89,147 bp, respectively. The gene content and order of Vitis is identical to many other unrearranged angiosperm chloroplast genomes, including tobacco. Phylogenetic analyses using maximum parsimony and maximum likelihood were performed on DNA sequences of 61 protein-coding genes for two datasets with 28 or 29 taxa, including eight or nine taxa from four of the seven currently recognized major clades of rosids. Parsimony and likelihood phylogenies of both data sets provide strong support for the placement of Vitaceae as sister to the remaining rosids. However, the position of the Myrtales and support for the monophyly of the eurosid I clade differs between the two data sets and the two methods of analysis. In parsimony analyses, the inclusion of Gossypium is necessary to obtain trees that support the monophyly of the eurosid I clade

  5. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous-time, differential equation description of transcriptional dynamics. The sequence features of the promoter are used to derive the binding affinity on the basis of statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model yields more biologically meaningful results.

  6. Base sequence effects on DNA replication influenced by bulky adducts. Final report, March 1, 1995--February 28, 1997

    Energy Technology Data Exchange (ETDEWEB)

    Geacintov, N.E.

    1997-05-31

    Polycyclic aromatic hydrocarbons (PAH) are environmental pollutants that are present in air, food, and water. While PAH compounds are chemically inert and are sparingly soluble in aqueous solutions, in living cells they are metabolized to a variety of oxygenated derivatives, including the highly mutagenic and tumorigenic diol epoxide derivatives. The diol epoxides of the sterically hindered fjord region compound benzo[c]phenanthrene (B[c]PhDE) are among the most powerful tumorigenic compounds in animal model test systems. In this project, site-specifically modified oligonucleotides containing single B[c]PhDE-N⁶-dA lesions derived from the reactions of the 1S,2R,3R,4S and 1R,2S,3S,4R diol epoxides of B[c]PhDE with dA residues were synthesized. The replication of DNA catalyzed by a prokaryotic DNA polymerase (the exonuclease-free Klenow fragment of E. coli Pol I) in the vicinity of the lesion at base-specific sites on B[c]PhDE-modified template strands was investigated in detail. The Michaelis-Menten parameters for the insertion of single deoxynucleotide triphosphates into growing DNA (primer) strands, using the modified dA* and the bases just before and after the dA* residue as templates, depend markedly on the stereochemistry of the B[c]PhDE-modified dA residues. These observations provide novel insights into the mechanisms by which bulky PAH-DNA adducts affect normal DNA replication.

  7. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  8. Skeleton-based human action recognition using multiple sequence alignment

    Science.gov (United States)

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis has been an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. The method first decomposes an action into a sequence of meaningful atomic actions (actionlets), and then labels the actionlets with letters of the English alphabet according to the Davies-Bouldin index value. An action can therefore be represented as a sequence of actionlet symbols, which preserves the temporal order of occurrence of the actionlets. Finally, we employ sequence comparison to classify multiple actions using a string matching algorithm (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments with the proposed method on three challenging 3D action datasets show promising results.
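
    The string-matching step described in this record can be sketched as follows. This is a minimal illustration of Needleman-Wunsch global alignment scoring, not the authors' implementation: the scoring values (match, mismatch, gap), the `classify` helper, and the example symbol strings are all assumptions chosen for demonstration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global alignment score of two symbol strings.

    The scoring parameters are illustrative defaults, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def classify(query, templates):
    """Pick the label whose template string aligns best with the query (hypothetical helper)."""
    return max(templates, key=lambda label: needleman_wunsch(query, templates[label]))


if __name__ == "__main__":
    # Hypothetical actionlet strings for two action classes.
    templates = {"wave": "ABCE", "kick": "XYZ"}
    print(classify("ABCD", templates))
```

    A nearest-template rule is only one way to turn alignment scores into a classifier; the record does not say which decision rule the authors used.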

  9. Secondary structure-based analysis of mouse brain small RNA sequences obtained by using next-generation sequencing.

    Science.gov (United States)

    Kiyosawa, Hidenori; Okumura, Akio; Okui, Saya; Ushida, Chisato; Kawai, Gota

    2015-08-01

    In order to find novel structured small RNAs, next-generation sequencing was applied to small RNA fractions with lengths ranging from 40 to 140 nt and secondary structure-based clustering was performed. Sequences of structured RNAs were effectively clustered and analyzed by secondary structure. Although more than 99% of the obtained sequences were known RNAs, 16 candidate mouse structured small non-coding RNAs (MsncRs) were isolated. Based on these results, the merits of secondary structure-based analysis are discussed. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Solid-State Nanopore-Based DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Zewen Liu

    2016-01-01

    Full Text Available. Solid-state nanopore-based DNA sequencing technology is becoming increasingly attractive for its promise in the field of gene detection. The challenges that need to be addressed are diverse: effective methods to detect base-specific signatures, control of the nanopore’s size and surface properties, and modulation of the translocation velocity and behavior of the DNA molecules. Among these challenges, the realization of high-quality nanopores with the help of modern micro/nanofabrication technologies is a crucial one. In this paper, typical technologies applied in the field of solid-state nanopore-based DNA sequencing are reviewed.

  11. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Full Text Available. Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which maps QTC patterns into a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.

  12. Effect of dephasing on DNA sequencing via transverse electronic transport

    Energy Technology Data Exchange (ETDEWEB)

    Zwolak, Michael [Los Alamos National Laboratory; Krems, Matt [NON LANL; Pershin, Yuriy V [NON LANL; Di Ventra, Massimiliano [NON LANL

    2009-01-01

    We study theoretically the effects of dephasing on DNA sequencing in a nanopore via transverse electronic transport. To do this, we couple classical molecular dynamics simulations with transport calculations using scattering theory. Previous studies, which did not include dephasing, have shown that by measuring the transverse current of a particular base multiple times, one can get distributions of currents for each base that are distinguishable. We introduce a dephasing parameter into transport calculations to simulate the effects of the ions and other fluctuations. These effects lower the overall magnitude of the current, but have little effect on the current distributions themselves. The results of this work further suggest that distinguishing DNA bases via transverse electronic transport has potential as a sequencing tool.

  13. Sequence-Based Identification of Aerobic Actinomycetes

    Science.gov (United States)

    Patel, Jean Baldus; Wallace, Richard J.; Brown-Elliott, Barbara A.; Taylor, Tony; Imperatrice, Carol; Leonard, Deborah G. B.; Wilson, Rebecca W.; Mann, Linda; Jost, Kenneth C.; Nachamkin, Irving

    2004-01-01

    We investigated the utility of 500-bp 16S rRNA gene sequencing for identifying clinically significant species of aerobic actinomycetes. A total of 28 reference strains and 71 clinical isolates that included members of the genera Streptomyces, Gordonia, and Tsukamurella and 10 taxa of Nocardia were studied. Methods of nonsequencing analyses included growth and biochemical analysis, PCR-restriction enzyme analysis of the 439-bp Telenti fragment of the 65 hsp gene, susceptibility testing, and, for selected isolates, high-performance liquid chromatography. Many of the isolates were included in prior taxonomic studies. Sequencing of Nocardia species revealed that members of the group were generally most closely related to the American Type Culture Collection (ATCC) type strains. However, the sequences of Nocardia transvalensis, N. otitidiscaviarum, and N. nova isolates were highly variable; and it is likely that each of these species contains multiple species. We propose that these three species be designated complexes until they are more taxonomically defined. The sequences of several taxa did not match any recognized species. Among other aerobic actinomycetes, each group most closely resembled the associated reference strain, but with some divergence. The study demonstrates the ability of partial 16S rRNA gene sequencing to identify members of the aerobic actinomycetes, but the study also shows that a high degree of sequence divergence exists within many species and that many taxa within the Nocardia spp. are unnamed at present. A major unresolved issue is the type strain of N. asteroides, as the present one (ATCC 19247), chosen before the availability of molecular analysis, does not represent any of the common taxa associated with clinical nocardiosis. PMID:15184431

  14. Function-Based Algorithms for Biological Sequences

    Science.gov (United States)

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  15. An adaptive single pole autoreclosure based on zero sequence power

    Energy Technology Data Exchange (ETDEWEB)

    Elkalashy, Nagy I.; Darwish, Hatem A.; Taalab, Abdel-Maksoud I.; Izzularab, Mohmmad A. [Electrical Engineering Department, Faculty of Engineering, Menoufiya University, Shebin Elkom 32511 (Egypt)

    2007-04-15

    In this paper, a novel adaptive single pole autoreclosure is introduced. The reclosure is based on monitoring the fundamental component of the zero sequence instantaneous power to detect the extinction instant of the arc in its secondary period, so that an adaptive closing instant can be achieved. The concept is validated via typical examples of a transmission line exposed to a ground arcing fault. Effects of fault location and load flow on the accuracy of the technique are examined. Discriminatory zones of the secondary arc period in the zero sequence power domains are determined. A threshold for the reclosing instant is proposed and examined. The proposed algorithm is verified on a Digital Signal Processing (DSP) experimental test set-up. The test results corroborate the efficacy of the proposed technique. (author)

  16. [Segmentation Method for Liver Organ Based on Image Sequence Context].

    Science.gov (United States)

    Zhang, Meiyun; Fang, Bin; Wang, Yi; Zhong, Nanchang

    2015-10-01

    In view of the problems of more artificial interventions and segmentation defects in existing two-dimensional segmentation methods and abnormal liver segmentation errors in three-dimensional segmentation methods, this paper presents a semi-automatic liver organ segmentation method based on the image sequence context. The method takes advantage of the existing similarity between the image sequence contexts of the prior knowledge of liver organs, and combines region growing and level set method to carry out semi-automatic segmentation of livers, along with the aid of a small amount of manual intervention to deal with liver mutation situations. The experiment results showed that the liver segmentation algorithm presented in this paper had a high precision, and a good segmentation effect on livers which have greater variability, and can meet clinical application demands quite well.

  17. Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

    Science.gov (United States)

    Chen, Zitian; Zhou, Wenxiong; Qiao, Shuo; Kang, Li; Duan, Haifeng; Xie, X Sunney; Huang, Yanyi

    2017-12-01

    Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.
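
    The redundancy idea in this record can be illustrated with a toy encoding. The sketch below assumes the three "orthogonal degenerate sequences" correspond to the three ways of splitting {A, C, G, T} into two pairs, labeled here with the IUPAC codes M/K, R/Y and W/S; that mapping, the function names, and the per-base consistency check are assumptions for illustration. The published method decodes real signal data and is considerably more involved.

```python
# Three two-class partitions of the four bases (IUPAC degenerate codes).
# Any two partitions identify a base uniquely; the third acts as a check.
PARTITIONS = {
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},
}
ORDER = ("MK", "RY", "WS")


def encode(seq):
    """Map a DNA string to its three degenerate representations."""
    return {name: "".join(PARTITIONS[name][base] for base in seq) for name in ORDER}


def decode(reads):
    """Recover bases from three degenerate reads; '?' marks an inconsistent position."""
    out = []
    for symbols in zip(*(reads[name] for name in ORDER)):
        candidates = [
            base for base in "ACGT"
            if all(PARTITIONS[name][base] == s for name, s in zip(ORDER, symbols))
        ]
        out.append(candidates[0] if len(candidates) == 1 else "?")
    return "".join(out)


if __name__ == "__main__":
    reads = encode("GATTACA")
    print(reads)
    print(decode(reads))
```

    In this toy version a single wrong symbol in one read makes the three symbols at that position mutually inconsistent, so the error is detected; locating and correcting it requires more context than this sketch models.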

  18. Light-induced free-radical reactions of nucleic acid constituents. Effect of sequence and base--base interactions on the reactivity of purines and pyrimidines in ribonucleotides

    International Nuclear Information System (INIS)

    Livneh, E.; Elad, D.; Sperling, J.

    1978-01-01

    The reaction with 2-propanol of purines and pyrimidines, induced photochemically with light of λ > 300 nm and di-tert-butyl peroxide as an initiator, was applied to a variety of adenosine-, guanosine-, and uridine-containing ribonucleotides in order to determine the rules which govern the reactivity of the heterocyclic bases of nucleotides. The reactivity of the purine moieties was found to depend on the conformation of the appropriate nucleotide (anti or syn) and on the site of binding of the phosphate group to the ribose moiety. Adenosine moieties (assuming an anti conformation) blocked at their 3'-hydroxyl reacted faster than those blocked at their 5'-hydroxyl. The reactivity of the guanosine moieties (tending to assume a syn conformation) was independent of the site of binding of the phosphate. The uridine moieties of the various nucleotides exhibited a wide range of reactivity. A correlation between the reactivity of the uridines and their involvement in stacking interactions with next- and second-neighboring purines could be made. Thus, the uridine moieties of U-U-U, G-U, U-G, A-U-A, and A-U-G were reactive, while those of A-U and A-U-U were unreactive. The relative reactivity of uridine moieties of nucleotides can, therefore, be used as a measure of the extent of pyrimidine-purine stacking and vice versa.

  19. Sequence classification with side effect machines evolved via ring optimization.

    Science.gov (United States)

    McEachern, Andrew; Ashlock, Daniel; Schonfeld, Justin

    2013-07-01

    The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This study introduces a sequence-learning technology called side effect machines. It also applies a model of evolution which simulates the evolution of a ring species to the training of the side effect machines. A comparison is done between side effect machines evolved in the ring structure and side effect machines evolved using a standard evolutionary algorithm based on tournament selection. At the core of the training of side effect machines is a nearest neighbor classifier. A parameter study was performed to investigate the impact of the division of training data into examples for nearest neighbor assessment and training cases. The parameter study demonstrates that parameter setting is important in the baseline runs but had little impact in the ring-optimization runs. The ring optimization technique was also found to exhibit improved and also more reliable training performance. Side effect machines are tested on two types of synthetic data, one based on GC-content and the other checking for the ability of side effect machines to recognize an embedded motif. Three types of biological data are used, a data set with different types of immune-system genes, a data set with normal and retro-virally derived human genomic sequence, and standard and nonstandard initiation regions from the cytochrome-oxidase subunit one in the mitochondrial genome. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
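
    As a rough illustration of the classifier described in this record: a side effect machine is commonly understood as a finite state machine that counts state visits while reading a sequence, with the count vector used as a feature vector for a nearest neighbor classifier. The two-state transition table below is hand-written and hypothetical (it simply tracks GC versus AT bases); in the study the machines are evolved, not hand-designed, and the counting convention used here is an assumption.

```python
import math

# Hypothetical 2-state machine: state 1 after G or C, state 0 after A or T.
TRANSITIONS = {
    0: {"A": 0, "T": 0, "G": 1, "C": 1},
    1: {"A": 0, "T": 0, "G": 1, "C": 1},
}


def sem_features(seq, transitions=TRANSITIONS, start=0):
    """Run the machine over seq and count the state entered after each symbol."""
    counts = [0] * len(transitions)
    state = start
    for symbol in seq:
        state = transitions[state][symbol]
        counts[state] += 1
    return counts


def nearest_neighbor(query, labeled):
    """Return the label of the training sequence with the closest feature vector."""
    q = sem_features(query)
    return min(labeled, key=lambda item: math.dist(q, sem_features(item[0])))[1]


if __name__ == "__main__":
    training = [("GGCCGC", "GC-rich"), ("AATTAT", "AT-rich")]
    print(sem_features("GATC"))
    print(nearest_neighbor("GGCGAC", training))
```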

  20. An assembly sequence planning method based on composite algorithm

    OpenAIRE

    Enfu LIU; Bo LIU; Xiaoyang LIU; Yi LI

    2016-01-01

    To solve the combination explosion problem and the blind searching problem in assembly sequence planning of complex products, an assembly sequence planning method based on composite algorithm is proposed. In the composite algorithm, a sufficient number of feasible assembly sequences are generated using formalization reasoning algorithm as the initial population of genetic algorithm. Then fuzzy knowledge of assembly is integrated into the planning process of genetic algorithm and ant algorithm...

  1. Prediction of potential drug targets based on simple sequence properties

    Directory of Open Access Journals (Sweden)

    Lai Luhua

    2007-09-01

    Full Text Available. Abstract. Background: During the past decades, research and development in drug discovery have attracted much attention and effort. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and the pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly speeded up. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results: Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non-drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion: We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/domain annotation, or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome scale drug target predictions.

  2. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    Science.gov (United States)

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  3. Exploring Sequence Alignment Algorithms on FPGA-based Heterogeneous Architectures

    NARCIS (Netherlands)

    Chang, Xin; Escobar, Fernando A.; Valderrama, Carlos; Robert, Vincent; Ortuno, F.; Rojas, I.

    2014-01-01

    With the rapid development of DNA sequencers, the rate of data generation is rapidly outpacing the rate at which it can be computationally processed. Traditional PC-based sequence alignment cannot fulfill the increasing demand. Accelerating the algorithm using an FPGA provides better performance.

  4. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    The phylogenetic relationships among flesh flies of the family Sarcophagidae have been based mainly on the morphology of male genitalia. However, the male genitalic character-based relationships are far from satisfactory. Therefore, in the present study mitochondrial DNA has been used as a marker to unravel genetic ...

  5. Nanopore-Based Target Sequence Detection.

    Directory of Open Access Journals (Sweden)

    Trevor J Morin

    Full Text Available. The promise of portable diagnostic devices relies on three basic requirements: comparable sensitivity to established platforms, inexpensive manufacturing and cost of operations, and the ability to survive rugged field conditions. Solid state nanopores can meet all these requirements, but to achieve high manufacturing yields at low costs, assays must be tolerant to fabrication imperfections and to nanopore enlargement during operation. This paper presents a model for molecular engineering techniques that meets these goals with the aim of detecting target sequences within DNA. In contrast to methods that require precise geometries, we demonstrate detection using a range of pore geometries. As a result, our assay model tolerates any pore-forming method and in-situ pore enlargement. Using peptide nucleic acid (PNA) probes modified for conjugation with synthetic bulk-adding molecules, pores ranging from 15 to 50 nm in diameter are shown to detect individual PNA-bound DNA. Detection of the CFTRΔF508 gene mutation, a codon deletion responsible for ∼66% of all cystic fibrosis chromosomes, is demonstrated with a 26-36 nm pore size range by using a size-enhanced PNA probe. A mathematical framework for assessing the statistical significance of detection is also presented.

  6. Swarm-based Sequencing Recommendations in E-learning

    NARCIS (Netherlands)

    Van den Berg, Bert; Tattersall, Colin; Janssen, José; Brouns, Francis; Kurvers, Hub; Koper, Rob

    2005-01-01

    Van den Berg, B., Tattersall, C., Janssen, J., Brouns, F., Kurvers, H., & Koper, R. (2006). Swarm-based Sequencing Recommendations in E-learning. International Journal of Computer Science & Applications, III(III), 1-11.

  7. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    2007 Population structure of the malaria vector Anopheles darlingi in Rondonia, Brazilian Amazon, based on mitochondrial DNA. Mem. Inst. Oswaldo Cruz 102, 953–958. Avise J. C. 2004 Molecular markers, natural history, and evolution, 2nd edition. Sinauer, Sunderland, USA. Cameron S. L., Lambkin C. L., Barker S. C. ...

  8. Task sequencing for sensor-based control

    OpenAIRE

    Mansard, Nicolas; Chaumette, François

    2007-01-01

    Classical sensor-based approaches tend to constrain all the degrees of freedom of a robot during the execution of a task. In this paper, a new solution is proposed. The key idea is to divide the global full-constraining task into several subtasks, which can be applied or inactivated to take into account potential constraints of the environment. Far from any constraint, the robot moves according to the full task. When it comes closer to a configuration to avoid, a highe...

  9. Effects of sequence on DNA wrapping around histones

    Science.gov (United States)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  10. An Optimal Seed Based Compression Algorithm for DNA Sequences

    Directory of Open Access Journals (Sweden)

    Pamela Vinitha Eric

    2016-01-01

    Full Text Available. This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures that are inherent in DNA sequences by creating an offline dictionary which contains all such repeats along with the details of mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than existing lossless DNA sequence compression algorithms.
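
    To make the substitution idea concrete, here is a minimal Lempel-Ziv-style sketch that replaces exact repeats with back-references. It is a simplification under stated assumptions: it handles exact repeats only, searches greedily and online, and uses an invented token format and `min_len` threshold. The seed selection, offline dictionary, and mismatch handling described in the record are not modeled.

```python
def compress(seq, min_len=4):
    """Greedy substitution: emit ('L', base) literals or ('R', offset, length) repeats."""
    tokens, i = [], 0
    while i < len(seq):
        best_len, best_off = 0, 0
        # Search every earlier start position for the longest non-overlapping match.
        for j in range(i):
            k = 0
            while i + k < len(seq) and j + k < i and seq[j + k] == seq[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append(("R", best_off, best_len))
            i += best_len
        else:
            tokens.append(("L", seq[i]))
            i += 1
    return tokens


def decompress(tokens):
    """Rebuild the sequence by copying literals and resolving back-references."""
    out = []
    for token in tokens:
        if token[0] == "L":
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            out.extend(out[start:start + length])
    return "".join(out)


if __name__ == "__main__":
    dna = "ACGTACGTACGTTTGA"
    packed = compress(dna)
    print(packed)
    print(decompress(packed) == dna)
```

    The quadratic search is fine for a demonstration but would be far too slow for genome-scale input, which is presumably one reason seed-based indexing is used in practice.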

  11. An assembly sequence planning method based on composite algorithm

    Directory of Open Access Journals (Sweden)

    Enfu LIU

    2016-02-01

    Full Text Available. To solve the combination explosion problem and the blind searching problem in assembly sequence planning of complex products, an assembly sequence planning method based on a composite algorithm is proposed. In the composite algorithm, a sufficient number of feasible assembly sequences are generated using a formalization reasoning algorithm as the initial population of the genetic algorithm. Then fuzzy knowledge of assembly is integrated into the planning process of the genetic algorithm and the ant algorithm to obtain an accurate solution. Finally, an example is presented to verify the feasibility of the composite algorithm.

  12. An Optimal Seed Based Compression Algorithm for DNA Sequences.

    Science.gov (United States)

    Eric, Pamela Vinitha; Gopalakrishnan, Gopakumar; Karunakaran, Muralikrishnan

    2016-01-01

    This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures that are inherent in DNA sequences by creating an offline dictionary which contains all such repeats along with the details of mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than existing lossless DNA sequence compression algorithms.

  13. An optical CDMA system based on chaotic sequences

    Science.gov (United States)

    Liu, Xiao-lei; En, De; Wang, Li-guo

    2014-03-01

    In this paper, a coherent asynchronous optical code division multiple access (OCDMA) system is proposed, whose encoder/decoder is an all-optical generator. This all-optical generator can generate analog and bipolar chaotic sequences satisfying the logistic map. The formula for the bit error rate (BER) is derived, and the relationship between the BER and the number of simultaneous transmissions is analyzed. Owing to their good correlation properties, these bipolar chaotic sequences allow the coherent OCDMA system to support a large number of simultaneous users, which shows that the sequences are suitable for asynchronous OCDMA systems.
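
    A numerical sketch of such sequences follows. It assumes the bipolar form of the logistic map, x(n+1) = 1 - 2·x(n)², which keeps values in [-1, 1]; whether this is the exact map used by the authors is not stated in the record. The function names, seeds, and sequence length are illustrative.

```python
def chaotic_sequence(x0, n):
    """Generate n analog values in [-1, 1] from the map x -> 1 - 2*x*x (assumed form)."""
    values, x = [], x0
    for _ in range(n):
        x = 1.0 - 2.0 * x * x
        values.append(x)
    return values


def correlation(a, b):
    """Normalized zero-lag correlation of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Two users with slightly different seeds (hypothetical values).
    user_a = chaotic_sequence(0.3, 1000)
    user_b = chaotic_sequence(0.3001, 1000)
    print("auto :", correlation(user_a, user_a))
    print("cross:", correlation(user_a, user_b))
```

    Because the map is chaotic, nearby seeds diverge within a few iterations, which is what makes low cross-correlation between users plausible; the printed values are left for the reader to inspect rather than asserted here.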

  14. Automation tools for control systems a network based sequencer

    International Nuclear Information System (INIS)

    Clout, P.; Geib, M.; Westervelt, R.

    1990-01-01

    This paper reports on the development of a sequencer for control systems which works in conjunction with its realtime, distributed Vsystem database. Vsystem is a network-based data acquisition, monitoring and control system which has been applied successfully to many different types of projects. The network-based sequencer allows a user to simply define a thread of execution in any supported computer on the network. The script defining a sequence has a simple syntax designed for non-programmers, with facilities for selectively abbreviating the channel names for easy reference. The semantics of the script contain most of the familiar capabilities of conventional programming languages, including standard stream I/O and the ability to start other processes with parameters passed. The script is compiled to threaded code for execution efficiency. The implementation will be described in some detail and examples will be given of applications for which the sequencer has been used.

  15. A base composition analysis of natural patterns for the preprocessing of metagenome sequences.

    Science.gov (United States)

    Bonham-Carter, Oliver; Ali, Hesham; Bastola, Dhundy

    2013-01-01

    On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.
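
    One way to picture the "proportional content" measurement in this record is to compute how much of a fragment is covered by occurrences of a motif set. The sketch below is an assumption-laden illustration: the coverage definition, the function name, and the example spectrum set (the EcoRI site GAATTC plus two arbitrary rearrangements of it) are not taken from the paper.

```python
def motif_coverage(sequence, motifs):
    """Fraction of positions in sequence covered by at least one motif occurrence."""
    if not sequence:
        return 0.0
    covered = [False] * len(sequence)
    for motif in motifs:
        if not motif:
            continue  # ignore empty motifs
        start = sequence.find(motif)
        while start != -1:
            for k in range(start, start + len(motif)):
                covered[k] = True
            # Step by one so overlapping occurrences are also found.
            start = sequence.find(motif, start + 1)
    return sum(covered) / len(sequence)


if __name__ == "__main__":
    # Hypothetical spectrum set built around the EcoRI recognition site.
    spectrum_set = ["GAATTC", "GATATC", "CTTAAG"]
    fragment = "TTGAATTCAAGATATCGG"
    print(motif_coverage(fragment, spectrum_set))
```

    Comparing such coverage values across fragments is one plausible reading of how composition could indicate relatedness; the statistical comparison used by the authors is not described in the record.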

  16. Accuracy of structure-based sequence alignment of automatic methods

    Directory of Open Access Journals (Sweden)

    Lee Byungkook

    2007-09-01

    similarity is low, structure-based methods produce better sequence alignments than by using sequence similarities alone. However, current structure-based methods still mis-align 11–19% of the conserved core residues when compared to the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.

  17. Co-barcoded sequence reads from long DNA fragments: A cost-effective solution for Perfect Genome sequencing

    Directory of Open Access Journals (Sweden)

    Brock A Peters

    2015-01-01

    Next generation sequencing (NGS) technologies, primarily based on massively parallel sequencing (MPS), have touched and radically changed almost all aspects of research worldwide. These technologies have allowed for the rapid analysis, to date, of the genomes of more than 2,000 different species. In humans, NGS has arguably had the largest impact. Over 100,000 genomes of individual humans (based on various estimates) have been sequenced, allowing for deep insights into what makes individuals and families unique and what causes disease in each of us. Despite all of this progress, the current state of the art in sequence technology is far from generating a perfect genome sequence and much remains to be understood in the biology of human and other organisms’ genomes. In the article that follows we outline why the perfect genome in humans is important, what is lacking from current human whole genome sequences, and a potential strategy for achieving the perfect genome in a cost effective manner.

  18. Recursive organizer (ROR): an analytic framework for sequence-based association analysis.

    Science.gov (United States)

    Zhao, Lue Ping; Huang, Xin

    2013-07-01

    The advent of next-generation sequencing technologies affords the ability to sequence thousands of subjects cost-effectively, and is revolutionizing the landscape of genetic research. With the evolving genotyping/sequencing technologies, it is not unrealistic to expect that we will soon obtain a pair of diploid, fully phased genome sequences from each subject. Here, in light of this potential, we propose an analytic framework called recursive organizer (ROR), which recursively groups sequence variants based upon sequence similarities and their empirical disease associations, into fewer and potentially more interpretable super sequence variants (SSV). As an illustration, we applied ROR to assess an association between HLA-DRB1 and type 1 diabetes (T1D), discovering SSVs of HLA-DRB1 with sequence data from the Wellcome Trust Case Control Consortium. Specifically, ROR reduces 36 observed unique HLA-DRB1 sequences into 8 SSVs that empirically associate with T1D, a fourfold reduction of sequence complexity. Using HLA-DRB1 data from Type 1 Diabetes Genetics Consortium as cases and data from Fred Hutchinson Cancer Research Center as controls, we are able to validate associations of these SSVs with T1D. Further, SSVs consist of nine nucleotides, and each associates with its corresponding amino acids. Detailed examination of these selected amino acids reveals their potential functional roles in protein structures and possible implication to the mechanism of T1D.

  19. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  20. Genomic Variance Estimation Based on Genotyping-by-Sequencing with Different Coverage in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Fé, Dario; Jensen, Just

    2014-01-01

    Advancement in next generation sequencing (NGS) technologies has significantly decreased the cost of DNA sequencing enabling increased use of genotyping by sequencing (GBS) in several plant species. In contrast to array-based genotyping GBS also allows for easy estimation of allele frequencies at each SNP in family pools or polyploids. There are, however, several statistical challenges associated with this method, including low sequencing depth and missing values. Low sequencing depth results in inaccuracies in estimates of allele frequencies for each SNP. In this work we have focused on optimizing methods and models utilizing F2 family phenotype records and NGS information from F2 family pools in perennial ryegrass. Genomic variance was estimated using genomic relationship matrices based on different coverage depths to verify effects of coverage depth. Example traits were seed yield, rust ...

  1. Generating Multiple Base-Resolution DNA Methylomes Using Reduced Representation Bisulfite Sequencing.

    Science.gov (United States)

    Chatterjee, Aniruddha; Rodger, Euan J; Stockwell, Peter A; Le Mée, Gwenn; Morison, Ian M

    2017-01-01

    Reduced representation bisulfite sequencing (RRBS) is an effective technique for profiling genome-wide DNA methylation patterns in eukaryotes. RRBS couples size selection, bisulfite conversion, and second-generation sequencing to enrich for CpG-dense regions of the genome. The progressive improvement of second-generation sequencing technologies and reduction in cost provided an opportunity to examine the DNA methylation patterns of multiple genomes. Here, we describe a protocol for sequencing multiple RRBS libraries in a single sequencing reaction to generate base-resolution methylomes. Furthermore, we provide a brief guideline for base-calling and data analysis of multiplexed RRBS libraries. These strategies will be useful to perform large-scale, genome-wide DNA methylation analysis.

  2. (Brassicaceae) based on nuclear ribosomal ITS DNA sequences

    Indian Academy of Sciences (India)

    Phylogeny and biogeography of Alyssum (Brassicaceae) based on nuclear ribosomal ITS DNA sequences. Yan Li; Yan Kong; Zhe Zhang; Yanqiang Yin; Bin Liu; Guanghui Lv; Xiyong Wang. Research Article, Journal of Genetics, Volume 93, Issue 2, August 2014, pp 313-323 ...

  3. A Diagnostic HIV-1 Tropism System Based on Sequence Relatedness

    Science.gov (United States)

    Edwards, Suzanne; Stucki, Heinz; Bader, Joëlle; Vidal, Vincent; Kaiser, Rolf; Battegay, Manuel

    2014-01-01

    Key clinical studies for HIV coreceptor antagonists have used the phenotyping-based Trofile test. Meanwhile various simpler-to-do genotypic tests have become available that are compatible with standard laboratory equipment and Web-based interpretation tools. However, these systems typically analyze only the most prominent virus sequence in a specimen. We present a new diagnostic HIV tropism test not needing DNA sequencing. The system, XTrack, uses physical properties of DNA duplexes after hybridization of single-stranded HIV-1 env V3 loop probes to the clinical specimen. Resulting “heteroduplexes” possess unique properties driven by sequence relatedness to the reference and resulting in a discrete electrophoretic mobility. A detailed optimization process identified diagnostic probe candidates relating best to a large number of HIV-1 sequences with known tropism. From over 500 V3 sequences representing all main HIV-1 subtypes (Los Alamos database), we obtained a small set of probes to determine the tropism in clinical samples. We found a high concordance with the commercial TrofileES test (84.9%) and the Web-based tool Geno2Pheno (83.0%). Moreover, the new system reveals mixed virus populations, and it was successful on specimens with low virus loads or on provirus from leukocytes. A replicative phenotyping system was used for validation. Our data show that the XTrack test is favorably suitable for routine diagnostics. It detects and dissects mixed virus populations and viral minorities; samples with viral loads (VL) of <200 copies/ml are successfully analyzed. We further expect that the principles of the platform can be adapted also to other sequence-divergent pathogens, such as hepatitis B and C viruses. PMID:25502529

  4. Gene-based segregation method for identifying rare variants in family-based sequencing studies.

    Science.gov (United States)

    Qiao, Dandi; Lange, Christoph; Laird, Nan M; Won, Sungho; Hersh, Craig P; Morrow, Jarrett; Hobbs, Brian D; Lutz, Sharon M; Ruczinski, Ingo; Beaty, Terri H; Silverman, Edwin K; Cho, Michael H

    2017-05-01

    Whole-exome sequencing using family data has identified rare coding variants in Mendelian diseases or complex diseases with Mendelian subtypes, using filters based on variant novelty, functionality, and segregation with the phenotype within families. However, formal statistical approaches are limited. We propose a gene-based segregation test (GESE) that quantifies the uncertainty of the filtering approach. It is constructed using the probability of segregation events under the null hypothesis of Mendelian transmission. This test takes into account different degrees of relatedness in families, the number of functional rare variants in the gene, and their minor allele frequencies in the corresponding population. In addition, a weighted version of this test allows incorporating additional subject phenotypes to improve statistical power. We show via simulations that the GESE and weighted GESE tests maintain appropriate type I error rate, and have greater power than several commonly used region-based methods. We apply our method to whole-exome sequencing data from 49 extended pedigrees with severe, early-onset chronic obstructive pulmonary disease (COPD) in the Boston Early-Onset COPD study (BEOCOPD) and identify several promising candidate genes. Our proposed methods show great potential for identifying rare coding variants of large effect and high penetrance for family-based sequencing data. The proposed tests are implemented in an R package that is available on CRAN (https://cran.r-project.org/web/packages/GESE/). © 2017 WILEY PERIODICALS, INC.

  5. Nanopore-based fourth-generation DNA sequencing technology.

    Science.gov (United States)

    Feng, Yanxiao; Zhang, Yuechuan; Ying, Cuifeng; Wang, Deqiang; Du, Chunlei

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  6. Nanopore-based Fourth-generation DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Yanxiao Feng

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications.

  7. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences

    DEFF Research Database (Denmark)

    Tully, Philip J; Lindén, Henrik; Hennig, Matthias H

    2016-01-01

    Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods ... and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times ...

  8. Phylogenetic relationships of Malassezia species based on multilocus sequence analysis.

    Science.gov (United States)

    Castellá, Gemma; Coutinho, Selene Dall' Acqua; Cabañes, F Javier

    2014-01-01

    Members of the genus Malassezia are lipophilic basidiomycetous yeasts, which are part of the normal cutaneous microbiota of humans and other warm-blooded animals. Currently, this genus consists of 14 species that have been characterized by phenetic and molecular methods. Although several molecular methods have been used to identify and/or differentiate Malassezia species, the sequencing of the rRNA genes and the chitin synthase-2 gene (CHS2) are the most widely employed. There is little information about the β-tubulin gene in the genus Malassezia, a gene that has been used for the analysis of complex species groups. The aim of the present study was to sequence a fragment of the β-tubulin gene of Malassezia species and analyze their phylogenetic relationship using a multilocus sequence approach based on two rRNA genes (ITS including 5.8S rRNA and D1/D2 region of 26S rRNA) together with two protein encoding genes (CHS2 and β-tubulin). The phylogenetic study of the partial β-tubulin gene sequences indicated that this molecular marker can be used to assess diversity and identify new species. The multilocus sequence analysis of the four loci provides robust support to delineate species at the terminal nodes and could help to estimate divergence times for the origin and diversification of Malassezia species.

  9. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regards to accuracy, resolution and analysis speed in comparison to several other methods including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major sources of foodborne diseases for humans. Backward compatibility of WGS to former methods can be maintained by extraction of the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars and national reference laboratories still perform serotyping and PCR-based serogrouping as a first level classification method for Listeria monocytogenes surveillance. Whole genome sequence based core genome MLST analysis of a L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and it was possible to group them either into the IIa, IIc, IVb or IIb clusters, respectively, which were generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.

    2015-04-18

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses are proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91 % and 94 % are proposed as the demarcation threshold for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.
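
    The proposed 91 % and 94 % demarcation thresholds amount to a simple decision rule on a genome-wide pairwise identity value, sketched below. This is a hedged illustration only: the function `demarcate`, its labels and the `margin` parameter are hypothetical, the identity value is assumed to have been computed elsewhere (for example by a pairwise alignment tool), and the guidelines' actual procedures for resolving borderline cases are not reproduced here.

```python
SPECIES_THRESHOLD = 91.0  # percent identity, from the abstract
STRAIN_THRESHOLD = 94.0   # percent identity, from the abstract


def demarcate(identity_percent, margin=0.5):
    """Classify a pair of genomes from their pairwise identity.

    Returns (label, borderline). `borderline` is True when the value falls
    on or within `margin` percentage points of a threshold; the margin is
    an illustrative assumption, not a value taken from the guidelines.
    """
    if identity_percent >= STRAIN_THRESHOLD:
        label = "same strain"
    elif identity_percent >= SPECIES_THRESHOLD:
        label = "same species, different strain"
    else:
        label = "different species"
    borderline = any(
        abs(identity_percent - threshold) <= margin
        for threshold in (SPECIES_THRESHOLD, STRAIN_THRESHOLD)
    )
    return label, borderline
```

    A caller would treat any result with `borderline` set as needing the manual conflict-resolution procedures the abstract mentions, rather than trusting the label.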

  11. Phylogeny and classification of Dickeya based on multilocus sequence analysis.

    Science.gov (United States)

    Marrero, Glorimar; Schneider, Kevin L; Jenkins, Daniel M; Alvarez, Anne M

    2013-09-01

    Bacterial heart rot of pineapple reported in Hawaii in 2003 and reoccurring in 2006 was caused by an undetermined species of Dickeya. Classification of the bacterial strains isolated from infected pineapple to one of the recognized Dickeya species and their phylogenetic relationships with Dickeya were determined by a multilocus sequence analysis (MLSA), based on the partial gene sequences of dnaA, dnaJ, dnaX, gyrB and recN. Individual and concatenated gene phylogenies revealed that the strains form a clade with reference Dickeya sp. isolated from pineapple in Malaysia and are closely related to D. zeae; however, previous DNA-DNA reassociation values suggest that these strains do not meet the genomic threshold for consideration in D. zeae, and require further taxonomic analysis. An analysis of the markers used in this MLSA determined that recN was the best overall marker for resolution of species within Dickeya. Differential intraspecies resolution was observed with the other markers, suggesting that marker selection is important for defining relationships within a clade. Phylogenies produced with gene sequences from the sequenced genomes of strains D. dadantii Ech586, D. dadantii Ech703 and D. zeae Ech1591 did not place the sequenced strains with members of other well-characterized members of their respective species. The average nucleotide identity (ANI) and tetranucleotide frequencies determined for the sequenced strains corroborated the results of the MLSA that D. dadantii Ech586 and D. dadantii Ech703 should be reclassified as Dickeya zeae Ech586 and Dickeya paradisiaca Ech703, respectively, whereas D. zeae Ech1591 should be reclassified as Dickeya chrysanthemi Ech1591.

  12. Effect of rapid mass accretion onto the main sequence stars

    International Nuclear Information System (INIS)

    Sarna, M.

    1984-01-01

    During the evolution of a close binary system there is a phase of rapid mass exchange between its components. Effect of rapid mass inflow on the internal structure of the main sequence stars is studied. 28 refs., 13 figs. (author)

  13. Analyzing Plasmodium falciparum erythrocyte membrane protein 1 gene expression by a next generation sequencing based method

    DEFF Research Database (Denmark)

    Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine

    2013-01-01

    ..., encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR ...

  14. FPGA-based protein sequence alignment : A review

    Science.gov (United States)

    Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi

    2017-11-01

    Sequence alignment has been optimized using several techniques that accelerate computation of the optimal score by implementing DP-based algorithms in hardware such as FPGA-based platforms. Hardware implementations face performance challenges such as frequent memory access and strong data dependencies in the computation. This paper therefore focuses on processing element (PE) configuration, which involves substantial memory access to load the data (substitution matrix, query sequence character), and on PE configuration time. Previous works have taken various approaches to improve PE configuration performance, such as a serial configuration chain and a parallel configuration chain, in which the configuration data are loaded into the PEs sequentially and simultaneously, respectively. Some researchers have shown that a parallel configuration chain optimizes both configuration time and area.
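
    For readers unfamiliar with the DP recurrence that such hardware accelerates, a minimal software sketch follows. The abstract does not name a specific algorithm, so Smith-Waterman local alignment is used here purely as a common example; the function name `smith_waterman_score` and the scoring values (a flat match/mismatch score standing in for a substitution matrix, and a linear gap penalty) are assumptions for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of two sequences (Smith-Waterman).

    Each cell depends on its upper, left and upper-left neighbours. This
    data dependency is why hardware designs typically process cells along
    anti-diagonals, with one processing element per query character.
    """
    prev = [0] * (len(target) + 1)  # previous DP row
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

    Only two rows are kept in memory, which mirrors the point that memory access, not arithmetic, tends to dominate the cost; a realistic protein aligner would replace the flat match/mismatch score with a substitution matrix lookup.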

  15. Primacy and recency effects in immediate free recall of sequences of spatial positions.

    Science.gov (United States)

    Bonanni, Rita; Pasqualetti, Patrizio; Caltagirone, Carlo; Carlesimo, Giovanni Augusto

    2007-10-01

    This study evaluated the serial position curve based on free recall of spatial position sequences. To evaluate the memory processes underlying spatial recall, some manipulations were introduced by varying the length of spatial sequences (Exp. 1) and modifying the presentation rate of individual positions (Exp. 2). A primacy effect emerged for all sequence lengths, while a recency effect was evident only in the longer sequences. Moreover, slowing the presentation rate increased the magnitude of the primacy effect and abolished the recency effect. The main novelty of the present results is represented by the finding that better recall of early items in a sequence of spatial positions does not depend on the task requirement of an ordered recall but it can also be observed in a free recall paradigm.

  16. Spike-Based Bayesian-Hebbian Learning of Temporal Sequences.

    Directory of Open Access Journals (Sweden)

    Philip J Tully

    2016-05-01

    Many cognitive and motor functions are enabled by the temporal representation and processing of stimuli, but it remains an open issue how neocortical microcircuits can reliably encode and replay such sequences of information. To better understand this, a modular attractor memory network is proposed in which meta-stable sequential attractor transitions are learned through changes to synaptic weights and intrinsic excitabilities via the spike-based Bayesian Confidence Propagation Neural Network (BCPNN) learning rule. We find that the formation of distributed memories, embodied by increased periods of firing in pools of excitatory neurons, together with asymmetrical associations between these distinct network states, can be acquired through plasticity. The model's feasibility is demonstrated using simulations of adaptive exponential integrate-and-fire model neurons (AdEx). We show that the learning and speed of sequence replay depends on a confluence of biophysically relevant parameters including stimulus duration, level of background noise, ratio of synaptic currents, and strengths of short-term depression and adaptation. Moreover, sequence elements are shown to flexibly participate multiple times in the sequence, suggesting that spiking attractor networks of this type can support an efficient combinatorial code. The model provides a principled approach towards understanding how multiple interacting plasticity mechanisms can coordinate hetero-associative learning in unison.

  17. Branched Adaptive Testing with a Rasch-Model-Calibrated Test: Analysing Item Presentation's Sequence Effects Using the Rasch-Model-Based LLTM

    Science.gov (United States)

    Kubinger, Klaus D.; Reif, Manuel; Yanagida, Takuya

    2011-01-01

    Item position effects provoke serious problems within adaptive testing. This is because different testees are necessarily presented with the same item at different presentation positions, as a consequence of which comparing their ability parameter estimations in the case of such effects would not at all be fair. In this article, a specific…

  18. Sequence-based classification and identification of Fungi.

    Science.gov (United States)

    Hibbett, David; Abarenkov, Kessy; Kõljalg, Urmas; Öpik, Maarja; Chai, Benli; Cole, James; Wang, Qiong; Crous, Pedro; Robert, Vincent; Helgason, Thorunn; Herr, Joshua R; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, R Henrik; Oono, Ryoko; Schoch, Conrad; Smyth, Christopher; Walker, Donald M; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M

    Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable valid publication of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.

  19. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    Science.gov (United States)

    Mascher, Martin; Wu, Shuangye; Amand, Paul St; Stein, Nils; Poland, Jesse

    2013-01-01

    The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL) population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries were constructed consisting of 94 RILs and the two parents and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze sequencing data; the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, paucity in the number of SNP that were jointly discovered by the different pipelines indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley. Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new sequencing

  20. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley.

    Directory of Open Access Journals (Sweden)

    Martin Mascher

    The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL) population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries were constructed consisting of 94 RILs and the two parents and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze sequencing data; the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, paucity in the number of SNP that were jointly discovered by the different pipelines indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley. Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new

  1. Identifying rare variants with optimal depth of coverage and cost-effective overlapping pool sequencing.

    Science.gov (United States)

    Cao, Chang-Chang; Li, Cheng; Huang, Zheng; Ma, Xin; Sun, Xiao

    2013-12-01

    Genome-wide association studies have identified hundreds of genetic variants associated with complex diseases, although most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for missing heritability. Identification of rare variants through large-scale resequencing is becoming increasingly important but is still prohibitively expensive despite the rapid decline in sequencing costs. Nevertheless, group-testing-based overlapping pool sequencing, in which pooled rather than individual samples are sequenced, will greatly reduce the effort of sample preparation as well as the cost of screening for rare variants. Here, we propose an overlapping pool sequencing method to screen rare variants with optimal sequencing depth and a corresponding cost model. We formulated a model to compute the optimal depth for sufficient observations of variants in pooled sequencing. Utilizing the shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing could be selected to minimize cost and guarantee accuracy. Due to the mixing constraint and high depth for pooled sequencing, results showed that it was more cost-effective to divide a large population into smaller blocks, which were tested independently using optimized strategies. Finally, we conducted an experiment to screen variant carriers with a frequency of 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. Utilizing overlapping pool sequencing, the cost for screening variant carriers with a frequency of 1% in 200 diploid individuals dropped to at least 66% when the target sequencing region was set to 30 Mb. © 2013 WILEY PERIODICALS, INC.

  2. Antigenic cartography of H1N1 influenza viruses using sequence-based antigenic distance calculation.

    Science.gov (United States)

    Anderson, Christopher S; McCall, Patrick R; Stern, Harry A; Yang, Hongmei; Topham, David J

    2018-02-12

    The ease with which influenza virus sequence data can be used to estimate antigenic relationships between strains and the existence of databases containing sequence data for hundreds of thousands of influenza strains make sequence-based antigenic distance estimates an attractive approach to researchers. Antigenic mismatch between circulating strains and vaccine strains results in significantly decreased vaccine effectiveness. Furthermore, antigenic relatedness between the vaccine strain and the strains an individual was originally primed with can affect the cross-reactivity of the antibody response. Thus, understanding the antigenic relationships between influenza viruses that have circulated is important to both vaccinologists and immunologists. Here we develop a method of mapping antigenic relationships between influenza virus strains using a sequence-based antigenic distance approach (SBM). We used a modified version of the p-all-epitope sequence-based antigenic distance calculation, which determines the antigenic relatedness between strains using influenza hemagglutinin (HA) genetic coding sequence data, and provide experimental validation of the p-all-epitope calculation. We calculated the antigenic distance between 4838 H1N1 viruses isolated from infected humans between 1918 and 2016. We demonstrate, for the first time, that sequence-based antigenic distances of H1N1 influenza viruses can be accurately represented in 2-dimensional antigenic cartography using classic multidimensional scaling. Additionally, the model correctly predicted decreases in cross-reactive antibody levels with 87% accuracy and was highly reproducible even when small numbers of sequences were used. This work provides a highly accurate and precise bioinformatics tool that can be used to assess immune risk as well as design optimized vaccination strategies. SBM accurately estimated the antigenic relationship between strains using HA sequence data. Antigenic maps of H1N1 virus strains reveal
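    The distance calculation described in this abstract can be illustrated with a minimal sketch. It assumes the plain form of the p-all-epitope distance, i.e. the fraction of epitope residues that differ between two aligned HA sequences; the abstract states that the authors used a modified version, which is not reproduced here. The function names and the epitope site positions are hypothetical, and the multidimensional scaling step is omitted.

```python
from itertools import combinations


def p_all_epitope(seq_a, seq_b, epitope_sites):
    """Fraction of epitope positions at which two aligned sequences differ.

    epitope_sites is a collection of 0-based positions (hypothetical here).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = sorted(epitope_sites)
    if not sites:
        raise ValueError("at least one epitope site is required")
    differences = sum(1 for i in sites if seq_a[i] != seq_b[i])
    return differences / len(sites)


def distance_matrix(sequences, epitope_sites):
    """Symmetric matrix of pairwise distances, suitable as input to MDS."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = p_all_epitope(sequences[i], sequences[j], epitope_sites)
        matrix[i][j] = matrix[j][i] = d
    return matrix
```

    The resulting matrix is the kind of input that classical multidimensional scaling takes to place strains on a two-dimensional map.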

  3. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

    Science.gov (United States)

    2013-01-01

    Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected

  4. Establishment of screening technique for mutant cell and analysis of base sequence in the mutation

    International Nuclear Information System (INIS)

    Sofuni, Toshio; Nomi, Takehiko; Yamada, Masami; Masumura, Kenichi

    2000-01-01

    This research project aimed to establish an easy and quick detection method for radiation-induced mutation using molecular-biological techniques and an effective method for analyzing the molecular changes in base sequence. This year, Spi mutants derived from γ-irradiated mice were analyzed by PCR and DNA sequencing. Male transgenic mice were exposed to γ-rays at 5, 10, and 50 Gy, and the transgene was recovered from spleen genomic DNA by the in vivo packaging method. Spi mutant plaques were obtained by infecting E. coli with the recovered phage. Sequence analysis of the mutants was performed using an ALFred DNA sequencer and the SequiTherm™ Long-Read Cycle Sequencing Kit. Sequence analysis was carried out for 41 of 50 independent Spi mutants obtained. The deletions were classified into 4 groups: Group 1 included 15 mutants that were characterized by a large deletion (43 bp-10 kb) with a short homologous sequence. Group 2 included 11 mutants with a large deletion having no homologous sequence at the connecting region. Group 3 included 11 mutants having a short deletion of less than 20 bp, which occurred in the non-repetitive sequence of the gam gene and was possibly caused by oxidative breakage of DNA or recombination of DNA fragments produced by the breakage. Group 4 included 4 mutants having deletions of 20 bp or less in the repetitive sequence of the gam gene, resulting in an alteration of the reading frame. Thus, the synthesis of Gam protein was terminated by the appearance of TGA between codons 13 and 14 of the redB gene, leading to inactivation of the gam gene and the redBA gene. These results indicated that most of the Spi mutants had a deletion in the red/gam region and that the deletions in more than half of the mutants occurred in homologous sequences as short as 8 bp. (M.N.)

  5. Developing a framework to assess the cost-effectiveness of COMPARE -A global platform for the exchange of sequence-based pathogen data

    DEFF Research Database (Denmark)

    Alleweldt, F.; Kara, Sami; Osinski, A.

    2017-01-01

    implementation of NGS also depends on its cost-effectiveness. COMPARE - short for 'Collaborative Management Platform for detection and Analyses of (Re-) emerging and foodborne outbreaks' - is a major project, funded by the European Union, to develop a global platform for sharing and analysing NGS data...

  6. Translating sanger-based routine DNA diagnostics into generic massive parallel ion semiconductor sequencing

    NARCIS (Netherlands)

    Diekstra, A.; Bosgoed, E.A.J.; Rikken, A.; Lier, B. van; Kamsteeg, E.J.; Tychon, M.W.J.; Derks, R.C.; Soest, R.A.; Mensenkamp, A.R.; Scheffer, H.; Neveling, K.; Nelen, M.R.

    2015-01-01

    BACKGROUND: Dideoxy-based chain termination sequencing developed by Sanger is the gold standard sequencing approach and allows clinical diagnostics of disorders with relatively low genetic heterogeneity. Recently, new next generation sequencing (NGS) technologies have found their way into diagnostic

  7. Entamoeba histolytica: observations on metabolism based on the genome sequence

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain J.; Loftus, Brendan J.

    2005-07-01

    The sequencing of the genome of Entamoeba histolytica has allowed a reconstruction of its metabolic pathways, many of which are unusual for a eukaryote. Based on the genome sequence, it appears that amino acids may play a larger role than previously thought in energy metabolism, with roles in both ATP synthesis and NAD regeneration. Arginine decarboxylase may be involved in survival of E. histolytica during its passage through the stomach. The usual pyrimidine synthesis pathway is absent, but a partial pyrimidine degradation pathway could be part of a novel pyrimidine synthesis pathway. Ribonucleotide reductase was not found in the E. histolytica genome, but it was found in the close relatives Entamoeba invadens and Entamoeba moshkovskii, suggesting a recent loss from E. histolytica. The usual eukaryotic glucose transporters are not present, but members of a prokaryotic monosaccharide transporter family are present.

  8. Quality Control of the Traditional Patent Medicine Yimu Wan Based on SMRT Sequencing and DNA Barcoding

    Science.gov (United States)

    Jia, Jing; Xu, Zhichao; Xin, Tianyi; Shi, Linchun; Song, Jingyuan

    2017-01-01

    Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines. PMID:28620408

  9. [Comparison of effectiveness and safety between Twisted File technique and ProTaper Universal rotary full sequence based on micro-computed tomography].

    Science.gov (United States)

    Chen, Xiao-bo; Chen, Chen; Liang, Yu-hong

    2016-02-18

    To evaluate the efficacy and safety of two types of rotary nickel-titanium systems (Twisted File and ProTaper Universal) for root canal preparation based on micro-computed tomography (micro-CT). Twenty extracted molars (including 62 canals) were divided into two experimental groups and were instrumented using the Twisted File rotary nickel-titanium system (TF) and the ProTaper Universal rotary nickel-titanium system (PU), respectively, to #25/0.08 following the recommended protocol. The time for root canal instrumentation (the accumulated time for every single file) was recorded. The 0-3 mm of root surface from the apex was observed under an optical stereomicroscope at 25 × magnification, and the presence of crack lines was noted. The root canals were scanned with micro-CT before and after root canal preparation. Three-dimensional shape images of the canals were reconstructed, calculated and evaluated. The amount of canal central transportation of the two groups was calculated and compared. The shorter preparation time [(0.53 ± 0.14) min] was observed in the TF group, while the preparation time of the PU group was (2.06 ± 0.39) min (Pvs. (0.097 ± 0.084) mm, P<0.05]. No instrument separation was observed in either group. No cracks were found in either group, either in micro-CT images or under an optical stereomicroscope at 25 × magnification. Compared with ProTaper Universal, Twisted File took less time in root canal preparation and exhibited better shaping ability and less canal transportation.

  10. Motor imagery-based implicit sequence learning depends on the formation of stimulus-response associations.

    Science.gov (United States)

    Kraeutner, Sarah N; Gaughan, Theresa C; Eppler, Sarah N; Boe, Shaun G

    2017-07-01

    Implicit sequence learning (ISL) occurs without conscious awareness and is critical for skill acquisition. The extent to which ISL occurs is a function of exposure (i.e., total training time and/or sequence to noise ratio) to a repeated sequence, and thus the cognitive mechanism underlying ISL is the formation of stimulus-response associations. As the majority of ISL studies employ paradigms whereby individuals unknowingly physically practice a repeated sequence, the cognitive mechanism underlying ISL through motor imagery (MI), the mental rehearsal of movement, remains unknown. This study examined the cognitive mechanisms of MI-based ISL by probing the link between exposure and the resultant ISL. Seventy-two participants underwent MI-based practice of an ISL task following randomization to one of four conditions: 4 training blocks with a high (4-High) or low (4-Low) sequence to noise ratio, or 2 training blocks with a high (2-High) or low (2-Low) sequence to noise ratio. Reaction time differences (dRT) and effect sizes between repeated and random sequences assessed the extent of learning. All groups showed a degree of ISL, yet effect sizes indicated a greater degree of learning in groups with higher exposure (4-Low and 4-High). Findings indicate that the extent to which ISL occurs through MI is impacted by manipulations to total training time and the sequence to noise ratio. Overall, we show that the extent of ISL occurring through MI is a function of exposure, indicating that like physical practice, the cognitive mechanisms of MI-based ISL rely on the formation of stimulus response associations. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Visual Localization across Seasons Using Sequence Matching Based on Multi-Feature Combination.

    Science.gov (United States)

    Qiao, Yongliang

    2017-10-25

    Visual localization is widely used in autonomous navigation system and Advanced Driver Assistance Systems (ADAS). However, visual-based localization in seasonal changing situations is one of the most challenging topics in computer vision and the intelligent vehicle community. The difficulty of this task is related to the strong appearance changes that occur in scenes due to weather or season changes. In this paper, a place recognition based visual localization method is proposed, which realizes the localization by identifying previously visited places using the sequence matching method. It operates by matching query image sequences to an image database acquired previously (video acquired during traveling period). In this method, in order to improve matching accuracy, multi-feature is constructed by combining a global GIST descriptor and local binary feature CSLBP (Center-symmetric local binary patterns) to represent image sequence. Then, similarity measurement according to Chi-square distance is used for effective sequences matching. For experimental evaluation, the relationship between image sequence length and sequences matching performance is studied. To show its effectiveness, the proposed method is tested and evaluated in four seasons outdoor environments. The results have shown improved precision-recall performance against the state-of-the-art SeqSLAM algorithm.
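    The matching step described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes each image is already represented by a non-negative descriptor histogram (standing in for the combined GIST and CSLBP feature), uses one common form of the chi-square distance with a 0.5 factor, and slides the query sequence over the database at a fixed one-to-one frame rate. The function names are hypothetical.

```python
import math


def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def best_sequence_match(query_seq, database):
    """Return (start_index, cost) of the database window closest to the query.

    The cost of a window is the summed chi-square distance of aligned frames.
    """
    length = len(query_seq)
    if length == 0 or length > len(database):
        raise ValueError("query must be non-empty and no longer than the database")
    best_start, best_cost = -1, math.inf
    for start in range(len(database) - length + 1):
        cost = sum(
            chi_square(q, database[start + k]) for k, q in enumerate(query_seq)
        )
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost
```

    Summing distances over a window rather than matching single frames is what makes the approach tolerant of individual frames whose appearance has changed strongly between seasons.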

  12. Visual Localization across Seasons Using Sequence Matching Based on Multi-Feature Combination

    Directory of Open Access Journals (Sweden)

    Yongliang Qiao

    2017-10-01

    Visual localization is widely used in autonomous navigation system and Advanced Driver Assistance Systems (ADAS). However, visual-based localization in seasonal changing situations is one of the most challenging topics in computer vision and the intelligent vehicle community. The difficulty of this task is related to the strong appearance changes that occur in scenes due to weather or season changes. In this paper, a place recognition based visual localization method is proposed, which realizes the localization by identifying previously visited places using the sequence matching method. It operates by matching query image sequences to an image database acquired previously (video acquired during traveling period). In this method, in order to improve matching accuracy, multi-feature is constructed by combining a global GIST descriptor and local binary feature CSLBP (Center-symmetric local binary patterns) to represent image sequence. Then, similarity measurement according to Chi-square distance is used for effective sequences matching. For experimental evaluation, the relationship between image sequence length and sequences matching performance is studied. To show its effectiveness, the proposed method is tested and evaluated in four seasons outdoor environments. The results have shown improved precision–recall performance against the state-of-the-art SeqSLAM algorithm.

  13. Effects of loading sequence for notched specimens under high-low two-step fatigue loading

    Science.gov (United States)

    Crews, J. H., Jr.

    1971-01-01

    The effects of loading sequence on crack-initiation period were investigated for notched aluminum-alloy specimens under high-low two-step loading with special emphasis on local cyclic stresses and strains at the notch root. Local stress and strain were determined by a procedure based on an equation proposed by Neuber which relates elastoplastic stress and strain at a notch. Local stress and strain were also measured experimentally to verify the Neuber equation. The effects of initial high load on the crack-initiation periods were demonstrated with notched specimens and were simulated in unnotched specimens fatigue tested with local stress sequences. An analysis of the results indicated that sequence effects were not caused solely by local residual stresses, as is usually assumed; the existence of a damaging effect, resulting from the high local strain cycles, was demonstrated. The sequence effects observed with notched specimens were interpreted as the combined result of residual stresses and high local strain cycles.
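    For orientation, the Neuber relation referred to above is commonly written as below. The symbols are assumptions introduced here, since the abstract does not define them: $K_t$ is the elastic stress concentration factor, $S$ and $e$ are the nominal stress and strain, $\sigma$ and $\varepsilon$ are the local stress and strain at the notch root, and $E$ is the elastic modulus. The exact cyclic form used in the report may differ.

```latex
% Neuber's rule: the elastic concentration factor is the geometric mean
% of the local stress and strain concentration factors.
K_t^2 = K_\sigma K_\varepsilon,
\qquad K_\sigma = \frac{\sigma}{S},
\qquad K_\varepsilon = \frac{\varepsilon}{e}
% Equivalent product form:
\sigma \, \varepsilon = K_t^2 \, S \, e
% For nominally elastic loading, e = S / E, so
\sigma \, \varepsilon = \frac{(K_t S)^2}{E}
```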

  14. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Science.gov (United States)

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid based algorithms for clustering sequences based on word counts. Study of their performance shows that using k-means algorithm with or without the data whitening is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.
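    A minimal sketch of centroid-based clustering on word counts is given below. It is not the afcluster tool mentioned above: it assumes normalized k-mer frequencies as features, plain Lloyd-style k-means without data whitening, and caller-supplied initial centroids so that the result is deterministic. The soft expectation maximization variant is not shown, and all function names are hypothetical.

```python
from itertools import product


def kmer_profile(seq, k=2, alphabet="ACGT"):
    """Normalized k-mer frequency vector; words with other symbols are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_indices, n_iter=20):
    """Hard k-means with centroids initialized from the given points."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

    For example, profiles built from reads with `kmer_profile` can be passed to `kmeans` together with the indices of two reads chosen as starting centroids.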

  15. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  16. The heterogeneous world of congruency sequence effects: An update.

    Directory of Open Access Journals (Sweden)

    Wout Duthoo

    2014-09-01

    Congruency sequence effects (CSEs) refer to the observation that congruency effects in conflict tasks are typically smaller following incongruent compared to following congruent trials. This measure has long been thought to provide a unique window into top-down attentional adjustments and their underlying brain mechanisms. According to the renowned conflict monitoring theory, CSEs reflect enhanced selective attention following conflict detection. Still, alternative accounts suggested that bottom-up associative learning suffices to explain the pattern of reaction times and error rates. A couple of years ago, a review by Egner (2007) pitted these two rival accounts against each other, concluding that both conflict adaptation and feature integration contribute to the CSE. Since then, a wealth of studies has further debated this issue, and two additional accounts have been proposed, offering intriguing alternative explanations. Contingency learning accounts put forward that predictive relationships between stimuli and responses drive the CSE, whereas the repetition expectancy hypothesis suggests that top-down, expectancy-driven control adjustments affect the CSE. In the present paper, we build further on the previous review (Egner, 2007) by summarizing and integrating recent behavioural and neurophysiological studies on the CSE. In doing so, we evaluate the relative contribution and theoretical value of the different attentional and memory-based accounts. Moreover, we review how all of these influences can be experimentally isolated, and discuss designs and procedures that can critically judge between them.

  17. A dispersion-balanced Discrete Fourier Transform of repetitive pulse sequences using temporal Talbot effect

    Science.gov (United States)

    Fernández-Pousa, Carlos R.

    2017-11-01

    We propose a processor based on the concatenation of two fractional temporal Talbot dispersive lines with balanced dispersion to perform the DFT of a repetitive electrical sequence, for its use as a controlled source of optical pulse sequences. The electrical sequence is used to impart the amplitude and phase of a coherent train of optical pulses by use of a modulator placed between the two Talbot lines. The proposal has been built on a representation of the action of fractional Talbot effect on repetitive pulse sequences and a comparison with related results and proposals. It is shown that the proposed system is reconfigurable within a few repetition periods, has the same processing rate as the input optical pulse train, and requires the same technical complexity in terms of dispersion and pulse width as the standard, passive pulse-repetition rate multipliers based on fractional Talbot effect.

  18. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is
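    The three-level data partitioning described in this abstract can be sketched as a simple index split. This is an illustrative sketch only and not the authors' pipeline: the function name, the split fractions and the fixed random seed are all assumptions. The discovery indices would be passed to a motif finder, the training indices to a classifier, and the validation indices kept aside for performance assessment.

```python
import random


def three_way_split(n_items, fractions=(0.2, 0.6, 0.2), seed=0):
    """Shuffle item indices and split them into discovery/training/validation.

    fractions are hypothetical defaults and must sum to 1.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values summing to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_discovery = int(round(fractions[0] * n_items))
    n_training = int(round(fractions[1] * n_items))
    discovery = indices[:n_discovery]
    training = indices[n_discovery:n_discovery + n_training]
    validation = indices[n_discovery + n_training:]
    return discovery, training, validation
```

    Keeping the discovery partition disjoint from the training and validation partitions avoids selecting motifs on the same sequences that are later used to fit or assess the classifier.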

  19. Persisting Viral Sequences Shape Microbial CRISPR-based Immunity

    Science.gov (United States)

    Weinberger, Ariel D.; Sun, Christine L.; Pluciński, Mateusz M.; Denef, Vincent J.; Thomas, Brian C.; Horvath, Philippe; Barrangou, Rodolphe; Gilmore, Michael S.; Getz, Wayne M.; Banfield, Jillian F.

    2012-01-01

    Well-studied innate immune systems exist throughout bacteria and archaea, but a more recently discovered genomic locus may offer prokaryotes surprising immunological adaptability. Mediated by a cassette-like genomic locus termed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), the microbial adaptive immune system differs from its eukaryotic immune analogues by incorporating new immunities unidirectionally. CRISPR thus stores genomically recoverable timelines of virus-host coevolution in natural organisms refractory to laboratory cultivation. Here we combined a population genetic mathematical model of CRISPR-virus coevolution with six years of metagenomic sequencing to link the recoverable genomic dynamics of CRISPR loci to the unknown population dynamics of virus and host in natural communities. Metagenomic reconstructions in an acid-mine drainage system document CRISPR loci conserving ancestral immune elements to the base-pair across thousands of microbial generations. This ‘trailer-end conservation’ occurs despite rapid viral mutation and despite rapid prokaryotic genomic deletion. The trailer-ends of many reconstructed CRISPR loci are also largely identical across a population. ‘Trailer-end clonality’ occurs despite predictions of host immunological diversity due to negative frequency dependent selection (kill the winner dynamics). Statistical clustering and model simulations explain this lack of diversity by capturing rapid selective sweeps by highly immune CRISPR lineages. Potentially explaining ‘trailer-end conservation,’ we record the first example of a viral bloom overwhelming a CRISPR system. The polyclonal viruses bloom even though they share sequences previously targeted by host CRISPR loci. Simulations show how increasing random genomic deletions in CRISPR loci purges immunological controls on long-lived viral sequences, allowing polyclonal viruses to bloom and depressing host fitness. Our results thus link documented

  20. Comparative effectiveness of inter-simple sequence repeat and ...

    African Journals Online (AJOL)

    A study to compare the effectiveness of inter-simple sequence repeats (ISSR) and randomly amplified polymorphic DNA (RAPD) profiling was carried out with a total of 65 DNA samples using 12 species of Indian Garcinia. ISSR and RAPD profiling were performed with 19 and 12 primers, respectively. ISSR markers ...

  1. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. Association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accuracies and effects of some key factors wer...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  2. New algorithm for iris recognition based on video sequences

    Science.gov (United States)

    Bourennane, Salah; Fossati, Caroline; Ketchantang, William

    2010-07-01

    Among existing biometrics, iris recognition systems are among the most accurate personal biometric identification systems. However, the acquisition of a workable iris image requires strict cooperation of the user; otherwise, the image will be rejected by a verification module because of its poor quality, inducing a high false reject rate (FRR). The FRR may also increase when iris localization fails or when the pupil is too dilated. To improve the existing methods, we propose to use video sequences acquired in real time by a camera. In order to keep the same computational load to identify the iris, we propose a new method to estimate the iris characteristics. First, we propose a new iris texture characterization based on Fourier-Mellin transform, which is less sensitive to pupil dilatations than previous methods. Then, we develop a new iris localization algorithm that is robust to variations of quality (partial occlusions due to eyelids and eyelashes, light reflections, etc.), and finally, we introduce a fast and new criterion of suitable image selection from an iris video sequence for an accurate recognition. The accuracy of each step of the algorithm in the whole proposed recognition process is tested and evaluated using our own iris video database and several public image databases, such as CASIA, UBIRIS, and BATH.

  3. Development of Sequence-Based Microsatellite Marker for Phalaenopsis Orchid

    Directory of Open Access Journals (Sweden)

    FATIMAH

    2011-06-01

    Phalaenopsis is one of the most interesting genera of orchids because its members are often used as parents to produce hybrids. The establishment and development of highly reliable and discriminatory methods for identifying species and cultivars has become increasingly important to plant breeders and members of the nursery industry. The aim of this research was to develop sequence-based microsatellite (eSSR) markers for the Phalaenopsis orchid designed from sequences in NCBI GenBank. Seventeen primers were designed and thirteen primer pairs could amplify the DNA giving the expected PCR product with polymorphism. A total of 51 alleles, with an average of 3 alleles per locus and polymorphism information content (PIC) values at 0.674, were detected at the 16 SSR loci. Therefore, these markers could be used for identification of the Phalaenopsis orchid used in this study. Genetic similarity and principal coordinate analysis identified five major groups of Phalaenopsis sp. The first group consisted of P. amabilis, P. fuscata, P. javanica, and P. zebrine. The second group consisted of P. amabilis, P. amboinensis, P. bellina, P. floresens, and P. mannii. The third group consisted of P. bellina, P. cornucervi, P. cornucervi, P. violaceae sumatra, and P. modesta. The fourth group consisted of P. cornucervi and P. lueddemanniana, and the fifth group was P. amboinensis.

  4. Heart rate measurement based on face video sequence

    Science.gov (United States)

    Xu, Fang; Zhou, Qin-Wu; Wu, Peng; Chen, Xing; Yang, Xiaofeng; Yan, Hong-jian

    2015-03-01

    This paper proposes a new non-contact heart rate measurement method based on photoplethysmography (PPG) theory. With this method we can measure heart rate remotely with a camera and ambient light. We collected video sequences of subjects, and detected remote PPG signals through video sequences. Remote PPG signals were analyzed with two methods, Blind Source Separation Technology (BSST) and Cross Spectral Power Technology (CSPT). BSST is a commonly used method, and CSPT is used for the first time in the study of remote PPG signals in this paper. Both of the methods can acquire heart rate, but compared with BSST, CSPT has clearer physical meaning, and the computational complexity of CSPT is lower than that of BSST. Our work shows that heart rates detected by CSPT method have good consistency with the heart rates measured by a finger clip oximeter. With good accuracy and low computational complexity, the CSPT method has a good prospect for the application in the field of home medical devices and mobile health devices.

  5. Evolutionary insights from suffix array-based genome sequence ...

    Indian Academy of Sciences (India)

    2007-08-06

    Aug 6, 2007 ... Keywords. Biological language modelling toolkit (BLMT); genome sequence analysis; n-grams; pattern matching; suffix arrays; suffix trees; short peptide sequences genetic code bias ...

  6. HLA class I sequence-based typing using DNA recovered from frozen plasma.

    Science.gov (United States)

    Cotton, Laura A; Abdur Rahman, Manal; Ng, Carmond; Le, Anh Q; Milloy, M-J; Mo, Theresa; Brumme, Zabrina L

    2012-08-31

    We describe a rapid, reliable and cost-effective method for intermediate-to-high-resolution sequence-based HLA class I typing using frozen plasma as a source of genomic DNA. The plasma samples investigated had a median age of 8.5 years. Total nucleic acids were isolated from matched frozen PBMC (~2.5 million) and plasma (500 μl) samples from a panel of 25 individuals using commercial silica-based kits. Extractions yielded median [IQR] nucleic acid concentrations of 85.7 [47.0-130.0]ng/μl and 2.2 [1.7-2.6]ng/μl from PBMC and plasma, respectively. Following extraction, ~1000 base pair regions spanning exons 2 and 3 of HLA-A, -B and -C were amplified independently via nested PCR using universal, locus-specific primers and sequenced directly. Chromatogram analysis was performed using commercial DNA sequence analysis software and allele interpretation was performed using a free web-based tool. HLA-A, -B and -C amplification rates were 100% and chromatograms were of uniformly high quality with clearly distinguishable mixed bases regardless of DNA source. Concordance between PBMC and plasma-derived HLA types was 100% at the allele and protein levels. At the nucleotide level, a single partially discordant base (resulting from a failure to call both peaks in a mixed base) was observed out of >46,975 bases sequenced (>99.9% concordance). This protocol has previously been used to perform HLA class I typing from a variety of genomic DNA sources including PBMC, whole blood, granulocyte pellets and serum, from specimens up to 30 years old. This method provides comparable specificity to conventional sequence-based approaches and could be applied in situations where cell samples are unavailable or DNA quantities are limiting. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography.

    Science.gov (United States)

    Hirase, Shotaro; Takeshima, Hirohiko; Nishida, Mutsumi; Iwasaki, Wataru

    2016-04-28

    Reliably rooted phylogenetic trees play irreplaceable roles in clarifying diversification in the patterns of species and populations. However, such trees are often unavailable in phylogeographic studies, particularly when the focus is on rapidly expanded populations that exhibit star-like trees. A fundamental bottleneck is known as the random rooting effect, where a distant outgroup tends to root an unrooted tree "randomly." We investigated whether parallel mitochondrial genome (mitogenome) sequencing alleviates this effect in phylogeography using a case study on the Sea of Japan lineage of the intertidal goby Chaenogobius annularis. Eighty-three C. annularis individuals were collected and their mitogenomes were determined by high-throughput and low-cost parallel sequencing. Phylogenetic analysis of these mitogenome sequences was conducted to root the Sea of Japan lineage, which has a star-like phylogeny and had not been reliably rooted. The topologies of the bootstrap trees were investigated to determine whether the use of mitogenomes alleviated the random rooting effect. The mitogenome data successfully rooted the Sea of Japan lineage by alleviating the effect, which hindered phylogenetic analysis that used specific gene sequences. The reliable rooting of the lineage led to the discovery of a novel, northern lineage that expanded during an interglacial period with high bootstrap support. Furthermore, the finding of this lineage suggested the existence of additional glacial refugia and provided a new recent calibration point that revised the divergence time estimation between the Sea of Japan and Pacific Ocean lineages. This study illustrates the effectiveness of parallel mitogenome sequencing for solving the random rooting problem in phylogeographic studies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  8. Masking as an effective quality control method for next-generation sequencing data analysis.

    Science.gov (United States)

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method changed the false-negative rate in SNP calling to a statistically significant degree compared with the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is more commonly used currently in the field. The Perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
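    The difference between the two preprocessing steps described in this abstract can be sketched as follows. This is an illustrative sketch only, not the authors' Perl script: the function names, the quality threshold of 20, and the choice to trim from the 3' end are all assumptions made for the example.

```python
# Illustrative sketch only; names, threshold, and 3'-end trimming are assumptions.

def mask_low_quality(seq, quals, threshold=20):
    """Replace every base whose quality is below `threshold` with 'N'.

    The read keeps its original length, so downstream coordinates are unchanged.
    """
    return "".join(b if q >= threshold else "N" for b, q in zip(seq, quals))


def trim_low_quality(seq, quals, threshold=20):
    """Remove low-quality bases from the 3' end until a base passes.

    The read becomes shorter; low-quality bases in the interior are kept.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return seq[:end]


read = "ACGTACGT"
quals = [30, 30, 10, 30, 30, 30, 12, 8]  # hypothetical Phred scores

masked = mask_low_quality(read, quals)
trimmed = trim_low_quality(read, quals)
print(masked)   # ACNTACNN
print(trimmed)  # ACGTAC
```

    In this toy read, masking hides the interior low-quality call at the third position while trimming leaves it in place, which is one way the two methods can feed different bases to a variant caller.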

  9. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  10. Evolutionary insights from suffix array-based genome sequence ...

    Indian Academy of Sciences (India)

    2007-08-06

    Aug 6, 2007 ... Gene and protein sequence analyses, central components of studies in modern biology are easily amenable to string matching and pattern recognition algorithms. The growing need of analysing whole genome sequences more efficiently and thoroughly, has led to the emergence of new computational ...

  11. Illumina-based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    ZHONGXIAN XU

    2017-12-18

    Dec 18, 2017 ... Next-generation sequencing technique is an efficient method for generating an enormous amount of sequence data that can represent a large number of genes and their expression levels. In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland.

  12. Autonomously Generating Operations Sequences for a Mars Rover Using Artificial Intelligence-Based Planning

    Science.gov (United States)

    Sherwood, R.; Mutz, D.; Estlin, T.; Chien, S.; Backes, P.; Norris, J.; Tran, D.; Cooper, B.; Rabideau, G.; Mishkin, A.; Maxwell, S.

    2001-07-01

    This article discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from high-level science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This artificial intelligence (AI)-based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules. An automated planning and scheduling system encodes rover design knowledge and uses search and reasoning techniques to automatically generate low-level command sequences while respecting rover operability constraints, science and engineering preferences, environmental predictions, and also adhering to hard temporal constraints. This prototype planning system has been field-tested using the Rocky 7 rover at JPL and will be field-tested on more complex rovers to prove its effectiveness before transferring the technology to flight operations for an upcoming NASA mission. Enabling goal-driven commanding of planetary rovers greatly reduces the requirements for highly skilled rover engineering personnel. This in turn greatly reduces mission operations costs. In addition, goal-driven commanding permits a faster response to changes in rover state (e.g., faults) or science discoveries by removing the time-consuming manual sequence validation process, allowing rapid "what-if" analyses, and thus reducing overall cycle times.

  13. An FTIR investigation of flanking sequence effects on the structure and flexibility of DNA binding sites.

    Science.gov (United States)

    Kahn, Talia R; Fong, Kimberly K; Jordan, Brian; Lek, Janista C; Levitan, Rachel; Mitchell, Patrick S; Wood, Corrina; Hatcher, Mary E

    2009-02-17

    Fourier transform infrared (FTIR) spectroscopy and a library of FTIR marker bands have been used to examine the structure and relative flexibilities conferred by different flanking sequences on the EcoRI binding site. This approach allowed us to examine unique peaks and subtle changes in the spectra of d(AAAGAATTCTTT)(2), d(TTCGAATTCGAA)(2), and d(CGCGAATTCGCG)(2) and thereby identify local changes in base pairing, base stacking, backbone conformation, glycosidic bond rotation, and sugar puckering in the studied sequences. The changes in flanking sequences induce differences in the sugar puckers, glycosidic bond rotation, and backbone conformations. Varying levels of local flexibility are observed within the sequences in agreement with previous biological activity assays. The results also provide supporting evidence for the presence of a splay in the G(4)-C(9) base pair of the EcoRI binding site and a potential pocket of flexibility at the G(4) cleavage site that have been proposed in the literature. In sum, we have demonstrated that FTIR is a powerful methodology for studying the effect of flanking sequences on DNA structure and flexibility, for it can provide information about the local structure of the nucleic acid and the overall relative flexibilities conferred by different flanking sequences.

  14. Generalized min-max bound-based MRI pulse sequence design framework for wide-range T1 relaxometry: A case study on the tissue specific imaging sequence.

    Directory of Open Access Journals (Sweden)

    Yang Liu

    This paper proposes a new design strategy for optimizing MRI pulse sequences for T1 relaxometry. The design strategy optimizes the pulse sequence parameters to minimize the maximum variance of unbiased T1 estimates over a range of T1 values using the Cramér-Rao bound. In contrast to prior sequences optimized for a single nominal T1 value, the optimized sequence using our bound-based strategy achieves improved precision and accuracy for a broad range of T1 estimates within a clinically feasible scan time. The optimization combines the downhill simplex method with a simulated annealing process. To show the effectiveness of the proposed strategy, we optimize the tissue specific imaging (TSI) sequence. Preliminary Monte Carlo simulations demonstrate that the optimized TSI sequence yields improved precision and accuracy over the popular driven-equilibrium single-pulse observation of T1 (DESPOT1) approach for normal brain tissues (estimated T1 700-2000 ms at 3.0T). The relative mean estimation error (MSE) for T1 estimation is less than 1.7% using the optimized TSI sequence, as opposed to less than 7.0% using DESPOT1 for normal brain tissues. The optimized TSI sequence achieves good stability by keeping the MSE under 7.0% over larger T1 values corresponding to different lesion tissues and the cerebrospinal fluid (up to 5000 ms). The T1 estimation accuracy using the new pulse sequence also shows improvement, which is more pronounced in low SNR scenarios.
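    The min-max criterion described in this abstract can be written compactly as below. The notation is an assumption made for illustration and is not taken from the paper: \boldsymbol{\theta} stands for the pulse sequence parameters, \Theta for their feasible set, and [T_1^{\min}, T_1^{\max}] for the T1 range of interest.

```latex
\hat{\boldsymbol{\theta}}
  = \arg\min_{\boldsymbol{\theta} \in \Theta}\;
    \max_{T_1 \in [T_1^{\min},\, T_1^{\max}]}
    \mathrm{CRB}_{T_1}(T_1;\, \boldsymbol{\theta}),
\qquad
\operatorname{Var}\!\big(\hat{T}_1\big) \;\ge\; \mathrm{CRB}_{T_1}(T_1;\, \boldsymbol{\theta}).
```

    The right-hand inequality is the Cramér-Rao bound for any unbiased estimator of T1; the left-hand expression picks the parameters whose worst-case bound over the range is smallest, rather than the bound at a single nominal T1.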

  15. CLUSS: Clustering of protein sequences based on a new similarity measure

    Directory of Open Access Journals (Sweden)

    Brzezinski Ryszard

    2007-08-01

    Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results: To show the effectiveness of CLUSS, we performed an extensive clustering on COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins

  16. Prediction of S-glutathionylation sites based on protein sequences.

    Directory of Open Access Journals (Sweden)

    Chenglei Sun

    S-glutathionylation, the reversible formation of mixed disulfides between glutathione (GSH) and cysteine residues in proteins, is a specific form of post-translational modification that plays important roles in various biological processes, including signal transduction, redox homeostasis, and metabolism inside cells. Experimentally identifying S-glutathionylation sites is labor-intensive and time consuming, whereas bioinformatics methods provide an alternative approach to this problem by predicting S-glutathionylation sites in silico. The bioinformatics approaches give not only candidate sites for further experimental verification but also biochemical insights into the mechanism of S-glutathionylation. In this paper, we first collect experimentally determined S-glutathionylated proteins and their corresponding modification sites from the literature, and then propose a new method for predicting S-glutathionylation sites by employing machine learning methods based on protein sequence data. Promising results are obtained by our method with an AUC (area under ROC curve) score of 0.879 in 5-fold cross-validation, which demonstrates the predictive power of our proposed method. The datasets used in this work are available at http://csb.shu.edu.cn/SGDB.
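    For readers unfamiliar with the evaluation metric quoted in this abstract, the sketch below shows one common way to compute an AUC and to form five folds. It is a generic illustration, not the authors' pipeline: the function names, the contiguous fold split, and the toy labels and scores are all assumptions, and no classifier from the paper is reproduced.

```python
# Generic illustration; function names, fold scheme, and data are assumptions.

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair statistic.

    Returns the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy example: 1 = modified site, 0 = unmodified site (hypothetical scores).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(labels, scores))   # 0.75
print(k_fold_indices(7, 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```

    In a cross-validation run, each fold would serve once as the held-out test set and the per-fold AUC values (or the AUC over pooled held-out predictions) would be reported; which of the two the authors used is not stated in the abstract.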

  17. The Effect of Stock Return Sequences on Trading Volumes

    Directory of Open Access Journals (Sweden)

    Andrey Kudryavtsev

    2017-10-01

    The present study explores the effect of the gambler’s fallacy on stock trading volumes. I hypothesize that if a stock’s price rises (falls) during a number of consecutive trading days, then the gambler’s fallacy may cause at least some of the investors to expect that the stock’s price “has” to subsequently fall (rise), and thus, to increase their willingness to sell (buy) the stock, resulting in a stronger degree of disagreement between the investors and a higher-than-usual stock trading volume on the first day when the stock’s price indeed falls (rises). Employing a large sample of daily price and trading volume data, I document that following relatively long sequences of the same-sign stock returns, on the days when the sign is reversed, the trading activity in the respective stocks is abnormally high. Moreover, average abnormal trading volumes gradually and significantly increase with the length of the preceding return sequence. The effect is slightly more pronounced following the sequences of negative stock returns, and remains significant after controlling for other potentially influential factors, including contemporaneous and lagged actual and absolute stock returns, historical stock returns and volatilities, and company-specific events, such as earnings announcements and dividend payments.
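    The event definition used in this abstract (a day on which the return sign reverses after a run of same-sign returns) can be sketched as follows. This is a minimal illustration, not the author's procedure: the function name is hypothetical, and treating a zero return as breaking the run is an assumption.

```python
# Minimal illustration; function name and zero-return handling are assumptions.

def reversal_events(returns):
    """Return (day_index, run_length) for each sign-reversal day.

    `run_length` is the number of consecutive same-sign returns immediately
    before the reversal. A zero return resets the run without creating an event.
    """
    events = []
    run_sign, run_len = 0, 0
    for i, r in enumerate(returns):
        sign = (r > 0) - (r < 0)
        if sign == 0:
            run_sign, run_len = 0, 0
            continue
        if sign == run_sign:
            run_len += 1
        else:
            if run_len > 0:
                events.append((i, run_len))
            run_sign, run_len = sign, 1
    return events


daily_returns = [0.01, 0.02, 0.005, -0.01, -0.02, 0.03]  # hypothetical data
print(reversal_events(daily_returns))  # [(3, 3), (5, 2)]
```

    Grouping abnormal trading volume on the reported days by `run_length` would then show whether volume rises with the length of the preceding sequence, which is the pattern the abstract reports.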

  18. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  19. Experimental and theoretical studies of sequence effects on the fluctuation and melting of short DNA molecules

    Energy Technology Data Exchange (ETDEWEB)

    Peyrard, M; Cuesta-Lopez, S [Universite de Lyon, Ecole Normale Superieure de Lyon, Laboratoire de Physique, CNRS UMR 5672, 46 allee d' Italie, F-69364 Lyon Cedex 07 (France); Angelov, D [Universite de Lyon, Ecole Normale Superieure de Lyon, Laboratoire de Biologie Moleculaire de la Cellule, CNRS UMR 5239, 46 allee d' Italie, F-69364 Lyon Cedex 07 (France)], E-mail: Michel.Peyrard@ens-lyon.fr

    2009-01-21

    Understanding the melting of short DNA sequences probes DNA at the scale of the genetic code and raises questions which are very different from those posed by very long sequences, which have been extensively studied. We investigate this problem by combining experiments and theory. A new experimental method allows us to make a mapping of the opening of the guanines along the sequence as a function of temperature. The results indicate that non-local effects may be important in DNA because an AT-rich region is able to influence the opening of a base pair which is about 10 base pairs away. An earlier mesoscopic model of DNA is modified to correctly describe the timescales associated with the opening of individual base pairs well below melting, and to properly take into account the sequence. Using this model to analyze some characteristic sequences for which detailed experimental data on the melting is available (Montrichok et al 2003 Europhys. Lett. 62 452), we show that we have to introduce non-local effects of AT-rich regions to get acceptable results. This brings a second indication that the influence of these highly fluctuating regions of DNA on their neighborhood can extend to some distance.

  20. Congruency sequence effect without feature integration and contingency learning.

    Science.gov (United States)

    Kim, Sanga; Cho, Yang Seok

    2014-06-01

    The magnitude of congruency effects, such as the flanker-compatibility effects, has been found to vary as a function of the congruency of the previous trial. Some studies have suggested that this congruency sequence effect is attributable to stimulus and/or response priming, and/or contingency learning, whereas other studies have suggested that the control process triggered by conflict modulates the congruency effect. The present study examined whether sequential modulation can occur without stimulus and response repetitions and contingency learning. Participants were asked to perform two color flanker-compatibility tasks alternately in a trial-by-trial manner, with four fingers of one hand in Experiment 1 and with the index and middle fingers of two hands in Experiment 2, to avoid stimulus and response repetitions and contingency learning. A significant congruency sequence effect was obtained between the congruencies of the two tasks in Experiment 1 but not in Experiment 2. These results provide evidence for the idea that the sequential modulation is, at least in part, an outcome of the top-down control process triggered by conflict, which is specific to response mode. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. PCR-based assays versus direct sequencing for evaluating the effect of KRAS status on anti-EGFR treatment response in colorectal cancer patients: a systematic review and meta-analysis.

    Directory of Open Access Journals (Sweden)

    Lianfeng Shan

    The survival rate of colorectal cancer (CRC) patients carrying wild-type KRAS is significantly increased by combining anti-EGFR monoclonal antibody (mAb) with standard chemotherapy. However, conflicting data exist in both the wild-type KRAS and mutant KRAS groups, which strongly challenge CRC anti-EGFR treatment. Here we conducted a meta-analysis in an effort to provide more reliable information regarding anti-EGFR treatment in CRC patients. We searched full reports of randomized clinical trials using Medline, the American Society of Clinical Oncology (ASCO), and the European Society for Medical Oncology (ESMO). Two investigators independently screened the published literature according to our inclusion and exclusion criteria and the relevant data were extracted. We used Review Manager 5.2 software to analyze the data. The addition of anti-EGFR mAb to standard chemotherapy significantly improved both progression-free survival (PFS) and median overall survival (mOS) in the wild-type KRAS group; hazard ratios (HRs) for PFS and mOS were 0.70 [95% confidence interval (CI), 0.58-0.84] and 0.83 [95% CI, 0.75-0.91], respectively. In sub-analyses of the wild-type KRAS group, when PCR-based assays are employed, PFS and mOS notably increase: the HRs were 0.74 [95% CI, 0.62-0.88] and 0.87 [95% CI, 0.78-0.96], respectively. In sub-analyses of the mutant KRAS group, neither PCR-based assays nor direct sequencing enhance PFS or mOS. Our data suggest that PCR-based assays with high sensitivity and specificity allow accurate identification of patients with wild-type KRAS and thus increase PFS and mOS. Furthermore, such assays liberate patients with mutant KRAS from unnecessary drug side effects, and provide them an opportunity to receive appropriate treatment. Thus, establishing a precise standard reference test will substantially optimize CRC-targeted therapies.

  2. Cultural sequence of Bet Dwarka Island based on thermoluminescence dating

    Digital Repository Service at National Institute of Oceanography (India)

    Vora, K.H.; Gaur, A.S.; Price, D.; Sundaresh

    , are apparently considerably more recent (2000 years BP), which may suggest the continuation of protohistoric habitation up to historical period at the same site. These TL ages assist in establishing a cultural sequence for Bet Dwarka Island....

  3. PHARMACOGENETIC TESTING OPPORTUNITIES IN CARDIOLOGY BASED ON EXOME SEQUENCING

    Directory of Open Access Journals (Sweden)

    N. V. Shcherbakova

    2014-01-01

    Aim. To study which cardiac drugs currently have comments on biomarkers and what information can be obtained by pharmacogenetic testing using exome sequencing data in patients with cardiac diseases. Material and methods. Exome sequencing in a random participant of the ATEROGEN IVANOVO study and bioinformatics analysis of the data were performed. Point mutations were annotated using the ANNOVAR program, and the results were compared with a number of specialized databases on the basis of user protocols. Results. 11 cardiac drugs and 7 genes whose variants can influence cardiac drug metabolism were analyzed. According to exome sequencing of the participant, we did not reveal allelic variants that require dose regimen correction and careful efficacy control. Conclusion. Applying exome sequencing is the next step toward a wide range of personalized therapy. Future opportunities for improvement of the risk-benefit ratio in each patient are the main purpose of the collection and analysis of pharmacogenetic data.

  4. Hybrid detection of target sequence DNA based on phosphorescence resonance energy transfer.

    Science.gov (United States)

    Miao, Yanming; Lv, Jinzhi; Yan, Guiqin

    2017-08-15

    The severe background fluorescence and scattered light of real biological or environmental samples largely reduce the sensitivity and accuracy of fluorescence resonance energy transfer (FRET) sensors based on fluorescent quantum dots (QDs). To solve this problem, we designed a novel target sequence DNA biosensor based on phosphorescent resonance energy transfer (PRET). This sensor relied on Mn-doped ZnS (Mn-ZnS) room-temperature phosphorescence (RTP) QDs/poly-(diallyldimethylammonium chloride) (PDADMAC) nanocomposite (QDs+) as the energy donor and the single-strand DNA-ROX as the energy receptor. Thereby, an RTP biosensor was built and used to quantitatively detect target sequence DNA. This biosensor had a detection limit of 0.16 nM and a linear range of 0.5-20 nM for target sequence DNA. The dependence on RTP of QDs effectively avoided the interference from background fluorescence and scattered light in biological samples. Moreover, this sensor did not need sample pretreatment. Thus, compared with FRET, this sensor is more feasible for quantitative detection of target sequence DNA in biological samples. Interestingly, the QDs+ nanocomposite prolonged the phosphorescence lifetime of Mn-ZnS QDs by 2.6 times to 4.94 ms, which was 5-6 orders of magnitude longer than that of fluorescent QDs. Thus, this sensor largely improves the optical properties of QDs and permits chemical reactions at a long enough time scale. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  6. Expectation violations in sensorimotor sequences: shifting from LTM-based attentional selection to visual search.

    Science.gov (United States)

    Foerster, Rebecca M; Schneider, Werner X

    2015-03-01

    Long-term memory (LTM) delivers important control signals for attentional selection. LTM expectations have an important role in guiding the task-driven sequence of covert attention and gaze shifts, especially in well-practiced multistep sensorimotor actions. What happens when LTM expectations are disconfirmed? Does a sensory-based visual-search mode of attentional selection replace the LTM-based mode? What happens when prior LTM expectations become valid again? We investigated these questions in a computerized version of the number-connection test. Participants clicked on spatially distributed numbered shapes in ascending order while gaze was recorded. Sixty trials were performed with a constant spatial arrangement. In 20 consecutive trials, either numbers, shapes, both, or no features switched position. In 20 reversion trials, participants worked on the original arrangement. Only the sequence-affecting number switches elicited slower clicking, visual search-like scanning, and lower eye-hand synchrony. The effects were neither limited to the exchanged numbers nor to the corresponding actions. Thus, expectation violations in a well-learned sensorimotor sequence cause a regression from LTM-based attentional selection to visual search beyond deviant-related actions and locations. Effects lasted for several trials and reappeared during reversion. © 2015 New York Academy of Sciences.

  7. Comparison of hybridization-based and sequencing-based gene expression technologies on biological replicates.

    Science.gov (United States)

    Liu, Fang; Jenssen, Tor-Kristian; Trimarchi, Jeff; Punzo, Claudio; Cepko, Connie L; Ohno-Machado, Lucila; Hovig, Eivind; Kuo, Winston Patrick

    2007-06-07

    High-throughput systems for gene expression profiling have been developed and have matured rapidly through the past decade. Broadly, these can be divided into two categories: hybridization-based and sequencing-based approaches. With data from different technologies being accumulated, concerns and challenges are raised about the level of agreement across technologies. As part of an ongoing large-scale cross-platform data comparison framework, we report here a comparison based on identical samples between one-dye DNA microarray platforms and MPSS (Massively Parallel Signature Sequencing). The DNA microarray platforms generally provided highly correlated data, while moderate correlations between microarrays and MPSS were obtained. Disagreements between the two types of technologies can be attributed to limitations inherent to both technologies. The variation found between pooled biological replicates underlines the importance of exercising caution in identification of differential expression, especially for the purposes of biomarker discovery. Based on different principles, hybridization-based and sequencing-based technologies should be considered complementary to each other, rather than competitive alternatives for measuring gene expression, and currently, both are important tools for transcriptome profiling.

  8. Comparison of hybridization-based and sequencing-based gene expression technologies on biological replicates

    Directory of Open Access Journals (Sweden)

    Cepko Connie L

    2007-06-01

    Background: High-throughput systems for gene expression profiling have been developed and have matured rapidly through the past decade. Broadly, these can be divided into two categories: hybridization-based and sequencing-based approaches. With data from different technologies being accumulated, concerns and challenges are raised about the level of agreement across technologies. As part of an ongoing large-scale cross-platform data comparison framework, we report here a comparison based on identical samples between one-dye DNA microarray platforms and MPSS (Massively Parallel Signature Sequencing). Results: The DNA microarray platforms generally provided highly correlated data, while moderate correlations between microarrays and MPSS were obtained. Disagreements between the two types of technologies can be attributed to limitations inherent to both technologies. The variation found between pooled biological replicates underlines the importance of exercising caution in identification of differential expression, especially for the purposes of biomarker discovery. Conclusion: Based on different principles, hybridization-based and sequencing-based technologies should be considered complementary to each other, rather than competitive alternatives for measuring gene expression, and currently, both are important tools for transcriptome profiling.

  9. Multifunctional hybrid networks based on self assembling peptide sequences

    Science.gov (United States)

    Sathaye, Sameer

    The overall aim of this dissertation is to achieve a comprehensive correlation between the molecular level changes in primary amino acid sequences of amphiphilic beta-hairpin peptides and their consequent solution-assembly properties and bulk network hydrogel behavior. This has been accomplished using two broad approaches. In the first approach, amino acid substitutions were made to peptide sequence MAX1 such that the hydrophobic surfaces of the folded beta-hairpins from the peptides demonstrate shape specificity in hydrophobic interactions with other beta-hairpins during the assembly process, thereby causing changes to the peptide nanostructure and bulk rheological properties of hydrogels formed from the peptides. Steric lock and key complementary hydrophobic interactions were designed to occur between two beta-hairpin molecules of a single peptide, LNK1, during beta-sheet fibrillar assembly of LNK1. Experimental results from circular dichroism, transmission electron microscopy and oscillatory rheology collectively indicate that the molecular design of the LNK1 peptide is the cause of the drastically different behavior of the networks relative to MAX1. The results indicate elimination or significant reduction of fibrillar branching due to steric complementarity in LNK1 that does not exist in MAX1, thus supporting the original hypothesis. As an extension of the designed steric lock and key complementarity between two beta-hairpin molecules of the same peptide, LNK1, three new pairs of peptide molecules LP1-KP1, LP2-KP2 and LP3-KP3 that resemble complementary 'wedge' and 'trough' shapes when folded into beta-hairpins were designed and studied. All six peptides individually and when blended with their corresponding shape complement formed fibrillar nanostructures with non-uniform thickness values. Loose packing in the assembled structures was observed in all the new peptides as compared to the uniform tight packing in MAX1 by SANS analysis.
This

  10. Phytophthora-ID.org: A sequence-based Phytophthora identification tool

    Science.gov (United States)

    N.J. Grünwald; F.N. Martin; M.M. Larsen; C.M. Sullivan; C.M. Press; M.D. Coffey; E.M. Hansen; J.L. Parke

    2010-01-01

    Contemporary species identification relies strongly on sequence-based identification, yet resources for identification of many fungal and oomycete pathogens are rare. We developed two web-based, searchable databases for rapid identification of Phytophthora spp. based on sequencing of the internal transcribed spacer (ITS) or the cytochrome oxidase...

  11. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung

    2011-04-30

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  12. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and comp...... with known priming sites....

  13. Postexercise hypotension during different water-based concurrent training intrasession sequences in young women.

    Science.gov (United States)

    Pinto, Stephanie Santana; Umpierre, Daniel; Ferreira, Hector Kerchirne; Nunes, Gabriela Neves; Ferrari, Rodrigo; Alberton, Cristine Lima

    2017-10-01

    The purpose of the study was to compare the acute effects of water-based resistance-aerobic (RA) and aerobic-resistance (AR) sequences on systolic blood pressure, diastolic blood pressure (DBP), and mean blood pressure (MBP) in young women. Thirteen active women participated in four sessions: (1) exercise familiarization, (2) aquatic maximal test to determine the heart rate (HR) corresponding to the anaerobic threshold (HR_AT), (3) concurrent protocol RA, and (4) concurrent protocol AR. Both protocols were initiated with the blood pressure measurements at rest in supine position. After that, either the RA or the AR concurrent protocol was performed. At the end of both protocols, blood pressure was measured for 60 minutes (every 10 minutes). The water-based resistance protocol was made up of exercises at maximal velocity, and the water-based aerobic protocol was performed continuously at ±5 bpm of HR_AT. Two-way analysis of variance with repeated measures was used to analyze the data (α = 0.05). There was no hypotensive effect on systolic blood pressure among the time points (P = .235) in both water-based intrasession exercise sequences (P = .423). Regarding the DBP and MBP, both intrasession exercise sequences presented similar (DBP: P = .980; MBP: P = .796) hypotensive effects in the first 10 minutes (DBP: P = .003; MBP: P = .008) at the end of RA and AR sessions (DBP: -4 vs. -13 mm Hg; MBP: -3 vs. -10 mm Hg). It was concluded that both RA and AR water-based concurrent training sessions resulted in postexercise hypotension (DBP and MBP) in normotensive young women. Copyright © 2017 American Society of Hypertension. Published by Elsevier Inc. All rights reserved.

  14. A method to prioritize quantitative traits and individuals for sequencing in family-based studies.

    Directory of Open Access Journals (Sweden)

    Kaanan P Shah

    Owing to recent advances in DNA sequencing, it is now technically feasible to evaluate the contribution of rare variation to complex traits and diseases. However, it is still cost prohibitive to sequence the whole genome (or exome) of all individuals in each study. For quantitative traits, one strategy to reduce cost is to sequence individuals in the tails of the trait distribution. However, the next challenge becomes how to prioritize traits and individuals for sequencing since individuals are often characterized for dozens of medically relevant traits. In this article, we describe a new method, the Rare Variant Kinship Test (RVKT), which leverages relationship information in family-based studies to identify quantitative traits that are likely influenced by rare variants. Conditional on nuclear families and extended pedigrees, we evaluate the power of the RVKT via simulation. Not unexpectedly, the power of our method depends strongly on effect size, and to a lesser extent, on the frequency of the rare variant and the number and type of relationships in the sample. As an illustration, we also apply our method to data from two genetic studies in the Old Order Amish, a founder population with extensive genealogical records. Remarkably, we implicate the presence of a rare variant that lowers fasting triglyceride levels in the Heredity and Phenotype Intervention (HAPI) Heart study (p = 0.044), consistent with the presence of a previously identified null mutation in the APOC3 gene that lowers fasting triglyceride levels in HAPI Heart study participants.

  15. Simple sequence repeat (SSR)-based genetic variability among ...

    African Journals Online (AJOL)

    The objective of this study was to compare if simple sequence repeat (SSR) markers could correctly identify peanut genotypes with difference in specific leaf weight (SLW) and relative water content (RWC). Four peanut genotypes and two water regimes (FC and 1/3 available water; 1/3 AW) were arranged in factorial ...

  16. Phylogenetic relationships of Salmonella based on rRNA sequences

    DEFF Research Database (Denmark)

    Christensen, H.; Nordentoft, Steen; Olsen, J.E.

    1998-01-01

    separated by 16S rRNA analysis and found to be closely related to the Escherichia coli and Shigella complex by both 16S and 23S rRNA analyses. The diphasic serotypes S. enterica subspp. I and VI were separated from the monophasic serotypes subspp. IIIa and IV, including S. bongori, by 23S rRNA sequence...

  17. Illumina-based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    2017-12-18

    Dec 18, 2017 ... In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI ...

  18. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  19. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  20. Sequencing Learning Events in Performance-Based Instructional Systems.

    Science.gov (United States)

    Passmore, David L.

    The need for an empirically defensible means of sequencing instruction appears to have been the primary motivator for research into learning hierarchies. Four methods for generating candidates for learning hierarchies were reviewed: introspection, formal analysis, observation, and statistical "fishing." Experimental transfer of training…

  1. Angiosperm phylogeny based on matK sequence information

    NARCIS (Netherlands)

    Hilu, K.W.; Borsch, T.; Müller, K.; Soltis, D.E.; Savolainen, V.; Chase, M.W.; Powell, M.; Alice, L.A.; Evans, R.; Sauquet, H.; Neinhuis, C.; Slotta, T.A.B.; Rohwer, J.G.; Campbell, C.; Chatrou, L.W.

    2003-01-01

    Plastid matK gene sequences for 374 genera representing all angiosperm orders and 12 genera of gymnosperms were analyzed using parsimony (MP) and Bayesian inference (BI) approaches. Traditionally, slowly evolving genomic regions have been preferred for deep-level phylogenetic inference in

  2. Instruction sequence based non-uniform complexity classes

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2013-01-01

    We present an approach to non-uniform complexity in which single-pass instruction sequences play a key part, and answer various questions that arise from this approach. We introduce several kinds of non-uniform complexity classes. One kind includes a counterpart of the well-known non-uniform

  3. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    KAUST Repository

    Wang, Bing

    2013-05-09

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separate ions on a millisecond time-scale. IMMS has become a powerful tool for analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics. Results: In this paper, a model is presented based on the least squares support vector regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of peptides. Four descriptors were extracted from the peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid search strategy, and a 10-fold cross-validation approach was employed for model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieved demonstrates the effectiveness and efficiency of the prediction model. Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information with relatively high prediction accuracy, as shown by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. © 2013 Wang et al.; licensee BioMed Central Ltd.
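    The combination of LS-SVR, grid search, and k-fold cross-validation described above can be sketched generically as follows. This is not the authors' implementation: the RBF kernel, the parameter names `gamma` (regularization) and `sigma` (kernel width), and all function names are assumptions, and the peptide descriptors are not reproduced. The sketch uses the standard LS-SVM formulation, in which training reduces to solving one linear system.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for bias b and alpha."""
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    solution = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return solution[0], solution[1:]


def lssvr_predict(X_train, b, alpha, sigma, X_new):
    """Evaluate f(x) = sum_i alpha_i * k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def cv_rmse(X, y, gamma, sigma, k=10, seed=0):
    """Root-mean-square error over k held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    squared_error = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        b, alpha = lssvr_fit(X[train], y[train], gamma, sigma)
        pred = lssvr_predict(X[train], b, alpha, sigma, X[fold])
        squared_error += np.sum((pred - y[fold]) ** 2)
    return float(np.sqrt(squared_error / len(y)))


def grid_search(X, y, gammas, sigmas, k=10):
    """Return (best_rmse, gamma, sigma) over the Cartesian grid of candidates."""
    return min((cv_rmse(X, y, g, s, k), g, s) for g in gammas for s in sigmas)
```

    With a feature matrix `X` of shape (n_peptides, 34) and a vector `y` of drift times, `grid_search(X, y, gammas, sigmas)` would pick the pair with the lowest cross-validated error; the candidate grids themselves are left to the user.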

  4. Information Order Effects: Examining The Effect of Sequencing and Complexity in a Long Information Series

    Science.gov (United States)

    2007-06-01

    Primacy / Recency / No Effect ) Order Mental Effort Complexity (within Subject) Sequencing Length Response Mode 10 Hypotheses 1. Anchoring...1----------No Effect 92/100%124Recency 21/68%-----3/60%3/60%15/79% Primacy Long Series 124314No Effect 2814221/68% Primacy Short Series...Force towards primacy Force towards primacy Force towards primacy Long Series No effect No effect

  5. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
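    The separation of a symbolic DNA sequence into four indicator channels can be illustrated with a short sketch. The function names are hypothetical, and the fixed per-channel weights stand in for the adaptive symbol-to-number mapping, which the abstract does not specify.

```python
def indicator_sequences(dna):
    """Split a DNA string into four binary channels, one per nucleotide."""
    dna = dna.upper()
    return {base: [1 if symbol == base else 0 for symbol in dna] for base in "ACGT"}


def weighted_signal(dna, weights):
    """Combine the four channels into one numeric signal.

    `weights` maps each base to a number; here it is a fixed, user-supplied
    placeholder rather than a mapping adapted to the sequence structure.
    """
    channels = indicator_sequences(dna)
    return [sum(weights[b] * channels[b][i] for b in "ACGT") for i in range(len(dna))]
```

    Each position has exactly one active channel, so the weighted signal simply carries the weight of the base found at that position; it is this numeric signal that a wavelet transform would then analyse.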

  6. Translating sanger-based routine DNA diagnostics into generic massive parallel ion semiconductor sequencing.

    Science.gov (United States)

    Diekstra, Adinda; Bosgoed, Ermanno; Rikken, Alwin; van Lier, Bart; Kamsteeg, Erik-Jan; Tychon, Marloes; Derks, Ronny C; van Soest, Ronald A; Mensenkamp, Arjen R; Scheffer, Hans; Neveling, Kornelia; Nelen, Marcel R

    2015-01-01

    Dideoxy-based chain termination sequencing developed by Sanger is the gold standard sequencing approach and allows clinical diagnostics of disorders with relatively low genetic heterogeneity. Recently, new next generation sequencing (NGS) technologies have found their way into diagnostic laboratories, enabling the sequencing of large targeted gene panels or exomes. The development of benchtop NGS instruments now allows the analysis of single genes or small gene panels, making these platforms increasingly competitive with Sanger sequencing. We developed a generic automated ion semiconductor sequencing work flow that can be used in a clinical setting and can serve as a substitute for Sanger sequencing. Standard amplicon-based enrichment remained identical to PCR for Sanger sequencing. A novel postenrichment pooling strategy was developed, limiting the number of library preparations and reducing sequencing costs up to 70% compared to Sanger sequencing. A total of 1224 known pathogenic variants were analyzed, yielding an analytical sensitivity of 99.92% and specificity of 99.99%. In a second experiment, a total of 100 patient-derived DNA samples were analyzed using a blind analysis. The results showed an analytical sensitivity of 99.60% and specificity of 99.98%, comparable to Sanger sequencing. Ion semiconductor sequencing can be a first choice mutation scanning technique, independent of the genes analyzed. © 2014 American Association for Clinical Chemistry.

  7. One-dimensional TRFLP-SSCP is an effective DNA fingerprinting strategy for soil Archaea that is able to simultaneously differentiate broad taxonomic clades based on terminal fragment length polymorphisms and closely related sequences based on single stranded conformation polymorphisms.

    Science.gov (United States)

    Swanson, Colby A; Sliwinski, Marek K

    2013-09-01

    DNA fingerprinting methods provide a means to rapidly compare microbial assemblages from environmental samples without the need to first cultivate species in the laboratory. The profiles generated by these techniques are able to identify statistically significant temporal and spatial patterns, correlations to environmental gradients, and biological variability to estimate the number of replicates for clone libraries or next generation sequencing (NGS) surveys. Here we describe an improved DNA fingerprinting technique that combines terminal restriction fragment length polymorphisms (TRFLP) and single stranded conformation polymorphisms (SSCP) so that both can be used to profile a sample simultaneously rather than requiring two sequential steps as in traditional two-dimensional (2-D) gel electrophoresis. For the purpose of profiling Archaeal 16S rRNA genes from soil, the dynamic range of this combined 1-D TRFLP-SSCP approach was superior to TRFLP and SSCP. 1-D TRFLP-SSCP was able to distinguish broad taxonomic clades with genetic distances greater than 10%, such as Euryarchaeota and the Thaumarchaeal clades g_Ca. Nitrososphaera (formerly 1.1b) and o_NRP-J (formerly 1.1c) better than SSCP. In addition, 1-D TRFLP-SSCP was able to simultaneously distinguish closely related clades within a genus such as s_SCA1145 and s_SCA1170 better than TRFLP. We also tested the utility of 1-D TRFLP-SSCP fingerprinting of environmental assemblages by comparing this method to the generation of a 16S rRNA clone library of soil Archaea from a restored Tallgrass prairie. This study shows 1-D TRFLP-SSCP fingerprinting provides a rapid and phylogenetically informative screen of Archaeal 16S rRNA genes in soil samples. © 2013.

  8. In vitro HIV-1 selective integration into the target sequence and decoy-effect of the modified sequence.

    Directory of Open Access Journals (Sweden)

    Tatsuaki Tsuruyama

    Although there have been a few reports that the HIV-1 genome can be selectively integrated into the genomic DNA of cultured host cell, the biochemistry of integration selectivity has not been fully understood. We modified the in vitro integration reaction protocol and developed a reaction system with higher efficiency. We used a substrate repeat, 5'-(GTCCCTTCCCAGT)n(ACTGGGAAGGGAC)n-3', and a modified sequence DNA ligated into a circular plasmid. CAGT and ACTG (shown in italics in the above sequence) in the repeat units originated from the HIV-1 proviral genome ends. Following the incubation of the HIV-1 genome end cDNA and recombinant integrase for the formation of the pre-integration (PI) complex, substrate DNA was reacted with this complex. It was confirmed that the integration selectively occurred in the middle segment of the repeat sequence. In addition, integration frequency and selectivity were positively correlated with repeat number n. On the other hand, both frequency and selectivity decreased markedly when using sequences with deletion of CAGT in the middle position of the original target sequence. Moreover, on incubation with the deleted DNAs and original sequence, the integration efficiency and selectivity for the original target sequence were significantly reduced, which indicated interference effects by the deleted sequence DNAs. Efficiency and selectivity were also found to vary discontinuously with changes in manganese dichloride concentration in the reaction buffer, probably due to its influence on the secondary structure of substrate DNA. Finally, integrase was found to form oligomers on the binding site and substrate DNA formed a loop-like structure. In conclusion, there is a considerable selectivity in HIV-integration into the specified sequence; however, similar DNA sequences can interfere with the integration process, and it is therefore difficult for in vivo integration to occur selectively in the actual host genome DNA.

  9. Genetic diversity in breonadia salicina based on intra-species sequence variation of chloroplast dna spacer sequence

    International Nuclear Information System (INIS)

    Qurainy, F.A.; Gaafar, A.R.Z.

    2014-01-01

    Assessment and knowledge of the genetic diversity and variation within and between populations of rare and endangered plants is very important for effective conservation. Intergenic spacer sequence variation at the psbA-trnH locus of the chloroplast genome was assessed within Breonadia salicina (Rubiaceae), a critically endangered plant species endemic to the southwestern part of the Kingdom of Saudi Arabia. The sequence data obtained from 19 individuals in three populations revealed nine haplotypes. The aligned sequences obtained from the overall Saudi accessions extended to 355 bp. A high level of haplotype diversity (Hd = 0.842) and a low level of nucleotide diversity (Pi = 0.0058) were detected. Consistently, both hierarchical analysis of molecular variance (AMOVA) and the constructed neighbor-joining tree indicated null genetic differentiation among populations. This level of differentiation between populations or between regions in psbA-trnH sequences may be due to the abundance of ancestral haplotype sharing and the presence of private haplotypes fixed for each population. Furthermore, the results revealed almost the same level of genetic diversity in comparison with Yemeni accessions, with Saudi accessions sharing three of the four haplotypes found in Yemeni accessions. (author)
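    For readers unfamiliar with the Hd statistic, the sketch below computes haplotype diversity under the assumption that the abstract uses Nei's conventional estimator, Hd = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract itself does not state the formula. The function name and the labels in the usage note are hypothetical, and the per-haplotype counts behind the reported Hd = 0.842 are not given in the source.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a list of haplotype labels, one per individual."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((count / n) ** 2 for count in counts.values())
    # The n / (n - 1) factor corrects for the finite sample size.
    return n / (n - 1) * (1.0 - sum_sq_freq)
```

    A call such as `haplotype_diversity(["H1", "H1", "H2", "H3"])` takes made-up labels; the value is 0 when every individual carries the same haplotype and 1 when every individual carries a different one.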

  10. Robust QKD-based private database queries based on alternative sequences of single-qubit measurements

    Science.gov (United States)

    Yang, YuGuang; Liu, ZhiChao; Chen, XiuBo; Zhou, YiHua; Shi, WeiMin

    2017-12-01

    Quantum channel noise may cause the user to obtain a wrong answer and thus misunderstand the database holder for existing QKD-based quantum private query (QPQ) protocols. In addition, an outside attacker may conceal his attack by exploiting the channel noise. We propose a new, robust QPQ protocol based on four-qubit decoherence-free (DF) states. In contrast to existing QPQ protocols against channel noise, only an alternative fixed sequence of single-qubit measurements is needed by the user (Alice) to measure the received DF states. This property makes it easy to implement the proposed protocol by exploiting current technologies. Moreover, to retain the advantage of flexible database queries, we reconstruct Alice's measurement operators so that Alice needs only conditioned sequences of single-qubit measurements.

  11. Characterization and Amplification of Gene-Based Simple Sequence Repeat (SSR) Markers in Date Palm.

    Science.gov (United States)

    Zhao, Yongli; Keremane, Manjunath; Prakash, Channapatna S; He, Guohao

    2017-01-01

    The paucity of molecular markers limits the application of genetic and genomic research in date palm (Phoenix dactylifera L.). Availability of expressed sequence tag (EST) sequences in date palm may provide a good resource for developing gene-based markers. This study characterizes a substantial fraction of transcriptome sequences containing simple sequence repeats (SSRs) from the EST sequences in date palm. The EST sequences studied are mainly homologous to those of Elaeis guineensis and Musa acuminata. A total of 911 gene-based SSR markers, characterized with functional annotations, have provided a useful basis not only for discovering candidate genes and understanding genetic basis of traits of interest but also for developing genetic and genomic tools for molecular research in date palm, such as diversity study, quantitative trait locus (QTL) mapping, and molecular breeding. The procedures of DNA extraction, polymerase chain reaction (PCR) amplification of these gene-based SSR markers, and gel electrophoresis of PCR products are described in this chapter.

  12. The effects of receptive and expressive instructional sequences on varied conditional discriminations.

    Science.gov (United States)

    Bao, Shimin; Sweatt, Kristin T; Lechago, Sarah A; Antal, Sarah

    2017-10-01

    Many Early Intensive Behavioral Intervention (EIBI) curricula recommend teaching receptive responding before targeting expressive responding (Leaf & McEachin, 1999; Lovaas, 2003). However, a small literature base suggests that teaching expressive responses first may be more efficient when teaching children with ASD and other developmental disabilities (Petursdottir & Carr, 2011). The present study employed an alternating treatments design to compare the effects of three instructional sequences to teach feature, function, and class to three children diagnosed with ASD: (a) receptive-expressive, (b) expressive-receptive, and (c) mixed. The results suggested that expressive-receptive was the most efficient training sequence for all three participants. Additionally, greater emergent responding was observed with the expressive-receptive training sequence. © 2017 Society for the Experimental Analysis of Behavior.

  13. Clinical Sequencing Exploratory Research Consortium: Accelerating Evidence-Based Practice of Genomic Medicine.

    Science.gov (United States)

    Green, Robert C; Goddard, Katrina A B; Jarvik, Gail P; Amendola, Laura M; Appelbaum, Paul S; Berg, Jonathan S; Bernhardt, Barbara A; Biesecker, Leslie G; Biswas, Sawona; Blout, Carrie L; Bowling, Kevin M; Brothers, Kyle B; Burke, Wylie; Caga-Anan, Charlisse F; Chinnaiyan, Arul M; Chung, Wendy K; Clayton, Ellen W; Cooper, Gregory M; East, Kelly; Evans, James P; Fullerton, Stephanie M; Garraway, Levi A; Garrett, Jeremy R; Gray, Stacy W; Henderson, Gail E; Hindorff, Lucia A; Holm, Ingrid A; Lewis, Michelle Huckaby; Hutter, Carolyn M; Janne, Pasi A; Joffe, Steven; Kaufman, David; Knoppers, Bartha M; Koenig, Barbara A; Krantz, Ian D; Manolio, Teri A; McCullough, Laurence; McEwen, Jean; McGuire, Amy; Muzny, Donna; Myers, Richard M; Nickerson, Deborah A; Ou, Jeffrey; Parsons, Donald W; Petersen, Gloria M; Plon, Sharon E; Rehm, Heidi L; Roberts, J Scott; Robinson, Dan; Salama, Joseph S; Scollon, Sarah; Sharp, Richard R; Shirts, Brian; Spinner, Nancy B; Tabor, Holly K; Tarczy-Hornoch, Peter; Veenstra, David L; Wagle, Nikhil; Weck, Karen; Wilfond, Benjamin S; Wilhelmsen, Kirk; Wolf, Susan M; Wynn, Julia; Yu, Joon-Ho

    2016-06-02

    Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publicly available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine. Copyright © 2016 American Society of Human Genetics. All rights reserved.

  14. Efficient DNA fingerprinting based on the targeted sequencing of active retrotransposon insertion sites using a bench-top high-throughput sequencing platform.

    Science.gov (United States)

    Monden, Yuki; Yamamoto, Ayaka; Shindo, Akiko; Tahara, Makoto

    2014-10-01

    In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect the rights of breeders. Many families of retrotransposons have multiple copies throughout the eukaryotic genome and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we conducted a DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites in the two families with an Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree was constructed using these insertion sites, which corresponded well with known pedigree information, thereby indicating their suitability for genetic diversity studies. Thus, the genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq sequencing platform is highly effective for DNA fingerprinting without any requirement for whole genome sequence information. This approach may facilitate the development of practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to the determination of genetic relationships. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  15. A Parallel Non-Alignment Based Approach to Efficient Sequence Comparison using Longest Common Subsequences

    International Nuclear Information System (INIS)

    Bhowmick, S; Shafiullah, M; Rai, H; Bastola, D

    2010-01-01

    Biological sequence comparison programs have revolutionized the practice of biochemistry, and molecular and evolutionary biology. Pairwise comparison of genomic sequences is a popular method of choice for analyzing genetic sequence data. However, the quality of results from most sequence comparison methods is significantly affected by small perturbations in the data and, furthermore, there is a dearth of computational tools to compare sequences beyond a certain length. In this paper, we describe a parallel algorithm for comparing genetic sequences using an alignment-free method based on computing the Longest Common Subsequence (LCS) between genetic sequences. We validate the quality of our results by comparing the phylogenetic trees obtained from ClustalW and LCS. We also show through complexity analysis of the isoefficiency and by empirical measurement of the running time that our algorithm is very scalable.
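
    As a point of reference for the LCS measure named above, here is a minimal serial sketch of the textbook dynamic-programming LCS length. It is not the parallel algorithm of the paper, and the normalized dissimilarity `lcs_distance` is a hypothetical illustration of how an LCS score could feed a distance-based tree method; the abstract does not specify the authors' distance.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic dynamic programme, O(len(a) * len(b)) time, keeping only
    two rows of the table in memory.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> float:
    """Hypothetical dissimilarity: 1 - LCS length / length of the longer input."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))
```

    For example, `lcs_length("AGGTAB", "GXTXAYB")` is 4 (the subsequence GTAB), and identical sequences have distance 0.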

  16. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Directory of Open Access Journals (Sweden)

    Xiaoxia Yang

    Full Text Available Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  17. Sequence-Based Introgression Mapping Identifies Candidate White Mold Tolerance Genes in Common Bean

    Directory of Open Access Journals (Sweden)

    Sujan Mamidi

    2016-07-01

    Full Text Available White mold, caused by the necrotrophic fungus Sclerotinia sclerotiorum (Lib.) de Bary, is a major disease of common bean (Phaseolus vulgaris L.). WM7.1 and WM8.3 are two quantitative trait loci (QTL) with major effects on tolerance to the pathogen. Advanced backcross populations segregating individually for either of the two QTL, and a recombinant inbred (RI) population segregating for both QTL were used to fine map and confirm the genetic location of the QTL. The QTL intervals were physically mapped using the reference common bean genome sequence, and the physical intervals for each QTL were further confirmed by sequence-based introgression mapping. Using whole-genome sequence data from susceptible and tolerant DNA pools, introgressed regions were identified as those with significantly higher numbers of single-nucleotide polymorphisms (SNPs) relative to the whole genome. By combining the QTL and SNP data, WM7.1 was located to a 660-kb region that contained 41 gene models on the proximal end of chromosome Pv07, while the WM8.3 introgression was narrowed to a 1.36-Mb region containing 70 gene models. The most polymorphic candidate gene in the WM7.1 region encodes a BEACH-domain protein associated with apoptosis. Within the WM8.3 interval, a receptor-like protein with the potential to recognize pathogen effectors was the most polymorphic gene. The use of gene and sequence-based mapping identified two candidate genes whose putative functions are consistent with the current model of pathogenicity.

  18. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    Science.gov (United States)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  19. High Interlaboratory Reproducibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature.

  20. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    Science.gov (United States)

    2008-07-01

    ...sequences which are generalizations of the Fibonacci sequences. Dates covered: 6 Jul 08 – 11 Jul 08. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization.

  1. Iterative refinement of structure-based sequence alignments by Seed Extension

    Directory of Open Access Journals (Sweden)

    Lee Byungkook

    2009-07-01

    Full Text Available Abstract. Background: Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results: RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion: RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or ...

  2. An Analysis of Delay-based and Integrator-based Sequence Detectors for Grid-Connected Converters

    DEFF Research Database (Denmark)

    Khazraj, Hesam; Silva, Filipe Miguel Faria da; Bak, Claus Leth

    2017-01-01

    Detecting and separating positive and negative sequence components of the grid voltage or current is of vital importance in the control of grid-connected power converters, HVDC systems, etc. To this end, several techniques have been proposed in recent years. These techniques can be broadly classified into two main classes: the integrator-based techniques and the delay-based techniques. The complex-coefficient filter-based technique, the dual second-order generalized integrator-based method, and the multiple reference frame approach are the main members of the integrator-based sequence detectors, and the delay-signal cancellation operators are the main members of the delay-based sequence detectors. The aim of this paper is to provide a theoretical and experimental comparative study between integrator and delay based sequence detectors. The theoretical analysis is conducted based on the small-signal modelling...
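
    To illustrate the delay-based class mentioned above, the following is a minimal sketch of the standard quarter-period delayed signal cancellation (DSC) operator in the stationary alpha-beta frame, where the voltage is treated as a complex vector v = v_alpha + j*v_beta. The positive-sequence part is taken as 0.5*(v(t) + j*v(t - T/4)) and the negative-sequence part as 0.5*(v(t) - j*v(t - T/4)). The function name, the sampling setup, and the test amplitudes are assumptions for illustration; the record does not describe the authors' exact implementation.

```python
import cmath
import math


def dsc_sequences(v, samples_per_period):
    """Split complex alpha-beta samples into positive/negative sequence parts.

    v: list of complex samples v_alpha + 1j * v_beta at a fixed sample rate.
    samples_per_period: samples per fundamental period (must be divisible by 4
    so that a quarter-period delay is a whole number of samples).
    Outputs start once a quarter period of history is available.
    """
    if samples_per_period % 4 != 0:
        raise ValueError("samples_per_period must be divisible by 4")
    delay = samples_per_period // 4
    pos, neg = [], []
    for k in range(delay, len(v)):
        delayed = v[k - delay]
        pos.append(0.5 * (v[k] + 1j * delayed))  # keeps the +omega rotating part
        neg.append(0.5 * (v[k] - 1j * delayed))  # keeps the -omega rotating part
    return pos, neg


# Synthetic unbalanced signal (hypothetical amplitudes): 1.0 pu positive
# sequence plus 0.2 pu negative sequence, 64 samples per period.
N = 64
signal = [
    1.0 * cmath.exp(1j * 2 * math.pi * k / N)
    + 0.2 * cmath.exp(-1j * 2 * math.pi * k / N)
    for k in range(2 * N)
]
pos_seq, neg_seq = dsc_sequences(signal, N)
```

    The operator is exact only at the nominal frequency assumed by `samples_per_period`; off-nominal frequency or harmonics leave residual errors, which is one of the trade-offs such comparative studies examine.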

  3. A Priori Knowledge and Probability Density Based Segmentation Method for Medical CT Image Sequences

    Directory of Open Access Journals (Sweden)

    Huiyan Jiang

    2014-01-01

    Full Text Available This paper briefly introduces a novel segmentation strategy for CT image sequences. As the first step of our strategy, we extract a priori intensity statistical information from the object region, which is manually segmented by radiologists. Then we define a search scope for the object and calculate the probability density for each pixel in the scope using a voting mechanism. Moreover, we generate an optimal initial level set contour based on the a priori shape of the object in the previous slice. Finally, the modified distance regularization level set method utilizes boundary features and probability density to confirm the final object. The main contributions of this paper are as follows: a priori knowledge is effectively used to guide the determination of objects, and a modified distance regularization level set method can accurately extract the actual contour of the object in a short time. The proposed method is compared to seven other state-of-the-art medical image segmentation methods on abdominal CT image sequence datasets. The evaluated results demonstrate that our method performs better and has potential for segmentation in CT image sequences.

  4. A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

    Science.gov (United States)

    Tong, Xiaojun; Cui, Minggen; Wang, Zhu

    2009-07-01

    The design of the new compound two-dimensional chaotic function is presented by exploiting two one-dimensional chaotic functions which switch randomly, and the design is used as a chaotic sequence generator which is proved by Devaney's definition proof of chaos. The properties of compound chaotic functions are also proved rigorously. In order to improve the robustness against difference cryptanalysis and produce avalanche effect, a new feedback image encryption scheme is proposed using the new compound chaos by selecting one of the two one-dimensional chaotic functions randomly and a new image pixels method of permutation and substitution is designed in detail by array row and column random controlling based on the compound chaos. The results from entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, cipher sensitivity analysis depending on key and plaintext have proven that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and especially it accelerates encryption speed, and achieves higher level of security. By the dynamical compound chaos and perturbation technology, the paper solves the problem of computer low precision of one-dimensional chaotic function.

  5. Comparison of ompP5 sequence-based typing and pulsed-field gel ...

    African Journals Online (AJOL)

    In this study, comparison of the outer membrane protein P5 gene (ompP5) sequence-based typing with pulsed-field gel electrophoresis (PFGE) for the genotyping of Haemophilus parasuis, the 15 serovar reference strains and 43 isolates were investigated. When comparing the two methods, 31 ompP5 sequence types ...

  6. PGSTE-WATERGATE: An STE-based PGSE NMR sequence with excellent solvent suppression

    Science.gov (United States)

    Zheng, Gang; Stait-Gardner, Timothy; Anil Kumar, P. G.; Torres, Allan M.; Price, William S.

    2008-03-01

    A new stimulated-echo based pulsed gradient spin-echo NMR diffusion sequence incorporating WATERGATE solvent suppression, PGSTE-WATERGATE, is presented. The sequence provides superb solvent suppression without any phase distortions. The sequence is simple to set up and particularly suited to measuring diffusion coefficients in aqueous solution such as is commonly required in pharmaceutical and combinatorial applications. The utility of the sequence is demonstrated on samples containing lysozyme and sucrose. Importantly, the high degree of phase-distortion suppression allows more complicated selective π pulses to be used to enhance the selectivity of solvent suppression.

  7. Zadoff-Chu sequence-based hitless ranging scheme for OFDMA-PON configured 5G fronthaul uplinks

    Science.gov (United States)

    Reza, Ahmed Galib; Rhee, June-Koo Kevin

    2017-05-01

    A Zadoff-Chu (ZC) sequence-based low-complexity hitless upstream time synchronization scheme is proposed for an orthogonal frequency division multiple access passive optical network configured cloud radio access network fronthaul. The algorithm is based on gradual loading of the ZC sequences, where the phase discontinuity due to the cyclic prefix is alleviated by a frequency domain phase precoder, eliminating the requirements of guard bands to mitigate intersymbol interference and inter-carrier interference. Simulation results for uncontrolled-wavelength asynchronous transmissions from four concurrent transmitting optical network units are presented to demonstrate the effectiveness of the proposed scheme.
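
    The property that makes ZC sequences attractive for ranging is their constant amplitude and ideal cyclic autocorrelation. The sketch below generates an odd-length ZC sequence with the usual definition x_u(n) = exp(-j*pi*u*n*(n+1)/N) and checks the autocorrelation numerically. It does not implement the gradual loading or the frequency-domain phase precoder described in the abstract; the function names, the root 25, and the length 63 are illustrative assumptions.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Odd-length Zadoff-Chu sequence x(n) = exp(-j*pi*root*n*(n+1)/length).

    root must be coprime with length for the ideal autocorrelation property.
    """
    if length % 2 == 0:
        raise ValueError("this sketch only covers odd sequence lengths")
    if math.gcd(root, length) != 1:
        raise ValueError("root and length must be coprime")
    seq = []
    for n in range(length):
        # Reduce the phase numerator modulo 2*length to keep angles small.
        m = (root * n * (n + 1)) % (2 * length)
        seq.append(cmath.exp(-1j * math.pi * m / length))
    return seq


def cyclic_autocorrelation(seq, lag):
    """Cyclic autocorrelation sum_i seq[i] * conj(seq[(i + lag) mod N])."""
    n = len(seq)
    return sum(seq[i] * seq[(i + lag) % n].conjugate() for i in range(n))


zc = zadoff_chu(root=25, length=63)
```

    The zero-lag value equals the sequence length and every other cyclic lag sums to (numerically) zero, which is what lets a receiver locate a timing offset from a correlation peak.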

  8. SHAPE Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data.

    Science.gov (United States)

    Poulsen, Line Dahl; Kielpinski, Lukasz Jan; Salama, Sofie R; Krogh, Anders; Vinther, Jeppe

    2015-05-01

    Selective 2' Hydroxyl Acylation analyzed by Primer Extension (SHAPE) is an accurate method for probing of RNA secondary structure. In existing SHAPE methods, the SHAPE probing signal is normalized to a no-reagent control to correct for the background caused by premature termination of the reverse transcriptase. Here, we introduce a SHAPE Selection (SHAPES) reagent, N-propanone isatoic anhydride (NPIA), which retains the ability of SHAPE reagents to accurately probe RNA structure, but also allows covalent coupling between the SHAPES reagent and a biotin molecule. We demonstrate that SHAPES-based selection of cDNA-RNA hybrids on streptavidin beads effectively removes the large majority of background signal present in SHAPE probing data and that sequencing-based SHAPES data contain the same amount of RNA structure data as regular sequencing-based SHAPE data obtained through normalization to a no-reagent control. Moreover, the selection efficiently enriches for probed RNAs, suggesting that the SHAPES strategy will be useful for applications with high-background and low-probing signal such as in vivo RNA structure probing. © 2015 Poulsen et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  9. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads

    Directory of Open Access Journals (Sweden)

    Chengxi Ye

    2016-06-01

    Full Text Available Motivation. The third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.

  10. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  11. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer

    DEFF Research Database (Denmark)

    Álvarez-Martos, Isabel; Ferapontova, Elena

    2017-01-01

    A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by the replacement of the RNA aptamer bases for their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence, and, thus...

  12. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.
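
    The Phred scores mentioned above encode a per-base error probability as Q = -10 * log10(P). The sketch below shows that conversion and a simple read-level filter. It assumes the common Phred+33 ASCII encoding of FASTQ quality strings, and the threshold `max_mean_error` and function names are hypothetical; the record does not describe SUGAR's actual cleaning criteria.

```python
def phred_to_error_prob(q):
    """Error probability for Phred score q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in quality]


def mean_error_prob(quality):
    """Mean per-base error probability of one read's quality string."""
    scores = decode_phred33(quality)
    if not scores:
        raise ValueError("empty quality string")
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


def passes_quality(quality, max_mean_error=0.01):
    """Hypothetical filter: keep a read if its mean error probability is low."""
    return mean_error_prob(quality) <= max_mean_error
```

    Averaging error probabilities rather than raw Q values avoids letting a few very high scores mask low-quality bases, since the Phred scale is logarithmic.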

  13. Graph-based sequence annotation using a data integration approach

    Directory of Open Access Journals (Sweden)

    Pesch Robert

    2008-06-01

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.

  14. Graph-based sequence annotation using a data integration approach.

    Science.gov (United States)

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.

  15. An efficient binomial model-based measure for sequence comparison and its application.

    Science.gov (United States)

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remain. This paper presents a binomial model-based measure to analyze biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a random Bernoulli variable, and the distribution of the sum of the word occurrences is well known to be binomial. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on the binomial model is more efficient.
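
    The ingredients named in this abstract (word counts, a recursively computed binomial distribution, and relative entropy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' measure: the abstract does not say how the pieces are combined, so the function names and the choice to count overlapping words are hypothetical.

```python
import math


def count_word(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps (an assumption)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def binomial_pmf_recursive(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Binomial(n, p) with 0 < p < 1.

    Uses the standard recursion
        P(0) = (1 - p)^n,  P(k + 1) = P(k) * (n - k) / (k + 1) * p / (1 - p).
    """
    probs = [(1 - p) ** n]
    ratio = p / (1 - p)
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * ratio)
    return probs


def relative_entropy(p_dist, q_dist):
    """Kullback-Leibler divergence sum(p * log(p / q)), skipping terms with p = 0."""
    return sum(p * math.log(p / q) for p, q in zip(p_dist, q_dist) if p > 0)
```

    For example, `relative_entropy(binomial_pmf_recursive(4, 0.5), binomial_pmf_recursive(4, 0.25))` compares the count distributions implied by two different word probabilities. For very long sequences the starting term `(1 - p) ** n` can underflow, so a log-space version would be safer.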

  16. A novel DNA restriction technology based on laser pulse energy conversion on sequence-specific bound metal nanoparticles

    Science.gov (United States)

    Csaki, Andrea; Maubach, Gunter; Garwe, Frank; Steinbrueck, Andrea; Koenig, Karsten; Fritzsche, Wolfgang

    2005-03-01

    DNA restriction is a basic method in today's molecular biology. Besides application for DNA manipulation, this method is used in DNA analytics for 'restriction analysis'. Thereby DNA is digested by sequence specific restriction enzymes, and the length distribution of the resulting fragments is detected by gel electrophoresis. Differences in the sequence lead to different restriction patterns. A disadvantage of this standard method is the limitation to a small set of fixed sequences, so that the assay cannot be adapted to any sequence of interest (e.g. SNP). We designed a scheme for DNA restriction in order to provide access to any desired sequence, based on laser light conversion on sequence-specific positioned metal nanoparticles. Especially gold nanoparticles are known for their interesting optical properties caused by plasmon resonance. The resulting absorption can be used to convert laser light pulses into heat, resulting in nanoparticle destruction. We work on the combination of this principle with DNA-modification of nanoparticles and the sequence-specific binding (hybridization) of these DNA-nanoparticle complexes along DNA molecules. Different mechanisms of light-conversion were studied, and the destructive effect of laser light on the nanoparticles and DNA is demonstrated.

  17. EFFECT OF DYE CONCENTRATION ON SEQUENCING BATCH REACTOR PERFORMANCE

    Directory of Open Access Journals (Sweden)

    A. A. Vaigan, M. R. Alavi Moghaddam, H. Hashemi

    2009-01-01

    Reactive dyes have been identified as problematic compounds in textile industry wastewater as they are water soluble and cannot be easily removed by conventional aerobic biological treatment systems. The treatability of a reactive dye (Brill Blue KN-R) by sequencing batch reactor and the influence of the dye concentration on system performance were investigated in this study. Brill Blue KN-R is one of the main dyes that are used in textile industries in Iran. Four cylindrical Plexiglas reactors were run for 36 days (5 days for acclimatization of sludge and 31 days for normal operation) at different initial dye concentrations. The dye concentrations were adjusted to be 20, 25, 30 and 40 mg/L in the reactors R1, R2, R3 and R4, respectively. In all reactors, effective volume, influent wastewater flowrate and sludge retention time were 5.5 L, 3.0 L/d and 10 d, respectively. According to the obtained data, average dye removal efficiencies of R1, R2, R3 and R4 were 57% ± 2, 50.18% ± 3, 44.97% ± 3 and 30.98% ± 3, respectively. The average COD removal efficiencies of all reactors were 97% ± 1, 97.12% ± 1, 96.93% ± 1 and 97.22% ± 1, respectively. The dye removal efficiency decreased as the dye concentration increased, with a correlation coefficient of 0.997.
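
    The strength of the trend can be checked from the four reported averages with a plain Pearson correlation, as sketched below. The abstract does not say which coefficient was used or how it was computed, so this is only an illustrative recomputation; the function and variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Initial dye concentrations (mg/L) and average dye removal (%) for R1 to R4,
# taken from the abstract above.
dye_mg_per_l = [20, 25, 30, 40]
dye_removal_pct = [57.0, 50.18, 44.97, 30.98]

r = pearson_r(dye_mg_per_l, dye_removal_pct)
```

    The resulting `r` is negative, as expected when removal falls with rising concentration, and its magnitude is close to 1.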

  18. A comparison of single molecule and amplification based sequencing of cancer transcriptomes.

    Directory of Open Access Journals (Sweden)

    Lee T Sam

    2011-03-01

    The second wave of next generation sequencing technologies, referred to as single-molecule sequencing (SMS), carries the promise of profiling samples directly without employing polymerase chain reaction steps used by amplification-based sequencing (AS) methods. To examine the merits of both technologies, we examine mRNA sequencing results from single-molecule and amplification-based sequencing in a set of human cancer cell lines and tissues. We observe a characteristic coverage bias towards high abundance transcripts in amplification-based sequencing. A larger fraction of AS reads cover highly expressed genes, such as those associated with translational processes and housekeeping genes, resulting in relatively lower coverage of genes at low and mid-level abundance. In contrast, the coverage of high abundance transcripts plateaus off using SMS. Consequently, SMS is able to sequence lower-abundance transcripts more thoroughly, including some that are undetected by AS methods; however, these include many more mapping artifacts. A better understanding of the technical and analytical factors introducing platform specific biases in high throughput transcriptome sequencing applications will be critical in cross platform meta-analytic studies.

  19. CyclinPred: a SVM-based method for predicting cyclin protein sequences.

    Directory of Open Access Journals (Sweden)

    Mridul K Kalita

    Functional annotation of protein sequences with low similarity to well characterized protein sequences is a major challenge of computational biology in the post genomic era. The cyclin protein family is one such important family of proteins which consists of sequences with low sequence similarity, making discovery of novel cyclins and establishing orthologous relationships amongst the cyclins a difficult task. The currently identified cyclin motifs and cyclin associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM)-based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non cyclin protein sequences. The training features of the protein sequences include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from Leave-One-Out cross validation or jackknife test, self consistency and holdout tests prove that the SVM classifier trained with features of PSSM profile was more accurate than the classifiers based on either of the other features alone or hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results prove that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
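
    Two of the feature types listed above, amino acid composition and dipeptide composition, are simple enough to sketch directly. The code below is an illustrative assumption about how such fixed-length vectors are commonly built, not CyclinPred's code; the function names and the handling of non-standard residues (they are ignored) are hypothetical. The secondary structure and PSSM features need external tools and are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return a 20-element vector with the fraction of each standard residue."""
    seq = seq.upper()
    total = sum(1 for aa in seq if aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element vector with the fraction of each adjacent residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * (len(AMINO_ACIDS) ** 2)
    return [valid.count(a + b) / len(valid)
            for a, b in product(AMINO_ACIDS, repeat=2)]
```

    Vectors like these have the same length for every protein regardless of sequence length, which is what makes them usable as input to an SVM.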

  20. A Window Into Clinical Next-Generation Sequencing-Based Oncology Testing Practices.

    Science.gov (United States)

    Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D

    2017-12-01

    - Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. - To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. - College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. - These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. - This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.

  1. Study on multiple-hops performance of MOOC sequences-based optical labels for OPS networks

    Science.gov (United States)

    Zhang, Chongfu; Qiu, Kun; Ma, Chunli

    2009-11-01

    In this paper, we use a new study method for the independent case of multiple optical orthogonal codes to derive the probability function of MOOCS-OPS networks, discuss the performance characteristics for a variety of parameters, and compare some characteristics of systems employing single optical orthogonal code or multiple optical orthogonal codes sequences-based optical labels. The performance of the system is also calculated, and our results verify that the method is effective. Additionally, it is found that the performance of MOOCS-OPS networks is worse than that of single optical orthogonal code-based optical labels for optical packet switching (SOOC-OPS); however, MOOCS-OPS networks can greatly enlarge the scalability of optical packet switching networks.

  2. Effect of promoter strength and signal sequence on the periplasmic ...

    African Journals Online (AJOL)

    Two plasmids, pFLAG-ATS and pET 26b(+), were studied for the periplasmic expression of recombinant human interferon-2b (IFN-2b) in Escherichia coli. The pFLAG-ATS contains ompA signal sequence and tac promoter while pET 26b(+) contains pelB signal sequence and T7lac promoter. It was observed that periplasmic ...

  3. Correlated mutations in protein sequences: Phylogenetic and structural effects

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.S. [Los Alamos National Lab., NM (United States). Theoretical Div.; Santa Fe Inst., NM (United States)]; Giraud, B.G. [C.E.N. Saclay, Gif/Yvette (France). Service Physique Theorique]; Liu, L.C. [Los Alamos National Lab., NM (United States). Theoretical Div.]; Stormo, G.D. [Univ. of Colorado, Boulder, CO (United States). Dept. of Molecular, Cellular and Developmental Biology]

    1998-12-01

    Covariation analysis of sets of aligned sequences for RNA molecules is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure. Covariation analysis of sets of aligned sequences for protein molecules is successful in certain instances in elucidating certain structural and functional links, but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure. In this paper the authors identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason involves the bias introduced in calculation of covariation measures due to the fact that biological sequences are generally related by a non-trivial phylogenetic tree. The authors present a null-model approach to solve this problem. The second reason involves linked chains of covariation which can result in pairs of sites displaying significant covariation even though they are not spatially proximate. They present a maximum entropy solution to this classic problem of causation versus correlation. The methodologies are validated in simulation.
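
    To make the idea of a covariation measure concrete, the sketch below computes the mutual information between two columns of an alignment, which is one commonly used measure. The abstract does not state which measure the authors use, so this choice is an assumption, and the function names are hypothetical. The sketch is the "naive" calculation only: it applies neither the phylogenetic null model nor the maximum entropy correction described above.

```python
import math
from collections import Counter


def column(alignment, j):
    """Extract column j from a list of equal-length aligned sequences."""
    return [seq[j] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    n = len(alignment)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

    Two columns that always change together score high and two independent columns score zero, whether or not the sites are spatially close, which is the limitation the abstract addresses.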

  4. Improved PCR-Based Detection of Soil Transmitted Helminth Infections Using a Next-Generation Sequencing Approach to Assay Design.

    Directory of Open Access Journals (Sweden)

    Nils Pilotte

    2016-03-01

    The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.

  5. Swarm-based Sequencing Recommendations in E-learning

    NARCIS (Netherlands)

    Van den Berg, Bert; Van Es, René; Tattersall, Colin; Janssen, José; Manderveld, Jocelyn; Brouns, Francis; Kurvers, Hub; Koper, Rob

    2005-01-01

    To be presented at the International Workshop on Recommender Agents and Adaptive Web-based Systems (RAAWS 2005) held in conjunction with the Intelligent Systems Design and Applications 2005 Conference (ISDA 2005), Wroclaw, Poland, September 8-10, 2005. Proceedings 5th International Conference on

  6. High Interlaboratory Reproducibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without...... extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature....

  7. Team-based learning to improve learning outcomes in a therapeutics course sequence.

    Science.gov (United States)

    Bleske, Barry E; Remington, Tami L; Wells, Trisha D; Dorsch, Michael P; Guthrie, Sally K; Stumpf, Janice L; Alaniz, Marissa C; Ellingrod, Vicki L; Tingen, Jeffrey M

    2014-02-12

    To compare the effectiveness of team-based learning (TBL) to that of traditional lectures on learning outcomes in a therapeutics course sequence. A revised TBL curriculum was implemented in a therapeutic course sequence. Multiple choice and essay questions identical to those used to test third-year students (P3) taught using a traditional lecture format were administered to the second-year pharmacy students (P2) taught using the new TBL format. One hundred thirty-one multiple-choice questions were evaluated; 79 tested recall of knowledge and 52 tested higher level, application of knowledge. For the recall questions, students taught through traditional lectures scored significantly higher compared to the TBL students (88%±12% vs. 82%±16%, p=0.01). For the questions assessing application of knowledge, no differences were seen between teaching pedagogies (81%±16% vs. 77%±20%, p=0.24). Scores on essay questions and the number of students who achieved 100% were also similar between groups. Transition to a TBL format from a traditional lecture-based pedagogy allowed P2 students to perform at a similar level as students with an additional year of pharmacy education on application of knowledge type questions. However, P3 students outperformed P2 students regarding recall type questions and overall. Further assessment of long-term learning outcomes is needed to determine if TBL produces more persistent learning and improved application in clinical settings.

  8. Interframe DPCM with robust median-based predictors for transmission of image sequences over noisy channels.

    Science.gov (United States)

    Song, X; Viero, T; Neuvo, Y

    1996-01-01

    A new image sequence coding technique based on robust median-based predictors is presented for the transmission of image sequences over noisy channels. We analyze the robustness of median-based predictors against channel errors. A heuristic algorithm for the design of a robust predictor from a given median-based predictor is presented. It is shown that with small modifications in terms of a necessary requirement for a median-based predictor to be robust against channel errors, the robustness of a given median-based predictor can be considerably improved. Simulations on a real image sequence show significant improvement over the conventional differential pulse code modulation (DPCM) at high bit error rate (BER) using this new technique. The technique does not increase the transmission rate. It is shown that the quality of reconstructed images obtained by robust median-based predictors can be further improved by postprocessing the image using a nonlinear detail-preserving noise-smoothing filter.
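
    For readers unfamiliar with the building blocks, the sketch below shows a lossless DPCM loop whose predictor is the median of three neighbours: the same pixel in the previous frame, the left neighbour, and the upper neighbour. This particular predictor is an assumption chosen for illustration; the abstract does not specify the authors' predictors, and the quantizer, channel errors, and the robustness modification are not modelled. All function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def predict(prev_frame, recon, r, c):
    """Hypothetical predictor: median of temporal, left, and upper neighbours.

    Missing spatial neighbours at the frame border fall back to the temporal value.
    """
    temporal = prev_frame[r][c]
    left = recon[r][c - 1] if c > 0 else temporal
    up = recon[r - 1][c] if r > 0 else temporal
    return median3(temporal, left, up)


def dpcm_encode(prev_frame, frame):
    """Return prediction residuals for `frame`, scanned in raster order."""
    rows, cols = len(frame), len(frame[0])
    residuals = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Without a quantizer the reconstruction equals the original,
            # so the encoder can predict directly from `frame`.
            residuals[r][c] = frame[r][c] - predict(prev_frame, frame, r, c)
    return residuals


def dpcm_decode(prev_frame, residuals):
    """Rebuild the frame from residuals using the same predictor as the encoder."""
    rows, cols = len(residuals), len(residuals[0])
    recon = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            recon[r][c] = predict(prev_frame, recon, r, c) + residuals[r][c]
    return recon
```

    Because the decoder predicts from already reconstructed pixels, a corrupted residual can influence later predictions; a median tends to discard a single outlying neighbour, which is the intuition behind using it on noisy channels.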

  9. Base-calling algorithm with vocabulary (BCV) method for analyzing population sequencing chromatograms.

    Directory of Open Access Journals (Sweden)

    Yuri S Fantin

    Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.

  10. Base-calling algorithm with vocabulary (BCV) method for analyzing population sequencing chromatograms.

    Science.gov (United States)

    Fantin, Yuri S; Neverov, Alexey D; Favorov, Alexander V; Alvarez-Figueroa, Maria V; Braslavskaya, Svetlana I; Gordukova, Maria A; Karandashova, Inga V; Kuleshov, Konstantin V; Myznikova, Anna I; Polishchuk, Maya S; Reshetov, Denis A; Voiciehovskaya, Yana A; Mironov, Andrei A; Chulanov, Vladimir P

    2013-01-01

    Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.

  11. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
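
    The Smith-Waterman recurrence at the core of the database scan can be written in a few lines. The sketch below is a plain, sequential reference version with a linear gap penalty and simple match/mismatch scores; those scoring parameters and the function name are assumptions for illustration. It has none of the cluster-level, thread-level, or vector-level parallelism the paper describes.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming matrix, since only the score is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Each inner-loop iteration is one cell update, so aligning sequences of lengths m and n costs m × n cell updates; GCUPS figures such as the one quoted above count billions of these updates per second.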

  12. How effective is graphene nanopore geometry on DNA sequencing?

    OpenAIRE

    Satarifard, Vahid; Foroutan, Masumeh; Ejtehadi, Mohammad Reza

    2015-01-01

    In this paper we investigate the effects of graphene nanopore geometry on homopolymer ssDNA pulling process through nanopore using steered molecular dynamic (SMD) simulations. Different graphene nanopores are examined including axially symmetric and asymmetric monolayer graphene nanopores as well as five layer graphene polyhedral crystals (GPC). The pulling force profile, moving fashion of ssDNA, work done in irreversible DNA pulling and orientations of DNA bases near the nanopore are assesse...

  13. Effects of the antimicrobial tylosin on the microbial community structure of an anaerobic sequencing batch reactor.

    Science.gov (United States)

    Shimada, Toshio; Li, Xu; Zilles, Julie L; Morgenroth, Eberhard; Raskin, Lutgarde

    2011-02-01

    The effects of the antimicrobial tylosin on a methanogenic microbial community were studied in a glucose-fed laboratory-scale anaerobic sequencing batch reactor (ASBR) exposed to stepwise increases of tylosin (0, 1.67, and 167 mg/L). The microbial community structure was determined using quantitative fluorescence in situ hybridization (FISH) and phylogenetic analyses of bacterial 16S ribosomal RNA (rRNA) gene clone libraries of biomass samples. During the periods without tylosin addition and with an influent tylosin concentration of 1.67 mg/L, 16S rRNA gene sequences related to Syntrophobacter were detected and the relative abundance of Methanosaeta species was high. During the highest tylosin dose of 167 mg/L, 16S rRNA gene sequences related to Syntrophobacter species were not detected and the relative abundance of Methanosaeta decreased considerably. Throughout the experimental period, Propionibacteriaceae and high GC Gram-positive bacteria were present, based on 16S rRNA gene sequences and FISH analyses, respectively. The accumulation of propionate and subsequent reactor failure after long-term exposure to tylosin are attributed to the direct inhibition of propionate-oxidizing syntrophic bacteria closely related to Syntrophobacter and the indirect inhibition of Methanosaeta by high propionate concentrations and low pH. © 2010 Wiley Periodicals, Inc.

  14. Next generation sequencing-based emerging trends in molecular biology of gastric cancer.

    Science.gov (United States)

    Verma, Renu; Sharma, Prakash C

    2018-01-01

    Gastric cancer (GC) is one of the leading causes of cancer related mortality in the world. Being asymptomatic in nature till advanced stage, diagnosis of gastric cancer becomes difficult in early stages of the disease. The onset and progression of gastric cancer has been attributed to multiple factors including genetic alterations, epigenetic modifications, Helicobacter pylori and Epstein-Barr Virus (EBV) infection, and dietary habits. Next Generation Sequencing (NGS) based approaches viz. Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES), RNA-Seq, and targeted sequencing have expanded the knowledge base of molecular pathogenesis of gastric cancer. In this review, we highlight recent NGS-based advances covering various genetic alterations (Microsatellite Instability, Single Nucleotide Variations, and Copy Number Variations), epigenetic changes (DNA methylation, histone modification, microRNAs) and differential gene expression during gastric tumorigenesis. We also briefly discuss the current and future potential biomarkers, drugs and therapeutic approaches available for the management of gastric cancer.

  15. Performance of Correspondence Algorithms in Vision-Based Driver Assistance Using an Online Image Sequence Database

    DEFF Research Database (Denmark)

    Klette, Reinhard; Krüger, Norbert; Vaudrey, Tobi

    2011-01-01

    the classification of recorded video data into situations defined by a cooccurrence of some events in recorded traffic scenes. About 100-400 stereo frames (or 4-16 s of recording) are considered a basic sequence, which will be identified with one particular situation. Future testing is expected to be on data......This paper discusses options for testing correspondence algorithms in stereo or motion analysis that are designed or considered for vision-based driver assistance. It introduces a globally available database, with a main focus on testing on video sequences of real-world data. We suggest...... that report on hours of driving, and multiple hours of long video data may be segmented into basic sequences and classified into situations. This paper prepares for this expected development. This paper uses three different evaluation approaches (prediction error, synthesized sequences, and labeled sequences...

  16. Effects of KLK Peptide on Adjuvanticity of Different ODN Sequences

    Directory of Open Access Journals (Sweden)

    Ghania Chikh

    2016-05-01

    Endosomal Toll-like receptors (TLR) such as TLR3, 7, 8 and 9 recognize pathogen-associated nucleic acids. While DNA sequence does influence the degree of binding to and activation of TLR9, it also appears to influence the ability of the ligand to reach the intracellular endosomal compartment. The KLK (KLKL5KLK) antimicrobial peptide, which is itself immunostimulatory, can translocate into cells without cell membrane permeabilization and thus can be used for endosomal delivery of TLR agonists, as has been shown with the IC31 formulation that contains an oligodeoxynucleotide (ODN) TLR9 agonist. We evaluated the adjuvant activity of KLK combined with CpG or non-CpG (GpC) ODN synthesized with nuclease-resistant phosphorothioate (S) or native phosphodiester (O) backbones with ovalbumin (OVA) antigen in mice. As single adjuvants, CpG(S) gave the strongest enhancement of OVA-specific immunity, and the addition of KLK provided no benefit and was actually detrimental for some readouts. In contrast, KLK enhanced the adjuvant effects of CpG(O) and, to a lesser extent, of GpC(S), which on their own had little or no activity. Indeed, while CD8 T cells, IFN-γ secretion and humoral response to vaccine antigen were enhanced when CpG(O) was combined with KLK, only IFN-γ secretion was enhanced when GpC(S) was combined with KLK. The synergistic adjuvant effects with KLK/ODN combinations were TLR9-mediated since they did not occur in TLR9 knock-out mice. We hypothesize that a nuclease-resistant ODN with CpG motifs has its own mechanism for entering cells to reach the endosome. For ODN without CpG motifs, KLK appears to provide an alternate mechanism for accessing the endosome, where it can activate TLR9, albeit with lower potency than a CpG ODN. For nuclease-sensitive (O) backbone ODN, KLK may also provide protection from nucleases in the tissues.

  17. Human Gait Recognition Based on Multiview Gait Sequences

    Directory of Open Access Journals (Sweden)

    Xiaxi Huang

    2008-05-01

    Most of the existing gait recognition methods rely on a single view, usually the side view, of the walking person. This paper investigates the case in which several views are available for gait recognition. It is shown that each view has unequal discrimination power and, therefore, should have unequal contribution in the recognition process. In order to exploit the availability of multiple views, several methods for the combination of the results that are obtained from the individual views are tested and evaluated. A novel approach for the combination of the results from several views is also proposed based on the relative importance of each view. The proposed approach generates superior results, compared to those obtained by using individual views or by using multiple views that are combined using other combination methods.

  18. A classification approach for genotyping viral sequences based on multidimensional scaling and linear discriminant analysis.

    Science.gov (United States)

    Kim, Jiwoong; Ahn, Yongju; Lee, Kichan; Park, Sung Hee; Kim, Sangsoo

    2010-08-21

    Accurate classification into genotypes is critical in understanding evolution of divergent viruses. Here we report a new approach, MuLDAS, which classifies a query sequence based on the statistical genotype models learned from the known sequences. Thus, MuLDAS utilizes full spectra of well characterized sequences as references, typically of an order of hundreds, in order to estimate the significance of each genotype assignment. MuLDAS starts by aligning the query sequence to the reference multiple sequence alignment and calculating the subsequent distance matrix among the sequences. They are then mapped to a principal coordinate space by multidimensional scaling, and the coordinates of the reference sequences are used as features in developing linear discriminant models that partition the space by genotype. The genotype of the query is then given as the maximum a posteriori estimate. MuLDAS tests the model confidence by leave-one-out cross-validation and also provides some heuristics for the detection of 'outlier' sequences that fall far outside or in-between genotype clusters. We have tested our method by classifying HIV-1 and HCV nucleotide sequences downloaded from NCBI GenBank, achieving the overall concordance rates of 99.3% and 96.6%, respectively, with the benchmark test dataset retrieved from the respective databases of Los Alamos National Laboratory. The highly accurate genotype assignment coupled with several measures for evaluating the results makes MuLDAS useful in analyzing the sequences of rapidly evolving viruses such as HIV-1 and HCV. A web-based genotype prediction server is available at http://www.muldas.org/MuLDAS/.

  19. STUDY OF BLOCKING EFFECT ELIMINATION METHODS BY MEANS OF INTRAFRAME VIDEO SEQUENCE INTERPOLATION

    Directory of Open Access Journals (Sweden)

    I. S. Rubina

    2015-01-01

    The paper deals with image interpolation methods and their applicability to eliminating some of the artifacts related both to the dynamic properties of objects in video sequences and to the algorithms used in the sequence of encoding steps. The main drawback of existing methods is their high computational complexity, which is unacceptable in video processing. As part of the study, interpolation of signal samples is proposed for blocking-effect elimination at the output of the conversion encoding. It was necessary to develop methods that improve the compression ratio and the quality of the reconstructed video data by eliminating the blocking effect on the borders of the segments through intraframe interpolation of video sequence segments. The main point of the developed methods is the application of an adaptive recursive algorithm with an adaptive-sized interpolation kernel, both with and without consideration of the brightness gradient at the boundaries of objects and video sequence blocks. In the theoretical part of the research, methods of information theory (RD-theory) and data redundancy elimination, methods of pattern recognition and digital signal processing, as well as methods of probability theory are used. In the experimental part of the research, compression algorithms were implemented in software and compared with existing ones. The proposed methods were compared with the simple averaging algorithm and the adaptive algorithm of central counting interpolation. The algorithm based on adaptive selection of the interpolation kernel size increases the compression ratio by 30%, and the modified algorithm based on adaptive interpolation kernel size selection increases it by 35% in comparison with existing algorithms; interpolation also improves the quality of the reconstructed video sequence by 3% compared to one compressed without interpolation. The findings will be

  20. Reassociation kinetics-based approach for partial genome sequencing of the cattle tick, Rhipicephalus (Boophilus) microplus

    Directory of Open Access Journals (Sweden)

    Bellgard Matthew

    2010-06-01

    Background: The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence fiscally and technically problematic. To selectively obtain gene-enriched regions of this tick's genome, Cot filtration was performed, and Cot-filtered DNA was sequenced via 454 FLX pyrosequencing. Results: The sequenced Cot-filtered genomic DNA was assembled with an EST-based gene index of 14,586 unique entries where each EST served as a potential "seed" for scaffold formation. The new sequence assembly extended the lengths of 3,913 of the 14,586 gene index entries. Over half of the extensions corresponded to extensions of over 30 amino acids. To survey the repetitive elements in the tick genome, the complete sequences of five BAC clones were determined. Both Class I and II transposable elements were found. Comparison of the BAC and Cot filtration data indicates that Cot filtration was highly successful in filtering repetitive DNA out of the genomic DNA used in 454 sequencing. Conclusion: Cot filtration is a very useful strategy to incorporate into genome sequencing projects on organisms with large genome sizes and which contain high percentages of repetitive, difficult to assemble, genomic DNA. Combining the Cot selection approach with 454 sequencing and assembly with a pre-existing EST database as seeds resulted in extensions of 27% of the members of the EST database.

  1. Effects of Early Musical Experience on Auditory Sequence Memory

    Directory of Open Access Journals (Sweden)

    Adam T. Tierney

    2008-12-01

    The present study investigated a possible link between musical training and immediate memory span by testing experienced musicians and three groups of musically inexperienced subjects (gymnasts, Psychology 101 students, and video game players) on sequence memory and word familiarity tasks. By including skilled gymnasts who began studying their craft by age six, video game players, and Psychology 101 students as comparison groups, we attempted to control for some of the ways skilled musicians may differ from participants drawn from the general population in terms of gross motor skills and intensive experience in a highly skilled domain from an early age. We found that musicians displayed longer immediate memory spans than the comparison groups on auditory presentation conditions of the sequence reproductive span task. No differences were observed between the four groups on the visual conditions of the sequence memory task. These results provide additional converging support to recent findings showing that early musical experience and activity-dependent learning may selectively affect verbal rehearsal processes and the allocation of attention in sequence memory tasks.

  2. Effects of Early Musical Experience on Auditory Sequence Memory.

    Science.gov (United States)

    Tierney, Adam T; Bergeson-Dana, Tonya R; Pisoni, David B

    2008-10-01

    The present study investigated a possible link between musical training and immediate memory span by testing experienced musicians and three groups of musically inexperienced subjects (gymnasts, Psychology 101 students, and video game players) on sequence memory and word familiarity tasks. By including skilled gymnasts who began studying their craft by age six, video game players, and Psychology 101 students as comparison groups, we attempted to control for some of the ways skilled musicians may differ from participants drawn from the general population in terms of gross motor skills and intensive experience in a highly skilled domain from an early age. We found that musicians displayed longer immediate memory spans than the comparison groups on auditory presentation conditions of the sequence reproductive span task. No differences were observed between the four groups on the visual conditions of the sequence memory task. These results provide additional converging support to recent findings showing that early musical experience and activity-dependent learning may selectively affect verbal rehearsal processes and the allocation of attention in sequence memory tasks.

  3. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  4. Network-Based Effectiveness

    National Research Council Canada - National Science Library

    Friman, Henrik

    2006-01-01

    ... (extended from Leavitt, 1965). This text identifies aspects of network-based effectiveness that can benefit from a better understanding of leadership and management development of people, procedures, technology, and organizations...

  5. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine

    Science.gov (United States)

    Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing

    2017-09-01

    Dynamically monitoring the behavior of a program is widely used to discriminate between benign programs and malware. Such monitoring usually relies on dynamic characteristics of a program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and to use a support vector machine to detect shellcode. We also use a Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and low detection rate.
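
    The abstract does not say how the Markov chain turns a Native API sequence into numeric features. One common reading, sketched here purely as an assumption, is to estimate first-order transition probabilities between calls and flatten them into a fixed-length vector that a classifier such as an SVM could consume. The function name, the three-call vocabulary and the trace are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def markov_features(calls, vocabulary):
    """Flatten first-order transition probabilities into one feature vector.

    Assumption: features are P(next call | current call) for every ordered
    pair in `vocabulary`; the paper's exact encoding is not given.
    """
    pair_counts = Counter(zip(calls, calls[1:]))  # observed transitions
    out_counts = Counter(calls[:-1])              # how often each call is a source
    features = []
    for src, dst in product(vocabulary, repeat=2):
        total = out_counts[src]
        features.append(pair_counts[(src, dst)] / total if total else 0.0)
    return features


# Illustrative vocabulary and trace (hypothetical data, not from the paper).
vocab = ["NtAllocateVirtualMemory", "NtProtectVirtualMemory", "NtCreateThreadEx"]
trace = [vocab[0], vocab[1], vocab[2], vocab[0], vocab[1]]
vec = markov_features(trace, vocab)
print(len(vec))  # one entry per ordered pair of calls
```

    The vector length is the square of the vocabulary size, so a real Native API vocabulary would likely need pruning or dimensionality reduction before training.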

  6. Micro-motion Recognition of Spatial Cone Target Based on ISAR Image Sequences

    Directory of Open Access Journals (Sweden)

    Changyong Shu

    2016-04-01

    Accurate micro-motion recognition of a spatial cone target is the foundation of characteristic parameter acquisition. For this reason, a micro-motion recognition method based on the distinguishing characteristics extracted from the Inverse Synthetic Aperture Radar (ISAR) sequences is proposed in this paper. The projection trajectory formulas of the cone node strong scattering source and the cone bottom slip-type strong scattering sources, which are located on the spatial cone target, are deduced under three micro-motion types (nutation, precession, and spinning), and their correctness is verified by electromagnetic simulation. By comparison, differences are found among the projections of the scattering sources with different micro-motions. The coordinate information of the scattering sources in the ISAR sequences is extracted by the CLEAN algorithm, and spinning is recognized by setting a threshold value of Doppler. The double observation points Interacting Multiple Model Kalman Filter is used to separate the scattering source projections of the nutation target or precession target, and the number of cross points of each scattering source’s projection track is used to classify nutation or precession. Finally, the electromagnetic simulation data are used to verify the effectiveness of the micro-motion recognition method.

  7. Introducing a model of pairing based on base pair specific interactions between identical DNA sequences

    Science.gov (United States)

    (O’) Lee, Dominic J.

    2018-02-01

    At present, two types of physical mechanism have been suggested that may facilitate preferential pairing between DNA molecules with identical or similar base pair texts, without separation of base pairs. One mechanism, discussed extensively in the past, relies solely on base pair specific patterns of helix distortion being the same on the two molecules. The other mechanism proposes that there are preferential interactions between base pairs of the same composition. We introduce a model, built on this second mechanism, where both thermal stretching and twisting fluctuations are included, as well as the base pair specific helix distortions. Firstly, we consider an approximation for weak pairing interactions, or short molecules. This yields a dependence of the energy on the square root of the molecular length, which could explain recent experimental data. However, analysis suggests that this approximation is no longer valid at large DNA lengths. In a second approximation, for long molecules, we define two adaptation lengths for twisting and stretching, over which the pairing interaction can limit the accumulation of helix disorder. When the pairing interaction is sufficiently strong, both adaptation lengths are finite; however, as we reduce pairing strength, the stretching adaptation length remains finite but the torsional one becomes infinite. This second state persists to arbitrarily weak values of the pairing strength, suggesting that, if the molecules are long enough, the pairing energy scales as length. To probe differences between the two pairing mechanisms, we also construct a model of similar form in which pairing between identical sequences relies solely on the intrinsic helix distortion patterns. Between the two models, we see interesting qualitative differences. We discuss our findings, and suggest new work to distinguish between the two mechanisms.

  8. Effective noninvasive zygosity determination by maternal plasma target region sequencing.

    Directory of Open Access Journals (Sweden)

    Jing Zheng

    BACKGROUND: Currently very few noninvasive molecular genetic approaches are available to determine zygosity for twin pregnancies in clinical laboratories. This study aimed to develop a novel method to determine zygosity by using maternal plasma target region sequencing. METHODS: We constructed a statistical model to calculate the possibility of each zygosity type using likelihood ratios (Li) and empirical dynamic thresholds, targeting 4,524 single nucleotide polymorphism (SNP) loci on 22 autosomes. Then two dizygotic (DZ) twin pregnancies, two monozygotic (MZ) twin pregnancies and two singletons were recruited to evaluate the performance of our novel method. Finally we estimated the sensitivity and specificity of the model in silico under different cell-free fetal DNA (cff-DNA) concentrations and sequence depths. RESULTS/CONCLUSIONS: We obtained 8.90 Gbp of sequencing data on average for six clinical samples. Two samples were classified as DZ with L values of 1.891 and 1.554, higher than the dynamic DZ cut-off values of 1.162 and 1.172, respectively. Another two samples were judged as MZ with L values of 0.763 and 0.784, lower than the MZ cut-off values of 0.903 and 0.918. The remaining two singleton samples were classified as MZ twins, with L values of 0.639 and 0.757, lower than the MZ cut-off values of 0.921 and 0.799. In silico, the estimated sensitivity of our noninvasive zygosity determination was 99.90% under 10% total cff-DNA concentration with 2 Gbp of sequence data. As the cff-DNA concentration increased to 15%, the specificity was as high as 97% with 3.50 Gbp of sequence data, much higher than the 80% obtained with 10% cff-DNA concentration. SIGNIFICANCE: This study demonstrates the feasibility of noninvasively determining the zygosity of twin pregnancies using target region sequencing, and illustrates the sensitivity and specificity under various detection conditions. Our method can act as an alternative approach for zygosity determination of twin pregnancies in clinical

  9. An accurate clone-based haplotyping method by overlapping pool sequencing.

    Science.gov (United States)

    Li, Cheng; Cao, Changchang; Tu, Jing; Sun, Xiao

    2016-07-08

    Chromosome-long haplotyping of human genomes is important to identify genetic variants with differing gene expression, in human evolution studies, clinical diagnosis, and other biological and medical fields. Although several methods have realized haplotyping based on sequencing technologies or population statistics, accuracy and cost are factors that prohibit their wide use. Borrowing ideas from group testing theories, we proposed a clone-based haplotyping method by overlapping pool sequencing. The clones from a single individual were pooled combinatorially and then sequenced. According to the distinct pooling pattern for each clone in the overlapping pool sequencing, alleles for the recovered variants could be assigned to their original clones precisely. Subsequently, the clone sequences could be reconstructed by linking these alleles accordingly and assembling them into haplotypes with high accuracy. To verify the utility of our method, we constructed 130 110 clones in silico for the individual NA12878 and simulated the pooling and sequencing process. Ultimately, 99.9% of variants on chromosome 1 that were covered by clones from both parental chromosomes were recovered correctly, and 112 haplotype contigs were assembled with an N50 length of 3.4 Mb and no switch errors. A comparison with current clone-based haplotyping methods indicated our method was more accurate. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
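
    The idea of giving each clone a distinct pooling pattern can be illustrated with a toy design. The sketch below is not the authors' pooling scheme, which the abstract does not specify; it assumes each clone is placed in a unique set of k pools out of n, and that an allele seen in exactly those pools is attributed to that clone. All names and sizes are hypothetical.

```python
from itertools import combinations


def assign_patterns(clone_ids, n_pools, k):
    """Give every clone a distinct set of k pools (assumed toy design)."""
    assignment = {}
    for clone, pattern in zip(clone_ids, combinations(range(n_pools), k)):
        assignment[clone] = frozenset(pattern)
    if len(assignment) < len(clone_ids):
        raise ValueError("not enough distinct k-subsets for all clones")
    return assignment


def decode(observed_pools, assignment):
    """Return the clones whose pooling pattern equals the observed pool set.

    Assumption: the allele is carried by a single clone and is detected in
    every pool containing that clone, with no false positives.
    """
    observed = frozenset(observed_pools)
    return [clone for clone, pattern in assignment.items() if pattern == observed]


clones = [f"c{i}" for i in range(10)]
assignment = assign_patterns(clones, n_pools=5, k=2)  # C(5, 2) = 10 patterns
carrier = decode([2, 1], assignment)
print(carrier)
```

    With 5 pools and k = 2 the design distinguishes 10 clones using 5 sequencing runs instead of 10; overlapping carriers and sequencing errors, which a practical decoder must handle, are ignored here.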

  10. DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction.

    Science.gov (United States)

    Khurana, Sameer; Rawi, Reda; Kunji, Khalid; Chuang, Gwo-Yu; Bensmail, Halima; Mall, Raghvendra

    2018-03-15

    Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work we propose DeepSol, a novel Deep Learning-based protein solubility predictor. The backbone of our framework is a Convolutional Neural Network (CNN) that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and can more reliably predict the solubility of novel proteins. DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Contact: skhurana@mit.edu and rmall@hbku.edu.qa. Supplementary data are available at Bioinformatics online.
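
    Two quantities in this abstract are easy to make concrete: the k-mers that the network consumes and the Matthews correlation coefficient used to report performance. The sketch below is a minimal illustration and not DeepSol's code; the helper names and the toy sequence are assumptions. The MCC formula is the standard one computed from a binary confusion matrix.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy inputs (hypothetical, chosen only to exercise the functions).
counts = kmer_counts("MKVLKV", 2)
perfect = mcc(tp=5, tn=5, fp=0, fn=0)
inverted = mcc(tp=0, tn=0, fp=5, fn=5)
print(counts["KV"], perfect, inverted)
```

    Unlike accuracy, the MCC stays near zero for a classifier that always predicts the majority class, which is presumably why the abstract reports both.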

  11. Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers.

    Science.gov (United States)

    Cao, Chang-Chang; Li, Cheng; Sun, Xiao

    2014-06-17

    Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although the massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, the identification of rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and a dramatically reduced cost. Based on quantitative group testing, we propose an efficient overlapping pool sequencing strategy that allows the efficient recovery of variant carriers in numerous individuals with much lower costs than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5-97.9% variants with the variant frequency ranging from 0.5 to 1.5%. Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. 
    Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing

  12. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    Directory of Open Access Journals (Sweden)

    Brejnev Muhizi Muhire

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).

  13. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    Science.gov (United States)

    Muhire, Brejnev Muhizi; Varsani, Arvind; Martin, Darren Patrick

    2014-01-01

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
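
    The point about gap handling can be shown with a small calculation. The function below is a hedged sketch and not SDT's implementation: it takes two already-aligned sequences and computes identity either ignoring columns that contain a gap or counting them as mismatches. The function name, the `count_gaps` flag and the example sequences are assumptions made for illustration.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-", count_gaps=False):
    """Fraction of identical positions between two aligned sequences.

    Columns where both sequences have a gap are always skipped. Columns with
    a gap in only one sequence are skipped unless `count_gaps` is True, in
    which case they count as compared (mismatching) positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue
        if x == gap or y == gap:
            if count_gaps:
                compared += 1
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


a = "ACGT-A"
b = "ACCTTA"
ignoring_gaps = pairwise_identity(a, b)                   # 4 matches / 5 columns
counting_gaps = pairwise_identity(a, b, count_gaps=True)  # 4 matches / 6 columns
print(ignoring_gaps, counting_gaps)
```

    The same alignment yields 80% or roughly 67% identity depending on the convention, which is enough to move a pair of sequences across a fixed demarcation threshold.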

  14. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  15. Effect of sequences of ozone and nitrogen dioxide on plant dry ...

    African Journals Online (AJOL)

    Ozone (O3) is the most important gaseous air pollutant in the world because of its adverse effects on vegetation in general and crop plants in particular. Since nitrogen dioxide (NO2) is a precursor of ozone, studying the implication of sequences of these two gases is very important. Hence, the effects of sequences of ...

  16. Effect of sequences of ozone and nitrogen dioxide on plant dry

    African Journals Online (AJOL)

    Prof. Adipala Ekwamu

    Ozone (O3) is the most important gaseous air pollutant in the world because of its adverse effects on vegetation in general and crop plants in particular. Since nitrogen dioxide (NO2) is a precursor of ozone, studying the implication of sequences of these two gases is very important. Hence, the effects of sequences of ...

  17. Effect of sequences of ozone and nitrogen dioxide on chlorophyll ...

    African Journals Online (AJOL)

    The sequences involved different combinations of exposures to NO2 from 06:00 to 10:00 h and/or 18:00 to 22:00 h and O3 from 10:00 to 18:00 h. Relative to the control, early and early + late NO2 resulted in stimulations of quantum yield (Y) and photochemical quenching (qP), with late NO2 resulting in little or no change.

  18. A next generation semiconductor based sequencing approach for the identification of meat species in DNA mixtures.

    Science.gov (United States)

    Bertolini, Francesca; Ghionda, Marco Ciro; D'Alessandro, Enrico; Geraci, Claudia; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different pairs of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides, of which 29,109,688 were of Q20 quality (87.43%), in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. The error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. The correlation of the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low levels of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.
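    The read statistics quoted in this abstract can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch; the variable names are illustrative, and the mean read length is derived here rather than reported in the abstract.

```python
# Figures quoted in the abstract
total_bases = 33_294_511   # called nucleotides
q20_bases = 29_109_688     # nucleotides of Q20 quality
total_reads = 215_944

q20_percent = 100 * q20_bases / total_bases
mean_read_length = total_bases / total_reads  # derived, not stated in the abstract

print(f"Q20 fraction: {q20_percent:.2f}%")
print(f"Mean read length: {mean_read_length:.1f} nt")
```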

  19. OrchidBase: a collection of sequences of the transcriptome derived from orchids.

    Science.gov (United States)

    Fu, Chih-Hsiung; Chen, Yun-Wen; Hsiao, Yu-Yun; Pan, Zhao-Jun; Liu, Zhong-Jian; Huang, Yueh-Min; Tsai, Wen-Chieh; Chen, Hong-Hwa

    2011-02-01

    Orchids are among the most ecologically and evolutionarily significant plants, and the Orchidaceae is one of the most abundant families of the angiosperms. Genetic databases will be useful not only for gene discovery but also for future genomic annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Among them, 41,310 expressed sequence tags (ESTs) were obtained by using Sanger sequencing, whereas 37,908,032 reads were obtained by using next-generation sequencing (NGS) including both Roche 454 and Solexa Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, resulting in 84,617 non-redundant transcribed sequences with an average length of 459 bp. The analysis pipeline of the database is an automated system written in Perl and C#, and consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences and storage of the analyzed information in SQL databases. A web application was implemented with HTML and a Microsoft .NET Framework C# program for browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or by using BLAST, and the results can be explored on the website and downloaded. Consequently, the establishment of OrchidBase will provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.

  20. Negative Sequence Droop Method based Hierarchical Control for Low Voltage Ride-Through in Grid-Interactive Microgrids

    DEFF Research Database (Denmark)

    Zhao, Xin; Firoozabadi, Mehdi Savaghebi; Quintero, Juan Carlos Vasquez

    2015-01-01

    In highly microgrid (MG) integrated distribution systems, problems such as a sudden cut out of the MGs due to grid faults may lead to adverse effects on the grid. As a consequence, ancillary services provided by MGs are preferred since they can make the MG a contributor to riding through the faults....... In this paper, a voltage support strategy based on negative sequence droop control, which regulates the positive/negative sequence active and reactive power flow by means of sending a proper voltage reference to the inner control loop, is proposed for the grid-connected MGs to ride through voltage sags under...... complex line impedance conditions. In this case, the MGs should inject a certain amount of positive and negative sequence power into the grid so that the voltage quality at the load side can be maintained at a satisfactory level. A two-layer hierarchical control strategy is proposed in this paper. The primary...
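    The positive and negative sequence quantities referred to in this abstract are conventionally obtained from three-phase phasors by the symmetrical-component (Fortescue) transform. A minimal Python sketch follows; it illustrates the decomposition only, not the authors' droop controller, and the function name and example phasors are assumptions made here.

```python
import cmath
import math

A = cmath.exp(2j * math.pi / 3)  # 120-degree rotation operator

def symmetrical_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors of a three-phase set."""
    v0 = (va + vb + vc) / 3
    v1 = (va + A * vb + A**2 * vc) / 3
    v2 = (va + A**2 * vb + A * vc) / 3
    return v0, v1, v2

# Balanced a-b-c set: only the positive sequence is non-zero.
balanced = symmetrical_components(1, A**2, A)

# Hypothetical sag to 0.5 p.u. on phase a: a negative-sequence component appears.
sagged = symmetrical_components(0.5, A**2, A)

print("balanced:", [round(abs(v), 4) for v in balanced])
print("sagged:  ", [round(abs(v), 4) for v in sagged])
```

    An unbalanced sag therefore shows up as a non-zero negative-sequence magnitude, which is the quantity a negative sequence droop scheme would act on.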

  1. Discrepancy between Hepatitis C Virus Genotypes and NS4-Based Serotypes: Association with Their Subgenomic Sequences

    Directory of Open Access Journals (Sweden)

    Nan Nwe Win

    2017-01-01

    Determination of hepatitis C virus (HCV) genotypes plays an important role in the direct-acting agent era. Discrepancies between HCV genotyping and serotyping assays are occasionally observed. Eighteen samples with discrepant results between genotyping and serotyping methods were analyzed. HCV serotyping and genotyping were based on the HCV nonstructural 4 (NS4) region and 5′-untranslated region (5′-UTR), respectively. HCV core and NS4 regions were chosen to be sequenced and were compared with the genotyping and serotyping results. Deep sequencing was also performed for the corresponding HCV NS4 regions. Seventeen out of 18 discrepant samples could be sequenced by the Sanger method. Both HCV core and NS4 sequences were concordant with that of genotyping in the 5′-UTR in all 17 samples. In cloning analysis of the HCV NS4 region, there were several amino acid variations, but each sequence was much closer to the peptide with the same genotype. Deep sequencing revealed that minor clones with different subgenotypes existed in two of the 17 samples. Genotyping by genome amplification showed high consistency, while several false reactions were detected by serotyping. The deep sequencing method also provides accurate genotyping results and may be useful for analyzing discrepant cases. HCV genotyping should be correctly determined before antiviral treatment.

  2. Incorporation of guanosine gels into sieving matrices for length- and sequence-based separation of DNA in capillary electrophoresis.

    Science.gov (United States)

    Dong, Yingying; McGown, Linda B

    2011-05-01

    Sieving gels are used in capillary gel electrophoresis to resolve DNA strands of different lengths. For complex samples, however, such as those encountered in metagenomic analysis of microbial communities or biofilms, length-based separation may mask the true genetic diversity of the community since different organisms may contribute same-length DNA with different sequences. There is a need, therefore, for DNA separations based on both the length and sequence. Previous work has demonstrated the ability of guanosine gels (G-gels) to separate four single-stranded DNA 76-mers that differ by only a few A/G base substitutions. The goal of the present work is to determine whether G-gels could be combined with commercial sieving gels in order to simultaneously separate DNA based on both length and sequence. The results are given for the four 76-mers and for a standard dsDNA ladder. Commercial sieving gels were used alone and in combination with G-gels. For the 76-mers, the combined medium was less efficient than the G-gel alone but was able to achieve partial resolution. The combined medium was at least as effective as the sieving gel alone at resolving the denatured DNA ladder and showed indications of sequence-based resolution as well, as supported by MALDI-MS. The results show that the combined sieving gel/G-gel medium retains the selectivity of the individual media, providing a promising approach to simultaneous length- and sequence-based DNA separation for metagenomic analysis of complex systems. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  4. Improving protein structure similarity searches using domain boundaries based on conserved sequence information

    Directory of Open Access Journals (Sweden)

    Madej Tom

    2009-05-01

    Background: The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in the NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we have investigated how alternative definitions of domains derived from conserved sequence alignments in the Conserved Domain Database (CDD) would affect the domain comparisons and structure similarity search performance of VAST. Results: Alternative domains, which have significantly different secondary structure composition from those based on structurally compact units, were identified based on the alignment footprints of curated protein sequence domain families. Our analysis indicates that domain boundaries disagree on roughly 8% of protein chains in the medium redundancy subset of the Molecular Modeling Database (MMDB). These conflicting sequence-based domain boundaries perform slightly better than structure domains in structure similarity searches, and there are interesting cases when structure similarity search performance is markedly improved. Conclusion: Structure similarity searches using domain boundaries based on conserved sequence information can provide an additional method for investigators to identify interesting similarities between proteins with known structures. Because of the improvement in performance of structure similarity searches using sequence domain boundaries, we are in the process of implementing their inclusion into the VAST search and MMDB resources in the NCBI Entrez system.

  5. Sequence Complexity Effects on Speech Production in Healthy Speakers and Speakers with Hypokinetic or Ataxic Dysarthria

    Science.gov (United States)

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The present study investigated the effects of sequence complexity, defined in terms of phonemic similarity and phonotactic probability, on the timing and accuracy of serial ordering for speech production in healthy speakers and speakers with either hypokinetic or ataxic dysarthria. Sequences comprised strings of consonant-vowel (CV) syllables with each syllable containing the same vowel, /a/, paired with a different consonant. High complexity sequences contained phonemically similar consonants, and sounds and syllables that had low phonotactic probabilities; low complexity sequences contained phonemically dissimilar consonants and high probability sounds and syllables. Sequence complexity effects were evaluated by analyzing speech error rates and within-syllable vowel and pause durations. This analysis revealed that speech error rates were significantly higher and speech duration measures were significantly longer during production of high complexity sequences than during production of low complexity sequences. Although speakers with dysarthria produced longer overall speech durations than healthy speakers, the effects of sequence complexity on error rates and speech durations were comparable across all groups. These findings indicate that the duration and accuracy of processes for selecting items in a speech sequence are influenced by their phonemic similarity and/or phonotactic probability. Moreover, this robust complexity effect is present even in speakers with damage to subcortical circuits involved in serial control for speech. PMID:24146997

  6. Sequence complexity effects on speech production in healthy speakers and speakers with hypokinetic or ataxic dysarthria.

    Directory of Open Access Journals (Sweden)

    Kevin J Reilly

    The present study investigated the effects of sequence complexity, defined in terms of phonemic similarity and phonotactic probability, on the timing and accuracy of serial ordering for speech production in healthy speakers and speakers with either hypokinetic or ataxic dysarthria. Sequences comprised strings of consonant-vowel (CV) syllables with each syllable containing the same vowel, /a/, paired with a different consonant. High complexity sequences contained phonemically similar consonants, and sounds and syllables that had low phonotactic probabilities; low complexity sequences contained phonemically dissimilar consonants and high probability sounds and syllables. Sequence complexity effects were evaluated by analyzing speech error rates and within-syllable vowel and pause durations. This analysis revealed that speech error rates were significantly higher and speech duration measures were significantly longer during production of high complexity sequences than during production of low complexity sequences. Although speakers with dysarthria produced longer overall speech durations than healthy speakers, the effects of sequence complexity on error rates and speech durations were comparable across all groups. These findings indicate that the duration and accuracy of processes for selecting items in a speech sequence are influenced by their phonemic similarity and/or phonotactic probability. Moreover, this robust complexity effect is present even in speakers with damage to subcortical circuits involved in serial control for speech.

  7. REMA: A computer-based mapping tool for analysis of restriction sites in multiple DNA sequences.

    Science.gov (United States)

    Szubert, Jan; Reiff, Caroline; Thorburn, Andrew; Singh, Brajesh K

    2007-05-01

    REMA is an interactive web-based program which predicts endonuclease cut sites in DNA sequences. It analyses multiple sequences simultaneously, predicts the number and size of fragments, and provides restriction maps. Users can select single or paired combinations of all commercially available enzymes. Additionally, REMA permits prediction of terminal fragment sizes for multiple sequences and suggests suitable restriction enzymes for maximally discriminatory results. REMA is an easy-to-use, web-based program which will have a wide application in molecular biology research. REMA is written in Perl and is freely available for non-commercial use. Detailed information on installation can be obtained from Jan Szubert (jan.szubert@gmail.com), and the web-based application is accessible at http://www.macaulay.ac.uk/rema. Contact: b.singh@macaulay.ac.uk.
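    The kind of prediction described in this abstract (cut positions and fragment sizes for several sequences at once) can be sketched as follows. This is a minimal Python illustration, not REMA's Perl code; the `digest` function and the toy sequences are hypothetical, it treats each sequence as a linear molecule, and it scans the given strand only. EcoRI, which recognises GAATTC and cuts after the first G, is used as the example enzyme.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC).
    Returns (cut_positions, fragment_lengths).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(bounds, bounds[1:])]
    return cuts, fragments

# Toy sequences (hypothetical), digested with EcoRI
samples = {"seq1": "AAGAATTCTTTTGAATTCCC", "seq2": "GGGGAATTCA"}
for name, seq in samples.items():
    cuts, fragments = digest(seq, "GAATTC", 1)
    print(name, "cuts at", cuts, "fragment sizes", fragments)
```

    The first and last entries of each fragment list are the terminal fragments, the quantity compared across sequences when choosing a discriminatory enzyme.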

  8. Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides

    Directory of Open Access Journals (Sweden)

    Yu Zu-Guo

    2008-02-01

    Background: The promoter region plays an important role in determining where the transcription of a particular gene should be initiated. Computational prediction of eukaryotic Pol II promoter sequences is one of the most significant problems in sequence analysis. Existing promoter prediction methods are still far from being satisfactory. Results: We attempt to recognize the human Pol II promoter sequences from the non-promoter sequences which are made up of exon and intron sequences. Four methods are used: two kinds of multifractal analysis performed on the numeric sequences obtained from the dinucleotide free energy, Z curve analysis and global descriptor of the promoter/non-promoter primary sequences. A total of 141 parameters are extracted from these methods and categorized into seven groups (methods). They are used to generate certain spaces and then each promoter/non-promoter sequence is represented by a point in the corresponding space. All the 120 possible combinations of the seven methods are tested. Based on Fisher's linear discriminant algorithm, with a relatively smaller number of parameters (96 and 117), we get satisfactory discriminant accuracies. Particularly, in the case of 117 parameters, the accuracies for the training and test sets reach 90.43% and 89.79%, respectively. A comparison with five other existing methods indicates that our methods have a better performance. Using the global descriptor method (36 parameters), 17 of the 18 experimentally verified promoter sequences of human chromosome 22 are correctly identified. Conclusion: The high accuracies achieved suggest that the methods of this paper are useful for understanding the difficult problem of promoter prediction.
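    Of the four methods named in this abstract, the Z curve has a compact standard definition: three cumulative coordinates that separate purines from pyrimidines, amino from keto bases, and weak from strong hydrogen bonding. The Python sketch below computes those coordinates; it shows the general Z-curve transform only, and how the authors reduced the curve to discriminant parameters is not stated in the abstract. The function name is an assumption.

```python
def z_curve(sequence):
    """Return the cumulative Z-curve coordinates (x, y, z) of a DNA sequence."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in sequence.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append((
            (a + g) - (c + t),  # x: purine vs pyrimidine
            (a + c) - (g + t),  # y: amino vs keto
            (a + t) - (c + g),  # z: weak vs strong hydrogen bonding
        ))
    return points

print(z_curve("ACGT"))
```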

  9. DNA interaction with platinum-based cytostatics revealed by DNA sequencing.

    Science.gov (United States)

    Smerkova, Kristyna; Vaculovic, Tomas; Vaculovicova, Marketa; Kynicky, Jindrich; Brtnicky, Martin; Eckschlager, Tomas; Stiborova, Marie; Hubalek, Jaromir; Adam, Vojtech

    2017-12-15

    The main mechanism of action of platinum-based cytostatic drugs - cisplatin, oxaliplatin and carboplatin - is the formation of DNA cross-links, which restricts transcription because the DNA is unable to enter the active site of the polymerase. The polymerase chain reaction (PCR) was employed as a simplified model of the amplification process in the cell nucleus. PCR with fluorescently labelled dideoxynucleotides, commonly employed for DNA sequencing, was used to monitor the effect of platinum-based cytostatics on DNA in terms of the decrease in labelling efficiency dependent on the presence of the DNA-drug cross-link. It was found that significantly different amounts of the drugs - cisplatin (0.21 μg/mL), oxaliplatin (5.23 μg/mL), and carboplatin (71.11 μg/mL) - were required to cause the same quenching effect (50%) on the fluorescent labelling of 50 μg/mL of DNA. Moreover, it was found that even though the amounts of the drugs applied to the reaction mixture differed by several orders of magnitude, the amount of incorporated platinum, quantified by inductively coupled plasma mass spectrometry, was in all cases at the level of tenths of μg per 5 μg of DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Completion of HLA protein sequences by automated homology-based nearest-neighbor extrapolation of HLA database sequences

    NARCIS (Netherlands)

    Geneugelijk, K; Niemann, M; de Hoop, T; Spierings, E

    2016-01-01

    The IMGT/HLA database contains every publicly available HLA sequence. However, most of these HLA protein sequences are restricted to the alpha-1/alpha-2 domain for HLA class-I and alpha-1/beta-1 domain for HLA class-II. Nevertheless, also polymorphism outside these domains may play a role in

  11. Comparison of the effects of the CHESS sequence and the SPAIR sequence for fat saturation

    Science.gov (United States)

    Dong, Kyung-Rae; Goo, Eun-Hoe; Kweon, Dae-Cheol; Chung, Woon-Kwan; Lee, Jong-Woong

    2013-06-01

    This study compared the abilities of the chemical-shift selective saturation (CHESS) and the spectrally-adiabatic inversion recovery (SPAIR) fat-saturation techniques to resolve the recent problems in fat saturation caused by areas of changing volume such as the head and the neck and by metal artifacts when T1 fat-saturation techniques representing the anatomical images and T2 fat-saturation techniques representing pathological images are used. To compare the abilities of CHESS and SPAIR, we acquired images of the head and the neck and of the pelvis, and we compared the contrast-to-noise ratios (CNRs) and the signal-to-noise ratios (SNRs) of the signals from the flexed body parts. Images were taken of the abdomens, heads and necks, and pelvises of 15 men and 15 women (30 in total). In all scanning techniques, the SNRs and the CNRs were calculated based on a quantitative analysis method with a view to obtaining uniform data. According to the study results, the CNRs of the SPAIR and the CHESS techniques for the pelvis in the T1-weighted image were 55.10 and 67.23, respectively. The SNRs of the SPAIR technique were 70.61 for muscle and 15.50 for fat, whereas the SNRs of the CHESS technique were 79.23 for muscle and 12.00 for fat. For the pelvis in the T2-weighted image, the CNRs of the SPAIR and the CHESS techniques were 12.50 and 16.66, respectively. The SNRs of the SPAIR technique were 16.98 for muscle and 5.14 for fat. In contrast, the SNRs of the CHESS technique were 27.90 for muscle and 11.23 for fat. Consequently, the signal intensity was higher in the CHESS than in the SPAIR technique. Nevertheless, with regard to the clinical usefulness, the image quality was higher in the SPAIR technique than in the CHESS technique.

  12. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approaches for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. Protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromosomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical samples without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixture was injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  13. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
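    The pairing step described in this abstract, extracting SRA IDs and PMIDs from article text, can be illustrated with regular expressions. The Python sketch below is an assumption about how such matching might look, not the authors' pipeline: the accession pattern, the "PMID:" convention, the `pair_ids` helper and the sample text (placeholder identifiers) are all introduced here for illustration.

```python
import re

# Assumed patterns: SRA-style accessions (prefix S/E/D + R + type letter + digits)
# and PubMed IDs written as "PMID: 12345678".
SRA_ID = re.compile(r"\b[SED]R[APXSR]\d{6,}\b")
PMID = re.compile(r"\bPMID:?\s*(\d+)")

def pair_ids(article_text):
    """Return every (SRA accession, PMID) pair mentioned in one article."""
    accessions = sorted(set(SRA_ID.findall(article_text)))
    pmids = sorted(set(PMID.findall(article_text)))
    return [(acc, pmid) for acc in accessions for pmid in pmids]

# Placeholder text with made-up identifiers
text = "Reads were deposited under SRP000001 and SRX000002. PMID: 12345678"
print(pair_ids(text))
```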

  14. Transcriptome analysis for Caenorhabditis elegans based on novel expressed sequence tags

    Directory of Open Access Journals (Sweden)

    Moerman Donald G

    2008-07-01

    Background: We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology. Results: We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants and therefore, imply that these are novel genetic structures. Expressed sequence tags, which map to intergenic (around 1000) and intronic regions (around 580), may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags, which map across intron-exon boundaries (around 300), indicate possible alternative splice sites, while expressed sequence tags, which map near the ends of known transcripts (around 600), suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags, which are well conserved across different nematode species, are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms, in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage. Conclusion: We have demonstrated the use of a high-throughput sequencing methodology to efficiently

  15. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    Directory of Open Access Journals (Sweden)

    Xin Yi Ng

    2015-01-01

    This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Recently, researchers have become interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and a support vector machine (SVM)-LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test, while the sensitivity obtained for the jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.
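    LZ complexity, named in this abstract as part of the pairwise feature, counts how many new phrases are needed to build a sequence from left to right. The abstract does not say which variant the authors used; the Python sketch below assumes the classic Lempel-Ziv (1976) phrase count and uses a simple, unoptimised search. The function name is illustrative, and the same code applies to amino-acid strings.

```python
def lz76_complexity(sequence):
    """Number of phrases in the Lempel-Ziv (1976) parsing of `sequence`."""
    i, phrases, n = 0, 0, len(sequence)
    while i < n:
        length = 1
        # Grow the candidate phrase while it can still be copied from the
        # text that precedes its last symbol (overlap is allowed).
        while i + length <= n and sequence[i:i + length] in sequence[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

# Textbook binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # -> 6
```

    Repetitive sequences parse into few phrases and irregular ones into many, which is why the count can serve as a similarity-related feature between sequences.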

  16. Use of H19 Gene Regulatory Sequences in DNA-Based Therapy for Pancreatic Cancer

    Directory of Open Access Journals (Sweden)

    V. Scaiewicz

    2010-01-01

    Pancreatic cancer is the eighth most common cause of death from cancer in the world, for which palliative treatments are not effective and frequently accompanied by severe side effects. We propose a DNA-based therapy for pancreatic cancer using a nonviral vector, expressing the diphtheria toxin A chain under the control of the H19 gene regulatory sequences. The H19 gene is an oncofetal RNA expressed during embryo development and in several types of cancer. We tested the expression of the H19 gene in patients, and found that 65% of human pancreatic tumors analyzed showed moderate to strong expression of the gene. In vitro experiments showed that the vector was effective in reducing Luciferase protein activity in pancreatic carcinoma cell lines. In vivo experiment results revealed tumor growth arrest in different animal models for pancreatic cancer. Differences in tumor size between control and treated groups reached 75% in the heterotopic model (P=.037) and 50% in the orthotopic model (P=.007). In addition, no visible metastases were found in the treated group of the orthotopic model. These results indicate that the treatment with the vector DTA-H19 might be a viable new therapeutic option for patients with unresectable pancreatic cancer.

  17. Quasi-Coherent Noise Jamming to LFM Radar Based on Pseudo-random Sequence Phase-modulation

    Directory of Open Access Journals (Sweden)

    N. Tai

    2015-12-01

    Full Text Available A novel quasi-coherent noise jamming method is proposed against linear frequency modulation (LFM) signal and pulse compression radar. Based on the structure of digital radio frequency memory (DRFM), the jamming signal is acquired by the pseudo-random sequence phase-modulation of the sampled radar signal. The characteristics of the jamming signal in the time domain and frequency domain are analyzed in detail. Results of the ambiguity function indicate that a blanket jamming effect along the range direction will be formed when the jamming signal passes through the matched filter. By flexibly controlling the parameters of the interrupted-sampling pulse and pseudo-random sequence, different covering distances and jamming effects can be achieved. When the jamming power is equivalent, this jamming obtains higher processing gain compared with non-coherent jamming. The jamming signal raises the detection threshold so that the real target avoids being detected. Simulation results and circuit engineering implementation validate that the jamming signal covers the real target effectively.

  18. Taxonomy and phylogeny of the genus citrus based on the nuclear ribosomal dna its region sequence

    International Nuclear Information System (INIS)

    Sun, Y.L.

    2015-01-01

    The genus Citrus (Aurantioideae, Rutaceae) is the sole source of the citrus fruits of commerce showing high economic values. In this study, the taxonomy and phylogeny of Citrus species is evaluated using sequence analysis of the ITS region of nrDNA. This study is based on 26 plant materials belonging to 22 Citrus species, including wild, domesticated, and cultivated species. Through DNA alignment of the ITS sequence, the ITS1 and ITS2 regions showed relatively high variations of sequence length and nucleotide among these Citrus species. According to the previous six-tribe discrimination theory by Swingle and Reece, the grouping in our phylogenetic tree reconstructed from ITS sequences was not related to tribe discrimination but to species discrimination. However, the molecular analysis could provide more information on citrus taxonomy. Combined with ITS sequences of other subgenera in the true citrus fruit tree group, the ITS phylogenetic tree indicated that subgenus Citrus was monophyletic and nearer to Fortunella, Poncirus, and Clymenia compared to Microcitrus and Eremocitrus. Abundant sequence variations of the ITS region shown in this study would help species identification and tribe differentiation of the genus Citrus. (author)

  19. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
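
    The word-counting and distance steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it counts words with a plain sliding window instead of a generalized suffix tree, it normalizes counts to relative frequencies (an assumption; the abstract only says "frequency"), and the function names lword_profile, euclidean and distance_matrix are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible word of length L in seq.

    Assumption: words containing symbols outside `alphabet` (e.g. N)
    count toward the total but have no slot in the profile.
    """
    n_windows = len(seq) - L + 1
    counts = Counter(seq[i:i + L] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short input
    words = ("".join(p) for p in product(alphabet, repeat=L))
    return [counts.get(w, 0) / total for w in words]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]
```

    A matrix produced this way could then be passed to any neighbor-joining or multidimensional scaling routine; the choice of L is left to the caller here, whereas the paper reports determining a single optimal word length.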

  20. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer.

    Science.gov (United States)

    Álvarez-Martos, Isabel; Ferapontova, Elena E

    2017-08-05

    A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by the replacement of the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence does, and, thus, is not an aptamer and can be used neither for in vivo nor for in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  2. Identification of DNA lesions using a third base pair for amplification and nanopore sequencing

    Science.gov (United States)

    Riedl, Jan; Ding, Yun; Fleming, Aaron M.; Burrows, Cynthia J.

    2015-01-01

    Damage to the genome is implicated in the progression of cancer and stress-induced diseases. DNA lesions exist in low levels, and cannot be amplified by standard PCR because they are frequently strong blocks to polymerases. Here, we describe a method for PCR amplification of lesion-containing DNA in which the site and identity could be marked, copied and sequenced. Critical for this method is installation of either the dNaM or d5SICS nucleotides at the lesion site after processing via the base excision repair process. These marker nucleotides constitute an unnatural base pair, allowing large quantities of marked DNA to be made by PCR amplification. Sanger sequencing confirms the potential for this method to locate lesions by marking, amplifying and sequencing a lesion in the KRAS gene. Detection using the α-hemolysin nanopore is also developed to analyse the markers in individual DNA strands with the potential to identify multiple lesions per strand. PMID:26542210

  3. Multi-modulus algorithm based on global artificial fish swarm intelligent optimization of DNA encoding sequences.

    Science.gov (United States)

    Guo, Y C; Wang, H; Wu, H P; Zhang, M Q

    2015-12-21

    To address the large mean square error (MSE) and slow convergence speed of the constant modulus algorithm (CMA) in equalizing multi-modulus signals, a multi-modulus algorithm (MMA) based on global artificial fish swarm (GAFS) intelligent optimization of DNA encoding sequences (GAFS-DNA-MMA) was proposed. To improve the convergence rate and reduce the MSE, the proposed algorithm adopted an encoding method based on DNA nucleotide chains to provide a possible solution to the problem. Furthermore, the GAFS algorithm, with its fast convergence and global search ability, was used to find the best sequence. The real and imaginary parts of the initial optimal weight vector of MMA were obtained through DNA coding of the best sequence. The simulation results show that the proposed algorithm has a faster convergence speed and smaller MSE in comparison with the CMA, the MMA, and the AFS-DNA-MMA.

  4. STRait Razor: a length-based forensic STR allele-calling tool for use with second generation sequencing data.

    Science.gov (United States)

    Warshauer, David H; Lin, David; Hari, Kumar; Jain, Ravi; Davis, Carey; Larue, Bobby; King, Jonathan L; Budowle, Bruce

    2013-07-01

    Recent studies have demonstrated the capability of second generation sequencing (SGS) to provide coverage of short tandem repeats (STRs) found within the human genome. However, there are relatively few bioinformatic software packages capable of detecting these markers in the raw sequence data. The extant STR-calling tools are sophisticated, but are not always applicable to the analysis of the STR loci commonly used in forensic analyses. STRait Razor is a newly developed Perl-based software tool that runs on the Linux/Unix operating system and is designed to detect forensically-relevant STR alleles in FASTQ sequence data, based on allelic length. It is capable of analyzing STR loci with repeat motifs ranging from simple to complex without the need for extensive allelic sequence data. STRait Razor is designed to interpret both single-end and paired-end data and relies on intelligent parallel processing to reduce analysis time. Users are presented with a number of customization options, including variable mismatch detection parameters, as well as the ability to easily allow for the detection of alleles at new loci. In its current state, the software detects alleles for 44 autosomal and Y-chromosome STR loci. The study described herein demonstrates that STRait Razor is capable of detecting STR alleles in data generated by multiple library preparation methods and two Illumina(®) sequencing instruments, with 100% concordance. The data also reveal noteworthy concepts related to the effect of different preparation chemistries and sequencing parameters on the bioinformatic detection of STR alleles. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
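
    The idea of calling an STR allele from its length can be sketched as follows. This is an illustration only and not STRait Razor's code (which the abstract describes as Perl-based): it assumes the repeat region is delimited by two known flanking sequences, and the function name call_allele_by_length, the offset parameter and the example flanks are all hypothetical.

```python
def call_allele_by_length(read, left_flank, right_flank, motif_len, offset=0):
    """Return an allele label derived from the distance between two flanks.

    Assumptions (not taken from the abstract): the repeat region lies
    between `left_flank` and `right_flank`, and `offset` is the number of
    non-repeat bases inside that region. Returns None if a flank is absent.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    region_len = end - start - offset
    if region_len < 0:
        return None
    # Whole repeats plus leftover bases, written in the usual "9.3" style.
    whole, partial = divmod(region_len, motif_len)
    return f"{whole}.{partial}" if partial else str(whole)
```

    For example, a read carrying five copies of a four-base motif between the two flanks would be labeled "5", and two extra bases would give "5.2". Exact string matching is used here for brevity; the abstract mentions variable mismatch detection parameters, which this sketch does not model.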

  5. Approach to include load sequence effects in the design of an offshore wind turbine substructure

    NARCIS (Netherlands)

    Dragt, R.C.; Allaix, D.L.; Maljaars, J.; Tuitman, J.T.; Salman, Y.; Otheguy, M.E.

    2017-01-01

    Fatigue is one of the main design drivers for offshore wind substructures. Using Fracture Mechanics methods, load sequence effects such as crack growth retardation due to large load peaks can be included in the fatigue damage estimation. Due to the sequence dependency, a method is required that

  6. The Effects of Delayed Reinforcement on Variability and Repetition of Response Sequences

    Science.gov (United States)

    Odum, Amy L.; Ward, Ryan D.; Burke, K. Anne; Barnes, Christopher A.

    2006-01-01

    Four experiments examined the effects of delays to reinforcement on key peck sequences of pigeons maintained under multiple schedules of contingencies that produced variable or repetitive behavior. In Experiments 1, 2, and 4, in the repeat component only the sequence right-right-left-left earned food, and in the vary component four-response…

  7. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    Science.gov (United States)

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly

  8. Sleep and memory consolidation: motor performance and proactive interference effects in sequence learning.

    Science.gov (United States)

    Borragán, Guillermo; Urbain, Charline; Schmitz, Rémy; Mary, Alison; Peigneux, Philippe

    2015-04-01

    That post-training sleep supports the consolidation of sequential motor skills remains debated. Performance improvement and sensitivity to proactive interference are both putative measures of long-term memory consolidation. We tested sleep-dependent memory consolidation for visuo-motor sequence learning using a proactive interference paradigm. Thirty-three young adults were trained on sequence A on Day 1, then had Regular Sleep (RS) or were Sleep Deprived (SD) on the night after learning. After two recovery nights, they were tested on the same sequence A, then had to learn a novel, potentially competing sequence B. We hypothesized that proactive interference effects on sequence B due to the prior learning of sequence A would be higher in the RS condition, considering that proactive interference is an indirect marker of the robustness of sequence A, which should be better consolidated over post-training sleep. Results highlighted sleep-dependent improvement for sequence A, with faster RTs overnight for RS participants only. Moreover, the beneficial impact of sleep was specific to the consolidation of motor but not sequential skills. Proactive interference effects on learning a new material at Day 4 were similar between RS and SD participants. These results suggest that post-training sleep contributes to optimizing motor but not sequential components of performance in visuo-motor sequence learning. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

    KAUST Repository

    Zhang, Zhang

    2010-11-08

    Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.
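
    The two compositional parameters named in this abstract, GC content and purine content at each of the three codon positions, can be measured with a short sketch. It computes observed values only and does not reproduce the paper's two theoretical models; the function name codon_position_contents is hypothetical, and trailing bases that do not complete a codon are ignored by assumption.

```python
def codon_position_contents(cds):
    """Return [(gc, purine), ...] for codon positions 1, 2 and 3.

    gc is the fraction of G or C, purine the fraction of A or G, among
    the bases found at that codon position of the coding sequence.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        n = len(bases)
        gc = sum(b in "GC" for b in bases) / n if n else 0.0
        purine = sum(b in "AG" for b in bases) / n if n else 0.0
        result.append((gc, purine))
    return result
```

    Comparing such observed values with a model's expected values is the kind of deviation analysis the abstract refers to, although the expected values themselves would have to come from the published models.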

  10. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome

    Directory of Open Access Journals (Sweden)

    Vinod Kumar Singh

    2016-09-01

    Full Text Available Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS show marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  11. Implementation of an RFID-Based Sequencing-Error-Proofing System for Automotive Manufacturing Logistics

    Directory of Open Access Journals (Sweden)

    Yong-Shin Kang

    2018-01-01

    Full Text Available Serialized tracing provides the ability to track and trace the lifecycle of the products and parts. Unlike barcodes, radio frequency identification (RFID), which is an important building block for the internet of things (IoT), does not require a line of sight and has the advantages of recognizing many objects simultaneously and rapidly, and storing more information than barcodes. Therefore, RFID has been used in a variety of application domains such as logistics, distribution, and manufacturing, significantly improving traceability and process efficiency. In this study, we applied RFID to improve the just-in-sequence operation of an automotive inbound logistics process. First, we implemented an RFID-based visibility system for real-time traceability and control of part supply from the production lines of suppliers to the assembly line of a car manufacturer. Second, we developed an RFID-based sequence-error proofing system to avoid accidental line stops due to incorrect part sequencing. The whole system has been successfully installed in a rear-axle inbound logistics process of GM Korea. We achieved a significant amount of cost savings, especially due to the prevention of sequencing errors and part shortages, and the reduction of manual operations. Thorough cost-benefit analysis demonstrates the clear economic feasibility of using RFID technologies for the just-in-sequence inbound logistics in an automobile manufacturing environment.

  12. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.

  13. Identification of metal ion binding sites based on amino acid sequences

    Science.gov (United States)

    Cao, Xiaoyong; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html. PMID:28854211

  14. Optimal pseudorandom sequence selection for online c-VEP based BCI control applications

    DEFF Research Database (Denmark)

    Isaksen, Jonas L.; Mohebbi, Ali; Puthusserypady, Sadasivan

    2017-01-01

    predictor. Conclusions: The simple and fast method presented in this study as the Accuracy Score, allows c-VEP based BCI systems to support multiple pseudorandom sequences without increase in trial length. This allows for more personalized BCI systems with better performance to be tested without increased...

  15. Neural network predicts sequence of TP53 gene based on DNA chip

    DEFF Research Database (Denmark)

    Spicker, J.S.; Wikman, F.; Lu, M.L.

    2002-01-01

    We have trained an artificial neural network to predict the sequence of the human TP53 tumor suppressor gene based on a p53 GeneChip. The trained neural network uses as input the fluorescence intensities of DNA hybridized to oligonucleotides on the surface of the chip and makes between zero...

  16. Genome-based exome sequencing analysis identifies GYG1, DIS3L ...

    Indian Academy of Sciences (India)

    Home; Journals; Journal of Genetics; Volume 96; Issue 6. Genome-based exome sequencing analysis identifies GYG1, DIS3L and DDRGK1 are associated with myocardial infarction in Koreans. JI-YOUNG LEE SANGHOON MOON YUN KYOUNG KIM SANG-HAK LEE BOK-SOO LEE MIN-YOUNG PARK JEONG EUY PARK ...

  17. Nucleic acid sequence-based amplification with oligochromatography for detection of Trypanosoma brucei in clinical samples

    NARCIS (Netherlands)

    Mugasa, Claire M.; Laurent, Thierry; Schoone, Gerard J.; Kager, Piet A.; Lubega, George W.; Schallig, Henk D. F. H.

    2009-01-01

    Molecular tools, such as real-time nucleic acid sequence-based amplification (NASBA) and PCR, have been developed to detect Trypanosoma brucei parasites in blood for the diagnosis of human African trypanosomiasis (HAT). Despite good sensitivity, these techniques are not implemented in HAT control

  18. Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikos; Vinther, Jeppe

    2015-01-01

    time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data...

  19. Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

    Science.gov (United States)

    Braguglia, Kay H.; Jackson, Kanata A.

    2012-01-01

    This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…

  20. Magnetism Teaching Sequences Based on an Inductive Approach for First-Year Thai University Science Students

    Science.gov (United States)

    Narjaikaew, Pattawan; Emarat, Narumon; Arayathanitkul, Kwan; Cowie, Bronwen

    2010-01-01

    The study investigated the impact on student motivation and understanding of magnetism of teaching sequences based on an inductive approach. The study was conducted in large lecture classes. A pre- and post-Conceptual Survey of Electricity and Magnetism was conducted with just fewer than 700 Thai undergraduate science students, before and after…

  1. Method for Generating Pseudorandom Sequences with the Assured Period Based on R-blocks

    Directory of Open Access Journals (Sweden)

    M. A. Ivanov

    2011-03-01

    Full Text Available The article describes the characteristics of a new class of fast-acting pseudorandom number generators, based on the use of stochastic adders or R-blocks. A new method for generating pseudorandom sequences with the assured length of period is offered.

  2. Comparison of ompP5 sequence-based typing and pulsed-filed gel ...

    African Journals Online (AJOL)

    Yomi

    2012-03-15

    Mar 15, 2012 ... In this study, comparison of the outer membrane protein P5 gene (ompP5) sequence-based typing with pulsed-field gel electrophoresis (PFGE) for the genotyping of Haemophilus parasuis, the 15 serovar reference strains and 43 isolates were investigated. When comparing the two methods, 31 ompP5.

  3. Evolution of EF-hand calcium-modulated proteins. III. Exon sequences confirm most dendrograms based on protein sequences: calmodulin dendrograms show significant lack of parallelism

    Science.gov (United States)

    Nakayama, S.; Kretsinger, R. H.

    1993-01-01

    In the first report in this series we presented dendrograms based on 152 individual proteins of the EF-hand family. In the second we used sequences from 228 proteins, containing 835 domains, and showed that eight of the 29 subfamilies are congruent and that the EF-hand domains of the remaining 21 subfamilies have diverse evolutionary histories. In this study we have computed dendrograms within and among the EF-hand subfamilies using the encoding DNA sequences. In most instances the dendrograms based on protein and on DNA sequences are very similar. Significant differences between protein and DNA trees for calmodulin remain unexplained. In our fourth report we evaluate the sequences and the distribution of introns within the EF-hand family and conclude that exon shuffling did not play a significant role in its evolution.

  4. Characterization of background noise in capture-based targeted sequencing data.

    Science.gov (United States)

    Park, Gahee; Park, Joo Kyung; Shin, Seung-Ho; Jeon, Hyo-Jeong; Kim, Nayoung K D; Kim, Yeon Jeong; Shin, Hyun-Tae; Lee, Eunjin; Lee, Kwang Hyuck; Son, Dae-Soon; Park, Woong-Yang; Park, Donghyun

    2017-07-21

    Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is therefore essential that errors that constitute baseline noise and impose a practical limit on detection are characterized. In the present study, we systematically evaluated the extent to which errors are incurred during specific steps of the capture-based targeted sequencing process. We removed most sequencing artifacts by filtering out low-quality bases and then analyzed the remaining background noise. By recognizing that plasma DNA is naturally fragmented to be of a size comparable to that of mono-nucleosomal DNA, we were able to identify and characterize errors that are specifically associated with acoustic shearing. Two-thirds of C:G > A:T errors and one quarter of C:G > G:C errors were attributed to the oxidation of guanine during acoustic shearing, and this was further validated by comparative experiments conducted under different shearing conditions. The acoustic shearing step also caused A > G and A > T substitutions localized to the end bases of sheared DNA fragments, indicating a probable association of these errors with DNA breakage. Finally, the hybrid selection step contributed to one-third of the remaining C:G > A:T and one-fifth of the C > T errors. The results of this study provide a comprehensive summary of various errors incurred during targeted deep sequencing, and their underlying causes. This information will be invaluable to drive technical improvements in this sequencing method, and may increase the future usage of targeted deep sequencing methods for low-allelic fraction variant detection.

  5. CGKB: an annotation knowledge base for cowpea (Vigna unguiculata L.) methylation filtered genomic genespace sequences

    Directory of Open Access Journals (Sweden)

    Spraggins Thomas A

    2007-04-01

    Full Text Available Abstract Background Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniProtKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the

  6. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes

    OpenAIRE

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-01-01

    Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determi...

  7. High-resolution definition of the Vibrio cholerae essential gene set with hidden Markov model-based analyses of transposon-insertion sequencing data.

    Science.gov (United States)

    Chao, Michael C; Pritchard, Justin R; Zhang, Yanjia J; Rubin, Eric J; Livny, Jonathan; Davis, Brigid M; Waldor, Matthew K

    2013-10-01

    The coupling of high-density transposon mutagenesis to high-throughput DNA sequencing (transposon-insertion sequencing) enables simultaneous and genome-wide assessment of the contributions of individual loci to bacterial growth and survival. We have refined analysis of transposon-insertion sequencing data by normalizing for the effect of DNA replication on sequencing output and using a hidden Markov model (HMM)-based filter to exploit heretofore unappreciated information inherent in all transposon-insertion sequencing data sets. The HMM can smooth variations in read abundance and thereby reduce the effects of read noise, as well as permit fine scale mapping that is independent of genomic annotation and enable classification of loci into several functional categories (e.g. essential, domain essential or 'sick'). We generated a high-resolution map of genomic loci (encompassing both intra- and intergenic sequences) that are required or beneficial for in vitro growth of the cholera pathogen, Vibrio cholerae. This work uncovered new metabolic and physiologic requirements for V. cholerae survival, and by combining transposon-insertion sequencing and transcriptomic data sets, we also identified several novel noncoding RNA species that contribute to V. cholerae growth. Our findings suggest that HMM-based approaches will enhance extraction of biological meaning from transposon-insertion sequencing genomic data.

  8. Molecular diversification of Trichuris spp. from Sigmodontinae (Cricetidae) rodents from Argentina based on mitochondrial DNA sequences.

    Science.gov (United States)

    Callejón, Rocío; Robles, María Del Rosario; Panei, Carlos Javier; Cutillas, Cristina

    2016-08-01

    A molecular phylogenetic hypothesis is presented for the genus Trichuris based on sequence data from mitochondrial cytochrome c oxidase 1 (cox1) and cytochrome b (cob). The taxa consisted of nine populations of whipworm from five species of Sigmodontinae rodents from Argentina. Bayesian Inference, Maximum Parsimony, and Maximum Likelihood methods were used to infer phylogenies for each gene separately but also for the combined mitochondrial data and the combined mitochondrial and nuclear dataset. Phylogenetic results based on cox1 and cob mitochondrial DNA (mtDNA) revealed three strongly resolved clades corresponding to three different species (Trichuris navonae, Trichuris bainae, and Trichuris pardinasi) showing phylogeographic variation, but relationships among Trichuris species were poorly resolved. Phylogenetic reconstruction based on concatenated sequences had greater phylogenetic resolution for delimiting species and intraspecific populations of Trichuris than those based on partitioned genes. Thus, populations of T. bainae and T. pardinasi could be affected by geographical factors and parasite-host co-divergence.

  9. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... on Tabasco), led us to the detection of a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute, supplied the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...

  10. Security problems for a pseudorandom sequence generator based on the Chen chaotic system

    Science.gov (United States)

    Özkaynak, Fatih; Yavuz, Sırma

    2013-09-01

    Recently, a novel pseudorandom number generator scheme based on the Chen chaotic system was proposed. In this study, we analyze the security weaknesses of the proposed generator. By applying a brute force attack on a reduced key space, we show that 66% of the generated pseudorandom number sequences can be revealed. Executable C# code is given for the proposed attack. The computational complexity of this attack is O(n), where n is the sequence length. Both mathematical proofs and experimental results are presented to support the proposed attack.

  11. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

    Science.gov (United States)

    Garrido-Martín, Diego; Pazos, Florencio

    2018-02-27

    The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.

  12. A Comparison of the Outcomes of Three Early Childhood Programs Based upon Developmental Sequencing.

    Science.gov (United States)

    Nieman, Ronald H.; Gastright, Joseph F.

    The purpose of this study was to compare cognitive effects of two developmentally sequenced preschool curricula with outcomes of a traditional eclectic preschool curriculum. Specifically, pretest and posttest scores for children taught with the Brigance Diagnostic Inventory of Early Development and the Portage Guide to Early Education were…

  13. Haplotag: Software for Haplotype-Based Genotyping-by-Sequencing Analysis

    Directory of Open Access Journals (Sweden)

    Nicholas A. Tinker

    2016-04-01

    Full Text Available Genotyping-by-sequencing (GBS), and related methods, are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available, and operates with minimal user-investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode, and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual “passport” for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species.

  14. Tracing the Spread of Clostridium difficile Ribotype 027 in Germany Based on Bacterial Genome Sequences.

    Directory of Open Access Journals (Sweden)

    Matthias Steglich

    Full Text Available We applied whole-genome sequencing to reconstruct the spatial and temporal dynamics underpinning the expansion of Clostridium difficile ribotype 027 in Germany. Based on re-sequencing of genomes from 57 clinical C. difficile isolates, which had been collected from hospitalized patients at 36 locations throughout Germany between 1990 and 2012, we demonstrate that C. difficile genomes have accumulated sequence variation sufficiently fast to document the pathogen's spread at a regional scale. We detected both previously described lineages of fluoroquinolone-resistant C. difficile ribotype 027, FQR1 and FQR2. Using Bayesian phylogeographic analyses, we show that fluoroquinolone-resistant C. difficile 027 was imported into Germany at least four times, that it had been widely disseminated across multiple federal states even before the first outbreak was noted in 2007, and that it has continued to spread since.

  15. Tracing the Spread of Clostridium difficile Ribotype 027 in Germany Based on Bacterial Genome Sequences.

    Science.gov (United States)

    Steglich, Matthias; Nitsche, Andreas; von Müller, Lutz; Herrmann, Mathias; Kohl, Thomas A; Niemann, Stefan; Nübel, Ulrich

    2015-01-01

    We applied whole-genome sequencing to reconstruct the spatial and temporal dynamics underpinning the expansion of Clostridium difficile ribotype 027 in Germany. Based on re-sequencing of genomes from 57 clinical C. difficile isolates, which had been collected from hospitalized patients at 36 locations throughout Germany between 1990 and 2012, we demonstrate that C. difficile genomes have accumulated sequence variation sufficiently fast to document the pathogen's spread at a regional scale. We detected both previously described lineages of fluoroquinolone-resistant C. difficile ribotype 027, FQR1 and FQR2. Using Bayesian phylogeographic analyses, we show that fluoroquinolone-resistant C. difficile 027 was imported into Germany at least four times, that it had been widely disseminated across multiple federal states even before the first outbreak was noted in 2007, and that it has continued to spread since.

  16. A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis.

    Science.gov (United States)

    Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi

    2017-01-01

    Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.

  17. A mapping of an ensemble of mitochondrial sequences for various organisms into 3D space based on the word composition.

    Science.gov (United States)

    Aita, Takuyo; Nishigaki, Koichi

    2012-11-01

    To visualize a bird's-eye view of an ensemble of mitochondrial genome sequences for various species, we recently developed a novel method of mapping a biological sequence ensemble into Three-Dimensional (3D) vector space. First, we represented a biological sequence of a species s by a word-composition vector x(s), where its length |x(s)| represents the sequence length, and its unit vector x(s)/|x(s)| represents the relative composition of the K-tuple words through the sequence and the size of the dimension, N=4^K, is the number of all possible words with the length of K. Second, we mapped the vector x(s) to the 3D position vector y(s), based on the two following simple principles: (1) |y(s)|=|x(s)| and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space by using K=7. The mapping was successful because the angles between vectors before and after the mapping highly correlated with each other (correlation coefficients were 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (just like the Milky Way on a Celestial Globe), and the Fungi and Green plant kingdoms are distributed in a similar arc belt. These two arc belts intersect at their respective middle regions and form a cross structure just like a jet aircraft fuselage and its wings. This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner. Copyright © 2012 Elsevier Inc. All rights reserved.
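The word-composition representation in the abstract above can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that x(s) is the vector of K-tuple word counts rescaled so that its Euclidean norm equals the sequence length, which is one reading of the abstract, and the function names `word_composition_vector` and `angle_between` are hypothetical. The 3D mapping step is not reproduced here.

```python
import math
from itertools import product


def word_composition_vector(seq, k):
    """Return a 4**k-dimensional K-tuple count vector for a DNA sequence.

    Assumption (not confirmed by the abstract): the counts are rescaled so
    that the Euclidean norm of the vector equals the sequence length.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0.0] * len(words)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip words containing ambiguous bases such as N
            counts[j] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0.0:
        return counts
    scale = len(seq) / norm
    return [c * scale for c in counts]


def angle_between(u, v):
    """Return the angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

Two sequences that share no K-tuple words give orthogonal vectors (an angle of pi/2), while sequences with the same relative word composition give an angle close to zero regardless of their lengths.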

  18. Genomic sequencing-based detection of large deletions in Rhodococcus rhodochrous strain B-276.

    Science.gov (United States)

    Saitoh, Seikoh; Aoyama, Hiroaki; Akutsu, Masako; Nakano, Kazuma; Shinzato, Naoya; Matsui, Toru

    2013-09-01

    Bacteria of the genus Rhodococcus (Actinomycetes) have the ability to catabolize various organic compounds and are therefore considered potential genetic resources for applications such as bioremediation. We investigated a next-generation sequencing-based procedure to rapidly identify candidate functional gene(s) from rhodococci on the basis of their frequent genome recombination. The Rhodococcus rhodochrous strain B-276 and its alkene monooxygenase (AMO) gene cluster were the focus of our investigation. Firstly, 2 types of cultures of the R. rhodochrous strain B-276 were prepared, one of which was supplied with propene, which requires AMO genes for its assimilation, whereas the other was supplied with glucose as the sole energy source. The latter culture was anticipated to have a lower gene frequency of AMO genes because of their deletion during cultivation. We then conducted whole genome shotgun sequencing of the genomic DNA extracted from both cultures. Next, all sequence data were pooled and assembled into contiguous sequences (contigs). Finally, the abundance of each contig was quantified in order to detect contigs that were highly biased between the 2 cultures. We identified contigs that were overrepresented by 2 orders of magnitude in the AMO-required culture and successfully identified an AMO gene cluster among these contigs. We propose this procedure as an efficient method for the rapid detection and sequencing of deleted regions, which contributes to identification of functional genes in rhodococci. Copyright © 2013 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
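The final step described in the abstract above, detecting contigs whose abundance is strongly biased between the two cultures, might look roughly like the following Python sketch. The abstract does not say how abundance was quantified, so everything here is an assumption: abundance is taken as the fraction of mapped reads per contig, a pseudocount avoids division by zero, and the function name `biased_contigs` and its parameters are hypothetical.

```python
import math


def biased_contigs(counts_a, counts_b, min_log10_ratio=2.0, pseudocount=1.0):
    """Return contigs overrepresented in culture A relative to culture B.

    counts_a and counts_b map contig names to mapped-read counts
    (hypothetical input format). A contig is reported when the log10 ratio
    of its read fractions is at least min_log10_ratio; the default of 2.0
    corresponds to two orders of magnitude.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both cultures need at least one mapped read")
    result = {}
    for contig in set(counts_a) | set(counts_b):
        # The pseudocount keeps the ratio finite for contigs absent in one culture.
        frac_a = (counts_a.get(contig, 0) + pseudocount) / total_a
        frac_b = (counts_b.get(contig, 0) + pseudocount) / total_b
        log_ratio = math.log10(frac_a / frac_b)
        if log_ratio >= min_log10_ratio:
            result[contig] = log_ratio
    return result
```

Normalizing by the total read count per culture keeps a difference in sequencing depth between the two libraries from being mistaken for a deletion.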

  19. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    Science.gov (United States)

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  20. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Schierup, M.H.; Jorgensen, F.G.

    2005-01-01

    sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human......-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more...... on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shot-gun sequences to the human-mouse alignment offers the investigator a rapid way to defining specific...

  1. [Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence]

    Science.gov (United States)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and whatever black is, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran Polymerase Chain Reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacteria. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene can be used for bacterial identification of different species based on the sequence of their 16S rRNA gene. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.

  2. Investigation of next-generation sequencing data of Klebsiella pneumoniae using web-based tools.

    Science.gov (United States)

    Brhelova, Eva; Antonova, Mariya; Pardy, Filip; Kocmanova, Iva; Mayer, Jiri; Racil, Zdenek; Lengerova, Martina

    2017-11-01

    Rapid identification and characterization of multidrug-resistant Klebsiella pneumoniae strains is necessary due to the increasing frequency of severe infections in patients. The decreasing cost of next-generation sequencing enables us to obtain a comprehensive overview of genetic information in one step. The aim of this study is to demonstrate and evaluate the utility and scope of the application of web-based databases to next-generation sequenced (NGS) data. The whole genomes of 11 clinical Klebsiella pneumoniae isolates were sequenced using Illumina MiSeq. Selected web-based tools were used to identify a variety of genetic characteristics, such as acquired antimicrobial resistance genes, multilocus sequence types, and plasmid replicons, and to identify virulence factors, such as virulence genes, cps clusters, urease-nickel clusters and efflux systems. Using web-based tools hosted by the Center for Genomic Epidemiology, we detected resistance to 8 main antimicrobial groups with at least 11 acquired resistance genes. The isolates were divided into eight sequence types (ST11, 23, 37, 323, 433, 495 and 562, and a new one, ST1646). All of the isolates carried replicons of large plasmids. Capsular types, virulence factors and genes coding AcrAB and OqxAB efflux pumps were detected using BIGSdb-Kp, whereas the selected virulence genes, identified in almost all of the isolates, were detected using CLC Genomic Workbench software. Applying appropriate web-based online tools to NGS data enables the rapid extraction of comprehensive information that can be used for more efficient diagnosis and treatment of patients, while data processing is free of charge, easy and time-efficient.

  3. Comparison of Enzymes / Non-Enzymes Proteins Classification Models Based on 3D, Composition, Sequences and Topological Indices

    OpenAIRE

    Munteanu, Cristian Robert

    2014-01-01

    Comparison of Enzymes / Non-Enzymes Proteins Classification Models Based on 3D, Composition, Sequences and Topological Indices, German Conference on Bioinformatics (GCB), Potsdam, Germany (September, 2007)

  4. Uncertainty quantification of phase-based motion estimation on noisy sequence of images

    Science.gov (United States)

    Sarrafi, Aral; Mao, Zhu

    2017-04-01

    Optical measurement and motion estimation based on the acquired sequence of images is one of the most recent sensing techniques developed in the last decade or so. As a modern non-contact sensing technique, motion estimation and optical measurements provide a full-field awareness without any mass loading or change of stiffness in structures, which is unavoidable using other conventional transducers (e.g. accelerometers, strain gauges, and LVDTs). Among several motion estimation techniques prevalent in computer vision, phase-based motion estimation is one of the most reliable and accurate methods. However, contamination of the sequence of images with numerous sources of noise is inevitable, and the performance of the phase-based motion estimation could be affected due to the lighting changes, image acquisition noise, and the camera's intrinsic sensor noise. Within this context, the uncertainty quantification (UQ) of the phase-based motion estimation (PME) has been investigated in this paper. Based on a normality assumption, a framework has been provided in order to characterize the propagation of the uncertainty from the acquired images to the estimated motion. The established analytical solution is validated via Monte-Carlo simulations using a set of simulation data. The UQ model in the paper is able to predict the order statistics of the noise influence, in which the uncertainty bounds of the estimated motion are given, after processing the contaminated sequence of images.

  5. Genotype, phenotype and in silico pathogenicity analysis of HEXB mutations: Panel based sequencing for differential diagnosis of gangliosidosis.

    Science.gov (United States)

    Mahdieh, Nejat; Mikaeeli, Sahar; Tavasoli, Ali Reza; Rezaei, Zahra; Maleki, Majid; Rabbani, Bahareh

    2018-04-01

    Gangliosidosis is an inherited metabolic disorder causing neurodegeneration and motor regression. Preventive diagnosis is the first choice for the affected families due to lack of straightforward therapy. Genetic studies could confirm the diagnosis and help families with carrier screening and prenatal diagnosis. An update of HEXB gene variants concerning genotype, phenotype and in silico analysis is presented. Panel based next generation sequencing and direct sequencing of four cases were performed to confirm the clinical diagnosis and for reproductive planning. Bioinformatic analyses of the HEXB mutation database were also performed. Direct sequencing of HEXA and HEXB genes showed recurrent homozygous variants at c.509G>A (p.Arg170Gln) and c.850C>T (p.Arg284Ter), respectively. A novel variant at c.416T>A (p.Leu139Gln) was identified in the GLB1 gene. Panel based next generation sequencing was performed for an undiagnosed patient which showed a novel mutation at c.1602C>A (p.Cys534Ter) of HEXB gene. Bioinformatic analysis of the HEXB mutation database showed 97% consistency of in silico genotype analysis with the phenotype. Bioinformatic analysis of the novel variants predicted them to be disease causing. In silico structural and functional analysis of the novel variants showed structural effect of HEXB and functional effect of GLB1 variants which would provide fast analysis of novel variants. Panel based studies could be performed for overlapping symptomatic patients. Consequently, genetic testing would help affected families with patient management, carrier detection, and family planning. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Effect of glass hybridization and staking sequence on mechanical ...

    Indian Academy of Sciences (India)

    Abstract. The interest in fibre-reinforced polymer composites is growing rapidly due to its high performance in terms of mechanical properties, significant processing advantages, excellent chemical resistance, low cost, and low density. The development of composite materials based on the reinforcement of two or more fibre ...

  7. Haematobia irritans dataset of raw sequence reads from Illumina-based transcriptome sequencing of specific tissues and life stages

    Science.gov (United States)

    Illumina HiSeq technology was used to sequence the transcriptome from various dissected tissues and life stages from the horn fly, Haematobia irritans. These samples include eggs (0, 2, 4, and 9 hours post-oviposition), adult fly gut, adult fly legs, adult fly malpighian tubule, adult fly ovary, adu...

  8. Feasibility of mini-sequencing schemes based on nucleotide polymorphisms for microbial identification and population analyses.

    Science.gov (United States)

    Araujo, Ricardo; Eusebio, Nadia; Caramalho, Rita

    2015-03-01

    Practical schemes based on single nucleotide polymorphisms (SNP) have been proposed as alternatives to simplify and replace the molecular methodologies based on the extensive sequencing analysis of genes. SNaPshot mini-sequencing has been increasingly used during the last decade and represents a fast and robust strategy to analyze critical polymorphisms. Such assays have been proposed to characterize some bacteria and microbial eukaryotes, and their feasibility is reviewed in the present manuscript. The mini-sequencing schemes showed high discriminatory power and competence for identification of microorganisms, but some specificity errors were still found, particularly for species of the Burkholderia cepacia complex and mycobacteria. SNP assays designed for other goals, e.g., comparison of strains, detection of serotypes, virulence, epidemic, and phylogenetic-related subgroups of isolates, can be very useful by facilitating the investigation of large collections of isolates. The next generation of SNP assays might consider the inclusion of a large number of markers to fully characterize microbial taxonomy and strains; nevertheless, these new technologies are still prone to errors and can largely benefit from integration with well-established mini-sequencing assays. Newly proposed molecular tools should be systematically tested in collections of isolates with high indexes of diversity and guarantee interlaboratory validation.

  9. Multiplex amplicon sequencing for microbe identification in community-based culture collections.

    Science.gov (United States)

    Armanhi, Jaderson Silveira Leite; de Souza, Rafael Soares Correa; de Araújo, Laura Migliorini; Okura, Vagner Katsumi; Mieczkowski, Piotr; Imperial, Juan; Arruda, Paulo

    2016-07-12

    Microbiome analysis using metagenomic sequencing has revealed a vast microbial diversity associated with plants. Identifying the molecular functions associated with microbiome-plant interaction is a significant challenge concerning the development of microbiome-derived technologies applied to agriculture. An alternative to accelerate the discovery of the microbiome benefits to plants is to construct microbial culture collections concomitant with accessing microbial community structure and abundance. However, traditional methods of isolation, cultivation, and identification of microbes are time-consuming and expensive. Here we describe a method for identification of microbes in culture collections constructed by picking colonies from primary platings that may contain single or multiple microorganisms, which we named community-based culture collections (CBC). A multiplexing 16S rRNA gene amplicon sequencing based on two-step PCR amplifications with tagged primers for plates, rows, and columns allowed the identification of the microbial composition regardless of whether the well contains single or multiple microorganisms. The multiplexing system enables pooling amplicons into a single tube. The sequencing performed on the PacBio platform led to the recovery of near-full-length 16S rRNA gene sequences, allowing accurate identification of microorganism composition in each plate well. Cross-referencing with plant microbiome structure and abundance allowed the estimation of diversity and abundance representation of microorganisms in the CBC.

  10. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    Directory of Open Access Journals (Sweden)

    Francisco Alexandre P

    2012-05-01

    Full Text Available Abstract Background With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results PHYLOViZ is platform independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.

  11. Readjoiner: a fast and memory efficient string graph-based sequence assembler

    Directory of Open Access Journals (Sweden)

    Gonnella Giorgio

    2012-05-01

    Full Text Available Abstract Background Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads. Results Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only. Conclusions Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
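
    The suffix-prefix matching step described in this abstract can be illustrated with a deliberately naive sketch. It is not Readjoiner's algorithm (which the abstract says relies on suffix sorting and scanning); it only shows what a suffix-prefix match between two reads is. The function names, the `min_len` threshold, and the example reads are all assumptions made for illustration, and transitive edge removal is not attempted.

```python
def longest_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    Naive quadratic check, for illustration only.
    """
    max_len = min(len(a), len(b))
    # Never test k = 0: a[-0:] would be the whole string, not an empty suffix.
    for k in range(max_len, max(min_len, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_edges(reads, min_len=3):
    """Map (i, j) -> overlap length for every ordered pair of distinct reads."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = longest_overlap(a, b, min_len)
                if k:
                    edges[(i, j)] = k
    return edges


# Hypothetical toy reads, not taken from any data set.
example_reads = ["ACGTAC", "TACGGA", "GGATTT"]
example_edges = overlap_edges(example_reads, min_len=3)
```

    Comparing all pairs of reads this way scales quadratically with the number of reads, which is why index-based methods such as the suffix sorting mentioned above are needed for real sequencing data.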

  12. Sequence-based typing of HLA-DQA1: comprehensive approach showed molecular heterogeneity.

    Science.gov (United States)

    Voorter, C E M; Lee, K W; Smillie, D; Tilanus, M G J; van den Berg-Loonen, E M

    2007-04-01

    Within the human leukocyte antigen-DQA1 workshop project the level of molecular heterogeneity of the DQA1 gene was investigated. An improved sequence-based typing protocol was used, enabling analysis of the complete coding sequence, comprising exons 1-4. The participating laboratories implemented the amplification and sequencing primers in their own sequence-based typing approach. The method proved to be sufficiently robust to handle the differences in protocols. All reference samples used for validation were correctly typed for DQA1 by all participating laboratories. Three different populations with a total of 736 individuals were investigated: a population of Korean origin (n= 467), a British Caucasian (n= 114), and a Dutch Caucasian (n= 155) population. Sixteen of the known 28 DQA1 alleles were detected and six new alleles were identified. All novel alleles showed a nucleotide substitution outside exon 2. Comparison of the calculated allele frequencies revealed major differences between the Korean and the Caucasian populations but also between Dutch and British Caucasians. A tight association between DQA1 and DRB1/DQB1 alleles was observed in all three populations.

  13. Reproducible Analysis of Sequencing-Based RNA Structure Probing Data with User-Friendly Tools.

    Science.gov (United States)

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikolaos; Vinther, Jeppe

    2015-01-01

    RNA structure-probing data can improve the prediction of RNA secondary and tertiary structure and allow structural changes to be identified and investigated. In recent years, massive parallel sequencing has dramatically improved the throughput of RNA structure probing experiments, but at the same time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data, we have made a collection of tools, which allow raw sequencing reads to be converted to normalized probing values using different published strategies. In addition, we also provide tools for visualization of the probing data in the UCSC Genome Browser and for converting RNA coordinates to genomic coordinates and vice versa. The collection is implemented as functions in the R statistical environment and as tools in the Galaxy platform, making them easily accessible for the scientific community. We demonstrate the usefulness of the collection by applying it to the analysis of sequencing-based hydroxyl radical probing data and comparing different normalization strategies. © 2015 Elsevier Inc. All rights reserved.

  14. Evaluation of the Bacterial Diversity in the Human Tongue Coating Based on Genus-Specific Primers for 16S rRNA Sequencing

    Directory of Open Access Journals (Sweden)

    Beili Sun

    2017-01-01

    Full Text Available The characteristics of tongue coating are very important symbols for disease diagnosis in traditional Chinese medicine (TCM) theory. As a habitat of oral microbiota, bacteria on the tongue dorsum have been proved to be the cause of many oral diseases. The high-throughput next-generation sequencing (NGS) platforms have been widely applied in the analysis of bacterial 16S rRNA gene. We developed a methodology based on genus-specific multiprimer amplification and ligation-based sequencing for microbiota analysis. In order to validate the efficiency of the approach, we thoroughly analyzed six tongue coating samples from lung cancer patients with different TCM types, and more than 600 genera of bacteria were detected by this platform. The results showed that ligation-based parallel sequencing combined with enzyme digestion and multiamplification could expand the effective length of sequencing reads and could be applied in the microbiota analysis.

  15. HIV-1 envelope sequence-based diversity measures for identifying recent infections.

    Directory of Open Access Journals (Sweden)

    Alexis Kafando

    Full Text Available Identifying recent HIV-1 infections is crucial for monitoring HIV-1 incidence and optimizing public health prevention efforts. To identify recent HIV-1 infections, we evaluated and compared the performance of 4 sequence-based diversity measures including percent diversity, percent complexity, Shannon entropy and number of haplotypes targeting 13 genetic segments within the env gene of HIV-1. A total of 597 diagnostic samples obtained in 2013 and 2015 from recently and chronically HIV-1 infected individuals were selected. From the selected samples, 249 (134 from recent versus 115 from chronic infections) env coding regions, including V1-C5 of gp120 and the gp41 ectodomain of HIV-1, were successfully amplified and sequenced by next generation sequencing (NGS) using the Illumina MiSeq platform. The ability of the four sequence-based diversity measures to correctly identify recent HIV infections was evaluated using the frequency distribution curves, median and interquartile range and area under the curve (AUC) of the receiver operating characteristic (ROC). Comparing the median and interquartile range and evaluating the frequency distribution curves associated with the 4 sequence-based diversity measures, we observed that the percent diversity, number of haplotypes and Shannon entropy demonstrated significant potential to discriminate recent from chronic infections (p<0.0001). Using the AUC of ROC analysis, only the Shannon entropy measure within three HIV-1 env segments could accurately identify recent infections at a satisfactory level. The env segments were gp120 C2_1 (AUC = 0.806), gp120 C2_3 (AUC = 0.805) and gp120 V3 (AUC = 0.812). Our results clearly indicate that the Shannon entropy measure represents a useful tool for predicting HIV-1 infection recency.
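
    As a rough illustration of the Shannon entropy measure named in this abstract, the sketch below computes the entropy of haplotype frequencies within one sample. The abstract does not state the logarithm base or any normalization the authors used, so the natural logarithm here is an assumption, as are the function name and the toy haplotype labels.

```python
import math
from collections import Counter


def shannon_entropy(haplotypes):
    """Shannon entropy H = -sum(p_i * ln p_i) over haplotype frequencies.

    `haplotypes` is an iterable of hashable labels (for example, read
    sequences for one env segment). Natural log is an assumption.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical samples: one dominated by a single haplotype, one mixed.
low_diversity = ["h1"] * 9 + ["h2"]
high_diversity = ["h1", "h2", "h3", "h4", "h5"] * 2

entropy_low = shannon_entropy(low_diversity)
entropy_high = shannon_entropy(high_diversity)
```

    A sample in which one haplotype dominates yields a lower entropy than a sample with several equally frequent haplotypes; turning such values into a recent-versus-chronic call would additionally require a threshold, which the abstract evaluates through ROC analysis.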

  16. Detection of methylation in promoter sequences by melting curve analysis-based semiquantitative real time PCR

    Directory of Open Access Journals (Sweden)

    Lázcoz Paula

    2008-02-01

    Full Text Available Abstract Background We present two melting curve analysis (MCA)-based semiquantitative real time PCR techniques to detect the promoter methylation status of genes. The first, MCA-MSP, follows the same principle as standard MSP but it is performed in a real time thermal cycler with results being visualized in a melting curve. The second, MCA-Meth, uses a single pair of primers designed with no CpGs in its sequence. These primers amplify both unmethylated and methylated sequences. In clinical applications the MSP technique has revolutionized methylation detection by simplifying the analysis to a PCR-based protocol. MCA-analysis based techniques may be able to further improve and simplify methylation analyses by reducing starting DNA amounts, by introducing an all-in-one tube reaction and by eliminating a final gel stage for visualization of the result. The current study aimed at investigating the feasibility of both MCA-MSP and MCA-Meth in the analysis of promoter methylation, and at defining potential advantages and shortcomings in comparison to currently implemented techniques, i.e. bisulfite sequencing and standard MSP. Methods The promoters of the RASSF1A (3p21.3), BLU (3p21.3) and MGMT (10q26) genes were analyzed by MCA-MSP and MCA-Meth in 13 astrocytoma samples, 6 high grade glioma cell lines and 4 neuroblastoma cell lines. The data were compared with standard MSP and validated by bisulfite sequencing. Results Both MCA-MSP and MCA-Meth successfully determined promoter methylation. MCA-MSP provided information similar to standard MSP analyses. However, the analysis was possible in a single tube and avoided the gel stage. MCA-Meth proved to be useful in samples with intermediate methylation status, reflected by a melting curve position shift in dependence on methylation extent. Conclusion We propose MCA-MSP and MCA-Meth as alternative or supplementary techniques to MSP or bisulfite sequencing.

  17. The physiological effects of concurrent strength and endurance training sequence: A systematic review and meta-analysis.

    Science.gov (United States)

    Murlasits, Zsolt; Kneffel, Zsuzsanna; Thalib, Lukman

    2018-06-01

    We conducted a systematic literature review and meta-analysis to assess the chronic effects of the sequence of concurrent strength and endurance training on selected important physiological and performance parameters, namely lower body 1 repetition maximum (1RM) and maximal aerobic capacity (VO2max/peak). Based on predetermined eligibility criteria, chronic effect trials comparing strength-endurance (SE) with endurance-strength (ES) training sequence in the same session were included. Data on effect sizes, sample size and SD as well as other related study characteristics were extracted. The effect sizes were pooled using fixed- or random-effects models as per the level of heterogeneity between studies, and a further sensitivity analysis was carried out using Inverse Variance Heterogeneity (IVHet) models to adjust for potential bias due to heterogeneity. Lower body 1RM was significantly higher when strength training preceded endurance, with a pooled mean change of 3.96 kg (95%CI: 0.81 to 7.10 kg). However, the training sequence had no impact on aerobic capacity, with a pooled mean difference of 0.39 ml·kg⁻¹·min⁻¹ (95%CI: -1.03 to 1.81 ml·kg⁻¹·min⁻¹). Sequencing strength training prior to endurance in concurrent training appears to be beneficial for lower body strength adaptations, while the improvement of aerobic capacity is not affected by training order.
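
    The pooling step mentioned in this abstract can be sketched, under assumptions, as a standard inverse-variance fixed-effect estimate. This is only the textbook fixed-effect model; it is not the IVHet estimator used in the sensitivity analysis, and the function name and input numbers are invented for illustration rather than taken from the review.

```python
import math


def pool_fixed_effect(effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect pooled estimate with a 95% CI.

    effects    : per-study mean differences
    std_errors : per-study standard errors (same order as `effects`)
    z          : normal quantile for the confidence level (1.96 for 95%)
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)


# Made-up study results (kg); NOT data from the meta-analysis above.
pooled_change, ci = pool_fixed_effect([2.0, 4.0], [1.0, 1.0])
```

    Each study is weighted by the inverse of its variance, so more precise studies contribute more to the pooled mean; a confidence interval that excludes zero corresponds to the kind of significant pooled change reported for lower body 1RM.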

  18. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing.

    Science.gov (United States)

    Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon.
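
    The N50 statistic quoted for the contigs in this abstract follows a common definition that can be sketched as below: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are assumptions; assemblers sometimes differ in minor conventions such as minimum contig length cut-offs.

```python
def n50(lengths):
    """N50 of a collection of contig (or scaffold) lengths.

    Sort lengths from longest to shortest and return the length at which
    the cumulative sum first reaches at least half of the total.
    Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths in bp, not from the assembly described above.
toy_contigs = [2, 3, 4, 5, 6, 7]
toy_n50 = n50(toy_contigs)
```

    For the toy lengths, the total is 27, and the running sum over 7, 6, 5 reaches 18, which is the first value at or above half of the total, so the N50 is 5.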

  19. Genome Survey Sequencing and Genetic Background Characterization of Gracilariopsis lemaneiformis (Rhodophyta) Based on Next-Generation Sequencing

    Science.gov (United States)

    Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008

  20. Effects of Aftershock Declustering in Risk Modeling: Case Study of a Subduction Sequence in Mexico

    Science.gov (United States)

    Kane, D. L.; Nyst, M.

    2014-12-01

    Earthquake hazard and risk models often assume that earthquake rates can be represented by a stationary Poisson process, and that aftershocks observed in historical seismicity catalogs represent a deviation from stationarity that must be corrected before earthquake rates are estimated. Algorithms for classifying individual earthquakes as independent mainshocks or as aftershocks vary widely, and analysis of a single catalog can produce considerably different earthquake rates depending on the declustering method implemented. As these rates are propagated through hazard and risk models, the modeled results will vary due to the assumptions implied by these choices. In particular, the removal of large aftershocks following a mainshock may lead to an underestimation of the rate of damaging earthquakes and potential damage due to a large aftershock may be excluded from the model. We present a case study based on the 1907 - 1911 sequence of nine 6.9 Mexico in order to illustrate the variability in risk under various declustering approaches. Previous studies have suggested that subduction zone earthquakes in Mexico tend to occur in clusters, and this particular sequence includes events that would be labeled as aftershocks in some declustering approaches yet are large enough to produce significant damage. We model the ground motion for each event, determine damage ratios using modern exposure data, and then compare the variability in the modeled damage from using the full catalog or one of several declustered catalogs containing only "independent" events. We also consider the effects of progressive damage caused by each subsequent event and how this might increase or decrease the total losses expected from this sequence.

  1. Molecular characterization of Fasciola gigantica from Mauritania based on mitochondrial and nuclear ribosomal DNA sequences.

    Science.gov (United States)

    Amor, Nabil; Farjallah, Sarra; Salem, Mohamed; Lamine, Dia Mamadou; Merella, Paolo; Said, Khaled; Ben Slimane, Badreddine

    2011-10-01

    Fasciolosis caused by Fasciola hepatica and Fasciola gigantica (Platyhelminthes: Trematoda: Digenea) is considered the most important helminth infection of ruminants in tropical countries, causing considerable socioeconomic problems. From Africa, F. gigantica has been previously characterized from Burkina Faso, Senegal, Kenya, Zambia and Mali, while F. hepatica has been reported from Morocco and Tunisia, and both species have been observed from Ethiopia and Egypt on the basis of morphometric differences, while the use of molecular markers is necessary to distinguish exactly between species. Samples identified morphologically as F. gigantica (n=60) from sheep and cattle from different geographical localities of Mauritania were genetically characterized by sequences of the first (ITS-1), the 5.8S, and second (ITS-2) Internal Transcribed Spacers (ITS) of nuclear ribosomal DNA (rDNA) genes and the mitochondrial Cytochrome c Oxidase I (COI) gene. Comparison of the sequences of the Mauritanian samples with sequences of Fasciola spp. from GenBank confirmed that all samples belong to the species F. gigantica. The nucleotide sequencing of ITS rDNA of F. gigantica showed no nucleotide variation in the ITS-1, 5.8S, and ITS-2 rDNA sequences among all samples examined and those from Burkina Faso, Kenya, Egypt and Iran. The phylogenetic trees based on the ITS-1 and ITS-2 sequences showed a close relationship of the Mauritanian samples with isolates of F. gigantica from different localities of Africa and Asia. The COI genotypes of the Mauritanian specimens of F. gigantica had a high level of diversity, and they belonged to the F. gigantica phylogenetically distinguishable clade. The present study is the first molecular characterization of F. gigantica in sheep and cattle from Mauritania, allowing a reliable approach for the genetic differentiation of Fasciola spp. and providing a basis for further studies on liver flukes in the African countries. Copyright © 2011 Elsevier Inc. All

  2. Authentication of Zanthoxylum Species Based on Integrated Analysis of Complete Chloroplast Genome Sequences and Metabolite Profiles.

    Science.gov (United States)

    Lee, Hyeon Ju; Koo, Hyun Jo; Lee, Jonghoon; Lee, Sang-Choon; Lee, Dong Young; Giang, Vo Ngoc Linh; Kim, Minjung; Shim, Hyeonah; Park, Jee Young; Yoo, Ki-Oug; Sung, Sang Hyun; Yang, Tae-Jin

    2017-11-29

    We performed chloroplast genome sequencing and comparative analysis of two Rutaceae species, Zanthoxylum schinifolium (Korean pepper tree) and Z. piperitum (Japanese pepper tree), which are medicinal and culinary crops in Asia. We identified more than 837 single nucleotide polymorphisms and 103 insertions/deletions (InDels) based on a comparison of the two chloroplast genomes and developed seven DNA markers derived from five tandem repeats and two InDel variations that discriminated between Korean Zanthoxylum species. Metabolite profile analysis pointed to three metabolic groups, one with Korean Z. piperitum samples, one with Korean Z. schinifolium samples, and the last containing all the tested Chinese Zanthoxylum species samples, which are considered to be Z. bungeanum based on our results. Two markers were capable of distinguishing among these three groups. The chloroplast genome sequences identified in this study represent a valuable genomics resource for exploring diversity in Rutaceae, and the molecular markers will be useful for authenticating dried Zanthoxylum berries in the marketplace.

  3. State of the art and challenges in sequence based T-cell epitope prediction

    DEFF Research Database (Denmark)

    Lundegaard, Claus; Hoof, Ilka; Lund, Ole

    2010-01-01

    Sequence based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway......, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreover have so-called pan-specific methods been developed, which allow for prediction of peptide binding...... to MHC alleles characterized by limited or no peptide binding data. Most of the developed methods are publicly available, and have proven to be very useful as a shortcut in epitope discovery. Here, we will go through some of the history of sequence-based predictions of helper as well as cytotoxic T cell...

  4. Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data

    Czech Academy of Sciences Publication Activity Database

    Macas, Jiří; Neumann, Pavel; Novák, Petr; Jiang, J.

    2010-01-01

    Vol. 26, No. 1797 (2010), pp. 2101-2108 ISSN 1367-4803 R&D Projects: GA AV ČR KJB500960802; GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords: next-generation sequencing * satellite repeats * K-mer analysis Subject RIV: EB - Genetics; Molecular Biology Impact factor: 4.877, year: 2010

  5. Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing

    Directory of Open Access Journals (Sweden)

    Li Wei

    2005-05-01

    Full Text Available Abstract Background Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human. Conclusion The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shot-gun sequences to the human-mouse alignment offers the investigator a rapid way to define specific regions for analysis and resequencing.

  6. A group-specific sequence-based typing approach for HLA-DQA1.

    Science.gov (United States)

    Lemin, A J; Johnson, J; Darke, C

    2015-02-01

    An HLA-DQA1 sequence-based typing method reliant upon group-specific amplification to achieve an unambiguous second-field DQA1 typing assignment is presented. Method validation, using 51 reference DNA samples covering 21 different DQA1 alleles, showed 100% concordance with the reference types. This typing strategy has several important uses including identifying DQA1 mismatches in kidney donor/recipient pairs to inform patient DQ antibody assignments. © 2014 John Wiley & Sons Ltd.

  7. Nucleic and amino acid sequences support structure-based viral classification

    OpenAIRE

    Robert M., Sinclair; Janne J., Ravantti; Dennis H., Bamford

    2017-01-01

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome, capsid and between individual capsid proteins (i.e. “capsid architecture”) are intimate and expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made...

  8. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    OpenAIRE

    Sinclair, Robert M.; Ravantti, Janne J.; Bamford, Dennis H.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classifi...

  9. State of the art and challenges in sequence based T-cell epitope prediction

    OpenAIRE

    Lundegaard, Claus; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2010-01-01

    Sequence based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreov...

  10. Sequence-based separation of single-stranded DNA using nucleotides in capillary electrophoresis: focus on phosphate.

    Science.gov (United States)

    Zhang, Xueru; McGown, Linda B

    2013-06-01

    DNA analysis has widespread applicability in biology, medicine, biotechnology, and forensics. DNA separation by length is readily achieved using sieving gels in electrophoresis. Separation by sequence is less simple, generally requiring adequate differences in native or induced conformation or differences in thermal or chemical stability of the strands that are hybridized prior to measurement. We previously demonstrated separation of four single-stranded DNA 76-mers that differ by only a few A-G substitutions based solely on sequence using guanosine-5'-monophosphate (GMP) in the running buffer. We attributed separation to the unique self-assembly of GMP to form higher order structures. Here, we examine an expanded set of 76-mers designed to probe the mechanism of the separation and effects of experimental conditions. We were surprised to find that other ribonucleotides achieved separation similar to that of GMP, and that some separation was achieved using sodium phosphate instead of GMP. Potassium phosphate achieved almost as good separations as the ribonucleotides. This suggests that the separation medium provides a physicochemical environment for the DNA that affects strand migration in a sequence-selective manner. Further investigation is needed to determine whether the mechanism involves specific interactions between the phosphates and the DNA strands or is a result of other properties of the separation medium. Phosphate generally has been avoided in DNA separations by capillary gel electrophoresis because its high ionic strength exacerbates Joule heating. Our results suggest that phosphate compounds should be examined for separation of DNA based on sequence. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  12. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  13. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    Science.gov (United States)

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.
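
    As a rough illustration of the map, shuffle and reduce pattern that Hadoop-style frameworks apply to sequence data, the sketch below counts k-mers across a few reads in plain Python. It is not BioPig code and does not use the Pig language or any Hadoop API; the function names and the example reads are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every window of length k in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence on an in-memory list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Made-up reads, for illustration only.
reads = ["ACGTAC", "GTACGT"]
kmer_counts = count_kmers(reads, 3)
```

    In a real cluster the map calls would run in parallel on separate chunks of the input and the shuffle would move data between machines; here everything runs in one process so that the data flow is easy to follow.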

  14. A genome sequence-based approach to taxonomy of the genus Nocardia.

    Science.gov (United States)

    Tamura, Tomohiko; Matsuzawa, Tetsuhiro; Oji, Syoko; Ichikawa, Natsuko; Hosoyama, Akira; Katsumata, Hiroshi; Yamazoe, Atsushi; Hamada, Moriyuki; Suzuki, Ken-ichiro; Gonoi, Toru; Fujita, Nobuyuki

    2012-10-01

    The genus Nocardia includes both pathogens and producers of useful secondary metabolites. Although 16S rRNA analysis is required to accurately discriminate among phylogenetic relationships of the Nocardia species, most branches of 16S rRNA-based phylogenetic trees are not reliable. In this study, we performed in silico analyses of the genome sequences of Nocardia species in order to understand their diversity and classification for their identification and applications. Draft genome sequences of 26 Nocardia strains were determined. Phylogenetic trees were prepared on the basis of multilocus sequence analysis of the concatenated sequences of 12 genes (atpD-dnaJ-groL1-groL2-gyrB-recA-rpoA-secA-secY-sodA-trpB-ychF) and a bidirectional best hit. To elucidate the evolutionary relationships of these genes, the genome-to-genome distance was investigated on the basis of the average nucleotide identity, DNA maximal unique matches index, and genome-to-genome distance calculator. The topologies of all phylogenetic trees were found to be essentially similar to each other. Furthermore, whole genome-derived and multiple gene-derived relationships were found to be suitable for extensive intra-genus assessment of the genus Nocardia.

  15. Synthesis and evaluation of sequence-specific DNA alkylating agents: effect of alkylation subunits.

    Science.gov (United States)

    Shimizu, Tatsuhiko; Sasaki, Shunta; Minoshima, Masafumi; Shinohara, Ken-ichi; Bando, Toshikazu; Sugiyama, Hiroshi

    2006-01-01

    We have demonstrated that hairpin pyrrole (Py)-imidazole (Im) polyamide-CBI conjugates selectively alkylate predetermined sequences. In this study, we investigated the effect of alkylation subunits, for example conjugates 1-4 with three types of DNA alkylating units, and Py-Im polyamides with indole linker. Conjugates 3 and 4 selectively alkylated the predetermined sequences as described previously, while conjugates 1 and 2 alkylated at mismatched sites.

  16. 16SPIP: a comprehensive analysis pipeline for rapid pathogen detection in clinical samples based on 16S metagenomic sequencing.

    Science.gov (United States)

    Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Tingting; Li, Xiuwen; Zhang, Wen

    2017-12-28

    Pathogen detection in clinical samples based on 16S metagenomic sequencing technology in microbiology laboratories is an important strategy for clinical diagnosis, public health surveillance, and investigations of outbreaks. However, the implementation of the technology is limited by its accuracy and the time required for bioinformatics analysis. Therefore, a simple, standardized, and rapid analysis pipeline from the receipt of clinical samples to the generation of a test report is needed to increase the use of metagenomic analyses in clinical settings. We developed a comprehensive bioinformatics analysis pipeline for the identification of pathogens in clinical samples based on 16S metagenomic sequencing data, named 16SPIP. This pipeline offers two analysis modes (fast and sensitive mode) for the rapid conversion of clinical 16S metagenomic data to test reports for pathogen detection. The pipeline includes tools for data conversion, quality control, merging of paired-end reads, alignment, and pathogen identification. We validated the feasibility and accuracy of the pipeline using a combination of culture and whole-genome shotgun (WGS) metagenomic analyses. 16SPIP may be effective for the analysis of 16S metagenomic sequencing data for real-time, rapid, and unbiased pathogen detection in clinical samples.

  17. Development and evaluation of a non-ribosomal random PCR and next-generation sequencing based assay for detection and sequencing of hand, foot and mouth disease pathogens.

    Science.gov (United States)

    Nguyen, Anh To; Tran, Thanh Tan; Hoang, Van Minh Tu; Nghiem, Ngoc My; Le, Nhu Nguyen Truc; Le, Thanh Thi My; Phan, Qui Tu; Truong, Khanh Huu; Le, Nhan Nguyen Thanh; Ho, Viet Lu; Do, Viet Chau; Ha, Tuan Manh; Nguyen, Hung Thanh; Nguyen, Chau Van Vinh; Thwaites, Guy; van Doorn, H Rogier; Le, Tan Van

    2016-07-07

    Hand, foot and mouth disease (HFMD) has become a major public health problem across the Asia-Pacific region, and is commonly caused by enterovirus A71 (EV-A71) and coxsackievirus A6 (CV-A6), CV-A10 and CV-A16. Generating pathogen whole-genome sequences is essential for understanding their evolutionary biology. The frequent replacements among EV serotypes and the limited number of available whole-genome sequences hinder the development of overlapping PCRs for whole-genome sequencing. We developed and evaluated a non-ribosomal random PCR (rPCR) and next-generation sequencing based assay for sequence-independent whole-genome amplification and sequencing of HFMD pathogens. A total of 16 EV-A71/CV-A6/CV-A10/CV-A16 PCR positive rectal/throat swabs (Cp values: 20.9-33.3) were used for assay evaluation. Our assay evidently outperformed the conventional rPCR in terms of the total number of EV-A71 reads and the percentage of EV-A71 reads: 2.6 % (1275/50,000 reads) vs. 0.1 % (31/50,000) and 6 % (3008/50,000) vs. 0.9 % (433/50,000) for two samples with Cp values of 30 and 26, respectively. Additionally, the assay could generate genome sequences with the percentages of coverage of 94-100 % of 4 different enterovirus serotypes in 73 % of the tested samples, representing the first whole-genome sequences of CV-A6/10/16 from Vietnam, and could correctly assign serotyping results in 100 % of 24 tested specimens. In all but three, the obtained consensuses of two replicates from the same sample were 100 % identical, suggesting that our assay is highly reproducible. In conclusion, we have successfully developed a non-ribosomal rPCR and next-generation sequencing based assay for sensitive detection and direct whole-genome sequencing of HFMD pathogens from clinical samples.
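
    The read fractions quoted in this abstract can be re-derived from the raw counts. The sketch below is only an arithmetic check using the numbers stated above; the helper names and the fold-enrichment ratio are illustrative additions and are not quantities reported by the authors.

```python
def read_percentage(target_reads, total_reads):
    """Percentage of all reads that map to the target virus."""
    return 100.0 * target_reads / total_reads


def fold_enrichment(assay_reads, conventional_reads):
    """Ratio of target reads between two assays run at the same sequencing depth."""
    return assay_reads / conventional_reads


TOTAL_READS = 50_000  # depth quoted in the abstract for each comparison

# Sample with Cp value 30: new assay vs. conventional rPCR.
cp30_assay = read_percentage(1275, TOTAL_READS)
cp30_conventional = read_percentage(31, TOTAL_READS)

# Sample with Cp value 26: new assay vs. conventional rPCR.
cp26_assay = read_percentage(3008, TOTAL_READS)
cp26_conventional = read_percentage(433, TOTAL_READS)

# Derived ratios (not stated in the abstract).
cp30_fold = fold_enrichment(1275, 31)
cp26_fold = fold_enrichment(3008, 433)
```

    The unrounded percentages are about 2.55, 0.06, 6.02 and 0.87, which is consistent with the rounded figures of 2.6 %, 0.1 %, 6 % and 0.9 % given in the abstract.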

  18. Prosodic effects on glide-vowel sequences in three Romance languages

    Science.gov (United States)

    Chitoran, Ioana

    2004-05-01

    Glide-vowel sequences occur in many Romance languages. In some they can vary in production, ranging from diphthongal pronunciation [ja,je] to hiatus [ia,ie]. According to native speakers' impressionistic perceptions, Spanish and Romanian both exhibit this variation, but to different degrees. Spanish favors glide-vowel sequences, while Romanian favors hiatus, occasionally resulting in different pronunciations of the same items: Spanish (b[j]ela, ind[j]ana), Romanian (b[i]ela, ind[i]ana). The third language, French, has glide-vowel sequences consistently (b[j]elle). This study tests the effect of position in the word on the acoustic duration of the sequences. Shorter duration indicates diphthong production [jV], while longer duration, hiatus [iV]. Eleven speakers (4 Spanish, 4 Romanian, 3 French), were recorded. Spanish and Romanian showed a word position effect. Word-initial sequences were significantly longer than word-medial ones (p0.05). In the Spanish and Romanian sentences, V in the sequence bears pitch accent, but not in French. It is therefore possible that duration is sensitive not to the presence/absence of the word boundary, but to its position relative to pitch accent. The results suggest that the word position effect is crucially enhanced by pitch accent on V.

  19. Inter-familial relationships of the shorebirds (Aves: Charadriiformes) based on nuclear DNA sequence data

    Directory of Open Access Journals (Sweden)

    Irestedt Martin

    2003-07-01

    Background: Phylogenetic hypotheses of higher-level relationships in the order Charadriiformes based on morphological data partly disagree with those based on DNA-DNA hybridisation data. So far, these relationships have not been tested by analysis of DNA sequence data. Herein we utilize 1692 bp of aligned, nuclear DNA sequences obtained from 23 charadriiform species, representing 15 families. We also test earlier suggestions that bustards and sandgrouses may be nested with the charadriiforms. The data is analysed with methods based on the parsimony and maximum-likelihood criteria. Results: Several novel phylogenetic relationships were recovered and strongly supported by the data, regardless of which method of analysis was employed. These include placing the gulls and allied groups as a sistergroup to the sandpiper-like birds, and not to the plover-like birds. The auks clearly belong to the clade with the gulls and allies, and are not basal to most other charadriiform birds as suggested in analyses of morphological data. Pluvialis, which has been supposed to belong to the plover family (Charadriidae), represents a basal branch that constitutes the sister taxon to a clade with plovers, oystercatchers and avocets. The thick-knees and sheathbills unexpectedly cluster together. Conclusion: The DNA sequence data contains a strong phylogenetic signal that results in a well-resolved phylogenetic tree with many strongly supported internodes. Taxonomically it is the most inclusive study of shorebird families that relies on nucleotide sequences. The presented phylogenetic hypothesis provides a solid framework for analyses of macroevolution of ecological, morphological and behavioural adaptations observed within the order Charadriiformes.

  20. Compensation of negative sequence stator flux of doubly-fed induction generator using polar voltage control-based direct torque control under unbalanced grid voltage condition

    Directory of Open Access Journals (Sweden)

    Badrinarayan Bansilal Pimple

    2015-02-01

    This study proposes a polar voltage control-based direct torque control method to reduce the effects of unbalanced grid voltage on doubly-fed induction generator (DFIG)-based wind turbine system. Under unbalanced grid voltage, the stator flux has a negative sequence component which leads to second harmonic pulsation in torque, stator active power, stator reactive power, stator current and rotor current. In the control scheme, the negative sequence rotor voltage vector is controlled to compensate the negative sequence stator flux by negative sequence rotor flux. Simulation study is carried out on a 2 MW DFIG system using MATLAB/SIMULINK. Feasibility of the proposed control strategy is experimentally verified on a 1.5 kW DFIG system.

  1. The effect of attentional load on implicit sequence learning in children and young adults

    Directory of Open Access Journals (Sweden)

    Daphné Coomans

    2014-05-01

    We investigated the effect of a secondary task on implicit sequence learning in children and young adults. A serial reaction time task was administered to 8-to-10 year old children and 18-to-22 year old adults. Participants reacted to the location of a target presented in one of four locations on the screen with a spatially corresponding response key. Unknown to participants, the location at which the target appeared was structured according to a deterministic sequence. Occasionally, the black target dot was replaced by a red target dot. To assess the effect of attentional load on implicit sequence learning, half of the participants of each age group were assigned to the single task condition, while the other half executed the task under dual task conditions. Whereas participants in the single task condition could ignore the change in target identity, dual task participants additionally had to count the number of times the black dot was replaced by a red dot to increase the attentional load. Sequence learning was tested under single task conditions in both conditions. Z-transformed results indicate that young adults generally showed more sequence learning than children. Importantly, the secondary task had no effect on sequence learning in children, since children learned as much under dual task conditions as under single task conditions. Adults, on the other hand, showed a different result pattern, as they displayed more sequence learning under single task than under dual task conditions. We surmise that this result is due to the vain attempt of adults, but not children, to integrate both sequences.

  2. Refinement of the Diatom Episome Maintenance Sequence and Improvement of Conjugation-based DNA Delivery Methods

    Directory of Open Access Journals (Sweden)

    Rachel E Diner

    2016-08-01

    Conjugation of episomal plasmids from bacteria to diatoms advances diatom genetic manipulation by simplifying transgene delivery and providing a stable and consistent gene expression platform. To reach its full potential, this nascent technology requires new optimized expression vectors and a deeper understanding of episome maintenance. Here we present the development of an additional diatom vector (pPtPBR1), based on the parent plasmid pBR322, to add a plasmid maintained at medium copy number in E. coli to the diatom genetic toolkit. Using this new vector, we evaluated the contribution of individual yeast DNA elements comprising the 1.4-kb tripartite CEN6-ARSH4-HIS3 sequence that enables episome maintenance in P. tricornutum. While various combinations of these individual elements enable efficient conjugation and high ex-conjugant yield in P. tricornutum, individual elements alone do not. Conjugation of episomes containing CEN6-ARSH4 and a small sequence from the low GC content 3' end of HIS3 produced the highest number of diatom ex-conjugant colonies, resulting in a smaller and more efficient vector design. Our findings suggest that the CEN6 and ARSH4 sequences function differently in yeast and diatoms, and that low GC content regions of greater than ~500 bp are a potential indicator of a functional diatom episome maintenance sequence. Additionally, we have developed improvements to the conjugation protocol including a higher-throughput option utilizing 12-well plates, and plating methods that improve ex-conjugant yield and reduce time and materials required for the conjugation protocol. The data presented offer additional information regarding the mechanism by which the yeast-derived sequence enables diatom episome maintenance, and demonstrate options for flexible vector design.

  3. Refinement of the Diatom Episome Maintenance Sequence and Improvement of Conjugation-Based DNA Delivery Methods.

    Science.gov (United States)

    Diner, Rachel E; Bielinski, Vincent A; Dupont, Christopher L; Allen, Andrew E; Weyman, Philip D

    2016-01-01

    Conjugation of episomal plasmids from bacteria to diatoms advances diatom genetic manipulation by simplifying transgene delivery and providing a stable and consistent gene expression platform. To reach its full potential, this nascent technology requires new optimized expression vectors and a deeper understanding of episome maintenance. Here, we present the development of an additional diatom vector (pPtPBR1), based on the parent plasmid pBR322, to add a plasmid maintained at medium copy number in Escherichia coli to the diatom genetic toolkit. Using this new vector, we evaluated the contribution of individual yeast DNA elements comprising the 1.4-kb tripartite CEN6-ARSH4-HIS3 sequence that enables episome maintenance in Phaeodactylum tricornutum. While various combinations of these individual elements enable efficient conjugation and high exconjugant yield in P. tricornutum, individual elements alone do not. Conjugation of episomes containing CEN6-ARSH4 and a small sequence from the low GC content 3' end of HIS3 produced the highest number of diatom exconjugant colonies, resulting in a smaller and more efficient vector design. Our findings suggest that the CEN6 and ARSH4 sequences function differently in yeast and diatoms, and that low GC content regions of greater than ~500 bp are a potential indicator of a functional diatom episome maintenance sequence. Additionally, we have developed improvements to the conjugation protocol including a high-throughput option utilizing 12-well plates and plating methods that improve exconjugant yield and reduce time and materials required for the conjugation protocol. The data presented offer additional information regarding the mechanism by which the yeast-derived sequence enables diatom episome maintenance and demonstrate options for flexible vector design.

  4. Performance of amplicon-based next generation DNA sequencing for diagnostic gene mutation profiling in oncopathology.

    Science.gov (United States)

    Sie, Daoud; Snijders, Peter J F; Meijer, Gerrit A; Doeleman, Marije W; van Moorsel, Marinda I H; van Essen, Hendrik F; Eijk, Paul P; Grünberg, Katrien; van Grieken, Nicole C T; Thunnissen, Erik; Verheul, Henk M; Smit, Egbert F; Ylstra, Bauke; Heideman, Daniëlle A M

    2014-10-01

    Next generation DNA sequencing (NGS) holds promise for diagnostic applications, yet implementation in routine molecular pathology practice requires performance evaluation on DNA derived from routine formalin-fixed paraffin-embedded (FFPE) tissue specimens. The current study presents a comprehensive analysis of TruSeq Amplicon Cancer Panel-based NGS using a MiSeq Personal sequencer (TSACP-MiSeq-NGS) for somatic mutation profiling. TSACP-MiSeq-NGS (testing 212 hotspot mutation amplicons of 48 genes) and a data analysis pipeline were evaluated in a retrospective learning/test set approach (n = 58/n = 45 FFPE-tumor DNA samples) against 'gold standard' high-resolution-melting (HRM)-sequencing for the genes KRAS, EGFR, BRAF and PIK3CA. Next, the performance of the validated test algorithm was assessed in an independent, prospective cohort of FFPE-tumor DNA samples (n = 75). In the learning set, a number of minimum parameter settings were defined to decide whether an FFPE-DNA sample is qualified for TSACP-MiSeq-NGS and for calling mutations. The resulting test algorithm revealed 82% (37/45) compliance to the quality criteria and 95% (35/37) concordant assay findings for KRAS, EGFR, BRAF and PIK3CA with HRM-sequencing (kappa = 0.92; 95% CI = 0.81-1.03) in the test set. Subsequent application of the validated test algorithm to the prospective cohort yielded a success rate of 84% (63/75), and a high concordance with HRM-sequencing (95% (60/63); kappa = 0.92; 95% CI = 0.84-1.01). TSACP-MiSeq-NGS detected 77 mutations in 29 additional genes. TSACP-MiSeq-NGS is suitable for diagnostic gene mutation profiling in oncopathology.
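
    For readers unfamiliar with the agreement statistics quoted in this abstract, the sketch below shows how a raw concordance rate and Cohen's kappa are computed from an agreement table. The 2x2 table is invented for illustration: it is built so that 35 of 37 samples agree, matching the test-set concordance stated above, but the split between categories is an assumption, so the resulting kappa is not expected to reproduce the reported value of 0.92. The function names are hypothetical.

```python
def concordance(table):
    """Fraction of samples on the diagonal, i.e. where both methods give the same call."""
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return agree / total


def cohen_kappa(table):
    """Cohen's kappa: observed agreement corrected for the agreement expected by chance.

    table[i][j] is the number of samples placed in category i by method A
    and in category j by method B.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = concordance(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1 - expected)


# Invented counts: rows = NGS call (mutated, wild type), columns = HRM-sequencing call.
example_table = [
    [20, 1],
    [1, 15],
]

example_concordance = concordance(example_table)
example_kappa = cohen_kappa(example_table)
```

    With these made-up counts the concordance is 35/37 (about 0.946) and kappa is about 0.89. Kappa is lower than the raw concordance because part of the observed agreement would be expected by chance alone.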

  5. Molecular Characterization of Five Potyviruses Infecting Korean Sweet Potatoes Based on Analyses of Complete Genome Sequences

    Directory of Open Access Journals (Sweden)

    Hae-Ryun Kwak

    2015-12-01

    Sweet potatoes (Ipomoea batatas L.) are grown extensively in tropical and temperate regions and are important food crops worldwide. In Korea, potyviruses, including Sweet potato feathery mottle virus (SPFMV), Sweet potato virus C (SPVC), Sweet potato virus G (SPVG), Sweet potato virus 2 (SPV2), and Sweet potato latent virus (SPLV), have been detected in sweet potato fields at a high (~95%) incidence. In the present work, complete genome sequences of 18 isolates, representing the five potyviruses mentioned above, were compared with previously reported genome sequences. The complete genomes consisted of 10,081 to 10,830 nucleotides, excluding the poly-A tails. Their genomic organizations were typical of the Potyvirus genus, including one target open reading frame coding for a putative polyprotein. Based on phylogenetic analyses and sequence comparisons, the Korean SPFMV isolates belonged to the strains RC and O with >98% nucleotide sequence identity. Korean SPVC isolates had 99% identity to the Japanese isolate SPVC-Bungo and 70% identity to the SPFMV isolates. The Korean SPVG isolates showed 99% identity to the three previously reported SPVG isolates. Korean SPV2 isolates had 97% identity to the SPV2 GWB-2 isolate from the USA. Korean SPLV isolates had a relatively low (88%) nucleotide sequence identity with the Taiwanese SPLV-TW isolates, and they were phylogenetically distantly related to SPFMV isolates. Recombination analysis revealed that possible recombination events occurred in the P1, HC-Pro and NIa-NIb regions of SPFMV and SPLV isolates and these regions were identified as hotspots for recombination in the sweet potato potyviruses.

  6. Estimation of physiological parameters using knowledge-based factor analysis of dynamic nuclear medicine image sequences

    International Nuclear Information System (INIS)

    Yap, J.T.; Chen, C.T.; Cooper, M.

    1995-01-01

    The authors have previously developed a knowledge-based method of factor analysis to analyze dynamic nuclear medicine image sequences. In this paper, the authors analyze dynamic PET cerebral glucose metabolism and neuroreceptor binding studies. These methods have shown the ability to reduce the dimensionality of the data, enhance the image quality of the sequence, and generate meaningful functional images and their corresponding physiological time functions. The new information produced by the factor analysis has now been used to improve the estimation of various physiological parameters. A principal component analysis (PCA) is first performed to identify statistically significant temporal variations and remove the uncorrelated variations (noise) due to Poisson counting statistics. The statistically significant principal components are then used to reconstruct a noise-reduced image sequence as well as provide an initial solution for the factor analysis. Prior knowledge such as the compartmental models or the requirement of positivity and simple structure can be used to constrain the analysis. These constraints are used to rotate the factors to the most physically and physiologically realistic solution. The final result is a small number of time functions (factors) representing the underlying physiological processes and their associated weighting images representing the spatial localization of these functions. Estimation of physiological parameters can then be performed using the noise-reduced image sequence generated from the statistically significant PCs and/or the final factor images and time functions. These results are compared to the parameter estimation using standard methods and the original raw image sequences. Graphical analysis was performed at the pixel level to generate comparable parametric images of the slope and intercept (influx constant and distribution volume)

  7. High resolution profiling of human exon methylation by liquid hybridization capture-based bisulfite sequencing

    Directory of Open Access Journals (Sweden)

    Wang Junwen

    2011-12-01

    Background: DNA methylation plays important roles in gene regulation during both normal developmental and disease states. In the past decade, a number of methods have been developed and applied to characterize the genome-wide distribution of DNA methylation. Most of these methods endeavored to screen the whole genome and turned out to be enormously costly and time consuming for studies of the complex mammalian genome. Thus, they are not practical for researchers to study multiple clinical samples in biomarker research. Results: Here, we present a novel strategy that relies on the selective capture of target regions by liquid hybridization followed by bisulfite conversion and deep sequencing, which is referred to as liquid hybridization capture-based bisulfite sequencing (LHC-BS). To assess this method, we utilized about 2 μg of native genomic DNA from YanHuang (YH) whole blood samples and a mature dendritic cell (mDC) line, respectively, to evaluate the methylation statuses of target regions of the exome. The results indicated that the LHC-BS system was able to cover more than 97% of the exome regions and detect their methylation statuses with acceptable allele dropouts. Most of the regions that couldn't provide accurate methylation information were distributed in chromosomes 6 and Y because of multiple mapping to those regions. The accuracy of this strategy was evaluated by pair-wise comparisons using the results from whole genome bisulfite sequencing and validated by bisulfite specific PCR sequencing. Conclusions: In the present study, we employed a liquid hybridisation capture system to enrich for exon regions and then combined it with bisulfite sequencing to examine the methylation statuses for the first time. This technique is highly sensitive and flexible and can be applied to identify differentially methylated regions (DMRs) at specific genomic locations of interest, such as regulatory elements or promoters.

  8. A phylogenetic framework for the kingdom Fungi based on 18S rRNA gene sequences.

    Science.gov (United States)

    Yarza, Pablo; Yilmaz, Pelin; Panzer, Katrin; Glöckner, Frank Oliver; Reich, Marlis

    2017-12-01

    The usage of molecular phylogenetic approaches is critical to advance the understanding of systematics and community processes in the kingdom Fungi. Among the possible phylogenetic markers (or combinations of them), the 18S rRNA gene appears currently as the most prominent candidate due to its large availability in public databases and informative content. The purpose of this work was the creation of a reference phylogenetic framework that can serve as ready-to-use package for its application on fungal classification and community analysis. The current database contains 9329 representative 18S rRNA gene sequences covering the whole fungal kingdom, a manually curated alignment, an annotated and revised phylogenetic tree with all the sequence entries, updated information on current taxonomy, and recommendations of use. Out of 201 total fungal taxa with more than two sequences in the dataset, 179 were monophyletic. From another perspective, 66% of the entries had a tree-derived classification identical to that obtained from the NCBI taxonomy, whereas 34% differed in one or the other rank. Most of the differences were associated to missing taxonomic assignments in NCBI taxonomy, or the unexpected position of sequences that positioned out of their theoretically corresponding clades. The strong correlation observed with current fungal taxonomy evidences that 18S rRNA gene sequence-based phylogenies are adequate to reflect genealogy of Fungi at the levels of order and above, and justify their further usage and exploration. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Next-generation Sequencing-based genomic profiling: Fostering innovation in cancer care?

    Science.gov (United States)

    Fernandes, Gustavo S; Marques, Daniel F; Girardi, Daniel M; Braghiroli, Maria Ignez F; Coudry, Renata A; Meireles, Sibele I; Katz, Artur; Hoff, Paulo M

    2017-10-01

    With the development of next-generation sequencing (NGS) technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0%) were female, and 91 (58.0%) were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6%) had at least one identified gene alteration. Twenty-four patients (15.2%) underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7%) had partial responses, two (8.3%) had stable disease, and 17 (70.8%) had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. We identified a high prevalence of gene alterations using a next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.

  10. Next-generation Sequencing-based genomic profiling: Fostering innovation in cancer care?

    Directory of Open Access Journals (Sweden)

    Gustavo S. Fernandes

    Full Text Available OBJECTIVES: With the development of next-generation sequencing (NGS) technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. METHODS: We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. RESULTS: From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0%) were female, and 91 (58.0%) were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6%) had at least one identified gene alteration. Twenty-four patients (15.2%) underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7%) had partial responses, two (8.3%) had stable disease, and 17 (70.8%) had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. CONCLUSION: We identified a high prevalence of gene alterations using a next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.

  11. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes.

    Science.gov (United States)

    Ruppitsch, Werner; Pietzka, Ariane; Prior, Karola; Bletz, Stefan; Fernandez, Haizpea Lasa; Allerberger, Franz; Harmsen, Dag; Mellmann, Alexander

    2015-09-01

    Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also enables, due to the availability of the automatically curated cgMLST nomenclature, interlaboratory exchange of data that are crucial, especially for rapid responses during transsectorial outbreaks. Copyright © 2015 Ruppitsch et al.
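
    The gene-by-gene comparison and the ≤10-allele clustering criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes each isolate is represented by a list of allele numbers, one per cgMLST target, and that clusters are formed by single linkage; the record does not state the clustering rule, and the isolate names and profiles below are hypothetical toy data, not taken from the study.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ; loci missing (None) in either profile are skipped."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def single_linkage_clusters(profiles, threshold=10):
    """Group isolates linked by chains of pairwise distances <= threshold (union-find)."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Walk up to the cluster representative, compressing the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in combinations(names, 2):
        if allele_distance(profiles[x], profiles[y]) <= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy profiles over 50 loci (a real scheme would use all cgMLST targets).
profiles = {
    "outbreak_1": [1] * 50,
    "outbreak_2": [2] * 3 + [1] * 47,            # 3 alleles away from outbreak_1
    "outbreak_3": [2] * 3 + [3] * 4 + [1] * 43,  # 7 alleles away from outbreak_1
    "outgroup": [9] * 30 + [1] * 20,             # 30 alleles away from outbreak_1
}
clusters = single_linkage_clusters(profiles, threshold=10)
print(clusters)
```

    With these toy profiles the three outbreak isolates fall into one cluster and the outgroup stays separate; other linkage rules or thresholds would be equally plausible choices.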

  12. Effects of cloning and root-tip size on observations of fungal ITS sequences from Picea glauca roots

    Science.gov (United States)

    Daniel L. Lindner; Mark T. Banik

    2009-01-01

    To better understand the effects of cloning on observations of fungal ITS sequences from Picea glauca (white spruce) roots two techniques were compared: (i) direct sequencing of fungal ITS regions from individual root tips without cloning and (ii) cloning and sequencing of fungal ITS regions from individual root tips. Effect of root tip size was...

  13. Motor sequencing in older adulthood: relationships with executive functioning and effects of complexity.

    Science.gov (United States)

    Niermeyer, Madison A; Suchy, Yana; Ziemnik, Rosemary E

    2017-04-01

    Older adults' motor sequencing performance is more reliant on executive functioning (EF) and more susceptible to complexity than that of younger adults. This study examined for which aspects of motor sequencing performance these relationships hold. Fifty-seven younger and 90 non-demented, community-dwelling, older adults completed selected subtests from the Delis-Kaplan Executive Function System as indices of EF and component processes (CP; graphomotor speed; visual scanning; etc.), as well as a computerized motor sequencing task (Push Turn Taptap task; PTT). The PTT requires participants to perform motor sequences that become progressively more complex across the task's four blocks, and is designed to assess action planning, action learning, and motor control speed and accuracy. Hierarchical regressions using each discrete aspect of performance as the dependent variable revealed that action planning is the only aspect of motor sequencing that is uniquely related to EF (beyond the CP composite) for both age groups. Action learning and motor control accuracy are uniquely associated with EF for older adults only, and only if the sequences are complex. Component processes do not fully account for the unique relationships between motor sequencing and EF in older adults. These results clarify prior findings by showing (a) more aspects of motor sequencing relate to EF for older compared to younger adults and (b) for these unique relationships, EF is only related to action during the generation of sequences that are complex. These findings further our understanding of how aging shapes the links between EF and motor actions, and can be used in evidence-based and theoretically driven intervention programs that promote healthy aging.

  14. Automated Clustering Analysis of Immunoglobulin Sequences in Chronic Lymphocytic Leukemia Based on 3D Structural Descriptors

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Mochament, Konstantinos; Agathangelidis, Andreas

    2016-01-01

    study, we used the structure prediction tools PIGS and I-TASSER for creating the 3D models and the TM-align algorithm to superpose them. The innovation of the current methodology resides in the usage of methods adapted from 3D content-based search methodologies to determine the local structural...... determine it are extremely laborious and demanding. Hence, the ability to gain insight into the structure of Igs at large relies on the availability of tools and algorithms for producing accurate Ig structural models based on their primary sequence alone. These models can then be used to determine...

  15. Automated Clustering Analysis of Immunoglobulin Sequences in Chronic Lymphocytic Leukemia Based on 3D Structural Descriptors

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Mochament, Konstantinos; Agathangelidis, Andreas

    2016-01-01

    (4.5%) subset #4 model (subsets #4 and #8 concern IgG CLL, in itself a rarity for CLL). These findings support that the innovative workflow described here enables robust clustering of 3D models produced from Ig sequences from patients with CLL. Furthermore, they indicate that CLL classification based...... study, we used the structure prediction tools PIGS and I-TASSER for creating the 3D models and the TM-align algorithm to superpose them. The innovation of the current methodology resides in the usage of methods adapted from 3D content-based search methodologies to determine the local structural...

  16. A Chaos-Based Secure Direct-Sequence/Spread-Spectrum Communication System

    Directory of Open Access Journals (Sweden)

    Nguyen Xuan Quyen

    2013-01-01

    Full Text Available This paper proposes a chaos-based secure direct-sequence/spread-spectrum (DS/SS) communication system which is based on a novel combination of the conventional DS/SS and chaos techniques. In the proposed system, bit duration is varied according to a chaotic behavior but is always equal to a multiple of the fixed chip duration in the communication process. Data bits with variable duration are spectrum-spread by multiplying directly with a pseudonoise (PN) sequence and then modulated onto a sinusoidal carrier by means of binary phase-shift keying (BPSK). To recover exactly the data bits, the receiver needs an identical regeneration of not only the PN sequence but also the chaotic behavior, and hence data security is improved significantly. Structure and operation of the proposed system are analyzed in detail. Theoretical evaluation of bit-error rate (BER) performance in presence of additive white Gaussian noise (AWGN) is provided. Parameter choice for different cases of simulation is also considered. Simulation and theoretical results are shown to verify the reliability and feasibility of the proposed system. Security of the proposed system is also discussed.
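
    The idea of a chaotically varying bit duration that always spans a whole number of chips can be sketched as follows. This is only an illustration under stated assumptions: the logistic map, the range of 4 to 8 chips per bit, and the use of Python's seeded `random` module as a stand-in PN generator are all choices made here, not details given in the record. The BPSK carrier and the AWGN channel are omitted, so the ±1 chips represent a noise-free baseband signal.

```python
import random


def logistic_map(x, r=3.99):
    """One iteration of the logistic map (assumed chaotic generator)."""
    return r * x * (1.0 - x)


def chaotic_spreading_factors(n_bits, x0=0.37, min_chips=4, max_chips=8):
    """Return a chips-per-bit value for each data bit, driven by the chaotic state."""
    factors, x = [], x0
    span = max_chips - min_chips + 1
    for _ in range(n_bits):
        x = logistic_map(x)
        factors.append(min(max_chips, min_chips + int(x * span)))
    return factors


def pn_sequence(length, seed=1234):
    """Stand-in PN sequence of +/-1 chips (a real system would typically use an LFSR)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, factors, pn):
    """Multiply each bipolar data bit by its own run of PN chips."""
    chips, pos = [], 0
    for bit, n_chips in zip(bits, factors):
        symbol = 1 if bit else -1
        chips.extend(symbol * pn[pos + k] for k in range(n_chips))
        pos += n_chips
    return chips


def despread(chips, factors, pn):
    """Correlate each chip run with the local PN copy and decide the bit by sign."""
    bits, pos = [], 0
    for n_chips in factors:
        corr = sum(chips[pos + k] * pn[pos + k] for k in range(n_chips))
        bits.append(1 if corr > 0 else 0)
        pos += n_chips
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
factors = chaotic_spreading_factors(len(data))   # transmitter side
pn = pn_sequence(sum(factors))
tx = spread(data, factors, pn)
rx = despread(tx, chaotic_spreading_factors(len(data)), pn)  # receiver regenerates both
print(factors, rx == data)
```

    The receiver recovers the data only because it regenerates the same chaotic sequence of bit durations and the same PN chips; a receiver with a different initial state `x0` would generally place the bit boundaries incorrectly.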

  17. 3D surface reconstruction based on image stitching from gastric endoscopic video sequence

    Science.gov (United States)

    Duan, Mengyao; Xu, Rong; Ohya, Jun

    2013-09-01

    This paper proposes a method for reconstructing 3D detailed structures of internal organs such as gastric wall from endoscopic video sequences. The proposed method consists of four major steps: feature-point-based 3D reconstruction, 3D point cloud stitching, dense point cloud creation and Poisson surface reconstruction. Before the first step, we partition one video sequence into groups, where each group consists of two successive frames (image pairs), and each pair in each group contains one overlapping part, which is used as a stitching region. First, the 3D point cloud of each group is reconstructed by utilizing structure from motion (SFM). Secondly, a scheme based on SIFT features registers and stitches the obtained 3D point clouds, by estimating the transformation matrix of the overlapping part between different groups with high accuracy and efficiency. Thirdly, we select the most robust SIFT feature points as the seed points, and then obtain the dense point cloud from the sparse point cloud via a depth testing method presented by Furukawa. Finally, by utilizing Poisson surface reconstruction, polygonal patches for the internal organs are obtained. Experimental results demonstrate that the proposed method achieves a high accuracy and efficiency for 3D reconstruction of gastric surface from an endoscopic video sequence.

  18. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

    Science.gov (United States)

    Harris, R. Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P.; Hong, Chibo; Downey, Sara L.; Johnson, Brett E.; Fouse, Shaun D.; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J.; Gu, Junchen; Echipare, Lorigail; O’Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B.; Bernstein, Bradley E.; Hawkins, R. David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q.; Haussler, David; Ecker, Joseph; Li, Wei; Farnham, Peggy J.; Waterland, Robert A.; Meissner, Alexander; Marra, Marco A.; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F.

    2010-01-01

    Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635

  19. Segmentation of Fetal Left Ventricle in Echocardiographic Sequences Based on Dynamic Convolutional Neural Networks.

    Science.gov (United States)

    Yu, Li; Guo, Yi; Wang, Yuanyuan; Yu, Jinhua; Chen, Ping

    2017-08-01

    Segmentation of fetal left ventricle (LV) in echocardiographic sequences is important for further quantitative analysis of fetal cardiac function. However, image gross inhomogeneities and fetal random movements make the segmentation a challenging problem. In this paper, a dynamic convolutional neural network (CNN) based on multiscale information and fine-tuning is proposed for fetal LV segmentation. The CNN is pretrained on labeled training data. In the segmentation, the first frame of each echocardiographic sequence is delineated manually. The dynamic CNN is fine-tuned by deep tuning with the first frame and shallow tuning with the rest of the frames, respectively, to adapt to the individual fetus. Additionally, to separate the connection region between LV and left atrium (LA), a matching approach, which consists of block matching and line matching, is used for mitral valve (MV) base points tracking. The proposed method is compared with an active contour model (ACM), a dynamical appearance model (DAM), and a fixed multiscale CNN method. Experimental results in 51 echocardiographic sequences show that the segmentation results agree well with the ground truth, especially in the cases with leakage, blurry boundaries, and subject-to-subject variations. The CNN architecture can be simple, and the dynamic fine-tuning is efficient.

  20. Distinctively variable sequence-based nuclear DNA markers for multilocus phylogeography of the soybean- and rice-infecting fungal pathogen Rhizoctonia solani AG-1 IA

    Science.gov (United States)

    2009-01-01

    A series of multilocus sequence-based nuclear DNA markers was developed to infer the phylogeographical history of the Basidiomycetous fungal pathogen Rhizoctonia solani AG-1 IA infecting rice and soybean worldwide. The strategy was based on sequencing of cloned genomic DNA fragments (previously used as RFLP probes) and subsequent screening of fungal isolates to detect single nucleotide polymorphisms (SNPs). Ten primer pairs were designed based on these sequences, which resulted in PCR amplification of 200-320 bp size products and polymorphic sequences in all markers analyzed. By direct sequencing we identified both homokaryon and heterokaryon (i.e. dikaryon) isolates at each marker. Cloning the PCR products effectively estimated the allelic phase from heterokaryotic isolates. Information content varied among markers from 0.5 to 5.9 mutations per 100 bp. Thus, the former RFLP codominant probes were successfully converted into six distinctively variable sequence-based nuclear DNA markers. Rather than discarding low polymorphism loci, the combination of these distinctively variable anonymous nuclear markers would constitute an asset for the unbiased estimate of the phylogeographical parameters such as population sizes and divergent times, providing a more reliable species history that shaped the current population structure of R. solani AG-1 IA. PMID:21637462

  1. Multiple ECG Fiducial Points-Based Random Binary Sequence Generation for Securing Wireless Body Area Networks.

    Science.gov (United States)

    Zheng, Guanglou; Fang, Gengfa; Shankaran, Rajan; Orgun, Mehmet A; Zhou, Jie; Qiao, Li; Saleem, Kashif

    2017-05-01

    Generating random binary sequences (BSes) is a fundamental requirement in cryptography. A BS is a sequence of N bits, and each bit has a value of 0 or 1. For securing sensors within wireless body area networks (WBANs), electrocardiogram (ECG)-based BS generation methods have been widely investigated in which interpulse intervals (IPIs) from each heartbeat cycle are processed to produce BSes. Using these IPI-based methods to generate a 128-bit BS in real time normally takes around half a minute. In order to improve the time efficiency of such methods, this paper presents an ECG multiple fiducial-points based binary sequence generation (MFBSG) algorithm. The technique of discrete wavelet transforms is employed to detect arrival time of these fiducial points, such as P, Q, R, S, and T peaks. Time intervals between them, including RR, RQ, RS, RP, and RT intervals, are then calculated based on this arrival time, and are used as ECG features to generate random BSes with low latency. According to our analysis on real ECG data, these ECG feature values exhibit the property of randomness and, thus, can be utilized to generate random BSes. Compared with the schemes that solely rely on IPIs to generate BSes, this MFBSG algorithm uses five feature values from one heart beat cycle, and can be up to five times faster than the solely IPI-based methods. So, it achieves a design goal of low latency. According to our analysis, the complexity of the algorithm is comparable to that of fast Fourier transforms. These randomly generated ECG BSes can be used as security keys for encryption or authentication in a WBAN system.
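
    A rough sketch of why using several intervals per heartbeat shortens key generation is given below. The record does not describe how bits are extracted from each interval, so the rule used here (keep the 4 low-order bits of each interval rounded to whole milliseconds) is purely an assumption, and the interval values are hypothetical numbers chosen for illustration, not measurements from the paper.

```python
def intervals_to_bits(intervals_ms, bits_per_feature=4):
    """Keep the low-order bits of each interval (assumed extraction rule), MSB first."""
    bits = []
    mask = (1 << bits_per_feature) - 1
    for value in intervals_ms:
        low = int(round(value)) & mask
        bits.extend((low >> shift) & 1 for shift in reversed(range(bits_per_feature)))
    return bits


def beats_needed(key_bits, features_per_beat, bits_per_feature=4):
    """Number of heartbeat cycles required to collect key_bits bits (ceiling division)."""
    per_beat = features_per_beat * bits_per_feature
    return -(-key_bits // per_beat)


# Hypothetical RR, RQ, RS, RP and RT intervals (ms) from one heartbeat cycle.
one_beat = [812, 38, 41, 163, 287]
bits = intervals_to_bits(one_beat)
print(bits)

# Beats required for a 128-bit sequence: one feature per beat versus five.
print(beats_needed(128, features_per_beat=1), beats_needed(128, features_per_beat=5))
```

    Under these assumptions a 128-bit sequence needs 32 beats with one feature per beat and 7 beats with five features per beat, which is in line with the "up to five times faster" claim in the abstract. Whether low-order interval bits are random enough for cryptographic use would need the kind of statistical analysis the authors report.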

  2. ITS-2 sequences-based identification of Trichogramma species in South America

    Directory of Open Access Journals (Sweden)

    R. P. Almeida

    Full Text Available Abstract ITS2 (Internal transcribed spacer 2) sequences have been used in systematic studies and proved to be useful in providing a reliable identification of Trichogramma species. rDNA sequences ranged in size from 379 to 632 bp. In eleven T. pretiosum lines Wolbachia-induced parthenogenesis was found for the first time. These thelytokous lines were collected in Peru (9), Colombia (1) and USA (1). A dichotomous key for species identification was built based on the size of the ITS2 PCR product and restriction analysis using three endonucleases (EcoRI, MseI and MaeI). This molecular technique was successfully used to distinguish among seventeen native/introduced Trichogramma species collected in South America.

  3. Intergeneric Classification of Genus Bulbophyllum from Peninsular Malaysia Based on Combined Morphological and RBCL Sequence Data

    International Nuclear Information System (INIS)

    Hosseini, S.; Dadkhah, K.

    2016-01-01

    Bulbophyllum Thou. is the largest genus in the Orchidaceae family and a well-known plant of tropical areas. The present study provides a comparative morphological study of 38 Bulbophyllum spp. as well as molecular sequence analysis of the large subunit of rubisco (rbcL), to infer the intergeneric classification for the studied taxa of genus Bulbophyllum. Thirty morphological characters were coded in a data matrix, and used in phenetic analysis. The morphological result was strongly consistent with earlier classification, with the exception of the status of B. auratum, B. gracillimum, B. mutabile and B. limbatum. Furthermore, molecular data analysis of rbcL was congruent with morphological data in some aspects. Species interrelationships were specified using a combination of rbcL sequence data with morphological data. The results revealed close affiliation in 11 sections of Bulbophyllum from Peninsular Malaysia. Consequently, based on this study, the generic status of sections Cirrhopetalum and Epicrianthes can no longer be supported, as they are deeply embedded within the genus Bulbophyllum. (author)

  4. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    Full Text Available The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire study of infectious diseases achieved by traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  5. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...... in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material is also...

  6. Validation of risk stratification models in acute myeloid leukemia using sequencing-based molecular profiling.

    Science.gov (United States)

    Wang, M; Lindberg, J; Klevebring, D; Nilsson, C; Mer, A S; Rantalainen, M; Lehmann, S; Grönberg, H

    2017-10-01

    Risk stratification of acute myeloid leukemia (AML) patients needs improvement. Several AML risk classification models based on somatic mutations or gene-expression profiling have been proposed. However, systematic and independent validation of these models is required for future clinical implementation. We performed whole-transcriptome RNA-sequencing and panel-based deep DNA sequencing of 23 genes in 274 intensively treated AML patients (Clinseq-AML). We also utilized The Cancer Genome Atlas (TCGA)-AML study (N=142) as a second validation cohort. We evaluated six previously proposed molecular-based models for AML risk stratification and two revised risk classification systems combining molecular and clinical data. Risk groups stratified by five out of six models showed different overall survival in cytogenetic normal-AML patients in the Clinseq-AML cohort (P-value0.5). Risk classification systems integrating mutational or gene-expression data were found to add prognostic value to the current European Leukemia Net (ELN) risk classification. The prognostic value varied between models and across cohorts, highlighting the importance of independent validation to establish evidence of efficacy and general applicability. All but one model replicated in the Clinseq-AML cohort, indicating the potential for molecular-based AML risk models. Risk classification based on a combination of molecular and clinical data holds promise for improved AML patient stratification in the future.

  7. Including load sequence effects in the fatigue damage estimation of an offshore wind turbine substructure

    NARCIS (Netherlands)

    Dragt, R.C.; Maljaars, J.; Tuitman, J.T.; Salman, Y.; Otheguy, M.E.

    2016-01-01

    Retardation is a load sequence effect, which causes a reduced fatigue crack growth rate after an overload is encountered. Retardation can be cancelled when the overload is followed by an underload. The net effect is beneficial to the fatigue lifetime of Offshore Wind Turbines (OWTs). To be able to

  8. Molecular Phylogeny of Triticum and Aegilops Genera Based on ITS and MATK Sequence Data

    International Nuclear Information System (INIS)

    Dizkirici, A.; Kansu, C.; Onde, S.

    2016-01-01

    Understanding the phylogenetic relationship between Triticum and Aegilops species, which form a vast gene pool of wheat, is very important for breeding new cultivated wheat varieties. In the present study, phylogenetic relationships between Triticum (12 samples from 4 species) and Aegilops (24 samples from 8 species) were investigated using sequences of the nuclear ITS rDNA gene and partial sequences of the matK gene of the chloroplast genome. The phylogenetic relationships among species were reconstructed using the Maximum Likelihood method. The constructed tree based on the sequences of the nuclear component (ITS) displayed a close relationship between polyploid wheats and Aegilops speltoides species, which provided new evidence for Ae. speltoides as the source of the enigmatic B genome donor. Concurrent clustering of Ae. cylindrica and Ae. tauschii and their close positioning to polyploid wheats pointed to one of these species as the source of the D genome. As reported before, diploid Triticum species (i.e. T. urartu) were identified as the A genome donors, and the positioning of these diploid wheats on the constructed tree is meaningful. The constructed tree based on the chloroplastic matK sequences displayed the same relationship between polyploid wheats and Ae. speltoides species, providing evidence for the latter species being the chloroplast donor for polyploid wheats. Therefore, our results supported the idea of coinheritance of nuclear and chloroplast genomes where Ae. speltoides was the maternal donor. For both trees the remaining Aegilops species produced a distinct cluster whereas, with the exception of T. urartu, diploid Triticum species displayed a monophyletic structure. (author)

  9. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Full Text Available One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TF) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  10. Genotyping of human neutrophil antigens by polymerase chain reaction sequence-based typing

    Science.gov (United States)

    He, Junjun; Zhang, Wei; Wang, Wei; Chen, Nanying; Han, Zhedong; He, Ji; Zhu, Faming; Lv, Hangjun

    2014-01-01

    Background: Genotyping for human neutrophil antigen (HNA) systems is required in the investigation of disorders involving alloimmunisation to HNA. We established a polymerase chain reaction sequence-based typing method for genotyping HNA and determined the genotype and allele frequencies of HNA in the Zhejiang Han population of China. Materials and methods: Four hundred healthy, unrelated Zhejiang Han individuals were recruited. Specific primers for HNA were designed and the polymerase chain reaction amplification conditions were optimised. Amplicons were purified with enzyme digestion and then sequenced. Results: The frequencies of the FCGR3B*01 and FCGR3B*02 alleles were 0.613 and 0.387; no FCGR3B*03 allele was found. The frequencies of the SLC44A2*1 and SLC44A2*2 alleles were 0.654 and 0.346, respectively, while the frequencies of the ITGAL*1 (HNA-5a) and ITGAL*2 (HNA-5b) alleles were 0.896 and 0.104. Only the ITGAM*1 (HNA-4a) allele was found in this study. Six single nucleotide polymorphisms were confirmed on sequenced regions separate from HNA polymorphisms, including FCGR3B (IVS3+39G >A and IVS3+52G >A), CD177 (172A >G), SLC44A2 (IVS5-44A >G and IVS7-15T >C) and ITGAM (IVS3+118T >C). Discussion: The polymerase chain reaction sequence-based typing method for genotyping HNA is reliable. These HNA allele frequency data could contribute to the analysis of alloimmunisation to HNA in China. PMID:23867183
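
    Allele frequencies such as those reported above are conventionally obtained by gene counting over diploid genotypes. The sketch below shows that calculation for a generic biallelic locus; the genotype counts are hypothetical values invented for illustration and are not the study's data, and loci with gene deletions or copy-number variation would need a more careful treatment than this simple count.

```python
def allele_frequencies(genotype_counts):
    """Gene-counting estimate: each individual contributes two alleles.

    genotype_counts maps (allele_1, allele_2) tuples to a number of individuals.
    """
    allele_counts = {}
    total_alleles = 0
    for (first, second), n_individuals in genotype_counts.items():
        for allele in (first, second):
            allele_counts[allele] = allele_counts.get(allele, 0) + n_individuals
        total_alleles += 2 * n_individuals
    return {allele: count / total_alleles for allele, count in allele_counts.items()}


# Hypothetical genotype counts for 200 individuals at a biallelic locus.
counts = {("1", "1"): 90, ("1", "2"): 80, ("2", "2"): 30}
freqs = allele_frequencies(counts)
print(freqs)
```

    For these made-up counts, allele 1 contributes 2 × 90 + 80 = 260 of 400 alleles and allele 2 contributes 80 + 2 × 30 = 140 of 400.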

  11. The Pawnee Sequence: Poroelastic Effects from Injection in Osage County, Oklahoma

    Science.gov (United States)

    Barbour, A. J.; Rubinstein, J. L.

    2016-12-01

    Aggregate multi-year records of wastewater injection in Oklahoma show that the strongest change in injection within 20 km of the 2016 M5.8 Pawnee strike-slip earthquake was in Osage County, where injection rates increased rapidly in late-2012 by nearly a factor of three above previous levels. After this increase, rates there declined steadily over two years to an average rate characteristic of all other injection wells in Pawnee and Noble Counties, remaining relatively constant until the beginning of the earthquake sequence. Here we test if poroelastic effects associated with this injection-rate transient can help explain the relative timing between peak injection rates and the beginning of the Pawnee sequence. Although the alternative hypothesis that regional-scale faults and fractures in critically stressed rock serve as fast-pathways for fluid diffusion cannot be ruled out, it appears to be difficult to reconcile based solely on injection data and space-time patterns for this seismic sequence. We simulate the cylindrically symmetric, transient strain and pore pressure fields for an injection-source time function emulating the injection history in a layered half-space in accordance with linear poroelasticity. In the simulation domain, injection occurs at depths of 1300 - 1900 m, into a homogeneous basal sedimentary reservoir representing the Arbuckle Group, overlying a semi-infinite layer representing granitic basement; we determined the hydraulic, elastic, and poroelastic properties of these layers from published literature. At the mainshock hypocenter, this numerical model predicts a delay between peak injection rates and pore pressure increase that is strongly dependent on hydraulic diffusivity; however, the duration is also controlled by the bulk elastic properties and the undrained Skempton's coefficient of the rock. Furthermore, because of fluid-strain coupling, pore pressures in the basement rock decrease during this delay period, which would tend to

  12. Identifying and calling insertions, deletions, and single-base mutations efficiently from sequence data

    Science.gov (United States)

    Whole genome sequencing studies can directly identify causative mutations for subsequent use in genomic evaluations, but sequence variant identification is a lengthy and sometimes inaccurate process. The speed and accuracy of identifying small insertions and deletions of sequence, collectively terme...

  13. RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences

    OpenAIRE

    Rubino, Francesco; Attimonelli, Marcella

    2009-01-01

    Background One of the most frequent uses of bioinformatics tools concerns functional characterization of a newly produced nucleotide sequence (a query sequence) by applying Blast or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by...

  14. Genome sequence-based species delimitation with confidence intervals and improved distance functions

    Science.gov (United States)

    2013-01-01

    Background: For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that mimic the wet-lab DDH values as closely as possible to ensure consistency in the prokaryotic species concept. Results: Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions. Conclusions: Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the

  15. Sequence-dependent base-stacking stabilities guide tRNA folding energy landscapes.

    Science.gov (United States)

    Li, Rongzhong; Ge, Heming W; Cho, Samuel S

    2013-10-24

    The folding of bacterial tRNAs with disparate sequences has been observed to proceed in distinct folding mechanisms despite their structural similarity. To explore the folding landscapes of tRNA, we performed ion concentration-dependent coarse-grained TIS model MD simulations of several E. coli tRNAs to compare their thermodynamic melting profiles to the classical absorbance spectra of Crothers and co-workers. To independently validate our findings, we also performed atomistic empirical force field MD simulations of tRNAs, and we compared the base-to-base distances from coarse-grained and atomistic MD simulations to empirical base-stacking free energies. We then projected the free energies to the secondary structural elements of tRNA, and we observe distinct, parallel folding mechanisms whose differences can be inferred on the basis of their sequence-dependent base-stacking stabilities. In some cases, a premature, nonproductive folding intermediate corresponding to the Ψ hairpin loop must backtrack to the unfolded state before proceeding to the folded state. This observation suggests a possible explanation for the fast and slow phases observed in tRNA folding kinetics.

  16. The application of MutMap in forward genetic studies based on whole-genome sequencing.

    Science.gov (United States)

    Yuan, Jin Hong; Li, Jun Hua; Yuan, Jiao Jiao; Jia, Ke Li; Li, Shu Fen; Deng, Chuan Liang; Gao, Wu Jun

    2017-12-20

    Classical forward genetic analysis relies on the construction of complicated progeny populations and the development of many molecular markers for linkage analysis in genetic mapping, which is both time-consuming and costly. The recently developed MutMap is a new forward genetic approach based on high-throughput next-generation sequencing technologies. It is more efficient and affordable than traditional methods. Moreover, new extended methods based on MutMap have been developed: MutMap+, which is based on self-crossing; MutMap-Gap, which is used to recognize causative variations occurring in genome gap regions; and QTL-seq, a method similar to MutMap for mapping quantitative trait loci. These methods do not require the construction of complicated mapping populations, genetic hybridization or linkage information. They have greatly accelerated the identification of genetic elements associated with phenotypic variation of interest. Here, we review the basic principles of MutMap and its extensions, and discuss their future applications in next-generation sequencing-based forward genetic mapping and crop improvement.

  17. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcriptional evidence. Analysis of repetitive sequences suggests that they are underrepresented in the reference assembly, reflecting an enrichment of gene-rich regions in the current assembly. Characterization of Lotus natural variation by resequencing of L. japonicus accessions and diploid Lotus species is currently ongoing, facilitated by the MG20 reference sequence...

  18. Development of a graphene oxide-based assay for the sequence-specific detection of double-stranded DNA molecules.

    Directory of Open Access Journals (Sweden)

    Anna Maria Giuliodori

    Graphene oxide (GO) is a promising material for the development of cost-effective detection systems. In this work, we have devised a simple and rapid GO-based method for the sequence-specific identification of DNA molecules generated by PCR amplification. The csp genes of Escherichia coli, which share a high degree of sequence identity, were selected as paradigm DNA templates. All tested csp genes were amplified with unlabelled primers, which can be rapidly removed at the end of the PCR by taking advantage of the preferential binding to GO of single-stranded versus duplex DNA molecules. The amplified DNAs (targets) were heat-denatured and hybridized to a fluorescently-labelled single-stranded oligonucleotide (probe), which recognizes a region of the target DNAs displaying sequence variability. This interaction is extremely specific, taking place with high efficiency only when target and probe show perfect or near-perfect matching. Upon GO addition, the unbound fraction of the probe was captured and its fluorescence quenched by the GO's molecular properties. On the other hand, the probe-target complexes remained in solution and emitted a fluorescent signal whose intensity was related to their degree of complementarity.

  19. Feasibility of a RARE-based sequence for quantitative diffusion-weighted MRI of the spine

    International Nuclear Information System (INIS)

    Raya, J.G.; Dietrich, O.; Sommer, J.; Reiser, M.F.; Baur-Melnyk, A.; Birkenmaier, C.

    2007-01-01

    The feasibility of a diffusion-weighted single-shot fast-spin-echo sequence for the diagnostic work-up of bone marrow diseases was assessed. Twenty healthy controls and 16 patients with various bone marrow pathologies of the spine (bone marrow edema, tumor and inflammation) were examined with a diffusion-weighted single-shot sequence based on a modified rapid acquisition with relaxation enhancement (mRARE) technique; four diffusion weightings (b-values: 50, 250, 500 and 750 s/mm²) in three orthogonal orientations were applied. Apparent diffusion coefficients (ADCs) were determined in the bone marrow and in the intervertebral discs of healthy volunteers and in diseased bone marrow. Ten of the 20 volunteers were repeatedly scanned within 30 min to examine short-time reproducibility. Spatial reproducibility was assessed by measuring ADCs in two different slices including the same lesion in 12 patients. The ADCs of the lesions exhibited significantly higher values, (1.27 ± 0.32) × 10⁻³ mm²/s, compared with healthy bone marrow, (0.21 ± 0.10) × 10⁻³ mm²/s. Short-time and spatial reproducibility had a mean coefficient of variation of 2.1% and 6.4%, respectively. The diffusion-weighted mRARE sequence provides a reliable tool for determining quantitative ADCs in vertebral bone marrow with adequate image quality. (orig.)
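
    As an illustration of how an ADC can be derived from signals acquired at several b-values, the sketch below fits the standard mono-exponential model S(b) = S0 · exp(-b · ADC) by log-linear least squares. The abstract does not state which fitting procedure the authors used, so the fit itself is an assumption; the function name `fit_adc` is hypothetical, and the signal values are synthetic and noise-free, generated from the lesion ADC reported above.

```python
import math

def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    Returns (adc, s0); adc is in mm^2/s when b_values are in s/mm^2.
    """
    log_s = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    # Ordinary least-squares slope and intercept of ln(S) against b
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_b
    return -slope, math.exp(intercept)

# b-values from the abstract (s/mm^2); the signals are synthetic (assumption)
b_values = [50, 250, 500, 750]
true_adc = 1.27e-3  # lesion ADC reported in the abstract, mm^2/s
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]

adc, s0 = fit_adc(b_values, signals)
print(f"ADC = {adc:.2e} mm^2/s, S0 = {s0:.1f}")
```

    With real, noisy data the low signal at high b-values can bias a log-linear fit, so a weighted or non-linear fit may be preferable.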

  20. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for illumina genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Angelova, Angelina [University of Arizona; Park, Sang-Hycuk [University of Arizona; Kyndt, John [Bellevue University; Fitzsimmons, Kevin [University of Arizona; Brown, Judith K [University of Arizona

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAG pathways. In this study, intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 bp (84.58 kb) in size, with a GC content of 30.8%. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. protothecoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.

  1. The Teaching of Biochemistry: An Innovative Course Sequence Based on the Logic of Chemistry

    Science.gov (United States)

    Jakubowski, Henry V.; Owen, Whyte G.

    1998-06-01

    An innovative course sequence for the teaching of biochemistry is offered, which more truly reflects the common philosophy found in biochemistry texts: that the foundation of biological phenomena can best be understood through the logic of chemistry. Topic order is chosen to develop an emerging understanding that is based on chemical principles. Preeminent biological questions serve as a framework for the course. Lipid and lipid-aggregate structures are introduced first, since it is more logical to discuss the intermolecular association of simple amphiphiles to form micelles and bilayers than to discuss the complexities of protein structure/folding. Protein, nucleic acid, and carbohydrate structures are studied next. Binding, a noncovalent process and the simplest expression of macromolecular function, follows. The physical (noncovalent) transport of solute molecules across a biological membrane is studied next, followed by the chemical transformation of substrates by enzymes. These are logical extensions of the expression of molecular function, first involving a simpler (physical transport) and second, a more complex (covalent transformation) process. The final sequence involves energy and signal transduction. This unique course sequence emerges naturally when chemical logic is used as an organizing paradigm for structuring a biochemistry course. Neither the traditional order, which seems to reflect historic trends in research, nor an order derived from the central dogma of biology can provide this logical framework.

  2. Genotyping of B. licheniformis based on a novel multi-locus sequence typing (MLST) scheme

    Directory of Open Access Journals (Sweden)

    Madslien Elisabeth H

    2012-10-01

    Background: Bacillus licheniformis has for many years been used in the industrial production of enzymes, antibiotics and detergents. However, as a producer of dormant heat-resistant endospores, B. licheniformis might contaminate semi-preserved foods. The aim of this study was to establish a robust and novel genotyping scheme for B. licheniformis in order to reveal the evolutionary history of 53 strains of this species. Furthermore, the genotyping scheme was also investigated for its use to detect food-contaminating strains. Results: A multi-locus sequence typing (MLST) scheme, based on the sequences of six house-keeping genes (adk, ccpA, recF, rpoB, spo0A and sucC) of 53 B. licheniformis strains from different sources, was established. The result of the MLST analysis supported previous findings of two different subgroups (lineages) within this species, named “A” and “B”. Statistical analysis of the MLST data indicated a higher rate of recombination within group “A”. Food isolates were widely dispersed in the MLST tree and could not be distinguished from the other strains. However, the food-contaminating strain B. licheniformis NVH1032, represented by a unique sequence type (ST8), was distantly related to all other strains. Conclusions: In this study, a novel and robust genotyping scheme for B. licheniformis was established, separating the species into two subgroups. This scheme could be used for further studies of evolution and population genetics in B. licheniformis.
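
    The bookkeeping behind an MLST scheme can be sketched as follows: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers across the six loci receives a sequence type (ST). This is a minimal sketch, not the authors' pipeline; the function name, the strain names and the four-base toy sequences are hypothetical, and only the locus names come from the abstract.

```python
LOCI = ("adk", "ccpA", "recF", "rpoB", "spo0A", "sucC")  # loci named in the abstract

def assign_sequence_types(strains):
    """Map each strain to (allele_profile, sequence_type).

    strains: {strain_name: {locus: nucleotide_sequence}}
    Alleles and sequence types are numbered from 1 in order of first appearance.
    """
    allele_ids = {locus: {} for locus in LOCI}
    st_ids = {}
    result = {}
    for name, genes in strains.items():
        # A sequence not seen before at this locus gets the next allele number
        profile = tuple(
            allele_ids[locus].setdefault(genes[locus], len(allele_ids[locus]) + 1)
            for locus in LOCI
        )
        # A profile not seen before gets the next sequence type number
        st = st_ids.setdefault(profile, len(st_ids) + 1)
        result[name] = (profile, st)
    return result

# Toy data (hypothetical): strain_3 differs from the others only at rpoB
reference = {locus: "AAAA" for locus in LOCI}
strains = {
    "strain_1": dict(reference),
    "strain_2": dict(reference),
    "strain_3": {**reference, "rpoB": "AAAT"},
}
typed = assign_sequence_types(strains)
for name, (profile, st) in typed.items():
    print(name, profile, f"ST{st}")
```

    Numbering by order of first appearance is a simplification; a curated scheme would assign allele and ST numbers from a reference database.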

  3. Mu-seq: Sequence-Based Mapping and Identification of Transposon Induced Mutations

    Science.gov (United States)

    McCarty, Donald R.; Latshaw, Sue; Wu, Shan; Suzuki, Masaharu; Hunter, Charles T.; Avigne, Wayne T.; Koch, Karen E.

    2013-01-01

    Mutations tagged by transposon insertions can be readily mapped and identified in organisms with sequenced genomes. Collections of such mutants allow a systematic analysis of gene function, and can be sequence-indexed to build invaluable resources. Here we present Mu-seq (Mutant-seq), a high-throughput NextGen sequencing method for harnessing high-copy transposons. We illustrate the efficacy of Mu-seq by applying it to the Robertson’s Mutator system in a large population of maize plants. A single Mu-seq library, for example, constructed from 576 different families (2304 plants), enabled 4,723 novel, germinal, transposon insertions to be detected, identified, and mapped with single base-pair resolution. In addition to the specificity, efficiency, and reproducibility of Mu-seq, a key feature of this method is its adjustable scale that can accommodate simultaneous profiling of transposons in thousands of individuals. We also describe a Mu-seq bioinformatics framework tailored to high-throughput, genome-wide, and population-wide analysis of transposon insertions. PMID:24194867

  4. Moving target detection based on temporal-spatial information fusion for infrared image sequences

    Science.gov (United States)

    Toing, Wu-qin; Xiong, Jin-yu; Zeng, An-jun; Wu, Xiao-ping; Xu, Hao-peng

    2009-07-01

    Moving target detection and localization is one of the most fundamental tasks in visual surveillance. In this paper, after analyzing the advantages and disadvantages of traditional approaches to moving target detection, a novel approach based on temporal-spatial information fusion is proposed. The proposed method combines the spatial features of a single frame with the temporal properties of multiple frames in an image sequence containing a moving target. First, the method uses spatial image segmentation to separate targets from the background and uses the local temporal variance to extract targets and remove trail artifacts. Second, the logical "and" operator is used to fuse the temporal and spatial information. Finally, morphological filtering and blob analysis are applied to the fused image sequence to locate the moving target precisely. The algorithm not only requires minimal computation and memory but also adapts quickly to changes in background and environment. Compared with other methods, such as KDE and the mixture of K Gaussians, the simulation results show that the proposed method has better validity and adaptability for moving target detection, especially in infrared image sequences with complex illumination changes, noise changes, and so on.
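
    The fusion idea in this abstract can be sketched on a toy image sequence: a spatial mask from the current frame, a temporal-variance mask over several frames, a logical AND of the two, and a connected-component pass to collect blobs. This is a simplified sketch under stated assumptions, not the authors' algorithm: the thresholds, the function names and the 6×6 frames are hypothetical, and a minimum blob size stands in for the morphological filtering step.

```python
from statistics import mean, pstdev, pvariance

def spatial_mask(frame, k=2.0):
    """Mark pixels brighter than mean + k * std of the frame (simple segmentation)."""
    pixels = [p for row in frame for p in row]
    threshold = mean(pixels) + k * pstdev(pixels)
    return [[p > threshold for p in row] for row in frame]

def temporal_mask(frames, var_threshold):
    """Mark pixels whose intensity variance across the frames exceeds a threshold."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[pvariance([f[r][c] for f in frames]) > var_threshold
             for c in range(cols)] for r in range(rows)]

def fuse(mask_a, mask_b):
    """Logical AND of two boolean masks."""
    return [[a and b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]

def blobs(mask, min_size=2):
    """4-connected components with at least min_size pixels, as sorted (row, col) lists."""
    rows, cols = len(mask), len(mask[0])
    seen, found = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            stack, blob = [(r, c)], []
            seen.add((r, c))
            while stack:  # depth-first flood fill
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(blob) >= min_size:
                found.append(sorted(blob))
    return found

def make_frame(target_col):
    """Toy 6x6 frame: uniform background, a static hot spot, and a 2x2 target."""
    frame = [[10] * 6 for _ in range(6)]
    frame[0][0] = 100  # bright but stationary, so it has no temporal variance
    for r in (2, 3):
        for c in (target_col, target_col + 1):
            frame[r][c] = 100
    return frame

frames = [make_frame(c) for c in (1, 2, 3)]  # target moves one column per frame
current = frames[-1]
fused = fuse(spatial_mask(current), temporal_mask(frames, var_threshold=100.0))
detected = blobs(fused)
print(detected)
```

    In this toy case the temporal mask alone also covers the trail left by the target, and the spatial mask alone also covers the static hot spot; only the target's current position survives the AND.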

  5. Speech Motor Sequence Learning: Effect of Parkinson Disease and Normal Aging on Dual-Task Performance.

    Science.gov (United States)

    Whitfield, Jason A; Goberman, Alexander M

    2017-06-22

    Everyday communication is carried out concurrently with other tasks. Therefore, determining how dual tasks interfere with newly learned speech motor skills can offer insight into the cognitive mechanisms underlying speech motor learning in Parkinson disease (PD). The current investigation examines a recently learned speech motor sequence under dual-task conditions. A previously learned sequence of 6 monosyllabic nonwords was examined using a dual-task paradigm. Participants repeated the sequence while concurrently performing a visuomotor task, and performance on both tasks was measured in single- and dual-task conditions. The younger adult group exhibited little to no dual-task interference on the accuracy and duration of the sequence. The older adult group exhibited variability in dual-task costs, with the group as a whole exhibiting an intermediate, though significant, amount of dual-task interference. The PD group exhibited the largest degree of bidirectional dual-task interference among all the groups. These data suggest that PD affects the later stages of speech motor learning, as the dual-task condition interfered with production of the recently learned sequence beyond the effect of normal aging. Because the basal ganglia is critical for the later stages of motor sequence learning, the observed deficits may result from the underlying neural dysfunction associated with PD.

  6. Methanol-based fixation is superior to buffered formalin for next-generation sequencing of DNA from clinical cancer samples.

    Science.gov (United States)

    Piskorz, A M; Ennis, D; Macintyre, G; Goranova, T E; Eldridge, M; Segui-Gracia, N; Valganon, M; Hoyle, A; Orange, C; Moore, L; Jimenez-Linan, M; Millan, D; McNeish, I A; Brenton, J D

    2016-03-01

    Next-generation sequencing (NGS) of tumour samples is a critical component of personalised cancer treatment, but it requires high-quality DNA samples. Routine neutral-buffered formalin (NBF) fixation has detrimental effects on nucleic acids, causing low yields, as well as fragmentation and DNA base changes, leading to significant artefacts. We have carried out a detailed comparison of DNA quality from matched samples isolated from high-grade serous ovarian cancers from 16 patients fixed in methanol and NBF. These experiments use tumour fragments and mock biopsies to simulate routine practice, ensuring that results are applicable to standard clinical biopsies. Using matched snap-frozen tissue as gold standard comparator, we show that methanol-based fixation has significant benefits over NBF, with greater DNA yield, longer fragment size and more accurate copy-number calling using shallow whole-genome sequencing (WGS). These data also provide a new approach to understand and quantify artefactual effects of fixation using non-negative matrix factorisation to analyse mutational spectra from targeted and WGS data. We strongly recommend the adoption of methanol fixation for sample collection strategies in new clinical trials. This approach is immediately available, is logistically simple and can offer cheaper and more reliable mutation calling than traditional NBF fixation. © The Author 2015. Published by Oxford University Press on behalf of the European Society for Medical Oncology.

  7. HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences.

    Science.gov (United States)

    Le, Thanh; Altman, Tom; Gardiner, Katheleen

    2010-02-01

    Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.
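
    To make the position weight matrix (PWM) component concrete, the sketch below builds a log-odds PWM from a few aligned motif instances and scans a sequence for the best ungapped match. It does not reproduce HIGEDA: the genetic-algorithm initialization, the EM refinement and the dynamic-programming step for gapped alignment are all omitted, and the function names, pseudocount, uniform background and toy sequences are assumptions.

```python
import math

ALPHABET = "ACGT"

def build_pwm(instances, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned motif instances."""
    length = len(instances[0])
    pwm = []
    for i in range(length):
        column = [seq[i] for seq in instances]
        total = len(column) + pseudocount * len(ALPHABET)
        # log2 of (smoothed base frequency / background frequency)
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring ungapped window."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[s + i]] for i in range(width)), s)
        for s in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Toy aligned instances (hypothetical data)
instances = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = build_pwm(instances)
score, start = best_hit(pwm, "GGCGTATAATGCCG")
print(f"best window starts at {start} with score {score:.2f}")
```

    Allowing gaps would replace the fixed-width window sum with an alignment recurrence over motif positions and sequence positions.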

  8. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

    Science.gov (United States)

    Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

    2004-10-01

    Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and

  9. Armillaria phylogeny based on tef-1α sequences suggests ongoing divergent speciation within the boreal floristic kingdom

    Science.gov (United States)

    Ned B. Klopfenstein; John W. Hanna; Amy L. Ross-Davis; Jane E. Stewart; Yuko Ota; Rosario Medel-Ortiz; Miguel Armando Lopez-Ramirez; Ruben Damian Elias-Roman; Dionicio Alvarado-Rosales; Mee-Sook Kim

    2013-01-01

    Armillaria plays diverse ecological roles in forests worldwide, which has inspired interest in understanding phylogenetic relationships within and among species of this genus. Previous rDNA sequence-based phylogenetic analyses of Armillaria have shown general relationships among widely divergent taxa, but rDNA sequences were not reliable for separating closely related...

  10. Detection and quantification of Plasmodium falciparum in blood samples using quantitative nucleic acid sequence-based amplification

    NARCIS (Netherlands)

    Schoone, G. J.; Oskam, L.; Kroon, N. C.; Schallig, H. D.; Omar, S. A.

    2000-01-01

    A quantitative nucleic acid sequence-based amplification (QT-NASBA) assay for the detection of Plasmodium parasites has been developed. Primers and probes were selected on the basis of the sequence of the small-subunit rRNA gene. Quantification was achieved by coamplification of the RNA in the

  11. Next Generation Sequencing-Based Analysis of Repetitive DNA in the Model Dioceous Plant Silene latifolia

    Czech Academy of Sciences Publication Activity Database

    Macas, Jiří; Kejnovský, Eduard; Neumann, Pavel; Novák, Petr; Koblížková, Andrea; Vyskot, Boris

    2011-01-01

    Vol. 6, No. 11 (2011), e27335 E-ISSN 1932-6203 R&D Projects: GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004; GA MŠk(CZ) LH11058; GA ČR(CZ) GAP501/10/0102; GA ČR(CZ) GAP305/10/0930 Institutional research plan: CEZ:AV0Z50510513; CEZ:AV0Z50040702 Keywords : Plant genome * Sequencing-Based Analyses * Repetitive DNA * Silene latifolia Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.092, year: 2011

  12. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, as in protein-ligand binding. Accurate prediction of the protein residues that physically bind to ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictions have been based on known structures. However, structural information is often unavailable in practice because of the huge gap between the number of known protein sequences and the number of experimentally solved structures.

  13. Automated family-based naming of small RNAs for next generation sequencing data using a modified MD5-digest algorithm

    OpenAIRE

    Liu, Guodong; Li, Zhihua; Lin, Yuefeng; John, Bino

    2012-01-01

    We developed NameMyGene, a web tool and a stand-alone program to easily generate putative family-based names for small RNA sequences so that laboratories can easily organize, analyze, and observe patterns in the massive amount of data generated by next-generation sequencers. NameMyGene, also applicable to other emerging methods such as RNA-Seq and ChIP-Seq, uses only the input small RNA sequence and does not require any additional data such as other sequence data sets. The web server an...
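
    The general idea of naming a sequence from a digest of the sequence alone can be sketched with a plain MD5 hash. This is not NameMyGene's algorithm: the abstract does not describe the modification to MD5 or how family membership is encoded, so the function name, the `smR` prefix, the normalization to RNA letters and the 8-character truncation are all assumptions made for illustration.

```python
import hashlib

def sequence_name(sequence, prefix="smR", digest_chars=8):
    """Derive a deterministic name from a small RNA sequence via its MD5 digest.

    The sequence is normalized (upper case, T -> U) so that DNA and RNA
    spellings of the same molecule receive the same name.
    """
    normalized = sequence.strip().upper().replace("T", "U")
    digest = hashlib.md5(normalized.encode("ascii")).hexdigest()
    return f"{prefix}-{digest[:digest_chars]}"

# Two spellings of one sequence, and a variant differing in the last base
name_rna = sequence_name("ugagguaguagguuguauaguu")
name_dna = sequence_name("TGAGGTAGTAGGTTGTATAGTT")
name_other = sequence_name("TGAGGTAGTAGGTTGTATAGTA")
print(name_rna, name_dna, name_other)
```

    Because the name depends only on the input sequence, two laboratories would arrive at the same name independently; the trade-off of truncating the digest is a small but non-zero chance of two sequences sharing a name.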

  14. Sequence-based analysis of the microbial composition of water kefir from multiple sources.

    Science.gov (United States)

    Marsh, Alan J; O'Sullivan, Orla; Hill, Colin; Ross, R Paul; Cotter, Paul D

    2013-11-01

    Water kefir is a water-sucrose-based beverage, fermented by a symbiosis of bacteria and yeast to produce a final product that is lightly carbonated, acidic and that has a low alcohol percentage. The microorganisms present in water kefir are introduced via water kefir grains, which consist of a polysaccharide matrix in which the microorganisms are embedded. We aimed to provide a comprehensive sequencing-based analysis of the bacterial population of water kefir beverages and grains, while providing an initial insight into the corresponding fungal population. To facilitate this objective, four water kefirs were sourced from the UK, Canada and the United States. Culture-independent, high-throughput, sequencing-based analyses revealed that the bacterial fraction of each water kefir and grain was dominated by Zymomonas, an ethanol-producing bacterium, which has not previously been detected at such a scale. The other genera detected were representatives of the lactic acid bacteria and acetic acid bacteria. Our analysis of the fungal component established that it was comprised of the genera Dekkera, Hanseniaspora, Saccharomyces, Zygosaccharomyces, Torulaspora and Lachancea. This information will assist in the ultimate identification of the microorganisms responsible for the potentially health-promoting attributes of these beverages. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  15. An exponential combination procedure for set-based association tests in sequencing studies.

    Science.gov (United States)

    Chen, Lin S; Hsu, Li; Gamazon, Eric R; Cox, Nancy J; Nicolae, Dan L

    2012-12-07

    State-of-the-art next-generation-sequencing technologies can facilitate in-depth explorations of the human genome by investigating both common and rare variants. For the identification of genetic factors that are associated with disease risk or other complex phenotypes, methods have been proposed for jointly analyzing variants in a set (e.g., all coding SNPs in a gene). Variants in a properly defined set could be associated with risk or phenotype in a concerted fashion, and by accumulating information from them, one can improve power to detect genetic risk factors. Many set-based methods in the literature are based on statistics that can be written as the summation of variant statistics. Here, we propose taking the summation of the exponential of variant statistics as the set summary for association testing. From both Bayesian and frequentist perspectives, we provide theoretical justification for taking the sum of the exponential of variant statistics because it is particularly powerful for sparse alternatives (that is, compared with the large number of variants being tested in a set, only relatively few variants are associated with disease risk), a distinctive feature of genetic data. We applied the exponential combination gene-based test to a sequencing study in anticancer pharmacogenomics and uncovered mechanistic insights into genes and pathways related to chemotherapeutic susceptibility for an important class of oncologic drugs. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
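
    The contrast between a plain sum and a sum of exponentials can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `scale` constant of 0.5, the toy statistics, and the permutation-style p-value helper are all assumptions made for illustration, since the abstract does not give the exact form of the test.

```python
import math


def sum_statistic(variant_stats):
    """Conventional set summary: plain sum of per-variant statistics."""
    return sum(variant_stats)


def exp_combination_statistic(variant_stats, scale=0.5):
    """Set summary built from the sum of exponentials of the variant statistics.

    `scale` is an assumed tuning constant, not a value taken from the abstract.
    """
    return sum(math.exp(scale * s) for s in variant_stats)


def permutation_p_value(observed, null_values):
    """Empirical p-value: share of null statistics at least as large as observed."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Two hypothetical variant sets carrying the same total signal.
sparse = [20.0] + [0.0] * 9   # one strongly associated variant, nine null variants
dense = [2.0] * 10            # the same total spread evenly over ten variants

print(sum_statistic(sparse), sum_statistic(dense))
print(exp_combination_statistic(sparse), exp_combination_statistic(dense))
```

    In this toy comparison the plain sum cannot tell the two sets apart, while the exponential summary is dominated by the single strong variant, which matches the sparse-alternative setting the abstract describes.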

  16. Molecular phylogeny of Toxoplasmatinae: comparison between inferences based on mitochondrial and apicoplast genetic sequences

    Directory of Open Access Journals (Sweden)

    Michelle Klein Sercundes

    2016-03-01

    Phylogenies within Toxoplasmatinae have been widely investigated with different molecular markers. Here, we studied molecular phylogenies of the Toxoplasmatinae subfamily based on apicoplast and mitochondrial genes. Partial sequences of apicoplast genes coding for caseinolytic protease (clpC) and beta subunit of RNA polymerase (rpoB), and mitochondrial gene coding for cytochrome B (cytB) were analyzed. Laboratory-adapted strains of the closely related parasites Sarcocystis falcatula and Sarcocystis neurona were investigated, along with Neospora caninum, Neospora hughesi, Toxoplasma gondii (strains RH, CTG and PTG), Besnoitia akodoni, Hammondia hammondi and two genetically divergent lineages of Hammondia heydorni. The molecular analysis based on organellar genes did not clearly differentiate between N. caninum and N. hughesi, but the two lineages of H. heydorni were confirmed. Slight differences between the strains of S. falcatula and S. neurona were encountered in all markers. In conclusion, congruent phylogenies were inferred from the three different genes and they might be used for screening undescribed sarcocystid parasites in order to ascertain their phylogenetic relationships with organisms of the family Sarcocystidae. The evolutionary studies based on organellar genes confirm that the genus Hammondia is paraphyletic. The primers used for amplification of clpC and rpoB were able to amplify genetic sequences of organisms of the genus Sarcocystis and organisms of the subfamily Toxoplasmatinae as well.

  17. Implicit Structured Sequence Learning: An FMRI Study of the Structural Mere-Exposure Effect

    Directory of Open Access Journals (Sweden)

    Vasiliki Folia

    2014-02-01

    In this event-related FMRI study we investigated the effect of five days of implicit acquisition on preference classification by means of an artificial grammar learning (AGL) paradigm based on the structural mere-exposure effect and preference classification using a simple right-linear unification grammar. This allowed us to investigate implicit AGL in a proper learning design by including baseline measurements prior to grammar exposure. After 5 days of implicit acquisition, the FMRI results showed activations in a network of brain regions including the inferior frontal (centered on BA 44/45) and the medial prefrontal regions (centered on BA 8/32). Importantly, and central to this study, the inclusion of a naive preference FMRI baseline measurement allowed us to conclude that these FMRI findings were the intrinsic outcomes of the learning process itself and not a reflection of a preexisting functionality recruited during classification, independent of acquisition. Support for the implicit nature of the knowledge utilized during preference classification on day 5 comes from the fact that the basal ganglia, associated with implicit procedural learning, were activated during classification, while the medial temporal lobe system, associated with explicit declarative memory, was consistently deactivated. Thus, preference classification in combination with structural mere-exposure can be used to investigate structural sequence processing (syntax) in unsupervised AGL paradigms with proper learning designs.

  18. A sampling and metagenomic sequencing-based methodology for monitoring antimicrobial resistance in swine herds

    DEFF Research Database (Denmark)

    Munk, Patrick; Dalhoff Andersen, Vibe; de Knegt, Leonardo

    2016-01-01

    Objectives Reliable methods for monitoring antimicrobial resistance (AMR) in livestock and other reservoirs are essential to understand the trends, transmission and importance of agricultural resistance. Quantification of AMR is mostly done using culture-based techniques, but metagenomic read...... on known antimicrobial consumption in 10 Danish integrated slaughter pig herds. In addition, we evaluated whether fresh or manure floor samples constitute suitable proxies for intestinal sampling, using cfu counting, qPCR and metagenomic shotgun sequencing. Results Metagenomic read-mapping outperformed...... cultivation-based techniques in terms of predicting expected tetracycline resistance based on antimicrobial consumption. Our metagenomic approach had sufficient resolution to detect antimicrobial-induced changes to individual resistance gene abundances. Pen floor manure samples were found to represent rectal...

  19. Estimation of contemporary effective population size and population declines using RAD sequence data.

    Science.gov (United States)

    Nunziata, Schyler O; Weisrock, David W

    2018-03-01

    Large genomic data sets generated with restriction site-associated DNA sequencing (RADseq), in combination with demographic inference methods, are improving our ability to gain insights into the population history of species. We used a simulation approach to examine the potential for RADseq data sets to accurately estimate effective population size (Ne) over the course of stable and declining population trends, and we compare the ability of two methods of analysis to accurately distinguish stable from steadily declining populations over a contemporary time scale (20 generations). Using a linkage disequilibrium-based analysis, individual sampling (i.e., n ≥ 30) had the greatest effect on Ne estimation and the detection of population size declines, with declines reliably detected across scenarios ~10 generations after they began. Coalescent-based inference required fewer sampled individuals (i.e., n = 15), and instead was most influenced by the size of the SNP data set, with 25,000-50,000 SNPs required for accurate detection of population trends and at least 20 generations after decline began. The number of samples available and targeted number of RADseq loci are important criteria when choosing between these methods. Neither method suffered any apparent bias due to the effects of allele dropout typical of RAD data. With an understanding of the limitations and biases of these approaches, researchers can make more informed decisions when designing their sampling and analyses. Overall, our results reveal that demographic inference using RADseq data can be successfully applied to infer recent population size change and may be an important tool for population monitoring and conservation biology.
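
    To show why sample size matters so much for a linkage disequilibrium-based estimate, here is a minimal Python sketch. It assumes the textbook approximation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals; the abstract does not say which estimator or software the authors used, and the function name `ne_from_ld` and the input values are hypothetical.

```python
def ne_from_ld(mean_r2, sample_size):
    """Rough effective population size from mean squared LD between unlinked loci.

    Assumes E[r^2] ~ 1/(3*Ne) + 1/S, so the sampling term 1/S is subtracted
    before inverting. Returns infinity when the sampling noise alone explains
    the observed r^2.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    adjusted = mean_r2 - 1.0 / sample_size  # remove the sampling contribution
    if adjusted <= 0:
        return float("inf")
    return 1.0 / (3.0 * adjusted)


# Hypothetical inputs: the same observed r^2 read with two different sample sizes.
print(ne_from_ld(0.05, 30))
print(ne_from_ld(0.05, 60))
```

    Because the 1/S term is comparable in size to the signal itself, small samples leave little of the observed r² attributable to drift, which is consistent with the sensitivity to individual sampling reported above.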

  20. Genomic clones of bovine parvovirus: Construction and effect of deletions and terminal sequence inversions on infectivity

    Energy Technology Data Exchange (ETDEWEB)

    Shull, B.C.; Chen, K.C.; Lederman, M.; Stout, E.R.; Bates, R.C. (Virginia Polytechnic Institute and State Univ., Blacksburg (USA))

    1988-02-01

    Genomic clones of the autonomous parvovirus bovine parvovirus (BPV) were constructed by blunt-end ligation of reannealed virion plus and minus DNA strands into the plasmid pUC8. These clones were stable during propagation in Escherichia coli JM107. All clones tested were found to be infectious by the criteria of plaque titer and progressive cytopathic effect after transfection into bovine fetal lung cells. Sequencing of the recombinant plasmids demonstrated that all of the BPV inserts had left-end (3')-terminal deletions of up to 34 bases. Defective genomes could also be detected in the progeny DNA even though the infection was initiated with homogeneous, cloned DNA. Full-length genomic clones with 3' flip and 3' flop conformations were constructed and were found to have equal infectivity. Expression of capsid proteins from transfected genomes was demonstrated by hemagglutination, indirect immunofluorescence, and immunoprecipitation of [35S]methionine-labeled cell lysates. Use of appropriate antiserum for immunoprecipitation showed the synthesis of BPV capsid and noncapsid proteins after transfection. Independently, a series of genomic clones with increasingly larger 3'-terminal deletions was prepared from separately subcloned 3'-terminal fragments. Transfection of these clones into bovine fetal lung cells revealed that deletions of up to 34 bases at the 3' end lowered but did not abolish infectivity, while deletions of greater than 52 bases were lethal. End-label analysis showed that the 34-base deletion was repaired to wild-type length in the progeny virus.

  1. Real sequence effects on the search dynamics of transcription factors on DNA

    DEFF Research Database (Denmark)

    Bauer, Maximilian; Rasmussen, Emil S.; Lomholt, Michael A.

    2015-01-01

    analysis we study the TF-sliding motion for a large section of the DNA-sequence of a common E. coli strain, based on the two-state TF-model with a fast-sliding search state and a recognition state enabling target detection. For the probability to detect the target before dissociating from DNA the TF-search...... times self-consistently depend heavily on whether or not an auxiliary operator (an accessible sequence similar to the main operator) is present in the genome section. Importantly, within our model the extent to which the interconversion rates between search and recognition states depend...

  2. Predicting sumoylation sites using support vector machines based on various sequence features, conformational flexibility and disorder.

    Science.gov (United States)

    Yavuz, Ahmet Sinan; Sezerman, Osman Ugur

    2014-01-01

    Sumoylation, which is a reversible and dynamic post-translational modification, is one of the vital processes in a cell. Before a protein matures to perform its function, sumoylation may alter its localization, interactions, and possibly structural conformation. Aberrations in protein sumoylation have been linked with a variety of disorders and developmental anomalies. Experimental approaches to identification of sumoylation sites may not be effective due to the dynamic nature of sumoylation, laborious experiments and their cost. Therefore, computational approaches may guide experimental identification of sumoylation sites and provide insights for further understanding the sumoylation mechanism. In this paper, the effectiveness of using various sequence properties in predicting sumoylation sites was investigated with statistical analyses and a machine learning approach employing support vector machines. These sequence properties were derived from windows of size 7 including position-specific amino acid composition, hydrophobicity, estimated sub-window volumes, predicted disorder, and conformational flexibility. 5-fold cross-validation results on experimentally identified sumoylation sites revealed that our method successfully predicts sumoylation sites with a Matthews correlation coefficient, sensitivity, specificity, and accuracy equal to 0.66, 73%, 98%, and 97%, respectively. Additionally, we have shown that our method compares favorably to the existing prediction methods and a basic regular expressions scanner. By using support vector machines, a new, robust method for sumoylation site prediction was introduced. Besides, the possible effects of predicted conformational flexibility and disorder on sumoylation site recognition were explored computationally for the first time to our knowledge as an additional parameter that could aid in sumoylation site prediction.
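
    For readers unfamiliar with the four reported measures, the following Python sketch shows how they are computed from a binary confusion matrix. The counts used are hypothetical and are not the study's data; the function name `classification_metrics` is likewise an assumption, and the sketch presumes that both classes are non-empty.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)              # share of true sites recovered
    specificity = tn / (tn + fp)              # share of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for an imbalanced data set (few true sites, many non-sites).
metrics = classification_metrics(tp=73, fp=20, tn=980, fn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    On imbalanced data such as modification-site prediction, accuracy is dominated by the majority class, which is one reason the Matthews correlation coefficient is usually reported alongside it.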

  3. iTriplet, a rule-based nucleic acid sequence motif finder

    Directory of Open Access Journals (Sweden)

    Gunderson Samuel I

    2009-10-01

    Background: With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results: We have conducted a comprehensive assessment on the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore, we have confirmed the utility of iTriplet by showing it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion: iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.

  4. A new trilocus sequence-based multiplex-PCR to detect major Acinetobacter baumannii clones.

    Science.gov (United States)

    Martins, Natacha; Picão, Renata Cristina; Cerqueira-Alves, Morgana; Uehara, Aline; Barbosa, Lívia Carvalho; Riley, Lee W; Moreira, Beatriz Meurer

    2016-08-01

    A collection of 163 Acinetobacter baumannii isolates detected in a large Brazilian hospital was potentially related to the dissemination of four clonal complexes (CC): 113/79, 103/15, 109/1 and 110/25, defined by University of Oxford/Institut Pasteur multilocus sequence typing (MLST) schemes. The urgent need for a simple multiplex-PCR scheme to specify these clones has motivated the present study. The established trilocus sequence-based typing (3LST, for ompA, csuE and blaOXA-51-like genes) multiplex-PCR rapidly identifies international clones I (CC109/1), II (CC118/2) and III (CC187/3). Thus, the system detects only one (CC109/1) out of four main CC in Brazil. We aimed to develop an alternative multiplex-PCR scheme to detect these clones, known to be present additionally in Africa, Asia, Europe, USA and South America. MLST, performed in the present study to complement typing our whole collection of isolates, confirmed that all isolates belonged to the same four CC detected previously. When typed by 3LST-based multiplex-PCR, only 12% of the 163 isolates were classified into groups. By comparative sequence analysis of ompA, csuE and blaOXA-51-like genes, a set of eight primers was designed for an alternative multiplex-PCR to distinguish the five CC 113/79, 103/15, 109/1, 110/25 and 118/2. Study isolates and one CC118/2 isolate were blind-tested with the new alternative PCR scheme; all were correctly clustered in groups of the corresponding CC. The new multiplex-PCR, with the advantage of fitting in a single reaction, detects five leading A. baumannii clones and could help prevent their spread in healthcare settings. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Effects of Representation Sequences and Spatial Ability on Students' Scientific Understandings about the Mechanism of Breathing

    Science.gov (United States)

    Wu, Hsin-Kai; Lin, Yu-Fen; Hsu, Ying-Shao

    2013-01-01

    The purpose of this study was to investigate the effects of representation sequences and spatial ability on students' scientific understandings about the mechanism of breathing in human beings. 130 seventh graders were assigned to two groups with different sequential combinations of static and dynamic representations: SD group (i.e., viewing…

  6. Picture or Text First? Explaining Sequence Effects When Learning with Pictures and Text

    Science.gov (United States)

    Eitel, Alexander; Scheiter, Katharina

    2015-01-01

    The present article reviews 42 studies investigating the role of sequencing of text and pictures for learning outcomes. Whereas several of the reviewed studies revealed better learning outcomes from presenting the picture before the text rather than after it, other studies demonstrated the opposite effect. Against the backdrop of theories on…

  7. Effect of stacking sequence on the erosive wear behavior of jute and ...

    African Journals Online (AJOL)

    Effect of stacking sequence on the erosive wear behavior of jute and jute-glass fabric reinforced epoxy composite. ... morphology of the eroded surface was examined by SEM. It is concluded from the study that the erosive wear behavior of natural fiber jute can be improved significantly by hybridizing with synthetic fiber glass.

  8. Diesel engine exhaust initiates a sequence of pulmonary and cardiovascular effects in rats

    NARCIS (Netherlands)

    Kooter, I.M.; Gerlofs-Nijland, M.E.; Boere, A.J.F.; Leseman, D.L.A.C.; Fokkens, P.H.B.; Spronk, H.M.H.; Frederix, K.; Ten Cate, H.; Knaapen, A.M.; Vreman, H.J.; Cassee, F.R.

    2010-01-01

    This study was designed to determine the sequence of events leading to cardiopulmonary effects following acute inhalation of diesel engine exhaust in rats. Rats were exposed for 2 h to diesel engine exhaust (1.9 mg/m3), and biological parameters related to antioxidant defense, inflammation,

  9. Changes in DNA base sequence induced by gamma-ray mutagenesis of lambda phage and prophage

    Energy Technology Data Exchange (ETDEWEB)

    Tindall, K.R.; Stein, J.; Hutchinson, F.

    1988-04-01

    Mutations in the cI (repressor) gene were induced by gamma-ray irradiation of lambda phage and of prophage, and 121 mutations were sequenced. Two-thirds of the mutations in irradiated phage assayed in recA host cells (no induction of the SOS response) were G:C to A:T transitions; it is hypothesized that these may arise during DNA replication from adenine mispairing with a cytosine product deaminated by irradiation. For irradiated phage assayed in host cells in which the SOS response had been induced, 85% of the mutations were base substitutions, and in 40 of the 41 base changes, a preexisting base pair had been replaced by an A:T pair; these might come from damaged bases acting as AP (apurinic or apyrimidinic) sites. The remaining mutations were 1 and 2 base deletions. In irradiated prophage, base change mutations involved the substitution of both A:T and of G:C pairs for the preexisting pairs; the substitution of G:C pairs shows that some base substitution mechanism acts on the cell genome but not on the phage. In the irradiated prophage, frameshifts and a significant number of gross rearrangements were also found.

  10. The first Illumina-based de novo transcriptome sequencing and analysis of safflower flowers.

    Directory of Open Access Journals (Sweden)

    Huang Lulin

    BACKGROUND: The safflower, Carthamus tinctorius L., is a worldwide oil crop, and its flowers, which have a high flavonoid content, are an important medicinal resource against cardiovascular disease in traditional medicine. Because the safflower has a large and complex genome, the development of its genomic resources has been delayed. Second-generation Illumina sequencing is now an efficient route for generating an enormous volume of sequences that can represent a large number of genes and their expression levels. METHODOLOGY/PRINCIPAL FINDINGS: To investigate the genes and pathways that might control flavonoids and other secondary metabolites in the safflower, we used Illumina sequencing to perform a de novo assembly of the safflower tubular flower tissue transcriptome. We obtained a total of 4.69 Gb in clean nucleotides comprising 52,119,104 clean sequencing reads, 195,320 contigs, and 120,778 unigenes. Based on similarity searches with known proteins, we annotated 70,342 of the unigenes (about 58% of the identified unigenes) with cut-off E-values of 10^-5. In total, 21,943 of the safflower unigenes were found to have COG classifications, and BLAST2GO assigned 26,332 of the unigenes to 1,754 GO term annotations. In addition, we assigned 30,203 of the unigenes to 121 KEGG pathways. When we focused on genes identified as contributing to flavonoid biosynthesis and the biosynthesis of unsaturated fatty acids, which are important pathways that control flower and seed quality, respectively, we found that these genes were fairly well conserved in the safflower genome compared to those of other plants. CONCLUSIONS/SIGNIFICANCE: Our study provides abundant genomic data for Carthamus tinctorius L. and offers comprehensive sequence resources for studying the safflower. We believe that these transcriptome datasets will serve as an important public information platform to accelerate studies of the safflower genome, and may help us define the mechanisms of

  11. Effects of mass loss on the evolution of massive stars. I. Main-sequence evolution

    International Nuclear Information System (INIS)

    Dearborn, D.S.P.; Blake, J.B.; Hainebach, K.L.; Schramm, D.N.

    1978-01-01

    The effects of mass loss on the evolution and surface composition of massive stars during main-sequence evolution are examined. While some details of the evolutionary track depend on the formula used for the mass loss, the results appear most sensitive to the total mass removed during the main-sequence lifetime. It was found that low mass-loss rates have very little effect on the evolution of a star; the track is slightly subluminous, but the lifetime is almost unaffected. High rates of mass loss lead to a hot, high-luminosity stellar model with a helium core surrounded by a hydrogen-deficient (X ≈ 0.1) envelope. The main-sequence lifetime is extended by a factor of 2 to 3. These models may be identified with Wolf-Rayet stars. Between these mass-loss extremes are intermediate models which appear as OBN stars on the main sequence. The mass-loss rates required for significant observable effects range from 8 × 10^-7 to 10^-5 M_sun yr^-1, depending on the initial stellar mass. It is found that observationally consistent mass-loss rates for stars with M ≥ 30 M_sun may be sufficiently high that these stars lose mass on a time scale more rapidly than their main-sequence core evolution time. This result implies that the helium cores resulting from the main-sequence evolution of these massive stars may all be very similar to that of a star of M ≈ 30 M_sun regardless of the zero-age mass.

  12. Carbon nanotube-based lateral flow biosensor for sensitive and rapid detection of DNA sequence.

    Science.gov (United States)

    Qiu, Wanwei; Xu, Hui; Takalkar, Sunitha; Gurung, Anant S; Liu, Bin; Zheng, Yafeng; Guo, Zebin; Baloda, Meenu; Baryeh, Kwaku; Liu, Guodong

    2015-02-15

    In this article, we describe a carbon nanotube (CNT)-based lateral flow biosensor (LFB) for rapid and sensitive detection of DNA sequences. An amine-modified DNA detection probe was covalently immobilized on the shortened multi-walled carbon nanotubes (MWCNTs) via diimide-activated amidation between the carboxyl groups on the CNT surface and amine groups on the detection DNA probes. Sandwich-type DNA hybridization reactions were performed on the LFB, and the captured MWCNTs on the test zone and control zone of the LFB produced the characteristic black bands, enabling visual detection of DNA sequences. Combining the advantages of lateral flow chromatographic separation with the unique physical properties of MWCNTs (large surface area), the optimized LFB was capable of detecting 0.1 nM target DNA without instrumentation. Quantitative detection could be realized by recording the intensity of the test line with the ImageJ software, and a detection limit of 40 pM was obtained. This detection limit is 12.5 times lower than that of the gold nanoparticle (GNP)-based LFB (0.5 nM, Mao et al. Anal. Chem. 2009, 81, 1660-1668). Another important feature is that the preparation of MWCNT-DNA conjugates was robust and the use of MWCNT labels avoided the aggregation of conjugates and tedious preparation time, which were often met in the traditional GNP-based nucleic acid LFB. The applications of the MWCNT-based LFB can be extended to visually detect protein biomarkers using MWCNT-antibody conjugates. The MWCNT-based LFB thus opens a new door to prepare a new generation of LFB, and shows great promise for in-field and point-of-care diagnosis of genetic diseases and for the detection of infectious agents. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Application of Sequence-based Methods in Human Microbial Ecology

    Energy Technology Data Exchange (ETDEWEB)

    Weng, Li; Rubin, Edward M.; Bristow, James

    2005-08-29

    Ecologists studying microbial life in the environment have recognized the enormous complexity of microbial diversity for many years, and the development of a variety of culture-independent methods, many of them coupled with high-throughput DNA sequencing, has allowed this diversity to be explored in ever greater detail. Despite the widespread application of these new techniques to the characterization of uncultivated microbes and microbial communities in the environment, their application to human health and disease has lagged behind. Because DNA-based techniques for defining uncultured microbes allow not only cataloging of microbial diversity, but also insight into microbial functions, investigators are beginning to apply these tools to the microbial communities that abound on and within us, in what has aptly been called the second Human Genome Project. In this review we discuss the sequence-based methods for microbial analysis that are currently available and their application to identify novel human pathogens, improve diagnosis of known infectious diseases, and advance understanding of our relationship with microbial communities that normally reside in and on the human body.

  14. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by noise in the negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B-cell epitopes using the antigen's primary sequence information. Moreover, a web server, EPMLR, has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

  15. MuffinInfo: HTML5-Based Statistics Extractor from Next-Generation Sequencing Data.

    Science.gov (United States)

    Alic, Andy S; Blanquer, Ignacio

    2016-09-01

    Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, in command line or graphically, and in browser or standalone. It presents information such as average length, base distribution, quality score distribution, k-mer histogram, and homopolymer analysis. MuffinInfo improves upon the existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software package. At the moment, the extractor works with all base space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for the moderately intensive tasks encountered in bioinformatics.
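
    The kinds of statistics listed above can be sketched in a few lines of Python. This is not MuffinInfo's code (the tool itself is HTML5-based); the function names, the four-line FASTQ layout, the Phred+33 quality encoding, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def parse_fastq(text):
    """Return (sequence, quality) pairs from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    return [(lines[i + 1], lines[i + 3]) for i in range(0, len(lines) - 3, 4)]


def summarize(records, k=2):
    """Average length, base counts, k-mer counts, mean quality, longest homopolymer."""
    bases, kmers, quals = Counter(), Counter(), []
    longest_run = 0
    for seq, qual in records:
        bases.update(seq)
        kmers.update(seq[i:i + k] for i in range(len(seq) - k + 1))
        quals.extend(ord(c) - 33 for c in qual)  # assumes Phred+33 encoding
        runs = [len(list(group)) for _, group in groupby(seq)]
        longest_run = max([longest_run] + runs)
    return {
        "average_length": sum(len(s) for s, _ in records) / len(records),
        "base_counts": bases,
        "kmer_counts": kmers,
        "mean_quality": sum(quals) / len(quals),
        "longest_homopolymer": longest_run,
    }


# Two made-up reads, used only to exercise the functions.
toy_fastq = "@r1\nACGT\n+\nIIII\n@r2\nAAAC\n+\nIIII\n"
stats = summarize(parse_fastq(toy_fastq), k=2)
print(stats)
```

    Each statistic is accumulated in a single pass over the reads, so the same structure extends to user-defined statistics by adding another accumulator inside the loop.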

  16. Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach.

    Science.gov (United States)

    Pai, Priyadarshini P; Dash, Tirtharaj; Mondal, Sukanta

    2017-04-07

    Protein interactions with ribonucleic acids (RNA) are well-known to be crucial for a wide range of cellular processes such as transcriptional regulation, protein synthesis or translation, and post-translational modifications. Identification of the RNA-interacting residues can provide insights into these processes and aid in relevant biotechnological manipulations. Owing to their eventual potential in combating diseases and industrial production, several computational attempts have been made over the years using sequence- and structure-based information. Recent comparative studies suggest that despite these developments, many problems are faced with respect to the usability, prerequisites, and accessibility of various tools, thereby calling for an alternative approach and perspective supplementation in the prediction scenario. With this motivation, in this paper, we propose the use of a simple-yet-efficient conditional probabilistic approach based on the application of local occurrence of amino acids in the interacting region in a non-numeric sequence feature space, for discriminating between RNA interacting and non-interacting residues. The proposed method has been meticulously tested for robustness using a cross-estimation method showing MCC of 0.341 and F-measure of 66.84%. Upon exploring large scale applications using benchmark datasets available to date, this approach showed an encouraging performance comparable with the state of the art. The software is available at https://github.com/ABCgrp/DORAEMON. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  18. Iteration and superposition encryption scheme for image sequences based on multi-dimensional keys

    Science.gov (United States)

    Han, Chao; Shen, Yuzhen; Ma, Wenlin

    2017-12-01

    An iteration and superposition encryption scheme for image sequences based on multi-dimensional keys is proposed for high-security, high-capacity and low-noise information transmission. The multiple images to be encrypted are transformed into phase-only images with an iterative algorithm and are then each encrypted with a different random phase. An inverse Fourier transform is applied to each encrypted phase-only image, generating new object functions. The new functions are placed in different blocks and zero-padded to give a sparse distribution; they then propagate to a specific region over different distances by angular spectrum diffraction and are superposed to form a single image. The single image is multiplied by a random phase in the frequency domain, after which the phase part of the frequency spectrum is truncated and the amplitude information is retained. The random phase, propagation distances and truncated phase information in the frequency domain are employed as multi-dimensional keys. The iterative processing and sparse distribution greatly reduce the crosstalk among the multiple encrypted images. The superposition of image sequences greatly improves the capacity of encrypted information. Several numerical experiments based on a designed optical system demonstrate that the proposed scheme can enhance encrypted information capacity and achieve image transmission at a high security level.

  19. Functional diversity of microbial communities in pristine aquifers inferred by PLFA- and sequencing-based approaches

    Science.gov (United States)

    Schwab, Valérie F.; Herrmann, Martina; Roth, Vanessa-Nina; Gleixner, Gerd; Lehmann, Robert; Pohnert, Georg; Trumbore, Susan; Küsel, Kirsten; Totsche, Kai U.

    2017-05-01

    Microorganisms in groundwater play an important role in aquifer biogeochemical cycles and water quality. However, the mechanisms linking the functional diversity of microbial populations and the groundwater physico-chemistry are still not well understood due to the complexity of interactions between surface and subsurface. Within the framework of Hainich (north-western Thuringia, central Germany) Critical Zone Exploratory of the Collaborative Research Centre AquaDiva, we used the relative abundances of phospholipid-derived fatty acids (PLFAs) to link specific biochemical markers within the microbial communities to the spatio-temporal changes of the groundwater physico-chemistry. The functional diversities of the microbial communities were mainly correlated with groundwater chemistry, including dissolved O2, Fet and NH4+ concentrations. Abundances of PLFAs derived from eukaryotes and potential nitrite-oxidizing bacteria (11Me16:0 as biomarker for Nitrospira moscoviensis) were high at sites with elevated O2 concentration where groundwater recharge supplies bioavailable substrates. In anoxic groundwaters richer in Fet, PLFAs abundant in sulfate-reducing bacteria (SRB), iron-reducing bacteria and fungi increased with Fet and HCO3- concentrations, suggesting the occurrence of active iron reduction and the possible role of fungi in mediating iron solubilization and transport in those aquifer domains. In more NH4+-rich anoxic groundwaters, anammox bacteria and SRB-derived PLFAs increased with NH4+ concentration, further evidencing the dependence of the anammox process on ammonium concentration and potential links between SRB and anammox bacteria. Additional support of the PLFA-based bacterial communities was found in DNA- and RNA-based Illumina MiSeq amplicon sequencing of bacterial 16S rRNA genes, which showed high predominance of nitrite-oxidizing bacteria Nitrospira, e.g. Nitrospira moscoviensis, in oxic aquifer zones and of anammox bacteria in more NH4+-rich

  20. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    Directory of Open Access Journals (Sweden)

    Manzini Giovanni

    2007-07-01

    Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC
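
    To make the idea of approximating a similarity measure through data compression concrete, the sketch below computes the widely used normalized compression distance, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. Treating this formula as the NCD of the abstract, and using Python's zlib as the compressor, are assumptions of this example; the abstract does not say which 25 compressors were tested.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy sequences: a highly repetitive string and a pseudo-random one.
rng = random.Random(0)
repeat = b"ACGT" * 200
shuffled = "".join(rng.choice("ACGT") for _ in range(800)).encode()

d_same = ncd(repeat, repeat)    # comparing a string with itself: small value
d_diff = ncd(repeat, shuffled)  # unrelated strings: noticeably larger value
print(d_same, d_diff)
```

    Values near 0 indicate that one string adds little information once the other is known, and values near 1 indicate little shared structure; the exact numbers depend on the compressor.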

  1. Comparison of nucleic acid sequence-based amplification and loop-mediated isothermal amplification for diagnosis of human African trypanosomiasis

    NARCIS (Netherlands)

    Mugasa, Claire M.; Katiti, Diana; Boobo, Alex; Lubega, George W.; Schallig, Henk D. F. H.; Matovu, Enock

    2014-01-01

    Diagnosis of human African trypanosomiasis (HAT) using molecular tests should ideally achieve high sensitivity without compromising specificity. This study compared 2 simplified tests, nucleic acid sequence-based amplification (NASBA) combined with oligochromatography (OC) and loop-mediated

  2. Retrospective Evaluations of Sequences: Testing the Predictions of a Memory-Based Analysis.

    Science.gov (United States)

    Aldrovandi, Silvio; Poirier, Marie; Kusev, Petko; Ayton, Peter

    2015-01-01

    Retrospective evaluation (RE) of event sequences is known to be biased in various ways. The present paper presents a series of studies that examined the suggestion that the moments that are the most accessible in memory at the point of RE contribute to these biases. As predicted by this memory-based analysis, Experiment 1 showed that pleasantness ratings of word lists were biased by the presentation position of a negative item and by how easy the negative information was to retrieve. Experiment 2 ruled out the hypothesis that these findings were due to the dual nature of the task called upon. Experiment 3 further manipulated the memorability of the negative items--and corresponding changes in RE were as predicted. Finally, Experiment 4 extended the findings to more complex stimuli involving event narratives. Overall, the results suggest that assessments were adjusted based on the retrieval of the most readily available information.

  3. Software and Hardware Solutions for Channel Estimation based on Cyclic Golay Sequences

    Directory of Open Access Journals (Sweden)

    B. Csuka

    2016-12-01

    This paper presents channel estimation methods based on cyclic complementary Golay sequences. First, the conventional Golay correlator is investigated, then a frequency domain approach using Discrete Fourier Transform (DFT) is provided. A complex valued fast Golay correlator is introduced which can be used for the estimation of complex valued channel impulse response. Furthermore, this paper presents the Recursive DFT (R-DFT), a signal processing architecture which may be beneficial compared to the well-known Fast Fourier Transform (FFT). The R-DFT is able to efficiently calculate point-by-point block spectra of the input signal, which makes it suitable for hardware implementation. Throughout the paper, the R-DFT is applied and it is compared to the conventional estimation methods. Finally, the efficiency of the proposed schemes is compared through simulations based on the 60 GHz WiGig and the COST 207 standard, applying various channel models.
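
    The property that makes complementary Golay pairs useful for channel estimation is that their cyclic autocorrelations sum to zero at every nonzero shift. The sketch below illustrates this with a plain time-domain correlator for a real-valued, noise-free channel. It is not the paper's fast Golay correlator or R-DFT architecture; the doubling construction and all function names are choices made for this example.

```python
def golay_pair(order):
    """Binary complementary pair of length 2**order via the standard doubling rule."""
    a, b = [1], [1]
    for _ in range(order):
        a, b = a + b, a + [-v for v in b]
    return a, b


def cyclic_convolve(x, h):
    """Circular convolution of a transmitted sequence x with channel taps h."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]


def cyclic_correlate(r, s):
    """Circular cross-correlation of received samples r with reference s."""
    n = len(r)
    return [sum(r[(i + lag) % n] * s[i] for i in range(n)) for lag in range(n)]


def estimate_channel(rx_a, rx_b, a, b):
    """Sum the two correlator outputs; the sidelobes cancel and 2N*h remains."""
    n = len(a)
    ca, cb = cyclic_correlate(rx_a, a), cyclic_correlate(rx_b, b)
    return [(u + v) / (2 * n) for u, v in zip(ca, cb)]


a, b = golay_pair(3)            # length-8 pair
h = [1.0, 0.5, 0.0, -0.25]      # hypothetical channel impulse response
estimate = estimate_channel(cyclic_convolve(a, h), cyclic_convolve(b, h), a, b)
print(estimate)
```

    In this idealised setting the first four entries of the estimate reproduce the taps of h and the remaining entries are zero; noise and complex-valued channels, which the paper addresses, are outside the scope of the sketch.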

  4. Primer effect in the detection of mitochondrial DNA point heteroplasmy by automated sequencing.

    Science.gov (United States)

    Calatayud, Marta; Ramos, Amanda; Santos, Cristina; Aluja, Maria Pilar

    2013-06-01

    The correct detection of mitochondrial DNA (mtDNA) heteroplasmy by automated sequencing presents methodological constraints. The main goals of this study are to investigate the effect of sense and distance of primers in heteroplasmy detection and to test if there are differences in the accurate determination of heteroplasmy involving transitions or transversions. A gradient of the heteroplasmy levels was generated for mtDNA positions 9477 (transition G/A) and 15,452 (transversion C/A). Amplification and subsequent sequencing with forward and reverse primers, situated at 550 and 150 bp from the heteroplasmic positions, were performed. Our data provide evidence that there is a significant difference between the use of forward and reverse primers. The forward primer is the primer that seems to give a better approximation to the real proportion of the variants. No significant differences were found concerning the distance at which the sequencing primers were placed, or between the analysis of transitions and transversions. The data collected in this study are a starting point that gives a glimpse of the importance of the sequencing primers in the accurate detection of point heteroplasmy, providing additional insight into the overall automated sequencing strategy.

  5. Targeted genetic testing for familial hypercholesterolaemia using next generation sequencing: a population-based study

    Science.gov (United States)

    2014-01-01

    Background Familial hypercholesterolaemia (FH) is a common Mendelian condition which, untreated, results in premature coronary heart disease. An estimated 88% of FH cases are undiagnosed in the UK. We previously validated a method for FH mutation detection in a lipid clinic population using next generation sequencing (NGS), but this did not address the challenge of identifying index cases in primary care where most undiagnosed patients receive healthcare. Here, we evaluate the targeted use of NGS as a potential route to diagnosis of FH in a primary care population subset selected for hypercholesterolaemia. Methods We used microfluidics-based PCR amplification coupled with NGS and multiplex ligation-dependent probe amplification (MLPA) to detect mutations in LDLR, APOB and PCSK9 in three phenotypic groups within the Generation Scotland: Scottish Family Health Study including 193 individuals with high total cholesterol, 232 with moderately high total cholesterol despite cholesterol-lowering therapy, and 192 normocholesterolaemic controls. Results Pathogenic mutations were found in 2.1% of hypercholesterolaemic individuals, in 2.2% of subjects on cholesterol-lowering therapy and in 42% of their available first-degree relatives. In addition, variants of uncertain clinical significance (VUCS) were detected in 1.4% of the hypercholesterolaemic and cholesterol-lowering therapy groups. No pathogenic variants or VUCS were detected in controls. Conclusions We demonstrated that population-based genetic testing using these protocols is able to deliver definitive molecular diagnoses of FH in individuals with high cholesterol or on cholesterol-lowering therapy. The lower cost and labour associated with NGS-based testing may increase the attractiveness of a population-based approach to FH detection compared to genetic testing with conventional sequencing. This could provide one route to increasing the present low percentage of FH cases with a genetic diagnosis. PMID:24956927

  6. Going, going, gone: characterizing the time-course of congruency sequence effects

    Directory of Open Access Journals (Sweden)

    Tobias eEgner

    2010-09-01

    Performance on traditional selective attention tasks, like the Stroop and flanker protocols, is subject to modulation by trial history, whereby the magnitude of congruency (or conflict) effects is often found to decrease following an incongruent trial compared to a congruent one. These ‘congruency sequence effects’ (CSEs) typically appear to reflect a mesh of memory- and attention-based processes. The current study aimed to shed new light on the nature of the attention-based contribution to CSEs, by characterizing the shape of the CSE time-course while controlling for mnemonic influences. Existing attention-based accounts of CSEs are either ambiguous in their predictions of CSE time-courses, or predict CSEs to persist or grow over the post-stimulus/response interval in anticipation of an upcoming stimulus. We gauged CSE time-courses by systematically varying inter-stimulus (Experiment 1) and response-to-stimulus (Experiment 2) intervals across a wide temporal range, in a face-word Stroop task. In spite of an exponential increase in the likelihood of stimulus appearance with increasing interval duration (i.e., an exponential hazard function), results from both experiments showed CSEs to be most pronounced at the shortest intervals, to quickly decay in magnitude with increasing interval length, and to be absent at longer intervals. These data refute the idea that attentional contributions to CSEs remain static over post-stimulus/response intervals and are incompatible with the notion that CSEs reflect expectation-guided preparatory biasing in anticipation of a forthcoming stimulus. The data are compatible, however, with the notion that attentional contributions to CSEs reflect a short-lived, phasic enhancement of attentional set in reaction to processing conflict.
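
    A congruency sequence effect is commonly quantified as the congruency effect after congruent trials minus the congruency effect after incongruent trials. The sketch below works through that arithmetic on a handful of made-up reaction times; the data, the function name and the use of raw mean reaction times are illustrative assumptions, not the study's analysis pipeline.

```python
from statistics import mean


def congruency_sequence_effect(trials):
    """trials: list of (is_congruent, rt_ms) tuples in presentation order."""
    cells = {}
    # Group each trial's reaction time by (previous congruency, current congruency).
    for (prev_congruent, _), (cur_congruent, rt) in zip(trials, trials[1:]):
        cells.setdefault((prev_congruent, cur_congruent), []).append(rt)
    m = {key: mean(values) for key, values in cells.items()}
    effect_after_congruent = m[(True, False)] - m[(True, True)]
    effect_after_incongruent = m[(False, False)] - m[(False, True)]
    return effect_after_congruent - effect_after_incongruent


# Hypothetical trial sequence: (congruent?, reaction time in ms).
trials = [
    (True, 500.0), (True, 480.0), (False, 600.0),
    (False, 560.0), (True, 510.0), (False, 590.0),
]
cse = congruency_sequence_effect(trials)
print(cse)
```

    A positive value means the congruency effect was smaller after incongruent trials, which is the pattern the abstract describes. The function requires at least one trial in each of the four previous-by-current cells.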

  7. Modeling and optimizing periodically inspected software rejuvenation policy based on geometric sequences

    International Nuclear Information System (INIS)

    Meng, Haining; Liu, Jianjun; Hei, Xinhong

    2015-01-01

    Software aging is characterized by an increasing failure rate, progressive performance degradation and even a sudden crash in a long-running software system. Software rejuvenation is an effective method to counteract software aging. A periodically inspected rejuvenation policy for software systems is studied. The consecutive inspection intervals are assumed to be a decreasing geometric sequence, and upon the inspection times of software system and its failure features, software rejuvenation or system recovery is performed. The system availability function and cost rate function are obtained, and the optimal inspection time and rejuvenation interval are both derived to maximize system availability and minimize cost rate. Then, boundary conditions of the optimal rejuvenation policy are deduced. Finally, the numeric experiment result shows the effectiveness of the proposed policy. Further compared with the existing software rejuvenation policy, the new policy has higher system availability. - Highlights: • A periodically inspected rejuvenation policy for software systems is studied. • A decreasing geometric sequence is used to denote the consecutive inspection intervals. • The optimal inspection times and rejuvenation interval are found. • The new policy is capable of reducing average cost and improving system availability
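
    The inspection schedule described above, with consecutive intervals forming a decreasing geometric sequence, can be sketched as follows. The parameter names (first interval, common ratio, number of inspections) and the numbers are hypothetical; the abstract does not give the paper's notation or its optimal values.

```python
def inspection_times(first_interval, ratio, count):
    """Cumulative inspection times when interval k equals first_interval * ratio**(k - 1)."""
    if not 0 < ratio < 1:
        raise ValueError("a decreasing geometric sequence needs 0 < ratio < 1")
    times, elapsed, interval = [], 0.0, first_interval
    for _ in range(count):
        elapsed += interval
        times.append(elapsed)
        interval *= ratio  # each interval is shorter than the previous one
    return times


# Hypothetical values: first interval of 8 hours, each following interval half as long.
times = inspection_times(8.0, 0.5, 4)
print(times)
```

    The n-th inspection time equals the geometric partial sum first_interval * (1 - ratio**n) / (1 - ratio), so inspections become more frequent as the system ages and the total inspected horizon stays below first_interval / (1 - ratio).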

  8. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes

    DEFF Research Database (Denmark)

    Jespersen, Martin Closter; Peters, Bjoern; Nielsen, Morten

    2017-01-01

    Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex... for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from... and immunology community....

  9. Research on lock-in thermography for aerospace materials of nondestructive test based on image sequence processing

    Science.gov (United States)

    Liu, Junyan; Dai, Jingmin; Wang, Yang

    2008-11-01

    IR lock-in thermography is an active thermography technology based on thermal wave signal processing. It has many advantages for nondestructive testing of composite materials and compound structures, and has been applied in the aerospace, automotive, mechanical and electrical fields. In lock-in thermography, given sufficient time for periodic heating, the surface temperature evolves periodically in a sinusoidal pattern from the transient state to the steady state. In this paper, the principle of lock-in thermography is introduced and the heat transfer process is analyzed for a sinusoidally varying heat flow in materials by means of the finite element method (FEM). In the experiment, modulated optical stimulation is applied to the sample, and image sequences are collected by a Jade MWIR 550 FPA IR camera. A Savitzky-Golay digital smoothing filter is used to remove the effects of high-frequency noise. A phase image at the frequency of periodic heating can be calculated using a Fourier transform at the periodic heating frequency in the transient state for defect detection. The IR lock-in thermography processing software was developed in Visual C++ and operates on the collected image sequences. The experimental results show that the developed system reaches the level of the conventional steady-state lock-in method.
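
    The phase image mentioned above can be illustrated with a single-frequency Fourier sum evaluated per pixel at the modulation frequency. The following Python sketch is an assumption-laden illustration (function names, frame layout as nested lists, and a record spanning a whole number of heating periods are all choices made here); the original software was written in Visual C++ and its details are not given in the abstract.

```python
import cmath
import math


def lockin_phase(series, mod_freq_hz, frame_rate_hz):
    """Phase in radians of one pixel's temperature series at the modulation frequency."""
    acc = 0j
    for n, value in enumerate(series):
        acc += value * cmath.exp(-2j * math.pi * mod_freq_hz * n / frame_rate_hz)
    return cmath.phase(acc)


def phase_image(frames, mod_freq_hz, frame_rate_hz):
    """frames: list over time of 2-D lists (rows x cols); returns a 2-D list of phases."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [lockin_phase([f[r][c] for f in frames], mod_freq_hz, frame_rate_hz)
         for c in range(cols)]
        for r in range(rows)
    ]


# Synthetic pixel: 25 degree offset plus a 1 Hz thermal wave lagging by 0.7 rad,
# sampled at 20 frames per second for exactly two heating periods.
true_phase = -0.7
series = [25.0 + 2.0 * math.cos(2 * math.pi * n / 20 + true_phase) for n in range(40)]
recovered = lockin_phase(series, 1.0, 20.0)
print(recovered)
```

    Because the constant offset averages out over whole periods, the recovered phase depends only on the thermal lag; pixels over a subsurface defect would be expected to show a different lag from their surroundings.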

  10. Congruency sequence effects are driven by previous-trial congruency, not previous-trial response conflict

    OpenAIRE

    Weissman, Daniel H.; Carp, Joshua

    2013-01-01

    Congruency effects in distracter interference tasks are often smaller after incongruent trials than after congruent trials. However, the sources of such congruency sequence effects (CSEs) are controversial. The conflict monitoring model of cognitive control links CSEs to the detection and resolution of response conflict. In contrast, competing theories attribute CSEs to attentional or affective processes that vary with previous-trial congruency (incongruent vs. congruent). The present study s...

  11. Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization.

    Science.gov (United States)

    Nikumbh, Sarvesh; Pfeifer, Nico

    2017-04-18

    Knowing the three-dimensional (3D) structure of the chromatin is important for obtaining a complete picture of the regulatory landscape. Changes in the 3D structure have been implicated in diseases. While there exist approaches that attempt to predict the long-range chromatin interactions, they focus only on interactions between specific genomic regions - the promoters and enhancers, neglecting other possibilities, for instance, the so-called structural interactions involving intervening chromatin. We present a method that can be trained on 5C data using the genetic sequence of the candidate loci to predict potential genome-wide interaction partners of a particular locus of interest. We have built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histograms (ODH) representation. The method shows good performance with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across cell lines GM12878, K562 and HeLa-S3. In cases where any locus did not have sufficient candidate interaction partners for model training, we employed multitask learning to share knowledge between models of different loci. In this scenario, across the three cell lines, the method attained an average performance increase of 0.09 in the AUC. Performance evaluation of the models trained on 5C data regarding prediction on an independent high-resolution Hi-C dataset (which is a rather hard problem) shows 0.56 AUC, on average. Additionally, we have developed new, intuitive visualization methods that enable interpretation of sequence signals that contributed towards prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization. We demonstrated how our approach can 1) provide insights into sequence features of locus-specific interaction partners, and 2) also identify their cell

  12. Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus).

    Directory of Open Access Journals (Sweden)

    Lei Pan

    Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination. Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by Trinity strategy (N50=680 nt). Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10^-5. Using Blast to Gene Ontology (GO), Clusters of Orthologous Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG), we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interaction, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. We also chose a key gene for SNP analysis to identify differences between rabbits with plaice and un-plaice phenotypes. The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development.
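
    The N50 value quoted for the assembly is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. A minimal Python sketch of that calculation follows; the contig lengths are made up for illustration and are unrelated to the rabbit data.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical contig lengths in nucleotides.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))
```

    Here the total length is 54, and the three longest contigs (10, 9 and 8) already sum to 27, so the function returns 8.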

  13. Detection, Validation, and Application of Genotyping-by-Sequencing Based Single Nucleotide Polymorphisms in Upland Cotton

    Directory of Open Access Journals (Sweden)

    M. Sariful Islam

    2015-03-01

    The presence of two closely related subgenomes in the allotetraploid Upland cotton, combined with a narrow genetic base of the cultivated varieties, has hindered the identification of polymorphic genetic markers and their use in improving this important crop. Genotyping-by-sequencing (GBS) is a rapid way to identify single nucleotide polymorphism (SNP) markers; however, these SNPs may be specific to the sequenced cotton lines. Our objective was to obtain a large set of polymorphic SNPs with broad applicability to the cultivated cotton germplasm. We selected 11 diverse cultivars and their random-mated recombinant inbred progeny for SNP marker development via GBS. Two different GBS methodologies were used by Data2Bio (D2B) and the Institute for Genome Diversity (IGD) to identify 4441 and 1176 polymorphic SNPs with minor allele frequency of ≥0.1, respectively. We further filtered the SNPs and aligned their sequences to the diploid reference genome. We were able to use homeologous SNPs to assign 1071 SNP loci to the At subgenome and 1223 to the Dt subgenome. These filtered SNPs were located in genic regions about twice as frequently as expected by chance. We tested 111 of the SNPs in 154 diverse Upland cotton lines, which confirmed the utility of the SNP markers developed with this approach. Not only were the SNPs identified in the 11 cultivars present in the 154 cotton lines, but no two cultivars had identical SNP genotypes. We conclude that GBS can be easily used to discover SNPs in Upland cotton, which can be converted to functional genotypic assays for use in breeding and genetic studies.
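
    The minor allele frequency threshold of ≥0.1 used to select polymorphic SNPs can be illustrated with a small filter. The sketch assumes biallelic SNPs with one allele call per sampled chromosome and uses invented marker names and calls; it does not reflect the D2B or IGD pipelines, whose details the abstract does not give.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the rarer allele at a biallelic SNP; missing calls are None."""
    counts = Counter(a for a in alleles if a is not None)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic or uncalled site
    return min(counts.values()) / total


def filter_by_maf(snps, threshold=0.1):
    """Keep the names of SNPs whose minor allele frequency reaches the threshold."""
    return [name for name, calls in snps.items()
            if minor_allele_frequency(calls) >= threshold]


# Hypothetical allele calls per marker.
snps = {
    "snp1": list("AAAAAAAAAG"),             # minor allele frequency 0.10
    "snp2": list("CCCCCCCCCCCCCCCCCCCT"),   # minor allele frequency 0.05
    "snp3": list("GGGGG"),                  # monomorphic
}
kept = filter_by_maf(snps)
print(kept)
```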

  14. Clinical Sequencing Contributes to a BRCA-Associated Cancer Rediagnosis That Guides an Effective Therapeutic Course.

    Science.gov (United States)

    Chapman, Jocelyn S; Asthana, Saurabh; Cade, Lindsay; Chang, Matthew T; Wang, Zhen; Zaloudek, Charles J; Ueda, Stefanie; Collisson, Eric A; Taylor, Barry S

    2015-07-01

    Cancer is currently classified and treated using an approach based on tissue of origin. Ambiguous or incorrect diagnoses, however, are common and often go unnoticed. Clinical cancer sequencing can provide diagnostic precision, therapeutic direction, and hereditary cancer risk assessment. This report presents a patient with an initial diagnosis of metastatic pancreatic adenocarcinoma (PDA), a disease with a dismal prognosis. Tumor sequencing revealed genomic abnormalities inconsistent with PDA, instead suggesting serous ovarian cancer. This molecular rediagnosis was further refined by the identification of a BRCA2 truncating mutation in the tumor, subsequently confirmed to be a germline event. These findings prompted the initiation of platinum-based chemotherapy, which produced a life-altering response, and referral to genetic counseling for her offspring. These results suggest that clinical tumor sequencing can simultaneously clarify diagnoses, guide therapy, and inform familial risk, even in patients with end-stage metastatic disease, making the case for the development of specific strategies to deploy sequencing coupled with big data in oncology to improve clinical cancer management. Copyright © 2015 by the National Comprehensive Cancer Network.

  15. Individual SWCNT based ionic field effect transistor

    Science.gov (United States)

    Pang, Pei; He, Jin; Park, Jae Hyun; Krstic, Predrag; Lindsay, Stuart

    2011-03-01

    Here we report that the ionic current through a single-walled carbon nanotube (SWCNT) can be effectively gated by a perpendicular electrical field from a top gate electrode, working as an ionic field effect transistor. Both our experiment and simulation confirm that the electroosmotic current (EOF) is the main component in the ionic current through the SWCNT and is responsible for the gating effect. We also studied the gating efficiency as a function of solution concentration and pH and demonstrated that the device can work effectively under physiologically relevant conditions. This work opens the door to using CNT-based nanofluidics for ion and molecule manipulation. This work was supported by the DNA Sequencing Technology Program of the National Human Genome Research Institute (1RC2HG005625-01, 1R21HG004770-01), Arizona Technology Enterprises and the Biodesign Institute.

  16. Effects of sleep loss, time of day, and extended mental work on implicit and explicit learning of sequences

    Science.gov (United States)

    Heuer, H.; Spijkers, W.; Kiesswetter, E.; Schmidtke, V.

    1998-01-01

    Tacit knowledge is part of many professional skills and can be studied experimentally with implicit-learning paradigms. The authors explored the effects of 2 different stressors, loss of sleep and mental fatigue, on implicit learning in a serial-response time (RT) task. In the 1st experiment, 1 night of sleep deprivation was shown to impair implicit but not explicit sequence learning. In the 2nd experiment, no impairment of either type of sequence learning was found after 1.5 hr of mental work. Serial-RT performance, in contrast, suffered from both stressors. These findings suggest that sleep deprivation induces specific risks for automatic, skill-based behavior that are not present in consciously controlled performance.

  17. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2011-08-01

    Purpose: to investigate the efficiency of a shift sequence-based approach when the problem consists of a high number of shifts. Research objectives: • Solve the health care workers rostering problem using a shift sequence-based method. • Measure its efficiency when the number of shifts increases. Design/methodology/approach: Rostering problems are usually highly constrained. Constraints are classified as soft or hard constraints. The soft and hard constraints of the problem are additionally classified as sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied when constructing the overall solution, i.e. combining all schedules. The shift sequence-based approach consists of two stages: • shift sequence construction, • schedule construction. In the shift sequence construction stage, shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shift sequences are ranked by their penalties for easier retrieval in the later stage. In the schedule construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. The shift sequence-based method is an adaptive iterative method in which health care workers who received the highest schedule penalties in the last iteration are scheduled first in the current iteration. During roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial) roster, as long as the swaps satisfy hard constraints and decrease the roster penalty. Findings: Using shift sequence method for solving health care workers rostering
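
    The greedy local search described above (swap a pair of shifts between two workers whenever the swap satisfies the hard constraints and lowers the roster penalty) can be sketched as follows. The roster layout, the restriction to same-day swaps, and the toy penalty and feasibility functions are all assumptions introduced for this illustration; the paper's actual constraints are not given in the abstract.

```python
from itertools import combinations


def greedy_swap(roster, penalty, feasible):
    """roster: {worker: [shift per day]}. Apply improving, feasible same-day swaps in place."""
    improved = True
    while improved:
        improved = False
        for w1, w2 in combinations(sorted(roster), 2):
            for day in range(len(roster[w1])):
                if roster[w1][day] == roster[w2][day]:
                    continue
                before = penalty(roster)
                roster[w1][day], roster[w2][day] = roster[w2][day], roster[w1][day]
                if feasible(roster) and penalty(roster) < before:
                    improved = True  # keep the swap
                else:
                    # Undo a swap that is infeasible or does not reduce the penalty.
                    roster[w1][day], roster[w2][day] = roster[w2][day], roster[w1][day]
    return roster


def night_then_early(roster):
    """Toy soft constraint: one penalty point per night shift followed by an early shift."""
    return sum(
        1
        for shifts in roster.values()
        for today, tomorrow in zip(shifts, shifts[1:])
        if today == "N" and tomorrow == "E"
    )


def at_most_two_nights(roster):
    """Toy hard constraint: no worker has more than two night shifts."""
    return all(shifts.count("N") <= 2 for shifts in roster.values())


# Hypothetical partial roster: N = night, E = early, D = day.
roster = {"ann": ["N", "E", "D"], "bob": ["D", "D", "N"]}
result = greedy_swap(roster, night_then_early, at_most_two_nights)
print(result, night_then_early(result))
```

    The loop terminates because every accepted swap strictly lowers the penalty. Like any greedy local search, it can stop in a local optimum, which is presumably why the method in the abstract combines it with the iterative, penalty-ordered construction of schedules.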

  18. Absence of congruency sequence effects reveals neurocognitive inflexibility in Parkinson's disease.

    Science.gov (United States)

    Rustamov, Nabi; Rodriguez-Raecke, Rea; Timm, Lydia; Agrawal, Deepashri; Dressler, Dirk; Schrader, Christoph; Tacik, Pawel; Wegner, Florian; Dengler, Reinhard; Wittfoth, Matthias; Kopp, Bruno

    2013-12-01

    The effects of Parkinson's disease (PD) on action selection in conflictual situations were examined in an experiment using the flanker task in combination with event-related brain potentials (ERPs). More specifically, we investigated the effects of PD on behavioral and neuronal indicators of both instantaneous (within-trial flanker congruency effects) and sequence-dependent (between-trial congruency sequence effects) distractor interference. Consistent with the existing literature, congruency-sensitive ERP components (i.e., fronto-central N2 and positive 'dips' of the lateralized readiness potential, LRP) were observed over medial-frontal and lateral-central regions, respectively. For situations requiring instantaneous action control, patients with PD and healthy controls showed similar congruency effects on reaction time, as well as on N2 and LRP 'dip' amplitudes. As expected, controls showed reliable congruency sequence effects on reaction time, as well as on N2 and LRP 'dip' amplitudes. However, patients with PD were completely unaffected by the congruence sequence across consecutive trials, as revealed by reaction time, as well as by N2 and LRP 'dip' amplitudes. The data imply that the effects of PD on action selection are largely restricted to a lack of adaptive modulation in time which we refer to as neurocognitive inflexibility, in the context of relatively spared abilities to instantaneously exert control over action selection. The findings are discussed in terms of basal ganglia dysfunction induced by PD which results primarily either in executive function deficits or in aberrant habit formation. © 2013 Published by Elsevier Ltd.

  19. Association studies using family pools of outcrossing crops based on allele-frequency estimates from DNA sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Jensen, Just; Asp, Torben

    2014-01-01

    from sequence read-counts for mapping. We show that, under additivity assumptions, there is a linear relationship between the family phenotype and family allele frequency, and that a regression of family phenotype on family allele frequency will estimate twice the allele substitution effect at a locus....... However, medium-to-low sequencing depth causes underestimation of the true allele substitution effect. An expression for this underestimation is derived for the case that parents are diploid, such that F2 families have up to four dosages of every allele. Using simulation studies, estimation of the allele...... effect from F2-family pools was verified and it was shown that the underestimation of the allele effect is correctly described. The optimal design for an association study when sequencing budget would be fixed is obtained using large sample size and lower sequence depth, and using higher SNP density...
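The factor of two mentioned in this abstract follows from a short argument under additivity. The notation below is assumed for illustration and is not taken from the paper: $x_{ij}\in\{0,1,2\}$ is the allele count of diploid individual $j$ in family $i$, $\alpha$ is the allele substitution effect, and $p_i$ is the family allele frequency.

```latex
\begin{align}
  y_{ij} &= \mu + \alpha\, x_{ij} + e_{ij}, \qquad x_{ij} \in \{0, 1, 2\} \\
  \bar{y}_i &= \mu + \alpha\, \bar{x}_i + \bar{e}_i \\
  p_i &= \tfrac{1}{2}\,\bar{x}_i
  \quad\Longrightarrow\quad
  \bar{y}_i = \mu + 2\alpha\, p_i + \bar{e}_i
\end{align}
```

Regressing the family mean phenotype $\bar{y}_i$ on $p_i$ therefore has slope $2\alpha$ when $p_i$ is known exactly. The underestimation at low sequencing depth reported in the abstract concerns the case where $p_i$ is itself estimated from read counts.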

  20. Digital Sequences and a Time Reversal-Based Impact Region Imaging and Localization Method

    Science.gov (United States)

    Qiu, Lei; Yuan, Shenfang; Mei, Hanfei; Qian, Weifeng

    2013-01-01

    To reduce time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) is developed to record the impact region of impacts on-line. It is small in size, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first one is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second problem is that the area of impact alarm region is large when a large scale structure is monitored and the number of PZTs is limited which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is estimated based on the digital sequences first. Then, characteristic signals of impact response signals are constructed by sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. Depending on the image, an error ellipse is generated to give out the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of impact alarm region can be reduced and the number of PZTs needed to cover the same impact monitoring region is reduced by more than a half. PMID:24084123

  1. Digital sequences and a time reversal-based impact region imaging and localization method.

    Science.gov (United States)

    Qiu, Lei; Yuan, Shenfang; Mei, Hanfei; Qian, Weifeng

    2013-10-01

    To reduce time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) is developed to record the impact region of impacts on-line. It is small in size, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first one is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second problem is that the area of impact alarm region is large when a large scale structure is monitored and the number of PZTs is limited which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is estimated based on the digital sequences first. Then, characteristic signals of impact response signals are constructed by sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. Depending on the image, an error ellipse is generated to give out the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of impact alarm region can be reduced and the number of PZTs needed to cover the same impact monitoring region is reduced by more than a half.

  2. Digital Sequences and a Time Reversal-Based Impact Region Imaging and Localization Method

    Directory of Open Access Journals (Sweden)

    Weifeng Qian

    2013-10-01

    Full Text Available To reduce time and cost of damage inspection, on-line impact monitoring of aircraft composite structures is needed. A digital monitor based on an array of piezoelectric transducers (PZTs) is developed to record the impact region of impacts on-line. It is small in size, lightweight and has low power consumption, but there are two problems with the impact alarm region localization method of the digital monitor at the current stage. The first one is that the accuracy rate of the impact alarm region localization is low, especially on complex composite structures. The second problem is that the area of impact alarm region is large when a large scale structure is monitored and the number of PZTs is limited which increases the time and cost of damage inspections. To solve the two problems, an impact alarm region imaging and localization method based on digital sequences and time reversal is proposed. In this method, the frequency band of impact response signals is estimated based on the digital sequences first. Then, characteristic signals of impact response signals are constructed by sinusoidal modulation signals. Finally, the phase synthesis time reversal impact imaging method is adopted to obtain the impact region image. Depending on the image, an error ellipse is generated to give out the final impact alarm region. A validation experiment is implemented on a complex composite wing box of a real aircraft. The validation results show that the accuracy rate of impact alarm region localization is approximately 100%. The area of impact alarm region can be reduced and the number of PZTs needed to cover the same impact monitoring region is reduced by more than a half.

  3. Analytical framework for identifying and differentiating recent hitchhiking and severe bottleneck effects from multi-locus DNA sequence data.

    Science.gov (United States)

    Sargsyan, Ori

    2012-01-01

    Hitchhiking and severe bottleneck effects have an impact on the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This paper develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50,000 or greater in contrast to 10,000, and the estimates of the recent homogenization events agree with the "Out of Africa" hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. The results show that significant discrepancies can exist between the estimates.

  4. Analytical framework for identifying and differentiating recent hitchhiking and severe bottleneck effects from multi-locus DNA sequence data.

    Directory of Open Access Journals (Sweden)

    Ori Sargsyan

    Full Text Available Hitchhiking and severe bottleneck effects have an impact on the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This paper develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50,000 or greater in contrast to 10,000, and the estimates of the recent homogenization events agree with the "Out of Africa" hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. The results show that significant discrepancies can exist between the estimates.

  5. A time series based sequence prediction algorithm to detect activities of daily living in smart home.

    Science.gov (United States)

    Marufuzzaman, M; Reaz, M B I; Ali, M A M; Rahman, L F

    2015-01-01

    The goal of smart homes is to create an intelligent environment that adapts to the inhabitants' needs and assists people who need special care and safety in their daily lives. This can be achieved by collecting ADL (activities of daily living) data and analysing it further within existing computing elements. In this research, a very recent algorithm named sequence prediction via enhanced episode discovery (SPEED) is modified, and a time component is included in order to improve accuracy. The modified SPEED, or M-SPEED, is a sequence prediction algorithm that modifies the previous SPEED algorithm by using the time duration of appliances' ON-OFF states to decide the next state. M-SPEED discovered periodic episodes of inhabitant behavior, was trained on the learned episodes, and made decisions based on the obtained knowledge. The results showed that M-SPEED achieves 96.8% prediction accuracy, which is better than other time prediction algorithms like PUBS, ALZ with temporal rules and the previous SPEED. Since human behavior shows natural temporal patterns, duration times can be used to predict future events more accurately. This inhabitant activity prediction system will certainly improve smart homes by ensuring safety and better care for elderly and handicapped people.

  6. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    Science.gov (United States)

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  7. Tracking Algorithm of Multiple Pedestrians Based on Particle Filters in Video Sequences

    Science.gov (United States)

    Liu, Yun; Wang, Chuanxu; Zhang, Shujun; Cui, Xuehong

    2016-01-01

    Pedestrian tracking is a critical problem in the field of computer vision. Particle filters have been proven to be very useful in pedestrian tracking for nonlinear and non-Gaussian estimation problems. However, pedestrian tracking in complex environment is still facing many problems due to changes of pedestrian postures and scale, moving background, mutual occlusion, and presence of pedestrian. To surmount these difficulties, this paper presents tracking algorithm of multiple pedestrians based on particle filters in video sequences. The algorithm acquires confidence value of the object and the background through extracting a priori knowledge thus to achieve multipedestrian detection; it adopts color and texture features into particle filter to get better observation results and then automatically adjusts weight value of each feature according to current tracking environment. During the process of tracking, the algorithm processes severe occlusion condition to prevent drift and loss phenomena caused by object occlusion and associates detection results with particle state to propose discriminated method for object disappearance and emergence thus to achieve robust tracking of multiple pedestrians. Experimental verification and analysis in video sequences demonstrate that proposed algorithm improves the tracking performance and has better tracking results. PMID:27847514

  8. Bayesian analysis of gene essentiality based on sequencing of transposon insertion libraries

    Science.gov (United States)

    DeJesus, Michael A.; Zhang, Yanjia J.; Sassetti, Christopher M.; Rubin, Eric J.; Sacchettini, James C.; Ioerger, Thomas R.

    2013-01-01

    Motivation: Next-generation sequencing affords an efficient analysis of transposon insertion libraries, which can be used to identify essential genes in bacteria. To analyse this high-resolution data, we present a formal Bayesian framework for estimating the posterior probability of essentiality for each gene, using the extreme-value distribution to characterize the statistical significance of the longest region lacking insertions within a gene. We describe a sampling procedure based on the Metropolis–Hastings algorithm to calculate posterior probabilities of essentiality while simultaneously integrating over unknown internal parameters. Results: Using a sequence dataset from a transposon library for Mycobacterium tuberculosis, we show that this Bayesian approach predicts essential genes that correspond well with genes shown to be essential in previous studies. Furthermore, we show that by using the extreme-value distribution to characterize genomic regions lacking transposon insertions, this method is capable of identifying essential domains within genes. This approach can be used for analysing transposon libraries in other organisms and augmenting essentiality predictions with statistical confidence scores. Availability: A python script implementing the method described is available for download from http://saclab.tamu.edu/essentiality/. Contact: michael.dejesus@tamu.edu or ioerger@cs.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23361328
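The statistic at the centre of this abstract, the longest region lacking insertions within a gene, can be illustrated with a small sketch. The code assumes each candidate insertion site is an independent Bernoulli trial, and it uses the textbook leading-order approximation for the typical longest run (the location of the corresponding extreme-value distribution). It is not the authors' script; the function names `longest_gap` and `expected_longest_gap` are hypothetical.

```python
import math


def longest_gap(insertions):
    """Length of the longest run of consecutive sites with no insertion.

    insertions -- iterable of truthy/falsy flags, one per candidate site
    """
    longest = current = 0
    for has_insertion in insertions:
        current = 0 if has_insertion else current + 1
        longest = max(longest, current)
    return longest


def expected_longest_gap(n_sites, insertion_prob):
    """Leading-order typical longest empty run under the Bernoulli assumption.

    For n independent sites, each hit with probability theta, the longest run
    of misses concentrates near log base 1/(1 - theta) of (n * theta).
    """
    return math.log(n_sites * insertion_prob) / math.log(1.0 / (1.0 - insertion_prob))


# Hypothetical gene with 7 candidate sites, two of which carry insertions.
observed = longest_gap([1, 0, 0, 0, 1, 0, 0])
typical = expected_longest_gap(100, 0.5)
```

A gene whose observed gap is far longer than the typical value for its number of sites and the library's insertion density is a candidate for essentiality. The Bayesian framework in the abstract turns that comparison into a posterior probability instead of a hard threshold.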

  9. Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method.

    Science.gov (United States)

    Yao, Yuhua; Li, Xianhong; Liao, Bo; Huang, Li; He, Pingan; Wang, Fayou; Yang, Jiasheng; Sun, Hailiang; Zhao, Yulong; Yang, Jialiang

    2017-05-08

    Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acids substitution matrices. In this study, we first compared a comprehensive 95 substitution matrices reflecting various amino acids properties in predicting the antigenicity of influenza viruses by a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider top substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structure features are most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acids mutation driving the 11 historical antigenic drift events, pointing to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequence available from NCBI flu database, and showed an overall correspondence and local inconsistency between genetic and antigenic evolution of H3N2 influenza viruses.

  10. Internal Transcribed Spacer 1 (ITS1) based sequence typing reveals phylogenetically distinct Ascaris population

    Directory of Open Access Journals (Sweden)

    Koushik Das

    2015-01-01

    Full Text Available Taxonomic differentiation among morphologically identical Ascaris species is a debatable scientific issue in the context of Ascariasis epidemiology. To explain the disease epidemiology and also the taxonomic position of different Ascaris species, genome information of infecting strains from endemic areas throughout the world is certainly crucial. Ascaris population from human has been genetically characterized based on the widely used genetic marker, internal transcribed spacer1 (ITS1). Along with previously reported and prevalent genotype G1, 8 new sequence variants of ITS1 have been identified. Genotype G1 was significantly present among female patients aged between 10 to 15 years. Intragenic linkage disequilibrium (LD) analysis at target locus within our study population has identified an incomplete LD value with potential recombination events. A separate cluster of Indian isolates with high bootstrap value indicate their distinct phylogenetic position in comparison to the global Ascaris population. Genetic shuffling through recombination could be a possible reason for high population diversity and frequent emergence of new sequence variants, identified in present and other previous studies. This study explores the genetic organization of Indian Ascaris population for the first time which certainly includes some fundamental information on the molecular epidemiology of Ascariasis.

  11. Internal Transcribed Spacer 1 (ITS1) based sequence typing reveals phylogenetically distinct Ascaris population

    Science.gov (United States)

    Das, Koushik; Chowdhury, Punam; Ganguly, Sandipan

    2015-01-01

    Taxonomic differentiation among morphologically identical Ascaris species is a debatable scientific issue in the context of Ascariasis epidemiology. To explain the disease epidemiology and also the taxonomic position of different Ascaris species, genome information of infecting strains from endemic areas throughout the world is certainly crucial. Ascaris population from human has been genetically characterized based on the widely used genetic marker, internal transcribed spacer1 (ITS1). Along with previously reported and prevalent genotype G1, 8 new sequence variants of ITS1 have been identified. Genotype G1 was significantly present among female patients aged between 10 to 15 years. Intragenic linkage disequilibrium (LD) analysis at target locus within our study population has identified an incomplete LD value with potential recombination events. A separate cluster of Indian isolates with high bootstrap value indicate their distinct phylogenetic position in comparison to the global Ascaris population. Genetic shuffling through recombination could be a possible reason for high population diversity and frequent emergence of new sequence variants, identified in present and other previous studies. This study explores the genetic organization of Indian Ascaris population for the first time which certainly includes some fundamental information on the molecular epidemiology of Ascariasis. PMID:26504510

  12. Internal Transcribed Spacer 1 (ITS1) based sequence typing reveals phylogenetically distinct Ascaris population.

    Science.gov (United States)

    Das, Koushik; Chowdhury, Punam; Ganguly, Sandipan

    2015-01-01

    Taxonomic differentiation among morphologically identical Ascaris species is a debatable scientific issue in the context of Ascariasis epidemiology. To explain the disease epidemiology and also the taxonomic position of different Ascaris species, genome information of infecting strains from endemic areas throughout the world is certainly crucial. Ascaris population from human has been genetically characterized based on the widely used genetic marker, internal transcribed spacer1 (ITS1). Along with previously reported and prevalent genotype G1, 8 new sequence variants of ITS1 have been identified. Genotype G1 was significantly present among female patients aged between 10 to 15 years. Intragenic linkage disequilibrium (LD) analysis at target locus within our study population has identified an incomplete LD value with potential recombination events. A separate cluster of Indian isolates with high bootstrap value indicate their distinct phylogenetic position in comparison to the global Ascaris population. Genetic shuffling through recombination could be a possible reason for high population diversity and frequent emergence of new sequence variants, identified in present and other previous studies. This study explores the genetic organization of Indian Ascaris population for the first time which certainly includes some fundamental information on the molecular epidemiology of Ascariasis.

  13. REMap: Operon map of M. tuberculosis based on RNA sequence data.

    Science.gov (United States)

    Pelly, Shaaretha; Winglee, Kathryn; Xia, Fang Fang; Stevens, Rick L; Bishai, William R; Lamichhane, Gyanu

    2016-07-01

    A map of the transcriptional organization of genes of an organism is a basic tool that is necessary to understand and facilitate a more accurate genetic manipulation of the organism. Operon maps are largely generated by computational prediction programs that rely on gene conservation and genome architecture and may not be physiologically relevant. With the widespread use of RNA sequencing (RNAseq), the prediction of operons based on actual transcriptome sequencing rather than computational genomics alone is much needed. Here, we report a validated operon map of Mycobacterium tuberculosis, developed using RNAseq data from both the exponential and stationary phases of growth. At least 58.4% of M. tuberculosis genes are organized into 749 operons. Our prediction algorithm, REMap (RNA Expression Mapping of operons), considers the many cases of transcription coverage of intergenic regions, and avoids dependencies on functional annotation and arbitrary assumptions about gene structure. As a result, we demonstrate that REMap is able to more accurately predict operons, especially those that contain long intergenic regions or functionally unrelated genes, than previous operon prediction programs. The REMap algorithm is publicly available as a user-friendly tool that can be readily modified to predict operons in other bacteria. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Identifications of Putative PKA Substrates with Quantitative Phosphoproteomics and Primary-Sequence-Based Scoring.

    Science.gov (United States)

    Imamura, Haruna; Wagih, Omar; Niinae, Tomoya; Sugiyama, Naoyuki; Beltrao, Pedro; Ishihama, Yasushi

    2017-04-07

    Protein kinase A (PKA or cAMP-dependent protein kinase) is a serine/threonine kinase that plays essential roles in the regulation of proliferation, differentiation, and apoptosis. To better understand the functions of PKA, it is necessary to elucidate the direct interplay between PKA and their substrates in living human cells. To identify kinase target substrates in a high-throughput manner, we first quantified the change of phosphoproteome in the cells of which PKA activity was perturbed by drug stimulations. LC-MS/MS analyses identified 2755 and 3191 phosphopeptides from experiments with activator or inhibitor of PKA. To exclude potential indirect targets of PKA, we built a computational model to characterize the kinase sequence specificity toward the substrate target site based on known kinase-substrate relationships. Finally, by combining the sequence recognition model with the quantitative changes in phosphorylation measured in the two drug perturbation experiments, we identified 29 reliable candidates of PKA targeting residues in living cells including 8 previously known substrates. Moreover, 18 of these sites were confirmed to be site-specifically phosphorylated in vitro. Altogether this study proposed a confident list of PKA substrate candidates, expanding our knowledge of PKA signaling network.

  15. DEEPre: sequence-based enzyme EC number prediction by deep learning

    KAUST Repository

    Li, Yu

    2017-10-20

    Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.

  16. Molecular phylogenetic analysis of Indonesia Solanaceae based on DNA sequences of internal transcribed spacer region

    Science.gov (United States)

    Hidayat, Topik; Priyandoko, Didik; Islami, Dina Karina; Wardiny, Putri Yunitha

    2016-02-01

    Solanaceae is one of the largest families of angiosperms and is highly diverse in morphological characters. In Indonesia, this group of plants is very popular owing to its usefulness as food, ornamental and medicinal plants. However, the phylogenetic relationships among members of this family in Indonesia have received little attention. The purpose of this study was to evaluate the phylogenetic relationships of the family, especially those species distributed in Indonesia. DNA sequences of the Internal Transcribed Spacer (ITS) region of 19 species of Solanaceae and three outgroup species, belonging to the families Convolvulaceae, Apocynaceae, and Plantaginaceae, were isolated, amplified, and sequenced. Phylogenetic tree analysis based on the parsimony method was conducted using data derived from ITS-1, 5.8S, and ITS-2 separately, and from the combination of all three. Results indicated that the phylogenetic tree derived from the combined data established a better pattern of relationships than the separate data. Three major groups were revealed. Group 1 consists of the tribes Datureae, Cestreae, and Petunieae, whereas group 2 comprises members of the tribe Physaleae. Group 3 belongs to the tribe Solaneae. The use of the ITS region as a molecular marker, in general, supports the global Solanaceae relationships previously reported.

  17. Molecular phylogeny and evolution of Scomber (Teleostei: Scombridae) based on mitochondrial and nuclear DNA sequences

    Science.gov (United States)

    Cheng, Jiao; Gao, Tianxiang; Miao, Zhenqing; Yanagimoto, Takashi

    2011-03-01

    A molecular phylogenetic analysis of the genus Scomber was conducted based on mitochondrial (COI, Cyt b and control region) and nuclear (5S rDNA) DNA sequence data in multigene perspective. A variety of phylogenetic analytic methods were used to clarify the current taxonomic classification and to assess phylogenetic relationships and the evolutionary history of this genus. The present study produced a well-resolved phylogeny that strongly supported the monophyly of Scomber. We confirmed that S. japonicus and S. colias were genetically distinct. Although morphologically and ecologically similar to S. colias, the molecular data showed that S. japonicus has a greater molecular affinity with S. australasicus, which conflicts with the traditional taxonomy. This phylogenetic pattern was corroborated by the mtDNA data, but incompletely by the nuclear DNA data. Phylogenetic concordance between the mitochondrial and nuclear DNA regions for the basal nodes supports an Atlantic origin for Scomber. The present-day geographic ranges of the species were compared with the resultant molecular phylogeny derived from partition Bayesian analyses of the combined data sets to evaluate possible dispersal routes of the genus. The present-day geographic distribution of Scomber species might be best ascribed to multiple dispersal events. In addition, our results suggest that phylogenies derived from multiple genes and long sequences exhibited improved phylogenetic resolution, from which we conclude that the phylogenetic reconstruction is a reliable representation of the evolutionary history of Scomber.

  18. Time-stretch microscopy based on time-wavelength sequence reconstruction from wideband incoherent source

    International Nuclear Information System (INIS)

    Zhang, Chi; Xu, Yiqing; Wei, Xiaoming; Tsia, Kevin K.; Wong, Kenneth K. Y.

    2014-01-01

    Time-stretch microscopy has emerged as an ultrafast optical imaging concept offering an unprecedented combination of imaging speed and sensitivity. However, a dedicated wideband and coherent optical pulse source with high shot-to-shot stability has been mandated for time-wavelength mapping, the enabling process for ultrahigh speed wavelength-encoded image retrieval. From the practical point of view, exploiting methods to relax the stringent requirements (e.g., temporal stability and coherence) for the source of time-stretch microscopy is thus of great value. In this paper, we demonstrated time-stretch microscopy by reconstructing the time-wavelength mapping sequence from a wideband incoherent source. Utilizing the time-lens focusing mechanism mediated by a narrow-band pulse source, this approach allows generation of a wideband incoherent source, with the spectral efficiency enhanced by a factor of 18. As a proof-of-principle demonstration, time-stretch imaging with the scan rate as high as MHz and diffraction-limited resolution is achieved based on the wideband incoherent source. We note that the concept of time-wavelength sequence reconstruction from a wideband incoherent source can also be generalized to any high-speed optical real-time measurements in which wavelength acts as the information carrier.

  19. Phylogenetic relationships of Palaearctic Formica species (Hymenoptera, Formicidae) based on mitochondrial cytochrome B sequences.

    Directory of Open Access Journals (Sweden)

    Anna V Goropashnaya

    Full Text Available Ants of genus Formica demonstrate variation in social organization and represent model species for ecological, behavioral, evolutionary studies and testing theoretical implications of the kin selection theory. Subgeneric division of the Formica ants based on morphology has been questioned and remained unclear after an allozyme study on genetic differentiation between 13 species representing all subgenera was conducted. In the present study, the phylogenetic relationships within the genus were examined using mitochondrial DNA sequences of the cytochrome b and a part of the NADH dehydrogenase subunit 6. All 23 Formica species sampled in the Palaearctic clustered according to the subgeneric affiliation except F. uralensis that formed a separate phylogenetic group. Unlike Coptoformica and Formica s. str., the subgenus Serviformica did not form a tight cluster but more likely consisted of a few small clades. The genetic distances between the subgenera were around 10%, implying approximate divergence time of 5 Myr if we used the conventional insect divergence rate of 2% per Myr. Within-subgenus divergence estimates were 6.69% in Serviformica, 3.61% in Coptoformica, 1.18% in Formica s. str., which supported our previous results on relatively rapid speciation in the latter subgenus. The phylogeny inferred from DNA sequences provides a necessary framework against which the evolution of social traits can be compared. We discuss implications of inferred phylogeny for the evolution of social traits.
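    The divergence-time estimate quoted in this abstract is a simple ratio: genetic distance divided by the assumed divergence rate. The short sketch below reproduces that arithmetic for the figures given (about 10% distance, 2% per Myr). The function name is hypothetical, and applying the same rate to the within-subgenus estimates is an illustrative assumption that the abstract itself does not make.

```python
def divergence_time_myr(distance_percent: float,
                        rate_percent_per_myr: float = 2.0) -> float:
    """Divergence time in Myr, given a pairwise genetic distance (%)
    and a divergence rate (% per Myr); 2% per Myr is the conventional
    insect rate quoted in the abstract."""
    if rate_percent_per_myr <= 0:
        raise ValueError("divergence rate must be positive")
    return distance_percent / rate_percent_per_myr


# Between-subgenus distance of about 10% quoted in the abstract.
print(divergence_time_myr(10.0))  # 5.0 Myr, matching the abstract

# Assumption for illustration only: the same rate applied to the
# within-subgenus divergence estimates listed in the abstract.
within_subgenus = {
    "Serviformica": 6.69,
    "Coptoformica": 3.61,
    "Formica s. str.": 1.18,
}
for name, distance in within_subgenus.items():
    print(f"{name}: {divergence_time_myr(distance):.2f} Myr")
```

    Under this linear model, the smaller within-subgenus divergence of Formica s. str. corresponds to a more recent split, which is the pattern the abstract describes as relatively rapid speciation.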

  20. DEEPre: sequence-based enzyme EC number prediction by deep learning.

    Science.gov (United States)

    Li, Yu; Wang, Sheng; Umarov, Ramzan; Xie, Bingqing; Fan, Ming; Li, Lihua; Gao, Xin

    2018-03-01

    Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  1. Modeling genetic imprinting effects of DNA sequences with multilocus polymorphism data

    Directory of Open Access Journals (Sweden)

    Staud Roland

    2009-08-01

    Full Text Available Abstract Single nucleotide polymorphisms (SNPs) represent the most widespread type of DNA sequence variation in the human genome and they have recently emerged as valuable genetic markers for revealing the genetic architecture of complex traits in terms of nucleotide combination and sequence. Here, we extend an algorithmic model for the haplotype analysis of SNPs to estimate the effects of genetic imprinting expressed at the DNA sequence level. The model provides a general procedure for identifying the number and types of optimal DNA sequence variants that are expressed differently due to their parental origin. The model is used to analyze a genetic data set collected from a pain genetics project. We find that DNA haplotype GAC from three SNPs, OPRKG36T (with two alleles G and T), OPRKA843G (with alleles A and G), and OPRKC846T (with alleles C and T), at the kappa-opioid receptor, triggers a significant effect on pain sensitivity, but with expression significantly depending on the parent from which it is inherited (p = 0.008). With a tremendous advance in SNP identification and automated screening, the model founded on haplotype discovery and statistical inference may provide a useful tool for genetic analysis of any quantitative trait with complex inheritance.

  2. New methods for next generation sequencing based microRNA expression profiling

    OpenAIRE

    den Dunnen Johan T; van Ommen Gertjan; Ariyurek Yavuz; Buermans Henk PJ; 't Hoen Peter AC

    2010-01-01

    Abstract Background MicroRNAs are small non-coding RNA transcripts that regulate post-transcriptional gene expression. The millions of short sequence reads generated by next generation sequencing technologies make this technique explicitly suitable for profiling of known and novel microRNAs. A modification to the small-RNA expression kit (SREK, Ambion) library preparation method for the SOLiD sequencing platform is described to generate microRNA sequencing libraries that are compatible with t...

  3. The effect of strand bias in Illumina short-read sequencing data

    Directory of Open Access Journals (Sweden)

    Guo Yan

    2012-11-01

    Full Text Available Abstract Background When using Illumina high throughput short read data, sometimes the genotype inferred from the positive strand and negative strand are significantly different, with one homozygous and the other heterozygous. This phenomenon is known as strand bias. In this study, we used Illumina short-read sequencing data to evaluate the effect of strand bias on genotyping quality, and to explore the possible causes of strand bias. Result We collected 22 breast cancer samples from 22 patients and sequenced their exome using the Illumina GAIIx machine. By comparing the consistency between the genotypes inferred from this sequencing data with the genotypes inferred from SNP chip data, we found that, when using sequencing data, SNPs with extreme strand bias did not have significantly lower consistency rates compared to SNPs with low or no strand bias. However, this result may be limited by the small subset of SNPs present in both the exome sequencing and the SNP chip data. We further compared the transition and transversion ratio and the number of novel non-synonymous SNPs between the SNPs with low or no strand bias and those with extreme strand bias, and found that SNPs with low or no strand bias have better overall quality. We also discovered that the strand bias occurs randomly at genomic positions across these samples, and observed no consistent pattern of strand bias location across samples. By comparing results from two different aligners, BWA and Bowtie, we found very consistent strand bias patterns. Thus strand bias is unlikely to be caused by alignment artifacts. We successfully replicated our results using two additional independent datasets with different capturing methods and Illumina sequencers. Conclusion Extreme strand bias indicates a potential high false-positive rate for SNPs.
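    The abstract defines strand bias as a disagreement between the genotype inferred from forward-strand reads and the one inferred from reverse-strand reads. A minimal sketch of that definition follows. It assumes per-strand (reference, alternate) read counts are already available and uses a naive allele-fraction threshold to call genotypes; the function names and the 0.2 threshold are hypothetical, and this is not the statistical procedure used in the study.

```python
from typing import Tuple


def call_genotype(ref: int, alt: int, het_min_fraction: float = 0.2) -> str:
    """Naive genotype call from read counts on one strand.

    het_min_fraction is a hypothetical cutoff: an alternate-allele
    fraction below it is called hom_ref, above (1 - cutoff) hom_alt,
    and anything in between het.
    """
    total = ref + alt
    if total == 0:
        return "no_call"
    alt_fraction = alt / total
    if alt_fraction < het_min_fraction:
        return "hom_ref"
    if alt_fraction > 1.0 - het_min_fraction:
        return "hom_alt"
    return "het"


def has_strand_bias(forward: Tuple[int, int], reverse: Tuple[int, int]) -> bool:
    """True when the two strands yield different genotype calls.

    Each argument is a (ref_count, alt_count) pair for one strand.
    Sites with no reads on either strand are not flagged.
    """
    forward_call = call_genotype(*forward)
    reverse_call = call_genotype(*reverse)
    if "no_call" in (forward_call, reverse_call):
        return False
    return forward_call != reverse_call


# Forward strand looks heterozygous, reverse strand looks homozygous.
print(has_strand_bias((10, 10), (20, 0)))   # True
# Both strands look heterozygous.
print(has_strand_bias((10, 9), (12, 11)))   # False
```

    A production pipeline would normally replace the fixed threshold with a statistical test on the per-strand counts; the sketch only makes the definition concrete.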

  4. Management of the interplay effect when using dynamic MLC sequences to treat moving targets

    International Nuclear Information System (INIS)

    Court, Laurence E.; Wagar, Matthew; Ionascu, Dan; Berbeco, Ross; Chin, Lee

    2008-01-01

    Interplay between organ motion and leaf motion has been shown to generally have a small dosimetric impact for most clinical intensity-modulated radiation therapy treatments. However, it has also been shown that for some MLC sequences there can be large daily variations in the delivered dose, depending on details of patient motion or the number of fractions. This study investigates guidelines for dynamic MLC sequences that will keep daily dose variations due to the interplay between organ motion and leaf motion within 10%. Dose distributions for a range of MLC separations (0.2-5.0 cm) and displacements between adjacent MLCs (0-1.5 cm) were exported from ECLIPSE to purpose-written software, which simulated the dose distribution delivered to a moving target. Target motion parallel and perpendicular to the MLC motion was investigated for a range of amplitudes (0.5-4.0 cm), periods (1.5-10 s), and MLC speeds (0.1-3.0 cm/s) with target motions modeled as sin^6. Results were confirmed experimentally by measuring the dose delivered to an ion chamber array in a moving phantom for different MLC sequences. The simulation results were used to identify MLC sequences that kept dose variations within 10% compared to the dose delivered with no motion. The maximum allowable MLC speed, when target motion is parallel to the MLC motion, was found to be a simple function of target period and MLC separation. When the target motion is perpendicular to MLC motion, the maximum allowable MLC speed can be described as a function of MLC separation and the displacement of adjacent MLCs. These guidelines were successfully applied to two-dimensional motion, and a simple program was written to import MLC sequence files and evaluate whether the maximum daily dose discrepancy caused by the interplay effect will be larger than 10%. This software was experimentally evaluated, and found to conservatively predict whether a given MLC sequence could give large daily dose discrepancies.

  5. A novel on-line spatial-temporal k-anonymity method for location privacy protection from sequence rules-based inference attacks.

    Science.gov (United States)

    Zhang, Haitao; Wu, Chenxue; Chen, Zewei; Liu, Zhao; Zhu, Yunhong

    2017-01-01

    Analyzing large-scale spatial-temporal k-anonymity datasets recorded in location-based service (LBS) application servers can benefit some LBS applications. However, such analyses can allow adversaries to make inference attacks that cannot be handled by spatial-temporal k-anonymity methods or other methods for protecting sensitive knowledge. In response to this challenge, first we defined a destination location prediction attack model based on privacy-sensitive sequence rules mined from large scale anonymity datasets. Then we proposed a novel on-line spatial-temporal k-anonymity method that can resist such inference attacks. Our anti-attack technique generates new anonymity datasets with awareness of privacy-sensitive sequence rules. The new datasets extend the original sequence database of anonymity datasets to hide the privacy-sensitive rules progressively. The process includes two phases: off-line analysis and on-line application. In the off-line phase, sequence rules are mined from an original sequence database of anonymity datasets, and privacy-sensitive sequence rules are developed by correlating privacy-sensitive spatial regions with spatial grid cells among the sequence rules. In the on-line phase, new anonymity datasets are generated upon LBS requests by adopting specific generalization and avoidance principles to hide the privacy-sensitive sequence rules progressively from the extended sequence anonymity datasets database. We conducted extensive experiments to test the performance of the proposed method, and to explore the influence of the parameter K value. The results demonstrated that our proposed approach is faster and more effective for hiding privacy-sensitive sequence rules in terms of hiding sensitive rules ratios to eliminate inference attacks. Our method also had fewer side effects in terms of generating new sensitive rules ratios than the traditional spatial-temporal k-anonymity method, and had basically the same side effects in terms of non

  6. Geographic Distribution of Leishmania Species in Ecuador Based on the Cytochrome B Gene Sequence Analysis

    Science.gov (United States)

    Kato, Hirotomo; Gomez, Eduardo A.; Martini-Robles, Luiggi; Muzzio, Jenny; Velez, Lenin; Calvopiña, Manuel; Romero-Alvarez, Daniel; Mimori, Tatsuyuki; Uezato, Hiroshi; Hashiguchi, Yoshihisa

    2016-01-01

    A countrywide epidemiological study was performed to elucidate the current geographic distribution of causative species of cutaneous leishmaniasis (CL) in Ecuador by using FTA card-spotted samples and smear slides as DNA sources. Putative Leishmania in 165 samples collected from patients with CL in 16 provinces of Ecuador were examined at the species level based on the cytochrome b gene sequence analysis. Of these, 125 samples were successfully identified as Leishmania (Viannia) guyanensis, L. (V.) braziliensis, L. (V.) naiffi, L. (V.) lainsoni, and L. (Leishmania) mexicana. Two dominant species, L. (V.) guyanensis and L. (V.) braziliensis, were widely distributed in Pacific coast subtropical and Amazonian tropical areas, respectively. Recently reported L. (V.) naiffi and L. (V.) lainsoni were identified in Amazonian areas, and L. (L.) mexicana was identified in an Andean highland area. Importantly, the present study demonstrated that cases of L. (V.) braziliensis infection are increasing in Pacific coast areas. PMID:27410039

  7. Sequence-Specific β-Peptide Synthesis by a Rotaxane-Based Molecular Machine.

    Science.gov (United States)

    De Bo, Guillaume; Gall, Malcolm A Y; Kitching, Matthew O; Kuschel, Sonja; Leigh, David A; Tetlow, Daniel J; Ward, John W

    2017-08-09

    We report on the synthesis and operation of a three-barrier, rotaxane-based, artificial molecular machine capable of sequence-specific β-homo (β³) peptide synthesis. The machine utilizes nonproteinogenic β³-amino acids, a class of amino acids not generally accepted by the ribosome, particularly consecutively. Successful operation of the machine via native chemical ligation (NCL) demonstrates that even challenging 15- and 19-membered ligation transition states are suitable for information translation using this artificial molecular machine. The peptide-bond-forming catalyst region can be removed from the transcribed peptide by peptidases, artificial and biomachines working in concert to generate a product that cannot be made by either machine alone.

  8. Xylariaceae diversity in Thailand and Philippines, based on rDNA sequencing

    Directory of Open Access Journals (Sweden)

    Natarajan Velmurugan

    2013-05-01

    Full Text Available Twenty three different Xylariaceae Tul. & C. Tul were isolated from samples collected from forest zones of Thailand and Philippines. The fungal samples were characterized based on morphological characteristics and nuclear ITS1-5.8S rDNA-ITS2 region sequences. Ten species of Xylaria, two species of Hypoxylon, Biscogniauxia, Rosellinia and one species of Annulohypoxylon and Entonaema were found. Entonaema, the distinctive genus of Xylariaceae, isolated in the study from Thailand samples, showed a close relationship with Xylaria in the phylogenetic tree. Xylariaceous species identified at molecular level showed significant similarity of the morphological characters, such as stromal structure, ascal apex and the germ slit of ascospores. In addition, three species of Arthrinium and two species of Pestalotiopsis were also isolated and characterized in the study. A phylogenetic affinity of Pestalotiopsis with Xylariaceae was found.

  9. Xylariaceae diversity in Thailand and Philippines, based on rDNA sequencing

    Directory of Open Access Journals (Sweden)

    Natarajan Velmurugan

    2013-07-01

    Full Text Available Twenty three different Xylariaceae Tul. & C. Tul were isolated from samples collected from forest zones of Thailand and Philippines. The fungal samples were characterized based on morphological characteristics and nuclear ITS1-5.8S rDNA-ITS2 region sequences. Ten species of Xylaria, two species of Hypoxylon, Biscogniauxia, Rosellinia and one species of Annulohypoxylon and Entonaema were found. Entonaema, the distinctive genus of Xylariaceae, isolated in the study from Thailand samples, showed a close relationship with Xylaria in the phylogenetic tree. Xylariaceous species identified at molecular level showed significant similarity of the morphological characters, such as stromal structure, ascal apex and the germ slit of ascospores. In addition, three species of Arthrinium and two species of Pestalotiopsis were also isolated and characterized in the study. A phylogenetic affinity of Pestalotiopsis with Xylariaceae was found.

  10. Automatic start-up system of nuclear reactor based on sequence control technology

    International Nuclear Information System (INIS)

    Zhang Yao; Zhang Dafa; Peng Huaqing

    2009-01-01

    A conceptual design of an automatic start-up system based on sequence control for nuclear reactors is given in this paper, so as to solve the problems of the start-up process, such as the long operation time, low level of automatic control, and high accident rate. The start-up process and its requirements are first analyzed in detail. Then, the principle, the architecture, and the key technologies of the automatic start-up system of nuclear reactors are designed and discussed. With the designed system, the automatic start-up of the nuclear reactor can be realized, the workload of the operator can be reduced, and the safety and efficiency of the nuclear power plant during its start-up can be improved. (authors)

  11. Personal sleep pattern visualization using sequence-based kernel self-organizing map on sound data.

    Science.gov (United States)

    Wu, Hongle; Kato, Takafumi; Yamada, Tomomi; Numao, Masayuki; Fukui, Ken-Ichi

    2017-07-01

    We propose a method to discover sleep patterns via clustering of sound events recorded during sleep. The proposed method extends the conventional self-organizing map algorithm by kernelization and sequence-based technologies to obtain a fine-grained map that visualizes the distribution and changes of sleep-related events. We introduced features widely applied in sound processing and popular kernel functions to the proposed method to evaluate and compare performance. The proposed method provides a new aspect of sleep monitoring because the results demonstrate that sound events can be directly correlated to an individual's sleep patterns. In addition, by visualizing the transition of cluster dynamics, sleep-related sound events were found to relate to the various stages of sleep. Therefore, these results empirically warrant future study into the assessment of personal sleep quality using sound data. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. An expanded phylogeny of treefrogs (Hylidae) based on nuclear and mitochondrial sequence data.

    Science.gov (United States)

    Wiens, John J; Kuczynski, Caitlin A; Hua, Xia; Moen, Daniel S

    2010-06-01

    The treefrogs (Hylidae) make up one of the most species-rich families of amphibians. With 885 species currently described, they contain >13% of all amphibian species. In recent years, there has been considerable progress in resolving hylid phylogeny. However, the most comprehensive phylogeny to date (Wiens et al., 2006) included only 292 species, was based only on parsimony, provided only poor support for most higher-level relationships, and conflicted with previous hypotheses in several parts (including the monophyly and relationships of major clades of Hylinae). Here, we present an expanded phylogeny for hylid frogs, including data for 362 hylid taxa for up to 11 genes (4 mitochondrial, 7 nuclear), including 70 additional taxa and >270 sequences not included in the previously most comprehensive analysis. The new tree from maximum likelihood analysis is more well-resolved, strongly supported, and concordant with previous hypotheses, and provides a framework for future systematic, biogeographic, ecological, and evolutionary studies. 2010 Elsevier Inc. All rights reserved.

  13. DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.

    Science.gov (United States)

    Wei, Ze-Gang; Zhang, Shao-Wu

    2017-07-21

    Recent sequencing revolution driven by high-throughput technologies has led to rapid accumulation of 16S rRNA sequences for microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is an initial crucial process in analyzing metagenomic data. Although many heuristic methods have been proposed for OTU inferences with low computational complexity, they just select one sequence as the seed for each cluster and the results are sensitive to the selected sequences that represent the clusters. To address this issue, we present a de Bruijn graph-based heuristic clustering method (DBH) for clustering massive 16S rRNA sequences into OTUs by introducing a novel seed selection strategy and greedy clustering approach. Compared with existing widely used methods on several simulated and real-life metagenomic datasets, the results show that DBH has higher clustering performance and low memory usage, facilitating the overestimation of OTUs number. DBH is more effective to handle large-scale metagenomic datasets. The DBH software can be freely downloaded from https://github.com/nwpu134/DBH.git for academic users. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Monomer sequence in PLGA microparticles: Effects on acidic microclimates and in vivo inflammatory response.

    Science.gov (United States)

    Washington, Michael A; Balmert, Stephen C; Fedorchak, Morgan V; Little, Steven R; Watkins, Simon C; Meyer, Tara Y

    2018-01-01

    , erosion, and MW loss (Biomaterials 2017, 117, 66 and other references cited within the manuscript), provide significant insight not only about sequence effects in PLGAs but into the underlying mechanisms of PLGA degradation in general. Copyright © 2017 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  15. A gene expression microarray for Nicotiana benthamiana based on de novo transcriptome sequence assembly.

    Science.gov (United States)

    Goralski, Michal; Sobieszczanska, Paula; Obrepalska-Steplowska, Aleksandra; Swiercz, Aleksandra; Zmienko, Agnieszka; Figlerowicz, Marek

    2016-01-01

    Nicotiana benthamiana has been widely used in laboratories around the world for studying plant-pathogen interactions and posttranscriptional gene expression silencing. Yet the exploration of its transcriptome has lagged behind due to the lack of both adequate sequence information and genome-wide analysis tools, such as DNA microarrays. Despite the increasing use of high-throughput sequencing technologies, the DNA microarrays still remain a popular gene expression tool, because they are cheaper and less demanding regarding bioinformatics skills and computational effort. We designed a gene expression microarray with 103,747 60-mer probes, based on two recently published versions of N. benthamiana transcriptome (v.3 and v.5). Both versions were reconstructed from RNA-Seq data of non-strand-specific pooled-tissue libraries, so we defined the sense strand of the contigs prior to designing the probe. To accomplish this, we combined a homology search against Arabidopsis thaliana proteins and hybridization to a test 244k microarray containing pairs of probes, which represented individual contigs. We identified the sense strand in 106,684 transcriptome contigs and used this information to design an Nb-105k microarray on an Agilent eArray platform. Following hybridization of RNA samples from N. benthamiana roots and leaves we demonstrated that the new microarray had high specificity and sensitivity for detection of differentially expressed transcripts. We also showed that the data generated with the Nb-105k microarray may be used to identify incorrectly assembled contigs in the v.5 transcriptome, by detecting inconsistency in the gene expression profiles, which is indicated using multiple microarray probes that match the same v.5 primary transcripts. We provided a complete design of an oligonucleotide microarray that may be applied to the research of N. benthamiana transcriptome. This, in turn, will allow the N. benthamiana research community to take full advantage of

  16. Accurate diagnostics for Bovine tuberculosis based on high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander Churbanov

    Full Text Available BACKGROUND: Bovine tuberculosis (bTB) is an enduring contagious disease of cattle that has caused substantial losses to the global livestock industry. Despite large-scale eradication efforts, bTB continues to persist. Current bTB tests rely on the measurement of immune responses in vivo (skin tests) and in vitro (bovine interferon-γ release assay). Recent developments are characterized by interrogating the expression of an increasing number of genes that participate in the immune response. Currently used assays have the disadvantages of limited sensitivity and specificity, which may lead to incomplete eradication of bTB. Moreover, bTB that reemerges from wild disease reservoirs requires early and reliable diagnostics to prevent further spread. In this work, we use high-throughput sequencing of the peripheral blood mononuclear cells (PBMCs) transcriptome to identify an extensive panel of genes that participate in the immune response. We also investigate the possibility of developing a reliable bTB classification framework based on RNA-Seq reads. METHODOLOGY/PRINCIPAL FINDINGS: Pooled PBMC mRNA samples from unaffected calves as well as from those with disease progression of 1 and 2 months were sequenced using the Illumina Genome Analyzer II. More than 90 million reads were splice-aligned against the reference genome, and deposited to the database for further expression analysis and visualization. Using this database, we identified 2,312 genes that were differentially expressed in response to bTB infection (p<10^-8). We achieved a bTB infected status classification accuracy of more than 99% with split-sample validation on newly designed and learned mixtures of expression profiles. CONCLUSIONS/SIGNIFICANCE: We demonstrated that bTB can be accurately diagnosed at the early stages of disease progression based on RNA-Seq high-throughput sequencing. The inclusion of multiple genes in the diagnostic panel, combined with the superior sensitivity and broader

  17. Performance comparison of bench-top next generation sequencers using microdroplet PCR-based enrichment for targeted sequencing in patients with autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Eriko Koshimizu

    Full Text Available Next-generation sequencing (NGS) combined with enrichment of target genes enables highly efficient and low-cost sequencing of multiple genes for genetic diseases. The aim of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection in autism spectrum disorder (ASD). We assessed the performance of the bench-top Ion Torrent PGM and Illumina MiSeq platforms as optimized solutions for mutation detection, using microdroplet PCR-based enrichment of 62 ASD associated genes. Ten patients with known mutations were sequenced using NGS to validate the sensitivity of our method. The overall read quality was better with MiSeq, largely because of the increased indel-related error associated with PGM. The sensitivity of SNV detection was similar between the two platforms, suggesting they are both suitable for SNV detection in the human genome. Next, we used these methods to analyze 28 patients with ASD, and identified 22 novel variants in genes associated with ASD, with one mutation detected by MiSeq only. Thus, our results support the combination of target gene enrichment and NGS as a valuable molecular method for investigating rare variants in ASD.

  18. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations.

    Science.gov (United States)

    Lees, Jonathan G; Janes, Robert W

    2008-01-15

    A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have compared the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequence techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  19. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Full Text Available Abstract. Background: A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have compared the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results: In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequence techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion: Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  20. Region-based association tests for sequencing data on survival traits.

    Science.gov (United States)

    Chien, Li-Chu; Bowden, Donald W; Chiu, Yen-Feng

    2017-09-01

    Family-based designs enriched with affected subjects and disease associated variants can increase statistical power for identifying functional rare variants. However, few rare variant analysis approaches are available for time-to-event traits in family designs, and none of them is applicable to the X chromosome. We developed novel pedigree-based burden and kernel association tests for time-to-event outcomes with right censoring for pedigree data, referred to as FamRATS (family-based rare variant association tests for survival traits). Cox proportional hazard models were employed to relate a time-to-event trait with rare variants with flexibility to encompass all ranges and collapsing of multiple variants. In addition, robustness to violation of the proportional hazard assumptions was investigated for the proposed tests and four existing tests, including the conventional population-based Cox proportional model and the burden, kernel, and sum of squares statistic (SSQ) tests for family data. The proposed tests can be applied to large-scale whole-genome sequencing data. They are appropriate for practical use under a wide range of misspecified Cox models, as well as for population-based, pedigree-based, or hybrid designs. In our extensive simulation study and data example, we showed that the proposed kernel test is the most powerful and robust choice among the proposed burden test and the existing four rare variant survival association tests. When applied to the Diabetes Heart Study, the proposed tests found that exome variants of the JAK1 gene on chromosome 1 showed the most significant association with age at onset of type 2 diabetes in the exome-wide analysis. © 2017 WILEY PERIODICALS, INC.

  1. Ambiguous allele combinations in HLA Class I and Class II sequence-based typing: when precise nucleotide sequencing leads to imprecise allele identification

    Directory of Open Access Journals (Sweden)

    Larsen Paula

    2004-09-01

    Full Text Available Abstract Sequence-based typing (SBT) is one of the most comprehensive methods utilized for HLA typing. However, one of the inherent problems with this typing method is the interpretation of ambiguous allele combinations which occur when two or more different allele combinations produce identical sequences. The purpose of this study is to investigate the probability of this occurrence. We performed HLA-A,-B SBT for Exons 2 and 3 on 676 donors. Samples were analyzed with a capillary sequencer. The racial distribution of the donors was as follows: 615-Caucasian, 13-Asian, 23-African American, 17-Hispanic and 8-Unknown. 672 donors were analyzed for HLA-A locus ambiguities and 666 donors were analyzed for HLA-B locus ambiguities. At the HLA-A locus a total of 548 ambiguous allele combinations were identified (548/1344 = 41%). Most (278/548 = 51%) of these ambiguities were due to the fact that Exon 4 analysis was not performed. At the HLA-B locus 322 total ambiguous allele combinations were found (322/1332 = 24%). The HLA-B*07/08/15/27/35/44 antigens, common in Caucasians, produced a large portion of the ambiguities (279/322 = 87%). A large portion of HLA-A and B ambiguous allele combinations can be addressed by utilizing a group-specific primary amplification approach to produce an unambiguous homozygous sequence. Therefore, although the prevalence of ambiguous allele combinations is high, if the resolution of these ambiguities is clinically warranted, methods exist to compensate for this problem.
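
    The counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; it assumes that the denominators 1344 and 1332 are allele counts (two per donor analyzed), which matches 2 x 672 and 2 x 666 but is not stated explicitly in the abstract.

```python
# Donor counts by group, as quoted in the abstract.
donors_by_group = {
    "Caucasian": 615,
    "Asian": 13,
    "African American": 23,
    "Hispanic": 17,
    "Unknown": 8,
}
total_donors = sum(donors_by_group.values())


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Assumption: two alleles per analyzed donor give the denominators.
hla_a_alleles = 2 * 672
hla_b_alleles = 2 * 666

hla_a_ambiguous_pct = percent(548, hla_a_alleles)   # HLA-A ambiguities
exon4_share_pct = percent(278, 548)                 # share due to missing Exon 4
hla_b_ambiguous_pct = percent(322, hla_b_alleles)   # HLA-B ambiguities
common_antigen_share_pct = percent(279, 322)        # share from common antigens

print(total_donors, hla_a_ambiguous_pct, exon4_share_pct,
      hla_b_ambiguous_pct, common_antigen_share_pct)
```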

  2. Molecular typing of Legionella pneumophila isolates from environmental water samples and clinical samples using a five-gene sequence typing and standard Sequence-Based Typing.

    Science.gov (United States)

    Zhan, Xiao-Yong; Zhu, Qing-Yi

    2018-01-01

    Inadequate discriminatory power to distinguish between L. pneumophila isolates, especially those belonging to disease-related prevalent sequence types (STs) such as ST1, ST36 and ST47, is a known weakness of the SBT scheme. In this study, we developed a multilocus sequence typing (MLST) scheme based on two non-virulence loci (trpA, cca) and three virulence loci (icmK, lspE, lssD) to genotype 110 L. pneumophila isolates from various natural and artificial water sources in Guangdong province of China, and compared it with the SBT. The isolates were assigned to 33 STs of the SBT and 91 new sequence types (nSTs) of the MLST. The indices of discrimination (IODs) of SBT and MLST were 0.920 and 0.985, respectively. Maximum likelihood trees of the concatenated SBT and MLST sequences both showed distinct phylogenetic relationships between the isolates from the two environments. More intragenic recombinations were detected in nSTs than in STs, and they were both more abundant in natural water isolates. We found that the MLST had a high discriminatory ability for the disease-associated ST1 isolates: 22 ST1 isolates were assigned to 19 nSTs. Furthermore, we assayed the discrimination of the MLST for 29 reference strains (19 clinical and 10 environmental). The clinical strains were assigned to eight STs and ten nSTs. The MLST could also subtype the prevalent clinical ST36 or ST47 strains: eight ST36 strains were subtyped into three nSTs and two ST47 strains were subtyped into two nSTs. We found different distribution patterns of nSTs between the environmental and clinical ST36 isolates, and between the outbreak clinical ST36 isolates and the sporadic clinical ST36 isolates. Together, these results show that the MLST scheme could be used as part of a typing scheme that increases discrimination when necessary.
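
    The abstract reports indices of discrimination without giving the formula. A common choice for typing schemes is the Hunter-Gaston index (a Simpson-type diversity index), and the sketch below assumes that definition; the isolate labels are hypothetical and are not data from the study.

```python
from collections import Counter


def discrimination_index(type_labels):
    """Hunter-Gaston index: probability that two isolates drawn without
    replacement belong to different types.

    D = 1 - sum_j n_j * (n_j - 1) / (N * (N - 1))
    """
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical example: six isolates falling into three types.
example_labels = ["ST1", "ST1", "ST1", "ST36", "ST36", "ST47"]
print(discrimination_index(example_labels))
```

    A scheme that splits the same isolates into more, smaller groups yields a value closer to 1, which is the sense in which 0.985 (MLST) indicates finer discrimination than 0.920 (SBT).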

  3. The effect of cognitive aging on implicit sequence learning and dual tasking

    Directory of Open Access Journals (Sweden)

    Jochen eVandenbossche

    2014-02-01

    Full Text Available We investigated the influence of attentional demands on sequence-specific learning by means of the serial reaction time (SRT) task (Nissen & Bullemer, 1987) in young (age 18-25) and aged (age 55-75) adults. Participants had to respond as fast as possible to a stimulus presented in one of four horizontal locations by pressing a key corresponding to the spatial position of the stimulus. During the training phase sequential blocks were accompanied by (1) no secondary task (single), (2) a secondary tone counting task (dual tone), or (3) a secondary shape counting task (dual shape). Both secondary tasks were administered to investigate whether low and high interference tasks interact with implicit learning and age. The testing phase, under baseline single condition, was implemented to assess differences in sequence-specific learning between young and aged adults. Results indicate that (1) aged subjects show less sequence learning compared to young adults, (2) young participants show similar implicit learning effects under both single and dual task conditions when we account for explicit awareness, and (3) aged adults demonstrate reduced learning when the primary task is accompanied with a secondary task, even when explicit awareness is included as a covariate in the analysis. These findings point to implicit learning deficits under dual task conditions that can be related to cognitive aging, demonstrating the need for sufficient cognitive resources while performing a sequence learning task.

  4. Effect of sequence dispersity on morphology of tapered diblock copolymers from molecular dynamics simulations.

    Science.gov (United States)

    Levine, William G; Seo, Youngmi; Brown, Jonathan R; Hall, Lisa M

    2016-12-21

    Tapered diblock copolymers are similar to typical AB diblock copolymers but have an added transition region between the two blocks which changes gradually in composition from pure A to pure B. This tapered region can be varied from 0% (true diblock) to 100% (gradient copolymer) of the polymer length, and this allows some control over the microphase separated domain spacing and other material properties. We perform molecular dynamics simulations of linearly tapered block copolymers with tapers of various lengths, initialized from fluids density functional theory predictions. To investigate the effect of sequence dispersity, we compare systems composed of identical polymers, whose taper has a fixed sequence that most closely approximates a linear gradient, with sequentially disperse polymers, whose sequences are created statistically to yield the appropriate ensemble average linear gradient. Especially at high segregation strength, we find clear differences in polymer conformations and microstructures between these systems. Importantly, the statistical polymers are able to find more favorable conformations given their sequence, for instance, a statistical polymer with a larger fraction of A than the median will tend towards the A lamellae. The conformations of the statistically different polymers can thus be less stretched, and these systems have higher overall density. Consequently, the lamellae formed by statistical polymers have smaller domain spacing with sharper interfaces.
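
    The distinction between a fixed taper sequence and sequentially disperse (statistical) tapers can be illustrated with a toy sequence generator. This is only a sketch under stated assumptions: the linear B-probability profile (i + 0.5) / taper_len, the rounding rule used for the fixed sequence, and all function names are hypothetical choices, not the procedure used in the paper.

```python
import random


def fixed_taper(taper_len):
    """Deterministic taper: place a B monomer whenever the expected
    cumulative B count runs at least 0.5 ahead of the Bs placed so far."""
    placed_b = 0
    monomers = []
    for i in range(taper_len):
        expected_b = (i + 1) ** 2 / (2 * taper_len)  # sum of (k + 0.5) / T
        if expected_b - placed_b >= 0.5:
            monomers.append("B")
            placed_b += 1
        else:
            monomers.append("A")
    return "".join(monomers)


def statistical_taper(taper_len, rng):
    """Random taper: each site is B with a linearly increasing probability,
    so only the ensemble average follows the linear gradient."""
    return "".join(
        "B" if rng.random() < (i + 0.5) / taper_len else "A"
        for i in range(taper_len)
    )


def tapered_chain(n_total, taper_fraction, rng=None):
    """Pure A block + taper + pure B block; pass rng for a statistical taper."""
    taper_len = round(n_total * taper_fraction)
    a_len = (n_total - taper_len) // 2
    b_len = n_total - taper_len - a_len
    taper = fixed_taper(taper_len) if rng is None else statistical_taper(taper_len, rng)
    return "A" * a_len + taper + "B" * b_len


rng = random.Random(0)
print(tapered_chain(40, 0.5))        # every chain identical
print(tapered_chain(40, 0.5, rng))   # one member of a disperse ensemble
```

    With taper_fraction = 0 the generator returns a true diblock, and with taper_fraction = 1 the whole chain is a gradient, mirroring the 0% to 100% range described in the abstract.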

  5. An integrated genetic data environment (GDE)-based LINUX interface for analysis of HIV-1 and other microbial sequences.

    Science.gov (United States)

    De Oliveira, T; Miller, R; Tarin, M; Cassol, S

    2003-01-01

    Sequence databases encode a wealth of information needed to develop improved vaccination and treatment strategies for the control of HIV and other important pathogens. To facilitate effective utilization of these datasets, we developed a user-friendly GDE-based LINUX interface that reduces input/output file formatting. GDE was adapted to the Linux operating system, bioinformatics tools were integrated with microbe-specific databases, and up-to-date GDE menus were developed for several clinically important viral, bacterial and parasitic genomes. Each microbial interface was designed for local access and contains Genbank, BLAST-formatted and phylogenetic databases. GDE-Linux is available for research purposes by direct application to the corresponding author. Application-specific menus and support files can be downloaded from (http://www.bioafrica.net).

  6. Long-PCR based next generation sequencing of the whole mitochondrial genome of the peacock skate Pavoraja nitida (Elasmobranchii: Arhynchobatidae).

    Science.gov (United States)

    Yang, Lei; Naylor, Gavin J P

    2016-01-01

    We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.

  7. DNA Methylation Analysis by Bisulfite Conversion Coupled to Double Multiplexed Amplicon-Based Next-Generation Sequencing (NGS).

    Science.gov (United States)

    Bashtrykov, Pavel; Jeltsch, Albert

    2018-01-01

    Methylation of cytosine bases in DNA is one of the main epigenetic signals regulating gene expression and chromatin structure. The distribution of DNA methylation in the genome has a cell-type-specific pattern and can be modulated by internal or external stimuli. One of the most powerful approaches to investigate DNA methylation patterns is bisulfite conversion of the DNA followed by DNA sequencing, which allows methylation patterns to be determined at single-cytosine resolution. Here, we present a protocol for bisulfite DNA methylation analysis of targeted genomic regions using amplicon-based next-generation sequencing (NGS) on an Illumina sequencing system. We use a PCR-free library generation approach and implement a nested strategy for double molecular barcoding of samples (combining indexing of adapters and in-line barcoding of individual amplicons) which allows highly multiplexed sequencing. We also discuss the main limitations of this technology, in particular in relation to clonal DNA amplification and other PCR artifacts.
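
    The principle behind bisulfite sequencing, that unmethylated cytosines read as T while methylated cytosines remain C, can be shown with a small in silico sketch. The sequence, the methylated position, and the function names are hypothetical, and only the top strand is modelled.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite conversion followed by amplification:
    unmethylated C is read as T, methylated C stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(sequence)
    )


def call_methylation(reference, converted_read):
    """For every C in the reference, report True if the read kept a C
    (methylated) and False if it shows a T (unmethylated)."""
    calls = {}
    for index, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[index] = read_base == "C"
    return calls


reference = "ACGTCCGA"                          # hypothetical amplicon
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)
print(call_methylation(reference, read))
```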

  8. Whole genome sequencing reveals a 7 base-pair deletion in DMD exon 42 in a dog with muscular dystrophy.

    Science.gov (United States)

    Nghiem, Peter P; Bello, Luca; Balog-Alvarez, Cindy; López, Sara Mata; Bettis, Amanda; Barnett, Heather; Hernandez, Briana; Schatzberg, Scott J; Piercy, Richard J; Kornegay, Joe N

    2017-04-01

    Dystrophin is a key cytoskeletal protein coded by the Duchenne muscular dystrophy (DMD) gene located on the X-chromosome. Truncating mutations in the DMD gene cause loss of dystrophin and the classical DMD clinical syndrome. Spontaneous DMD gene mutations and associated phenotypes occur in several other species. The mdx mouse model and the golden retriever muscular dystrophy (GRMD) canine model have been used extensively to study DMD disease pathogenesis and show efficacy and side effects of putative treatments. Certain DMD gene mutations in high-risk, so-called hot spot, areas can be particularly helpful in modeling molecular therapies. Identification of specific mutations has been greatly enhanced by new genomic methods. Whole genome, next generation sequencing (WGS) has been recently used to define DMD patient mutations, but has not been used in dystrophic dogs. A dystrophin-deficient Cavalier King Charles Spaniel (CKCS) dog was evaluated at the functional, histopathological, biochemical, and molecular level. The affected dog's phenotype was compared to the previously reported canine dystrophinopathies. WGS was then used to detect a 7 base pair deletion in DMD exon 42 (c.6051-6057delTCTCAAT mRNA), predicting a frameshift in gene transcription and truncation of dystrophin protein translation. The deletion was confirmed with conventional PCR and Sanger sequencing. This mutation is in a secondary DMD gene hotspot area distinct from the one identified earlier at the 5' donor splice site of intron 50 in the CKCS breed.
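
    Why a 7 base pair deletion predicts a frameshift follows from simple arithmetic: codons are three bases long, so any deletion whose length is not a multiple of three shifts the downstream reading frame. The sketch below uses only the coordinates and deleted bases quoted in the abstract; the helper name is hypothetical.

```python
CODON_LENGTH = 3

# Values quoted in the abstract: c.6051-6057delTCTCAAT
deletion_start = 6051
deletion_end = 6057
deleted_bases = "TCTCAAT"

deletion_length = deletion_end - deletion_start + 1


def causes_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its
    length is a multiple of the codon length."""
    return indel_length % CODON_LENGTH != 0


print(deletion_length, causes_frameshift(deletion_length))
```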

  9. Geographic structure and demographic history of Iranian brown bear (Ursus arctos) based on mtDNA control region sequences

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Ashrafzadeh

    2015-12-01

    Full Text Available In recent years, the brown bear's range has declined and its populations in some areas have faced extinction. Therefore, a comprehensive picture of the genetic diversity and geographic structure of populations is essential for effective conservation strategies. In this research, we sequenced a 271 bp segment of the mtDNA control region of seven Iranian brown bears, and a total dataset of 467 sequences (brown and polar bears) was used in the analyses. Overall, 113 different haplotypes and 77 polymorphic sites were identified within the segment. Based on phylogenetic analyses, Iranian brown bears were not nested in any other clades. The low values of Nm (range=0.014-0.187) and high values of Fst (range=0.728-0.972) between Iranian bears and others revealed genetically significant differentiation. We did not find any significant signal of demographic reduction in Iranian bears. The time to the most recent common ancestor of Iranian brown bears (Northern Iran) was found to be around 19000 BP.
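
    The abstract does not say how Nm was derived from Fst. The reported ranges are, however, numerically consistent with the island-model approximation for a haploid, maternally inherited marker, Nm = (1/Fst - 1) / 2, and the sketch below assumes that relationship purely for illustration.

```python
def nm_from_fst(fst, haploid=True):
    """Island-model estimate of the number of migrants per generation.

    Assumed relationship: Fst = 1 / (1 + 2 * Nm) for haploid markers such
    as mtDNA, and Fst = 1 / (1 + 4 * Nm) for diploid nuclear markers.
    """
    if not 0 < fst <= 1:
        raise ValueError("Fst must lie in (0, 1]")
    ploidy_factor = 2 if haploid else 4
    return (1.0 / fst - 1.0) / ploidy_factor


# End points of the Fst range quoted in the abstract.
for fst in (0.972, 0.728):
    print(fst, round(nm_from_fst(fst), 3))
```

    Under this assumption the highest Fst maps to the lowest Nm and vice versa, matching the end points of the reported Nm range.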

  10. The historical biogeography of Pteroglossus aracaris (Aves, Piciformes, Ramphastidae) based on Bayesian analysis of mitochondrial DNA sequences

    Directory of Open Access Journals (Sweden)

    Sérgio L. Pereira

    2008-01-01

    Full Text Available Most Neotropical birds, including Pteroglossus aracaris, do not have an adequate fossil record to be used as time constraints in molecular dating. Hence, the evolutionary timeframe of the avian biota can only be inferred using alternative time constraints. We applied a Bayesian relaxed clock approach to propose an alternative interpretation for the historical biogeography of Pteroglossus based on mitochondrial DNA sequences, using different combinations of outgroups and time constraints obtained from outgroup fossils, vicariant barriers and molecular time estimates. The results indicated that outgroup choice has little effect on the Bayesian posterior distribution of divergence times within Pteroglossus, that geological and molecular time constraints seem equally suitable to estimate the Bayesian posterior distribution of divergence times for Pteroglossus, and that the fossil record alone overestimates divergence times within the fossil-lacking ingroup. The Bayesian estimates of divergence times suggest that the radiation of Pteroglossus occurred from the Late Miocene to the Pliocene (three times older than estimated by the “standard” mitochondrial rate of 2% sequence divergence per million years), likely triggered by Andean uplift, multiple episodes of marine transgressions in South America, and formation of present-day river basins. The time estimates are in agreement with other Neotropical taxa with similar geographic distributions.

  11. Generating markers based on biotic stress of protein systemin and tandem repeats sequence for Aquilaria sp

    International Nuclear Information System (INIS)

    Azhar Mohamad; Muhammad Hanif Azhari N; Siti Norhayati Ismail

    2014-01-01

    Aquilaria sp. belongs to the Thymelaeaceae family and is well distributed in the Asian region. The species has multipurpose use from root to shoot and is an economically important crop, which generates wide interest in understanding the genetic diversity of the species. Knowledge of DNA-based markers has become a prerequisite for more effective application of molecular marker techniques in breeding and mapping programs. In this work, both targeted genes and tandem repeat sequences were used for DNA fingerprinting in Aquilaria sp. A total of 100 ISSR (inter simple sequence repeat) primers and 50 combination pairs of specific primers derived from the conserved region of a specific protein known as systemin were optimized. 38 ISSR primers, drawn from both specific and degenerate ISSR primer sets, were found suitable for polymorphism evaluation. One best-performing combination of systemin primers showed significant results in distinguishing Aquilaria sp. In conclusion, polymorphism derived from ISSR profiling and targeted stress genes of the protein systemin proved to be a powerful approach for identification and molecular classification of Aquilaria sp., which will be useful for identifying any mutant lines derived from nature. (author)

  12. An innovative experimental sequence on electromagnetic induction and eddy currents based on video analysis and cheap data acquisition

    Science.gov (United States)

    Bonanno, A.; Bozzo, G.; Sapia, P.

    2017-11-01

    In this work, we present a coherent sequence of experiments on electromagnetic (EM) induction and eddy currents, appropriate for university undergraduate students, based on a magnet falling through a drilled aluminum disk. The sequence, leveraging on the didactical interplay between the EM and mechanical aspects of the experiments, allows us to exploit the students’ awareness of mechanics to elicit their comprehension of EM phenomena. The proposed experiments feature two kinds of measurements: (i) kinematic measurements (performed by means of high-speed video analysis) give information on the system’s kinematics and, via appropriate numerical data processing, allow us to get dynamic information, in particular on energy dissipation; (ii) induced electromagnetic field (EMF) measurements (by using a homemade multi-coil sensor connected to a cheap data acquisition system) allow us to quantitatively determine the inductive effects of the moving magnet on its neighborhood. The comparison between experimental results and the predictions from an appropriate theoretical model (of the dissipative coupling between the moving magnet and the conducting disk) offers many educational hints on relevant topics related to EM induction, such as Maxwell’s displacement current, magnetic field flux variation, and the conceptual link between induced EMF and induced currents. Moreover, the didactical activity gives students the opportunity to be trained in video analysis, data acquisition and numerical data processing.
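
    The step from kinematic measurements to energy dissipation can be sketched numerically. The code below is a minimal illustration, not the authors' processing pipeline: it assumes height-versus-time samples from video tracking, estimates velocity with central differences, and evaluates the mechanical energy at each interior sample. The magnet mass and the free-fall test data are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def central_velocity(times, heights):
    """Velocity at interior samples from central finite differences."""
    return [
        (heights[i + 1] - heights[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(times) - 1)
    ]


def mechanical_energy(mass, times, heights):
    """Kinetic plus gravitational potential energy at interior samples."""
    velocities = central_velocity(times, heights)
    return [
        0.5 * mass * v ** 2 + mass * G * h
        for v, h in zip(velocities, heights[1:-1])
    ]


# Hypothetical check: an undamped free fall from 1 m should conserve energy.
mass = 0.02  # kg, assumed magnet mass
times = [0.01 * k for k in range(11)]
heights = [1.0 - 0.5 * G * t ** 2 for t in times]
energies = mechanical_energy(mass, times, heights)
print(energies[0], energies[-1])
```

    For a magnet slowed by eddy currents, the same calculation would show the mechanical energy decreasing with time, and the rate of decrease estimates the dissipated power.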

  13. The Effect of Stress and Speech Rate on Vowel Coarticulation in Catalan Vowel-Consonant-Vowel Sequences

    Science.gov (United States)

    Recasens, Daniel

    2015-01-01

    Purpose: The goal of this study was to ascertain the effect of changes in stress and speech rate on vowel coarticulation in vowel-consonant-vowel sequences. Method: Data on second formant coarticulatory effects as a function of changing /i/ versus /a/ were collected for five Catalan speakers' productions of vowel-consonant-vowel sequences with the…

  14. SEAPATH: A microcomputer code for evaluating physical security effectiveness using adversary sequence diagrams

    International Nuclear Information System (INIS)

    Darby, J.L.

    1986-01-01

    The Adversary Sequence Diagram (ASD) concept was developed by Sandia National Laboratories (SNL) to examine physical security system effectiveness. Sandia also developed a mainframe computer code, PANL, to analyze the ASD. The authors have developed a microcomputer code, SEAPATH, which also analyzes ASDs. The authors are supporting SNL in software development of the SAVI code; SAVI utilizes the SEAPATH algorithm to identify and quantify paths

  15. Sequencing Effects of Balance and Plyometric Training on Physical Performance in Youth Soccer Athletes.

    Science.gov (United States)

    Hammami, Raouf; Granacher, Urs; Makhlouf, Issam; Behm, David G; Chaouachi, Anis

    2016-12-01

    Hammami, R, Granacher, U, Makhlouf, I, Behm, DG, and Chaouachi, A. Sequencing effects of balance and plyometric training on physical performance in youth soccer athletes. J Strength Cond Res 30(12): 3278-3289, 2016. Balance training may have a preconditioning effect on subsequent power training with youth. There are no studies examining whether the sequencing of balance and plyometric training has additional training benefits. The objective was to examine the effect of sequencing balance and plyometric training on the performance of 12- to 13-year-old athletes. Twenty-four young elite soccer players trained twice per week for 8 weeks either with an initial 4 weeks of balance training followed by 4 weeks of plyometric training (BPT) or 4 weeks of plyometric training followed by 4 weeks of balance training (PBT). Testing was conducted pre- and posttraining and included medicine ball throw; horizontal and vertical jumps; reactive strength; leg stiffness; agility; 10-, 20-, and 30-m sprints; Standing Stork balance test; and Y-Balance test. Results indicated that BPT provided significantly greater improvements with reactive strength index, absolute and relative leg stiffness, triple hop test, and a trend for the Y-Balance test (p = 0.054) compared with PBT. Although all other measures had similar changes for both groups, the average relative improvement for the BPT was 22.4% (d = 1.5) vs. 15.0% (d = 1.1) for the PBT. BPT effect sizes were greater with 8 of 13 measures. In conclusion, although either sequence of BPT or PBT improved jumping, hopping, sprint acceleration, and Standing Stork and Y-Balance, BPT initiated greater training improvements in reactive strength index, absolute and relative leg stiffness, triple hop test, and the Y-Balance test. BPT may provide either similar or superior performance enhancements compared with PBT.

  16. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing.

    Science.gov (United States)

    Frampton, Garrett M; Fichtenholtz, Alex; Otto, Geoff A; Wang, Kai; Downing, Sean R; He, Jie; Schnall-Levin, Michael; White, Jared; Sanford, Eric M; An, Peter; Sun, James; Juhn, Frank; Brennan, Kristina; Iwanik, Kiel; Maillet, Ashley; Buell, Jamie; White, Emily; Zhao, Mandy; Balasubramanian, Sohail; Terzic, Selmira; Richards, Tina; Banning, Vera; Garcia, Lazaro; Mahoney, Kristen; Zwirko, Zac; Donahue, Amy; Beltran, Himisha; Mosquera, Juan Miguel; Rubin, Mark A; Dogan, Snjezana; Hedvat, Cyrus V; Berger, Michael F; Pusztai, Lajos; Lechner, Matthias; Boshoff, Chris; Jarosz, Mirna; Vietz, Christine; Parker, Alex; Miller, Vincent A; Ross, Jeffrey S; Curran, John; Cronin, Maureen T; Stephens, Philip J; Lipson, Doron; Yelensky, Roman

    2013-11-01

    As more clinically relevant cancer genes are identified, comprehensive diagnostic approaches are needed to match patients to therapies, raising the challenge of optimization and analytical validation of assays that interrogate millions of bases of cancer genomes altered by multiple mechanisms. Here we describe a test based on massively parallel DNA sequencing to characterize base substitutions, short insertions and deletions (indels), copy number alterations and selected fusions across 287 cancer-related genes from routine formalin-fixed and paraffin-embedded (FFPE) clinical specimens. We implemented a practical validation strategy with reference samples of pooled cell lines that model key determinants of accuracy, including mutant allele frequency, indel length and amplitude of copy change. Test sensitivity achieved was 95-99% across alteration types, with high specificity (positive predictive value >99%). We confirmed accuracy using 249 FFPE cancer specimens characterized by established assays. Application of the test to 2,221 clinical cases revealed clinically actionable alterations in 76% of tumors, three times the number of actionable alterations detected by current diagnostic tests.

  17. A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment.

    Science.gov (United States)

    Huo, Hong-Wei; Stojkovic, Vojislav; Xie, Qiao-Luan

    2010-02-01

    Quantum parallelism arises from the ability of a quantum memory register to exist in a superposition of base states. Since the number of possible base states is 2^n, where n is the number of qubits in the quantum memory register, one operation on a quantum computer performs what an exponential number of operations on a classical computer performs. The power of quantum algorithms comes from taking advantage of quantum parallelism. Quantum algorithms are exponentially faster than classical algorithms. Genetic optimization algorithms are stochastic search algorithms which are used to search large, nonlinear spaces where expert knowledge is lacking or difficult to encode. QGMALIGN, a probabilistic-coding-based quantum-inspired genetic algorithm for multiple sequence alignment, is presented. A quantum rotation gate is used as a mutation operator to guide the quantum state evolution. Six genetic operators are designed on the coding basis to improve the solution during the evolutionary process. The experimental results show that QGMALIGN can compete with popular methods, such as CLUSTALX and SAGA, and performs well on the presented biological data. Moreover, the addition of genetic operators to the quantum-inspired algorithm lowers the cost of overall running time.
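
    The probabilistic coding and rotation-gate update mentioned in this abstract can be sketched on a classical computer. The code below follows the general quantum-inspired evolutionary algorithm formulation, in which each Q-bit is a pair of amplitudes (alpha, beta) with alpha^2 + beta^2 = 1; it is an illustration under that assumption, not the QGMALIGN encoding itself, and the rotation angle and function names are hypothetical.

```python
import math
import random


def rotate_qbit(alpha, beta, theta):
    """Apply the 2x2 rotation gate [[cos, -sin], [sin, cos]] to (alpha, beta).

    The rotation preserves alpha^2 + beta^2, so the pair stays normalized.
    """
    new_alpha = math.cos(theta) * alpha - math.sin(theta) * beta
    new_beta = math.sin(theta) * alpha + math.cos(theta) * beta
    return new_alpha, new_beta


def observe(qbits, rng):
    """Collapse each Q-bit to a classical bit: 1 with probability beta^2."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in qbits]


n_qbits = 4
# Equal superposition: every one of the 2**n bit strings is equally likely.
register = [(1 / math.sqrt(2), 1 / math.sqrt(2))] * n_qbits
print(2 ** n_qbits, "bit strings are representable by", n_qbits, "Q-bits")

# Nudge every Q-bit toward the value 1 (hypothetical angle of 0.05 * pi).
register = [rotate_qbit(a, b, 0.05 * math.pi) for a, b in register]
print(observe(register, random.Random(0)))
```

    Repeated rotations toward the bits of the best solution found so far bias later observations toward that solution, which is how the gate acts as a guided mutation operator in this family of algorithms.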

  18. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks.

    Science.gov (United States)

    Avsec, Žiga; Barekatain, Mohammadamin; Cheng, Jun; Gagneur, Julien

    2017-11-16

    Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries, or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at goo.gl/3yMY5w. Contact: avsec@in.tum.de; gagneur@in.tum.de. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
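
    The core idea of a spline transformation, expanding a scalar distance into B-spline basis values and taking a learned weighted sum, can be sketched without any deep learning framework. The code below is a plain-Python illustration using the Cox-de Boor recursion; it does not reproduce the CONCISE API, and the knot positions, weights, and function names are hypothetical.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion).

    Returns len(knots) - degree - 1 values; x must lie in the half-open
    range [knots[degree], knots[-degree - 1]).
    """
    n_intervals = len(knots) - 1
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_intervals)]
    for d in range(1, degree + 1):
        raised = []
        for i in range(n_intervals - d):
            left_width = knots[i + d] - knots[i]
            right_width = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_width * basis[i] if left_width > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_width * basis[i + 1] if right_width > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


def spline_transform(x, knots, degree, weights):
    """Smooth scalar function of x: weighted sum of the basis values.
    In a network, the weights would be trainable parameters."""
    return sum(w * b for w, b in zip(weights, bspline_basis(x, knots, degree)))


# Hypothetical setup: cubic splines over distances of 0 to 1000 nucleotides.
degree = 3
knots = [0, 0, 0, 0, 250, 500, 750, 1000, 1000, 1000, 1000]
weights = [0.0, 0.2, 0.9, 1.0, 0.4, 0.1, 0.0]
for distance in (0, 100, 400, 999):
    print(distance, spline_transform(distance, knots, degree, weights))
```

    Because the basis functions are smooth and local, a handful of weights describes a flexible positional effect, which is the property the abstract contrasts with piecewise linear transformations.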

  19. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: Effects of sequence length and practice

    Directory of Open Access Journals (Sweden)

    Kevin J Reilly

    2013-10-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, 5 adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria.

  20. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    Science.gov (United States)

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  1. High diversity of airborne fungi in the hospital environment as revealed by meta-sequencing-based microbiome analysis

    OpenAIRE

    Xunliang Tong; Hongtao Xu; Lihui Zou; Meng Cai; Xuefeng Xu; Zuotao Zhao; Fei Xiao; Yanming Li

    2017-01-01

    Invasive fungal infections acquired in the hospital have progressively emerged as an important cause of life-threatening infection. In particular, airborne fungi in hospitals are considered critical pathogens of hospital-associated infections. To identify the causative airborne microorganisms, high-volume air samplers were utilized for collection, and species identification was performed using a culture-based method and DNA sequencing analysis with the Illumina MiSeq and HiSeq 2000 sequencing...

  2. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999; FINAL

    International Nuclear Information System (INIS)

    Harger, Carol A.

    1999-01-01

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  3. Final Technical Report on the Genome Sequence DataBase (GSDB): DE-FG03 95 ER 62062 September 1997-September 1999

    Energy Technology Data Exchange (ETDEWEB)

    Harger, Carol A.

    1999-10-28

    Since September 1997 NCGR has produced two web-based tools for researchers to use to access and analyze data in the Genome Sequence DataBase (GSDB). These tools are: Sequence Viewer, a nucleotide sequence and annotation visualization tool, and MAR-Finder, a tool that predicts, based upon statistical inferences, the location of matrix attachment regions (MARs) within a nucleotide sequence. [The annual report for June 1996 to August 1997 is included as an attachment to this final report.]

  4. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    The objective was to analyze the whole genome sequence of human mitochondrial DNA (mtDNA) using the Ion Torrent PGM™ platform and to study the differences in mtDNA sequence among different tissues. Samples, including chest blood, hair, costal cartilage, nail, skeletal muscle and oral epithelium, were collected from 6 unrelated individuals during forensic postmortem examination. The whole mtDNA genome was amplified with 4 pairs of primers. Libraries were constructed with the Ion Shear™ Plus Reagents kit and the Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed on the Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions in the HVⅠ region. The whole mtDNA genome was amplified successfully from all samples. The six unrelated individuals belonged to 6 different haplotypes. Different tissues from one individual showed differences in heteroplasmy. The heteroplasmy positions and the mutation positions in the HVⅠ region were verified by Sanger sequencing. A consistency check by the Kappa method showed that the mtDNA sequencing results were highly consistent across different tissues. The method used in the present study for sequencing the whole human mtDNA genome can detect heteroplasmy differences among tissues, and its results show good consistency. The results provide guidance for further applications of mtDNA in forensic science. Copyright © by the Editorial Department of Journal of Forensic Medicine.
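
    The abstract does not say which kappa statistic was used. Assuming Cohen's kappa is meant, the consistency check between base calls from two tissues could be sketched as follows. The function name and the toy base calls are hypothetical and are not data from the study. Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
from collections import Counter

def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of positions with identical calls.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from each sequence's marginal label frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    # Hypothetical base calls at the same mtDNA positions in two tissues.
    blood = list("ACGTACGTAC")
    hair = list("ACGTACGTAT")
    print(cohen_kappa(blood, hair))
```

    Values close to 1 indicate agreement well beyond chance, which matches the "high consistency" wording of the abstract.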

  5. Transcriptome walking: a laboratory-oriented GUI-based approach to mRNA identification from deep-sequenced data.

    Science.gov (United States)

    French, Andrew S

    2012-12-05

    Deep sequencing technology provides efficient and economical production of large numbers of randomly positioned, relatively short estimates of base identities in DNA molecules. Application of this technology to mRNA samples allows rapid examination of the molecular genetic environment in individual cells or tissues, the transcriptome. However, assembly of such short sequences into complete mRNA creates a challenge that limits the usefulness of the technology, particularly when no, or limited, genomic data is available. Several approaches to this problem have been developed, but there is still no general method to rapidly obtain an mRNA sequence from deep sequence data when a specific molecule, or family of molecules, is of interest. A frequent requirement is to identify specific mRNA molecules from tissues that are being investigated by methods such as electrophysiology, immunocytology and pharmacology. To be widely useful, any approach must be relatively simple to use in the laboratory by operators without extensive statistical or bioinformatics knowledge, and with readily available hardware. An approach was developed that allows de novo assembly of individual mRNA sequences in two linked stages: sequence discovery and sequence completion. Both stages rely on computer-assisted, Graphical User Interface (GUI)-guided user interaction with the data, but proceed relatively efficiently once discovery is complete. The method grows a discovered sequence by repeated passes through the complete raw data in a series of steps, and is hence termed 'transcriptome walking'. All of the operations required for transcriptome analysis are combined in one program that presents a relatively simple user interface and runs on a standard desktop or laptop computer, but takes advantage of multi-core processors when available. Complete mRNA sequence identifications usually require less than 24 hours. This approach has already identified previously unknown mRNA sequences in two animal
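
    The "walking" step described in this abstract, growing a discovered sequence by repeated passes through the raw reads, can be illustrated with a minimal greedy sketch. It is not the published program: the function names, the exact-match overlap rule, the minimum overlap, and the toy reads are assumptions, and only rightward extension is shown.

```python
def extend_right(seed, reads, min_overlap):
    """One pass over all reads: return the longest rightward extension found."""
    best = seed
    for read in reads:
        # Longest suffix of the seed that is also a prefix of this read,
        # leaving at least one new base to add.
        max_k = min(len(seed), len(read) - 1)
        for k in range(max_k, min_overlap - 1, -1):
            if seed.endswith(read[:k]):
                candidate = seed + read[k:]
                if len(candidate) > len(best):
                    best = candidate
                break
    return best

def walk(seed, reads, min_overlap=4, max_passes=100):
    """Repeat passes over the reads until the sequence stops growing."""
    current = seed
    for _ in range(max_passes):
        extended = extend_right(current, reads, min_overlap)
        if extended == current:
            break
        current = extended
    return current

if __name__ == "__main__":
    # Toy reads covering a hypothetical transcript.
    reads = ["ACGTTGCA", "TGCATGGC", "ATGGCTA"]
    print(walk("ACGTT", reads))
```

    A practical tool would also extend leftward, tolerate sequencing errors in the overlap, and handle reverse-complement reads; the exact-match rule here keeps the sketch short.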

  6. The effect of music background on the emotional appraisal of film sequences

    Directory of Open Access Journals (Sweden)

    Pavlović Ivanka

    2011-01-01

    In this study the effects of musical background on the emotional appraisal of film sequences were investigated. Four pairs of polar emotions defined in Plutchik’s model were used as basic emotional qualities: joy-sadness, anticipation-surprise, fear-anger, and trust-disgust. In the preliminary study eight film sequences and eight music themes were selected as the best representatives of all eight of Plutchik’s emotions. In the main experiment the participants judged the emotional qualities of film-music combinations on eight seven-point scales. Half of the combinations were congruent (e.g. joyful film - joyful music), and half were incongruent (e.g. joyful film - sad music). Results have shown that visual information (film) had greater effects on the emotional appraisal than auditory information (music). The modulation effects of music background depend on emotional qualities. In some incongruent combinations (joy-sadness) the modulations in the expected directions were obtained (e.g. joyful music reduces the sadness of a sad film), in some cases (anger-fear) no modulation effects were obtained, and in some cases (trust-disgust, anticipation-surprise) the modulation effects were in an unexpected direction (e.g. trustful music increased the appraisal of disgust of a disgusting film). These results suggest that the appraisals of conjoint effects of emotions depend on the medium (film masks the music) and emotional quality (three types of modulation effects).

  7. Phylogenetic Resolution in Juglans Based on Complete Chloroplast Genomes and Nuclear DNA Sequences.

    Science.gov (United States)

    Dong, Wenpan; Xu, Chao; Li, Wenqing; Xie, Xiaoman; Lu, Yizeng; Liu, Yanlei; Jin, Xiaobai; Suo, Zhili

    2017-01-01

    Walnuts (Juglans of the Juglandaceae) are well-known, economically important resource plants valued for their edible nuts, high-quality wood, and medicinal use, with a distribution from tropical to temperate zones and from Asia to Europe and the Americas. There are about 21 species in Juglans. Classification of Juglans at section level is problematic, because the phylogenetic position of Juglans cinerea is disputed. The lack of morphological and DNA markers has severely inhibited related research. In this study, the complete chloroplast genomes and two nuclear DNA regions (the internal transcribed spacer and ubiquitin ligase gene) of 10 representative taxa of Juglans were used for comparative genomic analyses in order to deepen the understanding of the value of genetic information for inferring the phylogenetic relationships of the genus. The Juglans chloroplast genomes possessed the typical quadripartite structure of angiosperms, consisting of a pair of inverted repeat regions separated by a large single-copy region and a small single-copy region. All 10 chloroplast genomes possessed 112 unique genes arranged in the same order, including 78 protein-coding, 30 tRNA, and 4 rRNA genes. A combined sequence data set from the two nuclear DNA regions revealed that Juglans plants could be classified into three branches: (1) section Juglans, (2) section Cardiocaryon including J. cinerea, which is closer to J. mandshurica, and (3) section Rhysocaryon. However, three branches with a different phylogenetic topology were recognized in Juglans using the complete chloroplast genome sequences: (1) section Juglans, (2) section Cardiocaryon, and (3) section Rhysocaryon plus J. cinerea. The molecular taxonomy of Juglans is largely compatible with the morphological taxonomy except for J. cinerea (section Trachycaryon). Based on the complete chloroplast genome sequence data, the divergence time between section Juglans and section Cardiocaryon was 44.77 Mya, while section

  8. Phylogenetic Resolution in Juglans Based on Complete Chloroplast Genomes and Nuclear DNA Sequences

    Directory of Open Access Journals (Sweden)

    Wenpan Dong

    2017-06-01

    Walnuts (Juglans of the Juglandaceae) are well-known, economically important resource plants valued for their edible nuts, high-quality wood, and medicinal use, with a di