WorldWideScience

Sample records for high sequence similarity

  1. A fast Boyer-Moore type pattern matching algorithm for highly similar sequences.

    Science.gov (United States)

    Ben Nsira, Nadia; Lecroq, Thierry; Elloumi, Mourad

    2015-01-01

    In the last decade, biology and medicine have undergone a fundamental change: next generation sequencing (NGS) technologies have made it possible to obtain genomic sequences very quickly and at low cost compared to the traditional Sanger method. These NGS technologies have thus made it possible to collect genomic sequences (genes, exomes or even full genomes) of individuals of the same species. These sequences are more than 99% identical. There is thus a strong need for efficient algorithms for indexing and performing fast pattern matching in such specific sets of sequences. In this paper we propose a very efficient algorithm that solves the exact pattern matching problem in a set of highly similar DNA sequences where only the pattern can be pre-processed. This new algorithm extends variants of the Boyer-Moore exact string matching algorithm. Experimental results show that it exhibits the best performance in practice.
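The abstract does not spell out the new algorithm, so the following is only a baseline illustration of the Boyer-Moore family it extends: a minimal sketch of the Horspool variant, searching a single sequence with pattern-only preprocessing. The function name `horspool_search` is hypothetical and is not taken from the paper.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table, built from the pattern only: for each character
    # (ignoring the final pattern position), the distance from its last
    # occurrence to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Slide the window according to the text character aligned with the
        # last pattern position; unseen characters allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For example, `horspool_search("ACGTACGTAC", "ACG")` scans the text while skipping window positions that cannot match. Handling a whole set of near-identical sequences at once, as the paper does, is outside the scope of this sketch.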

  2. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus.

    Directory of Open Access Journals (Sweden)

    Kui Lin

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.

  3. Single Nucleus Genome Sequencing Reveals High Similarity among Nuclei of an Endomycorrhizal Fungus

    Science.gov (United States)

    Zhang, Zhonghua; Ivanov, Sergey; Saunders, Diane G. O.; Mu, Desheng; Pang, Erli; Cao, Huifen; Cha, Hwangho; Lin, Tao; Zhou, Qian; Shang, Yi; Li, Ying; Sharma, Trupti; van Velzen, Robin; de Ruijter, Norbert; Aanen, Duur K.; Win, Joe; Kamoun, Sophien; Bisseling, Ton; Geurts, René; Huang, Sanwen

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya. PMID:24415955

  4. DnaJ sequences of Bacillus cereus strains isolated from outbreaks of hospital infection are highly similar to Bacillus anthracis.

    Science.gov (United States)

    Zhang, Jiwei; van Hung, Pham; Hayashi, Masahiro; Yoshida, Shigeru; Ohkusu, Kiyofumi; Ezaki, Takayuki

    2011-07-01

    Bacillus cereus is becoming an important nosocomial pathogen because of frequent isolation from blood cultures and from severe systemic infections. To differentiate highly pathogenic outbreak strains of B. cereus from B. cereus from other sources, we analyzed their dnaJ sequences. Assays indicated that dnaJ sequence similarity of all 52 blood culture isolates of B. cereus ranged from 92.8% to 100%. The distance between B. anthracis and B. cereus, except for six outbreak isolates, ranged from 3.8% to 6.4%. The dnaJ sequences of six outbreak strains of B. cereus (GTC 02891, GTC 02896, GTC 02916, GTC 02917, GTC 03221, and GTC 03222) were closely related to those of B. anthracis (99.2%-99.5% sequence similarity). Ba813 sequences were only found in the six outbreak strains of B. cereus. The other pathogenic factors of B. anthracis were not found in these six outbreak strains, with the exception of GTC 02891 (cap-positive). The six outbreak strains formed clear β-hemolytic colonies on a sheep blood agar plate. Our findings suggest that outbreak strains of B. cereus isolated from blood cultures are likely to carry a risk of causing serious infection, and that dnaJ and Ba813 are important markers to identify such strains. Phylogenetic analysis of dnaJ and MLST revealed that the six outbreak strains of B. cereus are closely related to B. anthracis.

  5. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus

    NARCIS (Netherlands)

    Lin, K.; Limpens, E.H.M.; Zhang, Z.; Ivanov, S.; Saunders, D.G.O.; Mu, D.; Pang, E.; Cao, H.; Cha, H.; Lin, T.; Zhou, Q.; Shang, Y.; Li, Y.; Sharma, T.C.; Velzen, van R.; Ruijter, de N.C.A.; Aanen, D.K.; Win, J.; Kamoun, S.; Bisseling, T.; Geurts, R.; Huang, S.W.

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. A

  6. High similarity of U2 snDNA sequence between A and B chromosomes in the grasshopper Abracris flavolineata.

    Science.gov (United States)

    Menezes-de-Carvalho, Nahanna Zimmermann; Palacios-Gimenez, Octavio Manuel; Milani, Diogo; Cabral-de-Mello, Diogo Cavalcanti

    2015-10-01

    B chromosomes are frequently enriched for a wide variety of repetitive DNAs. Among grasshoppers in the species Abracris flavolineata (Ommatolampidinae) the B chromosomes are submetacentric, C-negative and harbor repetitive DNAs such as U2 snDNA, C0t-1 DNA, two Mariner-like elements and some microsatellites. Here, we provide evidence showing the intragenome similarity between the B chromosome and the A complement in A. flavolineata, combining analysis of microdissection and chromosome painting and B chromosome-specific amplification through polymerase chain reaction (PCR) of U2 snDNA. Chromosome painting revealed signals spread through the C-negative regions, including the A and B chromosomes. Moreover, significant clustered signals forming bands were observed in some A chromosomes, and for the B chromosome, significant signals were located on both arms, which could be caused by accumulation of repetitive DNA sequences. The C-positive regions did not reveal any signals. Sequence comparison of U2 snDNA between that obtained from a genome without the B chromosome and that from µB-DNA revealed high similarity with the occurrence of four shared haplotypes, one of them (i.e., Hap1) being highly prevalent and putatively ancestral. The highest divergence from Hap1 was observed for Hap3, which was caused by only six mutational steps. These data support an intraspecific origin of the B chromosome in A. flavolineata that is highly similar to the A complement, and the low U2 snDNA sequence diversity observed in the B chromosome could be related to its recent origin, besides intrachromosomal concerted evolution for U2 snDNA repeats in the B chromosome.

  7. Detailed protein sequence alignment based on Spectral Similarity Score (SSS)

    Directory of Open Access Journals (Sweden)

    Thomas Dina

    2005-04-01

    Background: The chemical properties and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed which determine alignment and similarity of primary protein sequences. However, character based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but are not aligned well by considering amino acids as mere characters. This approach finds a similarity score between sequences based on any given attribute, like hydrophobicity of amino acids, on the basis of spectral information after partial conversion to the frequency domain. Results: Distance matrices of various branches of the human kinome, that is the full complement of human kinases, were developed that matched the phylogenetic tree of the human kinome, establishing the efficacy of the global alignment of the algorithm. PKCd and PKCe kinases share close biological properties and structural similarities but do not give high scores with character based alignments. Detailed comparison established close similarities between subsequences that do not have any significant character identity. We compared their known 3D structures to establish that the algorithm is able to pick subsequences that are not considered similar by character based matching algorithms but share structural similarities. Similarly, many subsequences with low character identity were picked between xyna-theau and xyna-clotm F/10 xylanases. Comparison of 3D structures of the subsequences confirmed the claim of similarity in structure. Conclusion: An algorithm is developed which is inspired by successful application of spectral similarity applied to music sequences. The method captures subsequences that do not align by traditional character based alignment tools but give rise to similar secondary and tertiary structures. The Spectral

  8. Identifying cis-regulatory sequences by word profile similarity.

    Directory of Open Access Journals (Sweden)

    Garmay Leung

    BACKGROUND: Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. METHODOLOGY/PRINCIPAL FINDINGS: We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. CONCLUSIONS/SIGNIFICANCE: Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz.

  9. Sequence similarity network reveals common ancestry of multidomain proteins.

    Science.gov (United States)

    Song, Nan; Joseph, Jacob M; Davis, George B; Durand, Dannie

    2008-05-16

    We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain
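The abstract describes Neighborhood Correlation only as exploiting the local structure of the sequence similarity network. As a rough illustration of that idea, the sketch below assumes the score for a pair of sequences is the Pearson correlation of their similarity-score profiles against every sequence in the network. That formulation, the function name `profile_correlation`, and the toy score table are assumptions for illustration, not details given in the abstract.

```python
import math

def profile_correlation(scores: dict[str, dict[str, float]], x: str, y: str) -> float:
    """Pearson correlation of the similarity profiles of x and y.

    scores[a][b] is a pairwise similarity score (for example, a bit score);
    missing pairs are treated as 0.0, which is an assumption of this sketch.
    """
    nodes = sorted(scores)
    a = [scores[x].get(node, 0.0) for node in nodes]
    b = [scores[y].get(node, 0.0) for node in nodes]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no neighborhood information
    return cov / math.sqrt(var_a * var_b)

# Hypothetical scores: A, B and D resemble one another, C stands apart.
toy_scores = {
    "A": {"A": 100, "B": 80, "C": 5, "D": 70},
    "B": {"A": 80, "B": 100, "C": 4, "D": 75},
    "C": {"A": 5, "B": 4, "C": 100, "D": 3},
    "D": {"A": 70, "B": 75, "C": 3, "D": 100},
}
```

With these toy values, A and B have strongly correlated profiles, while A and C are negatively correlated. The point of comparing whole profiles is that two sequences sharing a single inserted domain can score well against each other directly yet still have different neighborhoods.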

  10. Sequence similarity network reveals common ancestry of multidomain proteins.

    Directory of Open Access Journals (Sweden)

    Nan Song

    2008-05-01

    We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. In contrast to current approaches that either focus on the homology of individual domains or consider only families with

  11. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.

  12. Similarity Measurement of Web Sessions Based on Sequence Alignment

    Institute of Scientific and Technical Information of China (English)

    LI Chaofeng; LU Yansheng

    2007-01-01

    The task of clustering Web sessions is to group Web sessions based on similarity, maximizing the intra-group similarity while minimizing the inter-group similarity. The first and foremost question to be considered in clustering Web sessions is how to measure the similarity between Web sessions. However, there are many shortcomings in traditional measurements. This paper introduces a new method for measuring similarities between Web pages that takes into account not only the URL but also the viewing time of the visited Web page. Then we give a new method to measure the similarity of Web sessions using sequence alignment and the similarity of Web page access in detail. Experiments have proved that our method is valid and efficient.
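The abstract gives no formulas, so the following is a minimal sketch of the general idea under stated assumptions: a session is a list of `(url, seconds)` pairs, page similarity multiplies a URL path-prefix score by a viewing-time ratio, and sessions are compared with a global dynamic-programming alignment. The scoring scheme, the gap penalty, and both function names are hypothetical choices, not the authors' definitions.

```python
def page_similarity(p: tuple[str, float], q: tuple[str, float]) -> float:
    """Similarity in [0, 1] combining shared URL path prefix and viewing time."""
    (url_p, time_p), (url_q, time_q) = p, q
    parts_p = url_p.strip("/").split("/")
    parts_q = url_q.strip("/").split("/")
    common = 0
    for a, b in zip(parts_p, parts_q):
        if a != b:
            break
        common += 1
    url_sim = common / max(len(parts_p), len(parts_q))
    longest = max(time_p, time_q)
    time_sim = min(time_p, time_q) / longest if longest > 0 else 1.0
    return url_sim * time_sim

def session_similarity(s: list, t: list, gap: float = -0.5) -> float:
    """Global alignment score of two sessions, divided by the longer length."""
    if not s and not t:
        return 1.0
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # Map page similarity from [0, 1] to [-1, 1] so that aligning
            # unrelated pages is penalized.
            pair = 2 * page_similarity(s[i - 1], t[j - 1]) - 1
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[-1][-1] / max(len(s), len(t))
```

Under this scoring, identical sessions score 1.0, and sessions visiting unrelated pages score below that. A clustering step would then use these pairwise values as its similarity matrix.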

  13. A new high-throughput sequencing method for determining diversity and similarity of T cell receptor (TCR) α and β repertoires and identifying potential new invariant TCR α chains.

    Science.gov (United States)

    Kitaura, Kazutaka; Shini, Tadasu; Matsutani, Takaji; Suzuki, Ryuji

    2016-10-11

    High-throughput sequencing of T cell receptor (TCR) genes is a powerful tool for analyses of antigen specificity, clonality and diversity of T lymphocytes. Here, we developed a new TCR repertoire analysis method using 454 DNA sequencing technology in combination with an adaptor-ligation mediated polymerase chain reaction (PCR). This method allows the amplification of all TCR genes without PCR bias. To compare gene usage, diversity and similarity of expressed TCR repertoires among individuals, we conducted next-generation sequencing (NGS) of TRA and TRB genes in peripheral blood mononuclear cells from 20 healthy human individuals. From a total of 267,037 sequence reads from 20 individuals, 149,216 unique sequence reads were identified. Preferential usage of several V and J genes was observed, while some recombinations of TRAV with TRAJ appeared to be restricted. The extent of TCR diversity was not significantly different between TRA and TRB, while TRA repertoires were more similar between individuals than TRB repertoires were. The interindividual similarity of TRA depended largely on the frequent presence of shared TCRs among two or more individuals. A publicly available TRA had a near-germline TCR with a shorter CDR3. Notably, shared TRA sequences, especially those shared among a large number of individuals, often contained TCRα related with invariant TCRα derived from invariant natural killer T cells and mucosal-associated invariant T cells. These results suggest that retrieval of shared TCRs by NGS would be useful for the identification of potential new invariant TCRα chains. This NGS method will enable the comprehensive quantitative analysis of TCR repertoires at a clonal level.

  14. The learning of two similar complex movement sequences: does practice insulate a sequence from interference?

    Science.gov (United States)

    Panzer, Stefan; Shea, Charles H

    2008-12-01

    Panzer et al. [Panzer, S., Wilde, H., & Shea, C. H. (2006). The learning of two similar complex movement sequences: Proactive and retroactive effects on learning. Journal of Motor Behavior, 38, 60-70] found evidence to indicate that the memory state(s) underpinning the production of a movement sequence that was practiced for one day was essentially "overwritten" when another similar sequence was subsequently practiced on the next day. An interference paradigm was used to determine if additional practice on the first sequence would insulate it from retroactive interference arising from learning a new similar sequence. Participants produced the sequences by moving a lever with their right arm/hand to sequentially presented target locations. The experimental group practiced one 16-element movement sequence (S1) for two consecutive days. A second 16-element sequence (S2) was practiced on Day 3. The sequence practiced on Day 3 was created by switching the positions of 2 of 16 elements in the sequence practiced on the first day. Control groups received either two days of practice on S1 or one day of practice on S2. Contrary to our earlier findings (Panzer, Wilde, & Shea, 2006) of strong retroactive interference when S1 was only practiced for one day, we found no evidence of retroactive interference when S1 was practiced for two days prior to the switch to S2 practice. Interestingly, but also contrary to our earlier findings, we found the learning of S2 was facilitated by the prior practice of S1. This proactive facilitation was observed in S2 acquisition and on the S2 retention test.

  15. Characterization of minisatellites in Arabidopsis thaliana with sequence similarity to the human minisatellite core sequence.

    Science.gov (United States)

    Tourmente, S; Deragon, J M; Lafleuriel, J; Tutois, S; Pélissier, T; Cuvillier, C; Espagnol, M C; Picard, G

    1994-08-25

    A strategy based on random PCR amplification was used to isolate new repetitive elements of Arabidopsis thaliana. One of the random PCR products analyzed by this approach contained a tandem repetitive minisatellite sequence composed of 33 bp repeated units. The genomic locus corresponding to this PCR product was isolated by screening a lambda genomic library. New related loci were also isolated from the genomic library by screening with a 14-mer oligonucleotide representing a region conserved among the different repeated units. Alignment of the consensus sequence for each minisatellite locus allowed the definition of an Arabidopsis thaliana core sequence that shows strong sequence similarities with the human core sequence and with the generalized recombination signal Chi of Escherichia coli. The minisatellites were tested for their ability to detect polymorphism, and their chromosomal position was established.

  16. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    Directory of Open Access Journals (Sweden)

    Chen Ke

    2008-05-01

    Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction.
The high predictive accuracy achieved by SCPRED is

  17. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    Directory of Open Access Journals (Sweden)

    Matija Korpar

    In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and as a result, the growth of publicly maintained sequence databases. The increase of data present all around has put high requirements on protein similarity search algorithms with two ever-opposite goals: how to keep the running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of the protein similarity search methods apply heuristics prior to the exact local alignment to reduce the number of possible candidate sequences in the database. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases.
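To make the quadratic cost mentioned above concrete, here is a minimal CPU-only sketch of Smith-Waterman scoring. It assumes a linear gap penalty and a flat match/mismatch score, both simplifications chosen for illustration; production tools typically use substitution matrices and affine gaps, and this is not SW#db's implementation. The function name and default parameters are hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b, in O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A cell never drops below 0: this is what makes the alignment
            # local, since any prefix with a negative score is discarded.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell of the table is visited once, so aligning one query against a database costs time proportional to the query length times the total database length. That is why heuristics are usually applied first to shrink the candidate set.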

  18. Comparison of microarray-predicted closest genomes to sequencing for poliovirus vaccine strain similarity and influenza A phylogeny.

    Science.gov (United States)

    Maurer-Stroh, Sebastian; Lee, Charlie W H; Patel, Champa; Lucero, Marilla; Nohynek, Hanna; Sung, Wing-Kin; Murad, Chrysanti; Ma, Jianmin; Hibberd, Martin L; Wong, Christopher W; Simões, Eric A F

    2016-03-01

    We evaluate sequence data from the PathChip high-density hybridization array for epidemiological interpretation of detected pathogens. For influenza A, we derive similar relative outbreak clustering in phylogenetic trees from PathChip-derived compared to classical Sanger-derived sequences. For a positive polio detection, recent infection could be excluded based on vaccine strain similarity.

  19. Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images

    Institute of Scientific and Technical Information of China (English)

    Yusei Kobori; Satoshi Mizuta

    2016-01-01

    Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histograms of local bitmap patterns of images. Our method shows linear time complexity for the length of DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
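The two distance measures named above are simple to state in code. The sketch below assumes each sequence has already been reduced to a histogram of pattern counts; the counts used in the example are made up, and the extraction of local bitmap patterns from binary images is not reproduced here.

```python
def normalize(counts: list[float]) -> list[float]:
    """Scale a histogram of raw counts so that its bins sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no counts")
    return [c / total for c in counts]

def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Similarity: the overlapping mass of two histograms (1.0 if identical
    and normalized)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def manhattan_distance(h1: list[float], h2: list[float]) -> float:
    """Distance: the sum of absolute bin-wise differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Hypothetical pattern counts for two sequences.
h1 = normalize([4, 2, 2])
h2 = normalize([1, 1, 2])
```

When both histograms are normalized to sum to 1, the two measures carry the same information, since the Manhattan distance equals 2 × (1 − intersection). They can rank pairs differently only when the histograms are left unnormalized.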

  20. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

    Science.gov (United States)

    Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan

    2017-06-24

    The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain some implicit assumptions that limit the generality of usage. First, the information of users' sequences, including the sizes of datasets and the lengths of sequences, can be of arbitrary values and is generally unknown before submission, which is unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences. But its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization on GPU devices only, making the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy.
We can conclude that harvesting the high performance of modern GPU is a promising approach to
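
    The center sequence selection stage mentioned in this abstract can be illustrated with a minimal sketch. It shows the textbook, unoptimized center star selection (pick the sequence with the smallest total pairwise distance), not CMSA's bitmap-based algorithm, which the abstract does not describe. The function names and the choice of plain edit distance are assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def select_center(seqs: list[str]) -> int:
    """Return the index of the sequence with the smallest summed distance
    to all other sequences (hypothetical helper, naive all-pairs version)."""
    totals = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        d = edit_distance(seqs[i], seqs[j])
        totals[i] += d
        totals[j] += d
    return min(range(len(seqs)), key=totals.__getitem__)
```

    Because every pair of sequences is compared, this naive selection is the expensive step that an optimized strategy would aim to avoid.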

  1. Phylogeny and prediction of genetic similarity of Cronobacter and related taxa by multilocus sequence analysis (MLSA).

    Science.gov (United States)

    Kuhnert, Peter; Korczak, Bozena M; Stephan, Roger; Joosten, Han; Iversen, Carol

    2009-12-31

    Multilocus sequence analysis (MLSA) based on recN, rpoA and thdF genes was done on more than 30 species of the family Enterobacteriaceae with a focus on Cronobacter and the related genus Enterobacter. The sequences provide valuable data for phylogenetic, taxonomic and diagnostic purposes. Phylogenetic analysis showed that the genus Cronobacter forms a homogenous cluster related to recently described species of Enterobacter, but distant to other species of this genus. Combining sequence information on all three genes is highly representative for the species' %GC-content used as taxonomic marker. Sequence similarity of the three genes and even of recN alone can be used to extrapolate genetic similarities between species of Enterobacteriaceae. Finally, the rpoA gene sequence, which is the easiest one to determine, provides a powerful diagnostic tool to identify and differentiate species of this family. The comparative analysis gives important insights into the phylogeny and genetic relatedness of the family Enterobacteriaceae and will serve as a basis for further studies and clarifications on the taxonomy of this large and heterogeneous family.

  2. Manifold Learning for Multivariate Variable-Length Sequences With an Application to Similarity Search.

    Science.gov (United States)

    Ho, Shen-Shyang; Dai, Peng; Rudzicz, Frank

    2016-06-01

    Multivariate variable-length sequence data are becoming ubiquitous with the technological advancement in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature points corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting where we have knowledge about whether some of data sequences are similar or dissimilar, called the instance-level constraints. Using this information, one learns the similarity measure for the data sequence space and the distance measures for the manifold. Moreover, we describe an approach to handle the similarity search problem given user-defined instance level constraints in the learned manifold using a consensus voting scheme. Experimental results on both synthetic data and real tropical cyclone sequence data are presented to demonstrate the feasibility of our manifold learning framework and the robustness of performing similarity search in the learned manifold.

  3. Similar representations of sequence knowledge in young and older adults: A study of effector independent transfer

    Directory of Open Access Journals (Sweden)

    Jonathan Sebastiaan Barnhoorn

    2016-08-01

    Full Text Available Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is highly relevant in day-to-day motor activity and facilitates generalization of learning to new contexts. By using movement types that are completely unrelated in terms of muscle activation and response location, we focused on transfer facilitated by the early, visuospatial system. We tested 32 right-handed older adults (65–74) and 32 young adults (18–30). During practice of a discrete sequence production task, participants learned two 6-element sequences using either unimanual key-presses (KPs) or by moving a lever with lower arm flexion-extension (FE) movements. Each sequence was performed 144 times. They then performed a test phase consisting of familiar and random sequences performed with the type of movements not used during practice. Both age groups displayed transfer from FE to KP movements as indicated by faster performance on the familiar sequences in the test phase. Only young adults transferred their sequence knowledge from KP to FE movements. In both directions, the young showed higher transfer than older adults. These results suggest that the older participants, like the young, represented their sequences in an abstract visuospatial manner. Transfer was asymmetric in both age groups: there was more transfer from FE to KP movements than vice versa. This similar asymmetry is a further indication that the types of representations that older adults develop are comparable to those that young adults develop. We furthermore found that older adults improved less during FE practice, gained less explicit knowledge, displayed a smaller visuospatial working memory capacity and had lower processing speed than young adults. Despite the many differences

  4. Similarity

    Science.gov (United States)

    Apostol, Tom M. (Editor)

    1990-01-01

    In this 'Project Mathematics!' series, sponsored by the California Institute of Technology (Caltech), the mathematical concept of similarity is presented. The history of and real life applications are discussed using actual film footage and computer animation. Terms used and various concepts of size, shape, ratio, area, and volume are demonstrated. The similarity of polygons, solids, congruent triangles, internal ratios, perimeters, and line segments using the previously mentioned concepts are shown.

  5. Cross-kingdom sequence similarities between human micro-RNAs and plant viruses.

    Science.gov (United States)

    Rebolledo-Mendez, Jovan D; Vaishnav, Radhika A; Cooper, Nigel G; Friedland, Robert P

    2013-09-01

    Micro-RNAs regulate the expression of cellular and tissue phenotypes at a post-transcriptional level through a complex process involving complementary interactions between micro-RNAs and messenger-RNAs. Similar nucleotide interactions have been shown to occur as cross-kingdom events; for example, between plant viruses and plant micro-RNAs and also between animal viruses and animal micro-RNAs. In this study, this view is expanded to look for cross-kingdom similarities between plant virus and human micro-RNA sequences. A method to identify significant nucleotide sequence similarities between plant viruses and hsa micro-RNAs was created. Initial analyses demonstrate that plant viruses contain nucleotide sequences which exactly match the seed sequences of human micro-RNAs in both parallel and anti-parallel directions. For example, the bean common mosaic virus strain NL4 from Colombia contains sequences that match exactly the seed sequence for micro-RNA of the hsa-mir-1226 in the parallel direction, which suggests a cross-kingdom conservation. Similarly, the rice yellow stunt viral cRNA contains a sequence that is an exact match in the anti-parallel direction to the seed sequence of hsa-micro-RNA let-7b. The functional implications of these results need to be explored. The finding of these cross-kingdom sequence similarities is a useful starting point in support of bench level investigations.

  6. RAP: a computer program for exploring similarities in behavior sequences using random projections.

    Science.gov (United States)

    Quera, Vicenç

    2008-02-01

    A computer program (RAP, for random projection) for exploring similarities between and within sequences of behavior is presented. Given a time window of a sequence, the program calculates a signature, a real-valued vector that is a random projection of the contents of the window (i.e., the codes occurring within it and their relative location, or onset and offset times) into an arbitrary K-dimensional space. Then, given two different time windows from the same sequence or from different sequences, their similarity is computed as an inverse function of the Euclidean distance between their respective signatures. By defining moving (overlapped or not overlapped) windows along each sequence and calculating similarities between every pair of windows from the two sequences, a map of similarities or possible recurrent patterns is obtained; the RAP program represents them as gray-level lattices, which are displayed as mouse-sensitive images in an HTML file. Computation of similarities is based on the random projection method, as presented by Mannila and Seppänen (2001), for the analysis of sequences of events. The program reads sequence data files in Sequential Data Interchange Standard (SDIS) format (Bakeman & Quera, 1992, 1995a).
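
    The signature and similarity computation described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not RAP's implementation: each window holds one code per position (no onset or offset times), the random vectors are Gaussian, and the inverse function of the distance is taken to be 1 / (1 + d). All function names are hypothetical.

```python
import math
import random


def make_projection(codes: str, window_len: int, k: int, seed: int = 0) -> dict:
    """Assign a random K-dimensional vector to every (code, position) pair.
    A fixed seed keeps signatures comparable across windows."""
    rng = random.Random(seed)
    return {(c, p): [rng.gauss(0.0, 1.0) for _ in range(k)]
            for c in codes for p in range(window_len)}


def signature(window: str, projection: dict, k: int) -> list[float]:
    """Sum the random vectors of the codes in the window: a random
    projection of the window's one-hot (code, position) encoding."""
    sig = [0.0] * k
    for pos, code in enumerate(window):
        vec = projection[(code, pos)]
        for d in range(k):
            sig[d] += vec[d]
    return sig


def similarity(sig_a: list[float], sig_b: list[float]) -> float:
    """Inverse function of the Euclidean distance (assumed form)."""
    return 1.0 / (1.0 + math.dist(sig_a, sig_b))
```

    Computing `similarity` for every pair of moving windows from two sequences would yield the kind of similarity map the abstract mentions.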

  7. MosaicFinder: identification of fused gene families in sequence similarity networks.

    Science.gov (United States)

    Jachiet, Pierre-Alain; Pogorelcnik, Romain; Berry, Anne; Lopez, Philippe; Bapteste, Eric

    2013-04-01

    Gene fusion is an important evolutionary process. It can yield valuable information to infer the interactions and functions of proteins. Fused genes have been identified as non-transitive patterns of similarity in triplets of genes. To be computationally tractable, this approach usually imposes an a priori distinction between a dataset in which fused genes are searched for, and a dataset that may have provided genetic material for fusion. This reduces the 'genetic space' in which fusion can be discovered, as only a subset of triplets of genes is investigated. Moreover, this approach may have a high false-positive rate, and it does not identify gene families descending from a common fusion event. We represent similarities between sequences as a network. This leads to an efficient formulation of previous methods of fused gene identification, which we implemented in the Python program FusedTriplets. Furthermore, we propose a new characterization of families of fused genes, as clique minimal separators of the sequence similarity network. This well-studied graph topology provides a robust and fast method of detection, well suited for automatic analyses of big datasets. We implemented this method in the C++ program MosaicFinder, which additionally uses local alignments to discard false-positive candidates and indicates potential fusion points. The grouping into families will help distinguish sequencing or prediction errors from real biological fusions, and it will yield additional insight into the function and history of fused genes. FusedTriplets and MosaicFinder are published under the GPL license and are freely available with their source code at this address: http://sourceforge.net/projects/mosaicfinder. Supplementary data are available at Bioinformatics online.
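
    The non-transitive triplet pattern mentioned in this abstract can be illustrated with a short sketch on a similarity network given as an edge list. It is not the FusedTriplets or MosaicFinder code: the function name is hypothetical, and the sketch ignores alignment coordinates, which a real method would need in order to check that the two neighbours match different regions of the candidate gene.

```python
from itertools import combinations


def fused_candidates(edges: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return triplets (c, a, b) where c is similar to both a and b,
    but a and b are not similar to each other (non-transitive pattern)."""
    neighbours: dict[str, set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    hits = []
    for gene, nbrs in neighbours.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b not in neighbours[a]:  # no edge between the two neighbours
                hits.append((gene, a, b))
    return sorted(hits)
```

    For example, `fused_candidates([("A", "C"), ("B", "C")])` reports `C` as a candidate fused gene, whereas a fully connected triangle of three genes yields no candidates.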

  8. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

    Directory of Open Access Journals (Sweden)

    Holly J Atkinson

    Full Text Available The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.

  9. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

    2005-01-01

    Motivation: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) are major challenges in gene finding today as these often are conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to pairwise....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...

  10. Protein sequence alignment with family-specific amino acid similarity matrices

    Science.gov (United States)

    2011-01-01

    Background: Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one needs a way of selecting the best performing scoring function. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment. Findings: I utilize a comprehensive set of reference alignments obtained from structural superposition of homologous and analogous proteins to design a quantitative statistical framework for evaluating the performance of alignment scoring functions in global pairwise sequence alignment. This framework is applied to study how existing general-purpose amino acid similarity matrices perform on individual protein families and structural folds, and to compare them to family-specific and fold-specific matrices derived in this work. I describe an adaptive alignment procedure that automatically selects an appropriate similarity matrix and optimized gap penalties based on the properties of the sequences being aligned. Conclusions: The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix. However, using fold-specific similarity matrices can only marginally improve sequence alignment of proteins that share the same structural fold but do not share a common evolutionary origin. The family-specific matrices derived in this work and the optimized gap penalties are available at http://taurus.crc.albany.edu/fsm. PMID:21846354

  11. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Science.gov (United States)

    Satoh, Soichirou; Mimuro, Mamoru; Tanaka, Ayumi

    2013-01-01

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  12. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    Directory of Open Access Journals (Sweden)

    Jaimie-Leigh Jonker

    Full Text Available Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  13. Density-based retrieval from high-similarity image databases

    DEFF Research Database (Denmark)

    Hansen, Michael Edberg; Carstensen, Jens Michael

    2004-01-01

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce...

  14. Efficient estimation for high similarities using odd sketches

    DEFF Research Database (Denmark)

    Mitzenmacher, Michael; Pagh, Rasmus; Pham, Ninh Dang

    2014-01-01

    . This means that Odd Sketches provide a highly space-efficient estimator for sets of high similarity, which is relevant in applications such as web duplicate detection, collaborative filtering, and association rule learning. The method extends to weighted Jaccard similarity, relevant e.g. for TF-IDF vector...

  15. Using homology relations within a database markedly boosts protein sequence similarity search.

    Science.gov (United States)

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-02

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.

  16. Sequence similarity network reveals the imprints of major diversification events in the evolution of microbial life

    Directory of Open Access Journals (Sweden)

    Shu Cheng

    2014-11-01

    Full Text Available Ancient transitions, such as between life that evolved in a reducing versus an oxidizing atmosphere precipitated by the Great Oxygenation Event (GOE) ca. 2.4 billion years ago, fundamentally altered the space in which prokaryotes could derive metabolic energy. Despite fundamental changes in Earth’s redox state, there are very few comprehensive, proteome-wide analyses about the effects of these changes on gene content and evolution. Here, using a pan-proteome sequence similarity network applied to broadly sampled lifestyles of 84 prokaryotes that were categorized into four different redox groups (i.e., methanogens, obligate anaerobes, facultative anaerobes, and obligate aerobes), we reconstructed the genetic inventory of major respiratory communities. We show that a set of putative core homologs that is highly conserved in prokaryotic proteomes is characterized by the loss of canonical network connections and low conductance that correlates with differences in respiratory phenotypes. We suggest these different network patterns observed for different respiratory communities could be explained by two major evolutionary diversification events in the history of microbial life. The first event (M) is a divergence between methanogenesis and other anaerobic lifestyles in prokaryotes (archaebacteria and eubacteria). The second diversification event (OX) is from anaerobic to aerobic lifestyles that left a proteome-wide footprint among prokaryotes. Additional analyses revealed that oxidoreductase evolution played a central role in these two diversification events. Distinct cofactor binding domains were frequently recombined, allowing these enzymes to utilize increasingly oxidized substrates with high specificity.

  17. HAMSA: Highly Accelerated Multiple Sequence Aligner

    Directory of Open Access Journals (Sweden)

    Naglaa M. Reda

    2016-06-01

    Full Text Available For biologists, the existence of an efficient tool for multiple sequence alignment is essential. This work presents a new parallel aligner called HAMSA. HAMSA is a bioinformatics application designed for highly accelerated alignment of multiple sequences of proteins and DNA/RNA on a multi-core cluster system. The design of HAMSA is based on a combination of our recently proposed optimized algorithms for vectorization, partitioning, and scheduling. It mainly operates on a distance vector instead of a distance matrix. It accomplishes similarity computations and generates the guide tree in a highly accelerated and accurate manner. HAMSA outperforms MSAProbs with a 21.9-fold speedup and ClustalW-MPI with an 11-fold speedup. It can be considered an essential tool for structure prediction, protein classification, motif finding and drug design studies.

  18. Sequence similarity is more relevant than species specificity in probabilistic backtranslation

    Directory of Open Access Journals (Sweden)

    Di Pietro Cinzia

    2007-02-01

    Full Text Available Background: Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. Results: This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. Conclusion: The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
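
    For contrast with the Hidden Markov Model approach, the "most used codon" criterion that this abstract describes as the common baseline can be sketched in a few lines. This is only the baseline, not EasyBack. The codon assignments follow the standard genetic code, but the usage frequencies in `TOY_USAGE` are made-up illustrative values, and the names `TOY_USAGE` and `backtranslate` are hypothetical.

```python
# Toy codon usage table: amino acid -> {codon: relative frequency}.
# Frequencies are invented for illustration and cover only three amino acids.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "F": {"TTT": 0.45, "TTC": 0.55},
}


def backtranslate(protein: str, codon_usage: dict) -> str:
    """Replace each amino acid with its most frequently used codon."""
    return "".join(
        max(codon_usage[aa], key=codon_usage[aa].get)
        for aa in protein
    )
```

    With the toy table, `backtranslate("MKF", TOY_USAGE)` picks `ATG`, `AAG` and `TTC`. The ambiguity the abstract refers to is visible here: every amino acid with more than one codon forces a choice, and this baseline always makes the same one regardless of sequence context.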

  19. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  20. ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...... important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material is also...

  1. An Axiomatic Approach to the notion of Similarity of individual Sequences and their Classification

    CERN Document Server

    Ziv, Jacob

    2011-01-01

    An axiomatic approach to the notion of similarity of sequences, which seems to be natural in many cases (e.g. phylogenetic analysis), is proposed. Despite the fact that it is not assumed that the sequences are a realization of a probabilistic process (e.g. a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling of the training data that is contained in a (long) individual training sequence via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries) where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data, thus yielding a formal justification for their good empirical performance without relying on any a-priori (sometimes unjustified) probabilistic assumption. One such case is discussed in detail.

  2. Density-based retrieval from high-similarity image databases

    DEFF Research Database (Denmark)

    Hansen, Michael Edberg; Carstensen, Jens Michael

    2004-01-01

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce...... a method for HSID retrieval using a similarity measure based on a linear combination of Jeffreys-Matusita distances between distributions of local (pixelwise) features estimated from a set of automatically and consistently defined image regions. The weight coefficients are estimated based on optimal...... retrieval performance. Experimental results on the difficult task of visually identifying clones of fungal colonies grown in a petri dish and categorization of pelts show a high retrieval accuracy of the method when combined with standardized sample preparation and image acquisition....

  3. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  4. Similarity spectra analysis of high-performance jet aircraft noise.

    Science.gov (United States)

    Neilsen, Tracianne B; Gee, Kent L; Wall, Alan T; James, Michael M

    2013-04-01

    Noise measured in the vicinity of an F-22A Raptor has been compared to similarity spectra found previously to represent mixing noise from large-scale and fine-scale turbulent structures in laboratory-scale jet plumes. Comparisons have been made for three engine conditions using ground-based sideline microphones, which covered a large angular aperture. Even though the nozzle geometry is complex and the jet is nonideally expanded, the similarity spectra do agree with large portions of the measured spectra. Toward the sideline, the fine-scale similarity spectrum is used, while the large-scale similarity spectrum provides a good fit to the area of maximum radiation. Combinations of the two similarity spectra are shown to match the data in between those regions. Surprisingly, a combination of the two is also shown to match the data at the farthest aft angle. However, at high frequencies the degree of congruity between the similarity and the measured spectra changes with engine condition and angle. At the higher engine conditions, there is a systematically shallower measured high-frequency slope, with the largest discrepancy occurring in the regions of maximum radiation.

  5. Base composition, size and sequence similarities of genome deoxyribonucleic acids from clinical isolates of Pseudomonas putrefaciens.

    Science.gov (United States)

    Owen, R J; Legors, R M; Lapage, S P

    1978-01-01

    The mean base compositions of DNA from 27 strains of Pseudomonas putrefaciens, P. rubescens and P. piscicida ranged from 43.4 to 53.2 mol% GC, with genome sizes from 3.04 × 10^9 to 4.23 × 10^9 daltons. On the basis of in vitro DNA-DNA binding, estimated spectrophotometrically from initial renaturation rates, P. putrefaciens strains were heterogeneous in the extent to which they shared similar nucleotide sequences, and were divided into four DNA homology groups. The DNA characteristics of strains in these groups correlated with several biochemical characteristics that facilitated identification of clinical isolates of P. putrefaciens. The two species P. putrefaciens and P. rubescens appear to be synonymous, and none of the four groups of P. putrefaciens was related in DNA sequences to P. piscicida. Pseudomonas putrefaciens should therefore be retained as a single species, and characteristics for identifying the various groups within the species are listed.

  6. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    Energy Technology Data Exchange (ETDEWEB)

    Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

    2013-09-15

    Pathway-based information has become an important resource both for establishing evolutionary relationships and for understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships, and extrapolation of species differences for use in a number of applications, including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex, as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method that accounts for enzyme sequence similarity in addition to reaction alignment. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing conservation of the glycolysis and citrate cycle pathways. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism, as well as a more complete study involving 36 pathways common to all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway-specific cross-species similarities and differences based on NCBI taxonomy.

  7. Ultra-fast sequence clustering from similarity networks with SiLiX

    Directory of Open Access Journals (Sweden)

    Duret Laurent

    2011-04-01

    Background: The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to be able to handle this huge amount of sequences in order to build families of homologous sequences in a reasonable time. Results: We present the software package SiLiX that implements a novel method which reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions: Compared with state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
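
    Single linkage clustering viewed as a graph problem amounts to finding connected components of a graph whose vertices are sequences and whose edges are accepted similarity hits. The following is a minimal sketch of that idea using a union-find structure. It is not SiLiX's implementation: the function name `single_linkage_families` and the input format (pairs of sequence identifiers that are assumed to have already passed whatever identity and coverage filter is in use) are hypothetical.

```python
def single_linkage_families(pairs):
    """Group sequence ids into families (connected components of the hit graph).

    `pairs` is an iterable of (id_a, id_b) tuples; each tuple is assumed to be
    a similarity hit that already passed filtering (an assumption of this sketch).
    """
    parent = {}

    def find(x):
        # Return the representative of x's component, with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    families = {}
    for seq in list(parent):
        families.setdefault(find(seq), set()).add(seq)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
    print(single_linkage_families(hits))
```

    Because each hit is processed once and then discarded, a structure of this kind only needs memory proportional to the number of sequences, not the number of hits.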

  8. Sequence Similarity and Functional Relationship Among Eukaryotic ZIP and CDF Transporters

    Institute of Scientific and Technical Information of China (English)

    Taiho Kambe; Tomoyuki Suzuki; Masaya Nagao; Yuko Yamaguchi-Iwai

    2006-01-01

    ZIP (ZRT/IRT-like Protein) and CDF (Cation Diffusion Facilitator) are two large metal transporter families mainly transporting zinc into and out of the cytosol. Several ZIP and CDF transporters have been characterized in mammals and various model organisms, such as yeast, nematode, fruit fly, and zebrafish, and many candidate genes have been identified by genome projects. Unexpected functions of ZIP and CDF transporters have been recently reported in some model organisms, leading to major advances in our understanding of the functions of mammalian counterparts. Here, we review the recent information on the sequence similarity and functional relationship among eukaryotic ZIP and CDF transporters obtained from the representative model organisms.

  9. HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy.

    Science.gov (United States)

    Zou, Quan; Hu, Qinghua; Guo, Maozu; Wang, Guohua

    2015-08-01

    Multiple sequence alignment (MSA) is important work, but bottlenecks arise in the massive MSA of homologous DNA or genome sequences. Most of the available state-of-the-art software tools cannot address large-scale datasets, or they run rather slowly. The similarity of homologous DNA sequences is often ignored. Lack of parallelization is still a challenge for MSA research. We developed two software tools to address the DNA MSA problem. The first employed trie trees to accelerate the centre star MSA strategy. The expected time complexity was decreased to linear time from square time. To address large-scale data, parallelism was applied using the hadoop platform. Experiments demonstrated the performance of our proposed methods, including their running time, sum-of-pairs scores and scalability. Moreover, we supplied two massive DNA/RNA MSA datasets for further testing and research.

  10. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    Directory of Open Access Journals (Sweden)

    Manzini Giovanni

    2007-07-01

    Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity, and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets, yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM, and mostly at a qualitative level; no comparison among UCD, NCD and CD is available; and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, which naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets.
The first one, performed via ROC
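
    To make the compression-based approximation concrete, here is a small sketch of the normalized compression measure in its commonly used form, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the compressor and the function names `compressed_size` and `ncd` are assumptions made for illustration; the study above evaluates 25 compressors and this sketch does not reproduce its setup.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor chosen for illustration)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression measure between two byte strings.

    Values near 0 suggest that y adds little information to x;
    values near 1 suggest the two strings share little structure.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"ACGT" * 200
    b = b"ACGT" * 190 + b"TTGACCA" * 5
    print(ncd(a, a), ncd(a, b))
```

    With a real compressor the value for identical inputs is small but generally not exactly zero, which is one reason the choice of compressor matters in practice.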

  11. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.

  12. Adaptive Sampling for High Throughput Data Using Similarity Measures

    Energy Technology Data Exchange (ETDEWEB)

    Bulaevskaya, V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sales, A. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-05-06

    The need for adaptive sampling arises in the context of high throughput data because the rates of data arrival are many orders of magnitude larger than the rates at which they can be analyzed. A very fast decision must therefore be made regarding the value of each incoming observation and its inclusion in the analysis. In this report we discuss one approach to adaptive sampling, based on the new data point’s similarity to the other data points being considered for inclusion. We present preliminary results for one real and one synthetic data set.

  13. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods, and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

  14. Scaling Relations of Local Magnitude versus Moment Magnitude for Sequences of Similar Earthquakes in Switzerland

    KAUST Repository

    Bethmann, F.

    2011-03-22

    Theoretical considerations and empirical regressions show that, in the magnitude range between 3 and 5, local magnitude, ML, and moment magnitude, Mw, scale 1:1. Previous studies suggest that for smaller magnitudes this 1:1 scaling breaks down. However, the scatter between ML and Mw at small magnitudes is usually large and the resulting scaling relations are therefore uncertain. In an attempt to reduce these uncertainties, we first analyze the ML versus Mw relation based on 195 events, induced by the stimulation of a geothermal reservoir below the city of Basel, Switzerland. Values of ML range from 0.7 to 3.4. From these data we derive a scaling of ML ~ 1.5 Mw over the given magnitude range. We then compare peak Wood-Anderson amplitudes to the low-frequency plateau of the displacement spectra for six sequences of similar earthquakes in Switzerland in the range of 0.5 ≤ ML ≤ 4.1. Because effects due to the radiation pattern and to the propagation path between source and receiver are nearly identical at a particular station for all events in a given sequence, the scatter in the data is substantially reduced. Again we obtain a scaling equivalent to ML ~ 1.5 Mw. Based on simulations using synthetic source time functions for different magnitudes and Q values estimated from spectral ratios between downhole and surface recordings, we conclude that the observed scaling can be explained by attenuation and scattering along the path. Other effects that could explain the observed magnitude scaling, such as a possible systematic increase of stress drop or rupture velocity with moment magnitude, are masked by attenuation along the path.

  15. Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX

    Science.gov (United States)

    Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-01-01

    Background: SELEX is a well-established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set, with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers. PMID:20161784

  16. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    BACKGROUND: SELEX is a well-established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set, with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  17. A novel phytase with sequence similarity to purple acid phosphatases is expressed in cotyledons of germinating soybean seedlings.

    Science.gov (United States)

    Hegeman, C E; Grabau, E A

    2001-08-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases.

  18. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance, of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra

  19. Absent words and the (dis)similarity analysis of DNA sequences: An experimental study

    OpenAIRE

    Rahman, Mohammad Saifur; Alatabbi, Ali; Athar, Tanver; Crochemore, Maxime; Rahman, M. Sohel

    2016-01-01

    Background: An absent word with respect to a sequence is a word that does not occur in the sequence as a factor; an absent word is minimal if all its factors on the other hand occur in that sequence. In this paper we explore the idea of using minimal absent words (MAW) to compute the distance between two biological sequences. The motivation and rationale of our work comes from the potential advantage of being able to extract as little information as possible from large genomic sequences to re...

  20. Absent words and the (dis)similarity analysis of DNA sequences: an experimental study

    OpenAIRE

    Rahman, Mohammad Saifur; Alatabbi, Ali; Athar, Tanver; Crochemore, Maxime; Rahman, M. Sohel

    2016-01-01

    Background: An absent word with respect to a sequence is a word that does not occur in the sequence as a factor; an absent word is minimal if all its factors on the other hand occur in that sequence. In this paper we explore the idea of using minimal absent words (MAW) to compute the distance between two biological sequences. The motivation and rationale of our work comes from the potential advantage of being able to extract as little information as possible from large genomic sequences to rea...
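
    The definition above can be illustrated with a brute-force sketch: a word w of length at least 2 is a minimal absent word of s when w does not occur in s but both its longest proper prefix and its longest proper suffix do. The function names are hypothetical, and the Jaccard-style distance on the two MAW sets is only a stand-in chosen for illustration, since the truncated abstract does not state which distance the authors use. The enumeration is quadratic in the sequence length and is meant for short strings only.

```python
def minimal_absent_words(s, alphabet="ACGT"):
    """Brute-force set of minimal absent words of s with length >= 2."""
    # Every non-empty factor (substring) of s.
    factors = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    maws = set()
    for prefix in factors:          # the longest proper prefix must occur in s
        for letter in alphabet:
            word = prefix + letter
            # The word itself is absent, but its longest proper suffix occurs.
            if word not in factors and word[1:] in factors:
                maws.add(word)
    return maws


def maw_jaccard_distance(x, y, alphabet="ACGT"):
    """Illustrative distance: Jaccard distance between the two MAW sets."""
    a = minimal_absent_words(x, alphabet)
    b = minimal_absent_words(y, alphabet)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(sorted(minimal_absent_words("ab", alphabet="ab")))
    print(maw_jaccard_distance("ACGTACGT", "ACGTTGCA"))
```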

  1. DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

    CERN Document Server

    Afify, Heba; Wahed, Manal Abdel

    2011-01-01

    Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information theory are often perceived as being of interest for data communication and storage. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison of genomic databases. This paper presents a differential compression algorithm that is based on production of difference sequences according to op-code table in order to optimize the compression of homologous sequences in dataset. Therefore, the stored data are composed of reference sequence, the set of differences, and differences locations, instead of storing each sequence individually. This algorithm does not require a priori knowledge about the statistics of the sequence set. The...

  2. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data.

    Science.gov (United States)

    Zhao, Yongan; Tang, Haixu; Ye, Yuzhen

    2012-01-01

    With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20-90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The use of an optimized data structure speeds up the similarity search by a further 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2G memory when running in single thread mode, or up to 3.5G memory when running in 4-thread mode. Implemented in C++, the source code is freely available for download at the RAPSearch2 website: http://omics.informatics.indiana.edu/mg/RAPSearch2/.

  3. Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison.

    Science.gov (United States)

    Hoang, Tung; Yin, Changchuan; Yau, Stephen S-T

    2016-10-01

    Numerical encoding plays an important role in DNA sequence analysis via computational methods, in which numerical values are associated with corresponding symbolic characters. After numerical representation, digital signal processing methods can be exploited to analyze DNA sequences. To reflect the biological properties of the original sequence, it is vital that the representation is one-to-one. Chaos Game Representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a respective position on the plane that allows the depiction of the DNA sequence in the form of image. Using CGR, a biological sequence can be transformed one-to-one to a numerical sequence that preserves the main features of the original sequence. In this research, we propose to encode DNA sequences by considering 2D CGR coordinates as complex numbers, and apply digital signal processing methods to analyze their evolutionary relationship. Computational experiments indicate that this approach gives comparable results to the state-of-the-art multiple sequence alignment method, Clustal Omega, and is significantly faster. The MATLAB code for our method can be accessed from: www.mathworks.com/matlabcentral/fileexchange/57152.
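
    As a rough illustration of the encoding described above, the sketch below computes CGR points as complex numbers and shows that the sequence can be recovered from the final point, which is the one-to-one property. The corner assignment (A, C, G, T on the corners of the unit square), the starting point at the centre, and the function names are assumptions made for this example; the authors' MATLAB code may use different conventions. Floating-point precision limits the exact decoding shown here to short sequences.

```python
# Assumed corner assignment on the unit square, written as complex numbers.
CORNERS = {"A": 0 + 0j, "C": 0 + 1j, "G": 1 + 1j, "T": 1 + 0j}


def cgr_points(seq):
    """Return the CGR trajectory of seq as a list of complex numbers."""
    z = complex(0.5, 0.5)  # start at the centre of the square
    points = []
    for base in seq:
        corner = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        z = complex((z.real + corner.real) / 2, (z.imag + corner.imag) / 2)
        points.append(z)
    return points


def decode_from_last_point(z, length):
    """Recover a sequence of the given length from its final CGR point."""
    bases = []
    for _ in range(length):
        # The quadrant containing z identifies the most recent base.
        corner = complex(1.0 if z.real > 0.5 else 0.0, 1.0 if z.imag > 0.5 else 0.0)
        bases.append(next(b for b, c in CORNERS.items() if c == corner))
        # Undo the midpoint step.
        z = complex(2 * z.real - corner.real, 2 * z.imag - corner.imag)
    return "".join(reversed(bases))


if __name__ == "__main__":
    pts = cgr_points("GATTACA")
    print(pts[-1], decode_from_last_point(pts[-1], 7))
```

    The resulting list of complex numbers is the kind of numerical sequence to which signal processing tools such as a discrete Fourier transform could then be applied.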

  4. Efficient estimation for high similarities using odd sketches

    DEFF Research Database (Denmark)

    Mitzenmacher, Michael; Pagh, Rasmus; Pham, Ninh Dang

    2014-01-01

    Estimating set similarity is a central problem in many computer applications. In this paper we introduce the Odd Sketch, a compact binary sketch for estimating the Jaccard similarity of two sets. The exclusive-or of two sketches equals the sketch of the symmetric difference of the two sets. This ...
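
    The exclusive-or property stated above can be checked with a small sketch in which every element of a set toggles one bit of a fixed-size bit array. The hash function (SHA-256 of the element's string form), the sketch size, and the name `odd_sketch` are choices made for this illustration only; the paper's construction of the Jaccard estimator on top of the sketch is not reproduced here.

```python
import hashlib


def odd_sketch(items, n_bits=64):
    """Bit array (stored as an int) where bit i is the parity of the number
    of distinct items hashing to position i."""
    sketch = 0
    for item in set(items):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % n_bits
        sketch ^= 1 << position  # toggle the bit for this element
    return sketch


if __name__ == "__main__":
    a = {"ACG", "CGT", "GTA", "TAC"}
    b = {"GTA", "TAC", "ACC"}
    # Elements common to both sets toggle the same bit twice and cancel out,
    # so the XOR of the sketches is the sketch of the symmetric difference.
    print(odd_sketch(a) ^ odd_sketch(b) == odd_sketch(a ^ b))
```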

  5. Implicit Learning of Musical Timbre Sequences: Statistical Regularities Confronted With Acoustical (Dis)Similarities

    Science.gov (United States)

    Tillmann, Barbara; McAdams, Stephen

    2004-01-01

    The present study investigated the influence of acoustical characteristics on the implicit learning of statistical regularities (transition probabilities) in sequences of musical timbres. The sequences were constructed in such a way that the acoustical dissimilarities between timbres potentially created segmentations that either supported (S1) or…

  7. A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding

    Science.gov (United States)

    Jin, Xin; Nie, Rencan; Zhou, Dongming; Yao, Shaowen; Chen, Yanyan; Yu, Jiefu; Wang, Quan

    2016-11-01

    A novel method for the calculation of DNA sequence similarity is proposed based on a simplified pulse-coupled neural network (S-PCNN) and Huffman coding. In this study, we propose a coding method based on Huffman coding, where the triplet code is used as a code bit to transform a DNA sequence into a numerical sequence. The proposed method uses the firing characteristics of S-PCNN neurons to extract features from the DNA sequence, and it can deal with DNA sequences of different lengths. First, the DNA primary sequence is encoded using the Huffman coding method; the S-PCNN is then used to extract the oscillation time sequence (OTS) of the encoded DNA sequence. Relevant features are obtained from the OTS, and finally the similarities or dissimilarities of the DNA sequences are determined by Euclidean distance. In order to verify the accuracy of this method, different data sets were used for testing. The experimental results show that the proposed method is effective.

  8. Evidence for Deep Regulatory Similarities in Early Developmental Programs across Highly Diverged Insects

    Science.gov (United States)

    Zhang, Yinan; Samee, Md. Abul Hassan; Halfon, Marc S.; Sinha, Saurabh

    2014-01-01

    Many genes familiar from Drosophila development, such as the so-called gap, pair-rule, and segment polarity genes, play important roles in the development of other insects and in many cases appear to be deployed in a similar fashion, despite the fact that Drosophila-like “long germband” development is highly derived and confined to a subset of insect families. Whether or not these similarities extend to the regulatory level is unknown. Identification of regulatory regions beyond the well-studied Drosophila has been challenging as even within the Diptera (flies, including mosquitoes) regulatory sequences have diverged past the point of recognition by standard alignment methods. Here, we demonstrate that methods we previously developed for computational cis-regulatory module (CRM) discovery in Drosophila can be used effectively in highly diverged (250–350 Myr) insect species including Anopheles gambiae, Tribolium castaneum, Apis mellifera, and Nasonia vitripennis. In Drosophila, we have successfully used small sets of known CRMs as “training data” to guide the search for other CRMs with related function. We show here that although species-specific CRM training data do not exist, training sets from Drosophila can facilitate CRM discovery in diverged insects. We validate in vivo over a dozen new CRMs, roughly doubling the number of known CRMs in the four non-Drosophila species. Given the growing wealth of Drosophila CRM annotation, these results suggest that extensive regulatory sequence annotation will be possible in newly sequenced insects without recourse to costly and labor-intensive genome-scale experiments. We develop a new method, Regulus, which computes a probabilistic score of similarity based on binding site composition (despite the absence of nucleotide-level sequence alignment), and demonstrate similarity between functionally related CRMs from orthologous loci. Our work represents an important step toward being able to trace the evolutionary

  9. Evidence for deep regulatory similarities in early developmental programs across highly diverged insects.

    Science.gov (United States)

    Kazemian, Majid; Suryamohan, Kushal; Chen, Jia-Yu; Zhang, Yinan; Samee, Md Abul Hassan; Halfon, Marc S; Sinha, Saurabh

    2014-09-01

    Many genes familiar from Drosophila development, such as the so-called gap, pair-rule, and segment polarity genes, play important roles in the development of other insects and in many cases appear to be deployed in a similar fashion, despite the fact that Drosophila-like "long germband" development is highly derived and confined to a subset of insect families. Whether or not these similarities extend to the regulatory level is unknown. Identification of regulatory regions beyond the well-studied Drosophila has been challenging as even within the Diptera (flies, including mosquitoes) regulatory sequences have diverged past the point of recognition by standard alignment methods. Here, we demonstrate that methods we previously developed for computational cis-regulatory module (CRM) discovery in Drosophila can be used effectively in highly diverged (250-350 Myr) insect species including Anopheles gambiae, Tribolium castaneum, Apis mellifera, and Nasonia vitripennis. In Drosophila, we have successfully used small sets of known CRMs as "training data" to guide the search for other CRMs with related function. We show here that although species-specific CRM training data do not exist, training sets from Drosophila can facilitate CRM discovery in diverged insects. We validate in vivo over a dozen new CRMs, roughly doubling the number of known CRMs in the four non-Drosophila species. Given the growing wealth of Drosophila CRM annotation, these results suggest that extensive regulatory sequence annotation will be possible in newly sequenced insects without recourse to costly and labor-intensive genome-scale experiments. We develop a new method, Regulus, which computes a probabilistic score of similarity based on binding site composition (despite the absence of nucleotide-level sequence alignment), and demonstrate similarity between functionally related CRMs from orthologous loci. 
    Our work represents an important step toward being able to trace the evolutionary history of gene

  10. Distinguishing authentic mitochondrial and plastid DNAs from similar DNA sequences in the nucleus using the polymerase chain reaction.

    Science.gov (United States)

    Kumar, Rachana A; Bendich, Arnold J

    2011-08-01

    DNA sequences similar to those in the organellar genomes are also found in the nucleus. These non-coding sequences may be co-amplified by PCR with the authentic organellar DNA sequences, leading to erroneous conclusions. To avoid this problem, we describe an experimental procedure to prevent amplification of this "promiscuous" DNA when total tissue DNA is used with PCR. First, primers are designed for organelle-specific sequences using a bioinformatics method. These primers are then tested using methylation-sensitive PCR. The method is demonstrated for both end-point and real-time PCR with Zea mays, where most of the DNA sequences in the organellar genomes are also present in the nucleus. We use this procedure to quantify those nuclear DNA sequences that are near-perfect replicas of organellar DNA. This method should be useful for applications including phylogenetic analysis, organellar DNA quantification and clinical testing.

  11. On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence

    Directory of Open Access Journals (Sweden)

    Theobald Douglas L

    2011-11-01

    Background: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA have been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set. Results: For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial

  12. Similar Representations of Sequence Knowledge in Young and Older Adults: A Study of Effector Independent Transfer

    NARCIS (Netherlands)

    Barnhoorn, Jonathan Sebastiaan; Döhring, Falko R.; van Asseldonk, Edwin H.F.; Verwey, Willem B.

    2016-01-01

    Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is

  13. Testing statistical significance scores of sequence comparison methods with structure similarity

    NARCIS (Netherlands)

    Hulsen, T.; Vlieg, J. de; Leunissen, J.A.M.; Groenen, P.M.

    2006-01-01

    BACKGROUND: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical s

  14. Testing statistical significance scores of sequence comparison methods with structure similarity

    NARCIS (Netherlands)

    Hulsen, T.; Vlieg, de J.; Leunissen, J.A.M.; Groenen, P.

    2006-01-01

    Background - In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical

  15. Similar Representations of Sequence Knowledge in Young and Older Adults: A Study of Effector Independent Transfer

    NARCIS (Netherlands)

    Barnhoorn, Jonathan S.; Döhring, Falko R.; Asseldonk, van Edwin H.F.; Verwey, Willem B.

    2016-01-01

    Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is highl

  16. Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

    Directory of Open Access Journals (Sweden)

    Apurva Barve

    2013-01-01

    Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies the presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time the presence of the XPA gene in hydra. The putative protein sequence of hydra XPA contains a nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins, such as regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.

  17. Identification of similar regions of protein structures using integrated sequence and structure analysis tools

    Directory of Open Access Journals (Sweden)

    Heiland Randy

    2006-03-01

    Background: Understanding protein function from its structure is a challenging problem. Sequence-based approaches for finding homology have broad use for annotation of both structure and function. 3D structural information on protein domains and their interactions provides a view of structure-function relationships that complements sequence information. We have developed a web site (http://www.sblest.org/) and an API of web services that enables users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view to prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest. Results: Users are able to submit their own queries or use a structure already in the PDB. Currently the databases that a user can query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70 and CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect the hits found with PSI-BLAST, HMMer and S-BLEST. We have evaluated how well annotation transfer can be performed on SCOP IDs, Gene Ontology (GO) IDs and EC numbers. The method is very efficient and totally automated, generally taking around fifteen minutes for a 400 residue protein. Conclusion: With structural genomics initiatives determining structures with little, if any, functional characterization

  18. The Question of Similar Sequence in Development across Cultures and the Method of Critical Exploration.

    Science.gov (United States)

    Lister, Caroline; And Others

    1993-01-01

    Nonretarded (NR) and educable mentally retarded (EMR) 6- to 19-year-old children in Istanbul, Turkey, completed conservation tasks. Found a similarity in the developmental progression of conservation concepts between the two groups and between these groups and groups of NR and EMR children in England as reported in previous studies. (BC)

  19. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Background: In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results: We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions: Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus

  20. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing.

    Science.gov (United States)

    Li, Kelvin; Shrivastava, Susmita; Brownley, Anushka; Katzel, Dan; Bera, Jayati; Nguyen, Anh Thu; Thovarai, Vishal; Halpin, Rebecca; Stockwell, Timothy B

    2012-11-06

    In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute's (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. 
    The recommended methodology for the construction of the consensus sequence that encapsulates the allelic variation of the targeted
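
    The use of IUPAC ambiguity codes to represent allelic diversity, as described in this abstract, can be sketched as follows. This is an illustrative example only: it assumes ungapped, equal-length, uppercase A/C/G/T sequences, and the function names are hypothetical rather than part of the JCVI pipeline.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed bases -> single ambiguity code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus(aligned):
    """Collapse aligned sequences into one IUPAC consensus string, column by column."""
    return "".join(BASES_TO_CODE[frozenset(column)] for column in zip(*aligned))


def degeneracy(primer):
    """Number of distinct concrete oligos a degenerate primer represents."""
    total = 1
    for code in primer:
        total *= len(IUPAC[code])
    return total


def matches(primer, target):
    """True if every target base is allowed by the primer code at that position."""
    return len(primer) == len(target) and all(
        base in IUPAC[code] for code, base in zip(primer, target)
    )


if __name__ == "__main__":
    isolates = ["ACGT", "ACAT", "ACGC"]
    template = consensus(isolates)
    print(template, degeneracy(template))
    print(matches(template, "ACAC"), matches(template, "ACTT"))
```

    A degenerate primer drawn from such a template matches every isolate used to build it, at the cost of a degeneracy that grows multiplicatively with each ambiguous position.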

  1. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...... splicing events and coding potential of isoforms from full isoform deconvolution software, such as Cufflinks (article II), is presented. Finally, a study using 5’-end RNA-seq for alternative promoter detection between healthy patients and patients with acute promyelocytic leukemia is presented (article III...

  2. Correlating low-similarity peptide sequences and HIV B-cell epitopes.

    Science.gov (United States)

    Kanduc, Darja; Serpico, Rosario; Lucchese, Alberta; Shoenfeld, Yehuda

    2008-02-01

    Although a large number of human immunodeficiency virus-1 (HIV-1) derived B-cell epitopes have been experimentally identified, the structural requirements underlying the HIV humoral immune response remain unknown. Here, we review the current literature on HIV B-cell epitopes as catalogued in the www.hiv.lanl.gov/content/immunology website, searching for common structural and/or functional immunogenic motifs. The analysis of HIV antibody data documents that the linear determinants recognized by human or murine humoral immune responses are (or harbor) pentapeptide fragments with no or only very low similarity to the respective host proteome. The present literature analysis provides relevant insights that may be applied to design anti-HCV therapeutic approaches exempt from autoimmune collateral effects.

  3. Sequence diversity, cytotoxicity and antigenic similarities of the leukotoxin of isolates of Mannheimia species from mastitis in domestic sheep.

    Science.gov (United States)

    Omaleki, Lida; Browning, Glenn F; Barber, Stuart R; Allen, Joanne L; Srikumaran, Subramaniam; Markham, Philip F

    2014-11-07

    Species within the genus Mannheimia are among the most important causes of ovine mastitis. Isolates of these species can express leukotoxin A (LktA), a primary virulence factor of these bacteria. To examine the significance of variation in the LktA, the sequences of the lktA genes in a panel of isolates from cases of ovine mastitis were compared. The cross-neutralising capacities of rat antisera raised against LktA of one Mannheimia glucosida, one haemolytic Mannheimia ruminalis, and two Mannheimia haemolytica isolates were also examined to assess the effect that variation in the lktA gene can have on protective immunity against leukotoxins with differing sequences. The lktA nucleotide distance between the M. haemolytica isolates was greater than between the M. glucosida isolates, with the M. haemolytica isolates divisible into two groups based on their lktA sequences. Comparison of the topology of phylogenetic trees of 16S rDNA and lktA sequences revealed differences in the relationships between some isolates, suggesting horizontal gene transfer. Cross neutralisation data obtained with monospecific anti-LktA rat sera were used to derive antigenic similarity coefficients for LktA from the four Mannheimia species isolates. Similarity coefficients indicated that LktA of the two M. haemolytica isolates were least similar, while LktA from M. glucosida was most similar to those for one of the M. haemolytica isolates and the haemolytic M. ruminalis isolate. The results suggested that vaccination with the M. glucosida leukotoxin would generate the greatest cross-protection against ovine mastitis caused by Mannheimia species with these alleles. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Nullomers and High Order Nullomers in Genomic Sequences

    Science.gov (United States)

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers: whether their absence in genomes has just a statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems insufficient to explain the nullomer phenomenon
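
    The definitions in this abstract lend themselves to a short sketch. The code below enumerates the absent k-mers of a sequence and then keeps those whose single-base substitution variants are all absent as well. Reading "mutated sequences" as all single-base substitutions is an assumption made here for illustration, and the function names are hypothetical; the brute-force enumeration is only practical for small k.

```python
from itertools import product

ALPHABET = "ACGT"


def find_nullomers(sequence, k):
    """Return, sorted, every length-k DNA word that never occurs in `sequence`."""
    seq = sequence.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return sorted(word for word in all_words if word not in present)


def single_mutants(word):
    """Yield every word that differs from `word` by exactly one substitution."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def high_order_nullomers(sequence, k):
    """Nullomers whose single-substitution mutants are all nullomers too (assumed definition)."""
    nullomers = set(find_nullomers(sequence, k))
    return sorted(
        word for word in nullomers
        if all(mutant in nullomers for mutant in single_mutants(word))
    )


if __name__ == "__main__":
    genome = "ACGT"
    print(find_nullomers(genome, 2))
    print(high_order_nullomers(genome, 2))
```

    For the toy input "ACGT" with k = 2, the words AC, CG and GT are present, so the remaining 13 dinucleotides are nullomers; "TA" additionally qualifies as high order under the assumed definition because none of its six single-base variants occurs.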

  5. Musicians' and nonmusicians' short-term memory for verbal and musical sequences: comparing phonological similarity and pitch proximity.

    Science.gov (United States)

    Williamson, Victoria J; Baddeley, Alan D; Hitch, Graham J

    2010-03-01

    Language-music comparative studies have highlighted the potential for shared resources or neural overlap in auditory short-term memory. However, there is a lack of behavioral methodologies for comparing verbal and musical serial recall. We developed a visual grid response that allowed both musicians and nonmusicians to perform serial recall of letter and tone sequences. The new method was used to compare the phonological similarity effect with the impact of an operationalized musical equivalent: pitch proximity. Over the course of three experiments, we found that short-term memory for tones had several similarities to verbal memory, including limited capacity and a significant effect of pitch proximity in nonmusicians. Despite being vulnerable to phonological similarity when recalling letters, however, musicians showed no effect of pitch proximity, a result that we suggest might reflect strategy differences. Overall, the findings support a limited degree of correspondence in the way that verbal and musical sounds are processed in auditory short-term memory.

  6. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    Energy Technology Data Exchange (ETDEWEB)

    Zemla, A; Lang, D; Kostova, T; Andino, R; Zhou, C

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected

  7. Similarity of rpoB gene sequences of sucrose-fermenting and non-fermenting Corynebacterium diphtheriae strains.

    Science.gov (United States)

    Hirata, R; Pacheco, L G; Soares, S C; Santos, L S; Moreira, L O; Sabbadini, P S; Santos, C S; Miyoshi, A; Azevedo, V A; Mattos-Guaraldi, A L

    2011-03-01

    During the last decades, the majority of Brazilian Corynebacterium diphtheriae isolates were shown to be capable of metabolizing sucrose, sometimes leading to erroneous identification as a non-diphtheric Corynebacterium species. The sequencing of the polymorphic region of the RNA polymerase beta subunit-encoding gene (rpoB) is an important taxonomic tool for identification of corynebacteria. The present study aimed to investigate the rpoB gene polymorphic features of sucrose-fermenting and non-sucrose-fermenting strains. The results showed that sucrose-fermenting strains presented rpoB gene polymorphic regions with more than 98% similarity with the sequences deposited in the gene bank corresponding to non-sucrose-fermenting strains. Data indicate that sucrose-fermenting isolates may act as a variant of C. diphtheriae biotype mitis. In addition, we caution that sucrose-fermenting strains should not be discarded as contaminants, mainly in countries where the possibility of isolation of this variant is higher.

  8. Remarkable sequence similarity between the dinoflagellate-infecting marine girus and the terrestrial pathogen African swine fever virus

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2009-10-01

    Heterocapsa circularisquama DNA virus (HcDNAV; previously designated as HcV) is a giant virus (girus) with a ~356-kbp double-stranded DNA (dsDNA) genome. HcDNAV lytically infects the bivalve-killing marine dinoflagellate H. circularisquama, and currently represents the sole DNA virus isolated from dinoflagellates, one of the most abundant protists in marine ecosystems. Its morphological features, genome type, and host range previously suggested that HcDNAV might be a member of the family Phycodnaviridae of Nucleo-Cytoplasmic Large DNA Viruses (NCLDVs), though no supporting sequence data was available. NCLDVs currently include two families found in aquatic environments (Phycodnaviridae, Mimiviridae), one mostly infecting terrestrial animals (Poxviridae), another isolated from fish, amphibians and insects (Iridoviridae), and the last one (Asfarviridae) exclusively represented by the animal pathogen African swine fever virus (ASFV), the agent of a fatal hemorrhagic disease in domestic swine. In this study, we determined the complete sequence of the type B DNA polymerase (PolB) gene of HcDNAV. The viral PolB was transcribed at least from 6 h post inoculation (hpi), suggesting its crucial function for viral replication. Most unexpectedly, the HcDNAV PolB sequence was found to be closely related to the PolB sequence of ASFV. In addition, the amino acid sequence of HcDNAV PolB showed a rare amino acid substitution within a highly conserved motif: YSDTDS was found in HcDNAV PolB instead of YGDTDS in most dsDNA viruses. Together with the previous observation of ASFV-like sequences in the Sorcerer II Global Ocean Sampling metagenomic datasets, our results further reinforce the idea that the terrestrial ASFV has its evolutionary origin in marine environments.

  9. Logging Data High-Resolution Sequence Stratigraphy

    Institute of Scientific and Technical Information of China (English)

    Li Hongqi; Xie Yinfu; Sun Zhongchun; Luo Xingping

    2006-01-01

    The recognition and contrast of bed sets in parasequence is difficult in terrestrial basin high-resolution sequence stratigraphy. This study puts forward new methods for the boundary identification and contrast of bed sets on the basis of manifold logging data. The formation of calcareous interbeds, shale resistivity differences and the relation of reservoir resistivity to altitude are considered on the basis of log curve morphological characteristics, core observation, cast thin section, X-ray diffraction and scanning electron microscopy. The results show that the thickness of calcareous interbeds is between 0.5 m and 2 m, increasing on weathering crusts and faults. Calcareous interbeds occur at the bottom of Reservoir resistivity increases with altitude. Calcareous interbeds may be a symbol of recognition for the boundary of bed sets and isochronous contrast bed sets, and shale resistivity differences may confirm the stack relation and connectivity of bed sets. Based on this, a high-resolution chronostratigraphic framework of Xi-1 segment in Shinan area, Junggar basin is presented, and the connectivity of bed sets and oil-water contact is confirmed. In this chronostratigraphic framework, the growth order, stack mode and space shape of bed sets are qualitatively and quantitatively described.

  10. Recombination and selectional forces in cyanopeptolin NRPS operons from highly similar, but geographically remote Planktothrix strains

    Directory of Open Access Journals (Sweden)

    Kristensen Tom

    2008-08-01

    Full Text Available Abstract Background: Cyanopeptolins are nonribosomally produced heptapeptides showing a highly variable composition. The cyanopeptolin synthetase operon has previously been investigated in three strains from the genera Microcystis, Planktothrix and Anabaena. Cyanopeptolins display protease inhibitor activity, but the biological function(s) is (are) unknown. Cyanopeptolin gene cluster variability and biological functions of the peptide variants are likely to be interconnected. Results: We have investigated two cyanopeptolin gene clusters from highly similar, but geographically remote strains of the same genus. Sequencing of a nonribosomal peptide synthetase (NRPS) cyanopeptolin gene cluster from the Japanese strain Planktothrix NIES 205 (205-oci) showed the 30 kb gene cluster to be highly similar to the oci gene cluster previously described in Planktothrix NIVA CYA 116, isolated in Norway. Both operons contained seven NRPS modules, a sulfotransferase (S) and a glyceric acid loading (GA) domain. Sequence analyses showed a high degree of conservation, except for the presence of an epimerase domain in NIES 205 and the regions around the epimerase, showing high substitution rates and Ka/Ks values above 1. The two strains produce almost identical cyanopeptolins, cyanopeptolin-1138 and oscillapeptin E respectively, but with slight differences regarding the production of minor cyanopeptolin variants. These variants may be the result of relaxed adenylation (A) domain specificity in the nonribosomal enzyme complex. Other genetic markers (16S rRNA, ntcA and the phycocyanin cpcBA spacer) were identical, supporting that these geographically separated Planktothrix strains are closely related. Conclusion: A horizontal gene transfer event resulting in exchange of a whole module-encoding region was observed. Nucleotide statistics indicate that both purifying selection and positive selection forces are operating on the gene cluster. The positive selection forces are...

  11. Targeted high-throughput sequencing of tagged nucleic acid samples

    OpenAIRE

    Meyer, M.; Stenzel, U.; Myles, S.; Prüfer, K.; Hofreiter, M.

    2007-01-01

    High-throughput 454 DNA sequencing technology allows much faster and more cost-effective sequencing than traditional Sanger sequencing. However, the technology imposes inherent limitations on the number of samples that can be processed in parallel. Here we introduce parallel tagged sequencing (PTS), a simple, inexpensive and flexible barcoding technique that can be used for parallel sequencing any number and type of double-stranded nucleic acid samples. We demonstrate that PTS is particularly...

  12. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    Directory of Open Access Journals (Sweden)

    Chen Jun

    2012-11-01

    Full Text Available Abstract Background: A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results: mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced, resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions: Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10

  13. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    Science.gov (United States)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed through mapping four nucleotides to a unit circle with a cyclic order. DUC-Curve could directly detect nucleotide, di-nucleotide compositions and microsatellite structure from DNA sequences. Moreover, it also could be used for DNA sequence alignment. Taking geometric center vectors of DUC-Curves as sequence descriptor, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The obtained reasonable results illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.

  14. 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases.

    Science.gov (United States)

    Tedersoo, Leho; Nilsson, R Henrik; Abarenkov, Kessy; Jairus, Teele; Sadam, Ave; Saar, Irja; Bahram, Mohammad; Bechem, Eneke; Chuyong, George; Kõljalg, Urmas

    2010-10-01

    • Compared with Sanger sequencing-based methods, pyrosequencing provides orders of magnitude more data on the diversity of organisms in their natural habitat, but its technological biases and relative accuracy remain poorly understood. • This study compares the performance of pyrosequencing and traditional sequencing for species' recovery of ectomycorrhizal fungi on root tips in a Cameroonian rain forest and addresses biases related to multi-template PCR and pyrosequencing analyses. • Pyrosequencing and the traditional method yielded qualitatively similar results, but there were slight, but significant, differences that affected the taxonomic view of the fungal community. We found that most pyrosequencing singletons were artifactual and contained a strongly elevated proportion of insertions compared with natural intra- and interspecific variation. The alternative primers, DNA extraction methods and PCR replicates strongly influenced the richness and community composition as recovered by pyrosequencing. • Pyrosequencing offers a powerful alternative for the identification of ectomycorrhizal fungi in pooled root samples, but requires careful selection of molecular tools. A well-populated backbone database facilitates the detection of biological and technical artifacts. The pyrosequencing pipeline is available at http://unite.ut.ee/454pipeline.tgz.

  15. Sequence Similarity of Clostridium difficile Strains by Analysis of Conserved Genes and Genome Content Is Reflected by Their Ribotype Affiliation

    Science.gov (United States)

    Kurka, Hedwig; Ehrenreich, Armin; Ludwig, Wolfgang; Monot, Marc; Rupnik, Maja; Barbut, Frederic; Indra, Alexander; Dupuy, Bruno; Liebl, Wolfgang

    2014-01-01

    PCR-ribotyping is a broadly used method for the classification of isolates of Clostridium difficile, an emerging intestinal pathogen, causing infections with increased disease severity and incidence in several European and North American countries. We have now carried out clustering analysis with selected genes of numerous C. difficile strains as well as gene content comparisons of their genomes in order to broaden our view of the relatedness of strains assigned to different ribotypes. We analyzed the genomic content of 48 C. difficile strains representing 21 different ribotypes. The calculation of distance matrix-based dendrograms using the neighbor joining method for 14 conserved genes (standard phylogenetic marker genes) from the genomes of the C. difficile strains demonstrated that the genes from strains with the same ribotype generally clustered together. Further, certain ribotypes always clustered together and formed ribotype groups, i.e. ribotypes 078, 033 and 126, as well as ribotypes 002 and 017, indicating their relatedness. Comparisons of the gene contents of the genomes of ribotypes that clustered according to the conserved gene analysis revealed that the number of common genes of the ribotypes belonging to each of these three ribotype groups were very similar for the 078/033/126 group (at most 69 specific genes between the different strains with the same ribotype) but less similar for the 002/017 group (86 genes difference). It appears that the ribotype is indicative not only of a specific pattern of the amplified 16S–23S rRNA intergenic spacer but also reflects specific differences in the nucleotide sequences of the conserved genes studied here. It can be anticipated that the sequence deviations of more genes of C. difficile strains are correlated with their PCR-ribotype. In conclusion, the results of this study corroborate and extend the concept of clonal C. difficile lineages, which correlate with ribotypes affiliation. PMID:24482682

  16. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    Science.gov (United States)

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
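
The search criterion quoted above (100% identity over 100 bp) can be sketched as a scan for gap-free identical runs in a pairwise alignment. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `identical_runs` is hypothetical, the input is assumed to be two already-aligned strings of equal length with `-` as the gap character, and the demo uses a short threshold so that toy input produces output.

```python
def identical_runs(aligned_a, aligned_b, min_len=100):
    """Return (start, end) column ranges (end exclusive) where two aligned
    sequences are identical and gap-free for at least min_len columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        same = a == b and a != "-"
        if same and start is None:
            start = i                      # a new identical run begins
        elif not same and start is not None:
            if i - start >= min_len:       # run ended; keep it if long enough
                runs.append((start, i))
            start = None
    if start is not None and len(aligned_a) - start >= min_len:
        runs.append((start, len(aligned_a)))
    return runs


# Toy alignment with a deliberately small threshold.
print(identical_runs("AAAACCCCGT-TTTT", "AAAACCCCGA-TTTT", min_len=4))
```

With the default `min_len=100`, the same function would report only runs that meet the length criterion given in the abstract.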

  17. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition.

    Science.gov (United States)

    Zhang, Lichao; Zhao, Xiqiang; Kong, Liang

    2014-08-21

    Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, it is still a challenge to extract sequence information solely using protein sequence for protein structural class prediction with low similarity sequence in the current computational biology. In this study, a novel sequence representation method is proposed based on position specific scoring matrix for protein structural class prediction. By defined evolutionary difference formula, varying length proteins are expressed as uniform dimensional vectors, which can represent evolutionary difference information between the adjacent residues of a given protein. To perform and evaluate the proposed method, support vector machine and jackknife tests are employed on three widely used datasets, 25PDB, 1189 and 640 datasets with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with the previous methods shows that our method may provide a promising method to predict protein structural class especially for low-similarity sequences.

  18. Molecular characterization of a novel luteovirus from peach identified by high-throughput sequencing.

    Science.gov (United States)

    Wu, L-P; Liu, H-W; Bateman, M; Liu, Z; Li, R

    2017-05-26

    Contigs with sequence homologies to cherry-associated luteovirus were identified by high-throughput sequencing analysis in two peach accessions. Complete genomic sequences of the two isolates of this virus were determined to be 5,819 and 5,814 nucleotides long, respectively. The genome of the new virus is typical of luteoviruses, containing eight open reading frames in a very similar arrangement. Its genomic sequence is 58-74% identical to those of other members of the genus Luteovirus. These sequences thus belong to a new virus, which we have named "peach-associated luteovirus".

  19. Expressing Redundancy among Linear-Epitope Sequence Data Based on Residue-Level Physicochemical Similarity in the Context of Antigenic Cross-Reaction

    Directory of Open Access Journals (Sweden)

    Salvador Eugenio C. Caoili

    2016-01-01

    Full Text Available Epitope-based design of vaccines, immunotherapeutics, and immunodiagnostics is complicated by structural changes that radically alter immunological outcomes. This is obscured by expressing redundancy among linear-epitope data as fractional sequence-alignment identity, which fails to account for potentially drastic loss of binding affinity due to single-residue substitutions even where these might be considered conservative in the context of classical sequence analysis. From the perspective of immune function based on molecular recognition of epitopes, functional redundancy of epitope data (FRED thus may be defined in a biologically more meaningful way based on residue-level physicochemical similarity in the context of antigenic cross-reaction, with functional similarity between epitopes expressed as the Shannon information entropy for differential epitope binding. Such similarity may be estimated in terms of structural differences between an immunogen epitope and an antigen epitope with reference to an idealized binding site of high complementarity to the immunogen epitope, by analogy between protein folding and ligand-receptor binding; but this underestimates potential for cross-reactivity, suggesting that epitope-binding site complementarity is typically suboptimal as regards immunologic specificity. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes.

  20. Base excision of oxidative purine and pyrimidine DNA damage in Saccharomyces cerevisiae by a DNA glycosylase with sequence similarity to endonuclease III from Escherichia coli.

    Science.gov (United States)

    Eide, L; Bjørås, M; Pirovano, M; Alseth, I; Berdal, K G; Seeberg, E

    1996-10-01

    One gene locus on chromosome I in Saccharomyces cerevisiae encodes a protein (YAB5_YEAST; accession no. P31378) with local sequence similarity to the DNA repair glycosylase endonuclease III from Escherichia coli. We have analyzed the function of this gene, now assigned NTG1 (endonuclease three-like glycosylase 1), by cloning, mutant analysis, and gene expression in E. coli. Targeted gene disruption of NTG1 produces a mutant that is sensitive to H2O2 and menadione, indicating that NTG1 is required for repair of oxidative DNA damage in vivo. Northern blot analysis and expression studies of a NTG1-lacZ gene fusion showed that NTG1 is induced by cell exposure to different DNA damaging agents, particularly menadione, and hence belongs to the DNA damage-inducible regulon in S. cerevisiae. When expressed in E. coli, the NTG1 gene product cleaves plasmid DNA damaged by osmium tetroxide, thus, indicating specificity for thymine glycols in DNA similarly as is the case for EndoIII. However, NTG1 also releases formamidopyrimidines from DNA with high efficiency and, hence, represents a glycosylase with a novel range of substrate recognition. Sequences similar to NTG1 from other eukaryotes, including Caenorhabditis elegans, Schizosaccharomyces pombe, and mammals, have recently been entered in the GenBank suggesting the universal presence of NTG1-like genes in higher organisms. S. cerevisiae NTG1 does not have the [4Fe-4S] cluster DNA binding domain characteristic of the other members of this family.

  1. Quantitative and selective polymerase chain reaction analysis of highly similar human alpha-class glutathione transferases.

    Science.gov (United States)

    Larsson, Emilia; Mannervik, Bengt; Raffalli-Mathieu, Françoise

    2011-05-01

    Alpha-class glutathione transferases (GSTs) found expressed in human tissues constitute a family of four homologous enzymes with contrasting enzyme activities. In particular, GST A3-3 has been shown to contribute to the biosynthesis of steroid hormones in human cells and is selectively expressed in steroidogenic tissues. The more ubiquitous GST A1-1, GST A2-2, and GST A4-4 appear to be primarily involved in detoxification processes and are expressed at higher levels than GST A3-3. We are interested in studying the cell and tissue expression of the GST A3-3 gene, yet the existence of highly expressed sequence-similar homologs and of several splice variants is a serious challenge for the specific detection of unique transcript species. We found that published polymerase chain reaction (PCR) primers for GST A3-3 lack the specificity required for reliable quantitative analysis. Therefore, we designed quantitative PCR (qPCR) primers with greatly increased discrimination power for the human GSTA3 full-length transcript. The improved primers allow accurate discrimination between GST A3-3 and the other alpha-class GSTs and so are of great value to studies of the expression of the GSTA3 gene. The novel primers were used to quantify GSTA3 transcripts in human embryonic liver and steroidogenic cell lines.

  2. Whole genome sequence of two Rathayibacter toxicus strains reveals a tunicamycin biosynthetic cluster similar to Streptomyces chartreusis

    Science.gov (United States)

    Sechler, Aaron J.; Tancos, Matthew A.; Schneider, David J.; King, Jonas G.; Fennessey, Christine M.; Schroeder, Brenda K.; Murray, Timothy D.; Luster, Douglas G.; Schneider, William L.

    2017-01-01

    Rathayibacter toxicus is a forage grass associated Gram-positive bacterium of major concern to food safety and agriculture. This species is listed by USDA-APHIS as a plant pathogen select agent because it produces a tunicamycin-like toxin that is lethal to livestock and may be vectored by nematode species native to the U.S. The complete genomes of two strains of R. toxicus, including the type strain FH-79, were sequenced and analyzed in comparison with all available, complete R. toxicus genomes. Genome sizes ranged from 2,343,780 to 2,394,755 nucleotides, with 2079 to 2137 predicted open reading frames; all four strains showed remarkable synteny over nearly the entire genome, with only a small transposed region. A cluster of genes with similarity to the tunicamycin biosynthetic cluster from Streptomyces chartreusis was identified. The tunicamycin gene cluster (TGC) in R. toxicus contained 14 genes in two transcriptional units, with all of the functional elements for tunicamycin biosynthesis present. The TGC had a significantly lower GC content (52%) than the rest of the genome (61.5%), suggesting that the TGC may have originated from a horizontal transfer event. Further analysis indicated numerous remnants of other potential horizontal transfer events are present in the genome. In addition to the TGC, genes potentially associated with carotenoid and exopolysaccharide production, bacteriocins and secondary metabolites were identified. A CRISPR array is evident. There were relatively few plant-associated cell-wall hydrolyzing enzymes, but there were numerous secreted serine proteases that share sequence homology to the pathogenicity-associated protein Pat-1 of Clavibacter michiganensis. Overall, the genome provides clear insight into the possible mechanisms for toxin production in R. toxicus, providing a basis for future genetic approaches. PMID:28796837
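
The GC-content comparison above (52% for the gene cluster versus 61.5% for the rest of the genome) rests on a simple calculation, sketched here together with a sliding-window variant that can expose regions of atypical composition. The function names `gc_content` and `windowed_gc`, and the choice to ignore ambiguous bases such as N, are assumptions made for this illustration.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step):
    """GC content for each window, returned as (start, gc) pairs."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich half followed by an AT-rich half.
print(gc_content("GGCCAATT"))
print(windowed_gc("GGGGAAAA", window=4, step=4))
```

A window whose GC content departs strongly from the genome-wide value is only a hint of horizontal transfer; it does not establish it on its own.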

  3. "Cytochrome c oxidase I DNA sequence of Camponotus ants with different nesting strategies is a tool for distinguishing between morphologically similar species".

    Science.gov (United States)

    Ramalho, Manuela O F; Santos, Rodrigo M; Fernandes, Tae T; Morini, Maria Santina C; Bueno, Odair C

    2016-08-01

    The great diversity of Camponotus, high levels of geographic, intraspecific and morphological variation common to most species of this genus make the determination of the interspecific limits of Camponotus a complex task. The Cytochrome c oxidase 1 (COI) gene was sequenced in this study to serve as an auxiliary tool in the identification of two taxa of Camponotus thought to be morphologically similar. Additionally, characteristics related to nesting were described. Five to fifteen workers from twenty-one colonies were analyzed, collected from twigs scattered in the leaf litter and from trees located in different regions of Brazil. Phylogenetic reconstructions, haplotype network, and nesting strategies confirmed the existence of two species and that they correspond to Camponotus senex and Camponotus textor. Our results emphasize that the COI can be used as an additional tool for the identification of morphologically similar Camponotus species.

  4. Sequence characteristics of T4-like bacteriophage IME08 genome termini revealed by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    An Xiaoping

    2011-04-01

    Full Text Available Abstract Background: T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner. Methods: Genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed. Results: We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence, and the other end generated randomly. The sequence-preferred cleavage may produce sticky-ends, but with each end being packaged with different efficiencies. Conclusions: This study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.
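
The idea of singling out read sequences with extraordinarily high occurrences can be sketched as a count of identical reads compared against the typical count. This is only an illustration: the function name `high_frequency_reads`, the use of the median as the baseline, and the `fold` cutoff are all assumptions, and the abstract does not say how the authors defined "extraordinarily high".

```python
from collections import Counter
from statistics import median


def high_frequency_reads(reads, fold=10.0):
    """Return (sequence, count) pairs whose count is at least `fold` times
    the median count over distinct reads, most frequent first."""
    counts = Counter(reads)
    if not counts:
        return []
    baseline = median(counts.values())
    hits = [(seq, n) for seq, n in counts.items() if n >= fold * baseline]
    return sorted(hits, key=lambda item: -item[1])


# Toy data: one read start is heavily over-represented.
toy_reads = ["GTTGGA"] * 50 + ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]
print(high_frequency_reads(toy_reads))
```

The median is used rather than the mean so that the over-represented reads do not inflate their own baseline.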

  5. Generating barcoded libraries for multiplex high-throughput sequencing.

    Science.gov (United States)

    Knapp, Michael; Stiller, Mathias; Meyer, Matthias

    2012-01-01

    Molecular barcoding is an essential tool to use the high throughput of next generation sequencing platforms optimally in studies involving more than one sample. Various barcoding strategies allow for the incorporation of short recognition sequences (barcodes) into sequencing libraries, either by ligation or polymerase chain reaction (PCR). Here, we present two approaches optimized for generating barcoded sequencing libraries from low copy number extracts and amplification products typical of ancient DNA studies.
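
Once barcodes have been incorporated into a pooled library, reads are sorted back to their samples by the barcode they carry. A minimal sketch of that sorting step follows, assuming the barcode sits at the start of each read, all barcodes have the same length, and only exact matches are accepted. The function name `demultiplex`, the barcodes, and the sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.
    `barcodes` maps barcode sequence -> sample name; reads with no
    matching barcode are collected under the key None."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins[None].append(read)   # no barcode matched this read
    return dict(bins)


# Hypothetical barcodes and reads.
toy_barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
toy_reads = ["ACGTTTGACC", "TGCAGGGTTT", "ACGTCCCAAA", "NNNNAAAA"]
print(demultiplex(toy_reads, toy_barcodes))
```

Exact matching keeps the sketch short; tolerating sequencing errors in the barcode would require barcodes that differ from each other at several positions.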

  6. Color-Based Image Retrieval from High-Similarity Image Databases

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg; Carstensen, Jens Michael

    2003-01-01

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce...

  7. Color-Based Image Retrieval from High-Similarity Image Databases

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce...

  8. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob;

    2016-01-01

    and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. The wgaDNA performed similarly to matched high-quality reference--whole-blood DNA--based on concordance rates calculated...

  9. Deep sequencing analysis of tick-borne encephalitis virus from questing ticks at natural foci reveals similarities between quasispecies pools of the virus.

    Science.gov (United States)

    Asghar, Naveed; Pettersson, John H-O; Dinnetz, Patrik; Andreassen, Åshild; Johansson, Magnus

    2017-01-10

    Every year, tick-borne encephalitis virus (TBEV) causes severe central nervous system infection in 10,000 to 15,000 people in Europe and Asia. TBEV is maintained in the environment by an enzootic cycle that requires a tick vector and a vertebrate host, and the adaptation of TBEV to vertebrate and invertebrate environments is essential for TBEV persistence in nature. This adaptation is facilitated by the error-prone nature of the virus' RNA-dependent RNA polymerase that generates genetically distinct virus variants called quasispecies. TBEV shows a focal geographical distribution pattern where each focus represents a TBEV hotspot. Here we sequenced and characterized two TBEV genomes, JP-296 and JP-554, from questing Ixodes ricinus ticks at a TBEV focus in central Sweden. Phylogenetic analysis showed geographical clustering among the newly sequenced strains and three previously sequenced Scandinavian strains, Toro-2003, Saringe-2009, and Mandal-2009, which originated from the same ancestor. Among these five Scandinavian TBEV strains, only Mandal-2009 showed a large deletion within the 3´ non-coding region (NCR) similar to the highly virulent TBEV strain Hypr. Deep sequencing of JP-296, JP-554, and Mandal-2009 revealed significantly high quasispecies diversity for JP-296 and JP-554, with intact 3´NCRs, compared to the low diversity in Mandal-2009, with a truncated 3´NCR. SNP analysis showed that 40% of the SNPs were common between quasispecies populations of JP-296 and JP-554, indicating a putative mechanism for how TBEV persists and is maintained within its natural foci.
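
The proportion of SNPs shared between two quasispecies populations can be sketched as a set comparison. The abstract does not state which denominator was used for the 40% figure, so the choice below (shared SNPs divided by all distinct SNPs in either population) is an assumption, as are the function name `shared_fraction` and the toy SNP positions.

```python
def shared_fraction(snps_a, snps_b):
    """Fraction of distinct SNPs found in both populations.
    Each SNP is a (position, alternate_base) tuple."""
    union = snps_a | snps_b
    if not union:
        return 0.0
    return len(snps_a & snps_b) / len(union)


# Hypothetical SNP calls for two populations.
pop_a = {(10, "A"), (20, "T"), (30, "G")}
pop_b = {(20, "T"), (30, "G"), (40, "C"), (50, "A")}
print(shared_fraction(pop_a, pop_b))  # 2 shared out of 5 distinct
```

Dividing by the size of one population instead would give a different value, so the denominator should always be reported alongside the percentage.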

  10. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
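
The core operation described above, matching many queries exactly against one indexed reference, can be illustrated with a small serial sketch. It is not MUMmerGPU's method: it swaps in a naively built suffix array for the suffix tree, runs on the CPU, and reports only full-length exact occurrences of each query. The function names and the toy reference are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order.
    Adequate for short toy references only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, query):
    """Return sorted start positions of exact matches of query in text,
    located by two binary searches over the suffix array."""
    k = len(query)
    # Lower bound: first suffix whose k-length prefix is >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose k-length prefix is > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


reference = "GATTACAGATTACA"            # toy reference sequence
index = build_suffix_array(reference)   # built once, reused for every query
for q in ["ATTA", "ACA", "TTT"]:
    print(q, find_occurrences(reference, index, q))
```

Because the index is built once and each query is then searched independently, the per-query work is the part that lends itself to parallel execution.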

  11. High frequency of hepatitis E virus infection in swine from South Brazil and close similarity to human HEV isolates.

    Science.gov (United States)

    Passos-Castilho, Ana Maria; Granato, Celso Francisco Hernandes

    2017-01-03

    Hepatitis E virus is responsible for acute and chronic liver infections worldwide. Swine hepatitis E virus has been isolated in Brazil, and a probable zoonotic transmission has been described, although data are still scarce. The aim of this study was to investigate the frequency of hepatitis E virus infection in pigs from a small-scale farm in the rural area of Paraná State, South Brazil. Fecal samples were collected from 170 pigs and screened for hepatitis E virus RNA using a duplex real-time RT-PCR targeting a highly conserved 70-nt sequence within overlapping parts of ORF2 and ORF3 as well as a 113-nt sequence of ORF2. Positive samples with high viral loads were subjected to direct sequencing and phylogenetic analysis. Hepatitis E virus RNA was detected in 34 (20.0%) of the 170 pigs following positive results in at least one set of screening real-time RT-PCR primers and probes. The swine hepatitis E virus strains clustered with the genotype hepatitis E virus-3b reference sequences in the phylogenetic analysis and showed close similarity to human hepatitis E virus isolates previously reported in Brazil.

  12. Characterization of a highly repeated DNA sequence family in five species of the genus Eulemur.

    Science.gov (United States)

    Ventura, M; Boniotto, M; Cardone, M F; Fulizio, L; Archidiacono, N; Rocchi, M; Crovella, S

    2001-09-19

    The karyotypes of Eulemur species exhibit a high degree of variation, as a consequence of the Robertsonian fusion and/or centromere fission. Centromeric and pericentromeric heterochromatin of eulemurs is constituted by highly repeated DNA sequences (including some telomeric TTAGGG repeats) which have so far been investigated and used for the study of the systematic relationships of the different species of the genus Eulemur. In our study, we have cloned a set of repetitive pericentromeric sequences of five Eulemur species: E. fulvus fulvus (EFU), E. mongoz (EMO), E. macaco (EMA), E. rubriventer (ERU), and E. coronatus (ECO). We have characterized these clones by sequence comparison and by comparative fluorescence in situ hybridization analysis in EMA and EFU. Our results showed a high degree of sequence similarity among Eulemur species, indicating a strong conservation, within the five species, of these pericentromeric highly repeated DNA sequences.

  13. Effects of High-Order Co-occurrences on Word Semantic Similarities

    CERN Document Server

    Lemaire, Benoît

    2008-01-01

    A computational model of the construction of word meaning through exposure to texts is built in order to simulate the effects of co-occurrence values on word semantic similarities, paragraph by paragraph. Semantic similarity is here viewed as association. It turns out that the similarity between two words W1 and W2 strongly increases with a co-occurrence, decreases with the occurrence of W1 without W2 or W2 without W1, and slightly increases with high-order co-occurrences. Therefore, operationalizing similarity as a frequency of co-occurrence probably introduces a bias: first, there are cases in which there is similarity without co-occurrence and, second, the frequency of co-occurrence overestimates similarity.
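
    The bias described above can be illustrated with a small sketch. It is not the author's model: it only contrasts raw paragraph-level co-occurrence counts with a second-order measure (the cosine between co-occurrence vectors). The function names and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(paragraphs):
    """Count, for each unordered word pair, the paragraphs containing both words."""
    counts = Counter()
    for paragraph in paragraphs:
        words = sorted(set(paragraph.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts


def context_vector(word, counts):
    """Collect the co-occurrence counts of `word` with every other word."""
    vec = Counter()
    for (a, b), n in counts.items():
        if a == word:
            vec[b] += n
        elif b == word:
            vec[a] += n
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "doctor" and "nurse" never share a paragraph.
corpus = ["doctor hospital patient", "nurse hospital patient"]
counts = cooccurrence_counts(corpus)
direct = counts[("doctor", "nurse")]
second_order = cosine(context_vector("doctor", counts),
                      context_vector("nurse", counts))
```

    In this toy corpus the two words have a direct co-occurrence count of zero, yet their shared contexts give them the highest possible second-order similarity, which is the kind of "similarity without co-occurrence" the abstract points to.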

  14. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

    Science.gov (United States)

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-10-11

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

  15. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri Michaeli

    2012-12-01

    Full Text Available High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  16. Transcriptome-Based Identification of Highly Similar Odorant-Binding Proteins among Neotropical Stink Bugs and Their Egg Parasitoid.

    Directory of Open Access Journals (Sweden)

    Luciana R Farias

    Full Text Available Olfaction plays a fundamental role in insect survival through resource location and intra- and interspecific communications. We used RNA-Seq to analyze transcriptomes for odorant-binding proteins (OBPs) from major stink bug pest species in Brazil, Euschistus heros, Chinavia ubica, and Dichelops melacanthus, and from their egg parasitoid, Telenomus podisi. We identified 23 OBPs in E. heros, 25 OBPs in C. ubica, 9 OBPs in D. melacanthus, and 7 OBPs in T. podisi. The deduced amino acid sequences of the full-length OBPs had low intraspecific similarity, but very high similarity between two pairs of OBPs from E. heros and C. ubica (76.4 and 84.0%) and between two pairs of OBPs from the parasitoid and its preferred host E. heros (82.4 and 88.5%), confirmed by a high similarity of their predicted tertiary structures. The similar pairs of OBPs from E. heros and C. ubica may suggest that they have derived from a common ancestor, and retain the same biological function to bind a ligand perceived or produced in both species. The T. podisi OBPs similar to E. heros were not orthologous to any known hymenopteran OBPs, and may have evolved independently and converged to the host OBPs, providing a possible basis for the host location of T. podisi using E. heros semiochemical cues.

  17. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    Abstract High-throughput sequencing (HTS) technologies revolutionized the field of molecular biology by enabling large scale whole genome sequencing as well as a broad range of experiments for studying the cell's inner workings directly on DNA or RNA level. Given the dramatically increased rate...

  18. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology; however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomyce...

  19. The sequence of learning cycle activities in high school chemistry

    Science.gov (United States)

    Abraham, Michael R.; Renner, John W.

    The sequence of the three phases of two high school learning cycles in chemistry was altered in order to: (1) give insights into the factors which account for the success of the learning cycle, (2) serve as an indirect test of the association between Piaget's theory and the learning cycle, and (3) compare the learning cycle with traditional instruction. Each of the six sequences (one normal and five altered) was studied with content and attitude measures. The outcomes of the study supported the contention that the normal learning cycle sequence is the optimum sequence for achievement of content knowledge.

  20. Uncovering highly obfuscated plagiarism cases using fuzzy semantic-based similarity model

    OpenAIRE

    Salha M. Alzahrani; Naomie Salim; Vasile Palade

    2015-01-01

    Highly obfuscated plagiarism cases contain unseen and obfuscated texts, which pose difficulties when using existing plagiarism detection methods. A fuzzy semantic-based similarity model for uncovering obfuscated plagiarism is presented and compared with five state-of-the-art baselines. Semantic relatedness between words is studied based on the part-of-speech (POS) tags and WordNet-based similarity measures. Fuzzy-based rules are introduced to assess the semantic distance between source and su...

  1. Subfamily logos: visualization of sequence deviations at alignment positions with high information content

    Directory of Open Access Journals (Sweden)

    Beitz Eric

    2006-06-01

    Full Text Available Abstract Background Recognition of relevant sequence deviations can be valuable for elucidating functional differences between protein subfamilies. Interesting residues at highly conserved positions can then be mutated and experimentally analyzed. However, identification of such sites is tedious because automated approaches are scarce. Results Subfamily logos visualize subfamily-specific sequence deviations. The display is similar to classical sequence logos but extends into the negative range. Positive, upright characters correspond to residues which are characteristic for the subfamily, negative, upside-down characters to residues typical for the remaining sequences. The symbol height is adjusted to the information content of the alignment position. Residues which are conserved throughout do not appear. Conclusion Subfamily logos provide an intuitive display of relevant sequence deviations. The method has proven to be valid using a set of 135 aligned aquaporin sequences in which established subfamily-specific positions were readily identified by the algorithm.
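
    The two quantities this abstract relies on, the information content of an alignment position and the signed deviation of a subfamily from the remaining sequences, can be sketched as follows. This is a simplified illustration and not the published algorithm: the abstract does not give the exact scaling, so the frequency difference used here, the function names and the toy alignment are all assumptions.

```python
import math
from collections import Counter


def column_frequencies(seqs, pos):
    """Relative residue frequencies at one alignment position."""
    counts = Counter(seq[pos] for seq in seqs)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def information_content(freqs, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(alphabet_size) - entropy


def subfamily_deviation(subfamily, others, pos):
    """Signed frequency difference per residue (assumed measure).

    Positive values mark residues characteristic of the subfamily,
    negative values mark residues typical of the remaining sequences.
    """
    f_sub = column_frequencies(subfamily, pos)
    f_rest = column_frequencies(others, pos)
    residues = set(f_sub) | set(f_rest)
    return {r: f_sub.get(r, 0.0) - f_rest.get(r, 0.0) for r in residues}


# Hypothetical four-column alignment; only the last column differs.
subfamily = ["ANPA", "ANPA"]
others = ["ANPS", "ANPS"]
last_column = subfamily_deviation(subfamily, others, 3)
first_column = subfamily_deviation(subfamily, others, 0)
```

    In the toy alignment the last column yields a positive value for A and a negative value for S, while the fully conserved first column yields only zeros, matching the statement that residues conserved throughout do not appear.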

  2. Highly conserved non-coding sequences are associated with vertebrate development.

    Directory of Open Access Journals (Sweden)

    Adam Woolfe

    2005-01-01

    Full Text Available In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development.
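
    A statement such as "over 90% identical across more than 500 bases" can be checked on a pairwise alignment with a sliding window. The sketch below is only an illustration of that criterion, not the pipeline used in the study; the function names are hypothetical, the default window and threshold are taken from the sentence in the abstract, and gap columns are simply skipped.

```python
def percent_identity(a, b):
    """Percent identity over aligned, non-gap columns of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def conserved_windows(a, b, window=500, threshold=90.0):
    """Start indices of alignment windows whose identity reaches the threshold."""
    return [
        start
        for start in range(len(a) - window + 1)
        if percent_identity(a[start:start + window], b[start:start + window]) >= threshold
    ]


# Hypothetical 10-column alignment with a single mismatch at index 4.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAC"
overall = percent_identity(seq_a, seq_b)
hits = conserved_windows(seq_a, seq_b, window=5, threshold=90.0)
```

    With a window of 5, only the window starting at index 5 avoids the mismatch, so it is the only one reported. A production scan would update the match count incrementally instead of rescoring every window.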

  3. Case modifying of high-speed cutting database based on CSP and similarity theory

    Institute of Scientific and Technical Information of China (English)

    Kejun XIANG; Zhanqiang LIU; Xing AI

    2009-01-01

    By analyzing the reasoning of a high-speed cutting database system, a case modifying method is put forward. According to the variables' difference of the solution part in a case, a constraint satisfaction problem (CSP) and similarity calculation are used to modify a case. The constraint relationship of discrete variables is described by establishing a rule knowledge base. The algorithm of CSP is used to solve the discrete variable constraint problem. On the basis of the high-speed cutting theory, a similarity calculation formula is deduced to calculate the consecutive variables. The CSP and similarity calculation are applied to case modifying, which makes it possible to automatically modify cases in the high-speed cutting database system.

  4. Color-Based Image Retrieval from High-Similarity Image Databases

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce a method for HSID retrieval using a similarity measure based on a linear combination of Jeffreys-Matusita (JM) distances between distributions of color (and color derivatives) estimated from a set of automatically extracted image regions. The weight coefficients are estimated based on optimal retrieval performance. Experimental results on the difficult task of visually identifying clones of fungal colonies grown in a petri dish and categorization of pelts show a high retrieval accuracy of the method when combined with standardized sample preparation and image acquisition.

  5. Color-Based Image Retrieval from High-Similarity Image Databases

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg; Carstensen, Jens Michael

    2003-01-01

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce a method for HSID retrieval using a similarity measure based on a linear combination of Jeffreys-Matusita (JM) distances between distributions of color (and color derivatives) estimated from a set of automatically extracted image regions. The weight coefficients are estimated based on optimal retrieval performance. Experimental results on the difficult task of visually identifying clones of fungal colonies grown in a petri dish and categorization of pelts show a high retrieval accuracy of the method when combined with standardized sample preparation and image acquisition.
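
    The similarity measure named in this record, a weighted sum of Jeffreys-Matusita distances between color distributions, can be sketched for discrete histograms. The sketch assumes one common definition of the JM distance, the Euclidean distance between the square roots of two probability vectors; the abstract does not state which variant, which features or which weights the authors used, so the function names, feature layout and weights below are hypothetical.

```python
import math


def normalize(hist):
    """Turn non-negative bin counts into a probability vector."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def jm_distance(p, q):
    """Jeffreys-Matusita distance between two discrete distributions (assumed form)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def combined_distance(features_a, features_b, weights):
    """Weighted sum of per-feature JM distances between two images."""
    return sum(
        w * jm_distance(normalize(a), normalize(b))
        for w, a, b in zip(weights, features_a, features_b)
    )


# Hypothetical images described by two histograms each (e.g. color, color derivative).
image_a = [[4, 4], [10, 0]]
image_b = [[2, 2], [0, 5]]
score = combined_distance(image_a, image_b, weights=[0.7, 0.3])
```

    Here the first pair of histograms is identical after normalization and contributes nothing, while the second pair is disjoint and contributes the maximum distance of sqrt(2), scaled by its weight. In the record above the weights are tuned for retrieval performance; the values 0.7 and 0.3 are placeholders.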

  6. Similarity and cascade flow characteristics of a highly loaded helium compressor

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Bin, E-mail: jiangbin_hrbeu@163.com [College of Power & Energy Engineering of Harbin Engineering University, Harbin 150001 (China); Chen, Zhongliang [College of Power & Energy Engineering of Harbin Engineering University, Harbin 150001 (China); Chen, Hang [AVIG Shenyang Engine Design and Research Institute, Shenyang 110015 (China); Zhang, Hai; Zheng, Qun [College of Power & Energy Engineering of Harbin Engineering University, Harbin 150001 (China)

    2015-05-15

    Highlights: • The deviation of different similarity criteria is analyzed theoretically. • Flow difference between helium and air compressor cascades is analyzed numerically. • The analysis of calculated results validates the theoretical derivation. • Flow characteristics of highly loaded helium compressor blade profile are computed. - Abstract: The helium compressor is a major component of the Power Conversion Unit (PCU) used in a High Temperature Gas Cooled Reactor (HTGR). Because of the high cost of closed-cycle tests and the leakage problem of helium gas, air could be used as the working fluid instead of helium in compressor performance tests. However, the properties of helium differ considerably from those of air, e.g. the adiabatic exponent of helium is 1.6, while the adiabatic exponent itself is a criterion of similarity between the two compressors. The characteristics of the compressor will differ because of the effect of the adiabatic exponent of the working fluid, especially for a highly loaded compressor working at a higher inlet Mach number. In this paper, a theoretical study on the similarity between an air compressor and a highly loaded helium compressor is carried out and the deviation of similarity is analyzed. Numerical simulations are then used to confirm the theoretical analysis. The results indicate that the similarity deviation cannot be neglected for a highly loaded compressor cascade, which means that the experience and experimental results from conventional air compressors cannot be applied directly to the design of a highly loaded helium compressor. The flow characteristics of a highly loaded helium compressor at different Reynolds numbers, attack angles, Mach numbers and cascade geometries are then investigated.

  7. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Full Text Available Abstract Background Highly repetitive nucleotide sequences are commonly found in nature e.g. in telomeres, microsatellite DNA, polyadenine (poly(A)) tails of eukaryotic messenger RNA as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites resulting in unspecific products. Results For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  8. High efficiency spreading spectrum modulation using double orthogonal complex sequences

    Institute of Scientific and Technical Information of China (English)

    Shi Xiaohong

    2012-01-01

    This paper presents a novel scheme of high efficiency spreading spectrum modulation using double orthogonal complex sequences (DoCS). In this scheme, the input data bit-stream is split into many groups with length M. Each group is then mapped into a word of width M and then utilized to select one sequence from 2u-2 DoCS sequences each with length L. After that, the selected sequence is modulated on the carrier in quadrature phase shift keying (QPSK) mode. In addition, a new method named forward phase correction (FPC) is put forward for carrier recovery. Theoretical analysis and bit-error-ratio (BER) experiment results indicate that the proposed scheme has better performance than the conventional direct sequence spread spectrum (DSSS) scheme both in bandwidth efficiency and processing gain of the receiver.

  9. On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation.

    Science.gov (United States)

    Wong, Wing-Cheong; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank

    2014-06-02

    Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison.

  10. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

    Science.gov (United States)

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-09-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods.

  11. Efficient statistical significance approximation for local similarity analysis of high-throughput time series data.

    Science.gov (United States)

    Xia, Li C; Ai, Dongmei; Cram, Jacob; Fuhrman, Jed A; Sun, Fengzhu

    2013-01-15

    Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples. The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity analysis of time series data. The tool is freely available from eLSA's website: http://meta.usc.edu/softs/lsa. Supplementary data are available at Bioinformatics online. fsun@usc.edu.
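
    The statistic whose significance is being approximated here, a maximum partial sum of products over aligned time points with a bounded delay, can be sketched with a small dynamic program. This follows the local similarity score as it is commonly described for local similarity analysis, but it is an assumption-laden illustration and not the eLSA code: the inputs are taken to be already normalized, only the magnitude of the score is returned (the sign of the association is dropped), and the function name is hypothetical.

```python
def local_similarity_score(x, y, max_delay=0):
    """Largest absolute partial sum of x[i] * y[j] along diagonals with |i - j| <= max_delay.

    Returns the maximum divided by the series length.
    """
    n = len(x)
    if n == 0 or len(y) != n:
        raise ValueError("series must be non-empty and of equal length")
    # pos/neg hold running sums for positive and negative association;
    # a running sum is reset to zero whenever it would drop below zero.
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue
            product = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + product)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - product)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


# Hypothetical series: y shows the same pulse as x one time point earlier.
x = [0, 1, 1, 0]
y = [1, 1, 0, 0]
no_delay = local_similarity_score(x, y, max_delay=0)
with_delay = local_similarity_score(x, y, max_delay=1)
```

    Allowing a delay of one time point doubles the score for this pair, because the shifted pulses then line up. A permutation test would recompute this score on many shuffled copies of one series, which is the cost the theoretical approximation in the abstract is meant to avoid.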

  12. DNA sequence and structure properties analysis reveals similarities and differences to promoters of stress responsive genes in Arabidopsis thaliana.

    Science.gov (United States)

    Zhu, Pan; Zhou, Yanhong; Zhang, Libin; Ma, Chuang

    2015-01-01

    Understanding regulatory mechanisms of stress response in plants has important biological and agricultural significance. In this study, we first compiled a set of genes responsive to different stresses in Arabidopsis thaliana and then comparatively analysed their promoters at both the DNA sequence and three-dimensional structure levels. Amazingly, the comparison revealed that the profiles of several sequence and structure properties vary distinctly in different regions of promoters. Moreover, the content of nucleotide T and the profile of B-DNA twist are distinct in promoters from different stress groups, suggesting Arabidopsis genes might exploit different regulatory mechanisms in response to various stresses. Finally, we evaluated the performance of two representative promoter predictors, EP3 and PromPred. The evaluation results revealed their strengths and weaknesses for identifying stress-related promoters, providing valuable guidelines to accelerate the discovery of novel stress-related promoters and genes in plants.

  13. Library preparation for highly accurate population sequencing of RNA viruses

    Science.gov (United States)

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
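
    The consensus step described above, in which redundant copies within one read are compared to recover the original template, can be sketched as a per-position majority vote. This is a deliberately simplified illustration and not the CirSeq software: it assumes the repeat length is already known and the copies are in register, and it ignores base quality scores. The function name and the toy read are hypothetical.

```python
from collections import Counter


def repeat_consensus(read, repeat_length):
    """Majority-vote consensus over the complete tandem copies in a read."""
    if repeat_length <= 0:
        raise ValueError("repeat_length must be positive")
    copies = [
        read[start:start + repeat_length]
        for start in range(0, len(read) - repeat_length + 1, repeat_length)
    ]
    if not copies:
        raise ValueError("read is shorter than one repeat")
    consensus = []
    for column in zip(*copies):
        # Keep the most frequent base among the copies at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical read with three copies of a 6-base template;
# the third copy carries one sequencing error (G read as T).
read = "ACGTAC" + "ACGTAC" + "ACTTAC"
template = repeat_consensus(read, 6)
```

    Because the error appears in only one of the three copies, the vote restores the original base. The number of copies in the toy read was chosen only for illustration.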

  14. High sequence conservation among cucumber mosaic virus isolates from lily.

    Science.gov (United States)

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV isolates of Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high homology despite different origins.

  15. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  16. Déjà vu: a database of highly similar citations in the scientific literature

    Science.gov (United States)

    Errami, Mounir; Sun, Zhaohui; Long, Tara C.; George, Angela C.; Garner, Harold R.

    2009-01-01

    In the scientific research community, plagiarism and covert multiple publications of the same data are considered unacceptable because they undermine the public confidence in the scientific integrity. Yet, little has been done to help authors and editors to identify highly similar citations, which sometimes may represent cases of unethical duplication. For this reason, we have made available Déjà vu, a publicly available database of highly similar Medline citations identified by the text similarity search engine eTBLAST. Following manual verification, highly similar citation pairs are classified into various categories ranging from duplicates with different authors to sanctioned duplicates. Déjà vu records also contain user-provided commentary and supporting information to substantiate each document's categorization. Déjà vu and eTBLAST are available to authors, editors, reviewers, ethicists and sociologists to study, intercept, annotate and deter questionable publication practices. These tools are part of a sustained effort to enhance the quality of Medline as ‘the’ biomedical corpus. The Déjà vu database is freely accessible at http://spore.swmed.edu/dejavu. The tool eTBLAST is also freely available at http://etblast.org. PMID:18757888

  18. Characterization of a highly toxic strain of Bacillus thuringiensis serovar kurstaki very similar to the HD-73 strain.

    Science.gov (United States)

    Reinoso-Pozo, Yaritza; Del Rincón-Castro, Ma Cristina; Ibarra, Jorge E

    2016-09-01

    The LBIT-1200 strain of Bacillus thuringiensis was recently isolated from soil, and showed a 6.4- and 9.5-fold increase in toxicity against Manduca sexta and Trichoplusia ni, respectively, compared to HD-73. However, LBIT-1200 was still highly similar to HD-73, including the production of bipyramidal crystals containing only one protein of ∼130 kDa, its flagellin gene sequence related to the kurstaki serotype, plasmid and RepPCR patterns similar to HD-73, no production of β-exotoxin and no presence of VIP genes. Sequencing of its cry gene showed the presence of a cry1Ac-type gene with four amino acid differences, including two amino acid replacements in domain III, compared to Cry1Ac1, which may explain its higher toxicity. In conclusion, the LBIT-1200 strain is a variant of the HD-73 strain but shows a much higher toxicity, which makes this new strain an important candidate to be developed as a bioinsecticide, once it passes other tests throughout its biotechnological development.

  19. Draft Genome Sequence of Environmental Vibrio cholerae 2012EL-1759 with Similarities to the V. cholerae O1 Classical Biotype

    OpenAIRE

    Katz, Lee S.; Turnsek, Maryann; Kahler, Amy; Hill, Vincent R.; Boyd, E. Fidelma; Tarr, Cheryl L.

    2014-01-01

    Vibrio cholerae 2012EL-1759 is an environmental isolate from Haiti that was recovered in 2012 during a cholera outbreak. The genomic backbone is similar to that of the prototypical V. cholerae O1 classical biotype strain O395, and it carries the Vibrio pathogenicity islands (VPI-1 and VPI-2) and a cholera toxin (CTX) prephage.

  20. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  1. Genome characteristics of a novel phage from Bacillus thuringiensis showing high similarity with phage from Bacillus cereus.

    Directory of Open Access Journals (Sweden)

    Yihui Yuan

    Bacillus thuringiensis is an important entomopathogenic bacterium that belongs to the Bacillus cereus group, which also includes B. anthracis and B. cereus. Several genomes of phages originating from this group have been sequenced, but no genome of a Siphoviridae phage from B. thuringiensis has been reported. We recently sequenced and analyzed the genome of a novel phage, BtCS33, from a B. thuringiensis strain, subsp. kurstaki CS33, and compared the genome of this phage to other phages of the B. cereus group. BtCS33 was the first Siphoviridae phage among the sequenced B. thuringiensis phages. It produced small, turbid plaques on bacterial plates and had a narrow host range. BtCS33 possessed a linear, double-stranded DNA genome of 41,992 bp with 57 putative open reading frames (ORFs). It had a typical genome structure consisting of three modules: the "late" region, the "lysogeny-lysis" region and the "early" region. BtCS33 exhibited high similarity with several phages, B. cereus phage Wβ and some variants of Wβ, in genome organization and the amino acid sequences of structural proteins. There were two ORFs, ORF22 and ORF35, in the genome of BtCS33 that were also found in the genomes of B. cereus phage Wβ and may be involved in regulating sporulation of the host cell. Based on these observations and analysis of phylogenetic trees, we deduced that B. thuringiensis phage BtCS33 and B. cereus phage Wβ may have a common distant ancestor.

  2. High-throughput sequencing in veterinary infection biology and diagnostics.

    Science.gov (United States)

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  3. Tuftsin binds neuropilin-1 through a sequence similar to that encoded by exon 8 of vascular endothelial growth factor.

    Science.gov (United States)

    von Wronski, Mathew A; Raju, Natarajan; Pillai, Radhakrishna; Bogdan, Nancy J; Marinelli, Edmund R; Nanjappan, Palaniappa; Ramalingam, Kondareddiar; Arunachalam, Thangavel; Eaton, Steve; Linder, Karen E; Yan, Feng; Pochon, Sibylle; Tweedle, Michael F; Nunn, Adrian D

    2006-03-03

    Tuftsin, Thr-Lys-Pro-Arg (TKPR), is an immunostimulatory peptide with reported nervous system effects as well. We unexpectedly found that tuftsin and a higher affinity antagonist, TKPPR, bind selectively to neuropilin-1 and block vascular endothelial growth factor (VEGF) binding to that receptor. Dimeric and tetrameric forms of TKPPR had greatly increased affinity for neuropilin-1 based on competition binding experiments. On endothelial cells tetrameric TKPPR inhibited the VEGF(165)-induced autophosphorylation of vascular endothelial growth factor receptor-2 (VEGFR-2) even though it did not directly inhibit VEGF binding to VEGFR-2. Homology between exon 8 of VEGF and TKPPR suggests that the sequence coded for by exon 8 may stabilize VEGF binding to neuropilin-1 to facilitate signaling through VEGFR-2. Given the overlap between processes involving neuropilin-1 and tuftsin, we propose that at least some of the previously reported effects of tuftsin are mediated through neuropilin-1.

  4. Thimet oligopeptidase: similarity to 'soluble angiotensin II-binding protein' and some corrections to the published amino acid sequence of the rat testis enzyme.

    Science.gov (United States)

    McKie, N; Dando, P M; Rawlings, N D; Barrett, A J

    1993-01-01

    The deduced amino acid sequence of pig liver soluble angiotensin II-binding protein [Sugiura, Hagiwara and Hirose (1992) J. Biol. Chem. 267, 18067-18072] is similar over most of its length to that reported for rat testis thimet oligopeptidase (EC 3.4.24.15) by Pierotti, Dong, Glucksman, Orlowski and Roberts [(1990) Biochemistry 29, 10323-10329]. We have found that homogeneous rat testis thimet oligopeptidase binds angiotensin II with the same distinctive characteristics as the pig liver protein. Analysis of the nucleotide sequences reported for the two proteins pointed to the likelihood that sequencing errors had caused two segments of the amino acid sequence of the rat protein to be translated out of frame, and re-sequencing of selected parts of the clone (kindly provided by the previous authors) confirmed this. The revised deduced amino acid sequence of rat thimet oligopeptidase contains 687 residues, representing a protein of 78,308 Da, and is more closely related to those of the pig liver protein and other known homologues of thimet oligopeptidase than that described previously. PMID:8216239

  5. Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification.

    Science.gov (United States)

    Arif, Muhammad

    2012-06-01

    In pattern classification problems, feature extraction is an important step. The quality of the features in discriminating different classes plays an important role in pattern classification problems. In real life, pattern classification may require a high dimensional feature space, and it is impossible to visualize the feature space if its dimension is greater than four. In this paper, we propose a Similarity-Dissimilarity plot which can project a high dimensional space to a two dimensional space while retaining important characteristics required to assess the discrimination quality of the features. The similarity-dissimilarity plot can reveal information about the amount of overlap of features of different classes. Separable data points of different classes will also be visible on the plot and can be classified correctly using an appropriate classifier. Hence, approximate classification accuracy can be predicted. Moreover, it is possible to see with which class the misclassified data points will be confused by the classifier. Outlier data points can also be located on the similarity-dissimilarity plot. Various examples of synthetic data are used to highlight important characteristics of the proposed plot. Some real life examples from biomedical data are also used for the analysis. The proposed plot is independent of the number of dimensions of the feature space.

  6. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS) have made it a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs) found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, and asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  7. High-resolution mapping of protein sequence-function relationships.

    Science.gov (United States)

    Fowler, Douglas M; Araya, Carlos L; Fleishman, Sarel J; Kellogg, Elizabeth H; Stephany, Jason J; Baker, David; Fields, Stanley

    2010-09-01

    We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.

  8. Exome sequencing identifies ZNF644 mutations in high myopia.

    Directory of Open Access Journals (Sweden)

    Yi Shi

    2011-06-01

    Myopia is the most common ocular disorder worldwide, and high myopia in particular is one of the leading causes of blindness. Genetic factors play a critical role in the development of myopia, especially high myopia. Recently, the exome sequencing approach has been successfully used for the disease gene identification of Mendelian disorders. Here we show a successful application of exome sequencing to identify a gene for an autosomal dominant disorder, and we have identified a gene potentially responsible for high myopia in a monogenic form. We captured exomes of two affected individuals from a Han Chinese family with high myopia and performed sequencing analysis by a second-generation sequencer with a mean coverage of 30× and sufficient depth to call variants at ∼97% of each targeted exome. The shared genetic variants of these two affected individuals in the family being studied were filtered against the 1000 Genomes Project and the dbSNP131 database. A mutation A672G in zinc finger protein 644 isoform 1 (ZNF644) was identified as being related to the phenotype of this family. After we performed sequencing analysis of the exons in the ZNF644 gene in 300 sporadic cases of high myopia, we identified an additional five mutations (I587V, R680G, C699Y, 3'UTR+12 C>G, and 3'UTR+592 G>A) in 11 different patients. All these mutations were absent in 600 normal controls. The ZNF644 gene was expressed in human retinal and retinal pigment epithelium (RPE). Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, mutation may cause the axial elongation of eyeball found in high myopia patients. Our results suggest that ZNF644 might be a causal gene for high myopia in a monogenic form.

  9. The SLO1 PPR protein is required for RNA editing at multiple sites with similar upstream sequences in Arabidopsis mitochondria.

    Science.gov (United States)

    Sung, Tzu-Ying; Tseng, Ching-Chih; Hsieh, Ming-Hsiun

    2010-08-01

    In Arabidopsis, RNA editing changes more than 500 cytidines to uridines in mitochondrial transcripts. The editing enzyme and co-factors involved in these processes are largely unknown. We have identified a nuclear gene SLOW GROWTH1 (SLO1) encoding an E motif-containing pentatricopeptide repeat protein that is required for RNA editing of nad4 and nad9 in Arabidopsis mitochondria. The SLO1 protein is localized to the mitochondrion, and its absence gives rise to small plants with slow growth and delayed development. A survey of approximately 500 mitochondrial RNA editing sites in Arabidopsis reveals that the editing of two sites, nad4-449 and nad9-328, is abolished in the slo1 mutants. Comparison of the sequences upstream (from -1 to -15 bp) of the nad4-449 and nad9-328 editing sites shows that nine of the 15 nucleotides are identical. In addition to RNA editing, we used RNA gel blot analysis to compare the abundance and banding patterns of mitochondrial transcripts between the wild type and slo1 mutants. Of the 79 genes and open reading frames examined, steady-state levels of 56 mitochondrial transcripts are increased in the slo1 mutants. These results suggest that the SLO1 protein may indirectly regulate plant growth and development by affecting mitochondrial RNA editing and gene expression.
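
    The upstream-window comparison mentioned above amounts to counting identical positions between two equal-length sequences. The sketch below shows that count; the two 15-nt sequences are invented placeholders chosen to share nine positions, not the actual nad4 and nad9 upstream sequences, and `count_identical` is a hypothetical helper name.

```python
def count_identical(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical 15-nt upstream windows (-15 to -1); NOT the real nad4/nad9 data.
upstream_a = "ACGUACGUACGUACG"
upstream_b = "CCAUCCAUCCAUACG"

identical = count_identical(upstream_a, upstream_b)
print(f"{identical} of {len(upstream_a)} nucleotides are identical")
```

    A position-by-position count like this assumes the two windows are already anchored at the editing site, so no alignment step is needed.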

  10. Range-Wide Sex-Chromosome Sequence Similarity Supports Occasional XY Recombination in European Tree Frogs (Hyla arborea)

    Science.gov (United States)

    Brelsford, Alan; Perrin, Nicolas

    2014-01-01

    In contrast with mammals and birds, most poikilothermic vertebrates feature structurally undifferentiated sex chromosomes, which may result either from frequent turnovers, or from occasional events of XY recombination. The latter mechanism was recently suggested to be responsible for sex-chromosome homomorphy in European tree frogs (Hyla arborea). However, no single case of male recombination has been identified in large-scale laboratory crosses, and populations from NW Europe consistently display sex-specific allelic frequencies with male-diagnostic alleles, suggesting the absence of recombination in their recent history. To address this apparent paradox, we extended the phylogeographic scope of investigations, by analyzing the sequences of three sex-linked markers throughout the whole species distribution. Refugial populations (southern Balkans and Adriatic coast) show a mix of X and Y alleles in haplotypic networks, and no more within-individual pairwise nucleotide differences in males than in females, testifying to recurrent XY recombination. In contrast, populations of NW Europe, which originated from a recent postglacial expansion, show a clear pattern of XY differentiation; the X and Y gametologs of the sex-linked gene Med15 present different alleles, likely fixed by drift on the front wave of expansions, and kept differentiated since. Our results support the view that sex-chromosome homomorphy in H. arborea is maintained by occasional or historical events of recombination; whether the frequency of these events indeed differs between populations remains to be clarified. PMID:24892652

  11. Range-wide sex-chromosome sequence similarity supports occasional XY recombination in European tree frogs (Hyla arborea).

    Directory of Open Access Journals (Sweden)

    Christophe Dufresnes

    In contrast with mammals and birds, most poikilothermic vertebrates feature structurally undifferentiated sex chromosomes, which may result either from frequent turnovers, or from occasional events of XY recombination. The latter mechanism was recently suggested to be responsible for sex-chromosome homomorphy in European tree frogs (Hyla arborea). However, no single case of male recombination has been identified in large-scale laboratory crosses, and populations from NW Europe consistently display sex-specific allelic frequencies with male-diagnostic alleles, suggesting the absence of recombination in their recent history. To address this apparent paradox, we extended the phylogeographic scope of investigations, by analyzing the sequences of three sex-linked markers throughout the whole species distribution. Refugial populations (southern Balkans and Adriatic coast) show a mix of X and Y alleles in haplotypic networks, and no more within-individual pairwise nucleotide differences in males than in females, testifying to recurrent XY recombination. In contrast, populations of NW Europe, which originated from a recent postglacial expansion, show a clear pattern of XY differentiation; the X and Y gametologs of the sex-linked gene Med15 present different alleles, likely fixed by drift on the front wave of expansions, and kept differentiated since. Our results support the view that sex-chromosome homomorphy in H. arborea is maintained by occasional or historical events of recombination; whether the frequency of these events indeed differs between populations remains to be clarified.

  12. Two Highly Similar Poplar Paleo-subgenomes Suggest an Autotetraploid Ancestor of Salicaceae Plants.

    Science.gov (United States)

    Liu, Yinzhe; Wang, Jinpeng; Ge, Weina; Wang, Zhenyi; Li, Yuxian; Yang, Nanshan; Sun, Sangrong; Zhang, Liwei; Wang, Xiyin

    2017-01-01

    As a model plant to study perennial trees in the Salicaceae family, the poplar (Populus trichocarpa) genome was sequenced, revealing recurrent paleo-polyploidizations during its evolution. A comparative and hierarchical alignment of its genome to a well-selected reference genome would help us better understand poplar's genome structure and gene family evolution. Here, by adopting the relatively simpler grape (Vitis vinifera) genome as reference, and by inferring both intra- and inter-genomic gene collinearity, we produced a united alignment of these two genomes and hierarchically distinguished the layers of paralogous and orthologous genes, as related to recursive polyploidizations and speciation. We uncovered homologous blocks in the grape and poplar genomes and also between them. Moreover, we characterized the genes missing and found that poplar had two considerably similar subgenomes (≤0.05 difference in gene deletion) produced by the Salicaceae-common tetraploidization, suggesting its autotetraploid nature. Taken together, this work provides a timely and valuable dataset of orthologous and paralogous genes for further study of the genome structure and functional evolution of poplar and other Salicaceae plants.

  13. Binary interactions with high accretion rates onto main sequence stars

    Science.gov (United States)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10⁻² M⊙ yr⁻¹ for solar type stars, and up to ≈ 1 M⊙ yr⁻¹ for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  14. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences.

    Directory of Open Access Journals (Sweden)

    Sergey V Prykhozhij

    Genome engineering has been revolutionized by the discovery of clustered regularly interspaced palindromic repeats (CRISPR) and CRISPR-associated system genes (Cas) in bacteria. The type IIB Streptococcus pyogenes CRISPR/Cas9 system functions in many species and additional types of CRISPR/Cas systems are under development. In the type II system, expression of CRISPR single guide RNA (sgRNA) targeting a defined sequence and Cas9 generates a sequence-specific nuclease inducing small deletions or insertions. Moreover, knock-in of large DNA inserts has been shown at the sites targeted by sgRNAs and Cas9. Several tools are available for designing sgRNAs that target unique locations in the genome. However, the ability to find sgRNA targets common to several similar sequences or, by contrast, unique to each of these sequences, would also be advantageous. To provide such a tool for several types of CRISPR/Cas system and many species, we developed the CRISPR MultiTargeter software. Similar DNA sequences in question are duplicated genes and sets of exons of different transcripts of a gene. Thus, we implemented a basic sgRNA target search of input sequences for single-sgRNA and two-sgRNA/Cas9 nickase targeting, as well as common and unique sgRNA target searches in 1) a set of input sequences; 2) a set of similar genes or transcripts; or 3) transcripts of a single gene. We demonstrate potential uses of the program by identifying unique isoform-specific sgRNA sites in 71% of zebrafish alternative transcripts and common sgRNA target sites in approximately 40% of zebrafish duplicated gene pairs. The design of unique targets in alternative exons is helpful because it will facilitate functional genomic studies of transcript isoforms. Similarly, its application to duplicated genes may simplify multi-gene mutational targeting experiments. Overall, this program provides a unique interface that will enhance use of CRISPR/Cas technology.

  15. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences.

    Science.gov (United States)

    Prykhozhij, Sergey V; Rajan, Vinothkumar; Gaston, Daniel; Berman, Jason N

    2015-01-01

    Genome engineering has been revolutionized by the discovery of clustered regularly interspaced palindromic repeats (CRISPR) and CRISPR-associated system genes (Cas) in bacteria. The type IIB Streptococcus pyogenes CRISPR/Cas9 system functions in many species and additional types of CRISPR/Cas systems are under development. In the type II system, expression of CRISPR single guide RNA (sgRNA) targeting a defined sequence and Cas9 generates a sequence-specific nuclease inducing small deletions or insertions. Moreover, knock-in of large DNA inserts has been shown at the sites targeted by sgRNAs and Cas9. Several tools are available for designing sgRNAs that target unique locations in the genome. However, the ability to find sgRNA targets common to several similar sequences or, by contrast, unique to each of these sequences, would also be advantageous. To provide such a tool for several types of CRISPR/Cas system and many species, we developed the CRISPR MultiTargeter software. Similar DNA sequences in question are duplicated genes and sets of exons of different transcripts of a gene. Thus, we implemented a basic sgRNA target search of input sequences for single-sgRNA and two-sgRNA/Cas9 nickase targeting, as well as common and unique sgRNA target searches in 1) a set of input sequences; 2) a set of similar genes or transcripts; or 3) transcripts of a single gene. We demonstrate potential uses of the program by identifying unique isoform-specific sgRNA sites in 71% of zebrafish alternative transcripts and common sgRNA target sites in approximately 40% of zebrafish duplicated gene pairs. The design of unique targets in alternative exons is helpful because it will facilitate functional genomic studies of transcript isoforms. Similarly, its application to duplicated genes may simplify multi-gene mutational targeting experiments. Overall, this program provides a unique interface that will enhance use of CRISPR/Cas technology.
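
    The idea of common versus unique sgRNA targets can be sketched in a few lines. This is an illustrative simplification, not the CRISPR MultiTargeter implementation: it assumes a target is any 20-nt stretch followed by an NGG PAM, searches the given strand only, and uses hypothetical function names (`find_targets`, `common_and_unique`) and made-up input sequences.

```python
import re

# Assumption: a target is modelled as 20 nt followed by an NGG PAM on the
# forward strand. The lookahead lets overlapping candidate sites be reported.
TARGET_PATTERN = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def find_targets(seq: str) -> set[str]:
    """Return the set of 20-nt protospacers followed by NGG in one sequence."""
    return {m.group(1) for m in TARGET_PATTERN.finditer(seq.upper())}


def common_and_unique(seqs: dict[str, str]) -> tuple[set[str], dict[str, set[str]]]:
    """Split targets into those shared by all sequences and those private to one."""
    per_seq = {name: find_targets(s) for name, s in seqs.items()}
    common = set.intersection(*per_seq.values())
    unique = {}
    for name, targets in per_seq.items():
        # Targets seen in any other sequence are not unique to this one.
        others = set().union(*(t for n, t in per_seq.items() if n != name))
        unique[name] = targets - others
    return common, unique


# Made-up example: both "transcripts" share one site and each has a private one.
shared_site = "ACGT" * 5 + "TGG"
seqs = {
    "isoform_a": shared_site + "CAT" * 6 + "CA" + "TGG",
    "isoform_b": shared_site + "TTC" * 6 + "TT" + "AGG",
}
common, unique = common_and_unique(seqs)
print("common:", common)
print("unique:", unique)
```

    Exact set intersection and difference are used here for brevity; a real design tool would also need to handle the reverse strand, other PAM types, and off-target considerations.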

  16. High nucleosome occupancy is encoded at human regulatory sequences.

    Directory of Open Access Journals (Sweden)

    Desiree Tillo

    Active eukaryotic regulatory sites are characterized by open chromatin, and yeast promoters and transcription factor binding sites (TFBSs) typically have low intrinsic nucleosome occupancy. Here, we show that in contrast to yeast, DNA at human promoters, enhancers, and TFBSs generally encodes high intrinsic nucleosome occupancy. In most cases we examined, these elements also have high experimentally measured nucleosome occupancy in vivo. These regions typically have high G+C content, which correlates positively with intrinsic nucleosome occupancy, and are depleted for nucleosome-excluding poly-A sequences. We propose that high nucleosome preference is directly encoded at regulatory sequences in the human genome to restrict access to regulatory information that will ultimately be utilized in only a subset of differentiated cells.

  17. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Athavale, Ajay [Monsanto

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  18. Similarity in recombination rate estimates highly correlates with genetic differentiation in humans.

    Directory of Open Access Journals (Sweden)

    Hafid Laayouni

    Recombination varies greatly among species, as illustrated by the poor conservation of the recombination landscape between humans and chimpanzees. Thus, shorter evolutionary time frames are needed to understand the evolution of recombination. Here, we analyze its recent evolution in humans. We calculated the recombination rates between adjacent pairs of 636,933 common single-nucleotide polymorphism loci in 28 worldwide human populations and analyzed them in relation to genetic distances between populations. We found a strong and highly significant correlation between similarity in the recombination rates corrected for effective population size and genetic differentiation between populations. This correlation is observed at the genome-wide level, but also for each chromosome and when genetic distances and recombination similarities are calculated independently from different parts of the genome. Moreover, and more relevant, this relationship is robustly maintained when considering presence/absence of recombination hotspots. Simulations show that this correlation cannot be explained by biases in the inference of recombination rates caused by haplotype sharing among similar populations. This result indicates a rapid pace of evolution of recombination, within the time span of differentiation of modern humans.

  19. Generating long sequences of high-intensity femtosecond pulses

    CERN Document Server

    Bitter, Martin

    2015-01-01

    We present an approach to create pulse sequences extending beyond 150 picoseconds in duration, comprised of 100 μJ femtosecond pulses. A quarter of the pulse train is produced by a high-resolution pulse shaper, which allows full controllability over the timing of each pulse. Two nested Michelson interferometers follow to quadruple the pulse number and the sequence duration. To boost the pulse energy, the long train is sent through a multi-pass Ti:Sapphire amplifier, followed by an external compressor. A periodic sequence of 84 pulses of 120 fs width and an average pulse energy of 107 μJ, separated by 2 ps, is demonstrated as a proof of principle.

  20. Next-generation sequencing: big data meets high performance computing.

    Science.gov (United States)

    Schmidt, Bertil; Hildebrandt, Andreas

    2017-02-02

    The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments in excess of a few terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and their efficient implementation on modern high performance computing systems is required.

  1. Compression of structured high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Fabien Campagne

    Full Text Available Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state of the art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

  2. Ultradeep 16S rRNA sequencing analysis of geographically similar but diverse unexplored marine samples reveal varied bacterial community composition.

    Directory of Open Access Journals (Sweden)

    Chairmandurai Aravindraja

    Full Text Available BACKGROUND: Bacterial community composition in the marine environment differs from one geographical location to another. Reports that delineate the bacterial diversity of different marine samples from geographically similar locations are limited. The present study aims to understand whether different marine samples harbour similar bacterial diversity, since they are geographically related to each other. METHODS AND PRINCIPAL FINDINGS: In the present study, 16S rRNA deep sequencing analysis targeting the V3 region was performed using Illumina bar-coded sequencing. A total of 22.44 million paired-end reads were obtained from the metagenomic DNA of Marine sediment, Rhizosphere sediment, Seawater and the epibacterial DNA of Seaweed and Seagrass. Diversity index analysis revealed that Marine sediment has the highest bacterial diversity and the least bacterial diversity was observed in Rhizosphere sediment. Proteobacteria, Actinobacteria and Bacteroidetes were the dominant taxa present in all the marine samples. Nearly 62-71% of rare species were identified in all the samples and most of these rare species were unique to a particular sample. Further taxonomic assignment at the phylum and genus level revealed that the bacterial community compositions differ among the samples. CONCLUSION: This is the first report to support the view that bacterial community composition is specific to each sample irrespective of its similar geographical location. The existence of a specific bacterial community for each sample may drive the overall difference in bacterial structural composition among samples. Further studies such as whole metagenomic sequencing will provide more insight into the keystone players and their interconnecting metabolic pathways. In addition, this is one of the very few reports that depicts the unexplored bacterial diversity of marine samples (Marine sediment, Rhizosphere sediment, Seawater and the host associated

  3. Intermittent and continuous high-intensity exercise training induce similar acute but different chronic muscle adaptations.

    Science.gov (United States)

    Cochran, Andrew J R; Percival, Michael E; Tricarico, Steven; Little, Jonathan P; Cermak, Naomi; Gillen, Jenna B; Tarnopolsky, Mark A; Gibala, Martin J

    2014-05-01

    High-intensity interval training (HIIT) performed in an 'all-out' manner (e.g. repeated Wingate tests) is a time-efficient strategy to induce skeletal muscle remodelling towards a more oxidative phenotype. A fundamental question that remains unclear, however, is whether the intermittent or 'pulsed' nature of the stimulus is critical to the adaptive response. In study 1, we examined whether the activation of signalling cascades linked to mitochondrial biogenesis was dependent on the manner in which an acute high-intensity exercise stimulus was applied. Subjects performed either four 30 s Wingate tests interspersed with 4 min of rest (INT) or a bout of continuous exercise (CONT) that was matched for total work (67 ± 7 kJ) and which required ∼4 min to complete as fast as possible. Both protocols elicited similar increases in markers of adenosine monophosphate-activated protein kinase (AMPK) and p38 mitogen-activated protein kinase activation, as well as Peroxisome proliferator-activated receptor gamma coactivator 1-alpha (PGC-1α) mRNA expression (main effects for time, P ≤ 0.05). In study 2, we determined whether 6 weeks of the CONT protocol (3 days per week) would increase skeletal muscle mitochondrial content to a similar extent to what we have previously reported after 6 weeks of INT. Despite similar acute signalling responses to the CONT and INT protocols, training with CONT did not increase the maximal activity or protein content of a range of mitochondrial markers. However, peak oxygen uptake was higher after CONT training (from 45.7 ± 5.4 to 48.3 ± 6.5 ml kg⁻¹ min⁻¹; P ...... muscle adaptations to low-volume, all-out HIIT. Despite the lack of skeletal muscle mitochondrial adaptations, our data show that a training programme based on a brief bout of high-intensity exercise, which lasted <10 min per session including warm-up, and performed three times per week for 6 weeks, improved peak oxygen uptake in young healthy subjects.

  4. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

    Science.gov (United States)

    Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H.; Henriksen, Sandra P.; Tallaksen, Chantal M. E.; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process. PMID:28362824

  5. Metformin prevents aggressive ovarian cancer growth driven by high-energy diet: similarity with calorie restriction.

    Science.gov (United States)

    Al-Wahab, Zaid; Mert, Ismail; Tebbe, Calvin; Chhina, Jasdeep; Hijaz, Miriana; Morris, Robert T; Ali-Fehmi, Rouba; Giri, Shailendra; Munkarah, Adnan R; Rattan, Ramandeep

    2015-05-10

    Caloric restriction (CR) was recently demonstrated by us to restrict ovarian cancer growth in vivo. CR resulted in activation of energy regulating enzymes adenosine monophosphate activated kinase (AMPK) and sirtuin 1 (SIRT1) followed by downstream inhibition of Akt-mTOR. In the present study, we investigated the effects of metformin on ovarian cancer growth in mice fed a high energy diet (HED) and regular diet (RD) and compared them to those seen with CR in an immunocompetent isogeneic mouse model of ovarian cancer. Mice either on RD or HED diet bearing ovarian tumors were treated with 200 mg/kg metformin in drinking water. Metformin treatment in RD and HED mice resulted in a significant reduction in tumor burden in the peritoneum, liver, kidney, spleen and bowel accompanied by decreased levels of growth factors (IGF-1, insulin and leptin), inflammatory cytokines (MCP-1, IL-6) and VEGF in plasma and ascitic fluid, akin to the CR diet mice. Metformin resulted in activation of AMPK and SIRT1 and inhibition of pAkt and pmTOR, similar to CR. Thus metformin can closely mimic CR's tumor suppressing effects by inducing similar metabolic changes, providing further evidence of its potential not only as a therapeutic drug but also as a preventive agent.

  6. Masturbation Experiences of Swedish Senior High School Students: Gender Differences and Similarities.

    Science.gov (United States)

    Driemeyer, Wiebke; Janssen, Erick; Wiltfang, Jens; Elmerstig, Eva

    2016-05-04

    Research about masturbation tends to be limited to the assessment of masturbation incidence and frequency. Consequently, little is known about what people experience connected to masturbation. This might be one reason why theoretical approaches that specifically address the persistent gender gap in masturbation frequency are lacking. The aim of the current study was to explore several aspects of masturbation in young men and women, and to examine possible associations with their social backgrounds and sexual histories. Data from 1,566 women and 1,452 men (ages 18 to 22) from 52 Swedish senior high schools were analyzed. Comparisons between men and women were made regarding incidence of and age at first masturbation, the use of objects (e.g., sex toys), fantasies, and sexual functioning during masturbation, as well as about their attitudes toward masturbation and sexual fantasies. Cluster analysis was carried out to identify similarities between and differences within the gender groups. While overall more men than women reported experience with several of the investigated aspects, cluster analyses revealed that a large proportion of men and women reported similar experiences and that fewer experiences are not necessarily associated with negative attitudes toward masturbation. Implications of these findings are discussed in consideration of particular social backgrounds.

  7. Psychiatric disorders in individuals with high-functioning autism and Asperger's disorder: similarities and differences.

    Science.gov (United States)

    Mukaddes, Nahit Motavalli; Hergüner, Sabri; Tanidir, Canan

    2010-12-01

    To investigate and compare the rate and type of psychiatric co-morbidity in individuals with diagnosis of high functioning autism (HFA) and Asperger's disorder (AS). This study includes 30 children and adolescents with diagnosis of HFA and 30 with diagnosis of AS. Diagnoses of HFA and AS were made using strict DSM-IV criteria. Psychiatric co-morbidity was assessed using the Schedule for Affective Disorders and Schizophrenia for School Age Children-Present and Lifetime Version (K-SADS-PL-T). The rate of comorbid psychiatric disorders was very high in both groups (93.3% in HFA and 100% in AS). The most common disorder in both groups was attention deficit hyperactivity disorder. There was no statistically significant difference between groups in the rate of associated psychiatric disorders, except for major depressive disorder (P = 0.029) and ADHD-combined type (P = 0.030). The AS group displayed greater comorbidity with depressive disorders and ADHD-CT. From a clinical perspective, it could be concluded that both disorders involve a high risk for developing psychiatric disorders, with AS patients at greater risk for depression. From a nosological perspective, the substantial similarities in terms of psychiatric comorbidity may support the idea that both disorders are on the same spectrum and differ in some aspects.

  8. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jonas Binladen

    Full Text Available BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. METHODOLOGY: We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. CONCLUSIONS: We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (mis-assignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of
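
    The tag-based source assignment described in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each read begins with an error-free dinucleotide tag, and the function names, tag table and specimen labels are hypothetical. A real workflow would also verify the primer sequence and handle sequencing anomalies.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len=2):
    """Group reads by their leading tag; reads with unknown tags go under None."""
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        # tag_to_sample is a hypothetical lookup table, e.g. {"CA": "specimen_1"}
        bins[tag_to_sample.get(tag)].append(insert)
    return dict(bins)


def five_prime_base_counts(reads):
    """Count reads by first nucleotide, to inspect tag-dependent representation bias."""
    counts = defaultdict(int)
    for read in reads:
        if read:
            counts[read[0]] += 1
    return dict(counts)


# Hypothetical example data: two tagged specimens and one read with an unknown tag.
tags = {"CA": "specimen_1", "TG": "specimen_2"}
reads = ["CAACGT", "TGACGT", "CAGGGG", "AAACGT"]
assigned = demultiplex(reads, tags)
base_counts = five_prime_base_counts(reads)
```

    Counting reads by their first base is one simple way to check for the kind of 5' nucleotide bias the abstract reports.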

  9. Remarkable similarity in genome nucleotide sequences between the Schwarz FF-8 and AIK-C measles virus vaccine strains and apparent nucleotide differences in the phosphoprotein gene.

    Science.gov (United States)

    Ito, Chie; Ohgimoto, Shinji; Kato, Seiichi; Sharma, Luna Bhatta; Ayata, Minoru; Komase, Katsuhiro; Takeuchi, Kaoru; Ihara, Toshiaki; Ogura, Hisashi

    2011-07-01

    The Schwarz FF-8 (FF-8) and AIK-C measles virus vaccine strains are currently used for vaccination in Japan. Here, the complete genome nucleotide sequence of the FF-8 strain has been determined and its genome sequence found to be remarkably similar to that of the AIK-C strain. These two strains are differentiated only by two nucleotide differences in the phosphoprotein gene. Since the FF-8 strain does not possess the amino acid substitutions in the phospho- and fusion proteins which are responsible for the temperature-sensitivity and small syncytium formation phenotypes of the AIK-C strain, respectively, other unidentified common mechanisms likely attenuate both the FF-8 and AIK-C strains.

  10. PHYRN: a robust method for phylogenetic analysis of highly divergent sequences.

    Directory of Open Access Journals (Sweden)

    Gaurav Bhardwaj

    Full Text Available Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.

  11. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    Science.gov (United States)

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.

  12. The carboxyl terminus of RAP30 is similar in sequence to region 4 of bacterial sigma factors and is required for function.

    Science.gov (United States)

    Garrett, K P; Serizawa, H; Hanley, J P; Bradsher, J N; Tsuboi, A; Arai, N; Yokota, T; Arai, K; Conaway, R C; Conaway, J W

    1992-11-25

    Transcription factor beta gamma (RAP30/74) from rat liver was previously shown in biochemical studies to control the binding of RNA polymerase II to promoters by a mechanism analogous to that utilized by bacterial sigma factors, by decreasing the affinity of polymerase for nonpromoter sites on DNA and by increasing the affinity of the enzyme for the preinitiation complex (Conaway, R. C., Garrett, K. P., Hanley, J. P., and Conaway, J. W. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 6205-6209). By constructing and analyzing mutants of beta gamma, we have identified a novel functional domain located in the carboxyl terminus of the gamma (RAP30) subunit. This domain shares sequence similarity with region 4 of bacterial sigma factors; in particular, it exhibits striking similarity to the carboxyl-terminal regions 4.1 and 4.2 of SpoIIIC (Bacillus subtilis sigma k). Evidence from biochemical studies argues that a mutant gamma (RAP30), lacking amino acid sequences similar to sigma homology region 4.2, is able to assemble with the beta (RAP74) subunit to form a mutant beta gamma (RAP30/74) with impaired ability to interact with RNA polymerase II.

  13. Homework Similarity Detection System Based on Sequence Matching

    Institute of Scientific and Technical Information of China (English)

    王晓英; 靳力; 王晓青; 黄维通

    2012-01-01

    Aiming to help teachers verify the originality of students' reports, this paper presents the design and development of a similarity detection system based on sequence matching. An explicit similarity measurement model is established with each class as a group, the length of the common subsequence is calculated using a sequence matching algorithm, and the similarity between each pair of students' documents in the same group is obtained. The similarity matrix is further normalized and classified into groups, incorporating the impact of document templates. Comparison results are visualized so that teachers can intuitively understand the similarity distribution across the whole class. Experimental results show the feasibility and practicality of the designed system, which can help teachers quickly detect plagiarism.
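
    The common-subsequence step in this record can be sketched as follows. This is only an illustration under stated assumptions, not the system described: documents are split on whitespace, the score is normalized as 2 * LCS / (len(a) + len(b)), and all function names are hypothetical. The abstract does not specify the tokenization or the normalization.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), a value in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def pairwise_similarity(docs):
    """Map each unordered pair of document names to its similarity score."""
    tokens = {name: text.split() for name, text in docs.items()}
    return {
        (p, q): similarity(tokens[p], tokens[q])
        for p, q in combinations(sorted(tokens), 2)
    }


# Hypothetical class group of three submissions.
scores = pairwise_similarity({"s1": "x y z", "s2": "x z", "s3": "q"})
```

    The resulting pair-to-score mapping is the kind of input that the clustering and visualization steps mentioned in the abstract would consume.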

  14. High throughput 16S rRNA gene amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup

    16S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...... to the presence of filamentous microorganisms was monitored weekly over 4 months. Microthrix was identified as a causative filament and suitable control measures were introduced. The level of Microthrix was reduced after 1-2 months but a number of other filamentous species were still present, with most of them...

  15. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    Directory of Open Access Journals (Sweden)

    Zhongwei Liang

    2014-01-01

    Full Text Available Tablet image tracking has a notable influence on the efficiency and reliability of high-speed drug mass production, and in recent years it has also emerged as a difficult problem and a focus of production monitoring, due to the highly similar shapes and random position distribution of the objects to be located. To track randomly distributed tablets accurately, a calibrated surface of light intensity reflective energy is established using a surface fitting approach and transitional vector determination, describing the shape topology and topography details of the target tablet. On this basis, the mathematical properties of these established surfaces are proposed, and an artificial neural network (ANN) is then employed to classify the moving target tablets by recognizing their different surface properties; the instantaneous coordinate positions of the drug tablets in one image frame can thereby be determined. By repeating the same pattern recognition on the next image frame, the real-time movements of the target tablet templates were successfully tracked in sequence. This paper provides reliable references and new research ideas for real-time object tracking in drug production practice.

  16. Uncovering highly obfuscated plagiarism cases using fuzzy semantic-based similarity model

    Directory of Open Access Journals (Sweden)

    Salha M. Alzahrani

    2015-07-01

    Full Text Available Highly obfuscated plagiarism cases contain unseen and obfuscated texts, which pose difficulties when using existing plagiarism detection methods. A fuzzy semantic-based similarity model for uncovering obfuscated plagiarism is presented and compared with five state-of-the-art baselines. Semantic relatedness between words is studied based on the part-of-speech (POS) tags and WordNet-based similarity measures. Fuzzy-based rules are introduced to assess the semantic distance between source and suspicious texts of short lengths, which implement the semantic relatedness between words as a membership function to a fuzzy set. In order to minimize the number of false positives and false negatives, a learning method that combines a permission threshold and a variation threshold is used to decide true plagiarism cases. The proposed model and the baselines are evaluated on 99,033 ground-truth annotated cases extracted from different datasets, including 11,621 (11.7%) handmade paraphrases, 54,815 (55.4%) artificial plagiarism cases, and 32,578 (32.9%) plagiarism-free cases. We conduct extensive experimental verifications, including the study of the effects of different segmentation schemes and parameter settings. Results are assessed using precision, recall, F-measure and granularity on stratified 10-fold cross-validation data. The statistical analysis using paired t-tests shows that the proposed approach is statistically significant in comparison with the baselines, which demonstrates the competence of the fuzzy semantic-based model to detect plagiarism cases beyond literal plagiarism. Additionally, the analysis of variance (ANOVA) statistical test shows the effectiveness of different segmentation schemes used with the proposed approach.

  17. Study on similar model of high pressure water jet impacting coal rock

    Science.gov (United States)

    Liu, Jialiang; Wang, Mengjin; Zhang, Di

    2017-08-01

    Based on similarity theory and dimensional analysis, the similarity criteria for the mechanical parameters of coal rock were deduced. The similar materials were built mainly from cement, sand, nitrile rubber powder and polystyrene, by controlling the water-cement ratio, cement-sand ratio, curing time and additive volume ratio. The intervals of these factors were obtained by carrying out a series of material compression tests. By comparing basic mechanical parameters such as bulk density, compressive strength, Poisson ratio and elastic modulus between the coal rock prototype and the similar materials, the optimal proposal for producing the coal rock similar materials was finally generated based on orthogonal design tests.

  18. An Estimate Method of Similarity of Event Sequences

    Institute of Scientific and Technical Information of China (English)

    张玮昕; 王耘波; 高俊雄

    2013-01-01

    Event sequences appear widely in various fields of industrial manufacturing and information science, such as operation flows on an assembly line, strings, click-streams from users accessing a website, DNA sequences, and system maintenance records. Unlike time series in digital signal processing, the independent variables of an event sequence are finite positive integers representing order, and the dependent variables are constants that characterize the events; these constants have no magnitude ordering and are qualitative features. Based on a practical application, the problem is converted into a linear programming problem, and a similarity assessment method with good adaptability to local features is established. The method was successfully applied in the examination and evaluation of a certain type of missile weapon system.

  19. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...... splicing events and coding potential of isoforms from full isoform deconvolution software, such as Cufflinks (article II), is presented. Finally, a study using 5’-end RNA-seq for alternative promoter detection between healthy patients and patients with acute promyelocytic leukemia is presented (article III...

  20. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. METHODOLOGY: We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through ...... be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.

  1. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Full Text Available Abstract. Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  2. Cross-language distributions of high frequency and phonetically similar cognates.

    Directory of Open Access Journals (Sweden)

    Job Schepens

    Full Text Available The coinciding form and meaning similarity of cognates, e.g. 'flamme' (French), 'Flamme' (German), 'vlam' (Dutch), meaning 'flame' in English, facilitates learning of additional languages. The cross-language frequency and similarity distributions of cognates vary according to evolutionary change and language contact. We compare frequency and orthographic (O), phonetic (P), and semantic similarity of cognates, automatically identified in semi-complete lexicons of six widely spoken languages. Comparisons of P and O similarity reveal inconsistent mappings in language pairs with deep orthographies. The frequency distributions show that cognate frequency is reduced in less closely related language pairs as compared to more closely related languages (e.g., French-English vs. German-English). These frequency and similarity patterns may support a better understanding of cognate processing in natural and experimental settings. The automatically identified cognates are available in the supplementary materials, including the frequency and similarity measurements.

  3. Identification of the genes encoding NAD(P)H-flavin oxidoreductases that are similar in sequence to Escherichia coli Fre in four species of luminous bacteria: Photorhabdus luminescens, Vibrio fischeri, Vibrio harveyi, and Vibrio orientalis.

    Science.gov (United States)

    Zenno, S; Saigo, K

    1994-06-01

    Genes encoding NAD(P)H-flavin oxidoreductases (flavin reductases) similar in both size and sequence to Fre, the most abundant flavin reductase in Escherichia coli, were identified in four species of luminous bacteria, Photorhabdus luminescens (ATCC 29999), Vibrio fischeri (ATCC 7744), Vibrio harveyi (ATCC 33843), and Vibrio orientalis (ATCC 33934). Nucleotide sequence analysis showed Fre-like flavin reductases in P. luminescens and V. fischeri to consist of 233 and 236 amino acids, respectively. As in E. coli Fre, Fre-like enzymes in luminous bacteria preferably used riboflavin as an electron acceptor when NADPH was used as an electron donor. These enzymes also were good suppliers of reduced flavin mononucleotide (FMNH2) to the bioluminescence reaction. In V. fischeri, the Fre-like enzyme is a minor flavin reductase representing Fre-like enzyme has no appreciable homology in amino acid sequence to the major flavin reductase in V. fischeri, FRase I, indicates that at least two different types of flavin reductases supply FMNH2 to the luminescence system in V. fischeri. Although Fre-like flavin reductases are highly similar in sequence to luxG gene products (LuxGs), Fre-like flavin reductases and LuxGs appear to constitute two separate groups of flavin-associated proteins.

  4. Fast filtering false active subspaces for efficient high dimensional similarity processing

    Institute of Scientific and Technical Information of China (English)

    WANG GuoRen; YU Ge; XIN JunChang; ZHAO YuHai; ZHANG EnDe

    2009-01-01

    The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. The performance of query processing clearly degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high dimensional, and that the number of accesses to false active subspaces increases as the dimensionality increases. To solve this problem, this paper proposes a space mapping approach that reduces such unnecessary accesses. A given query space can be refined by filtering within its mapped space. To this end, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on the mapping strategy, an index structure called MS-tree and query processing algorithms are presented. Finally, the performance of MS-tree is compared with that of other competitors in terms of range queries on a real data set.

  5. High doses of dextromethorphan, an NMDA antagonist, produce effects similar to classic hallucinogens

    Science.gov (United States)

    Carter, Lawrence P.; Johnson, Matthew W.; Mintzer, Miriam Z.; Klinedinst, Margaret A.; Griffiths, Roland R.

    2013-01-01

    Rationale: Although reports of dextromethorphan (DXM) abuse have increased recently, few studies have examined the effects of high doses of DXM. Objective: This study in humans evaluated the effects of supratherapeutic doses of DXM and triazolam. Methods: Single, acute, oral doses of DXM (100, 200, 300, 400, 500, 600, 700, 800 mg/70 kg), triazolam (0.25, 0.5 mg/70 kg), and placebo were administered to twelve healthy volunteers with histories of hallucinogen use, under double-blind conditions, using an ascending dose run-up design. Subjective, behavioral, and physiological effects were assessed repeatedly after drug administration for 6 hours. Results: Triazolam produced dose-related increases in subject-rated sedation, observer-rated sedation, and behavioral impairment. DXM produced a profile of dose-related physiological and subjective effects differing from triazolam. DXM effects included increases in blood pressure, heart rate, and emesis, increases in observer-rated effects typical of classic hallucinogens (e.g. distance from reality, visual effects with eyes open and closed, joy, anxiety), and participant ratings of stimulation (e.g. jittery, nervous), somatic effects (e.g. tingling, headache), perceptual changes, end-of-session drug liking, and mystical-type experience. After 400 mg/70 kg DXM, 11 of 12 participants indicated on a pharmacological class questionnaire that they thought they had received a classic hallucinogen (e.g. psilocybin). Drug effects resolved without significant adverse effects by the end of the session. In a 1-month follow up volunteers attributed increased spirituality and positive changes in attitudes, moods, and behavior to the session experiences. Conclusions: High doses of DXM produced effects distinct from triazolam and had characteristics that were similar to the classic hallucinogen psilocybin. PMID:22526529

  6. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

    Directory of Open Access Journals (Sweden)

    Frohme Marcus

    2009-10-01

    Full Text Available Abstract. Background: Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results: To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion: Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.

  7. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

    Science.gov (United States)

    Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

    2009-10-12

    Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.

  8. High Resolution Imaging of PHIBSS z~2 Main Sequence Galaxies in CO J=1-0

    CERN Document Server

    Bolatto, A D; Leroy, A K; Tacconi, L J; Bouché, N; Schreiber, N M Förster; Genzel, R; Cooper, M C; Fisher, D B; Combes, F; García-Burillo, S; Burkert, A; Bournaud, F; Weiss, A; Saintonge, A; Wuyts, S; Sternberg, A

    2015-01-01

    We present Karl G. Jansky Very Large Array observations of the CO J=1-0 transition in a sample of four $z\sim2$ main sequence galaxies. These galaxies are in the blue sequence of star-forming galaxies at their redshift, and are part of the IRAM Plateau de Bure HIgh-$z$ Blue Sequence Survey (PHIBSS) which imaged them in CO J=3-2. Two galaxies are imaged here at high signal-to-noise, allowing determinations of their disk sizes, line profiles, molecular surface densities, and excitation. Using these and published measurements, we show that the CO and optical disks have similar sizes in main-sequence galaxies, and in the galaxy where we can compare CO J=1-0 and J=3-2 sizes we find these are also very similar. Assuming a Galactic CO-to-H$_2$ conversion, we measure surface densities of $\Sigma_{mol}\sim1200$ M$_\odot$pc$^{-2}$ in projection and estimate $\Sigma_{mol}\sim500-900$ M$_\odot$pc$^{-2}$ deprojected. Finally, our data yields velocity-integrated Rayleigh-Jeans brightness temperature line ratios $r_{31}$ th...

  9. Total and high molecular weight adiponectin have similar utility for the identification of insulin resistance

    Directory of Open Access Journals (Sweden)

    Aguilar-Salinas Carlos A

    2010-06-01

    Full Text Available Abstract. Background: Insulin resistance (IR) and related metabolic disturbances are characterized by low levels of adiponectin. High molecular weight adiponectin (HMWA) is considered the active form of adiponectin and a better marker of IR than total adiponectin. The objective of this study is to compare the utility of total adiponectin, HMWA and the HMWA/total adiponectin index (SA index) for the identification of IR and related metabolic conditions. Methods: A cross-sectional analysis was performed in a group of ambulatory subjects, aged 20 to 70 years, in Mexico City. Areas under the receiver operator characteristic (ROC) curve for total, HMWA and the SA index were plotted for the identification of metabolic disturbances. Sensitivity and specificity, positive and negative predictive values, and accuracy for the identification of IR were calculated. Results: The study included 101 men and 168 women. The areas under the ROC curve for total and HMWA for the identification of IR (0.664 vs. 0.669, P = 0.74), obesity (0.592 vs. 0.610, P = 0.32), hypertriglyceridemia (0.661 vs. 0.671, P = 0.50) and hypoalphalipoproteinemia (0.624 vs. 0.633, P = 0.58) were similar. A total adiponectin level of 8.03 μg/ml was associated with a sensitivity of 57.6%, a specificity of 65.9%, a positive predictive value of 50.0%, a negative predictive value of 72.4%, and an accuracy of 62.7% for the diagnosis of IR. The corresponding figures for a HMWA value of 4.25 μg/dl were 59.6%, 67.1%, 51.8%, 73.7% and 64.2%. The area under the ROC curve of the SA index for the identification of IR was 0.622 [95% CI 0.554-0.691], obesity 0.613 [95% CI 0.536-0.689], hypertriglyceridemia 0.616 [95% CI 0.549-0.683], and hypoalphalipoproteinemia 0.606 [95% CI 0.535-0.677]. Conclusions: Total adiponectin, HMWA and the SA index had similar utility for the identification of IR and metabolic disturbances.
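
    The five figures quoted for each cut-off (sensitivity, specificity, positive and negative predictive value, accuracy) all derive from one 2x2 table of test result against true IR status. The following is a minimal sketch of that arithmetic, not the study's own analysis: the class name, function name and the counts in the usage note are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 table of test result vs. true condition (hypothetical container)."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    tn: int  # test negative, condition absent
    fn: int  # test negative, condition present


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic measures as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # detected among truly affected
        "specificity": c.tn / (c.tn + c.fp),  # cleared among truly unaffected
        "ppv": c.tp / (c.tp + c.fp),          # affected among test positives
        "npv": c.tn / (c.tn + c.fn),          # unaffected among test negatives
        "accuracy": (c.tp + c.tn) / total,    # all correct calls
    }
```

    With illustrative counts `ConfusionCounts(tp=40, fp=20, tn=30, fn=10)`, the function gives a sensitivity of 0.8 and a specificity of 0.6. Predictive values, unlike sensitivity and specificity, depend on how common the condition is in the sample.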

  10. Effect of k-tuple length on sample-comparison with high-throughput sequencing data.

    Science.gov (United States)

    Wang, Ying; Lei, Xiaoye; Wang, Shun; Wang, Zicheng; Song, Nianfeng; Zeng, Feng; Chen, Ting

    2016-01-22

    High-throughput metagenomic sequencing offers a powerful technique for comparing microbial communities. Without requiring extra reference sequences, alignment-free models with short k-tuples (k = 2-10 bp) have yielded promising results. Short k-tuples describe the overall statistical distribution but can hardly capture the specific characteristics within one microbial community. Longer k-tuples contain more abundant information. However, because the frequency vector of long k-tuples (k ≥ 30 bp) is sparse, the statistical measures designed for short k-tuples are not applicable. In our study, we considered each tuple as a meaningful word and each sequencing data set as a document composed of these words. The comparison between two sequencing data sets is therefore treated as "topic analysis of documents" in text mining. We designed a pipeline with long k-tuple features that compares metagenomic samples by combining algorithms from text mining and pattern recognition. The pipeline is available at http://culotuple.codeplex.com/. Experiments show that our pipeline with long k-tuple features ① separates genomes with high similarity; ② outperforms short k-tuple models in all experiments (when k ≥ 12, the short k-tuple measures are no longer applicable; when k is between 20 and 40, the long k-tuple pipeline obtains much better grouping results); ③ is free from the effects of sequencing platforms/protocols; and ④ yields meaningful and supported biological results on the 40-tuples selected for comparison.
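
    The "tuple as word, sample as document" idea can be sketched in a few lines. This is an illustration only and not the authors' pipeline: the function names are hypothetical, and cosine similarity is used here as a simple stand-in for the text-mining measures the abstract refers to.

```python
from collections import Counter
from math import sqrt


def ktuple_counts(reads, k):
    """Count every overlapping k-tuple (the 'words') across a sample's reads."""
    counts = Counter()
    for read in reads:
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse k-tuple count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sample is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

    Storing counts in a `Counter` keeps only the tuples that actually occur, which matters for long k-tuples: the full vector has 4^k entries, almost all of them zero.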

  11. Self-similarity matrix based slow-time feature extraction for human target in high-resolution radar

    NARCIS (Netherlands)

    He, Y.; Aubry, P.; Le Chevalier, F.; Yarovoy, A.

    2014-01-01

    A new approach is proposed to extract the slow-time feature of human motion in high-resolution radars. The approach is based on the self-similarity matrix (SSM) of the radar signals. The Mutual Information is used as a measure of similarity. The SSMs of different radar signals (high-resolution range

  12. Similar health benefits of endurance and high-intensity interval training in obese children.

    Directory of Open Access Journals (Sweden)

    Ana Carolina Corte de Araujo

    Full Text Available PURPOSE: To compare two modalities of exercise training (i.e., Endurance Training [ET] and High-Intensity Interval Training [HIT]) on health-related parameters in obese children aged between 8 and 12 years. METHODS: Thirty obese children were randomly allocated into either the ET or HIT group. The ET group performed a 30 to 60-minute continuous exercise at 80% of the peak heart rate (HR). The HIT group performed 3 to 6 sets of 60-s sprints at 100% of the peak velocity interspersed by a 3-min active recovery period at 50% of the exercise velocity. HIT sessions lasted ~70% less time than ET sessions. At baseline and after 12 weeks of intervention, aerobic fitness, body composition and metabolic parameters were assessed. RESULTS: Both the absolute (ET: 26.0%; HIT: 19.0%) and the relative VO2 peak (ET: 13.1%; HIT: 14.6%) were significantly increased in both groups after the intervention. Additionally, the total time of exercise (ET: 19.5%; HIT: 16.4%) and the peak velocity during the maximal graded cardiorespiratory test (ET: 16.9%; HIT: 13.4%) were significantly improved across interventions. Insulinemia (ET: 29.4%; HIT: 30.5%) and HOMA-index (ET: 42.8%; HIT: 37.0%) were significantly lower for both groups at POST when compared to PRE. Body mass was significantly reduced in the HIT (2.6%), but not in the ET group (1.2%). A significant reduction in BMI was observed for both groups after the intervention (ET: 3.0%; HIT: 5.0%). The responsiveness analysis revealed a very similar pattern of the most responsive variables among groups. CONCLUSION: HIT and ET were equally effective in improving important health related parameters in obese youth.

  13. Comparison of highly repeated DNA sequences in some Lemuridae and taxonomic implications.

    Science.gov (United States)

    Montagnon, D; Crovella, S; Rumpler, Y

    1993-01-01

    Highly repeated DNA sequences of Eulemur fulvus mayottensis, E. coronatus, Lemur catta, and Hapalemur griseus griseus have been identified and compared. Sequence analysis of highly repeated DNA fragments isolated from L. catta and Hapalemur showed a high percentage of similarity (nearly 95%), as did fragments isolated from the two very close Eulemur species, whereas comparison of the DNA fragments isolated from the two Eulemur species and the L. catta/Hapalemur group showed a very low percentage (approximately 40%) of identity, as might be expected for distant species. These results confirm our previous data, obtained by Southern blot hybridization techniques on the same species, and strongly support the existence of a common trunk between L. catta and Hapalemur, distinct from the one leading to the Eulemur species.

  14. High order coherent control sequences of fat pulses

    CERN Document Server

    Pasini, S; Uhrig, G S

    2010-01-01

    We analyze the performance of sequences of fat pulses of various lengths and shapes for dynamic decoupling and we compare it with that of sequences of ideal, instantaneous pulses. The use of second order, shaped pulses represents a significant improvement. Non-equidistant sequences characterized by pulse durations scaled proportional to the duration T of the sequence strikingly outperform the sequences with pulses of constant length for small T. Interestingly, for longer durations sequences of pulses of substantial length are found to suppress dephasing better than sequences of ideal pulses.

  15. An Evaluation of a High-Probability Instructional Sequence to Increase Acceptance of Food and Decrease Inappropriate Behavior in Children with Pediatric Feeding Disorders

    Science.gov (United States)

    Patel, Meeta R.; Reed, Gregory K.; Piazza, Cathleen C.; Bachmeyer, Melainie H.; Layer, Stacy A.; Pabico, Ryan S.

    2006-01-01

    We evaluated the effects of escape extinction with and without a high-probability (high-p) instructional sequence on food acceptance and inappropriate behavior for children diagnosed with feeding problems. The high-p sequence consisted of three presentations of a response that was similar topographically (i.e., presentations of an empty nuk[R],…

  17. Studies on Gene Similar Sequence Amplification of Tomato Mosaic Virus

    Institute of Scientific and Technical Information of China (English)

    李敏

    2011-01-01

    [Objective] The aim was to study the amplification of sequences similar to tomato mosaic virus (ToMV) resistance genes in tomato. [Method] Twenty tomato varieties widely used in breeding and production served as test materials. They were screened for ToMV resistance in the field and in the laboratory, followed by amplification of gene similar sequences, RAPD cluster analysis, and PCR amplification with specific primers. [Result] Delicious cherry tomato, China tomato hybrid 9, early red jewel tomato, W2624, OH-2-2-11, yellow holy tomato and American tomato had relatively strong resistance to ToMV. According to the cluster analysis, the tested varieties could be divided into three groups, from which nine resistant and non-resistant varieties were selected for analysis of sequences similar to ToMV resistance genes. No band was amplified from the non-resistant varieties or from the purchased non-resistant variety, whereas a band of about 300 bp was amplified from the resistant varieties; this band was tentatively regarded as a sequence similar to a ToMV resistance gene. [Conclusion] The research provides a theoretical basis for breeding tomato varieties that are resistant to ToMV.

  19. High frequency RNA recombination in porcine reproductive and respiratory syndrome virus occurs preferentially between parental sequences with high similarity

    DEFF Research Database (Denmark)

    van Vugt, Joke .J.F.A.; Storgaard, Torben; Oleksiewicz, Martin B.

    2001-01-01

    Two types of porcine reproductive and respiratory syndrome virus (PRRSV) exist, a North American type and a European type. The co-existence of both types in some countries, such as Denmark, Slovakia and Canada, creates a risk of inter-type recombination. To evaluate this risk, cell cultures were co......, but no recombination was detected between the European and North American types. Calculation of the maximum theoretical risk of European-American recombination, based on the sensitivity of the RT-PCR system, revealed that RNA recombination between the European and North American types of PRRSV is at least 10000 times...

  20. Protein profiling reveals inter-individual protein homogeneity of arachnoid cyst fluid and high qualitative similarity to cerebrospinal fluid

    Directory of Open Access Journals (Sweden)

    Berle Magnus

    2011-05-01

    the majority of abundant proteins in AC fluid also can be found in CSF. Compared to plasma, as many as 104 proteins in AC were not found in the list of 3017 plasma proteins. Conclusions Based on the protein content of AC fluid, our data indicate that temporal AC is a homogenous condition, pointing towards a similar AC filling mechanism for the 14 patients examined. Most of the proteins identified in AC fluid have been identified in CSF, indicating high similarity in the qualitative protein content of AC to CSF, whereas this was not the case between AC and plasma. This indicates that AC is filled with a liquid similar to CSF. As far as we know, this is the first proteomics study that explores the AC fluid proteome.

  1. Scaling and interaction of self-similar modes in models of high Reynolds number wall turbulence

    Science.gov (United States)

    Sharma, A. S.; Moarref, R.; McKeon, B. J.

    2017-03-01

    Previous work has established the usefulness of the resolvent operator that maps the terms nonlinear in the turbulent fluctuations to the fluctuations themselves. Further work has described the self-similarity of the resolvent arising from that of the mean velocity profile. The orthogonal modes provided by the resolvent analysis describe the wall-normal coherence of the motions and inherit that self-similarity. In this contribution, we present the implications of this similarity for the nonlinear interaction between modes with different scales and wall-normal locations. By considering the nonlinear interactions between modes, it is shown that much of the turbulence scaling behaviour in the logarithmic region can be determined from a single arbitrarily chosen reference plane. Thus, the geometric scaling of the modes is impressed upon the nonlinear interaction between modes. Implications of these observations on the self-sustaining mechanisms of wall turbulence, modelling and simulation are outlined.

  2. Expressed sequence tags analysis of a liver tissue cDNA library from a highly inbred minipig line

    Institute of Scientific and Technical Information of China (English)

    CHEN You-nan; TAN Wei-dong; LU Yan-rong; QIN Sheng-fang; LI Sheng-fu; ZENG Yang-zhi; BU Hong; LI You-ping; CHENG Jing-qiu

    2007-01-01

    Background: Porcine liver performing efficient physiological functions in the human body is a prerequisite for successful liver xenotransplantation. However, the protein differences between pig and human remain largely unexplored. Therefore, we investigated the liver expression profile of a highly inbred minipig line. Methods: A cDNA library was constructed from liver tissue of an inbred Banna minipig. Two hundred randomly selected clones were sequenced and then analysed with the BLAST programme. Results: Alignments of the sequences showed that 44% encoded previously known porcine genes. Among the 56% unknown genes, sequences of 72 clones had high similarities with known genes of other species, and the similarities to human were mostly above 0.80. The other 40 clones, showing no similarity to genes in the National Centre for Biotechnology Information databases, are newly discovered expressed sequence tags specific to the liver of the inbred Banna minipig. Twenty-two of the 200 clones had full length encoding regions, 38 complete 5' terminal sequences and 140 complete 3' terminal sequences. Conclusion: These newly discovered expression sequences may be an important resource for research involving physiological characteristics and medical usage of inbred pigs and contribute to matching studies in xenotransplantation.

  3. Molecular phylogeny and species separation of five morphologically similar Holosticha-complex ciliates (Protozoa, Ciliophora) using ARDRA riboprinting and multigene sequence data

    Science.gov (United States)

    Gao, Feng; Yi, Zhenzhen; Gong, Jun; Al-Rasheid Khaled, A. S.; Song, Weibo

    2010-05-01

    To separate and redefine the ambiguous Holosticha-complex, a confusing group of hypotrichous ciliates, six strains belonging to five morphospecies of three genera, Holosticha heterofoissneri, Anteholosticha sp. pop1, Anteholosticha sp. pop2, A. manca, A. gracilis and Nothoholosticha fasciola, were analyzed using 12 restriction enzymes on the basis of amplified ribosomal DNA restriction analysis. Nine of the 12 enzymes could digest the DNA products, four (Hinf I, Hind III, Msp I, Taq I) yielded species-specific restriction patterns, and Hind III and Taq I produced different patterns for two Anteholosticha sp. populations. Distinctly different restriction digestion haplotypes and similarity indices can be used to separate the species. The secondary structures of the five species were predicted based on the ITS2 transcripts and there were several minor differences among species, while two Anteholosticha sp. populations were identical. In addition, phylogenies based on the SSrRNA gene sequences were reconstructed using multiple algorithms, which grouped them generally into four clades and indicated that the genus Anteholosticha should be a convergent assemblage. The fact that Holosticha species clustered with the oligotrichs and choreotrichs, though with very low support values, indicated that the topology may be very divergent and unreliable when the number of sequence data used in the analyses is too low.

  4. On Measuring Process Model Similarity Based on High-Level Change Operations

    NARCIS (Netherlands)

    Li, C.; Reichert, M.U.; Wombacher, A.

    2008-01-01

    For various applications there is the need to compare the similarity between two process models. For example, given the as-is and to-be models of a particular business process, we would like to know how much they differ from each other and how we can efficiently transform the as-is to the to-be mode

  5. On Measuring Process Model Similarity based on High-level Change Operations

    NARCIS (Netherlands)

    Li, C.; Reichert, M.U.; Wombacher, A.

    2007-01-01

    For various applications there is the need to compare the similarity between two process models. For example, given the as-is and to-be models of a particular business process, we would like to know how much they differ from each other and how we can efficiently transform the as-is to the to-be mode

  6. Plasmodium falciparum antigenic variation. Mapping mosaic var gene sequences onto a network of shared, highly polymorphic sequence blocks.

    Science.gov (United States)

    Bull, Peter C; Buckee, Caroline O; Kyes, Sue; Kortok, Moses M; Thathy, Vandana; Guyah, Bernard; Stoute, José A; Newbold, Chris I; Marsh, Kevin

    2008-06-01

    Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) is a potentially important family of immune targets, encoded by an extremely diverse gene family called var. Understanding of the genetic organization of var genes is hampered by sequence mosaicism that results from a long history of non-homologous recombination. Here we have used software designed to analyse social networks to visualize the relationships between large collections of short var sequence tags sampled from clinical parasite isolates. In this approach, two sequences are connected if they share one or more highly polymorphic sequence blocks. The results show that the majority of analysed sequences, including several var-like sequences from the chimpanzee parasite Plasmodium reichenowi, can be either directly or indirectly linked together in a single unbroken network. However, the network is highly structured and contains putative subgroups of recombining sequences. The major subgroup contains the previously described group A var genes, previously proposed to be genetically distinct. Another subgroup contains sequences found to be associated with rosetting, a parasite virulence phenotype. The mosaic structure of the sequences and their division into subgroups may reflect the conflicting problems of maximizing antigenic diversity and minimizing epitope sharing between variants while maintaining their host cell binding functions.
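
    The connection rule described in this abstract (two sequences are linked when they share at least one polymorphic block) can be illustrated with a minimal sketch. This is not the authors' software, which the abstract describes only as social-network analysis software; the function names, the block identifiers and the toy data below are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_block_network(tags):
    """Return undirected edges between sequences that share a block.

    tags: dict mapping a sequence name to the set of block identifiers
    found in it (hypothetical input format).
    """
    # Index sequences by the blocks they contain.
    by_block = defaultdict(set)
    for name, blocks in tags.items():
        for block in blocks:
            by_block[block].add(name)

    # Every pair of sequences listed under the same block gets one edge.
    edges = set()
    for members in by_block.values():
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges


def connected_components(nodes, edges):
    """Group nodes that are directly or indirectly linked."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk from an unvisited node.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy data: seqA-seqB share b2, seqB-seqC share b3, seqD is isolated.
toy_tags = {
    "seqA": {"b1", "b2"},
    "seqB": {"b2", "b3"},
    "seqC": {"b3"},
    "seqD": {"b9"},
}
toy_edges = build_block_network(toy_tags)
toy_components = connected_components(toy_tags, toy_edges)
```

    In the toy data, seqA and seqC share no block directly but end up in the same component through seqB, which is the "indirectly linked" case the abstract mentions.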

  7. High throughput sequencing reveals a novel fabavirus infecting sweet cherry.

    Science.gov (United States)

    Villamor, D E V; Pillai, S S; Eastwell, K C

    2017-03-01

    The genus Fabavirus currently consists of five species represented by viruses that infect a wide range of hosts but none reported from temperate climate fruit trees. A virus with genomic features resembling fabaviruses (tentatively named Prunus virus F, PrVF) was revealed by high throughput sequencing of extracts from a sweet cherry tree (Prunus avium). PrVF was subsequently shown to be graft transmissible and further identified in three other non-symptomatic Prunus spp. from different geographical locations. Two genetic variants of RNA1 and RNA2 coexisted in the same samples. RNA1 consisted of 6,165 and 6,163 nucleotides, and RNA2 consisted of 3,622 and 3,468 nucleotides.

  8. Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts

    Directory of Open Access Journals (Sweden)

    Ouyang Shu

    2005-09-01

    Background: The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results: All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion: Results from this genome-scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.

  9. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants...... insights is far from trivial. A key challenge is that these methods cannot read the input sequences in their entirety. Due to technological constraints, they instead provide the sequences of very many fragments of the input molecules. Furthermore, not all nucleotides in these fragments are measured...... correctly and the final output of a typical experiment thus consists of hundreds of millions of error-containing sequence fragments. This thesis concerns the development of methods for transforming such a raw sequencing signal into a simpler representation from which biological inferences can then be made...

  10. Similarities between Copper and Plutonium containing 'high T_c' superconductors

    Energy Technology Data Exchange (ETDEWEB)

    Wachter, P. [Laboratorium fuer Festkoerperphysik, ETH Zuerich, 8093 Zuerich (Switzerland)]. E-mail: wachter@solid.phys.ethz.ch

    2007-09-13

    PuCoGa_5 with 18.5 K has an extremely high T_c for superconductivity compared with other actinide materials having T_c values around 2-3 K. It appears to be a 'high T_c superconductor' in the field of actinides. After nearly 20 years of research in high T_c superconductors, only Cu containing materials have T_c values above about 30 K (exception MgB_2). BCS theory cannot explain such high transition temperatures, thus other or additional coupling mechanisms, like magnetic exchange, are necessary. Mixed valence, spin holes in an antiferromagnetic lattice, small energy difference between the various valences and two-dimensionality are common features of Cu and Pu containing superconductors. It can be shown in this paper that the mechanism for superconductivity is the same for Cu and Pu containing materials.

  11. Similarities between Cu and Pu containing 'high T_c' superconductors

    Energy Technology Data Exchange (ETDEWEB)

    Wachter, P. [Laboratorium fuer Festkoerperphysik, ETH Zuerich, 8093 Zurich (Switzerland)]. E-mail: wachter@solid.phys.ethz.ch

    2007-03-15

    PuCoGa_5, at 18.5 K, has an extremely high T_c for superconductivity compared with other actinide materials, which have T_c values around 2-3 K. It appears to be a 'high T_c superconductor' in the field of actinides. After nearly 20 years of research in high T_c superconductors, only Cu containing materials have T_c values above about 30 K (exception MgB_2). BCS theory cannot explain such high transition temperatures, thus other or additional coupling mechanisms like magnetic exchange are necessary. Mixed valence, spin holes in an antiferromagnetic lattice, small energy difference between the various valences and two-dimensionality are common features of Cu and Pu containing superconductors. It can be shown in this paper that the mechanism for superconductivity is the same for Cu and Pu containing materials.

  12. Gene network activity in cultivated primary hepatocytes is highly similar to diseased mammalian liver tissue.

    Science.gov (United States)

    Godoy, Patricio; Widera, Agata; Schmidt-Heck, Wolfgang; Campos, Gisela; Meyer, Christoph; Cadenas, Cristina; Reif, Raymond; Stöber, Regina; Hammad, Seddik; Pütter, Larissa; Gianmoena, Kathrin; Marchan, Rosemarie; Ghallab, Ahmed; Edlund, Karolina; Nüssler, Andreas; Thasler, Wolfgang E; Damm, Georg; Seehofer, Daniel; Weiss, Thomas S; Dirsch, Olaf; Dahmen, Uta; Gebhardt, Rolf; Chaudhari, Umesh; Meganathan, Kesavan; Sachinidis, Agapios; Kelm, Jens; Hofmann, Ute; Zahedi, René P; Guthke, Reinhard; Blüthgen, Nils; Dooley, Steven; Hengstler, Jan G

    2016-10-01

    It is well known that isolation and cultivation of primary hepatocytes cause major gene expression alterations. In the present genome-wide, time-resolved study of cultivated human and mouse hepatocytes, we made the observation that expression changes in culture strongly resemble alterations in liver diseases. Hepatocytes of both species were cultivated in collagen sandwich and in monolayer conditions. Genome-wide data were also obtained from human NAFLD, cirrhosis, HCC and hepatitis B virus-infected tissue as well as mouse livers after partial hepatectomy, CCl4 intoxication, obesity, HCC and LPS. A strong similarity between cultivation and disease-induced expression alterations was observed. For example, expression changes in hepatocytes induced by 1-day cultivation and 1-day CCl4 exposure in vivo correlated with R = 0.615 (p < 0.001). Interspecies comparison identified predominantly similar responses in human and mouse hepatocytes but also a set of genes that responded differently. Unsupervised clustering of altered genes identified three main clusters: (1) downregulated genes corresponding to mature liver functions, (2) upregulation of an inflammation/RNA processing cluster and (3) upregulated migration/cell cycle-associated genes. Gene regulatory network analysis highlights overrepresented and deregulated HNF4 and CAR (Cluster 1), Krüppel-like factors MafF and ELK1 (Cluster 2) as well as ETF (Cluster 3) among the interspecies conserved key regulators of expression changes. Interventions ameliorating but not abrogating cultivation-induced responses include removal of non-parenchymal cells, generation of the hepatocytes' own matrix in spheroids, supplementation with bile salts and siRNA-mediated suppression of key transcription factors. In conclusion, this study shows that gene regulatory network alterations of cultivated hepatocytes resemble those of inflammatory liver diseases and should therefore be considered and exploited as disease models.

  13. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
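
    The ID-pairing step described in this abstract can be sketched roughly as follows. This is an illustration only, not the DBCLS pipeline: it assumes accessions follow the common read-archive pattern (an S, E or D prefix, then R, a type letter and at least six digits, as in SRP000123 or ERX000789), and the function names, PMIDs and article texts are hypothetical.

```python
import re

# Assumed accession shape: [S|E|D] + R + type letter + digits, e.g. SRR000456.
SRA_ID_PATTERN = re.compile(r"\b[SED]R[APRSXZ]\d{6,}\b")


def extract_sra_ids(text):
    """Return the distinct accession-like strings found in a text, sorted."""
    return sorted(set(SRA_ID_PATTERN.findall(text)))


def pair_with_pmid(articles):
    """Build (accession, PMID) pairs from a dict of PMID -> article full text."""
    pairs = []
    for pmid, text in sorted(articles.items()):
        for sra_id in extract_sra_ids(text):
            pairs.append((sra_id, pmid))
    return pairs


# Hypothetical article snippets keyed by made-up PMIDs.
toy_articles = {
    "10000001": "Reads were deposited under SRP000123 and run SRR000456.",
    "10000002": "Raw data are available as ERX000789; see also SRP000123.",
    "10000003": "No sequencing data were generated in this study.",
}
toy_pairs = pair_with_pmid(toy_articles)
```

    A pattern match only shows that a string looks like an accession; a real pipeline would presumably also check each candidate against the archive's own records.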

  14. Self-similar approach to the explosion of droplets by a high energy laser beam

    Energy Technology Data Exchange (ETDEWEB)

    Chitanvis, S.M.

    1987-09-25

    We have constructed a model in which a small droplet is exploded by the absorption of energy from a high energy laser beam. The beam flux is so high that we assume the formation of a plasma. We have a single-fluid model of a plasma droplet interacting with laser radiation. Self-similarity is invoked to reduce the spherically symmetric problem involving hydrodynamics and Maxwell's equations to quadrature. We show analytically that our model reproduces in a qualitative manner certain features observed experimentally by Eickmans et al.

  15. Sample-Align-D: A High Performance Multiple Sequence Alignment System using Phylogenetic Sampling and Domain Decomposition

    CERN Document Server

    Saeed, Fahad

    2009-01-01

    Multiple Sequence Alignment (MSA) is one of the most computationally intensive tasks in Computational Biology. Existing best known solutions for multiple sequence alignment take several hours (in some cases days) of computation time to align, for example, 2000 homologous sequences of average length 300. Inspired by the Sample Sort approach in parallel processing, in this paper we propose a highly scalable multiprocessor solution for the MSA problem in phylogenetically diverse sequences. Our method employs an intelligent scheme to partition the set of sequences into smaller subsets using a k-mer count-based similarity index, referred to as k-mer rank. Each subset is then independently aligned in parallel using any sequential approach. Further fine tuning of the local alignments is achieved using constraints derived from a global ancestor of the entire set. The proposed Sample-Align-D Algorithm has been implemented on a cluster of workstations using the MPI message passing library. The accuracy of the proposed solutio...
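
    The partitioning idea in this abstract (score each sequence with a k-mer count-based similarity, then split the scored set into subsets that can be aligned independently) can be sketched as below. The abstract does not define k-mer rank, so the score used here, the fraction of k-mers shared with an arbitrarily chosen reference sequence, is a stand-in assumption, and all function names are hypothetical. The parallel alignment and ancestor-based refinement steps are not shown.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def shared_kmer_fraction(a, b, k=3):
    """Stand-in similarity: shared k-mer occurrences over the larger k-mer total."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((counts_a & counts_b).values())  # multiset intersection
    total = max(sum(counts_a.values()), sum(counts_b.values()))
    return shared / total if total else 0.0


def partition_by_kmer_score(sequences, n_buckets, k=3):
    """Sort sequences by similarity to the first one, then cut into buckets."""
    reference = sequences[0]  # assumption: first sequence acts as the reference
    scored = sorted(sequences, key=lambda s: shared_kmer_fraction(s, reference, k))
    size = -(-len(scored) // n_buckets)  # ceiling division
    return [scored[i:i + size] for i in range(0, len(scored), size)]


toy_sequences = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "TTTTTTTA"]
toy_buckets = partition_by_kmer_score(toy_sequences, n_buckets=2)
```

    Each resulting bucket could then be handed to a separate worker for alignment, which is the step the abstract assigns to any sequential alignment method.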

  16. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...

  17. Self-similar decay of high Reynolds number Taylor-Couette turbulence

    NARCIS (Netherlands)

    Verschoof, R.A.; Huisman, S.G.; Veen, van der R.C.A.; Sun, C.; Lohse, D.

    2016-01-01

    We study the decay of high-Reynolds-number Taylor-Couette turbulence, i.e., the turbulent flow between two coaxial rotating cylinders. To do so, the rotation of the inner cylinder (Re_i = 2×10^6; the outer cylinder is at rest) is stopped within 12 s, thus fully removing the energy input to the syst

  18. Similarities between Students Receiving Dress Code Violations and Discipline Referrals at Newport Junior High School

    Science.gov (United States)

    Nicholson, Nikki

    2007-01-01

    Background: Looking at dress code violations and demographics surrounding kids breaking the rules. Purpose: To see if there is a connection between dress code violations and discipline referrals. Setting: Jr. High School; Study Sample: Students with dress code violations for one week; Intervention: N/A; Research Design: Correlational; and Control…

  19. Earliest Memories and Recent Memories of Highly Salient Events--Are They Similar?

    Science.gov (United States)

    Peterson, Carole; Fowler, Tania; Brandeau, Katherine M.

    2015-01-01

    Four- to 11-year-old children were interviewed about 2 different sorts of memories in the same home visit: recent memories of highly salient and stressful events--namely, injuries serious enough to require hospital emergency room treatment--and their earliest memories. Injury memories were scored for amount of unique information, completeness…

  20. Molecular characterization of recombinant T1, a non-allergenic periwinkle (Catharanthus roseus) protein, with sequence similarity to the Bet v 1 plant allergen family.

    Science.gov (United States)

    Laffer, Sylvia; Hamdi, Said; Lupinek, Christian; Sperr, Wolfgang R; Valent, Peter; Verdino, Petra; Keller, Walter; Grote, Monika; Hoffmann-Sommergruber, Karin; Scheiner, Otto; Kraft, Dietrich; Rideau, Marc; Valenta, Rudolf

    2003-07-01

    More than 25% of the population suffer from Type I allergy, an IgE-mediated hypersensitivity disease. Allergens with homology to the major birch (Betula verrucosa) pollen allergen, Bet v 1, belong to the most potent elicitors of IgE-mediated allergies. T1, a cytokinin-inducible cytoplasmic periwinkle (Catharanthus roseus) protein, with significant sequence similarity to members of the Bet v 1 plant allergen family, was expressed in Escherichia coli. Recombinant T1 (rT1) did not react with IgE antibodies from allergic patients, and failed to induce basophil histamine release and immediate-type skin reactions in Bet v 1-allergic patients. Antibodies raised against purified rT1 could be used for in situ localization of natural T1 by immunogold electron microscopy, but did not cross-react with most of the Bet v 1-related allergens. CD analysis showed significant differences regarding secondary structure and thermal denaturation behaviour between rT1 and recombinant Bet v 1, suggesting that these structural differences are responsible for the different allergenicity of the proteins. T1 represents a non-allergenic member of the Bet v 1 family that may be used to study structural requirements of allergenicity and to engineer hypo-allergenic plants by replacing Bet v 1-related allergens for primary prevention of allergy.

  1. Recent research on the high-probability instructional sequence: A brief review.

    Science.gov (United States)

    Lipschultz, Joshua; Wilder, David A

    2017-04-01

    The high-probability (high-p) instructional sequence consists of the delivery of a series of high-probability instructions immediately before delivery of a low-probability or target instruction. It is commonly used to increase compliance in a variety of populations. Recent research has described variations of the high-p instructional sequence and examined the conditions under which the sequence is most effective. This manuscript reviews the most recent research on the sequence and identifies directions for future research. Recommendations for practitioners regarding the use of the high-p instructional sequence are also provided. © 2017 Society for the Experimental Analysis of Behavior.

  2. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Directory of Open Access Journals (Sweden)

    Alexander T Dilthey

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a

  3. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Science.gov (United States)

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant

  4. Analysis of 4,664 high-quality sequence-finished poplar full-length

    Energy Technology Data Exchange (ETDEWEB)

    Ralph, S. [University of British Columbia, Vancouver; Gunter, Lee E [ORNL; Tuskan, Gerald A [ORNL; Douglas, Carl [University of British Columbia, Vancouver; Holt, Robert A. [Genome Sciences Centre, Vancouver, BC, Canada; Jones, Steven [Genome Sciences Centre, Vancouver, BC, Canada; Marra, Marco [Genome Sciences Centre, Vancouver, BC, Canada; Bohlmann, J. [University of British Columbia, Vancouver

    2008-01-01

    The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions. As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa x P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones for genes that were differentially expressed in

  5. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Background: A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results: The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions: High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  6. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Directory of Open Access Journals (Sweden)

    Kathy N Lam

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  7. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Science.gov (United States)

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  8. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    2014-01-01

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses; as a result, only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit…

  9. High-throughput sequencing-based analysis of endogenetic fungal communities inhabiting the Chinese Cordyceps reveals unexpectedly high fungal diversity.

    Science.gov (United States)

    Xia, Fei; Chen, Xin; Guo, Meng-Yuan; Bai, Xiao-Hui; Liu, Yan; Shen, Guang-Rong; Li, Yu-Ling; Lin, Juan; Zhou, Xuan-Wei

    2016-09-14

    Chinese Cordyceps, known in Chinese as "DongChong XiaCao", is a parasitic complex of a fungus (Ophiocordyceps sinensis) and a caterpillar. The current study explored the endogenetic fungal communities inhabiting Chinese Cordyceps. Samples were collected from five different geographical regions of Qinghai and Tibet, and the nuclear ribosomal internal transcribed spacer-1 sequences from each sample were obtained using Illumina high-throughput sequencing. The results showed that Ascomycota was the dominant fungal phylum in Chinese Cordyceps and its soil microhabitat from different sampling regions. Among the Ascomycota, 65 genera were identified, and the abundant operational taxonomic units showed the strongest sequence similarity to Ophiocordyceps, Verticillium, Pseudallescheria, Candida and Ilyonectria. Not surprisingly, the genus Ophiocordyceps was the largest among the fungal communities identified in the fruiting bodies and external mycelial cortices of Chinese Cordyceps. In addition, fungal communities in the soil microhabitats were clustered separately from the external mycelial cortices and fruiting bodies of Chinese Cordyceps from different sampling regions. There was no significant structural difference in the fungal communities between the fruiting bodies and external mycelial cortices of Chinese Cordyceps. This study revealed an unexpectedly high diversity of fungal communities inhabiting the Chinese Cordyceps and its microhabitats.

  10. Vocal neighbour-mate discrimination in female great tits despite high song similarity

    DEFF Research Database (Denmark)

    Blumenrath, Sandra H.; Dabelsteen, Torben; Pedersen, Simon Boel

    2007-01-01

    Discrimination between conspecifics is important in mediating social interactions between several individuals in a network environment. In great tits, Parus major, females readily distinguish between the songs of their mate and those of a stranger. The high degree of song sharing among neighbouring males, however, raises the question of whether females are also able to perceive differences between songs shared by their mate and a neighbour. The great tit is a socially monogamous, hole-nesting species with biparental care. Pair bond maintenance and coordination of the pair's reproductive efforts are important, and the female's ability to recognize her mate's song should therefore be adaptive. In a neighbour-mate discrimination playback experiment, we presented 13 incubating great tit females situated inside nestboxes with a song of their mate and the same song type from a neighbour. Each female...

  11. Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

    Science.gov (United States)

    Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...

  12. Family with sequence similarity 5, member C (FAM5C) increases leukocyte adhesion molecules in vascular endothelial cells: implication in vascular inflammation.

    Directory of Open Access Journals (Sweden)

    Junya Sato

    Full Text Available Identification of the regulators of vascular inflammation is important if we are to understand the molecular mechanisms leading to atherosclerosis and consequent ischemic heart disease, including acute myocardial infarction. Gene polymorphisms in family with sequence similarity 5, member C (FAM5C) are associated with an increased risk of acute myocardial infarction, but little is known about the function of this gene product in blood vessels. Here, we report the regulation of the expression and function of FAM5C in endothelial cells. We show that FAM5C is expressed in endothelial cells in vitro and in vivo. Immunofluorescence microscopy showed localization of FAM5C in the Golgi in cultured human endothelial cells. Immunohistochemistry on serial sections of human coronary artery showed that FAM5C-positive endothelium expressed intercellular adhesion molecule-1 (ICAM-1) or vascular cell adhesion molecule-1 (VCAM-1). In cultured human endothelial cells, the overexpression of FAM5C increased reactive oxygen species (ROS) production, nuclear factor-κB (NF-κB) activity and the expression of ICAM-1, VCAM-1 and E-selectin mRNAs, resulting in enhanced monocyte adhesion. FAM5C was upregulated in response to inflammatory stimuli, such as TNF-α, in an NF-κB- and JNK-dependent manner. Knockdown of FAM5C by small interfering RNA inhibited the TNF-α-induced increase in ROS production, NF-κB activity and expression of these leukocyte adhesion molecule mRNAs, resulting in reduced monocyte adhesion. These results suggest that in endothelial cells, when FAM5C is upregulated in response to inflammatory stimuli, it increases the expression of leukocyte adhesion molecules by increasing ROS production and NF-κB activity.

  13. Development of a multilocus sequence typing tool for high-resolution genotyping of Enterocytozoon bieneusi.

    Science.gov (United States)

    Feng, Yaoyu; Li, Na; Dearen, Theresa; Lobo, Maria L; Matos, Olga; Cama, Vitaliano; Xiao, Lihua

    2011-07-01

    Thus far, genotyping of Enterocytozoon bieneusi has been based solely on DNA sequence analysis of the internal transcribed spacer (ITS) of the rRNA gene. Both host-adapted and zoonotic (human-pathogenic) genotypes of E. bieneusi have been identified. In this study, we searched for microsatellite and minisatellite sequences in the whole-genome sequence database of E. bieneusi isolate H348. Seven potential targets (MS1 to MS7) were identified. Testing of the seven targets by PCR using two human-pathogenic E. bieneusi genotypes (A and Peru10) led to the selection of four targets (MS1, MS3, MS4, and MS7). Further analysis of the four loci with an additional 24 specimens of both host-adapted and zoonotic E. bieneusi genotypes indicated that most host-adapted genotypes were not amplified by PCR targeting these loci. In contrast, 10 or 11 of the 13 specimens of the zoonotic genotypes were amplified by PCR at each locus. Altogether, 12, 8, 7, and 11 genotypes were identified at MS1, MS3, MS4, and MS7, respectively. Phylogenetic analysis of the nucleotide sequences obtained produced a genetic relationship that was similar to the one at the ITS locus, with the formation of a large group of zoonotic genotypes that included most E. bieneusi genotypes in humans. Thus, a multilocus sequence typing tool was developed for high-resolution genotyping of E. bieneusi. Data obtained in the study should also have implications for understanding the taxonomy of Enterocytozoon spp., the public health significance of E. bieneusi in animals, and the sources of human E. bieneusi infections.

  14. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Full Text Available Background: Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen-derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P... Conclusions: We have presented a simple and high throughput method of...
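
    As a rough illustration of the profile described above (the number of reads aligned to each contig of a reference metagenome), the following Python sketch counts reads per contig. The function names, the read-to-contig mapping and the optional normalisation to fractions are assumptions made for illustration; the abstract does not say how the authors stored alignments or whether counts were normalised.

```python
from collections import Counter


def contig_read_counts(read_to_contig, contigs):
    """Count aligned reads per reference contig.

    read_to_contig: dict mapping read id -> contig id, or None if unaligned
                    (hypothetical representation of an aligner's output).
    contigs:        iterable of all contig ids in the reference metagenome.
    """
    counts = Counter(c for c in read_to_contig.values() if c is not None)
    # Report every reference contig, including those with zero aligned reads.
    return {c: counts.get(c, 0) for c in contigs}


def normalise(counts):
    """Assumed extra step: convert counts to fractions of aligned reads."""
    total = sum(counts.values())
    return {c: (n / total if total else 0.0) for c, n in counts.items()}


# Toy data, not taken from the study.
alignments = {"r1": "contigA", "r2": "contigA", "r3": "contigB", "r4": None}
contigs = ["contigA", "contigB", "contigC"]
profile = contig_read_counts(alignments, contigs)
```

    Profiles built this way for different samples could then be compared with any standard similarity or correlation measure; which measure the study used is not stated in the truncated abstract.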

  15. High-throughput, high-fidelity HLA genotyping with deep sequencing.

    Science.gov (United States)

    Wang, Chunlin; Krishnakumar, Sujatha; Wilhelmy, Julie; Babrzadeh, Farbod; Stepanyan, Lilit; Su, Laura F; Levinson, Douglas; Fernandez-Viña, Marcelo A; Davis, Ronald W; Davis, Mark M; Mindrinos, Michael

    2012-05-29

    Human leukocyte antigen (HLA) genes are the most polymorphic in the human genome. They play a pivotal role in the immune response and have been implicated in numerous human pathologies, especially autoimmunity and infectious diseases. Despite their importance, however, they are rarely characterized comprehensively because of the prohibitive cost of standard technologies and the technical challenges of accurately discriminating between these highly related genes and their many alleles. Here we demonstrate a high-resolution and cost-effective methodology to type HLA genes by sequencing, which combines the advantage of long-range amplification, the power of high-throughput sequencing platforms, and a unique genotyping algorithm. We calibrated our method for HLA-A, -B, -C, and -DRB1 genes with both reference cell lines and clinical samples and identified several previously undescribed alleles with mismatches, insertions, and deletions. We have further demonstrated the utility of this method in a clinical setting by typing five clinical samples in an Illumina MiSeq instrument with a 5-d turnaround. Overall, this technology has the capacity to deliver low-cost, high-throughput, and accurate HLA typing by multiplexing thousands of samples in a single sequencing run, which will enable comprehensive disease-association studies with large cohorts. Furthermore, this approach can also be extended to include other polymorphic genes.

  16. Differences and similarities in double special educational needs: high abilities/giftedness x Asperger’s Syndrome

    Directory of Open Access Journals (Sweden)

    Nara Joyce Wellausen Vieira

    2012-08-01

    Full Text Available This study was based on a literature search of books, articles and theses published since 2000 on the themes of High Abilities/Giftedness and Asperger’s Syndrome. The objectives were to survey publications from 2000 to 2011 on the features that people with Asperger’s Syndrome and people with High Abilities/Giftedness share and those that distinguish them, and to relate the number of publications found in Education and in Special Education. The theoretical framework draws on the conceptions of High Abilities/Giftedness of Renzulli (2004) and Gardner (2000) and on the conceptions of Asperger’s Syndrome of Mello (2007) and Klin (2006). Analysis of the data revealed both similarities and differences between the behavioral characteristics of individuals with High Abilities/Giftedness and those with Asperger’s Syndrome. There is much evidence separating these two special educational needs and few similarities between them. Nevertheless, the possibility that the two special educational needs occur together should not be neglected, because few studies have examined their differences and similarities theoretically, and fewer still have investigated them in the individuals themselves.

  17. Classification of highly similar crude oils using data sets from comprehensive two-dimensional gas chromatography and multivariate techniques

    NARCIS (Netherlands)

    Mispelaar, V.G. van; Smilde, A.K.; Noord, O.E. de; Blomberg, J.; Schoenmakers, P.J.

    2005-01-01

    Comprehensive two-dimensional gas chromatography (GC × GC) has proven to be an extremely powerful separation technique for the analysis of complex volatile mixtures. This separation power can be used to discriminate between highly similar samples. In this article we will describe the use of GC × GC

  18. Similarity Solution for High Weissenberg Number Flow of Upper-Convected Maxwell Fluid on a Linearly Stretching Sheet

    OpenAIRE

    Mohamadali, Meysam; Ashrafi, Nariman

    2016-01-01

    High Weissenberg boundary layer flow of viscoelastic fluids on a stretching surface has been studied. The flow is considered to be steady, low inertial, and two-dimensional. Upon proper scaling and by means of an exact similarity transformation, the nonlinear momentum and constitutive equations of each layer transform into the respective system of highly nonlinear and coupled ordinary differential equations. Numerical solutions to the resulting boundary value problem are obtained using an eff...

  19. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    Science.gov (United States)

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing.
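
    The general idea of deriving distances from pairwise alignments only can be sketched in Python as follows. This is a minimal illustration, not the DAMBE implementation: the scoring values, the use of the simple p-distance (proportion of mismatched, ungapped columns) and all function names are assumptions. The alignment itself is a standard Needleman-Wunsch global alignment with a linear gap penalty.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns one optimal alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Proportion of mismatches among aligned columns without gaps."""
    x, y = global_align(a, b)
    cols = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not cols:
        return 1.0
    return sum(p != q for p, q in cols) / len(cols)


def distance_matrix(seqs):
    """seqs: dict name -> sequence. Each pair is aligned independently."""
    dist = {(name, name): 0.0 for name in seqs}
    for s, t in combinations(seqs, 2):
        dist[(s, t)] = dist[(t, s)] = p_distance(seqs[s], seqs[t])
    return dist
```

    The resulting matrix would then be passed to a distance-based tree-building method such as neighbor-joining; the abstract does not specify which distance measure or tree method PhyPA uses.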

  20. Human Treponema pallidum 11q/j isolate belongs to subsp. endemicum but contains two loci with a sequence in TP0548 and TP0488 similar to subsp. pertenue and subsp. pallidum, respectively.

    Directory of Open Access Journals (Sweden)

    Lenka Mikalová

    2017-03-01

    Full Text Available Treponema pallidum subsp. endemicum (TEN) is the causative agent of endemic syphilis (bejel). An unusual human TEN 11q/j isolate was obtained from a syphilis-like primary genital lesion from a patient that returned to France from Pakistan. The TEN 11q/j isolate was characterized using nested PCR followed by Sanger sequencing and/or direct Illumina sequencing. Altogether, 44 chromosomal regions were analyzed. Overall, the 11q/j isolate clustered with TEN strains Bosnia A and Iraq B as expected from previous TEN classification of the 11q/j isolate. However, the 11q/j sequence in a 505 bp-long region at the TP0488 locus was similar to Treponema pallidum subsp. pallidum (TPA) strains, but not to TEN Bosnia A and Iraq B sequences, suggesting a recombination event at this locus. Similarly, the 11q/j sequence in a 613 bp-long region at the TP0548 locus was similar to Treponema pallidum subsp. pertenue (TPE) strains, but not to TEN sequences. A detailed analysis of two recombinant loci found in the 11q/j clinical isolate revealed that the recombination event occurred just once, in the TP0488, with the donor sequence originating from a TPA strain. Since TEN Bosnia A and Iraq B were found to contain TPA-like sequences at the TP0548 locus, the recombination at TP0548 took place in a treponeme that was an ancestor to both TEN Bosnia A and Iraq B. The sequence of 11q/j isolate in TP0548 represents an ancestral TEN sequence that is similar to yaws-causing treponemes. In addition to the importance of the 11q/j isolate for reconstruction of the TEN phylogeny, this case emphasizes the possible role of TEN strains in development of syphilis-like lesions.

  1. Human Treponema pallidum 11q/j isolate belongs to subsp. endemicum but contains two loci with a sequence in TP0548 and TP0488 similar to subsp. pertenue and subsp. pallidum, respectively.

    Science.gov (United States)

    Mikalová, Lenka; Strouhal, Michal; Oppelt, Jan; Grange, Philippe Alain; Janier, Michel; Benhaddou, Nadjet; Dupin, Nicolas; Šmajs, David

    2017-03-01

    Treponema pallidum subsp. endemicum (TEN) is the causative agent of endemic syphilis (bejel). An unusual human TEN 11q/j isolate was obtained from a syphilis-like primary genital lesion from a patient that returned to France from Pakistan. The TEN 11q/j isolate was characterized using nested PCR followed by Sanger sequencing and/or direct Illumina sequencing. Altogether, 44 chromosomal regions were analyzed. Overall, the 11q/j isolate clustered with TEN strains Bosnia A and Iraq B as expected from previous TEN classification of the 11q/j isolate. However, the 11q/j sequence in a 505 bp-long region at the TP0488 locus was similar to Treponema pallidum subsp. pallidum (TPA) strains, but not to TEN Bosnia A and Iraq B sequences, suggesting a recombination event at this locus. Similarly, the 11q/j sequence in a 613 bp-long region at the TP0548 locus was similar to Treponema pallidum subsp. pertenue (TPE) strains, but not to TEN sequences. A detailed analysis of two recombinant loci found in the 11q/j clinical isolate revealed that the recombination event occurred just once, in the TP0488, with the donor sequence originating from a TPA strain. Since TEN Bosnia A and Iraq B were found to contain TPA-like sequences at the TP0548 locus, the recombination at TP0548 took place in a treponeme that was an ancestor to both TEN Bosnia A and Iraq B. The sequence of 11q/j isolate in TP0548 represents an ancestral TEN sequence that is similar to yaws-causing treponemes. In addition to the importance of the 11q/j isolate for reconstruction of the TEN phylogeny, this case emphasizes the possible role of TEN strains in development of syphilis-like lesions.

  2. High Depth, Whole-Genome Sequencing of Cholera Isolates from Haiti and the Dominican Republic

    Science.gov (United States)

    2012-09-11

    cholerae [21] and is a homolog of TagA, which has mucinase function [22]. Sequencing of additional isolates from this outbreak over time is likely to...eliminate paralogs, we required the next best hit to be less than 0.8 times as similar as the best hit. We constructed a multiple sequence alignment for
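
    The paralog filter mentioned in this excerpt (the next best hit must be less than 0.8 times as similar as the best hit) can be expressed as a short Python check. This is only a sketch of the stated rule: the function name, the input format and the handling of queries with zero or one hit are assumptions, and the excerpt does not say which similarity score was used.

```python
def passes_paralog_filter(similarities, ratio=0.8):
    """Return True if the best hit is unambiguous under the stated rule.

    similarities: similarity scores of all hits for one query
                  (hypothetical input; any consistent score would do).
    ratio:        the second-best hit must score below ratio * best hit.
    """
    ranked = sorted(similarities, reverse=True)
    if not ranked:
        return False      # assumption: no hit means the query is dropped
    if len(ranked) == 1:
        return True       # assumption: a single hit has no competing paralog
    return ranked[1] < ratio * ranked[0]
```

    For example, hits scoring 0.99 and 0.70 pass the filter (0.70 is below 0.8 × 0.99), whereas hits scoring 0.99 and 0.95 do not.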

  3. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    correctly and the final output of a typical experiment thus consists of hundreds of millions of error-containing sequence fragments. This thesis concerns the development of methods for transforming such a raw sequencing signal into a simpler representation from which biological inferences can then be made. Importantly, the fact that the fragments are short and contain errors implies that there may be significant uncertainty associated with the signal. By using probabilistic models, we are able to quantify this uncertainty and propagate it to downstream analyses. The first chapter describes a new method...

  4. Sequence-Specific Covalent Capture Coupled with High-Contrast Nanopore Detection of a Disease-Derived Nucleic Acid Sequence.

    Science.gov (United States)

    Nejad, Maryam Imani; Shi, Ruicheng; Zhang, Xinyue; Gu, Li-Qun; Gates, Kent S

    2017-07-18

    Hybridization-based methods for the detection of nucleic acid sequences are important in research and medicine. Short probes provide sequence specificity, but do not always provide a durable signal. Sequence-specific covalent crosslink formation can anchor probes to target DNA and might also provide an additional layer of target selectivity. Here, we developed a new crosslinking reaction for the covalent capture of specific nucleic acid sequences. This process involved reaction of an abasic (Ap) site in a probe strand with an adenine residue in the target strand and was used for the detection of a disease-relevant T→A mutation at position 1799 of the human BRAF kinase gene sequence. Ap-containing probes were easily prepared and displayed excellent specificity for the mutant sequence under isothermal assay conditions. It was further shown that nanopore technology provides a high-contrast (in essence, digital) signal that enables sensitive, single-molecule sensing of the cross-linked duplexes. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    Directory of Open Access Journals (Sweden)

    Seung Hak Yang

    2015-09-01

    Full Text Available The leachate generated by the decomposition of animal carcass has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site.

  6. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    Science.gov (United States)

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence construction of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.

  7. High-throughput sequencing reveals an altered T cell repertoire in X-linked agammaglobulinemia.

    Science.gov (United States)

    Ramesh, Manish; Simchoni, Noa; Hamm, David; Cunningham-Rundles, Charlotte

    2015-12-01

    To examine the T cell receptor structure in the absence of B cells, the TCR β CDR3 was sequenced from DNA of 15 X-linked agammaglobulinemia (XLA) subjects and 18 male controls, using the Illumina HiSeq platform and the ImmunoSEQ analyzer. V gene usage and the V-J combinations, derived from both productive and non-productive sequences, were significantly different between XLA samples and controls. Although the CDR3 length was similar for XLA and control samples, the CDR3 region of the XLA T cell receptor contained significantly fewer deletions and insertions in V, D, and J gene segments, differences intrinsic to the V(D)J recombination process and not due to peripheral T cell selection. XLA CDR3s demonstrated fewer charged amino acid residues, more sharing of CDR3 sequences, and almost completely lacked a population of highly modified Vβ gene segments found in control DNA, suggesting both a skewed and contracted T cell repertoire in XLA.

  8. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    Science.gov (United States)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approaches and clone library analysis, the present results suggest that Illumina sequencing dramatically accelerates the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of fungal communities different from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  9. High sequence conservation among cucumber mosaic virus isolates from Lily

    NARCIS (Netherlands)

    Chen, Y.K.; Derks, A.F.L.M.; Langeveld, S.; Goldbach, R.; Prins, M.

    2001-01-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV i

  10. Putting Physics First: Three Case Studies of High School Science Department and Course Sequence Reorganization

    Science.gov (United States)

    Larkin, Douglas B.

    2016-01-01

    This article examines the process of shifting to a "Physics First" sequence in science course offerings in three school districts in the United States. This curricular sequence reverses the more common U.S. high school sequence of biology/chemistry/physics, and has gained substantial support in the physics education community over the…

  11. Zero-field nuclear magnetic resonance in high field by modulated rf sequences.

    Science.gov (United States)

    Nishiyama, Yusuke; Yamazaki, Toshio

    2007-04-07

    The authors propose a novel approach to design and evaluate sequences for zero-field NMR spectra in high field (ZFHF) by using amplitude and phase modulated rf sequences. ZFHF provides sharp peaks for the dipolar interaction between two nuclear spins even if the orientation of the molecules is distributed. The internuclear distance r can be directly obtained from the peak position, which is proportional to r^-3. Numerous ZFHF sequences are obtained. A sequence is selected from them by the systematic evaluation of the sequences. The new ZFHF sequence is less affected by chemical shift anisotropy (CSA) than the previous sequences; the sequence can be used for systems with large CSA such as a dipolar coupled 13C-pair system under realistically high field. 13C ZFHF spectra of 13C2 diammonium succinate and 13C2 diammonium oxalate were observed under the 9.4 T field.
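
    The r^-3 dependence comes from the standard dipolar coupling constant between two spins, d = (μ0/4π)·γ1·γ2·ħ / (2π·r^3) in Hz. The Python sketch below evaluates that textbook expression for a 13C pair and inverts it to recover r. It is an illustration only: the function names are hypothetical, the constants are rounded literature values, and the factor relating the observed ZFHF peak position to d depends on the pulse sequence and is not given in the abstract.

```python
import math

MU0_OVER_4PI = 1.0e-7        # vacuum permeability / 4*pi, SI units (approximate)
HBAR = 1.054571817e-34       # reduced Planck constant, J s
GAMMA_13C = 6.728284e7       # 13C gyromagnetic ratio, rad s^-1 T^-1


def dipolar_coupling_hz(r_m, gamma1=GAMMA_13C, gamma2=GAMMA_13C):
    """Magnitude of the dipolar coupling constant in Hz for distance r_m (metres)."""
    return MU0_OVER_4PI * gamma1 * gamma2 * HBAR / (2 * math.pi * r_m ** 3)


def distance_from_coupling(d_hz, gamma1=GAMMA_13C, gamma2=GAMMA_13C):
    """Invert the r^-3 law: internuclear distance in metres from coupling in Hz."""
    return (MU0_OVER_4PI * gamma1 * gamma2 * HBAR / (2 * math.pi * d_hz)) ** (1 / 3)


# A 13C-13C pair 1.54 angstrom apart (a typical C-C single bond length,
# used here only as an example value).
d_cc = dipolar_coupling_hz(1.54e-10)
r_back = distance_from_coupling(d_cc)
```

    Because the coupling scales as r^-3, halving the distance increases it eightfold, which is why a sharp dipolar peak gives a sensitive measure of r.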

  12. Towards self-similar propagation in a dispersion tailored and highly nonlinear segmented bandgap fiber at 2.8 micron

    CERN Document Server

    Biswas, Piyali; Biswas, Abhijit; Pal, Bishnu P

    2016-01-01

    We numerically demonstrate self-similar propagation of parabolic optical pulses through a highly nonlinear and passive specialty photonic bandgap fiber at 2.8 micron. In this context, we propose a scheme with a rapidly varying but nearly mean-zero longitudinal dispersion and a modulated nonlinear profile in order to achieve self-similarity of the formed parabolic pulse propagating over longer distances. To implement the proposed scheme, we have designed a segmented bandgap fiber with suitably tapered counterparts to realize such customized dispersion with chalcogenide glass materials. A self-similar parabolic pulse with a full width at half maximum of 4.12 ps and an energy of ~39 pJ has been achieved at the output. Along with a linear chirp spanning the entire pulse duration, 3 dB spectral broadening of about 38 nm at the output is reported.

  13. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    Science.gov (United States)

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  14. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and computational protocol for detecting the reverse transcription termination sites (RTTS-Seq). This protocol was subsequently applied to hydroxyl radical footprinting of three dimensional RNA structures to give a probing signal that correlates well with the RNA backbone solvent accessibility. Moreover, we applied...

  15. 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification.

    Science.gov (United States)

    Arif, Muhammad; Basalamah, Saleh

    2013-06-01

    In real life biomedical classification applications, it is difficult to visualize the feature space due to high dimensionality of the feature space. In this paper, we have proposed 3D similarity-dissimilarity plot to project the high dimensional space to a three dimensional space in which important information about the feature space can be extracted in the context of pattern classification. In this plot it is possible to visualize good data points (data points near to their own class as compared to other classes) and bad data points (data points far away from their own class) and outlier points (data points away from both their own class and other classes). Hence separation of classes can easily be visualized. Density of the data points near each other can provide some useful information about the compactness of the clusters within certain class. Moreover, an index called percentage of data points above the similarity-dissimilarity line (PAS) is proposed which is the fraction of data points above the similarity-dissimilarity line. Several synthetic and real life biomedical datasets are used to show the effectiveness of the proposed 3D similarity-dissimilarity plot.
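
    The PAS index described above can be sketched in a few lines. The abstract does not define the two plot axes, so the code below makes a loud assumption: "similarity" is taken as the distance from a point to its nearest neighbour of the same class, "dissimilarity" as the distance to its nearest neighbour of any other class, and the similarity-dissimilarity line as the diagonal where the two are equal. The function name `pas_index` is hypothetical.

```python
import math


def pas_index(points, labels):
    """Fraction of points whose nearest other-class distance exceeds their
    nearest same-class distance (ASSUMED reading of 'above the line')."""
    above = 0
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels))
                if j != i and l == lab]
        other = [math.dist(p, q)
                 for q, l in zip(points, labels) if l != lab]
        # A point counts as "good" only if both distances are defined.
        if same and other and min(other) > min(same):
            above += 1
    return above / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
    print(pas_index(pts, ["A", "A", "B", "B"]))   # well-separated classes
```

    Under this reading, well-separated classes give a PAS close to 1, while a point lying inside the wrong cluster pulls the index down for itself and for its neighbours.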

  16. Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire

    Directory of Open Access Journals (Sweden)

    Cheng Cheng

    2011-02-01

    Background: Recent advances in massively parallel sequencing have increased the depth at which T cell receptor (TCR) repertoires can be probed by >3 log10, allowing for saturation sequencing of immune repertoires. The resolution of this sequencing is dependent on its accuracy, and direct assessments of the errors formed during high throughput repertoire analyses are limited. Results: We analyzed 3 monoclonal TCR from TCR transgenic, Rag-/- mice using Illumina® sequencing. A total of 27 sequencing reactions were performed for each TCR using a trifurcating design in which samples were divided into 3 at significant processing junctures. More than 20 million complementarity determining region (CDR) 3 sequences were analyzed. Filtering for lower quality sequences diminished but did not eliminate sequence errors, which occurred within 1-6% of sequences. Erroneous sequences were predominantly of correct length and contained single nucleotide substitutions. Rates of specific substitutions varied dramatically in a position-dependent manner. Four substitutions, all purine-pyrimidine transversions, predominated. Solid phase amplification and sequencing rather than liquid sample amplification and preparation appeared to be the primary sources of error. Analysis of polyclonal repertoires demonstrated the impact of error accumulation on data parameters. Conclusions: Caution is needed in interpreting repertoire data due to potential contamination with mis-sequence reads. However, a high association of errors with phred score, high relatedness of erroneous sequences with the parental sequence, dominance of specific nt substitutions, and skewed ratio of forward to reverse reads among erroneous sequences indicate approaches to filter erroneous sequences from repertoire data sets.
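
    Since the abstract points to phred score as one handle for filtering erroneous reads, a minimal sketch of how phred scores map to error probabilities may help. The relation Q = -10 * log10(p) is the standard definition; everything else here (the function names, the Phred+33 encoding assumption, and the expected-error threshold) is illustrative and is not the filter used by the authors.

```python
def phred_to_error_prob(q):
    """Standard phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, ASSUMING the common Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def expected_errors(quality_string):
    """Sum of per-base error probabilities, i.e. expected miscalls in the read."""
    return sum(phred_to_error_prob(q) for q in decode_phred33(quality_string))


def passes_filter(quality_string, max_expected_errors=0.5):
    """Hypothetical filter: keep reads with few expected miscalled bases."""
    return expected_errors(quality_string) <= max_expected_errors


if __name__ == "__main__":
    print(phred_to_error_prob(30))        # one miscall per thousand bases
    print(passes_filter("IIIIIIIIII"))    # ten bases at Q40
```

    As the abstract notes, such filtering reduces but does not eliminate errors, so it would normally be combined with other signals such as relatedness to a dominant parental sequence.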

  17. Using the synergism strategy for highly sensitive and specific electrochemical sensing of Streptococcus pneumoniae Lyt-1 gene sequence.

    Science.gov (United States)

    Li, Fengqin; Yu, Zhigang; Xu, Yanmei; Ma, Huiyuan; Zhang, Guiling; Song, Yongbin; Yan, Hong; He, Xunjun

    2015-07-30

    With the help of the interaction mode of capture probe-target-signal probe (CP-T-SP), an electrochemical sensing method based on the synergism strategy of dual-hybridized signaling probes modified with 6 MB (methylene blue), background suppression and large surface area Au electrode is developed for the detection of the Streptococcus pneumoniae (S. pneumoniae) Lyt-1 gene sequence. The proposed sensor features a very low limit of detection (LOD) of ∼0.5 fM for the target. This method also exhibits high versatility and can be applied to the construction of other sensors for the analysis of similar designated pathogenic bacteria gene sequences (PBGS).

  18. Manufacturing of High-Strength and High-Ductility Pearlitic Steel Wires Using Noncircular Drawing Sequence

    Energy Technology Data Exchange (ETDEWEB)

    Baek, Hyun Moo; Joo, Ho Seon; Im, Yong-Taek [KAIST, Daejeon (Korea, Republic of); Hwang, Sun Kwang [KITECH, Cheonan (Korea, Republic of); Son, Il-Heon; Bae, Chul Min [POSCO, Pohang (Korea, Republic of)

    2014-07-15

    In this study, a noncircular drawing (NCD) sequence for manufacturing high-strength and high-ductility pearlitic steel wires was investigated. Multipass NCD was conducted up to the 12th pass at room temperature with two processing routes (defined as the NCDA and NCDB), and compared with the wire drawing (WD). During the torsion test, delamination fracture in the drawn wire was observed in the 10th pass of the WD whereas it was not observed until the 12th pass of the NCDB. From X-ray diffraction, the circular texture component that increases the likelihood of delamination fracture of the drawn wire was rarely observed in the NCDB. Thus, the improved ability of the multipass NCDB to manufacture high-strength pearlitic steel wires with high torsional ductility compared to the WD (by reducing the likelihood of delamination fracture) was demonstrated.

  19. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  20. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers

    Science.gov (United States)

    Bessaud, Maël; Sadeuh-Mba, Serge A.; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the three serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide-range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004

  1. Two highly similar LAEDDTNAQKT and LTDKIGTEI epitopes in G glycoprotein may be useful for effective epitope based vaccine design against pathogenic Henipavirus.

    Science.gov (United States)

    Parvege, Md Masud; Rahman, Monzilur; Nibir, Yead Morshed; Hossain, Mohammad Shahnoor

    2016-04-01

    Nipah virus and Hendra virus, two members of the genus Henipavirus, are newly emerging zoonotic pathogens which cause acute respiratory illness and severe encephalitis in humans. The lack of effective antiviral therapy underscores the urgency of developing a vaccine against these deadly viruses. In this study, we employed various computational approaches to identify epitopes which have the potential for vaccine development. By analyzing the immune parameters of the conserved sequences of G glycoprotein using various databases and bioinformatics tools, we identified two potential epitopes which may be used as peptide vaccines. Using different B cell epitope prediction servers, four highly similar B cell epitopes were identified. Immunoinformatics analyses revealed that LAEDDTNAQKT is a highly flexible B-cell epitope that is accessible to antibody. Highly similar putative CTL epitopes were analyzed for their binding with the HLA-C 12*03 molecule. Docking simulation assay revealed that LTDKIGTEI has significantly lower binding energy, which bolstered its potential for epitope-based vaccine design. Finally, cytotoxicity analysis also supported their potential as promising epitope-based vaccine candidates. In sum, our computational analysis indicates that either the LAEDDTNAQKT or the LTDKIGTEI epitope holds promise for the development of a universal vaccine against all kinds of pathogenic Henipavirus. Further in vivo and in vitro studies are necessary to validate the obtained findings.

  2. Web Similarity

    NARCIS (Netherlands)

    Cohen, A.R.; Vitányi, P.M.B.

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale fr
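
    For a pair of search terms, the distance described above reduces to the well-known normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f gives aggregate page counts and N is the number of indexed pages. The sketch below implements only this pairwise case; the set-based NWD of the record above is not reproduced, and the page counts in the usage example are made up for illustration.

```python
import math


def normalized_web_distance_pair(fx, fy, fxy, n_pages):
    """Pairwise normalized distance from page counts.

    fx, fy   -- pages containing term x, term y
    fxy      -- pages containing both terms
    n_pages  -- total number of indexed pages (normalising constant)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur on at least one page")
    if fxy <= 0:
        return math.inf          # terms never co-occur
    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(n_pages) - min(log_fx, log_fy)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, not real search-engine results.
    print(normalized_web_distance_pair(1000, 1000, 10, 10 ** 6))
```

    Terms that always co-occur get distance 0, and the value grows as co-occurrence becomes rarer relative to the individual counts.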

  3. High-functioning autism patients share similar but more severe impairments in verbal theory of mind than schizophrenia patients.

    Science.gov (United States)

    Tin, L N W; Lui, S S Y; Ho, K K Y; Hung, K S Y; Wang, Y; Yeung, H K H; Wong, T Y; Lam, S M; Chan, R C K; Cheung, E F C

    2017-09-18

    Evidence suggests that autism and schizophrenia share similarities in genetic, neuropsychological and behavioural aspects. Although both disorders are associated with theory of mind (ToM) impairments, a few studies have directly compared ToM between autism patients and schizophrenia patients. This study aimed to investigate to what extent high-functioning autism patients and schizophrenia patients share and differ in ToM performance. Thirty high-functioning autism patients, 30 schizophrenia patients and 30 healthy individuals were recruited. Participants were matched in age, gender and estimated intelligence quotient. The verbal-based Faux Pas Task and the visual-based Yoni Task were utilised to examine first- and higher-order, affective and cognitive ToM. The task/item difficulty of two paradigms was examined using mixed model analyses of variance (ANOVAs). Multiple ANOVAs and mixed model ANOVAs were used to examine group differences in ToM. The Faux Pas Task was more difficult than the Yoni Task. High-functioning autism patients showed more severely impaired verbal-based ToM in the Faux Pas Task, but shared similar visual-based ToM impairments in the Yoni Task with schizophrenia patients. The findings that individuals with high-functioning autism shared similar but more severe impairments in verbal ToM than individuals with schizophrenia support the autism-schizophrenia continuum. The finding that verbal-based but not visual-based ToM was more impaired in high-functioning autism patients than schizophrenia patients could be attributable to the varied task/item difficulty between the two paradigms.

  4. High resolution MR angiography with rephasing and dephasing sequences for selective vascular imaging of arteries

    Energy Technology Data Exchange (ETDEWEB)

    Seiderer, M.; Laub, G.; Staebler, A.; Yousry, P.; Lauterjung, L.

    1988-03-01

    With rephasing and dephasing sequences the vascular system is imaged with high or low signal intensity whereas stationary tissue is imaged with identical signal intensity. With images recorded in systole and diastole followed by image subtraction separate imaging of arteries or veins without background superposition is possible. 13 patients with vascular lesions of the lower extremities and 7 volunteers were examined. Vascular stenosis, aneurysm, dilatation, occlusion and collateral vessels could be imaged similar to digital subtraction angiography. Vessels with a diameter down to 1 mm could be imaged. The large slice thickness up to 80 mm results in projection type images where the vascular tree is imaged over the whole field of view and without partial volume effects.

  5. High-resolution Imaging of PHIBSS z ~ 2 Main-sequence Galaxies in CO J = 1 → 0

    Science.gov (United States)

    Bolatto, A. D.; Warren, S. R.; Leroy, A. K.; Tacconi, L. J.; Bouché, N.; Förster Schreiber, N. M.; Genzel, R.; Cooper, M. C.; Fisher, D. B.; Combes, F.; García-Burillo, S.; Burkert, A.; Bournaud, F.; Weiss, A.; Saintonge, A.; Wuyts, S.; Sternberg, A.

    2015-08-01

    We present Karl Jansky Very Large Array observations of the CO J=1-0 transition in a sample of four z ~ 2 main-sequence galaxies. These galaxies are in the blue sequence of star-forming galaxies at their redshift, and are part of the IRAM Plateau de Bure HIgh-z Blue Sequence Survey which imaged them in CO J=3-2. Two galaxies are imaged here at high signal-to-noise, allowing determinations of their disk sizes, line profiles, molecular surface densities, and excitation. Using these and published measurements, we show that the CO and optical disks have similar sizes in main-sequence galaxies, and in the galaxy where we can compare CO J=1-0 and J=3-2 sizes we find these are also very similar. Assuming a Galactic CO-to-H2 conversion, we measure surface densities of Σ_mol ~ 1200 M⊙ pc^-2 in projection and estimate Σ_mol ~ 500-900 M⊙ pc^-2 deprojected. Finally, our data yield velocity-integrated Rayleigh-Jeans brightness temperature line ratios r31 that are approximately unity. In addition to the similar disk sizes, the very similar line profiles in J=1-0 and J=3-2 indicate that both transitions sample the same kinematics, implying that their emission is coextensive. We conclude that in these two main-sequence galaxies there is no evidence for significant excitation gradients or a large molecular reservoir that is diffuse or cold and not involved in active star formation. We suggest that r31 in very actively star-forming galaxies is likely an indicator of how well-mixed the star formation activity and the molecular reservoir are.

  6. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which result from root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  7. Megraft: a software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes and similar environmental datasets.

    Science.gov (United States)

    Bengtsson, Johan; Hartmann, Martin; Unterseher, Martin; Vaishampayan, Parag; Abarenkov, Kessy; Durso, Lisa; Bik, Elisabeth M; Garey, James R; Eriksson, K Martin; Nilsson, R Henrik

    2012-07-01

    Metagenomic libraries represent subsamples of the total DNA found at a study site and offer unprecedented opportunities to study ecological and functional aspects of microbial communities. To examine the depth of a community sequencing effort, rarefaction analysis of the ribosomal small subunit (SSU/16S/18S) gene in the metagenome is usually performed. The fragmentary, non-overlapping nature of SSU sequences in metagenomic libraries poses a problem for this analysis, however. We introduce a software package - Megraft - that grafts SSU fragments onto full-length SSU sequences, accounting for observed and unobserved variability, for accurate assessment of species richness and sequencing depth in metagenomics endeavors.
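
    The rarefaction analysis mentioned above can be sketched with the classical analytical formula for the expected number of taxa in a random subsample of n sequences: E[S_n] = sum over taxa i of (1 - C(N - N_i, n) / C(N, n)), where N_i is the count of taxon i and N the total. This illustrates only the rarefaction step under the assumption that sequences have already been assigned to taxa; it does not reproduce Megraft's grafting of fragments onto full-length sequences, and the counts in the usage example are invented.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of taxa observed in a subsample of n sequences
    drawn without replacement from a community with the given counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    all_draws = comb(total, n)
    # For each taxon, 1 minus the probability that the subsample misses it entirely.
    return sum(1 - comb(total - c, n) / all_draws for c in counts)


def rarefaction_curve(counts, step=1):
    """List of (subsample size, expected richness) pairs."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]


if __name__ == "__main__":
    community = [50, 30, 10, 5, 3, 1, 1]      # hypothetical taxon counts
    for n, s in rarefaction_curve(community, step=20):
        print(n, round(s, 2))
```

    A curve that is still rising steeply at the full sample size suggests that sequencing depth has not yet captured most of the community's richness.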

  8. Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing.

    Science.gov (United States)

    Russo, Giancarlo; Patrignani, Andrea; Poveda, Lucy; Hoehn, Frederic; Scholtka, Bettina; Schlapbach, Ralph; Garvin, Alex M

    2015-12-01

    Colorectal cancer (CRC) represents one of the most prevalent and lethal malignant neoplasms and every individual of age 50 and above should undergo regular CRC screening. Currently, the most effective preventive screening procedure to detect adenomatous polyps, the precursors to CRC, is colonoscopy. Since every colorectal cancer starts as a polyp, detecting all polyps and removing them is crucial. By doing exactly that, colonoscopy reduces CRC incidence by 80%; however, it is an invasive procedure that might have unpleasant and, on rare occasions, dangerous side effects. Despite numerous efforts over the past two decades, a non-invasive screening method for the general population with detection rates for adenomas and CRC similar to that of colonoscopy has not yet been established. Recent advances in next generation sequencing technologies have yet to be successfully applied to this problem, because the detection of rare mutations has been hindered by the systematic biases due to sequencing context and the base calling quality of NGS. We present the first study that applies the high read accuracy and depth of single molecule, real time, circular consensus sequencing (SMRT-CCS) to the detection of mutations in stool DNA in order to provide a non-invasive, sensitive and accurate test for CRC. In stool DNA isolated from patients diagnosed with adenocarcinoma, we are able to detect mutations at frequencies below 0.5% with no false positives. This approach establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and the early stage adenomas that lead to CRC.

  9. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    Science.gov (United States)

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  10. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications.

    Science.gov (United States)

    Gu, W; Crawford, E D; O'Donovan, B D; Wilson, M R; Chow, E D; Retallack, H; DeRisi, J L

    2016-03-04

    Next-generation sequencing has generated a need for a broadly applicable method to remove unwanted high-abundance species prior to sequencing. We introduce DASH (Depletion of Abundant Sequences by Hybridization). Sequencing libraries are 'DASHed' with recombinant Cas9 protein complexed with a library of guide RNAs targeting unwanted species for cleavage, thus preventing them from consuming sequencing space. We demonstrate a more than 99 % reduction of mitochondrial rRNA in HeLa cells, and enrichment of pathogen sequences in patient samples. We also demonstrate an application of DASH in cancer. This simple method can be adapted for any sample type and increases sequencing yield without additional cost.

  11. A SNP based high-density linkage map of Apis cerana reveals a high recombination rate similar to Apis mellifera.

    Directory of Open Access Journals (Sweden)

    Yuan Yuan Shi

    BACKGROUND: The Eastern honey bee, Apis cerana Fabricius, is distributed in southern and eastern Asia, from India and China to Korea and Japan and southeast to the Moluccas. Besides Apis mellifera, this species is also widely kept for honey production. Apis cerana is also a model organism for studying social behavior, caste determination, mating biology, sexual selection, and host-parasite interactions. Few resources are available for molecular research in this species, and a linkage map has never been constructed. A linkage map is a prerequisite for quantitative trait loci mapping and for analyzing genome structure. We used the Chinese honey bee, Apis cerana cerana, to construct the first linkage map in the Eastern honey bee. RESULTS: F2 workers (N = 103) were genotyped for 126,990 single nucleotide polymorphisms (SNPs). After filtering out low-quality SNPs and those not passing the Mendel test, we obtained 3,000 SNPs; 1,535 of these were informative and were used to construct a linkage map. The preliminary map contains 19 linkage groups; we then mapped the 19 linkage groups to 16 chromosomes by comparing the markers to the genome of A. mellifera. The final map contains 16 linkage groups with a total of 1,535 markers. The total genetic distance is 3,942.7 centimorgans (cM), with the largest linkage group (180 loci) measuring 574.5 cM. The average marker interval for all markers across the 16 linkage groups is 2.6 cM. CONCLUSION: We constructed a high density linkage map for A. c. cerana with 1,535 markers. Because the map is based on SNP markers, it will enable easier and faster genotyping assays than randomly amplified polymorphic DNA or microsatellite based maps used in A. mellifera.
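
    The reported average marker interval can be checked from the other figures in the abstract. The short calculation below assumes the usual convention that a linkage group with k markers has k - 1 intervals, so 1,535 markers in 16 groups give 1,519 intervals; the abstract does not state which convention the authors used, and dividing by the raw marker count rounds to the same value.

```python
TOTAL_MAP_LENGTH_CM = 3942.7   # total genetic distance reported in the abstract
N_MARKERS = 1535               # informative SNP markers on the final map
N_LINKAGE_GROUPS = 16          # linkage groups on the final map

# ASSUMED convention: each linkage group contributes (markers - 1) intervals.
n_intervals = N_MARKERS - N_LINKAGE_GROUPS
avg_interval_cm = TOTAL_MAP_LENGTH_CM / n_intervals

# Alternative convention: divide by the marker count directly.
avg_per_marker_cm = TOTAL_MAP_LENGTH_CM / N_MARKERS

print(f"{n_intervals} intervals, {avg_interval_cm:.2f} cM per interval")
print(f"{avg_per_marker_cm:.2f} cM per marker")
```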

  12. Very high resolution single pass HLA genotyping using amplicon sequencing on the 454 next generation DNA sequencers: Comparison with Sanger sequencing.

    Science.gov (United States)

    Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L

    2015-12-01

    Compared to Sanger sequencing, next-generation sequencing offers advantages for high resolution HLA genotyping including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well characterized cell lines using Conexio Assign MPS software which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7% with 58.2% as unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. Copyright © 2015. Published by Elsevier Inc.

  13. Pressure ratio effects on self-similar scalar mixing of high-pressure turbulent jets in a pressurized volume

    Science.gov (United States)

    Ruggles, Adam; Pickett, Lyle; Frank, Jonathan

    2014-11-01

    Many real world combustion devices model fuel scalar mixing by assuming the self-similar argument established in atmospheric free jets. This allows simple prediction of the mean and rms fuel scalar fields to describe the mixing. This approach has been adopted in supercritical liquid injections found in diesel engines where the liquid behaves as a dense fluid. The effect of pressure ratio (injection to ambient) upon the self-similar collapse when the ambient is greater than atmospheric pressure has not been well characterized, particularly the effect upon mixing constants, jet spreading rates, and virtual origins. Changes in these self-similar parameters control the reproduction of the scalar mixing statistics. This experiment investigates the steady state mixing of high pressure ethylene jets in a pressurized pure nitrogen environment for various pressure ratios and jet orifice diameters. Quantitative laser Rayleigh scattering imaging was performed utilizing a calibration procedure to account for the pressure effects upon scattering interference within the high-pressure vessel.

  14. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    Directory of Open Access Journals (Sweden)

    Elena Marmesat

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries, which eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They scored significantly more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications.

  15. STAMP: Extensions to the STADEN sequence analysis package for high throughput interactive microsatellite marker design.

    Science.gov (United States)

    Kraemer, Lars; Beszteri, Bánk; Gäbler-Schwarz, Steffi; Held, Christoph; Leese, Florian; Mayer, Christoph; Pöhlmann, Kevin; Frickenhaus, Stephan

    2009-01-30

    Microsatellites (MSs) are DNA markers with high analytical power, which are widely used in population genetics, genetic mapping, and forensic studies. Currently available software solutions for high-throughput MS design (i) have shortcomings in detecting and distinguishing imperfect and perfect MSs, (ii) lack often necessary interactive design steps, and (iii) do not allow for the development of primers for multiplex amplifications. We present a set of new tools implemented as extensions to the STADEN package, which provides the backbone functionality for flexible sequence analysis workflows. The possibility to assemble overlapping reads into unique contigs (provided by the base functionality of the STADEN package) is important to avoid developing redundant markers, a feature missing from most other similar tools. Our extensions to the STADEN package provide the following functionality to facilitate microsatellite (and also minisatellite) marker design: The new modules (i) integrate the state-of-the-art tandem repeat detection and analysis software PHOBOS into workflows, (ii) provide two separate repeat detection steps - with different search criteria - one for masking repetitive regions during assembly of sequencing reads and the other for designing repeat-flanking primers for MS candidate loci, (iii) incorporate the widely used primer design program PRIMER3 into STADEN workflows, enabling the interactive design and visualization of flanking primers for microsatellites, and (iv) provide the functionality to find optimal locus- and primer pair combinations for multiplex primer design. Furthermore, our extensions include a module for storing analysis results in an SQLite database, providing a transparent solution for data access from within as well as from outside of the STADEN Package. The STADEN package is enhanced by our modules into a highly flexible, high-throughput, interactive tool for conventional and multiplex microsatellite marker design. 
It gives the user...

  16. Fusion protein gene nucleotide sequence similarities, shared antigenic sites and phylogenetic analysis suggest that phocid distemper virus 2 and canine distemper virus belong to the same virus entity.

    NARCIS (Netherlands)

    I.K.G. Visser (Ilona); R.W.J. van der Heijden (Roger); M.W.G. van de Bildt (Marco); M.J.H. Kenter (Marcel); C. Örvell; A.D.M.E. Osterhaus (Albert)

    1993-01-01

    Nucleotide sequencing of the fusion protein (F) gene of phocid distemper virus-2 (PDV-2), recently isolated from Baikal seals (Phoca sibirica), revealed an open reading frame (nucleotides 84 to 2075) with two potential in-frame ATG translation initiation codons. We suggest that the secon

  18. RNA recombination in Porcine Reproductive and Respiratory Syndrome Virus is restricted to parental sequences with high similarity

    DEFF Research Database (Denmark)

    Vugt, J.J.F.A. van; Storgaard, T.; Oleksiewicz, M. B.

    2001-01-01

    Two types of porcine reproductive and respiratory syndrome virus (PRRSV) exist, a North American type and a European type. The co-existence of both types in some countries, such as Denmark, Slovakia and Canada, creates a risk of inter-type recombination. To evaluate this risk, cell cultures were co......, but no recombination was detected between the European and North American types. Calculation of the maximum theoretical risk of European–American recombination, based on the sensitivity of the RT–PCR system, revealed that RNA recombination between the European and North American types of PRRSV is at least 10000 times...

  19. The proximal first exon architecture of the murine ghrelin gene is highly similar to its human orthologue

    Directory of Open Access Journals (Sweden)

    Seim Inge

    2009-05-01

    Full Text Available Abstract Background The murine ghrelin gene (Ghrl), originally sequenced from stomach tissue, contains five exons and a single transcription start site in a short, 19 bp first exon (exon 0). We recently isolated several novel first exons of the human ghrelin gene and found evidence of a complex transcriptional repertoire. In this report, we examined the 5' exons of the murine ghrelin orthologue in a range of tissues using 5' RACE. Findings 5' RACE revealed two transcription start sites (TSSs) in exon 0 and four TSSs in intron 0, which correspond to 5' extensions of exon 1. Using quantitative, real-time RT-PCR (qRT-PCR), we demonstrated that extended exon 1-containing Ghrl transcripts are largely confined to the spleen, adrenal gland, stomach, and skin. Conclusion We demonstrate that multiple transcription start sites are present in exon 0 and an extended exon 1 of the murine ghrelin gene, similar to the proximal first exon organisation of its human orthologue. The identification of several transcription start sites in intron 0 of mouse ghrelin (resulting in an extension of exon 1) raises the possibility that developmental-, cell- and tissue-specific Ghrl mRNA species are created by employing alternative promoters, and further studies of the murine ghrelin gene are warranted.

  1. Judgments of brand similarity

    NARCIS (Netherlands)

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    1998-01-01

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the formatio

  2. Gait in ducks (Anas platyrhynchos) and chickens (Gallus gallus) – similarities in adaptation to high growth rate

    Directory of Open Access Journals (Sweden)

    B. M. Duggan

    2016-08-01

    Full Text Available Genetic selection for increased growth rate and muscle mass in broiler chickens has been accompanied by mobility issues and poor gait. There are concerns that the Pekin duck, which is on a similar selection trajectory (for production traits) to the broiler chicken, may encounter gait problems in the future. In order to understand how gait has been altered by selection, the walking ability of divergent lines of high- and low-growth chickens and ducks was objectively measured using a pressure platform, which recorded various components of their gait. In both species, lines which had been selected for large breast muscle mass moved at a slower velocity and with a greater step width than their lighter conspecifics. These high-growth lines also spent more time supported by two feet in order to improve balance when compared with their lighter, low-growth conspecifics. We demonstrate that chicken and duck lines which have been subjected to intense selection for high growth rates and meat yields have adapted their gait in similar ways. A greater understanding of which components of gait have been altered in selected lines with impaired walking ability may lead to more effective breeding strategies to improve gait in poultry.

  3. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA.

    Directory of Open Access Journals (Sweden)

    Jesper Buchhave Poulsen

    Full Text Available Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject we analysed a neonatal DBS sample and corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2x3.2 mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB_ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2x1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity, the concordance rate. Concordance rates were slightly lower when comparing DBS vs WB sample types than for any two WB sample types of the same subject before filtering of the variant calls. The overall concordance rates were dependent on the variant type, with SNPs performing best. Post-filtering, the comparisons of DBS vs WB and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. The wgaDNA performed similarly to matched high-quality reference whole-blood DNA, based on concordance rates calculated from variant calls. No differences were observed substituting 2x3.2 with 2x1.6 mm discs, allowing for additional reduction of sample material in future projects.
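
    The pairwise comparison of variant calls described in this record can be illustrated with a minimal sketch. The abstract does not give the formula behind its concordance rate, so the definition used here (shared calls divided by all calls seen in either sample) is an assumption, as are the function name and the toy variant tuples.

```python
from typing import Iterable, Set, Tuple

# A variant call is modelled as (chromosome, position, ref allele, alt allele).
# This representation is an illustrative assumption, not the study's data format.
Variant = Tuple[str, int, str, str]


def concordance_rate(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Fraction of variants called in either sample that are called in both."""
    set_a: Set[Variant] = set(calls_a)
    set_b: Set[Variant] = set(calls_b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty call sets count as fully concordant
    return len(set_a & set_b) / len(union)


# Toy call sets standing in for a DBS sample and its whole-blood reference.
dbs_calls = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
wb_calls = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 900, "T", "C")]

print(f"concordance: {concordance_rate(dbs_calls, wb_calls):.2f}")
```

    Computing the rate separately per variant type (for example SNPs versus indels) would be one way to reproduce the kind of breakdown the abstract reports.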

  4. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    Science.gov (United States)

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.

  5. Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species.

    Science.gov (United States)

    Kuipers, A G J; Kamstra, S A; de Jeu, M J; Visser, R G F

    2002-01-01

    Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragments with sizes varying from 68-127 bp, and constituted a larger HinfI repeat of approximately 400 bp. Southern hybridization showed a similar molecular organization of the tandem repeats in each of the Brazilian Alstroemeria species tested. None of the repeats hybridized with DNA from Chilean Alstroemeria species, which indicates that they are specific for the Brazilian species. In-situ localization studies revealed the tandem repeats to be localized in clusters on the chromosomes of A. inodora and A. psittacina: distal hybridization sites were found on chromosome arms 2PS, 6PL, 7PS, 7PL and 8PL, interstitial sites on chromosome arms 2PL, 3PL, 4PL and 5PL. The applicability of the tandem repeats for cytogenetic analysis of interspecific hybrids and their role in heterochromatin organization are discussed.

  6. High Throughput Sequencing of T Cell Antigen Receptors Reveals a Conserved TCR Repertoire

    Science.gov (United States)

    Hou, Xianliang; Lu, Chong; Chen, Sisi; Xie, Qian; Cui, Guangying; Chen, Jianing; Chen, Zhi; Wu, Zhongwen; Ding, Yulong; Ye, Ping; Dai, Yong; Diao, Hongyan

    2016-01-01

    Abstract The T-cell receptor (TCR) repertoire is a mirror of the human immune system that reflects processes caused by infections, cancer, autoimmunity, and aging. Next-generation sequencing has become a powerful tool for deep TCR profiling. Herein, we used this technology to study the repertoire features of TCR beta chain in the blood of healthy individuals. Peripheral blood samples were collected from 10 healthy donors. T cells were isolated with anti-human CD3 magnetic beads according to the manufacturer's protocol. We then combined multiplex-PCR, Illumina sequencing, and IMGT/High V-QUEST to analyze the characteristics and polymorphisms of the TCR. Most of the individual T cell clones were present at very low frequencies, suggesting that they had not undergone clonal expansion. The usage frequencies of the TCR beta variable, beta joining, and beta diversity gene segments were similar among T cells from different individuals. Notably, the usage frequency of individual nucleotides and amino acids within complementarity-determining region 3 (CDR3) intervals was remarkably consistent between individuals. Moreover, our data show that terminal deoxynucleotidyl transferase activity was biased toward the insertion of G (31.92%) and C (27.14%) over A (21.82%) and T (19.12%) nucleotides. Some conserved features could be observed in the composition of CDR3, which may inform future studies of human TCR gene recombination. PMID:26962778

  7. Bacterioplankton community analysis in tilapia ponds by Illumina high-throughput sequencing.

    Science.gov (United States)

    Fan, Li Min; Barry, Kamira; Hu, Geng Dong; Meng, Shun long; Song, Chao; Wu, Wei; Chen, Jia Zhang; Xu, Pao

    2016-01-01

    The changes in the microbial community of aquaculture systems under the effects of stocking density and seasonality were investigated in tilapia ponds. Total DNA was extracted from the water samples, the 16S rRNA gene was amplified, and the bacterial community was analyzed by Illumina high-throughput sequencing, yielding 3486 OTUs from a total of 715,842 sequence reads. Based on the analysis of bacterial composition, richness, diversity, bacterial 16S rRNA gene abundance, water sample comparisons and the presence of specific bacterial taxa within three fish ponds over a 4-month period, the study observed that the dominant phyla were similar in all water samples and included Proteobacteria, Cyanobacteria, Bacteroidetes, Actinobacteria, Planctomycetes and Chlorobi, distributed in different proportions in the different months and ponds. The seasonal changes had a more pronounced effect on the bacterioplankton community than the stocking densities; however, some differences between the ponds were more likely caused by feed coefficient than by stocking densities. At the same time, most bacterial communities were affected by the nutrient input, except phylum Cyanobacteria, which was also affected by the feed control of tilapia.

  8. Sequence analyses of ITS2 and CO1 genes of Paragonimus proliferus obtained in Yunnan province, China and their similarities with those of P. hokuoensis.

    Science.gov (United States)

    Zhou, Ben-Jiang; Yang, Bin-Bin; Doanh, Pham Ngoc; Yang, Zhao-Qing; Xiang, Zheng; Li, Cui-Ying; Shinohara, Akio; Horii, Yoichiro; Nawa, Yukifumi

    2008-05-01

    Among about 50 Paragonimus species, Paragonimus proliferus is a rare species characterized by extremely large metacercariae, most of which are present excysted in the crab hosts. Recently, we discovered this species in northern Vietnam, the first record outside of China. DNA sequences of both the second internal transcribed spacer region (ITS2) and cytochrome oxidase subunit 1 (CO1) genes of the metacercariae and adult worms of the Vietnamese isolates of P. proliferus were identical with those of Paragonimus hokuoensis in the GenBank DNA database. To confirm those observations and to clarify the molecular phylogenetic status of P. proliferus, we determined the ITS2 and CO1 sequences of the metacercariae of P. proliferus obtained in Yunnan province, China, where the original specimen was discovered. The results show that both ITS2 and CO1 sequences of the Chinese isolates of P. proliferus are identical with those of the Vietnamese isolates and are also identical with those of P. hokuoensis that appeared in the DNA database (obtained in Yunnan province), suggesting the synonymy of P. hokuoensis with P. proliferus. By phylogenetic tree analyses, all samples of P. proliferus from China and Vietnam together with P. hokuoensis formed a distinct group within, or very close to, the Paragonimus skrjabini complex in both trees.

  9. Recent Progress Using High-throughput Sequencing Technologies in Plant Molecular Breeding

    Institute of Scientific and Technical Information of China (English)

    Qiang Gao; Guidong Yue; Wenqi Li; Junyi Wang; Jiaohui Xu; Ye Yin

    2012-01-01

    High-throughput sequencing is a revolutionary technological innovation in DNA sequencing. This technology has an ultra-low cost per base of sequencing and an overwhelmingly high data output. High-throughput sequencing has brought novel research methods and solutions to the research fields of genomics and post-genomics. Furthermore, this technology is leading to a new molecular breeding revolution that has landmark significance for scientific research and enables us to launch multi-level, multifaceted, and multi-extent studies in the fields of crop genetics, genomics, and crop breeding. In this paper, we review progress in the application of high-throughput sequencing technologies to plant molecular breeding studies.

  10. High frequency of HMW-GS sequence variation through somatic hybridization between Agropyron elongatum and common wheat.

    Science.gov (United States)

    Gao, Xin; Liu, Shu Wei; Sun, Qun; Xia, Guang Min

    2010-01-01

    A symmetric somatic hybridization was performed to combine the protoplasts of tall wheatgrass (Agropyron elongatum) and bread wheat (Triticum aestivum). Fertile regenerants were obtained which were morphologically similar to tall wheatgrass, but which contained some introgression segments from wheat. An SDS-PAGE analysis showed that a number of non-parental high-molecular weight glutenin subunits (HMW-GS) were present in the symmetric somatic hybridization derivatives. These sequences were amplified, cloned and sequenced, to deliver 14 distinct HMW-GS coding sequences, eight of which were of the y-type (Hy1-Hy8) and six of the x-type (Hx1-Hx6). Five of the cloned HMW-GS sequences were successfully expressed in E. coli. The analysis of their deduced peptide sequences showed that they all possessed the typical HMW-GS primary structure. Sequence alignments indicated that Hx5 and Hy1 were probably derived from the tall wheatgrass genes Aex5 and Aey6, while Hy2, Hy3, Hx1 and Hy6 may have resulted from slippage in the replication of a related biparental gene. We found that both symmetric and asymmetric somatic hybridization could promote the emergence of novel alleles. We discuss the origin of allelic variation of HMW-GS genes in somatic hybridization, which might result from the response to genomic shock triggered by the merger and interaction of the biparental genomes.

  11. Sources of PCR-induced distortions in high-throughput sequencing data sets

    Science.gov (United States)

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
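
    The role of stochasticity described in this record can be illustrated with a toy branching-process simulation. This is a minimal sketch and not the quantitative model developed by the authors: it assumes that every molecule is copied independently with a fixed per-cycle probability, and the function name, the efficiency of 0.8 and the cycle count are hypothetical choices.

```python
import random
import statistics


def simulate_pcr(start_copies: int, cycles: int, efficiency: float, rng: random.Random) -> int:
    """Amplify molecules; each one is duplicated with probability `efficiency` per cycle."""
    copies = start_copies
    for _ in range(cycles):
        # Count how many of the current molecules are successfully copied this cycle.
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 500 unique amplicons, each starting from a single molecule (assumed low-input library).
finals = [simulate_pcr(1, 10, 0.8, rng) for _ in range(500)]
cv = statistics.pstdev(finals) / statistics.mean(finals)
print(f"mean copies: {statistics.mean(finals):.1f}, coefficient of variation: {cv:.2f}")
```

    Because every amplicon starts from one molecule and shares the same efficiency, any spread in the final copy numbers comes from chance alone, which is the effect the abstract singles out for low-input libraries.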

  12. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    Directory of Open Access Journals (Sweden)

    Khalid K Alam

    2015-01-01

    Full Text Available High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html.
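
    The counting, normalizing, ranking and fold-enrichment steps named in this record can be sketched in a few lines. This is an illustration of the general idea only, not FASTAptamer's code or command-line interface: the function names, the reads-per-million normalization and the toy FASTQ records are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Ranked = List[Tuple[str, int, float]]  # (sequence, read count, reads per million)


def read_fastq_sequences(lines: Iterable[str]) -> List[str]:
    """Return the sequence line (2nd of every 4) from FASTQ-formatted lines."""
    return [line.strip() for i, line in enumerate(lines) if i % 4 == 1]


def rank_by_rpm(sequences: Iterable[str]) -> Ranked:
    """Count identical reads, normalize to reads per million, sort by abundance."""
    counts = Counter(sequences)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(seq, n, n * 1e6 / total) for seq, n in ordered]


def fold_enrichment(before: Ranked, after: Ranked) -> Dict[str, float]:
    """RPM ratio (after / before) for sequences present in both populations."""
    rpm_before = {seq: rpm for seq, _, rpm in before}
    return {seq: rpm / rpm_before[seq] for seq, _, rpm in after if seq in rpm_before}


def toy_fastq(seqs: List[str]) -> List[str]:
    """Build FASTQ lines for made-up reads (hypothetical test data)."""
    lines: List[str] = []
    for i, s in enumerate(seqs):
        lines += [f"@read{i}", s, "+", "I" * len(s)]
    return lines


round1 = rank_by_rpm(read_fastq_sequences(toy_fastq(["ACGT", "ACGT", "TTTT", "GGCC"])))
round2 = rank_by_rpm(read_fastq_sequences(toy_fastq(["ACGT", "ACGT", "ACGT", "TTTT"])))
print(round2[0][0], fold_enrichment(round1, round2))
```

    Sequences that appear in only one population are left out of the ratio here; a real analysis would need an explicit policy for them, such as pseudocounts.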

  13. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Directory of Open Access Journals (Sweden)

    Banfield Jillian F

    2010-03-01

    Full Text Available Abstract Background High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.
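
    Why mass accuracy matters for reading sequence tags can be shown with a small sketch. It is not the Vonode scoring system, which the abstract does not specify: the function names, the reduced residue table, the made-up peak list and the simplification of scaling the tolerance by the heavier peak are all assumptions. The residue masses are standard monoisotopic values.

```python
from typing import Dict, List, Optional

# Monoisotopic residue masses in daltons (a small subset, for illustration only).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def match_residue(delta: float, peak_mz: float, tol_ppm: float) -> Optional[str]:
    """Return the single residue whose mass explains `delta`, else None."""
    tol_da = peak_mz * tol_ppm / 1e6  # simplification: tolerance scaled by the heavier peak
    hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol_da]
    return hits[0] if len(hits) == 1 else None


def sequence_tag(peaks: List[float], tol_ppm: float) -> str:
    """Read a tag from consecutive singly charged fragment peaks; '?' marks ambiguous gaps."""
    ordered = sorted(peaks)
    return "".join(
        match_residue(hi - lo, hi, tol_ppm) or "?" for lo, hi in zip(ordered, ordered[1:])
    )


# Hypothetical fragment ladder: an arbitrary start mass followed by A, K and G.
peaks = [200.0]
for residue in "AKG":
    peaks.append(peaks[-1] + RESIDUE_MASS[residue])

print(sequence_tag(peaks, tol_ppm=10.0))   # tight tolerance separates K from Q
print(sequence_tag(peaks, tol_ppm=200.0))  # loose tolerance cannot tell K from Q
```

    Lysine and glutamine differ by only about 0.036 Da, so the middle residue is resolved at 10 ppm but becomes ambiguous at 200 ppm.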

  14. Similarity Solution for High Weissenberg Number Flow of Upper-Convected Maxwell Fluid on a Linearly Stretching Sheet

    Directory of Open Access Journals (Sweden)

    Meysam Mohamadali

    2016-01-01

    Full Text Available High Weissenberg boundary layer flow of viscoelastic fluids on a stretching surface has been studied. The flow is considered to be steady, low inertial, and two-dimensional. Upon proper scaling and by means of an exact similarity transformation, the nonlinear momentum and constitutive equations of each layer transform into the respective system of highly nonlinear and coupled ordinary differential equations. Numerical solutions to the resulting boundary value problem are obtained using an efficient shooting technique in conjunction with a variable stepping method for different values of pressure gradients. It is observed that, unlike the Newtonian flows, in order to maintain a potential flow, normal stresses must inevitably develop. The velocity field and stress distributions over the plate are presented for different values of pressure gradient and Weissenberg number.

  15. Global transcriptional profiles of beating clusters derived from human induced pluripotent stem cells and embryonic stem cells are highly similar

    Directory of Open Access Journals (Sweden)

    Gupta Manoj K

    2010-09-01

    Full Text Available Abstract Background Functional and molecular integrity of cardiomyocytes (CMs) derived from induced pluripotent stem (iPS) cells is essential for their use in tissue repair, disease modelling and drug screening. In this study we compared global transcriptomes of beating clusters (BCs) microdissected from differentiating human iPS cells and embryonic stem (ES) cells. Results Hierarchical clustering and principal component analysis revealed that iPS-BCs and ES-BCs cluster together, are similarly enriched for cardiospecific genes and differ in expression of only 1.9% of present transcripts. Similarly, sarcomeric organization, electrophysiological properties and calcium handling of iPS-CMs were indistinguishable from those of ES-CMs. Gene ontology analysis revealed that among 204 genes that were upregulated in iPS-BCs vs ES-BCs the processes related to extracellular matrix, cell adhesion and tissue development were overrepresented. Interestingly, 47 of 106 genes that were upregulated in undifferentiated iPS vs ES cells remained enriched in iPS-BCs vs ES-BCs. Most of these genes were found to be highly expressed in fibroblasts used for reprogramming and 34% overlapped with the recently reported iPS cell-enriched genes. Conclusions These data suggest that iPS-BCs are transcriptionally highly similar to ES-BCs. However, iPS-BCs appear to share some somatic cell signature with undifferentiated iPS cells. Thus, iPS-BCs may not be perfectly identical to ES-BCs. These minor differences in the expression profiles may occur due to differential cellular composition of iPS-BCs and ES-BCs, due to retention of some genetic profile of somatic cells in differentiated iPS cell-derivatives, or both.

  16. Sequence similarity between the viral cp gene and the transgene in transgenic papayas

    Directory of Open Access Journals (Sweden)

    Manoel Teixeira Souza Júnior

    2005-05-01

    Full Text Available The Papaya ringspot virus (PRSV) coat protein transgene present in 'Rainbow' and 'SunUp' papayas shows high sequence similarity (>89%) to the cp gene from PRSV BR and TH. Despite this, both isolates are able to break down the resistance in 'Rainbow', while only the latter is able to do so in 'SunUp'. The objective of this work was to evaluate the degree of sequence similarity between the cp gene in the challenge isolate and the cp transgene in transgenic papayas resistant to PRSV. A hybrid virus was produced containing the genome backbone of PRSV HA up to the Apa I site in the NIb gene and, downstream from there, the sequence of PRSV TH. This hybrid virus, PRSV HA/TH, was obtained and used to challenge 'Rainbow', 'SunUp', and an R2 population derived from line 63-1, all resistant to PRSV HA. PRSV HA/TH broke down the resistance in both papaya varieties and in the 63-1 population, demonstrating that sequence similarity is a major factor in the mechanism of resistance used by transgenic papayas expressing the cp gene. A comparative analysis of the cp gene present in line 55-1 and 63-1-derived transgenic plants and in PRSV HA, BR, and TH was also performed.

  17. High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach

    Directory of Open Access Journals (Sweden)

    Allard Marc W

    2012-01-01

    Full Text Available Abstract Background Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shot-gun sequenced, including diverse lineage representatives as well as numerous replicate clones, to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study. Results Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data.
Conclusions Implementation of a validated pipeline for NGS data acquisition and

  18. Effective DNA fragmentation technique for simple sequence repeat detection with a microsatellite-enriched library and high-throughput sequencing.

    Science.gov (United States)

    Tanaka, Keisuke; Ohtake, Rumi; Yoshida, Saki; Shinohara, Takashi

    2017-04-01

    Two different techniques for genomic DNA fragmentation before microsatellite-enriched library construction, restriction enzyme (NlaIII and MseI) digestion and sonication, were compared to examine their effects on simple sequence repeat (SSR) detection using high-throughput sequencing. Tens of thousands of SSR regions from 5 species of the plant family Myrtaceae were detected when the output of individual samples was >1 million paired-end reads. Comparison of the two DNA fragmentation techniques showed that restriction enzyme digestion was superior to sonication for identification of heterozygous genotypes, whereas sonication was superior for detection of various SSR flanking regions with both species-specific and common characteristics. Therefore, choosing the most suitable DNA fragmentation method depends on the type of analysis that is planned.

  19. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low...... biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material....

  20. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

    OpenAIRE

    Frohme Marcus; Schnölzer Martina; Engelmann Julia C; Beisser Daniela; Shkumatov Alexander; Liang Chuanguang; Förster Frank; Müller Tobias; Schill Ralph O; Dandekar Thomas

    2009-01-01

    Abstract Background Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade ...

  2. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

    Science.gov (United States)

    Mayjonade, Baptiste; Gouzy, Jérôme; Donnadieu, Cécile; Pouilly, Nicolas; Marande, William; Callot, Caroline; Langlade, Nicolas; Muños, Stéphane

    2016-10-01

    De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.

  3. Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species

    NARCIS (Netherlands)

    Kuipers, A.G.J.; Kamstra, S.A.; Jeu, de M.J.; Jacobsen, E.

    2002-01-01

    Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragm

  4. Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom

    Science.gov (United States)

    Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.

    2017-01-01

    With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called "The…

  5. Multiple Teaching Approaches, Teaching Sequence and Concept Retention in High School Physics Education

    Science.gov (United States)

    Fogarty, Ian; Geelan, David

    2013-01-01

    Students in 4 Canadian high school physics classes completed instructional sequences in two key physics topics related to motion--Straight Line Motion and Newton's First Law. Different sequences of laboratory investigation, teacher explanation (lecture) and the use of computer-based scientific visualizations (animations and simulations) were…

  7. Further evaluation of the high-probability instructional sequence with and without programmed reinforcement.

    Science.gov (United States)

    Wilder, David A; Majdalany, Lina; Sturkie, Latasha; Smeltz, Lindsay

    2015-09-01

    In 2 experiments, we examined the effects of programmed reinforcement for compliance with high-probability (high-p) instructions on compliance with low-probability (low-p) instructions. In Experiment 1, we compared the high-p sequence with and without programmed reinforcement (i.e., edible items) for compliance with high-p instructions. Results showed that the high-p sequence increased compliance with low-p instructions only when compliance with high-p instructions was followed by reinforcement. In Experiment 2, we examined the role of reinforcer quality by delivering a lower quality reinforcer (praise) for compliance with high-p instructions. Results of Experiment 2 showed that the high-p sequence with lower quality reinforcement did not improve compliance with low-p instructions; the addition of a higher quality reinforcer (i.e., edible items) contingent on compliance with high-p instructions did increase compliance with low-p instructions.

  8. Activity/inactivity circadian rhythm shows high similarities between young obesity-induced rats and old rats.

    Science.gov (United States)

    Bravo Santos, R; Delgado, J; Cubero, J; Franco, L; Ruiz-Moyano, S; Mesa, M; Rodríguez, A B; Uguz, C; Barriga, C

    2016-03-01

    The objective of the present study was to compare differences between elderly rats and young obesity-induced rats in their activity/inactivity circadian rhythm. The investigation was motivated by the differences reported previously for the circadian rhythms of both obese and elderly humans (and other animals), and those of healthy, young or mature individuals. Three groups of rats were formed: a young control group which was fed a standard chow for rodents; a young obesity-induced group which was fed a high-fat diet for four months; and an elderly control group with rats aged 2.5 years that was fed a standard chow for rodents. Activity/inactivity data were registered through actimetry using infrared actimeter systems in each cage to detect activity. Data were logged on a computer and chronobiological analyses were performed. The results showed diurnal activity (sleep time), nocturnal activity (awake time), amplitude, acrophase, and interdaily stability to be similar between the young obesity-induced group and the elderly control group, but different in the young control group. We conclude that obesity leads to a chronodisruption status in the body similar to the circadian rhythm degradation observed in the elderly.

  9. Characterization of expressed Pgip genes in rice and wheat reveals similar extent of sequence variation to dicot PGIPs and identifies an active PGIP lacking an entire LRR repeat.

    Science.gov (United States)

    Janni, Michela; Di Giovanni, Michela; Roberti, Serena; Capodicasa, Cristina; D'Ovidio, Renato

    2006-11-01

    Polygalacturonase-inhibiting proteins (PGIPs) are leucine-rich repeat (LRR) proteins involved in plant defence. A number of PGIPs have been characterized from dicot species, whereas few data are available for monocots. Database searches and genome-specific cloning strategies allowed the identification of four rice (Oryza sativa L.) and two wheat (Triticum aestivum L.) Pgip genes. The rice Pgip genes (Ospgip1, Ospgip2, Ospgip3 and Ospgip4) are distributed over a 30 kbp region of the short arm of chromosome 5, whereas the wheat Pgip genes, Tapgip1 and Tapgip2, are localized on the short arm of chromosome 7B and 7D, respectively. Deduced amino acid sequences show the typical LRR modular organization and a conserved distribution of the eight cysteines at the N- and C-terminal regions. Sequence comparison suggests that monocot and dicot PGIPs form two separate clusters sharing about 40% identity and shows that this value is close to the extent of variability observed within each cluster. Gene-specific RT-PCR and biochemical analyses demonstrate that both Ospgips and Tapgips are expressed in the whole plant or in a tissue-specific manner, and that OsPGIP1, lacking an entire LRR repeat, is an active inhibitor of fungal polygalacturonases. This last finding can help define the molecular features of PG-PGIP interactions and highlights that the genetic events that can generate variability at the Pgip locus are not limited to substitutions or small insertions/deletions, as so far reported, but can also involve variation in the number of LRRs.

  10. Similarity Scaling

    Science.gov (United States)

    Schnack, Dalton D.

    In Lecture 10, we introduced a non-dimensional parameter called the Lundquist number, denoted by S. This is just one of many non-dimensional parameters that can appear in the formulations of both hydrodynamics and MHD. These generally express the ratio of the time scale associated with some dissipative process to the time scale associated with either wave propagation or transport by flow. These are important because they define regions in parameter space that separate flows with different physical characteristics. All flows that have the same non-dimensional parameters behave in the same way. This property is called similarity scaling.
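
    As an illustration of the ratio of time scales described in this record, the Lundquist number is commonly written as the resistive diffusion time divided by the Alfvén transit time. The symbols used here are assumptions chosen for this sketch and are not taken from the lecture itself: L is a characteristic length, V_A the Alfvén speed, \eta the magnetic diffusivity, B the field strength, \rho the mass density and \mu_0 the vacuum permeability.

```latex
S \;=\; \frac{\tau_R}{\tau_A}
  \;=\; \frac{L^{2}/\eta}{L/V_A}
  \;=\; \frac{L\,V_A}{\eta},
\qquad
V_A \;=\; \frac{B}{\sqrt{\mu_0\,\rho}}
```

    Under this definition, two systems with different L, B, \rho and \eta but the same value of S (and of any other relevant non-dimensional parameters) would be expected to behave alike, which is the sense of similarity scaling stated in the record.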

  11. Screening and Identification of DNA Aptamers to Tyramine Using in Vitro Selection and High-Throughput Sequencing.

    Science.gov (United States)

    Valenzano, Stefania; De Girolamo, Annalisa; DeRosa, Maria C; McKeague, Maureen; Schena, Roberto; Catucci, Lucia; Pascale, Michelangelo

    2016-06-13

    Aptamers are synthetic single-stranded DNA or RNA sequences that can fold into tertiary structures allowing them to interact with and bind to targets with high affinity and specificity. This paper describes the first selection and identification of DNA aptamers able to recognize the biogenic amine tyramine. To successfully isolate aptamers to this challenging small molecule target, the SELEX methodology was adapted by combining a systematic strategy to increase the selection stringency and monitor enrichment success. As the benefits of applying high-throughput sequencing (HTS) in SELEX experiments are becoming clearer, this method was employed in combination with bioinformatics analysis to evaluate the utility of the selection strategy and to uncover new potential high affinity sequences. On the basis of the presence of consensus regions (sequence families) and family similarities (clusters), 15 putative aptamers to tyramine were identified. A recently described workflow approach to perform a primary screening and characterization of the aptamer candidates by microequilibrium dialysis and by microscale thermophoresis was next leveraged. These candidate aptamers exhibited dissociation constant (Kd) values in the range of 0.2-152 μM with aptamer Tyr_10 as the most promising one followed by aptamer Tyr_14. These aptamers could be used as promising molecular recognition tools for the development of inexpensive, robust and innovative biosensor platforms for the detection of tyramine in food and beverages.

  12. Highly parallel translation of DNA sequences into small molecules.

    Directory of Open Access Journals (Sweden)

    Rebecca M Weisinger

    Full Text Available A large body of in vitro evolution work establishes the utility of biopolymer libraries comprising 10^10 to 10^15 distinct molecules for the discovery of nanomolar-affinity ligands to proteins. Small-molecule libraries of comparable complexity will likely provide nanomolar-affinity small-molecule ligands. Unlike biopolymers, small molecules can offer the advantages of cell permeability, low immunogenicity, metabolic stability, rapid diffusion and inexpensive mass production. It is thought that such desirable in vivo behavior is correlated with the physical properties of small molecules, specifically a limited number of hydrogen bond donors and acceptors, a defined range of hydrophobicity, and most importantly, molecular weights less than 500 Daltons. Creating a collection of 10^10 to 10^15 small molecules that meet these criteria requires the use of hundreds to thousands of diversity elements per step in a combinatorial synthesis of three to five steps. With this goal in mind, we have reported a set of mesofluidic devices that enable DNA-programmed combinatorial chemistry in a highly parallel 384-well plate format. Here, we demonstrate that these devices can translate DNA genes encoding 384 diversity elements per coding position into corresponding small-molecule gene products. This robust and efficient procedure yields small molecule-DNA conjugates suitable for in vitro evolution experiments.
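
    The library sizes quoted in this record follow from simple combinatorics. The sketch below is a minimal illustration, assuming that every step independently offers the same number of diversity elements and that all combinations are formed; the function name `library_size` is hypothetical and not part of the cited work.

```python
def library_size(elements_per_step: int, steps: int) -> int:
    """Count distinct products of a fully combinatorial synthesis.

    Assumes each of `steps` positions independently takes one of
    `elements_per_step` diversity elements (an idealized model).
    """
    return elements_per_step ** steps


if __name__ == "__main__":
    # Hundreds to thousands of elements per step, three to five steps.
    for n in (100, 384, 1000):
        for k in (3, 4, 5):
            print(f"{n} elements x {k} steps -> {library_size(n, k):.2e} molecules")
```

    Under these assumptions, 100 elements over five steps gives 10^10 combinations and 1000 elements over five steps gives 10^15, which brackets the range named in the abstract; 384 elements per position reaches about 2.2 x 10^10 at four positions.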

  13. Molecular cytogenetic mapping of Cucumis sativus and C. melo using highly repetitive DNA sequences.

    Science.gov (United States)

    Koo, Dal-Hoe; Nam, Young-Woo; Choi, Doil; Bang, Jae-Wook; de Jong, Hans; Hur, Yoonkang

    2010-04-01

    Chromosomes often serve as one of the most important molecular aspects of studying the evolution of species. Indeed, most of the crucial mutations that led to differentiation of species during evolution occurred at the chromosomal level. Furthermore, the analysis of pachytene chromosomes appears to be an invaluable tool for the study of evolution due to its effectiveness in chromosome identification and precise physical gene mapping. By applying fluorescence in situ hybridization of 45S rDNA and CsCent1 probes to cucumber pachytene chromosomes, here, we demonstrate that cucumber chromosomes 1 and 2 may have evolved from fusions of ancestral karyotype with chromosome number n = 12. This conclusion is further supported by the centromeric sequence similarity between cucumber and melon, which suggests that these sequences evolved from a common ancestor. It may be after or during speciation that these sequences were specifically amplified, after which they diverged and specific sequence variants were homogenized. Additionally, a structural change on the centromeric region of cucumber chromosome 4 was revealed by fiber-FISH using the mitochondrial-related repetitive sequences, BAC-E38 and CsCent1. These showed the former sequences being integrated into the latter in multiple regions. The data presented here are useful resources for comparative genomics and cytogenetics of Cucumis and, in particular, the ongoing genome sequencing project of cucumber.

  14. [Recent progress in gene mapping through high-throughput sequencing technology and forward genetic approaches].

    Science.gov (United States)

    Lu, Cairui; Zou, Changsong; Song, Guoli

    2015-08-01

    Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, a process that is tedious and time-consuming and often results in low mapping accuracy and large mapping intervals. With the rapid development of high-throughput sequencing technology and the decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selective mutant DNA pools, genetic map construction through sequencing of individuals in a population, as well as sequencing of the transcriptome and partial genome. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that mapping by sequencing can even be done without a reference genome sequence, hybridization, or genetic linkage information, making it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.

  15. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L

    2017-06-19

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5'-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5'-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. © 2017 by John Wiley & Sons, Inc.

  16. Gene Flow Results in High Genetic Similarity Between Sibiraea (Rosaceae) species in the Qinghai-Tibetan Plateau

    Directory of Open Access Journals (Sweden)

    Peng-Cheng Fu

    2016-10-01

    Full Text Available Studying closely related species and divergent populations provides insight into the process of speciation. Previous studies showed that the evolutionary history of the Sibiraea complex on the Qinghai-Tibetan Plateau (QTP) was unclear and that its species could not be distinguished at the molecular level. In this study, the genetic structure and gene flow of S. laevigata and S. angustata on the QTP were examined across 45 populations using 8 microsatellite loci. Microsatellites revealed high genetic diversity in Sibiraea populations. Most of the variance was detected within populations (87.45%) rather than between species (4.39%). We found no significant correlations between genetic and geographical distances among populations. Bayesian cluster analysis grouped all individuals in the sympatric area of Sibiraea into one cluster and other individuals of S. angustata into another. Divergence history analysis based on the approximate Bayesian computation method indicated that the populations of S. angustata in the sympatric area derived from the admixture of the 2 species. The assignment test assigned all individuals to populations of their own species rather than of the congeneric species. Consistently, intraspecific rather than interspecific first-generation migrants were detected. The long-term bidirectional gene flow between the 2 species was asymmetric, with more from S. angustata to S. laevigata. In conclusion, the Sibiraea complex was distinguishable at the molecular level using microsatellite loci. We found that the high genetic similarity of these 2 species resulted from extensive bidirectional gene flow, especially in the sympatric area where population admixtures between the species occurred.

  17. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs.

    Science.gov (United States)

    Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao

    2014-09-01

    Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, all known angiosperm chloroplast genomes available to date were analysed, and then we designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, the complete chloroplast genomes of eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provided an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs make it possible to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.

  18. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies

    Directory of Open Access Journals (Sweden)

    Charbonnel Nathalie

    2010-05-01

    Full Text Available Abstract Background High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. Results DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. Conclusions This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied
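
    The first challenge named in this record, attributing each read to its original sample from a combination of forward and reverse tags, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes fixed-length tags, exact tag matches, and reads already oriented with the forward tag first, and the function name `demultiplex` and the example tags are hypothetical. The probabilistic validation described in the abstract is not reproduced here.

```python
from collections import defaultdict


def demultiplex(reads, sample_by_tags, tag_len):
    """Assign reads to samples by their (forward, reverse) tag pair.

    reads: iterable of sequence strings with the forward tag at the
           start and the reverse tag at the end (assumed orientation).
    sample_by_tags: dict mapping (forward_tag, reverse_tag) -> sample name.
    tag_len: length shared by all tags (hypothetical fixed-length design).
    Returns (assigned, unassigned): a dict of sample -> tag-trimmed reads,
    and a list of reads whose tag pair is not in the table.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        # A read must be longer than the two tags to carry any insert.
        if len(read) <= 2 * tag_len:
            unassigned.append(read)
            continue
        pair = (read[:tag_len], read[-tag_len:])
        sample = sample_by_tags.get(pair)
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(read[tag_len:-tag_len])
    return dict(assigned), unassigned


if __name__ == "__main__":
    tags = {("ACGT", "TTAG"): "sample_1", ("ACGT", "GGCA"): "sample_2"}
    reads = ["ACGTCCCCAAAATTAG", "ACGTGGGGTTTTGGCA", "TTTTCCCCAAAATTAG"]
    by_sample, rejected = demultiplex(reads, tags, tag_len=4)
    print(by_sample)
    print(rejected)
```

    Using pairs of tags rather than single tags means that a small set of tagged primers can label many samples, since the number of distinguishable samples grows with the product of the forward and reverse tag counts. Exact matching is the simplest choice; a real workflow would likely need to tolerate sequencing errors in the tags.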

  19. Newcastle Disease Viruses Causing Recent Outbreaks Worldwide Show Unexpectedly High Genetic Similarity to Historical Virulent Isolates from the 1940s.

    Science.gov (United States)

    Dimitrov, Kiril M; Lee, Dong-Hun; Williams-Coplin, Dawn; Olivier, Timothy L; Miller, Patti J; Afonso, Claudio L

    2016-05-01

    Virulent strains of Newcastle disease virus (NDV) cause Newcastle disease (ND), a devastating disease of poultry and wild birds. Phylogenetic analyses clearly distinguish historical isolates (obtained prior to 1960) from currently circulating viruses of class II genotypes V, VI, VII, and XII through XVIII. Here, partial and complete genomic sequences of recent virulent isolates of genotypes II and IX from China, Egypt, and India were found to be nearly identical to those of historical viruses isolated in the 1940s. Phylogenetic analysis, nucleotide distances, and rates of change demonstrate that these recent isolates have not evolved significantly from the most closely related ancestors from the 1940s. The low rates of change for these virulent viruses (7.05 × 10^-5 and 2.05 × 10^-5 per year, respectively) and the minimal genetic distances existing between these and historical viruses (0.3 to 1.2%) of the same genotypes indicate an unnatural origin. As with any other RNA virus, Newcastle disease virus is expected to evolve naturally; thus, these findings suggest that some recent field isolates should be excluded from evolutionary studies. Furthermore, phylogenetic analyses show that these recent virulent isolates are more closely related to virulent strains isolated during the 1940s, which have been and continue to be used in laboratory and experimental challenge studies. Since the preservation of viable viruses in the environment for over 6 decades is highly unlikely, it is possible that the source of some of the recent virulent viruses isolated from poultry and wild birds might be laboratory viruses. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  20. Current impact and future directions of high throughput sequencing in plant virus diagnostics.

    Science.gov (United States)

    Massart, Sebastien; Olmos, Antonio; Jijakli, Haissam; Candresse, Thierry

    2014-08-08

    The ability to provide a fast, inexpensive and reliable diagnostic for any given viral infection is a key parameter in efforts to fight and control these ubiquitous pathogens. The recent developments of high-throughput sequencing (also called Next Generation Sequencing - NGS) technologies and bioinformatics have drastically changed the research on viral pathogens. These technologies are now attracting growing interest for virus diagnostics. This review provides a snapshot of the current use and impact of high-throughput sequencing approaches in plant virus characterization. More specifically, this review highlights the potential of these new technologies and their interplay with current protocols in the future of molecular diagnostics of plant viruses. The current limitations that will need to be addressed for a wider adoption of high-throughput sequencing in plant virus diagnostics are thoroughly discussed.

  1. Expression of a new chimeric protein with a highly repeated sequence in tobacco cells.

    Science.gov (United States)

    Saumonneau, Amélie; Rottier, Karine; Conrad, Udo; Popineau, Yves; Guéguen, Jacques; Francin-Allami, Mathilde

    2011-07-01

    In wheat, the high-molecular weight (HMW) glutenin subunits are known to contribute to gluten viscoelasticity, and show some similarities to elastomeric animal proteins such as elastin. Combining the sequence of a glutenin with that of elastin is a way to create new chimeric functional proteins, which could be expressed in plants. The sequence of a glutenin subunit was modified by the insertion of several hydrophobic and elastic motifs derived from elastin (elastin-like peptide, ELP) into the hydrophilic repetitive domain of the glutenin subunit to create a triblock protein, the objective being to improve the mechanical (elastomeric) properties of this wheat storage protein. In this study, we investigated an expression model system to analyze the expression and trafficking of the wild-type HMW glutenin subunit (GS(W)) and an HMW glutenin subunit mutated by the insertion of elastin motifs (GS(M)-ELP). For this purpose, a series of constructs was made to express wild-type subunits and subunits mutated by insertion of elastin motifs in fusion with green fluorescent protein (GFP) in tobacco BY-2 cells. Our results showed for the first time the expression of HMW glutenin fused with GFP in tobacco protoplasts. We also expressed and localized the chimeric protein composed of plant glutenin and animal elastin-like peptides (ELP) in BY-2 protoplasts, and demonstrated its presence in protein body-like structures in the endoplasmic reticulum. This work, therefore, provides a basis for heterologous production of the glutenin-ELP triblock protein to characterize its mechanical properties.

  2. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will desire information about the best practices that can be used before actually starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  3. Fibroblasts from phenotypically normal palmar fascia exhibit molecular profiles highly similar to fibroblasts from active disease in Dupuytren's Contracture

    Directory of Open Access Journals (Sweden)

    Satish Latha

    2012-05-01

    Abstract Background Dupuytren's contracture (DC) is a fibroproliferative disorder characterized by the progressive development of a scar-like collagen-rich cord that affects the palmar fascia of the hand and leads to digital flexion contractures. DC is most commonly treated by surgical resection of the diseased tissue, but has a high reported recurrence rate ranging from 27% to 80%. We sought to determine if the transcriptomic profiles of fibroblasts derived from DC-affected palmar fascia, adjacent phenotypically normal palmar fascia, and non-DC palmar fascial tissues might provide mechanistic clues to understanding the puzzle of disease predisposition and recurrence in DC. Methods To achieve this, total RNA was obtained from fibroblasts derived from primary DC-affected palmar fascia, patient-matched unaffected palmar fascia, and palmar fascia from non-DC patients undergoing carpal tunnel release (6 patients in each group). These cells were grown on a type-1 collagen substrate (to better mimic their in vivo environments). Microarray analyses were subsequently performed using Illumina BeadChip arrays to compare the transcriptomic profiles of these three cell populations. Data were analyzed using Significance Analysis of Microarrays (SAM v3.02), hierarchical clustering, concordance mapping and Venn diagram. Results We found that the transcriptomic profiles of DC-disease fibroblasts and fibroblasts from unaffected fascia of DC patients exhibited a much greater overlap than fibroblasts derived from the palmar fascia of patients undergoing carpal tunnel release. Quantitative real time RT-PCR confirmed the differential expression of select genes validating the microarray data analyses. These data are consistent with the hypothesis that predisposition and recurrence in DC may stem, at least in part, from intrinsic similarities in the basal gene expression of diseased and phenotypically unaffected palmar fascia fibroblasts. These data also demonstrate that

  4. Fibroblasts from phenotypically normal palmar fascia exhibit molecular profiles highly similar to fibroblasts from active disease in Dupuytren's Contracture

    Science.gov (United States)

    2012-01-01

    Background Dupuytren's contracture (DC) is a fibroproliferative disorder characterized by the progressive development of a scar-like collagen-rich cord that affects the palmar fascia of the hand and leads to digital flexion contractures. DC is most commonly treated by surgical resection of the diseased tissue, but has a high reported recurrence rate ranging from 27% to 80%. We sought to determine if the transcriptomic profiles of fibroblasts derived from DC-affected palmar fascia, adjacent phenotypically normal palmar fascia, and non-DC palmar fascial tissues might provide mechanistic clues to understanding the puzzle of disease predisposition and recurrence in DC. Methods To achieve this, total RNA was obtained from fibroblasts derived from primary DC-affected palmar fascia, patient-matched unaffected palmar fascia, and palmar fascia from non-DC patients undergoing carpal tunnel release (6 patients in each group). These cells were grown on a type-1 collagen substrate (to better mimic their in vivo environments). Microarray analyses were subsequently performed using Illumina BeadChip arrays to compare the transcriptomic profiles of these three cell populations. Data were analyzed using Significance Analysis of Microarrays (SAM v3.02), hierarchical clustering, concordance mapping and Venn diagram. Results We found that the transcriptomic profiles of DC-disease fibroblasts and fibroblasts from unaffected fascia of DC patients exhibited a much greater overlap than fibroblasts derived from the palmar fascia of patients undergoing carpal tunnel release. Quantitative real time RT-PCR confirmed the differential expression of select genes validating the microarray data analyses. These data are consistent with the hypothesis that predisposition and recurrence in DC may stem, at least in part, from intrinsic similarities in the basal gene expression of diseased and phenotypically unaffected palmar fascia fibroblasts. These data also demonstrate that a collagen

  5. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander C Outhred

    Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.

  6. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    Science.gov (United States)

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-01

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits.

  7. High-speed automated DNA sequencing utilizing from-the-side laser excitation

    Science.gov (United States)

    Westphall, Michael S.; Brumley, Robert L., Jr.; Buxton, Erin C.; Smith, Lloyd M.

    1995-04-01

    The Human Genome Initiative is an ambitious international effort to map and sequence the three billion bases of DNA encoded in the human genome. If successfully completed, the resultant sequence database will be a tool of unparalleled power for biomedical research. One of the major challenges of this project is in the area of DNA sequencing technology. At this time, virtually all DNA sequencing is based upon the separation of DNA fragments in high resolution polyacrylamide gels. This method, as generally practiced, is one to two orders of magnitude too slow and expensive for the successful completion of the Human Genome Project. One reasonable approach to improved sequencing of DNA fragments is to increase the performance of such gel-based sequencing methods. Decreased sequencing times may be obtained by increasing the magnitude of the electric field employed. This is not possible with conventional sequencing, because the additional heat associated with the increased electric field cannot be adequately dissipated. Recent developments in the use of thin gels have addressed this problem. Performing electrophoresis in ultrathin (50 to 100 microns) gels greatly increases the heat transfer efficiency, thus allowing the benefits of larger electric fields to be obtained. An increase in separation speed of about an order of magnitude is readily achieved. Thin gels have successfully been used in capillary and slab formats. A detection system has been designed for use with a multiple fluorophore sequencing strategy in horizontal ultrathin slab gels. The system employs laser through-the-side excitation and a cooled CCD detector; this allows for the parallel detection of up to 24 sets of four fluorescently labeled DNA sequencing reactions during their electrophoretic separation in ultrathin (115 micrometers) denaturing polyacrylamide gels. Four hundred bases of sequence information is obtained from 100 ng of M13 template DNA in an hour, corresponding to an

  8. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Abstract Background Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effect of library type and genomic diversity on SNP discovery and imputation are evaluated. Results Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
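
The sequencing quantities quoted in this abstract can be related to one another with simple arithmetic. The sketch below is only an illustration: it assumes that the "eight genome equivalents (6 Gb)" are equally sized, so that one genome equivalent is taken as 6 Gb / 8, and the function names are hypothetical.

```python
def total_bases(n_reads: int, read_length: int) -> int:
    """Total sequenced bases for a run of fixed-length reads."""
    return n_reads * read_length


def genome_equivalents(bases: float, genome_size: float) -> float:
    """Sequenced bases expressed as multiples of one genome."""
    return bases / genome_size


# Assumption: one genome equivalent = 6 Gb / 8, inferred from the abstract.
GENOME_SIZE = 6e9 / 8

# Recommended quantity from the abstract: 3 million 50-base reads per accession.
recommended_bases = total_bases(3_000_000, 50)
recommended_depth = genome_equivalents(recommended_bases, GENOME_SIZE)

print(recommended_bases)   # total bases per accession
print(recommended_depth)   # fraction of one genome equivalent
```

Under that assumption, the recommended quantity corresponds to 150 Mb per accession, a fraction of one genome equivalent, which is consistent with the abstract's reference to shallowly sequenced genomes.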

  9. Use of high throughput sequencing to study oomycete communities in soil and roots

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    communities, DNA extracted from carrot tissue samples with symptoms of Pythium infection and soil samples collected from agricultural fields. Sequence data from Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of operational...... earlier in similar studies but with limited success, were used in this study with an improved protocol. Our result shows that the proportion of retrieved oomycete sequences dramatically increased, mainly by increasing the annealing temperature during PCR. The optimized protocol was validated using mock...

  10. Highly Iterated Palindromic Sequences (HIPs) and Their Relationship to DNA Methyltransferases

    Directory of Open Access Journals (Sweden)

    Jeff Elhai

    2015-03-01

    The sequence GCGATCGC (Highly Iterated Palindrome, HIP1) is commonly found in high frequency in cyanobacterial genomes. An important clue to its function may be the presence of two orphan DNA methyltransferases that recognize internal sequences GATC and CGATCG. An examination of genomes from 97 cyanobacteria, both free-living and obligate symbionts, showed that there are exceptional cases in which HIP1 is at a low frequency or nearly absent. In some of these cases, it appears to have been replaced by a different GC-rich palindromic sequence, alternate HIPs. When HIP1 is at a high frequency, GATC- and CGATCG-specific methyltransferases are generally present in the genome. When an alternate HIP is at high frequency, a methyltransferase specific for that sequence is present. The pattern of 1-nt deviations from HIP1 sequences is biased towards the first and last nucleotides, i.e., those that distinguish CGATCG from HIP1. Taken together, the results point to a role of DNA methylation in the creation or functioning of HIP sites. A model is presented that postulates the existence of a GmeC-dependent mismatch repair system whose activity creates and maintains HIP sequences.
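
The palindromic property of HIP1 and its internal methyltransferase sites can be checked directly, and motif frequency can be counted with a simple scan. The following is a minimal sketch, not the authors' analysis: the helper names and the toy sequence are hypothetical, and only unambiguous A/C/G/T bases are assumed.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """A DNA palindrome equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def count_overlapping(genome: str, motif: str) -> int:
    """Count occurrences of motif in genome, allowing overlapping hits."""
    genome, motif = genome.upper(), motif.upper()
    count = 0
    start = genome.find(motif)
    while start != -1:
        count += 1
        start = genome.find(motif, start + 1)
    return count


HIP1 = "GCGATCGC"
# Hypothetical toy sequence, used only to exercise the functions.
toy_genome = "TTGCGATCGCAAGCGATCGCGATCGCTT"

for motif in (HIP1, "CGATCG", "GATC"):
    print(motif, is_palindromic(motif), count_overlapping(toy_genome, motif))
```

Overlapping hits are counted because adjacent HIP1 copies can share their terminal GC, as in the toy sequence.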

  11. Coverage recommendation for genotyping analysis of highly heterologous species using next-generation sequencing technology

    Science.gov (United States)

    Song, Kai; Li, Li; Zhang, Guofan

    2016-01-01

    Next-generation sequencing (NGS) technology is being applied to an increasing number of non-model species and has been used as the primary approach for accurate genotyping in genetic and evolutionary studies. However, inferring genotypes from sequencing data is challenging, particularly for organisms with a high degree of heterozygosity. This is because genotype calls from sequencing data are often inaccurate due to low sequencing coverage, and if this is not accounted for, genotype uncertainty can lead to serious bias in downstream analyses, such as quantitative trait locus mapping and genome-wide association studies. Here, we used high-coverage reference data sets from Crassostrea gigas to simulate sequencing data with different coverage, and we evaluate the influence of genotype calling rate and accuracy as a function of coverage. Having initially identified the appropriate parameter settings for filtering to ensure genotype accuracy, we used two different single-nucleotide polymorphism (SNP) calling pipelines, single-sample and multi-sample. We found that a coverage of 15× was suitable for obtaining sufficient numbers of SNPs with high accuracy. Our work provides guidelines for the selection of sequence coverage when using NGS to investigate species with a high degree of heterozygosity and rapid decay of linkage disequilibrium. PMID:27760996

  12. High-throughput sequencing for the identification of binding molecules from DNA-encoded chemical libraries.

    Science.gov (United States)

    Buller, Fabian; Steiner, Martina; Scheuermann, Jörg; Mannocci, Luca; Nissen, Ina; Kohler, Manuel; Beisel, Christian; Neri, Dario

    2010-07-15

    DNA-encoded chemical libraries are large collections of small organic molecules, individually coupled to DNA fragments that serve as amplifiable identification bar codes. The isolation of specific binders requires a quantitative analysis of the distribution of DNA fragments in the library before and after capture on an immobilized target protein of interest. Here, we show how Illumina sequencing can be applied to the analysis of DNA-encoded chemical libraries, yielding over 10 million DNA sequence tags per flow-lane. The technology can be used in a multiplex format, allowing the encoding and subsequent sequencing of multiple selections in the same experiment. The sequence distributions in DNA-encoded chemical library selections were found to be similar to the ones obtained using 454 technology, thus reinforcing the concept that DNA sequencing is an appropriate avenue for the decoding of library selections. The large number of sequences obtained with the Illumina method now enables the study of very large DNA-encoded chemical libraries (>500,000 compounds) and reduces decoding costs.
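
The quantitative comparison of tag distributions before and after capture, as described in this abstract, can be illustrated with a small counting example. This is a sketch under stated assumptions, not the published decoding procedure: the function names, the pseudocount, and the three-letter tags are all hypothetical.

```python
from collections import Counter


def tag_frequencies(tags):
    """Relative frequency of each DNA tag in a list of sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def enrichment(before, after, pseudo=1e-6):
    """Fold change in tag frequency after selection.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for tags that are absent from one of the two samples.
    """
    f_before = tag_frequencies(before)
    f_after = tag_frequencies(after)
    tags = set(f_before) | set(f_after)
    return {
        tag: (f_after.get(tag, 0.0) + pseudo) / (f_before.get(tag, 0.0) + pseudo)
        for tag in tags
    }


# Hypothetical tags: "AAA" becomes over-represented after selection.
library = ["AAA", "CCC", "GGG", "TTT"]
selected = ["AAA", "AAA", "AAA", "CCC"]
folds = enrichment(library, selected)
print(sorted(folds.items(), key=lambda item: item[1], reverse=True))
```

Tags with a fold change well above 1 would be candidate binders in this toy setting; real selections would also need replicate and sampling-noise handling, which this sketch omits.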

  13. High signals in the uterine cervix on T2-weighted MRI sequences

    Energy Technology Data Exchange (ETDEWEB)

    Graef, De M.; Karam, R.; Daclin, P.Y.; Rouanet, J.P. [Department of Radiology, C.M.C. Beausoleil, 119 avenue de Lodeve, 34000 Montpellier (France); Juhan, V. [Department of Radiology, C.H.U. Timone, 13000 Marseille (France); Maubon, A.J. [Department of Radiology, C.H.U. Dupuytren, 87000 Limoges (France)

    2003-01-01

    The aim of this pictorial review was to illustrate the normal cervix appearance on T2-weighted images, and give a review of common or less common disorders of the uterine cervix that appear as high signal intensity lesions on T2-weighted sequences. Numerous aetiologies dominated by cervical cancer are reviewed and discussed. This gamut is obviously incomplete; however, radiologists who perform MR women's imaging should perform T2-weighted sequences in the sagittal plane regardless of the indication for pelvic MR. Those sequences will diagnose some previously unknown cervical cancers as well as many other unknown cervical or uterine lesions. (orig.)

  14. High-throughput sequencing, characterization and detection of new and conserved cucumber miRNAs.

    Directory of Open Access Journals (Sweden)

    Germán Martínez

    MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs involved in the post-transcriptional regulation of gene expression. In plants, a great number of conserved and specific miRNAs, mainly arising from model species, have been identified to date. However, less is known about the diversity of these regulatory RNAs in vegetal species with agricultural and/or horticultural importance. Here we report a combined approach of bioinformatics prediction, high-throughput sequencing data and molecular methods to analyze miRNA populations in cucumber (Cucumis sativus) plants. A set of 19 conserved and 6 known but non-conserved miRNA families were found in our cucumber small RNA dataset. We also identified 7 (3 with their miRNA* strand) not previously described miRNAs, candidates to be cucumber-specific. To validate their description, these new C. sativus miRNAs were detected by northern blot hybridization. Additionally, potential targets for most conserved and new miRNAs were identified in the cucumber genome. In summary, in this study we have identified, for the first time, conserved, known non-conserved and new miRNAs arising from an agronomically important species such as C. sativus. The detection of this complex population of regulatory small RNAs suggests that, similarly to what is observed in other plant species, cucumber miRNAs may play an important role in diverse biological and metabolic processes.

  15. Perilla Oil Has Similar Protective Effects of Fish Oil on High-Fat Diet-Induced Nonalcoholic Fatty Liver Disease and Gut Dysbiosis.

    Science.gov (United States)

    Tian, Yu; Wang, Hualin; Yuan, Fahu; Li, Na; Huang, Qiang; He, Lei; Wang, Limei; Liu, Zhiguo

    2016-01-01

    Nonalcoholic fatty liver disease (NAFLD) is the most prevalent chronic liver disease in developed countries. Recent studies indicated that the modification of gut microbiota plays an important role in the progression from simple steatosis to steatohepatitis. Epidemiological studies have demonstrated consumption of fish oil or perilla oil rich in n-3 polyunsaturated fatty acids (PUFAs) protects against NAFLD. However, the underlying mechanisms remain unclear. In the present study, we adopted 16s rRNA amplicon sequencing technique to investigate the impacts of fish oil and perilla oil on gut microbiomes modification in rats with high-fat diet- (HFD-) induced NAFLD. Both fish oil and perilla oil ameliorated HFD-induced hepatic steatosis and inflammation. In comparison with the low-fat control diet, HFD feeding significantly reduced the relative abundance of Gram-positive bacteria in the gut, which was slightly reversed by either fish oil or perilla oil. Additionally, fish oil and perilla oil consumption abrogated the elevated abundance of Prevotella and Escherichia in the gut from HFD fed animals. Interestingly, the relative abundance of antiobese Akkermansia was remarkably increased only in animals fed fish oil compared with HFD group. In conclusion, compared with fish oil, perilla oil has similar but slightly weaker potency against HFD-induced NAFLD and gut dysbiosis.

  16. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Abstract Background Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  17. High-throughput amplicon sequencing and stream benthic bacteria: identifying the best taxonomic level for multiple-stressor research

    Science.gov (United States)

    Salis, R. K.; Bruder, A.; Piggott, J. J.; Summerfield, T. C.; Matthaei, C. D.

    2017-03-01

    Disentangling the individual and interactive effects of multiple stressors on microbial communities is a key challenge to our understanding and management of ecosystems. Advances in molecular techniques allow studying microbial communities in situ and with high taxonomic resolution. However, the taxonomic level which provides the best trade-off between our ability to detect multiple-stressor effects versus the goal of studying entire communities remains unknown. We used outdoor mesocosms simulating small streams to investigate the effects of four agricultural stressors (nutrient enrichment, the nitrification inhibitor dicyandiamide (DCD), fine sediment and flow velocity reduction) on stream bacteria (phyla, orders, genera, and species represented by Operational Taxonomic Units with 97% sequence similarity). Community composition was assessed using amplicon sequencing (16S rRNA gene, V3-V4 region). DCD was the most pervasive stressor, affecting evenness and most abundant taxa, followed by sediment and flow velocity. Stressor pervasiveness was similar across taxonomic levels and lower levels did not perform better in detecting stressor effects. Community coverage decreased from 96% of all sequences for abundant phyla to 28% for species. Order-level responses were generally representative of responses of corresponding genera and species, suggesting that this level may represent the best compromise between stressor sensitivity and coverage of bacterial communities.
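
Grouping sequences into Operational Taxonomic Units at a 97% similarity threshold can be illustrated with a greedy centroid scheme. The sketch below is an assumption-laden toy and not the pipeline used in the study: it presumes sequences are already aligned to equal length, uses simple per-position identity, and the function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid it matches.

    The first member of each OTU acts as its centroid; a sequence that
    matches no existing centroid at the threshold founds a new OTU.
    """
    otus = []
    for seq in sequences:
        for members in otus:
            if identity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            otus.append([seq])
    return otus


# Hypothetical 100-base sequences: b differs from a at 2 sites, c at 10.
a = "A" * 100
b = "A" * 98 + "CC"
c = "A" * 90 + "C" * 10
clusters = greedy_otus([a, b, c])
print([len(members) for members in clusters])
```

With these inputs, a and b fall into one OTU (98% identity) while c founds a second one (90% identity to a). Greedy clustering depends on input order, which is one reason production tools sort sequences by abundance first.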

  18. The Quest for Rheological Similarity in Analogue Models: new Data on the Rheology of Highly Filled Silicon Polymers

    Science.gov (United States)

    Boutelier, D. A.; Schrank, C.; Cruden, A. R.

    2006-12-01

    The selection of appropriate analogue materials is a central consideration in the design of realistic physical models. Hence, information on the rheology of materials and potential materials is essential to evaluate their suitability as rock analogues. Silicon polymers have long been used to model ductile rocks that deform by diffusion or dislocation creep. Temperature and compositional variations that control the effective viscosity and density of rocks in the crust and mantle are simulated in the laboratory by multiple layers of various silicon polymers mixed with granular fillers, plasticines or bouncing putties. Since dislocation creep is a power law, strain rate-softening flow mechanism, we have been investigating the rheology of highly filled silicon polymers as suitable new analogue materials with similar deformation behavior. The materials actually exhibit strain rate softening behavior but with increasing amounts of filler the mixtures also become non-linear. We report the rheological properties of the analogue materials as functions of the filler content. For the linear viscous materials the flow laws are presented (viscosity coefficient and power law exponent). For non-linear materials the relative importance of strain and strain-rate softening/hardening has been investigated doing multiple creep tests that allow mapping of the effective viscosity in the stress-strain space. Our study reveals that most of the currently used silicon-based analogue materials have a linear or quasi-linear rheology but are also Newtonian or nearly-Newtonian viscous fluid, which makes them more appropriate for simulating natural rocks deforming by diffusion creep.

  19. Extended self-similarity in moment-generating-functions in wall-bounded turbulence at high Reynolds number

    Science.gov (United States)

    Yang, X. I. A.; Meneveau, C.; Marusic, I.; Biferale, L.

    2016-08-01

    In wall-bounded turbulence, the moment generating functions (MGFs) of the streamwise velocity fluctuations $\langle \exp(q u_z^+) \rangle$ develop power-law scaling as a function of the wall-normal distance $z/\delta$. Here $u$ is the streamwise velocity fluctuation, $+$ indicates normalization in wall units (averaged friction velocity), $z$ is the distance from the wall, $q$ is an independent variable, and $\delta$ is the boundary layer thickness. Previous work has shown that this power-law scaling exists in the log region $3 Re_\tau^{0.5} \lesssim z^+$, $z \lesssim 0.15\delta$, where $Re_\tau$ is the friction velocity-based Reynolds number. Here we present empirical evidence that this self-similar scaling can be extended, including bulk and viscosity-affected regions 30 ...... reference value, $q_o$. Extended self-similarity (ESS) also improves the scaling properties, leading to more precise measurements of the scaling exponents. The analysis is based on hot-wire measurements from boundary layers at $Re_\tau$ ranging from 2700 to 13 000 from the Melbourne High-Reynolds-Number-Turbulent-Boundary-Layer-Wind-Tunnel. Furthermore, we investigate the scalings of the filtered, large-scale velocity fluctuations $u_z^L$ and of the remaining small-scale component, $u_z^S = u_z - u_z^L$. The scaling of $u_z^L$ falls within the conventionally defined log region and depends on a scale that is proportional to $l^+ \sim Re_\tau^{1/2}$; the scaling of $u_z^S$ extends over a much wider range from $z^+ \approx 30$ to $z \approx 0.5\delta$. Last, we present a theoretical construction of two multiplicative processes for $u_z^L$ and $u_z^S$ that reproduce the empirical findings concerning the scaling properties as functions of $z^+$ and in the ESS sense.

  20. High-channel-count plasmonic filter with the metal-insulator-metal Fibonacci-sequence gratings.

    Science.gov (United States)

    Gong, Yongkang; Liu, Xueming; Wang, Leiran

    2010-02-01

    Fibonacci-sequence gratings based on metal-insulator-metal waveguides are proposed. The spectrum properties of this structure are numerically investigated by using the transfer matrix method. Numerical results demonstrate that the proposed structure can generate high-channel-count plasmonic stop bands and can find significant applications in highly integrated dense wavelength division multiplexing networks.
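
The abstract does not state how the grating layers are ordered, so the following is only a sketch under an explicit assumption: the common Fibonacci concatenation rule S_j = S_(j-1) S_(j-2) with S_0 = "B" and S_1 = "A", where "A" and "B" stand for two hypothetical kinds of waveguide segment. It shows how quickly the number of segments grows with generation.

```python
def fibonacci_word(generation: int) -> str:
    """Build the Fibonacci word S_j = S_(j-1) + S_(j-2), S_0='B', S_1='A'.

    The seed letters and the mapping of 'A'/'B' to physical segments are
    assumptions of this sketch, not details taken from the abstract.
    """
    if generation < 0:
        raise ValueError("generation must be non-negative")
    previous, current = "B", "A"
    if generation == 0:
        return previous
    for _ in range(generation - 1):
        previous, current = current, current + previous
    return current


for j in range(6):
    word = fibonacci_word(j)
    print(j, len(word), word)
```

The word lengths follow the Fibonacci numbers (1, 1, 2, 3, 5, 8, ...), so a modest generation index already yields a long quasi-periodic stack.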

  1. Draft Genome Sequencing of the Highly Halotolerant and Allopolyploid Yeast Zygosaccharomyces rouxii NBRC 1876

    Science.gov (United States)

    Matsushima, Kenichiro; Oshima, Kenshiro; Hattori, Masahira; Koyama, Yasuji

    2017-01-01

    ABSTRACT The highly halotolerant and allopolyploid yeast Zygosaccharomyces rouxii is industrially used for the food production in high concentrations of salt, such as brewing soy sauce and miso paste. Here, we report the draft genome sequence of Z. rouxii NBRC 1876 isolated from miso paste. PMID:28209823

  2. Exome sequencing generates high quality data in non-target regions

    Directory of Open Access Journals (Sweden)

    Guo Yan

    2012-05-01

    Abstract Background Exome sequencing using next-generation sequencing technologies is a cost efficient approach to selectively sequencing coding regions of human genome for detection of disease variants. A significant amount of DNA fragments from the capture process fall outside target regions, and sequence data for positions outside target regions have been mostly ignored after alignment. Result We performed whole exome sequencing on 22 subjects using Agilent SureSelect capture reagent and 6 subjects using Illumina TrueSeq capture reagent. We also downloaded sequencing data for 6 subjects from the 1000 Genomes Project Pilot 3 study. Using these data, we examined the quality of SNPs detected outside target regions by computing consistency rate with genotypes obtained from SNP chips or the Hapmap database, transition-transversion (Ti/Tv) ratio, and percentage of SNPs inside dbSNP. For all three platforms, we obtained high-quality SNPs outside target regions, and some far from target regions. In our Agilent SureSelect data, we obtained 84,049 high-quality SNPs outside target regions compared to 65,231 SNPs inside target regions (a 129% increase). For our Illumina TrueSeq data, we obtained 222,171 high-quality SNPs outside target regions compared to 95,818 SNPs inside target regions (a 232% increase). For the data from the 1000 Genomes Project, we obtained 7,139 high-quality SNPs outside target regions compared to 1,548 SNPs inside target regions (a 461% increase). Conclusions These results demonstrate that a significant amount of high quality genotypes outside target regions can be obtained from exome sequencing data. These data should not be ignored in genetic epidemiology studies.
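
Two of the quantities in this abstract are easy to reproduce numerically. The sketch below first expresses the outside-target SNP counts as a percentage of the inside-target counts; the rounded results coincide with the percentages quoted in the abstract. It then shows one simple way to compute a transition-transversion ratio from (reference, alternate) base pairs. Function names and the example variant list are hypothetical.

```python
PURINES = {"A", "G"}


def outside_as_percent_of_inside(outside: int, inside: int) -> float:
    """Outside-target SNP count as a percentage of the inside-target count."""
    return 100.0 * outside / inside


def is_transition(ref: str, alt: str) -> bool:
    """Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T)."""
    return ref != alt and (ref in PURINES) == (alt in PURINES)


def ti_tv_ratio(variants) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = sum(is_transition(ref, alt) for ref, alt in variants)
    transversions = len(variants) - transitions
    return transitions / transversions


# SNP counts (outside target, inside target) as reported in the abstract.
snp_counts = {
    "Agilent SureSelect": (84_049, 65_231),
    "Illumina TrueSeq": (222_171, 95_818),
    "1000 Genomes Pilot 3": (7_139, 1_548),
}
for platform, (outside, inside) in snp_counts.items():
    print(platform, round(outside_as_percent_of_inside(outside, inside)))

# Hypothetical variants: two transitions and one transversion.
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

The Ti/Tv helper assumes single-base substitutions over A/C/G/T and at least one transversion in the input; it is meant as an illustration of the metric rather than a replacement for a variant-calling toolkit.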

  3. Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

    DEFF Research Database (Denmark)

    Karst, Soeren M; Dueholm, Morten S; McIlroy, Simon J

    2016-01-01

    Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies...... (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human...... gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity...

  4. Reverse Engineering of Vaccine Antigens Using High Throughput Sequencing-enhanced mRNA Display

    Directory of Open Access Journals (Sweden)

    Nini Guo

    2015-08-01

    Research in Context: We used a large number of randomly produced small proteins (“peptides”) to identify peptides containing specific protein sequences that bind efficiently to an antibody that can prevent hepatitis C virus infection in cell culture. After the identified peptides were injected into mice, the mice produced their own antibodies with characteristics similar to the original antibody. This approach can provide previously unavailable information about antibody binding and could also be useful in developing new vaccines.

  5. On the optimal trimming of high-throughput mRNA sequence data

    Directory of Open Access Journals (Sweden)

    Matthew D MacManes

    2014-01-01

    Full Text Available The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
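
As a rough illustration of what gentle quality trimming at a Phred threshold means, the sketch below removes low-quality bases from the 3' end of a read. It is a deliberately simplified stand-in, not the procedure of any particular trimming package: the function names are hypothetical, the quality string is assumed to use the Phred+33 encoding, and real trimmers typically use sliding windows or running-sum criteria.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, threshold=5, offset=33):
    """Drop trailing bases whose Phred score is below `threshold`.

    Trimming stops at the first base (scanning from the 3' end) whose
    score is at or above the threshold.
    """
    scores = phred_scores(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 'I' encodes Q40, '#' encodes Q2, '$' encodes Q3.
read_seq = "ACGTACGT"
read_qual = "IIIIII#$"
print(trim_3prime(read_seq, read_qual, threshold=5))
print(trim_3prime(read_seq, read_qual, threshold=2))
```

With the threshold at 5 the last two bases are removed, while a threshold of 2 leaves this read untouched, which shows how small changes in the cutoff alter how much sequence survives.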

  6. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    Science.gov (United States)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  7. Prediction of new high pressure structural sequence in thorium carbide: A first principles study

    Energy Technology Data Exchange (ETDEWEB)

    Sahoo, B. D., E-mail: bdsahoo@barc.gov.in; Joshi, K. D.; Gupta, Satish C. [Applied Physics Division, Bhabha Atomic Research Centre, Mumbai 400085 (India)

    2015-05-14

    In the present work, we report the detailed electronic band structure calculations on thorium monocarbide. The comparison of enthalpies, derived for various phases using evolutionary structure search method in conjunction with first principles total energy calculations at several hydrostatic compressions, yielded a high pressure structural sequence of NaCl type (B1) → Pnma → Cmcm → CsCl type (B2) at hydrostatic pressures of ∼19 GPa, 36 GPa, and 200 GPa, respectively. However, the two high pressure experimental studies by Gerward et al. [J. Appl. Crystallogr. 19, 308 (1986); J. Less-Common Met. 161, L11 (1990)], one up to 36 GPa and the other up to 50 GPa, on substoichiometric thorium carbide samples with carbon deficiency of ∼20%, do not report any structural transition. The discrepancy between theory and experiment could be due to the non-stoichiometry of thorium carbide samples used in the experiment. Further, in order to substantiate the results of our static lattice calculations, we have determined the phonon dispersion relations for these structures from lattice dynamic calculations. The theoretically calculated phonon spectra reveal that the B1 phase fails dynamically at ∼33.8 GPa whereas the Pnma phase appears as a dynamically stable structure around the B1 to Pnma transition pressure. Similarly, the Cmcm structure also displays dynamic stability in the regime of its structural stability. The B2 phase becomes dynamically stable much below the Cmcm to B2 transition pressure. Additionally, we have derived various thermophysical properties such as zero pressure equilibrium volume, bulk modulus, its pressure derivative, Debye temperature, thermal expansion coefficient and Gruneisen parameter at 300 K and compared these with available experimental data. Further, the behavior of zero pressure bulk modulus, heat capacity and Helmholtz free energy has been examined as a function of temperature and compared with the experimental data of Danan [J

  8. Prediction of new high pressure structural sequence in thorium carbide: A first principles study

    Science.gov (United States)

    Sahoo, B. D.; Joshi, K. D.; Gupta, Satish C.

    2015-05-01

    In the present work, we report the detailed electronic band structure calculations on thorium monocarbide. The comparison of enthalpies, derived for various phases using evolutionary structure search method in conjunction with first principles total energy calculations at several hydrostatic compressions, yielded a high pressure structural sequence of NaCl type (B1) → Pnma → Cmcm → CsCl type (B2) at hydrostatic pressures of ∼19 GPa, 36 GPa, and 200 GPa, respectively. However, the two high pressure experimental studies by Gerward et al. [J. Appl. Crystallogr. 19, 308 (1986); J. Less-Common Met. 161, L11 (1990)], one up to 36 GPa and the other up to 50 GPa, on substoichiometric thorium carbide samples with carbon deficiency of ∼20%, do not report any structural transition. The discrepancy between theory and experiment could be due to the non-stoichiometry of thorium carbide samples used in the experiment. Further, in order to substantiate the results of our static lattice calculations, we have determined the phonon dispersion relations for these structures from lattice dynamic calculations. The theoretically calculated phonon spectra reveal that the B1 phase fails dynamically at ∼33.8 GPa whereas the Pnma phase appears as a dynamically stable structure around the B1 to Pnma transition pressure. Similarly, the Cmcm structure also displays dynamic stability in the regime of its structural stability. The B2 phase becomes dynamically stable much below the Cmcm to B2 transition pressure. Additionally, we have derived various thermophysical properties such as zero pressure equilibrium volume, bulk modulus, its pressure derivative, Debye temperature, thermal expansion coefficient and Gruneisen parameter at 300 K and compared these with available experimental data. Further, the behavior of zero pressure bulk modulus, heat capacity and Helmholtz free energy has been examined as a function of temperature and compared with the experimental data of Danan [J. Nucl. Mater. 57, 280

  9. Complete mitochondrial genome sequence of a Hungarian red deer (Cervus elaphus hippelaphus) from high-throughput sequencing data and its phylogenetic position within the family Cervidae.

    Science.gov (United States)

    Frank, Krisztián; Barta, Endre; Bana, Nóra Á; Nagy, János; Horn, Péter; Orosz, László; Stéger, Viktor

    2016-06-01

    Recently, there has been considerable interest in genetic differentiation in the Cervidae family. A common tool used to determine genetic variation in different species, breeds and populations is mitochondrial DNA analysis, which can be used to estimate phylogenetic relationships among animal taxa and for molecular phylogenetic evolution analysis. With the development of sequencing technology, more and more mitochondrial sequences have been made available in public databases, including whole mitochondrial DNA sequences. These data have been used for phylogenetic analysis of animal species, and for studies of evolutionary processes. We determined the complete mitochondrial genome of a Central European red deer, Cervus elaphus hippelaphus, from Hungary by a next generation sequencing technology. The mitochondrial genome is 16 354 bp in length and contains 13 protein-coding genes, two rRNA genes, 22 tRNA genes and a control region, all of which are arranged similarly to those in other vertebrates. We performed phylogenetic analyses with the new sequence and 76 available mitochondrial sequences of Cervidae, using the Bos taurus mitochondrial sequence as outgroup. We used 'neighbor joining' and 'maximum likelihood' methods on whole mitochondrial genome sequences; the consensus phylogenetic trees supported monophyly of the family Cervidae, which was divided into two subfamilies, Cervinae and Capreolinae, and five tribes, Cervini, Muntiacini, Alceini, Odocoileini, and Capreolini. The evolutionary structure of the family Cervidae can be reconstructed by phylogenetic analysis based on whole mitochondrial genomes, a method that could be used broadly in phylogenetic evolutionary analysis of animal taxa.

  10. Fibre-specific responses to endurance and low volume high intensity interval training: striking similarities in acute and chronic adaptation.

    Directory of Open Access Journals (Sweden)

    Trisha D Scribbans

    Full Text Available The current study involved the completion of two distinct experiments. Experiment 1 compared fibre specific and whole muscle responses to acute bouts of either low-volume high-intensity interval training (LV-HIT) or moderate-intensity continuous endurance exercise (END) in a randomized crossover design. Experiment 2 examined the impact of a six-week training intervention (END or LV-HIT; 4 days/week) on whole body and skeletal muscle fibre specific markers of aerobic and anaerobic capacity. Six recreationally active men (Age: 20.7 ± 3.8 yrs; VO2peak: 51.9 ± 5.1 mL/kg/min) reported to the lab on two separate occasions for experiment 1. Following a muscle biopsy taken in a fasted state, participants completed an acute bout of each exercise protocol (LV-HIT: 8, 20-second intervals at ∼ 170% of VO2peak separated by 10 seconds of rest; END: 30 minutes at ∼ 65% of VO2peak), immediately followed by a muscle biopsy. Glycogen content of type I and IIA fibres was significantly (p<0.05) reduced, while p-ACC was significantly increased (p<0.05) following both protocols. Nineteen recreationally active males (n = 16) and females (n = 3) were VO2peak-matched and assigned to either the LV-HIT (n = 10; 21 ± 2 yrs) or END (n = 9; 20.7 ± 3.8 yrs) group for experiment 2. After 6 weeks, both training protocols induced comparable increases in aerobic capacity (END: Pre: 48.3 ± 6.0, Mid: 51.8 ± 6.0, Post: 55.0 ± 6.3 mL/kg/min; LV-HIT: Pre: 47.9 ± 8.1, Mid: 50.4 ± 7.4, Post: 54.7 ± 7.6 mL/kg/min), fibre-type specific oxidative and glycolytic capacity, glycogen and IMTG stores, and whole-muscle capillary density. Interestingly, only LV-HIT induced greater improvements in anaerobic performance and estimated whole-muscle glycolytic capacity. These results suggest that 30 minutes of END exercise at ∼ 65% VO2peak or 4 minutes of LV-HIT at ∼ 170% VO2peak induce comparable changes in the intra-myocellular environment (glycogen content and signaling activation

  11. Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing

    Science.gov (United States)

    Dilernia, Dario A.; Chien, Jung-Ting; Monaco, Daniela C.; Brown, Michael P.S.; Ende, Zachary; Deymier, Martin J.; Yue, Ling; Paxinos, Ellen E.; Allen, Susan; Tirado-Ramos, Alfredo; Hunter, Eric

    2015-01-01

    Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. PMID:26101252
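
For readers unfamiliar with the QV notation, the sketch below converts a Phred-scaled quality value into an error probability using the standard definition Q = -10 log10(p). The function names are hypothetical and the snippet is only meant to show what ">QV50" implies numerically; it says nothing about how the authors computed their accuracy figures.

```python
def phred_to_error_probability(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def phred_to_accuracy(qv):
    """Per-base accuracy implied by a Phred-scaled quality value."""
    return 1 - phred_to_error_probability(qv)


p_error = phred_to_error_probability(50)
print(f"QV50 -> error probability {p_error:.0e}, accuracy {phred_to_accuracy(50):.5%}")
```

Under this definition QV50 corresponds to roughly one error per 100,000 bases, so a reconstructed genome of about 9 kb would be expected to contain, on average, well under one error.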

  12. Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.

    Science.gov (United States)

    Dilernia, Dario A; Chien, Jung-Ting; Monaco, Daniela C; Brown, Michael P S; Ende, Zachary; Deymier, Martin J; Yue, Ling; Paxinos, Ellen E; Allen, Susan; Tirado-Ramos, Alfredo; Hunter, Eric

    2015-11-16

    Single Molecule, Real-Time (SMRT) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.

  13. Scaling and interaction of self-similar modes in models of high-Reynolds number wall turbulence

    CERN Document Server

    Sharma, A S; McKeon, B J

    2016-01-01

    Previous work has established the usefulness of the resolvent operator that maps the terms nonlinear in the turbulent fluctuations to the fluctuations themselves. Further work has described the self-similarity of the resolvent arising from that of the mean velocity profile. The orthogonal modes provided by the resolvent analysis describe the wall-normal coherence of the motions and inherit that self-similarity. In this contribution, we present the implications of this similarity for the nonlinear interaction between modes with different scales and wall-normal locations. By considering the nonlinear interactions between modes, it is shown that much of the turbulence scaling behaviour in the logarithmic region can be determined from a single arbitrarily chosen reference plane. Thus, the geometric scaling of the modes is impressed upon the nonlinear interaction between modes. Implications of these observations on the self-sustaining mechanisms of wall turbulence, modelling and simulation are outlined.

  14. Efficient Similarity Retrieval in Music Databases

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    object is modeled as a time sequence of high-dimensional feature vectors, and dynamic time warping (DTW) is used as the similarity measure. To accomplish this, the paper extends techniques for time-series-length reduction and lower bounding of DTW distance to the multi-dimensional case. Further...
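
A minimal sketch of dynamic time warping over sequences of multi-dimensional feature vectors follows. It shows only the plain quadratic-time recurrence with a Euclidean local distance, which is an assumption made here for illustration; the time-series-length reduction and lower-bounding techniques that the paper extends are not reproduced, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(x, y):
    """Classic DTW cost between two sequences of feature vectors.

    cost[i][j] holds the cheapest alignment of the first i vectors of x
    with the first j vectors of y.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical 2-D feature sequences; `slow` repeats the first frame of `base`.
base = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
slow = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
shifted = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
print(dtw_distance(base, slow))
print(dtw_distance(base, shifted))
```

Because warping absorbs the repeated frame, `base` and `slow` are at distance zero, whereas `shifted` differs from `base` by one unit in its final vector. The full table costs O(nm) distance evaluations per pair, which is the expense that lower-bounding methods aim to avoid for most candidates in a database.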

  15. High-performance permanent magnet brushless motors with balanced concentrated windings and similar slot and pole numbers

    Science.gov (United States)

    Štumberger, Bojan; Štumberger, Gorazd; Hadžiselimović, Miralem; Hamler, Anton; Trlep, Mladen; Goričan, Viktor; Jesenik, Marko

    2006-09-01

    The paper presents a comparison between the performances of exterior-rotor permanent magnet brushless motors with distributed windings and the performances of exterior-rotor permanent magnet brushless motors with concentrated windings. Finite element method analysis is employed to determine the performance of each motor. It is shown that motors with concentrated windings and similar slot and pole numbers exhibit similar or better performances than motors with distributed windings for brushless AC (BLAC) operation mode and brushless DC (BLDC) operation mode as well.

  16. High-performance permanent magnet brushless motors with balanced concentrated windings and similar slot and pole numbers

    Energy Technology Data Exchange (ETDEWEB)

    Stumberger, Bojan [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia)]. E-mail: bojan.stumberger@uni-mb.si; Stumberger, Gorazd [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia); Hadziselimovic, Miralem [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia); Hamler, Anton [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia); Trlep, Mladen [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia); Gorican, Viktor [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia); Jesenik, Marko [Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor (Slovenia)

    2006-09-15

    The paper presents a comparison between the performances of exterior-rotor permanent magnet brushless motors with distributed windings and the performances of exterior-rotor permanent magnet brushless motors with concentrated windings. Finite element method analysis is employed to determine the performance of each motor. It is shown that motors with concentrated windings and similar slot and pole numbers exhibit similar or better performances than motors with distributed windings for brushless AC (BLAC) operation mode and brushless DC (BLDC) operation mode as well.

  17. Investigation of the fungal community structures of imported wheat using high-throughput sequencing technology

    Science.gov (United States)

    Wang, Ying; Zhang, Guiming; Gao, Ruifang; Xiang, Caiyu; Feng, Jianjun; Lou, Dingfeng; Liu, Ying

    2017-01-01

    This study introduced the application of high-throughput sequencing techniques to the investigation of microbial diversity in the field of plant quarantine. It examined the microbial diversity of wheat imported into China, and established a bioinformatics database of wheat pathogens based on high-throughput sequencing results. This study analyzed the nuclear ribosomal internal transcribed spacer (ITS) region of fungi through Illumina MiSeq sequencing to investigate the fungal communities of both seeds and sieve-through. A total of 758,129 fungal ITS sequences were obtained from ten samples collected from five batches of wheat imported from the USA. These sequences were classified into 2 different phyla, 15 classes, 33 orders, 41 families, and 78 genera, suggesting a high fungal diversity across samples. A pairwise analysis revealed that the diversity of the fungal community in the sieve-through is significantly higher than that in the seeds. Taxonomic analysis showed that at the class level, Dothideomycetes dominated in the seeds and Sordariomycetes dominated in the sieve-through. In all, this study revealed the fungal community composition in the seeds and sieve-through of the wheat, and identified key differences in the fungal community between the seeds and sieve-through. PMID:28241020

  18. New measuring concepts using integrated online analysis of color and monochrome digital high-speed camera sequences

    Science.gov (United States)

    Renz, Harald

    1997-05-01

    High speed sequences allow a subjective assessment of very fast processes and serve as an important basis for the quantitative analysis of movements. Computer systems help to acquire, handle, display and store digital image sequences as well as to perform measurement tasks automatically. High speed cameras have been used for several years for safety tests, material testing or production optimization. To reach the very high speed of 1000 or more images per second, mainly 16 mm film cameras have been used, which provide excellent image resolution and the required time resolution. But up to now, most results have been judged only by viewing. For some special applications, such as safety tests using crash or high-g sled tests in the automobile industry, image analysis techniques have also been used to measure the characteristics of moving objects inside images. High speed films, shot during the short impact, allow judgement of the dynamic scene. Additionally they serve as an important basis for the quantitative analysis of the very fast movements. Thus exact values of the velocity and acceleration to which the dummies or vehicles are exposed can be derived. For analysis of the sequences, the positions of signalized points (mostly markers, which are fixed by the test engineers before a test) have to be measured frame by frame. The trajectories show the temporal sequence of the test objects and are the basis for calibrated diagrams of distance, velocity and acceleration. Today, 16 mm film cameras are increasingly being replaced by electronic high speed cameras. The development of high-speed recording systems is far advanced, and the prices of these systems are increasingly comparable to those of traditional film cameras. The resolution has also increased greatly. The new cameras are 'crashproof' and can be used for similar tasks as the 16 mm film cameras at similar sizes. High speed video cameras now offer an easy setup and direct access to
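
The step from frame-by-frame marker positions to velocity and acceleration can be sketched with simple finite differences. This is an illustrative assumption rather than the method of the system described: positions are taken as already calibrated to metres along one axis, the frame rate is constant, and no smoothing is applied, although measured data would normally be filtered before differentiation.

```python
def finite_differences(values, dt):
    """Forward differences of a sampled signal, divided by the time step."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def kinematics(positions, fps):
    """Estimate velocity and acceleration from per-frame marker positions.

    positions: calibrated positions in metres, one per frame (1-D here).
    fps:       frame rate of the camera in images per second.
    """
    dt = 1.0 / fps
    velocity = finite_differences(positions, dt)      # m/s, one fewer sample
    acceleration = finite_differences(velocity, dt)   # m/s^2, two fewer samples
    return velocity, acceleration


# Hypothetical marker track at 1000 images per second: the object decelerates.
marker_track = [0.000, 0.010, 0.019, 0.027]
velocity, acceleration = kinematics(marker_track, fps=1000)
print(velocity)
print(acceleration)
```

For this made-up track the velocities come out near 10, 9 and 8 m/s and the accelerations near -1000 m/s^2. Differentiating twice amplifies measurement noise, which is one reason accurate marker localisation matters in such tests.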

  19. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses, therefore only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit in the STEM degree production rate needed to fill the demand of the current job market and remain competitive as a nation. The purpose of the study was to make a difference in the number of students who have access to information about the benefits of completing a full sequence of science courses. This dissertation study employed qualitative research methodology to gain a broad perspective of staff through a questionnaire and document review and then a deeper understanding through semi-structured interview protocol. The data revealed that a universal sequence of science courses in the high school district did not exist. It also showed that not all students had access to all science courses; students were sorted and tracked according to prerequisites that did not necessarily match the skill set needed for the courses. In addition, the study showed a desire for more support and direction from the district office. It was also apparent that there was a disconnect that existed between who staff members believed should enroll in a full sequence of science courses and who actually enrolled. Finally, communication about science was shown to occur mainly through counseling and peers. A common science sequence, detracking of science courses, increased communication about the postsecondary and academic benefits of a science education, increased district direction and realistic mathematics alignment were all discussed as solutions to the problem.

  20. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species.

    Directory of Open Access Journals (Sweden)

    Robert J Elshire

    Full Text Available Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

  1. Intron sequences provide a tool for high-resolution phylogenetic analysis of volvocine algae.

    Science.gov (United States)

    Liss, M; Kirk, D L; Beyser, K; Fabry, S

    1997-03-01

    Three nuclear spliceosomal introns in conserved locations were amplified and sequenced from 28 strains representing 14 species and 4 genera of volvocalean green algae. Data derived from the three different introns yielded congruent results in nearly all cases. In pairwise comparisons, a spectrum of taxon-specific sequence differences ranging from complete identity to no significant similarity was observed, with the most distantly related organisms lacking any conserved elements apart from exon-intron boundaries and a pyrimidine-rich stretch near the 3' splice site. A metric (SI50), providing a measure of the degree of similarity of any pair of intron sequences, was defined and used to calculate phylogenetic distances between organisms whose introns displayed statistically significant similarities. The rate of sequence divergence in the introns was great enough to provide useful information about relationships among different geographical isolates of a single species, but in most cases was too great to provide reliable guides to relationships above the species level. A substitution rate of approximately 3 × 10^-8 per intron position per year was estimated, which is about 150-fold higher than in nuclear genes encoding rRNA and about 10-fold higher than the synonymous substitution rate in protein-coding regions. Thus, these homologous introns not only provide useful information about intraspecific phylogenetic relationships, but also illustrate the concept that different parts of a gene may be subject to extremely different intensities of selection. The intron data generated here (1) reliably resolve for the first time the relationships among the five most extensively studied strains of Volvox, (2) reveal that two other Volvox species may be more closely related than had previously been suspected, (3) confirm prior evidence that particular isolates of Eudorina elegans and Pleodorina illinoisensis appear to be sibling taxa, and (4) contribute to the resolution of

  2. Outer-membrane cytochrome-c, OmcF from Geobacter sulfurreducens: high structural similarity to an algal cytochrome c6.

    Energy Technology Data Exchange (ETDEWEB)

    Pokkuluri, P. R.; Londer, Y. Y.; Wood, S. J.; Duke, N. E. C.; Morgado, L.; Salgueiro, C. A.; Schiffer, M.; Biosciences Division; Univ. Nova de Lisboa

    2009-01-01

    Putative outer membrane c-type cytochromes have been implicated in metal ion reducing properties of Geobacter sulfurreducens. OmcF (GSU2432), OmcB (GSU2731), and OmcC (GSU2737) are three such proteins that have predicted lipid anchors. OmcF is a monoheme cytochrome, whereas OmcB and OmcC are multiheme cytochromes. Deletion of OmcF was reported to affect the expression of OmcB and OmcC in G. sulfurreducens. The OmcF deficient strain was impaired in its ability to both reduce and grow on Fe(III) citrate probably because the expression of OmcB, which is crucial for iron reduction, is low in this strain. U(VI) reduction activity of this bacterium is also lower on deletion of OmcB or OmcF. The U(VI) reduction activity is affected more by the deletion of OmcF than by the deletion of OmcB. The soluble part of OmcF (residues 20-104, referred to as OmcFS hereafter) has sequence similarity to soluble cytochromes c6 of photosynthetic algae and cyanobacteria. The cytochrome c6 proteins in algae and cyanobacteria are electron transport proteins that mediate the transfer of electrons from cytochrome b6f to photosystem I and have high reduction potentials of about +350 mV and low pI. The structures of seven cytochromes c6 have been previously determined. Further, a c6-like cytochrome (PetJ2) of unknown function was recently identified in Synechococcus sp. PCC 7002 with a reduction potential of +148 mV and high pI. Here, we report the structure of OmcFS and its remarkable structural similarity to that of cytochrome c6 from the green alga, Monoraphidium braunii. To our knowledge, OmcFS is the first example of a cytochrome c6-like structure from a nonphotosynthetic organism.

  3. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris)

    Indian Academy of Sciences (India)

    Wenping Zhang; Zhihe Zhang; Fujun Shen; Rong Hou; Xiaoping Lv; Bisong Yue

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segment I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but got only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8–17 million years ago in the tiger and 4.6–16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular ‘fossils’ that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also supported by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup.
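
    The percentage difference quoted above is the kind of figure an uncorrected pairwise distance (p-distance) gives. The following is a minimal sketch, assuming two pre-aligned sequences of equal length; the function name and the toy sequences are hypothetical and are not the tiger or leopard Numts. As a side note, five differing positions in 287 bp would give 1.742%, though the abstract does not state the count, so that reading is an assumption.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap or ambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(a != b for a, b in pairs)
    return mismatches / len(pairs)


# Toy aligned fragments (hypothetical data): one mismatch in ten positions.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGAAC"
print(f"{100 * p_distance(toy_a, toy_b):.1f}% different")

# Arithmetic check under the stated assumption of 5 differences in 287 bp.
print(f"{100 * 5 / 287:.3f}%")
```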

  4. Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  5. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Background: The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results: A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion: The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  6. Complex high-resolution linkage disequilibrium and haplotype patterns of single-nucleotide polymorphisms in 2.5 Mb of sequence on human chromosome 21.

    Science.gov (United States)

    Olivier, M; Bustos, V I; Levy, M R; Smick, G A; Moreno, I; Bushard, J M; Almendras, A A; Sheppard, K; Zierten, D L; Aggarwal, A; Carlson, C S; Foster, B D; Vo, N; Kelly, L; Liu, X; Cox, D R

    2001-11-01

    One approach to identify potentially important segments of the human genome is to search for DNA regions with nonrandom patterns of human sequence variation. Previous studies have investigated these patterns primarily in and around candidate gene regions. Here, we determined patterns of DNA sequence variation in 2.5 Mb of finished sequence from five regions on human chromosome 21. By sequencing 13 individual chromosomes, we identified 1460 single-nucleotide polymorphisms (SNPs) and obtained unambiguous haplotypes for all chromosomes. For all five chromosomal regions, we observed segments with high linkage disequilibrium (LD), extending from 1.7 to >81 kb (average 21.7 kb), disrupted by segments of similar or larger size with no significant LD between SNPs. At least 25% of the contig sequences consisted of segments with high LD between SNPs. Each of these segments was characterized by a restricted number of observed haplotypes, with the major haplotype found in over 60% of all chromosomes. In contrast, the interspersed segments with low LD showed significantly more haplotype patterns. The position and extent of the segments of high LD with restricted haplotype variability did not coincide with the location of coding sequences. Our results indicate that LD and haplotype patterns need to be investigated with closely spaced SNPs throughout the human genome, independent of the location of coding sequences, to reliably identify regions with significant LD useful for disease association studies.
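
    Pairwise LD between two SNPs is commonly summarized by D, D' and r², computed from phased haplotypes. The following is a minimal sketch of those standard definitions; it is not the authors' analysis, and the function name, the two-character haplotype encoding and the 13 example haplotypes are all hypothetical.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    `haplotypes` holds one 2-character string per phased chromosome,
    e.g. "AG" means allele A at SNP 1 and allele G at SNP 2.
    """
    n = len(haplotypes)
    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both sites must be biallelic and polymorphic")
    a, b = alleles_1[0], alleles_2[0]      # reference allele at each site
    counts = Counter(haplotypes)
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = counts[a + b] / n               # frequency of the a-b haplotype
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical example: 13 chromosomes carrying only two haplotypes,
# i.e. the two SNPs are in complete LD.
complete = ["AG"] * 8 + ["CT"] * 5
print(pairwise_ld(complete))

# Hypothetical example: all four haplotypes equally frequent, i.e. no LD.
independent = ["AG", "AT", "CG", "CT"]
print(pairwise_ld(independent))
```

    With only 13 chromosomes, as in the study, such estimates carry wide sampling error, so a real analysis would also test each pair for significance.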

  7. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Directory of Open Access Journals (Sweden)

    Zhao Patrick X

    2011-07-01

    Background: Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM) for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results: The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TC) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs was evaluated and validated using high resolution melting (HRM) analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions: We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.

  8. Plasmidome-analysis of ESBL-producing Escherichia coli using conventional typing and high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Alma Brolund

    Infections caused by extended-spectrum β-lactamase (ESBL)-producing E. coli are an emerging global problem, threatening the effectiveness of the extensively used β-lactam antibiotics. ESBL dissemination is facilitated by plasmids, transposons, and other mobile elements. We have characterized the plasmid content of ESBL-producing E. coli from human urinary tract infections. Ten diverse isolates were selected; they had unrelated pulsed-field gel electrophoresis (PFGE) types (<90% similarity), were from geographically dispersed locations and had diverging antibiotic resistance profiles. Three isolates belonged to the globally disseminated sequence type ST131. ESBL-genes of the CTX-M-1 and CTX-M-9 phylogroups were identified in all ten isolates. The plasmid content (plasmidome) of each strain was analyzed using a combination of molecular methods and high-throughput sequencing. Hidden Markov Model-based analysis of unassembled sequencing reads was used to analyze the genetic diversity of the plasmid samples and to detect resistance genes. Each isolate contained between two and eight distinct plasmids, and at least 22 large plasmids were identified overall. The plasmids were variants of pUTI89, pKF3-70, pEK499, pKF3-140, pKF3-70, p1ESCUM, pEK204, pHK17a, p083CORR, R64, pLF82, pSFO157, and R721. In addition, small cryptic high copy-number plasmids were frequent, containing one to seven open reading frames per plasmid. Three clustered groups of such small cryptic plasmids could be distinguished based on sequence similarity. Extrachromosomal prophages were found in three isolates. Two of them resembled the E. coli P1 phage and one was previously unknown. The present study confirms plasmid multiplicity in multi-resistant E. coli. We conclude that high-throughput sequencing successfully provides information on the extrachromosomal gene content and can be used to generate a genetic fingerprint of possible use in epidemiology. This could be a valuable tool for

  9. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because these plants float on water and their major surface is exposed continuously to sunlight. The subfamily Lemnoideae comprises such aquatic species, which, because of photosynthesis, are among the fastest growing plants on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Notably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  10. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome.

    Science.gov (United States)

    Ren, Yi; Zhao, Hong; Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F(8) population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits.
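
    The summary figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted above; the variable names are illustrative, and dividing map length by the total marker count is an assumed definition of the mean marker interval (the abstract does not say how the interval was computed).

```python
# Marker counts and sizes as quoted in the abstract.
ssr, indel, sv = 698, 219, 36
map_length_cm = 800                      # reported as approximately 800 cM
scaffolds, mean_scaffold_mb = 234, 1.41
anchored_mb, assembled_mb = 330, 353

total_markers = ssr + indel + sv
# Assumed definition: map length divided by the total number of markers.
mean_interval_cm = map_length_cm / total_markers
anchored_pct = 100 * anchored_mb / assembled_mb
scaffold_total_mb = scaffolds * mean_scaffold_mb

print(total_markers)                     # 953
print(f"{mean_interval_cm:.2f} cM")      # 0.84 cM, consistent with 0.8 cM
print(f"{anchored_pct:.1f}%")            # 93.5%
print(f"{scaffold_total_mb:.0f} Mb")     # 330 Mb
```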

  11. Low environmental impact bleaching sequences for attaining high brightness level with Eucalyptus spp. pulp

    Directory of Open Access Journals (Sweden)

    M. M. Costa

    2009-03-01

    The alternatives used for minimizing the usage of chlorine dioxide in bleaching sequences included a hot acid hydrolysis (Ahot) stage, the use of hot chlorine dioxide (Dhot) and ozone stages at medium consistency and high consistency (Zmc and Zhc), in addition to stages with atmospheric hydrogen peroxide (P) and pressurized hydrogen peroxide (PO). The results were interpreted based on the cost of the chemical products, bleaching process yields and on minimizing the environmental impact of the bleaching process. In spite of some process restrictions, high ISO brightness levels were kept around 90%. Additionally, the inclusion of stages like acid hydrolysis, pressurized peroxide and ozone in the bleaching sequences provided an increase in operating flexibility, aimed at reducing environmental impact (ECF Light). The Dhot(EOP)D(PO) sequence presented the lowest operating cost for ISO brightness above 92%. However, this kind of sequence did not allow the wastewater circuit to be closed, even partially. For an ISO brightness level around 91%, the AhotZhcDP sequence presented a lower operating cost than the others.

  12. Research progress of plant population genomics based on high-throughput sequencing.

    Science.gov (United States)

    Yunsheng, Wang

    2016-08-01

    Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through identification of site-specific and genome-wide effects using genotyping of genome-wide polymorphic sites. With the appearance and improvement of next generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly and large scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of the relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as the existing problems of plant population genomics in order to provide references for related studies.

  13. Mitochondrial genome sequences of Artemia tibetiana and Artemia urmiana: assessing molecular changes for high plateau adaptation.

    Science.gov (United States)

    Zhang, Hangxiao; Luo, Qibin; Sun, Jing; Liu, Fei; Wu, Gang; Yu, Jun; Wang, Weiwei

    2013-05-01

    Brine shrimps, Artemia (Crustacea, Anostraca), inhabit hypersaline environments and have a broad geographical distribution from sea level to high plateaus. Artemia therefore possess significant genetic diversity, which gives them their outstanding adaptability. To understand this remarkable plasticity, we sequenced the mitochondrial genomes of two Artemia tibetiana isolates from the Tibetan Plateau in China and one Artemia urmiana isolate from Lake Urmia in Iran and compared them with the genome of a low-altitude Artemia, A. franciscana. We compared the ratio of the rate of nonsynonymous (Ka) and synonymous (Ks) substitutions (Ka/Ks ratio) in the mitochondrial protein-coding gene sequences and found that atp8 had the highest Ka/Ks ratios in comparisons of A. franciscana with either A. tibetiana or A. urmiana and that atp6 had the highest Ka/Ks ratio between A. tibetiana and A. urmiana. Atp6 may have experienced strong selective pressure for high-altitude adaptation because although A. tibetiana and A. urmiana are closely related they live at different altitudes. We identified two extended termination-associated sequences and three conserved sequence blocks in the D-loop region of the mitochondrial genomes. We propose that sequence variations in the D-loop region and in the subunits of the respiratory chain complexes independently or collectively contribute to the adaptation of Artemia to different altitudes.

  14. Semi-Automated Library Preparation for High-Throughput DNA Sequencing Platforms

    Directory of Open Access Journals (Sweden)

    Eveline Farias-Hesson

    2010-01-01

    Next-generation sequencing platforms are powerful technologies, providing gigabases of genetic information in a single run. An important prerequisite for high-throughput DNA sequencing is the development of robust and cost-effective preprocessing protocols for DNA sample library construction. Here we report the development of a semi-automated sample preparation protocol to produce adaptor-ligated fragment libraries. Using a liquid-handling robot in conjunction with Carboxy Terminated Magnetic Beads, we labeled each library sample using a unique 6 bp DNA barcode, which allowed multiplex sample processing and sequencing of 32 libraries in a single run using Applied Biosystems' SOLiD sequencer. We applied our semi-automated pipeline to targeted medical resequencing of nuclear candidate genes in individuals affected by mitochondrial disorders. This novel method is capable of preparing as many as 32 DNA libraries in 2.01 days (8-hour workday) for emulsion PCR/high throughput DNA sequencing, increasing sample preparation production by 8-fold.
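
    Multiplexing with per-library barcodes implies a demultiplexing step after sequencing. The following is a minimal sketch of that idea only: it assumes, for illustration, that each read begins with its 6 bp barcode in base space and that only exact matches are accepted. The abstract does not describe how barcodes were read back on the SOLiD platform, and every name, barcode and read below is hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 6  # barcode length quoted in the abstract


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and trim the barcode off.

    Reads whose first BARCODE_LENGTH bases match no known barcode
    are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "library_01", "TGCATG": "library_02"}
reads = ["ACGTACGGGTTTAAA", "TGCATGCCCAAATTT", "ACGTACTTTT", "NNNNNNACGT"]
binned = demultiplex(reads, barcodes)
print(binned)

# A 6 bp barcode allows 4**6 distinct sequences, far more than 32 libraries,
# which leaves room to pick barcodes that differ at several positions.
print(4 ** BARCODE_LENGTH)
```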

  15. Norm-Transgression Sequences in the Classroom Interaction at a Madrid High School

    Science.gov (United States)

    Alcala Recuerda, Esther

    2010-01-01

    This paper studies high school classroom sequences, compiled through critical sociolinguistic ethnography, where norm-transgression is made explicit, and how authority is recovered by the teacher after an open period where class participants generally seize to digress. This way, we will be able to approach several dimensions of linguistic…

  16. Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome

    NARCIS (Netherlands)

    Montaña, José Salvador; Jiménez Avella, Diego; Angel, Tatiana; Hernández, Mónica; Baena, Sandra

    2012-01-01

    Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clon

  17. The Importance of Agriculture Science Course Sequencing in High Schools: A View from Collegiate Agriculture Students

    Science.gov (United States)

    Wheelus, Robin P.

    2009-01-01

    The objective of this study was to investigate the importance of Agriculture Science course sequencing in high schools, as a preparatory factor for students enrolled in collegiate agriculture classes. With the variety of courses listed in the Texas Essential Knowledge and Skills (TEKS) for Agriculture Science, it has been possible for counselors,…

  18. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments

    NARCIS (Netherlands)

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R.; Verstrepen, Kevin J.; Thevelein, Johan M.; Tohme, Joe

    2014-01-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still

  19. Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies

    NARCIS (Netherlands)

    Jazayeri, Omid

    2016-01-01

    The research presented in this thesis focuses on using Whole Exome Sequencing (WES) to unravel the genetic basis of human hereditary disorders with different inheritance patterns. We set out to apply WES as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group

  20. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform

    DEFF Research Database (Denmark)

    Fordyce, Sarah L; Avila-Arcos, Maria C; Rockenbauer, Eszter;

    2011-01-01

    The analysis and profiling of short tandem repeat (STR) loci is routinely used in forensic genetics. Current methods to investigate STR loci, including PCR-based standard fragment analyses and capillary electrophoresis, only provide amplicon lengths that are used to estimate the number of STR...... repeat units. These methods do not allow for the full resolution of STR base composition that sequencing approaches could provide. Here we present an STR profiling method based on the use of the Roche Genome Sequencer (GS) FLX to simultaneously sequence multiple core STR loci. Using this method...

  1. Sequence similarity between the erythrocyte binding domain 1 of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals binding residues for the Duffy Antigen Receptor for Chemokines

    Directory of Open Access Journals (Sweden)

    Garry Robert F

    2011-01-01

    Background: The surface glycoprotein (SU, gp120) of the human immunodeficiency virus (HIV) must bind to a chemokine receptor, CCR5 or CXCR4, to invade CD4+ cells. Plasmodium vivax uses the Duffy Binding Protein (DBP) to bind the Duffy Antigen Receptor for Chemokines (DARC) and invade reticulocytes. Results: Variable loop 3 (V3) of HIV-1 SU and domain 1 of the Plasmodium vivax DBP share a sequence similarity. The site of amino acid sequence similarity was necessary, but not sufficient, for DARC binding and contained a consensus heparin binding site essential for DARC binding. Both HIV-1 and P. vivax can be blocked from binding to their chemokine receptors by the chemokine RANTES and its analog AOP-RANTES. Site directed mutagenesis of the heparin binding motif in members of the DBP family, the P. knowlesi alpha, beta and gamma proteins, abrogated their binding to erythrocytes. Positively charged residues within domain 1 are required for binding of P. vivax and P. knowlesi erythrocyte binding proteins. Conclusion: A heparin binding site motif in members of the DBP family may form part of a conserved erythrocyte receptor binding pocket.

  2. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data.

    Science.gov (United States)

    Junttila, Sini; Rudd, Stephen

    2012-10-30

    Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. These data provide a first glimpse into the molecular nature of the lichen symbiosis and

  3. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data

    Directory of Open Access Journals (Sweden)

    Junttila Sini

    2012-10-01

    Background: Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. Results: Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. Conclusions: Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. These data provide a first

  4. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom.

    Science.gov (United States)

    Turton, Jane F; Wright, Laura; Underwood, Anthony; Witney, Adam A; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-08-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies.

  5. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus;

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates...

  6. Division of high resolution sequence stratigraphy units with wavelet transform of logs in Dagang Oilfield

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Dividing high-resolution sequence stratigraphy units by wavelet transform of logging data is found to be effective at identifying subtle cycles of geological processes in the Kongnan area of Dagang Oilfield. Multi-scale analysis of formation cycles, using a 1-D continuous Dmey wavelet transform of the log curve (GR) and a 1-D discrete Daubechies wavelet transform of the log curve (Rt), makes the division of sequence interfaces more objective and precise, avoiding the subjectivity of core analysis and the uncertainty of seismic data.

  7. A mini-IRES sequence for stringent selection of high producers

    Indian Academy of Sciences (India)

    Jun Yan; Hailin Yang; Guohua Yue; Wenda Gao

    2013-06-01

    Internal Ribosome Entry Site (IRES) sequences have been widely used to link the expression of two independent proteins on the same mRNA transcript. Genes encoding fluorescent proteins or drug-resistance enzymes are usually placed downstream of IRES, serving as expression indicators or selection markers. In biological applications where the upstream gene-of-interest is to be expressed at extremely high levels, it is often desirable to purposely reduce IRES downstream gene expression to economize the cellular resources and/or to generate more stringent selection pressure. Here we describe a miniature IRES mutant sequence (IRESmut3) with dramatically diminished co-translational efficiency to fulfill these purposes.

  8. High-quality genome sequence and description of Paenibacillus dakarensis sp. nov.

    Directory of Open Access Journals (Sweden)

    C.I. Lo

    2016-03-01

    Strain FF9T was isolated in Dakar (Senegal) from a blood culture taken from a 16-month-old child. MALDI-TOF analysis did not allow for identification. After sequencing, strain FF9T exhibited 98.18% similarity with the 16S rRNA sequence of Paenibacillus uliginis. A polyphasic study of phenotypic and genomic analyses showed that strain FF9T is Gram variable and catalase-positive, and has a genome of 4,569,428 bp (one chromosome but no plasmid) with 4,427 genes (4,352 protein-coding and 75 RNA genes, including 3 rRNA operons). The G+C content is 45.7%. On the basis of these genomic and phenotypic data analyses, we propose the creation of Paenibacillus dakarensis strain FF9T.

  9. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

    Science.gov (United States)

    Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M

    2007-01-01

    Background: Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphisms. Their higher availability and stability when compared to simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, the assessment of genetic diversity, the detection of genotype/phenotype associations, or marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tag (EST) sequencing projects in grapevine are allowing for the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still, the general application of those SNPs requires further validation since their use could be restricted to those specific genotypes. Results: In order to develop a large SNP set of wide application in grapevine we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we have sequenced 230 gene fragments, which represents the analysis of over 1 Mb of grape DNA sequence. This analysis has allowed the discovery of 1573 SNPs with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences. Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate the existence of a rapid decay of LD within the

  10. Achieving high throughput sequencing of a cDNA library utilizing an alternative protocol for the bench top next-generation sequencing system.

    Science.gov (United States)

    Wan, Minxi; Faruq, Junaid; Rosenberg, Julian N; Xia, Jinlan; Oyler, George A; Betenbaugh, Michael J

    2013-02-15

    The development of next-generation sequencing (NGS) technologies has provided novel tools for genome analysis and expression profiling. A high throughput cDNA sequencing method using a bench top next-generation sequencing system, GS Junior, is now available. Here, we used an alternative protocol to the standard method for generating the cDNA library. This protocol can decrease the number of processing steps to manipulate RNA when constructing a cDNA library from an RNA sample, and does not require mRNA isolation from total RNA. Thus it can decrease the risk of RNA degradation and the cost for preparing a cDNA library. Also, the efficiency of sequencing data obtained with this approach is comparable to the standard method as verified by sequencing characteristics and expression levels of the reference gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH).

  11. Genes encoding two Theileria parva antigens recognized by CD8+ T-cells exhibit sequence diversity in South Sudanese cattle populations but the majority of alleles are similar to the Muguga component of the live vaccine cocktail

    Science.gov (United States)

    Pelle, Roger; Mwacharo, Joram M.; Njahira, Moses N.; Marcellino, Wani L.; Kiara, Henry; Malak, Agol K.; EL Hussein, Abdel Rahim M.; Bishop, Richard; Skilton, Robert A.

    2017-01-01

    East Coast fever (ECF), caused by Theileria parva infection, is a frequently fatal disease of cattle in eastern, central and southern Africa, and an emerging disease in South Sudan. Immunization using the infection and treatment method (ITM) is increasingly being used for control in countries affected by ECF, but not yet in South Sudan. It has been reported that CD8+ T-cell lymphocytes specific for parasitized cells play a central role in the immunity induced by ITM and a number of T. parva antigens recognized by parasite-specific CD8+ T-cells have been identified. In this study we determined the sequence diversity among two of these antigens, Tp1 and Tp2, which are under evaluation as candidates for inclusion in a sub-unit vaccine. T. parva samples (n = 81) obtained from cattle in four geographical regions of South Sudan were studied for sequence polymorphism in partial sequences of the Tp1 and Tp2 genes. Eight positions (1.97%) in Tp1 and 78 positions (15.48%) in Tp2 were shown to be polymorphic, giving rise to four and 14 antigen variants in Tp1 and Tp2, respectively. The overall nucleotide diversity in the Tp1 and Tp2 genes was π = 1.65% and π = 4.76%, respectively. The parasites were sampled from regions approximately 300 km apart, but there was limited evidence for genetic differentiation between populations. Analyses of the sequences revealed limited numbers of amino acid polymorphisms both overall and in residues within the mapped CD8+ T-cell epitopes. Although novel epitopes were identified in the samples from South Sudan, a large number of the samples harboured several epitopes in both antigens that were similar to those in the T. parva Muguga reference stock, which is a key component in the widely used live vaccine cocktail. PMID:28231338
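
    The nucleotide diversity (π) values quoted above are commonly defined as the average number of pairwise differences per site among aligned sequences. The following is a minimal sketch of that textbook definition, not the authors' pipeline; the function name and the toy alignment are hypothetical, and gaps or ambiguous bases are not handled.

```python
from itertools import combinations

def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned, 2))
    # Count mismatching positions for every unordered pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment (hypothetical data): 4 sequences, 10 sites.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
pi = nucleotide_diversity(toy)
print(f"pi = {pi:.4f}")
```

    Multiplying the result by 100 gives π as a percentage, the form used in the abstract above.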

  12. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Science.gov (United States)

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
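
    A scan for tandem runs of the GGGNATC unit described above can be sketched with a regular expression. This is an illustrative example only, not the authors' in silico method: the helper name is hypothetical, only the given strand is searched (a real analysis would also scan the reverse complement), and the minimum of two units is taken from the range reported in the abstract.

```python
import re

# One heptamer unit: GGG, any base (N), then ATC.
UNIT = r"GGG[ACGT]ATC"
# Two or more directly adjacent units form a tandem run.
PATTERN = re.compile(rf"(?:{UNIT}){{2,}}")

def find_repeat_runs(seq):
    """Return (start, n_units) for each tandem run of GGGNATC units in seq."""
    runs = []
    for match in PATTERN.finditer(seq.upper()):
        n_units = (match.end() - match.start()) // 7
        runs.append((match.start(), n_units))
    return runs

# Hypothetical test sequence: a 4-unit run and a 2-unit run.
example = "TTA" + "GGGAATC" * 4 + "CCGT" + "GGGTATCGGGCATC" + "A"
print(find_repeat_runs(example))
```

    Tallying the `n_units` values over a whole genome would give the kind of unit-count distribution in which the abstract reports a preference for four units.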

  13. High-resolution mapping and transcriptional activity analysis of chicken centromere sequences on giant lampbrush chromosomes.

    Science.gov (United States)

    Krasikova, Alla; Fukagawa, Tatsuo; Zlotina, Anna

    2012-12-01

    Exploration into morphofunctional organisation of centromere DNA sequences is important for understanding the mechanisms of kinetochore specification and assembly. In-depth epigenetic analysis of DNA fragments associated with centromeric nucleosome proteins has demonstrated unique features of centromere organisation in chicken karyotype: there are both mature centromeres, which comprise chromosome-specific homogeneous arrays of tandem repeats, and recently evolved primitive centromeres, which consist of non-tandemly organised DNA sequences. In this work, we describe the arrangement and transcriptional activity of chicken centromere repeats for Cen1, Cen2, Cen3, Cen4, Cen7, Cen8, and Cen11 and non-repetitive centromere sequences of chromosomes 5, 27, and Z using highly elongated lampbrush chromosomes, which are characteristic of the diplotene stage of oogenesis. The degree of chromatin packaging and fine spatial organisations of tandemly repetitive and non-tandemly repetitive centromeric sequences significantly differ at the lampbrush stage. Using DNA/RNA FISH, we have demonstrated that during the lampbrush stage, DNA sequences are transcribed within the centromere regions of chromosomes that lack centromere-specific tandem repeats. In contrast, chromosome-specific centromeric repeats Cen1, Cen2, Cen3, Cen4, Cen7, Cen8, and Cen11 do not demonstrate any transcriptional activity during the lampbrush stage. In addition, we found that CNM repeat cluster localises adjacent to non-repetitive centromeric sequences in chicken microchromosome 27 indicating that centromere region in this chromosome is repeat-rich. Cross-species FISH allowed localisation of the sequences homologous to centromeric DNA of chicken chromosomes 5 and 27 in centromere regions of quail orthologous chromosomes.

  14. An efficient strategy for large-scale high-throughput transposon-mediated sequencing of cDNA clones

    Science.gov (United States)

    Butterfield, Yaron S. N.; Marra, Marco A.; Asano, Jennifer K.; Chan, Susanna Y.; Guin, Ranabir; Krzywinski, Martin I.; Lee, Soo Sen; MacDonald, Kim W. K.; Mathewson, Carrie A.; Olson, Teika E.; Pandoh, Pawan K.; Prabhu, Anna-Liisa; Schnerch, Angelique; Skalska, Ursula; Smailus, Duane E.; Stott, Jeff M.; Tsai, Miranda I.; Yang, George S.; Zuyderduyn, Scott D.; Schein, Jacqueline E.; Jones, Steven J. M.

    2002-01-01

    We describe an efficient high-throughput method for accurate DNA sequencing of entire cDNA clones. Developed as part of our involvement in the Mammalian Gene Collection full-length cDNA sequencing initiative, the method has been used and refined in our laboratory since September 2000. Amenable to large scale projects, we have used the method to generate >7 Mb of accurate sequence from 3695 candidate full-length cDNAs. Sequencing is accomplished through the insertion of Mu transposon into cDNAs, followed by sequencing reactions primed with Mu-specific sequencing primers. Transposon insertion reactions are not performed with individual cDNAs but rather on pools of up to 96 clones. This pooling strategy reduces the number of transposon insertion sequencing libraries that would otherwise be required, reducing the costs and enhancing the efficiency of the transposon library construction procedure. Sequences generated using transposon-specific sequencing primers are assembled to yield the full-length cDNA sequence, with sequence editing and other sequence finishing activities performed as required to resolve sequence ambiguities. Although analysis of the many thousands (22 785) of sequenced Mu transposon insertion events revealed a weak sequence preference for Mu insertion, we observed insertion of the Mu transposon into 1015 of the possible 1024 5mer candidate insertion sites. PMID:12034834

  15. Extended self-similarity in moment-generating-functions in wall-bounded turbulence at high Reynolds number

    CERN Document Server

    Yang, Xiang I A; Marusic, Ivan; Biferale, Luca

    2016-01-01

    In wall-bounded turbulence, the moment generating functions (MGFs) of the streamwise velocity fluctuations $\langle \exp(q u^+) \rangle$ develop power-law scaling as a function of the wall-normal distance $z/\delta$. Here $u$ is the streamwise velocity fluctuation, $+$ indicates normalization in wall units (averaged friction velocity), $z$ is the distance from the wall, $q$ is an independent variable and $\delta$ is the boundary layer thickness. Previous work has shown that this power-law scaling exists in the log-region $3Re_\tau^{0.5}\lesssim z^+$, $z\lesssim 0.15\delta$, where $Re_\tau$ is the friction-velocity-based Reynolds number. Here we present empirical evidence that this self-similar scaling can be extended, including bulk and viscosity-affected regions $30 < z^+$, $z < \delta$, provided the data are interpreted with Extended Self-Similarity (ESS), i.e. self-scaling of the MGFs as a function of one reference value, $q_o$. ESS also improves the scaling properties, leading to more precise measurements of th...
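
    The step from power-law scaling in $z/\delta$ to self-scaling against a reference value $q_o$ can be written out as follows. This is a sketch under assumed notation: the exponent symbol $\tau(q)$ and its sign convention are not given in the abstract and are introduced here only for illustration.

```latex
% Assumed power law in the wall-normal distance, with exponent \tau(q):
\langle \exp(q u^+) \rangle \sim (z/\delta)^{-\tau(q)}
% Writing the same law at the reference value q_o and eliminating z/\delta
% gives the ESS form, which no longer refers to z explicitly:
\langle \exp(q u^+) \rangle \sim \langle \exp(q_o u^+) \rangle^{\tau(q)/\tau(q_o)}
```

    Under this reading, plotting one MGF against the reference MGF on logarithmic axes would yield a straight line of slope $\tau(q)/\tau(q_o)$ wherever ESS holds.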

  16. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of the CR sequence varied among M. cephalus populations due to the presence of indels and a variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star-shaped tree inferred from the CR polymorphism points to a rapid worldwide radiation in this species. The CR still appears to be a good marker for phylogeographic investigations, and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  17. HIGH RESOLUTION IMAGE PROJECTION IN FREQUENCY DOMAIN FOR CONTINUOUS IMAGE SEQUENCE

    Directory of Open Access Journals (Sweden)

    M. Nagaraju Naik

    2010-09-01

    Unlike most other information technologies, which have enjoyed exponential growth for the past several decades, display resolution has largely stagnated. Low display resolution has in turn limited the resolution of digital images. Scaling is a non-trivial process that involves a trade-off between efficiency, smoothness and sharpness. As the size of an image is increased, the pixels that make up the image become increasingly visible, making the image appear soft. Super scalar representation of an image sequence is limited by the image information present in the low dimensional image sequence. To project an image frame sequence to a high-resolution static or fractional scaling value, a scaling approach is developed based on energy spectral interpolation and frequency spectral interpolation techniques. To realize the frequency spectral resolution, the Cubic-B-Spline method is used.

  18. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing

    DEFF Research Database (Denmark)

    Gamba, Cristina; Hanghøj, Kristian Ebbesen; Gaunitz, Charleen

    2016-01-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs...... of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning...... a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules...

  19. High sequence variability among hemocyte-specific Kazal-type proteinase inhibitors in decapod crustaceans.

    Science.gov (United States)

    Cerenius, Lage; Liu, Haipeng; Zhang, Yanjiao; Rimphanitchayakit, Vichien; Tassanakajon, Anchalee; Gunnar Andersson, M; Söderhäll, Kenneth; Söderhäll, Irene

    2010-01-01

    Crustacean hemocytes were found to produce a large number of transcripts coding for Kazal-type proteinase inhibitors (KPIs). A detailed study performed with the crayfish Pacifastacus leniusculus and the shrimp Penaeus monodon revealed the presence of at least 26 and 20 different Kazal domains from the hemocyte KPIs, respectively. Comparisons with KPIs from other taxa indicate that the sequences of these domains evolve rapidly. A few conserved positions, e.g. six invariant cysteines were present in all domain sequences whereas the position of P1 amino acid, a determinant for substrate specificity, varied highly. A study with a single crayfish animal suggested that even at the individual level considerable sequence variability among hemocyte KPIs produced exist. Expression analysis of four crayfish KPI transcripts in hematopoietic tissue cells and different hemocyte types suggest that some of these KPIs are likely to be involved in hematopoiesis or hemocyte release as they were produced in particular hemocyte types or maturation stages only.

  20. High-throughput novel microsatellite marker of faba bean via next generation sequencing

    Directory of Open Access Journals (Sweden)

    Yang Tao

    2012-11-01

    Background: Faba bean (Vicia faba L.) is an important food legume crop, grown for human consumption globally including in China, Turkey, Egypt and Ethiopia. Although genetic gain has been made through conventional selection and breeding efforts, this could be substantially improved through the application of molecular methods. For this, a set of reliable molecular markers representative of the entire genome is required. Results: A library with 125,559 putative SSR sequences was constructed and characterized for repeat type and length from a mixed genome of 247 spring and winter sown faba bean genotypes using 454 sequencing. A suite of 28,503 primer pair sequences was designed and 150 were randomly selected for validation. Of these, 94 produced reproducible amplicons that were polymorphic among 32 faba bean genotypes selected from diverse geographical locations. The number of alleles per locus ranged from 2 to 8, the expected heterozygosities ranged from 0.0000 to 1.0000, and the observed heterozygosities ranged from 0.0908 to 0.8410. Validation by UPGMA cluster analysis of the 32 genotypes, based on Nei's genetic distance, showed the high quality and effectiveness of the novel SSR markers developed via next generation sequencing technology. Conclusions: Large scale SSR marker development was successfully achieved using next generation sequencing of the V. faba genome. These novel markers are valuable for constructing genetic linkage maps, future QTL mapping, and marker-assisted trait selection in faba bean breeding efforts.
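
    The per-locus statistics reported above (allele count, observed and expected heterozygosity) follow standard definitions that can be sketched briefly. The code below is illustrative only and assumes diploid genotypes scored as allele-size pairs; the function name and the example locus are hypothetical, and the uncorrected estimator 1 - Σp² is used, which may differ from the estimator applied in the study.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (n_alleles, observed_het, expected_het) for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return len(counts), observed, expected

# Hypothetical SSR locus scored in four individuals (allele sizes in bp).
locus = [(152, 152), (152, 156), (156, 160), (152, 160)]
n_alleles, ho, he = heterozygosity(locus)
print(n_alleles, ho, he)
```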

  1. Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

    Science.gov (United States)

    Lee, Seungeun; Yamamoto, Naomichi

    2015-12-01

    This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers.

  2. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.
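
    The idea of characterizing sequences by abundance and persistence, then filtering on both, can be illustrated outside Galaxy with a few lines of Python. This sketch is not the published Workflows: it assumes abundance means total read count across rounds and persistence means the number of rounds in which a sequence appears, and all names and reads are hypothetical.

```python
from collections import Counter

def abundance_and_persistence(rounds):
    """rounds: dict mapping selection-round name -> list of read sequences."""
    abundance = Counter()    # total reads per unique sequence
    persistence = Counter()  # number of rounds containing the sequence
    for reads in rounds.values():
        abundance.update(reads)
        persistence.update(set(reads))
    return abundance, persistence

def filter_sequences(abundance, persistence, min_reads, min_rounds):
    """Keep unique sequences meeting both thresholds, sorted for stable output."""
    return sorted(
        seq for seq in abundance
        if abundance[seq] >= min_reads and persistence[seq] >= min_rounds
    )

# Hypothetical reads from three selection rounds.
rounds = {
    "R1": ["ACGU", "ACGU", "GGCA"],
    "R2": ["ACGU", "UUAG"],
    "R3": ["ACGU", "GGCA", "GGCA"],
}
abundance, persistence = abundance_and_persistence(rounds)
print(filter_sequences(abundance, persistence, min_reads=3, min_rounds=2))
```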

  3. Similar changes in muscle lipid metabolism are induced by chronic high-fructose feeding and high-fat feeding in C57BL/J6 mice.

    Science.gov (United States)

    Song, Guang-Yao; Ren, Lu-Ping; Chen, Shu-Chun; Wang, Chao; Liu, Na; Wei, Li-Min; Li, Fan; Sun, Wen; Peng, Lan-Bo; Tang, Yong

    2012-12-01

    The aim of the present study was to investigate the effects of high fructose and high fat feeding on muscle lipid metabolism and to illustrate the mechanisms by which the two different dietary factors induce muscle lipid accumulation. C57BL/J6 mice were fed either a standard, high-fructose (HFru) or high-fat diet. After 16 weeks of feeding, mice were killed and plasma triglyceride (TG) and free fatty acid (FFA) levels were detected. In addition, muscle TG and long chain acyl CoA (LCACoA) content was determined, glucose tolerance was evaluated and the protein content of fatty acid translocase CD36 (FATCD36) in muscle was measured. Mitochondrial oxidative function in the muscle was evaluated by estimating the activity of oxidative enzymes, namely cytochrome oxidase (COx), citrate synthase (CS) and β-hydroxyacyl CoA dehydrogenase (β-HAD), and the muscle protein content of carnitine palmitoyltransferase-1 (CPT-1), cyclo-oxygenase (COX)-1 and proliferator-activated receptor coactivator (PGC)-1α was determined. Finally, sterol regulatory element-binding protein-1c (SREBP-1c) gene expression and fatty acid synthase (FAS) protein content were determined in muscle tissues. After 16 weeks, plasma TG and FFA levels were significantly increased in both the HFru and HF groups. In addition, mice in both groups exhibited significant increases in muscle TG and LCACoA content. Compared with mice fed the standard diet (control group), those in the HFru and HF groups developed glucose intolerance and exhibited increased FATCD36 protein levels, enzyme activity related to fatty acid utilization in the mitochondria and protein expressions of CPT-1, COX-1 and PGC-1α in muscle tissue. Finally, mice in both the HFru and HF groups exhibited increased SREBP-1c expression and FAS protein content. In conclusion, high fructose and high fat feeding lead to similar changes in muscle lipid metabolism in C57BL/J6 mice. Lipid accumulation in the muscle may be associated with increased expression

  4. Transcription factor binding sites are highly enriched within microRNA precursor sequences

    Directory of Open Access Journals (Sweden)

    Piriyapongsa Jittima

    2011-12-01

    Background: Transcription factors are thought to regulate the transcription of microRNA genes in a manner similar to that of protein-coding genes; that is, by binding to conventional transcription factor binding site DNA sequences located in or near promoter regions that lie upstream of the microRNA genes. However, in the course of analyzing the genomics of human microRNA genes, we noticed that annotated transcription factor binding sites commonly lie within 70- to 110-nt long microRNA small hairpin precursor sequences. Results: We report that about 45% of all human small hairpin microRNA (pre-miR) sequences contain at least one predicted transcription factor binding site motif that is conserved across human, mouse and rat, and this rises to over 75% if one excludes primate-specific pre-miRs. The association is robust and has extremely strong statistical significance; it affects both intergenic and intronic pre-miRs and both isolated and clustered microRNA genes. We also confirmed and extended this finding using a separate analysis that examined all human pre-miR sequences regardless of conservation across species. Conclusions: The transcription factor binding sites localized within small hairpin microRNA precursor sequences may possibly regulate their transcription. Transcription factors may also possibly bind directly to nascent primary microRNA gene transcripts or small hairpin microRNA precursors and regulate their processing. Reviewers: This article was reviewed by Guillaume Bourque (nominated by Jerzy Jurka), Dmitri Pervouchine (nominated by Mikhail Gelfand), and Yuriy Gusev.

  5. A highly specialized flavin mononucleotide riboswitch responds differently to similar ligands and confers roseoflavin resistance to Streptomyces davawensis.

    Science.gov (United States)

    Pedrolli, Danielle Biscaro; Matern, Andreas; Wang, Joy; Ester, Miriam; Siedler, Kathrin; Breaker, Ronald; Mack, Matthias

    2012-09-01

    Streptomyces davawensis is the only organism known to synthesize the antibiotic roseoflavin, a riboflavin (vitamin B2) analog. Roseoflavin is converted to roseoflavin mononucleotide (RoFMN) and roseoflavin adenine dinucleotide in the cytoplasm of target cells. (Ribo-)Flavin mononucleotide (FMN) riboswitches are genetic elements, which in many bacteria control genes responsible for the biosynthesis and transport of riboflavin. Streptomyces davawensis is roseoflavin resistant, and the closely related bacterium Streptomyces coelicolor is roseoflavin sensitive. The two bacteria served as models to investigate roseoflavin resistance of S. davawensis and to analyze the mode of action of roseoflavin in S. coelicolor. Our experiments demonstrate that the ribB FMN riboswitch of S. davawensis (in contrast to the corresponding riboswitch of S. coelicolor) is able to discriminate between the two very similar flavins FMN and RoFMN and shows opposite responses to the latter ligands.

  6. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA sequence, such as canSNP typing or MLVA, has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
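
    A gene-by-gene comparison of the kind described above typically reduces each isolate to a profile of allele numbers and counts the loci at which two profiles differ. The sketch below illustrates that general idea only; it is not the MLST+ implementation, and the function name, locus names, allele numbers, and the choice to skip loci missing in either isolate are all assumptions.

```python
def allele_differences(profile_a, profile_b):
    """Compare two allele profiles (dict: locus -> allele number or None).

    Returns (number of differing loci, number of loci compared).
    Loci missing (None or absent) in either profile are skipped.
    """
    shared = [
        locus for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    differing = sum(profile_a[locus] != profile_b[locus] for locus in shared)
    return differing, len(shared)

# Hypothetical profiles for two isolates over four loci.
isolate_1 = {"gene_A": 1, "gene_B": 4, "gene_C": 2, "gene_D": None}
isolate_2 = {"gene_A": 1, "gene_B": 7, "gene_C": 2, "gene_D": 3}
print(allele_differences(isolate_1, isolate_2))
```

    Isolates separated by few or no allele differences over many loci would be candidates for the same clone, while larger distances would suggest independent introductions.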

  7. An improved, high-quality draft genome sequence of the Germination-Arrest Factor-producing Pseudomonas fluorescens WH6

    Directory of Open Access Journals (Sweden)

    Creason Allison L

    2010-09-01

    Full Text Available Abstract Background: Pseudomonas fluorescens is a genetically and physiologically diverse species of bacteria present in many habitats and in association with plants. This species of bacteria produces a large array of secondary metabolites with potential as natural products. P. fluorescens isolate WH6 produces Germination-Arrest Factor (GAF), a predicted small peptide or amino acid analog with herbicidal activity that specifically inhibits germination of seeds of graminaceous species. Results: We used a hybrid next-generation sequencing approach to develop a high-quality draft genome sequence for P. fluorescens WH6. We employed automated, manual, and experimental methods to further improve the draft genome sequence. From this assembly of 6.27 megabases, we predicted 5876 genes, of which 3115 were core to P. fluorescens and 1567 were unique to WH6. Comparative genomic studies of WH6 revealed high similarity in synteny and orthology of genes with P. fluorescens SBW25. A phylogenomic study also placed WH6 in the same lineage as SBW25. In a previous non-saturating mutagenesis screen we identified two genes necessary for GAF activity in WH6. Mapping of their flanking sequences revealed genes that encode a candidate anti-sigma factor and an aminotransferase. Finally, we discovered several candidate virulence and host-association mechanisms, one of which appears to be a complete type III secretion system. Conclusions: The improved high-quality draft genome sequence of WH6 contributes towards resolving the P. fluorescens species, providing additional impetus for establishing two separate lineages in P. fluorescens. Despite the high levels of orthology and synteny to SBW25, WH6 still had a substantial number of unique genes and represents another source for the discovery of genes with implications in affecting plant growth and health. Two genes are demonstrably necessary for GAF and further characterization of their proteins is important for developing

  8. Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing

    Science.gov (United States)

    Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad

    2015-01-01

    Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407
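
    As a rough illustration of the kinds of quantities mentioned above (taxonomic richness, proportional abundance, and a summary diversity metric), the following sketch computes them from a table of per-taxon counts. The Shannon index is used here only because it is a common diversity metric; the abstract does not say which metrics the authors calculated, and the taxon names and counts are hypothetical.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts.values() if c > 0)


def proportional_abundance(counts):
    """Map each observed taxon to its share of the total count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: c / total for taxon, c in counts.items() if c > 0}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over observed taxa."""
    proportions = proportional_abundance(counts)
    return -sum(p * math.log(p) for p in proportions.values())


# Hypothetical counts for one bulk sample (e.g. reads or individuals per taxon).
sample = {"Chironomidae": 50, "Baetidae": 50, "Hydropsychidae": 0}

sample_richness = richness(sample)
sample_shares = proportional_abundance(sample)
sample_shannon = shannon_index(sample)
```

    The same functions apply whether the counts come from morphological sorting or from sequence reads assigned to taxa, which is what makes the two approaches directly comparable in principle.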

  9. Frequency-locked pulse sequencer for high-frame-rate monochromatic tissue motion imaging.

    Science.gov (United States)

    Azar, Reza Zahiri; Baghani, Ali; Salcudean, Septimiu E; Rohling, Robert

    2011-04-01

    To overcome the inherent low frame rate of conventional ultrasound, we have previously presented a system that can be implemented on conventional ultrasound scanners for high-frame-rate imaging of monochromatic tissue motion. The system employs a sector subdivision technique in the sequencer to increase the acquisition rate. To eliminate the delays introduced during data acquisition, a motion phase correction algorithm has also been introduced to create in-phase displacement images. Previous experimental results from tissue-mimicking phantoms showed that the system can achieve effective frame rates of up to a few kilohertz on conventional ultrasound systems. In this short communication, we present a new pulse sequencing strategy that facilitates high-frame-rate imaging of monochromatic motion such that the acquired echo signals are inherently in-phase. The sequencer uses the knowledge of the excitation frequency to synchronize the acquisition of the entire imaging plane to that of an external exciter. This sequencing approach eliminates any need for synchronization or phase correction and has applications in tissue elastography, which we demonstrate with tissue-mimicking phantoms.

  10. New ancient DNA sequences suggest high genetic diversity for the woolly mammoth (Mammuthus primigenius)

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Partial DNA sequences of the cytochrome b gene (mtDNA) were successfully retrieved from Late Pleistocene fossil bone of Mammuthus primigenius collected from the Xiguitu County (Yakeshi), Inner Mongolia Autonomous Region and from Zhaodong, Harbin of Heilongjiang Province in northern China. Two ancient DNA fragments (109 bp and 124 bp) were authenticated by reproducible experiments in two different laboratories and by phylogenetic analysis with other Elephantidae taxa. Phylogenetic analysis using these sequences and published data, in either separate or combined datasets, indicates an unstable relationship among the woolly mammoth and the two living elephants, Elephas and Loxodonta. In addition to the short sequences used to attempt to resolve the long independent evolution of Elephantidae terminal taxa, we suggest that a high intra-specific diversity existed in Mammuthus primigenius across both spatial and temporal ranges, resulting in a complex and divergent genetic background for the DNA sequences so far recovered. The high genetic diversity in the extinct woolly mammoth can explain the apparent instability of Elephantidae taxa on the molecular phylogenetic trees and can reconcile the apparent paradox regarding the unresolved Elephantidae trichotomy.

  11. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments.

    Science.gov (United States)

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R; Verstrepen, Kevin J; Thevelein, Johan M; Tohme, Joe

    2014-04-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

  12. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  13. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Science.gov (United States)

    Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  14. Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow.

    Directory of Open Access Journals (Sweden)

    Amit Kawalia

    Full Text Available Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

  15. Abnormal Functional Specialization within Medial Prefrontal Cortex in High-Functioning Autism: A Multi-Voxel Similarity Analysis

    Science.gov (United States)

    Gilbert, Sam J.; Meuwese, Julia D. I.; Towgood, Karren J.; Frith, Christopher D.; Burgess, Paul W.

    2009-01-01

    Multi-voxel pattern analyses have proved successful in "decoding" mental states from fMRI data, but have not been used to examine brain differences associated with atypical populations. We investigated a group of 16 (14 males) high-functioning participants with autism spectrum disorder (ASD) and 16 non-autistic control participants (12 males)…

  16. Pitfalls of mapping high throughput sequencing data to repetitive sequences: Piwi’s genomic targets still not identified

    Science.gov (United States)

    Marinov, Georgi K.; Wang, Jie; Handler, Dominik; Wold, Barbara J.; Weng, Zhiping; Hannon, Gregory J.; Aravin, Alexei A.; Zamore, Phillip D.; Brennecke, Julius; Toth, Katalin Fejes

    2015-01-01

    Huang et al. (2013) recently reported that chromatin immuno-precipitation followed by sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi - a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their underlying deep sequencing data and report that the data do not support the authors’ central conclusions. PMID:25805138

  17. The regulation of thermal stress induced apoptosis in corals reveals high similarities in gene expression and function to higher animals

    Science.gov (United States)

    Kvitt, Hagit; Rosenfeld, Hanna; Tchernov, Dan

    2016-01-01

    Recent studies suggest that controlled apoptotic response provides an essential mechanism, enabling corals to respond to global warming and ocean acidification. However, the molecules involved and their functions are still unclear. To better characterize the apoptotic response in basal metazoans, we studied the expression profiles of selected genes that encode for putative pro- and anti-apoptotic mediators in the coral Stylophora pistillata under thermal stress and bleaching conditions. Upon thermal stress, as attested by the elevation of the heat-shock protein gene HSP70’s mRNA levels, the expression of all studied genes, including caspase, Bcl-2, Bax, APAF-1 and BI-1, peaked at 6–24 h of thermal stress (hts) and declined at 72 hts. Adversely, the expression levels of the survivin gene showed a shifted pattern, with elevation at 48–72 hts and a return to basal levels at 168 hts. Overall, we show the quantitative anti-apoptotic traits of the coral Bcl-2 protein, which resemble those of its mammalian counterpart. Altogether, our results highlight the similarities between apoptotic networks operating in simple metazoans and in higher animals and clearly demonstrate the activation of pro-cell survival regulators at early stages of the apoptotic response, contributing to the decline of apoptosis and the acclimation to chronic stress. PMID:27460544

  18. The regulation of thermal stress induced apoptosis in corals reveals high similarities in gene expression and function to higher animals

    Science.gov (United States)

    Kvitt, Hagit; Rosenfeld, Hanna; Tchernov, Dan

    2016-07-01

    Recent studies suggest that controlled apoptotic response provides an essential mechanism, enabling corals to respond to global warming and ocean acidification. However, the molecules involved and their functions are still unclear. To better characterize the apoptotic response in basal metazoans, we studied the expression profiles of selected genes that encode for putative pro- and anti-apoptotic mediators in the coral Stylophora pistillata under thermal stress and bleaching conditions. Upon thermal stress, as attested by the elevation of the heat-shock protein gene HSP70’s mRNA levels, the expression of all studied genes, including caspase, Bcl-2, Bax, APAF-1 and BI-1, peaked at 6–24 h of thermal stress (hts) and declined at 72 hts. Adversely, the expression levels of the survivin gene showed a shifted pattern, with elevation at 48–72 hts and a return to basal levels at 168 hts. Overall, we show the quantitative anti-apoptotic traits of the coral Bcl-2 protein, which resemble those of its mammalian counterpart. Altogether, our results highlight the similarities between apoptotic networks operating in simple metazoans and in higher animals and clearly demonstrate the activation of pro-cell survival regulators at early stages of the apoptotic response, contributing to the decline of apoptosis and the acclimation to chronic stress.

  19. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was run in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
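
    To make the exact-search step more concrete, here is a toy sketch of backward search over a Burrows-Wheeler transform, the general technique underlying aligners such as BWA. It is not the stored procedures from the paper and not BWA's actual implementation: the suffix array is built by naive sorting, the occurrence table is stored in full, and the reference string is a made-up example, so it is only suitable for very small inputs.

```python
def build_fm_index(text):
    """Build a tiny FM-index (suffix array, first-column offsets, occ table)."""
    text = text + "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array construction; fine for toy inputs only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character is the one preceding each suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort strictly before c.
    first = {}
    total = 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (1 if ch == c else 0)
    return sa, first, occ


def exact_search(pattern, index):
    """Return sorted start positions of exact matches of pattern."""
    sa, first, occ = index
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


# Hypothetical toy reference and reads.
index = build_fm_index("ACGTACGT")
hits_acg = exact_search("ACG", index)
hits_ggg = exact_search("GGG", index)
```

    Each pattern character narrows a range of suffix-array rows, so the search cost depends on the read length rather than on the reference length; inexact matching extends this by branching over possible mismatches, which is typically written recursively.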

  20. BatchPrimer3: a high throughput web application for PCR and sequencing primer design.

    Science.gov (United States)

    You, Frank M; Huo, Naxin; Gu, Yong Qiang; Luo, Ming-Cheng; Ma, Yaqin; Hane, Dave; Lazo, Gerard R; Dvorak, Jan; Anderson, Olin D

    2008-05-29

    Microsatellite (simple sequence repeat - SSR) and single nucleotide polymorphism (SNP) markers are two types of important genetic markers useful in genetic mapping and genotyping. Often, large-scale genomic research projects require high-throughput computer-assisted primer design. Numerous such web-based or stand-alone programs for PCR primer design are available but vary in quality and functionality. In particular, most programs lack batch primer design capability. Such a high-throughput software tool for designing SSR flanking primers and SNP genotyping primers is increasingly demanded. A new web primer design program, BatchPrimer3, is developed based on Primer3. BatchPrimer3 adopted the Primer3 core program as a major primer design engine to choose the best primer pairs. A new score-based primer picking module is incorporated into BatchPrimer3 and used to pick position-restricted primers. BatchPrimer3 v1.0 implements several types of primer designs including generic primers, SSR primers together with SSR detection, and SNP genotyping primers (including single-base extension primers, allele-specific primers, and tetra-primers for tetra-primer ARMS PCR), as well as DNA sequencing primers. DNA sequences in FASTA format can be batch read into the program. The basic information of input sequences, as a reference of parameter setting of primer design, can be obtained by pre-analysis of sequences. The input sequences can be pre-processed and masked to exclude and/or include specific regions, or set targets for different primer design purposes as in Primer3Web and Primer3Plus. A tab-delimited or Excel-formatted primer output also greatly facilitates the subsequent primer-ordering process. Thousands of primers, including wheat conserved intron-flanking primers, wheat genome-specific SNP genotyping primers, and Brachypodium SSR flanking primers in several genome projects have been designed using the program and validated in several laboratories. BatchPrimer3 is a

  1. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps.

    Science.gov (United States)

    Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie

    2015-01-01

    The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.
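
    For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how an N50 value is computed from a list of scaffold lengths and how sequencing depth relates to genome size. The scaffold lengths are hypothetical toy values; only the 317 Gbp and 179X figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, not from the P. vitticeps assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)

# Genome size implied by the raw figures in the abstract:
# total sequenced bases divided by read depth, in Gbp.
implied_genome_gbp = 317 / 179
```

    The implied genome size of roughly 1.77 Gbp is consistent with the reported assembly length of 1.8 Gbp, and an N50 of 2.29 Mbp means that half of the assembled bases lie in scaffolds at least that long.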

  2. Structural comparison of highly similar nucleoside-diphosphate kinases: Molecular explanation of distinct membrane-binding behavior.

    Science.gov (United States)

    Francois-Moutal, L; Marcillat, O; Granjon, T

    2014-10-01

    NDPK-A, NDPK-B and NDPK-D are three enzymes which belong to the NDPK group I isoforms and are not only involved in metabolism process but also in transcriptional regulation, DNA cleavage, histidine protein kinase activity and metastasis development. Those enzymes were reported to bind to membranes either in mitochondria where NDPK-D influences cardiolipin lateral organization and is thought to be involved in apoptotic pathway or in cytosol where NDPK-A and NDPK-B membrane association was shown to influence several cellular processes like endocytosis, cellular adhesion, ion transport, etc. However, despite numerous studies, the role of NDPK-membrane association and the molecular details of the binding process are still elusive. In the present work, a comparative study of the three NDPK isoforms allowed us to show that although membrane binding is a common feature of these enzymes, mechanisms differ at the molecular scale. NDPK-A was not able to bind to model membranes mimicking the inner leaflet of plasma membrane, suggesting that its in vivo membrane association is mediated by a non-lipidic partner or other partners than the studied phospholipids. On the contrary, NDPK-B and NDPK-D were shown to bind efficiently to liposomes mimicking plasma membrane and mitochondrial inner membrane respectively but details of the binding mechanism differ between the two enzymes as NDPK-B binding necessarily involved an anionic phospholipid partner while NDPK-D can bind either zwitterionic or anionic phospholipids. Although sharing similar secondary structure and homohexameric quaternary arrangement, tryptophan fluorescence revealed fine disparities in NDPK tertiary structures. Interfacial behavior as well as ANS fluorescence showed further dissimilarities between NDPK isoforms, notably the presence of distinct accessible hydrophobic areas as well as different capacity to form Gibbs monolayers related to their surface activity properties. Those distinct features may contribute to

  3. Analysis of the Repertoire Features of TCR Beta Chain CDR3 in Human by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Xianliang Hou

    2016-07-01

    Full Text Available Background/Aims: To ward off a wide variety of pathogens, the human adaptive immune system harbors a vast array of T-cell receptors, collectively referred to as the TCR repertoire. Assessment of the repertoire features of TCR is vital for a deeper understanding of immune behaviour and immune response. Methods: In this study, we used a combination of multiplex-PCR, Illumina sequencing and IMGT (ImMunoGeneTics)/HighV-QUEST for a standardized analysis of the repertoire features of the TCR beta chain in the blood of healthy individuals, including the repertoire features of public TCR complementarity-determining region 3 (CDR3) sequences, highly expanded clones, and long TCR CDR3 sequences. Results: We found that public CDR3 sequences and high-frequency sequences had the same characteristics: both had fewer nucleotide additions and shorter CDR3 lengths, and were closer to the germline sequence. Moreover, our studies provided evidence that public amino acid sequences are produced by multiple nucleotide sequences. Notably, there was skewed VDJ segment usage in long CDR3 sequences: the expression levels of 10 TRβV segments, 7 TRβJ segments and 2 TRβD segments were significantly different in the long CDR3 sequences compared to the short CDR3 sequences. Moreover, we identified extensive N additions and increased D gene usage as contributing to TCR CDR3 length, and observed distinct usage frequencies of amino acids in long CDR3 sequences compared to the short CDR3 sequences. Conclusions: Some repertoire features could be observed in the public sequences, highly abundant clones, and long TCR CDR3 sequences, which might be helpful for further study of immune behavior and immune response.

  4. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    Full Text Available The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire study of infectious diseases achieved by traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  5. The Main Sequences of Starforming Galaxies and Active Galactic Nuclei at High Redshift

    CERN Document Server

    Mancuso, Claudia; Shi, J; Gonzàlez-Nuevo, J; Bèthermin, M; Danese, L

    2016-01-01

    We provide a novel, unifying physical interpretation on the origin, the average shape, the scatter, and the cosmic evolution for the main sequences of starforming galaxies and active galactic nuclei at high redshift $z \gtrsim 1$. We achieve this goal in a model-independent way by exploiting: (i) the redshift-dependent SFR functions based on the latest UV/far-IR data from HST/Herschel, and related statistics of strong gravitationally lensed sources; (ii) deterministic evolutionary tracks for the history of star formation and black hole accretion, gauged on a wealth of multiwavelength observations including the observed Eddington ratio distribution. We further validate these ingredients by showing their consistency with the observed galaxy stellar mass functions and AGN bolometric luminosity functions at different redshifts via the continuity equation approach. Our analysis of the main sequence for high-redshift galaxies and AGNs highlights that the present data are consistently interpreted in terms of an in...

  6. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing

    Science.gov (United States)

    Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens—particularly for use in phylogenetic analyses—has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  7. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine whether sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  8. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    Full Text Available For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine whether sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  9. Transcriptome analysis of the silkworm (Bombyx mori) by high-throughput RNA sequencing.

    Science.gov (United States)

    Li, Yinü; Wang, Guozeng; Tian, Jian; Liu, Huifen; Yang, Huipeng; Yi, Yongzhu; Wang, Jinhui; Shi, Xiaofeng; Jiang, Feng; Yao, Bin; Zhang, Zhifang

    2012-01-01

    The domestic silkworm, Bombyx mori, is a model insect with important economic value for silk production that also acts as a bioreactor for biomaterial production. The functional complexity of the silkworm transcriptome has not yet been fully elucidated, although genomic sequencing and other tools have been widely used in its study. We explored the transcriptome of silkworm at different developmental stages using high-throughput paired-end RNA sequencing. A total of about 3.3 gigabases (Gb) of sequence was obtained, representing about a 7-fold coverage of the B. mori genome. From the reads that were mapped to the genome sequence, 23,461 transcripts were obtained, 5,428 of which were novel. Of the 14,623 predicted protein-coding genes in the silkworm genome database, 11,884 were found to be expressed in the silkworm transcriptome, giving a coverage of 81.3%. A total of 13,195 new exons were detected, of which 5,911 were found in the annotated genes in the Silkworm Genome Database (SilkDB). An analysis of alternative splicing in the transcriptome revealed that 3,247 genes had undergone alternative splicing. To help with the data analysis, a transcriptome database that integrates our transcriptome data with the silkworm genome data was constructed and is publicly available at http://124.17.27.136/gbrowse2/. To our knowledge, this is the first study to elucidate the silkworm transcriptome using high-throughput RNA sequencing technology. Our data indicate that the transcriptome of silkworm is much more complex than previously anticipated. This work provides tools and resources for the identification of new functional elements and paves the way for future functional genomics studies.

  10. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing...... work on admixture events between Neanderthals and anatomically modern humans and also suggested that the latter were organized in small family units whose members avoided inbreeding....

  11. High Throughput Sequencing of Germline and Tumor from Men With Early-Onset Metastatic Prostate Cancer

    Science.gov (United States)

    2014-10-01

    challenge, Dr. Tomlins has continued to develop state of the art technologies to use formalin-fixed paraffin-embedded (FFPE) prostate cancer specimens...men with early-onset, metastatic prostate cancer PRINCIPAL INVESTIGATOR: Kathleen A. Cooney, M.D. CONTRACTING ORGANIZATION...High-Throughput Sequencing of Germline and Tumor From Men with Early-Onset Metastatic Prostate Cancer 5b. GRANT NUMBER W81XWH-13-1-0371 5c

  12. Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset.

    Science.gov (United States)

    Shi, Ming-Guang; Xia, Jun-Feng; Li, Xue-Ling; Huang, De-Shuang

    2010-03-01

    Identifying protein-protein interactions (PPIs) is critical for understanding the cellular function of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines correlation coefficient (CC) transformation with a support vector machine (SVM). CC transformation not only adequately considers the neighboring effect of the protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to objectively evaluate the above method and attenuate the bias. The SVM model combined with CC transformation yielded the best performance, with a high accuracy of 87.94% using the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.

  13. Exome sequencing identifies potential risk variants for Mendelian disorders at high prevalence in Qatar.

    Science.gov (United States)

    Rodriguez-Flores, Juan L; Fakhro, Khalid; Hackett, Neil R; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Nabet Al-Marri, Ajayeb; Chouchane, Lotfi; Stadler, Dora J; Mezey, Jason G; Crystal, Ronald G

    2014-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared with 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Premarital genetic screening in Qatar tests for only four out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. © 2013 WILEY PERIODICALS, INC.

  14. Characterizing ncRNAs in human pathogenic protists using high-throughput sequencing technology

    Directory of Open Access Journals (Sweden)

    Lesley Joan Collins

    2011-12-01

    Full Text Available ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, snoRNAs and long ncRNAs on a genomic scale making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases.

  15. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing.

    Science.gov (United States)

    Gamba, Cristina; Hanghøj, Kristian; Gaunitz, Charleen; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Bradley, Daniel G; Orlando, Ludovic

    2016-03-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms. © 2015 John Wiley & Sons Ltd.

  16. Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

    Directory of Open Access Journals (Sweden)

    Momchilo Vuyisich

    2014-01-01

    Full Text Available Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp., which have the highest GC content and are the longest), we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

  17. High intraindividual variation in internal transcribed spacer sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics.

    Science.gov (United States)

    Denduangboripant, J; Cronk, Q C

    2000-07-22

    Aeschynanthus (Gesneriaceae) is a large genus of tropical epiphytes that is widely distributed from the Himalayas and China throughout South-East Asia to New Guinea and the Solomon Islands. Polymerase chain reaction (PCR) consensus sequences of the internal transcribed spacers (ITS) of Aeschynanthus nuclear ribosomal DNA showed sequence polymorphism that was difficult to interpret. Cloning individual sequences from the PCR product generated a phylogenetic tree of 23 Aeschynanthus species (two clones per species). The intraindividual clone pairs varied from 0 to 5.01%. We suggest that the high intraindividual sequence variation results from low molecular drive in the ITS of Aeschynanthus. However, this study shows that, despite the variation found within some individuals, it is still possible to use these data to reconstruct phylogenetic relationships of the species, suggesting that clone variation, although persistent, does not pre-date the divergence of Aeschynanthus species. The Aeschynanthus analysis revealed two major clades with different but overlapping geographic distributions and reflected classification based on morphology (particularly seed hair type).
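
    The 0 to 5.01% range reported above is a pairwise difference between two clones from the same individual. The abstract does not say how that figure was computed; the sketch below assumes the simplest option, an uncorrected percentage of differing sites over an existing alignment with gapped columns skipped. The function name and the two short sequences are made up for illustration and are not data from the study.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Uncorrected percentage of differing sites between two aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption;
    other gap treatments are equally defensible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Two invented ITS-like clone fragments, already aligned (21 columns, one gap).
clone_1 = "ACGTACGTAC-GTACGTACGT"
clone_2 = "ACGTACGAACGGTACGTACGT"
divergence = percent_divergence(clone_1, clone_2)  # 1 difference over 20 compared sites
```

    Applied to every clone pair of a species, a value of this kind gives the per-individual variation that the abstract summarises as a range.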

  18. De Novo Peptide Sequencing: Deep Mining of High-Resolution Mass Spectrometry Data.

    Science.gov (United States)

    Islam, Mohammad Tawhidul; Mohamedali, Abidali; Fernandes, Criselda Santan; Baker, Mark S; Ranganathan, Shoba

    2017-01-01

    High resolution mass spectrometry has revolutionized proteomics over the past decade, resulting in tremendous amounts of data, in the form of mass spectra, being generated in a relatively short span of time. The mining of this spectral data for analysis and interpretation, though, has lagged behind, such that potentially valuable data is being overlooked because it does not fit into the mold of traditional database searching methodologies. Although the analysis of spectra by de novo sequencing removes such biases and has been available for a long period of time, its uptake has been slow or almost nonexistent within the scientific community. In this chapter, we propose a methodology to integrate de novo peptide sequencing using three commonly available software solutions in tandem, complemented by homology searching and manual validation of spectra. This simplified method would allow greater use of de novo sequencing approaches and potentially greatly increase proteome coverage, leading to the unearthing of valuable insights into protein biology, especially of organisms whose genomes have been recently sequenced or are poorly annotated.

  19. End-to-End Optimization of High-Throughput DNA Sequencing.

    Science.gov (United States)

    O'Reilly, Eliza; Baccelli, Francois; De Veciana, Gustavo; Vikalo, Haris

    2016-10-01

    At the core of Illumina's high-throughput DNA sequencing platforms lies a biophysical surface process, bridge amplification, that results in a random geometry of clusters of homogeneous short DNA fragments typically hundreds of base pairs long. The statistical properties of this random process and the lengths of the fragments are critical as they affect the information that can be subsequently extracted, that is, the density of successfully inferred DNA fragment reads. The ensembles of overlapping DNA fragment reads are then used to computationally reconstruct the much longer target genome sequence. The success of the reconstruction in turn depends on having a sufficiently large ensemble of DNA fragments that are sufficiently long. In this article, using stochastic geometry, we model and optimize the end-to-end flow cell synthesis and target genome sequencing process, linking and partially controlling the statistics of the physical processes to the success of the final computational step. Based on a rough calibration of our model, we provide, for the first time, a mathematical framework capturing the salient features of the sequencing platform that serves as a basis for optimizing cost, performance, and/or sensitivity analysis to various parameters.

  20. Escherichia coli O-Antigen Gene Clusters of Serogroups O62, O68, O131, O140, O142, and O163: DNA Sequences and Similarity between O62 and O68, and PCR-Based Serogrouping

    Directory of Open Access Journals (Sweden)

    Yanhong Liu

    2015-02-01

    Full Text Available The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources.

  1. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens;

    2015-01-01

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too......-stringency in-solution hybridization method enables detection of discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral...

  2. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Directory of Open Access Journals (Sweden)

    Soichi Inagaki

    Full Text Available Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
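
    The idea of finding read pairs that span a genome/T-DNA junction can be illustrated with a minimal sketch. This is not the authors' tool. It assumes that read pairs have already been aligned to a combined reference (genome plus vector) and reduced to simple records; the record layout, the contig name "T-DNA" and the example coordinates are all hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified record for one aligned read pair:
# reference name and position of each mate (None if the mate is unmapped).
PairAlignment = namedtuple("PairAlignment", "name ref1 pos1 ref2 pos2")

TDNA_CONTIG = "T-DNA"  # assumed name of the vector sequence in the combined reference


def junction_pairs(pairs, tdna=TDNA_CONTIG):
    """Yield (pair name, genome contig, genome position) for every pair in
    which exactly one mate aligns to the T-DNA and the other to the genome."""
    for p in pairs:
        if p.ref1 is None or p.ref2 is None:
            continue  # skip pairs with an unmapped mate
        mate1_on_tdna = p.ref1 == tdna
        mate2_on_tdna = p.ref2 == tdna
        if mate1_on_tdna == mate2_on_tdna:
            continue  # both mates on the genome, or both on the T-DNA
        if mate1_on_tdna:
            yield (p.name, p.ref2, p.pos2)
        else:
            yield (p.name, p.ref1, p.pos1)


# Invented example input.
pairs = [
    PairAlignment("r1", "Chr1", 10500, "T-DNA", 42),
    PairAlignment("r2", "Chr3", 200, "Chr3", 450),
    PairAlignment("r3", "T-DNA", 5, "T-DNA", 300),
    PairAlignment("r4", "T-DNA", 4980, "Chr5", 77120),
    PairAlignment("r5", "Chr2", 900, None, None),
]
hits = list(junction_pairs(pairs))
```

    Clustering the genome-side positions of such pairs would then point to candidate insertion sites; reads that are themselves split across the junction would need separate handling, which this sketch leaves out.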

  3. Evaluation of biological activities of highly diluted nucleotide sequences by using cellular models

    Directory of Open Access Journals (Sweden)

    Pierre Dorfman

    2012-09-01

    Full Text Available Background: highly diluted specific nucleic acids (SNA®), designed to modulate viral and cytokine gene expression, are currently used in Micro-Immunotherapy to treat viral infections and immune disorders. Although some preliminary studies have shown clinical benefit of these homeopathic preparations [1], no experimental data are available to explain their mechanism of action. Aims: to investigate the in vitro effect on cellular models of two sets of highly diluted (HD) SNA targeting (i) latent/lytic Epstein-Barr virus (SNA EBV) and (ii) TNF-α and its receptor p55 involved in rheumatoid arthritis (SNA RA). Methodology: serial homeopathic dilutions of SNA EBV and SNA RA (15cH-18cH) were tested on an EBV-positive B-lymphoblastoid (B95-8) and an LPS-stimulated macrophage (THP1) cell line, respectively, in comparison with agitated/diluted water and scramble DNA sequences prepared in the same conditions (negative controls). For the B95-8 proliferative model, high mobility group box 1 protein (HMGB1) was used as reference. The biological parameters analyzed on B95-8 were (i) cell proliferation measured after 24 and 48h of incubation with HD SNA and (ii) expression of the EBV ZEBRA protein in response to TGF-β by Western-blotting (T+24h). For the THP1 model, TNF-α synthesis and release were determined by RT-qPCR and ELISA (protein), after stimulation by LPS (1 µg/ml) and HD SNA co-administration. Results: we demonstrated that HD SNA RA significantly down-regulated TNF-α synthesis and release. This biological activity was shown to be specific (no effect of HD scramble SNA) and related to the level of dilution (maximal effect with higher dilutions). Unexpectedly, a biological effect of agitated/diluted water was also detected in both cellular models. For the B95-8 model, this effect resulted in a significant decrease of B95-8 proliferation (comparable to the HMGB1 reference) and an inhibition of ZEBRA expression. Similarly, a reproducible

  4. Discovery of Highly Divergent Repeat Landscapes in Snake Genomes Using High-Throughput Sequencing

    Science.gov (United States)

    Castoe, Todd A.; Hall, Kathryn T.; Guibotsy Mboulas, Marcel L.; Gu, Wanjun; de Koning, A.P. Jason; Fox, Samuel E.; Poole, Alexander W.; Vemulapalli, Vijetha; Daza, Juan M.; Mockler, Todd; Smith, Eric N.; Feschotte, Cédric; Pollock, David D.

    2011-01-01

    We conducted a comprehensive assessment of genomic repeat content in two snake genomes, the venomous copperhead (Agkistrodon contortrix) and the Burmese python (Python molurus bivittatus). These two genomes are both relatively small (∼1.4 Gb) but have surprisingly extensive differences in the abundance and expansion histories of their repeat elements. In the python, the readily identifiable repeat element content is low (21%), similar to bird genomes, whereas that of the copperhead is higher (45%), similar to mammalian genomes. The copperhead's greater repeat content arises from the recent expansion of many different microsatellites and transposable element (TE) families, and the copperhead had 23-fold greater levels of TE-related transcripts than the python. This suggests the possibility that greater TE activity in the copperhead is ongoing. Expansion of CR1 LINEs in the copperhead genome has resulted in TE-mediated microsatellite expansion (“microsatellite seeding”) at a scale several orders of magnitude greater than previously observed in vertebrates. Snakes also appear to be prone to horizontal transfer of TEs, particularly in the copperhead lineage. The reason that the copperhead has such a small genome in the face of so much recent expansion of repeat elements remains an open question, although selective pressure related to extreme metabolic performance is an obvious candidate. TE activity can affect gene regulation as well as rates of recombination and gene duplication, and it is therefore possible that TE activity played a role in the evolution of major adaptations in snakes; some evidence suggests this may include the evolution of venom repertoires. PMID:21572095

  5. High penetrance of sequencing errors and interpretative shortcomings in mtDNA sequence analysis of LHON patients.

    Science.gov (United States)

    Bandelt, Hans-Jürgen; Yao, Yong-Gang; Salas, Antonio; Kivisild, Toomas; Bravi, Claudio M

    2007-01-12

    For identifying mutation(s) that are potentially pathogenic it is essential to determine the entire mitochondrial DNA (mtDNA) sequences from patients suffering from a particular mitochondrial disease, such as Leber hereditary optic neuropathy (LHON). However, such sequencing efforts can, in the worst case, be riddled with errors by imposing phantom mutations or misreporting variant nucleotides, and moreover, by inadvertently regarding some mutations as novel and pathogenic, which are actually known to define minor haplogroups. Under such circumstances it remains unclear whether the disease-associated mutations would have been determined adequately. Here, we re-analyse four problematic LHON studies and propose guidelines by which some of the pitfalls could be avoided.

  6. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    John E McCormack

    Full Text Available Evolutionary relationships among birds in Neoaves, the clade comprising the vast majority of avian diversity, have vexed systematists due to the ancient, rapid radiation of numerous lineages. We applied a new phylogenomic approach to resolve relationships in Neoaves using target enrichment (sequence capture) and high-throughput sequencing of ultraconserved elements (UCEs) in avian genomes. We collected sequence data from UCE loci for 32 members of Neoaves and one outgroup (chicken) and analyzed data sets that differed in their amount of missing data. An alignment of 1,541 loci that allowed missing data was 87% complete and resulted in a highly resolved phylogeny with broad agreement between the Bayesian and maximum-likelihood (ML) trees. Although results from the 100% complete matrix of 416 UCE loci were similar, the Bayesian and ML trees differed to a greater extent in this analysis, suggesting that increasing from 416 to 1,541 loci led to increased stability and resolution of the tree. Novel results of our study include surprisingly close relationships between phenotypically divergent bird families, such as tropicbirds (Phaethontidae) and the sunbittern (Eurypygidae) as well as between bustards (Otididae) and turacos (Musophagidae). This phylogeny bolsters support for monophyletic waterbird and landbird clades and also strongly supports controversial results from previous studies, including the sister relationship between passerines and parrots and the non-monophyly of raptorial birds in the hawk and falcon families. Although significant challenges remain to fully resolving some of the deep relationships in Neoaves, especially among lineages outside the waterbirds and landbirds, this study suggests that increased data will yield an increasingly resolved avian phylogeny.

  7. Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing

    Directory of Open Access Journals (Sweden)

    Giancarlo Russo

    2015-12-01

    We present the first study that applies the high read accuracy and depth of single molecule, real time, circular consensus sequencing (SMRT-CCS) to the detection of mutations in stool DNA in order to provide a non-invasive, sensitive and accurate test for CRC. In stool DNA isolated from patients diagnosed with adenocarcinoma, we are able to detect mutations at frequencies below 0.5% with no false positives. This approach establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and the early stage adenomas that lead to CRC.

  8. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Full Text Available Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  9. High resolution profiling of human exon methylation by liquid hybridization capture-based bisulfite sequencing

    Directory of Open Access Journals (Sweden)

    Wang Junwen

    2011-12-01

    Full Text Available Abstract Background DNA methylation plays important roles in gene regulation during both normal developmental and disease states. In the past decade, a number of methods have been developed and applied to characterize the genome-wide distribution of DNA methylation. Most of these methods endeavored to screen the whole genome and turned out to be enormously costly and time consuming for studies of the complex mammalian genome. Thus, they are not practical for researchers studying multiple clinical samples in biomarker research. Results Here, we present a novel strategy that relies on the selective capture of target regions by liquid hybridization followed by bisulfite conversion and deep sequencing, which is referred to as liquid hybridization capture-based bisulfite sequencing (LHC-BS). To evaluate this method, we utilized about 2 μg of native genomic DNA from YanHuang (YH) whole blood samples and a mature dendritic cell (mDC) line, respectively, to assess the methylation statuses of target regions of the exome. The results indicated that the LHC-BS system was able to cover more than 97% of the exome regions and detect their methylation statuses with acceptable allele dropouts. Most of the regions that couldn't provide accurate methylation information were distributed in chromosomes 6 and Y because of multiple mapping to those regions. The accuracy of this strategy was evaluated by pair-wise comparisons using the results from whole genome bisulfite sequencing and validated by bisulfite specific PCR sequencing. Conclusions In the present study, we employed a liquid hybridisation capture system to enrich for exon regions and then combined it with bisulfite sequencing to examine the methylation statuses for the first time. This technique is highly sensitive and flexible and can be applied to identify differentially methylated regions (DMRs) at specific genomic locations of interest, such as regulatory elements or promoters.

  10. Use of indigenous technology for the production of High Quality Cassava Flour with similar food qualities as wheat flour

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Chinedum Eleazu

    2014-09-01

    Full Text Available Background. The aim of the paper was to compare the food qualities of 2 varieties (SME 1 and 2) of high quality cassava flour (HQCF) produced from indigenous technology and that of some commercially sold wheat/HQCF samples. Material and methods. The pH, proximate, phytochemical, antioxidant, functional properties and starch yield of the flours were determined using standard techniques. Results. The wheat flours had higher bulk densities and lipids than the HQCF samples while the oil absorption capacity of the HQCF (SME 2) was higher than other flour samples investigated. The antioxidant assays of the flours showed that they contained considerable levels of antioxidants with the HQCF sample from DAT having higher antioxidants than other flour samples studied. The HQCF (SME 1) had significantly higher (P < 0.05) starch content among the flour samples. The bacteria counts of the HQCF samples ranged from 0 to 1.4 × 10^4 cfu/ml while the fungal count ranged from 0 to 2 × 10^-3 with the unbranded wheat flour having the highest microbial load compared with other flour samples studied. Conclusion. The use of this indigenous technology produces HQCF with lower lipids and microbial contamination but higher flavour retaining ability, flavonoids and starch contents than wheat flour. The significant positive correlation (R^2 = 0.872) between reducing power of the samples and their DPPH antioxidant activity indicates that either could be used to assay for the total antioxidant activity of cassava and wheat flour. The study underscores the need to buy flour from branded companies to reduce the risks of microbial contamination.

  11. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Full Text Available Abstract Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.
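
    The idea of reducing a three-dimensional backbone to a one-dimensional string can be illustrated with a minimal sketch. This is not SARST's encoding: the abstract describes a Ramachandran map organized by nearest-neighbor clustering, whereas the sketch below assumes a plain square grid over (phi, psi) space. The function name `encode_backbone`, the 60-degree bin size and the letter alphabet are all hypothetical choices made for illustration.

```python
import string

def encode_backbone(angles, bin_size=60):
    """Map (phi, psi) pairs in degrees to one letter per residue.

    Assumption: a uniform grid over the Ramachandran plane, not the
    clustered map described in the abstract.
    """
    letters = string.ascii_uppercase + string.ascii_lowercase + string.digits
    if 360 % bin_size != 0:
        raise ValueError("bin_size must divide 360")
    bins_per_axis = 360 // bin_size
    if bins_per_axis ** 2 > len(letters):
        raise ValueError("bin_size too small for the 62-letter alphabet")
    encoded = []
    for phi, psi in angles:
        # Shift angles from [-180, 180) to [0, 360) and find the grid cell.
        col = int((phi + 180) % 360 // bin_size)
        row = int((psi + 180) % 360 // bin_size)
        encoded.append(letters[row * bins_per_axis + col])
    return "".join(encoded)

# Hypothetical angles: two helix-like residues followed by one strand-like residue.
print(encode_backbone([(-57, -47), (-50, -40), (-120, 130)]))
```

    Residues with similar backbone conformations fall into the same cell and receive the same letter, so the resulting strings can in principle be handed to ordinary sequence comparison tools.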

  12. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Directory of Open Access Journals (Sweden)

    Charlotte Rehm

    Full Text Available In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
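
    A minimal sketch of how tandem runs of the GGGNATC unit could be located and counted in a sequence follows. It is an illustration only, not the authors' in silico pipeline: the function name `find_g_rich_repeats` and the toy sequence are hypothetical, and only the given strand is scanned (a real analysis would also scan the reverse complement).

```python
import re

# One repeat unit: GGG, any nucleotide, then ATC (the GGGNATC type).
UNIT = "GGG[ACGT]ATC"
# Two or more back-to-back units; the abstract reports runs of 2 to 26 units.
TANDEM = re.compile(f"(?:{UNIT}){{2,}}")

def find_g_rich_repeats(sequence):
    """Return (start, unit_count) for each tandem run of GGGNATC units."""
    hits = []
    for match in TANDEM.finditer(sequence.upper()):
        unit_count = len(match.group(0)) // 7  # each unit is 7 nt long
        hits.append((match.start(), unit_count))
    return hits

# Hypothetical toy sequence: a run of four units, a spacer, then a run of two.
toy = "TTA" + "GGGAATC" + "GGGTATC" + "GGGCATC" + "GGGGATC" + "ACGT" + "GGGAATCGGGAATC"
print(find_g_rich_repeats(toy))  # [(3, 4), (35, 2)]
```

    Tallying the `unit_count` values over a genome would give the kind of unit-number distribution in which the abstract reports a preference for four units.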

  13. High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs

    Directory of Open Access Journals (Sweden)

    Darakjian Priscila

    2009-08-01

    Full Text Available Abstract Background Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow up analyses. Polymorphisms result in a high incidence of false positive and false negative results in hybridization based analyses and hinder the identification of the true variation underlying genetically determined differences in physiology and behavior. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the wealth of gene expression microarray and phenotypic studies using genetic models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput sequencing, we are now in a position to determine to what extent polymorphisms are currently cryptic in such models and their impact on downstream analyses. Results We sequenced the two most commonly used inbred mouse strains, DBA/2J and C57BL/6J, across a region of chromosome 1 (171.6 – 174.6 megabases) using two next generation high-throughput sequencing platforms: Applied Biosystems (SOLiD) and Illumina (Genome Analyzer). Using the same templates on both platforms, we compared realignments and single nucleotide polymorphism (SNP) detection with an 80 fold average read depth across platforms and samples. While public datasets currently annotate 4,527 SNPs between the two strains in this interval, thorough high-throughput sequencing identified a total of 11,824 SNPs in the interval, including 7,663 new SNPs. Furthermore, we confirmed 40 missense SNPs and discovered 36 new missense SNPs. Conclusion Comparisons utilizing even two of the best characterized mouse genetic models, DBA/2J and C57BL/6J, indicate that more than half of naturally

  14. Localization of a new highly repeated DNA sequence of Lemur catta (Lemuridae, Strepsirhini).

    Science.gov (United States)

    Boniotto, Michele; Ventura, Mario; Cardone, Maria Francesca; Boaretto, Francesca; Archidiacono, Nicoletta; Rocchi, Mariano; Crovella, Sergio

    2002-10-01

    We have isolated and cloned an 800-bp highly repeated DNA (HRDNA) sequence from Lemur catta (LCA) and described its localization on LCA chromosomes. Lemur catta HRDNA sequences were localized by performing FISH experiments on standard and elongated metaphasic chromosomes using an LCA HRDNA probe (LCASAT). A complex hybridization pattern was detected. A strong pericentromeric hybridization signal was observed on most LCA chromosomes. Chromosomes 7 and 13 were lit in pericentromeric regions, as well as in the interspersed heterochromatin. Chromosomes 1, 3, 4, 17, 19, X, and microchromosomes (20, 25, 26, and 27) showed no signals in the pericentromeric region, but chromosomes 3 and 4 showed a positive hybridization in heterochromatic regions. The 800-bp L. catta HRDNA was species specific. We performed FISH experiments with the LCASAT probe on Eulemur macaco macaco (EMA) and Eulemur fulvus fulvus (EFU) metaphases and no positive signal of hybridization was detected. These findings were also confirmed by Southern blot analysis and PCR.

  15. Exploring Genetic Diversity in Plants Using High-Throughput Sequencing Techniques.

    Science.gov (United States)

    Onda, Yoshihiko; Mochida, Keiichi

    2016-08-01

    Food security has emerged as an urgent concern because of the rising world population. To meet the food demands of the near future, it is required to improve the productivity of various crops, not just of staple food crops. The genetic diversity among plant populations in a given species allows the plants to adapt to various environmental conditions. Such diversity could therefore yield valuable traits that could overcome the food-security challenges. To explore genetic diversity comprehensively and to rapidly identify useful genes and/or allele, advanced high-throughput sequencing techniques, also called next-generation sequencing (NGS) technologies, have been developed. These provide practical solutions to the challenges in crop genomics. Here, we review various sources of genetic diversity in plants, newly developed genetic diversity-mining tools synergized with NGS techniques, and related genetic approaches such as quantitative trait locus analysis and genome-wide association study.

  16. Molecular characterisation and similarity relationships among Iranian basil (Ocimum basilicum L.) accessions using inter simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Mohammad Aghaei

    2012-06-01

    Full Text Available The study of genetic relationships is a prerequisite for plant breeding activities as well as for conservation of genetic resources. In the present study, genetic diversity among 50 Iranian basil (Ocimum basilicum L.) accessions was determined using inter simple sequence repeat (ISSR) markers. Thirty-eight alleles were generated at 12 ISSR loci. The number of alleles per locus ranged from 1 to 5 with an average of 3.17. The maximum number of alleles was observed at the A7, 818, 825 and 849 loci, and their size ranged from 300 to 2500 bp. A similarity matrix based on Jaccard's coefficient for all 50 basil accessions gave values from 0.60 to 1.00. The maximum similarity (1.00) was observed between the "Urmia" and "Shahr-e-Rey II" accessions as well as between the "Urmia" and "Qazvin II" accessions. The lowest similarity (0.60) was observed between the "Tuyserkan I" and "Gom II" accessions. The unweighted pair-group method using arithmetic average (UPGMA) clustering algorithm classified the studied accessions into three distinct groups. All of the basil accessions, with the exception of "Babol III", "Ahvaz II", "Yazd II" and "Ardebil I", were placed in groups I and II. Leaf colour was a specific characteristic that influenced the clustering of Iranian basil accessions. Because of this relationship, the results of the principal coordinate analysis (PCoA) approximately corresponded to those obtained through cluster analysis. Our results revealed that the geographical distribution of genotypes could not be used as a basis for crossing parents to obtain high heterosis, and therefore, it must be carried out by genetic studies.
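
    For readers unfamiliar with Jaccard's coefficient, the following minimal sketch shows how it could be computed from binary band-scoring profiles (1 = band present, 0 = band absent). The profiles are invented toy data, not the study's ISSR scores, and the function name `jaccard_similarity` is hypothetical. Returning 1.0 when neither profile has any band is an assumed convention.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard's coefficient for two binary band profiles (1 = band present)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Bands absent from both accessions do not enter the coefficient.
    # Assumption: two profiles with no bands at all are treated as identical.
    return shared / either if either else 1.0

# Hypothetical toy profiles for two accessions scored at six bands.
accession_1 = [1, 1, 0, 1, 0, 1]
accession_2 = [1, 0, 0, 1, 1, 1]
print(jaccard_similarity(accession_1, accession_2))  # 3 shared / 5 present = 0.6
```

    Applying the function to every pair of accessions yields the kind of similarity matrix that a clustering method such as UPGMA takes as input.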

  17. Accurate molecular diagnosis of phenylketonuria and tetrahydrobiopterin-deficient hyperphenylalaninemias using high-throughput targeted sequencing

    Science.gov (United States)

    Trujillano, Daniel; Perez, Belén; González, Justo; Tornador, Cristian; Navarrete, Rosa; Escaramis, Georgia; Ossowski, Stephan; Armengol, Lluís; Cornejo, Verónica; Desviat, Lourdes R; Ugarte, Magdalena; Estivill, Xavier

    2014-01-01

    Genetic diagnostics of phenylketonuria (PKU) and tetrahydrobiopterin (BH4) deficient hyperphenylalaninemia (BH4DH) rely on methods that scan for known mutations or on laborious molecular tools that use Sanger sequencing. We have implemented a novel and much more efficient strategy based on high-throughput multiplex-targeted resequencing of four genes (PAH, GCH1, PTS, and QDPR) that, when affected by loss-of-function mutations, cause PKU and BH4DH. We have validated this approach in a cohort of 95 samples with the previously known PAH, GCH1, PTS, and QDPR mutations and one control sample. Pooled barcoded DNA libraries were enriched using a custom NimbleGen SeqCap EZ Choice array and sequenced using a HiSeq2000 sequencer. The combination of several robust bioinformatics tools allowed us to detect all known pathogenic mutations (point mutations, short insertions/deletions, and large genomic rearrangements) in the 95 samples, without detecting spurious calls in these genes in the control sample. We then used the same capture assay in a discovery cohort of 11 uncharacterized HPA patients using a MiSeq sequencer. In addition, we report the precise characterization of the breakpoints of four genomic rearrangements in PAH, including a novel deletion of 899 bp in intron 3. Our study is a proof-of-principle that high-throughput-targeted resequencing is ready to substitute classical molecular methods to perform differential genetic diagnosis of hyperphenylalaninemias, allowing the establishment of specifically tailored treatments a few days after birth. PMID:23942198

  18. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
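
    The third step described above, identifying sequences that contain microsatellites matching user-specified parameters, can be sketched with a regular expression. This is not SSR_pipeline's code: the function name `find_microsatellites` and the parameters `min_unit`, `max_unit` and `min_repeats` are hypothetical stand-ins for whatever options the program actually exposes.

```python
import re

def find_microsatellites(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (motif, start, repeat_count) for each perfect tandem repeat.

    Assumption: only perfect (uninterrupted) repeats on the given strand
    are reported; motifs made of a single base are skipped.
    """
    # A lazily matched unit followed by at least (min_repeats - 1) more copies,
    # so the shortest repeating unit is preferred.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA" is a homopolymer run, not a 2-nt motif
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((motif, match.start(), repeat_count))
    return hits

# Hypothetical composite read with an (AC)6 and an (AGC)4 microsatellite.
read = "TTGG" + "AC" * 6 + "TTG" + "AGC" * 4 + "T"
print(find_microsatellites(read))  # [('AC', 4, 6), ('AGC', 19, 4)]
```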

  19. High similarity of Trypanosoma cruzi kDNA genetic profiles detected by LSSP-PCR within family groups in an endemic area of Chagas disease in Brazil

    Directory of Open Access Journals (Sweden)

    Sandra Maria Alkmim-Oliveira

    2014-10-01

    Full Text Available Introduction Determining the genetic similarities among Trypanosoma cruzi populations isolated from different hosts and vectors is very important to clarify the epidemiology of Chagas disease. Methods An epidemiological study was conducted in a Brazilian endemic area for Chagas disease, including 76 chronic chagasic individuals (96.1% with an indeterminate form; 46.1% with positive hemoculture). Results T. cruzi I (TcI) was isolated from one child and TcII was found in the remaining (97.1%) subjects. Low-stringency single-specific-primer-polymerase chain reaction (LSSP-PCR) showed high heterogeneity among TcII populations (46% of shared bands); however, high similarities (80-100%) among pairs of mothers/children, siblings, or cousins were detected. Conclusions LSSP-PCR showed potential for identifying similar parasite populations among individuals with close kinship in epidemiological studies of Chagas disease.

  20. High incidence of nasopharyngeal cancer: similarity for 60% of mitochondrial DNA signatures between the Bidayuhs of Borneo and the Bai-yue of Southern China

    Institute of Scientific and Technical Information of China (English)

    Joseph Wee; Tam Cam Ha; Susan Loong; Chao-Nan Qian

    2012-01-01

    Populations in Southern China (Bai-yue) and Borneo (Bidayuh) with high incidence of nasopharyngeal cancer (NPC) share similar mitochondrial DNA signatures, supporting the hypothesis that these two populations may share the same genetic predisposition for NPC, which may have first appeared in a common ancestral reference population before the sea levels rose after the last ice age.

  1. A Naturally Occurring Repeat Protein with High Internal Sequence Identity Defines a New Class of TPR-like Proteins.

    Science.gov (United States)

    Marold, Jacob D; Kavran, Jennifer M; Bowman, Gregory D; Barrick, Doug

    2015-11-01

    Linear repeat proteins often have high structural similarity and low (∼25%) pairwise sequence identities (PSI) among modules. We identified a unique P. anserina (Pa) sequence with tetratricopeptide repeat (TPR) homology, which contains longer (42 residue) repeats (42PRs) with an average PSI >91%. We determined the crystal structure of five tandem Pa 42PRs to 1.6 Å, and examined the stability and solution properties of constructs containing three to six Pa 42PRs. Compared with 34-residue TPRs (34PRs), Pa 42PRs have a one-turn extension of each helix, and bury more surface area. Unfolding transitions shift to higher denaturant concentration and become sharper as repeats are added. Fitted Ising models show Pa 42PRs to be more cooperative than consensus 34PRs, with increased magnitudes of intrinsic and interfacial free energies. These results demonstrate the tolerance of the TPR motif to length variation, and provide a basis to understand the effects of helix length on intrinsic/interfacial stability.
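
    The average pairwise sequence identity (PSI) quoted above can be computed as in the following minimal sketch. It assumes the repeat modules are already aligned and of equal length, and the three 10-residue strings are invented toy data rather than the Pa 42PR sequences. The function names are hypothetical.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

def average_pairwise_identity(repeats):
    """Mean percent identity over all unordered pairs of repeats."""
    pairs = list(combinations(repeats, 2))
    if not pairs:
        raise ValueError("need at least two repeats")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical pre-aligned toy repeats (10 residues each).
toy_repeats = ["AEALNNLGNA", "AEALNNLGNV", "AEALYNLGNA"]
print(round(average_pairwise_identity(toy_repeats), 1))  # (90 + 90 + 80) / 3 = 86.7
```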

  2. High-throughput DNA sequence analysis reveals stable engraftment of gut microbiota following transplantation of previously frozen fecal bacteria.

    Science.gov (United States)

    Hamilton, Matthew J; Weingarden, Alexa R; Unno, Tatsuya; Khoruts, Alexander; Sadowsky, Michael J

    2013-01-01

    Fecal microbiota transplantation (FMT) is becoming a more widely used technology for treatment of recurrent Clostridium difficile infection (CDI). While previous treatments used fresh fecal slurries as a source of microbiota for FMT, we recently reported the successful use of standardized, partially purified and frozen fecal microbiota to treat CDI. Here we report that high-throughput 16S rRNA gene sequencing showed stable engraftment of gut microbiota following FMT using frozen fecal bacteria from a healthy donor. Similar bacterial taxa were found in post-transplantation samples obtained from the recipients and donor samples, but the relative abundance varied considerably between patients and time points. Post FMT samples from patients showed an increase in the abundance of Firmicutes and Bacteroidetes, representing 75-80% of the total sequence reads. Proteobacteria and Actinobacteria were less abundant (fecal microbiota from a healthy donor can be used to effectively treat recurrent CDI resulting in restoration of the structure of gut microbiota and clearing of Clostridium difficile.

  3. 454-sequencing reveals stochastic local reassembly and high disturbance tolerance within arbuscular mycorrhizal fungal communities

    DEFF Research Database (Denmark)

    Lekberg, Karin Ylva Margareta; Schnoor, Tim; Kjøller, Rasmus

    2012-01-01

    1. Disturbance is assumed to be a major driver of plant community composition, but whether similar processes operate on associated soil microbial communities is less known. Based on the assumed trade-off between disturbance tolerance and competitiveness, we hypothesize that a severe disturbance......, disturbance did not significantly alter the community composition and OTU richness. Instead, OTU abundances were positively correlated across treatments; i.e., common OTUs in undisturbed soil were also common after the severe disturbance. However, the distribution of OTUs within and between plots was largely...... applied within a semi-natural grassland would shift the arbuscular mycorrhizal (AM) fungal community towards disturbance-tolerant fungi that are rare in undisturbed soils. 2. We used 454-sequencing of the large subunit rDNA region to characterize AM fungal communities in Plantago lanceolata roots grown...

  4. High-quality genome sequence and description of Bacillus ndiopicus strain FF3T sp. nov.

    Directory of Open Access Journals (Sweden)

    C.I. Lo

    2015-11-01

    Full Text Available Strain FF3T was isolated from the skin-flora of a 39-year-old healthy Senegalese man. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry did not allow any identification. This strain exhibited a 16S rRNA sequence similarity of 96.8% with Bacillus massiliensis, the phylogenetically closest species with standing nomenclature. Using a polyphasic study made of phenotypic and genomic analyses, strain FF3T was Gram-positive, aeroanaerobic and rod shaped and exhibited a genome of 4 068 720 bp with a G+C content of 37.03% that coded 3982 protein-coding and 67 RNA genes (including four rRNA operons). On the basis of these data, we propose the creation of Bacillus ndiopicus sp. nov.

  5. Comparative sequence analysis of a highly oncogenic but horizontal spread-defective clone of Marek's disease virus.

    Science.gov (United States)

    Spatz, Stephen J; Zhao, Yuguang; Petherbridge, Lawrence; Smith, Lorraine P; Baigent, Susan J; Nair, Venugopal

    2007-12-01

    Marek's disease virus (MDV) is a cell-associated alphaherpesvirus that induces rapid-onset T-cell lymphomas in poultry. MDV isolates vary greatly in pathogenicity. While some of the strains such as CVI988 are non-pathogenic and are used as vaccines, others such as RB-1B are highly oncogenic. Molecular determinants associated with differences in pathogenicity are not completely understood. Comparison of the genome sequences of phenotypically different strains could help to identify molecular determinants of pathogenicity. We have previously reported the construction of bacterial artificial chromosome (BAC) clones of RB-1B from which fully infectious viruses could be reconstituted upon DNA transfection into chicken cells. MDV reconstituted from one of these clones (pRB-1B-5) showed similar in vitro and in vivo replication kinetics and oncogenicity as the parental virus. However, unlike the parental RB-1B virus, the BAC-derived virus showed inability to spread between birds. In order to identify the unique determinants for oncogenicity and the "non-spreading phenotype" of MDV derived from this clone, we determined the full-length sequence of pRB-1B-5. Comparative sequence analysis with the published sequences of strains such as Md5, Md11, and CVI988 identified frameshift mutations in RLORF1, protein kinase (UL13), and glycoproteins C (UL44) and D (US6). Comparison of the sequences of these genes with the parental virus indicated that the RLORF1, UL44, and US6 mutations were also present in the parental RB-1B stock of the virus. However, with regard to the UL13 mutation, the parental RB-1B stock appeared to be a mixture of wild type and mutant viruses, indicating that the BAC cloning has selected a mutant clone. Although further studies are needed to evaluate the role of these genes in the horizontal-spreading defective phenotype, our data clearly indicate that mutations in these genes do not affect the oncogenicity of MDV.

  6. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob

    2016-01-01

    Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can......_ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2x1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity--the concordance rate...... be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject...

  7. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing

    Science.gov (United States)

    Weissensteiner, Hansi; Pacher, Dominic; Kloss-Brandstätter, Anita; Forer, Lukas; Specht, Günther; Bandelt, Hans-Jürgen; Kronenberg, Florian; Salas, Antonio; Schönherr, Sebastian

    2016-01-01

    Mitochondrial DNA (mtDNA) profiles can be classified into phylogenetic clusters (haplogroups), which is of great relevance for evolutionary, forensic and medical genetics. With the extensive growth of the underlying phylogenetic tree summarizing the published mtDNA sequences, the manual process of haplogroup classification would be too time-consuming. The previously published classification tool HaploGrep provided an automatic way to address this issue. Here, we present the completely updated version HaploGrep 2 offering several advanced features, including a generic rule-based system for immediate quality control (QC). This allows detecting artificial recombinants and missing variants as well as annotating rare and phantom mutations. Furthermore, the handling of high-throughput data in the form of VCF files is now directly supported. For data output, several graphical reports are generated in real time, such as a multiple sequence alignment format, a VCF format and extended haplogroup QC reports, all viewable directly within the application. In addition, HaploGrep 2 generates a publication-ready phylogenetic tree of all input samples encoded relative to the revised Cambridge Reference Sequence. Finally, new distance measures and optimizations of the algorithm increase accuracy and speed up the application. HaploGrep 2 can be accessed freely and without any registration at http://haplogrep.uibk.ac.at. PMID:27084951

  8. Ancient, highly polymorphic human major histocompatibility complex DQA1 intron sequence

    Energy Technology Data Exchange (ETDEWEB)

    McGinnis, M.D.; Quinn, D.L.; Lebo, R.V. [Univ. of California, San Francisco, CA (United States); Simons, M.J. [GeneType Pty. Ltd., Fitzroy, Victoria (Australia)

    1994-10-01

    A 438 basepair intron 1 sequence adjacent to exon 2 in the human major histocompatibility complex DQA1 gene defined 16 allelic variants in 69 individuals from wide ethnic backgrounds. In contrast, the most variable coding region spanned by the 247 basepair exon 2 defined 11 allelic variants. Our phylogenetic human intron 1 tree derived by the Bootstrap algorithm reflects the same relative allelic relationships as the reported DQA1 exon 2 have cosegregated since divergence of the human races. Comparison of human alleles to a Rhesus monkey DQA1 first intron sequence found only 10 nucleotide substitutions unique to Rhesus, with the other 428 positions (98%) found in at least one human allele. This high degree of homology reflects the evolutionary stability of intron sequences since these two species diverged over 20 million years ago. Because more intron 1 alleles exist than exon 2 alleles, these polymorphic introns can be used to improve tissue typing for transplantation, paternity testing, and forensics and to derive more complete phylogenetic trees. These results suggest that introns represent a previously underutilized polymorphic resource. 42 refs., 3 figs., 1 tab.

  9. High-throughput sequencing and morphology perform equally well for benthic monitoring of marine ecosystems.

    Science.gov (United States)

    Lejzerowicz, Franck; Esling, Philippe; Pillet, Loïc; Wilding, Thomas A; Black, Kenneth D; Pawlowski, Jan

    2015-09-10

    Environmental diversity surveys are crucial for the bioassessment of anthropogenic impacts on marine ecosystems. Traditional benthic monitoring relying on morphotaxonomic inventories of macrofaunal communities is expensive, time-consuming and expertise-demanding. High-throughput sequencing of environmental DNA barcodes (metabarcoding) offers an alternative to describe biological communities. However, whether the metabarcoding approach meets the quality standards of benthic monitoring remains to be tested. Here, we compared morphological and eDNA/RNA-based inventories of metazoans from samples collected at 10 stations around a fish farm in Scotland, including near-cage and distant zones. For each of 5 replicate samples per station, we sequenced the V4 region of the 18S rRNA gene using the Illumina technology. After filtering, we obtained 841,766 metazoan sequences clustered in 163 Operational Taxonomic Units (OTUs). We assigned the OTUs by combining local BLAST searches with phylogenetic analyses. We calculated two commonly used indices: the Infaunal Trophic Index and the AZTI Marine Biotic Index. We found that the molecular data faithfully reflect the morphology-based indices and provide an equivalent assessment of the impact associated with fish farm activities. We advocate that future benthic monitoring should integrate metabarcoding as a rapid and accurate tool for the evaluation of the quality of marine benthic ecosystems.

  10. More DNA-Aptamers for Small Drugs: A Capture-SELEX Coupled with Surface Plasmon Resonance and High-Throughput Sequencing.

    Science.gov (United States)

    Spiga, Fabio M; Maietta, Paolo; Guiducci, Carlotta

    2015-05-11

    To address limitations in the production of DNA aptamers against small molecules, we introduce a DNA-based capture-SELEX (systematic evolution of ligands by exponential enrichment) protocol with a long and continuous randomized library for more flexibility, coupled with in-stream direct-specificity monitoring via SPR and high throughput sequencing (HTS). Applying this capture-SELEX on tobramycin shows that target-specificity arises at cycle number 8, which is confirmed by sequence convergence in HTS analysis. Interestingly, HTS also shows that the most enriched sequences are already visible after only two capture-SELEX cycles. The best aptamers displayed K(D) of approximately 200 nM, similar to RNA and DNA-based aptamers previously selected for tobramycin. The lowest concentration of tobramycin detected on label-free SPR experiments with the selected aptamers is 20-fold smaller than the clinical range limit, demonstrating suitability for small-drug biosensing.

  11. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Viral infections cause many different diseases stemming both from well-characterized viral pathogens and from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Although less sequencing was required to identify target sequences, it was not evident from our data that a lower detection level was achieved by virion enrichment compared to shotgun sequencing.

  12. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Directory of Open Access Journals (Sweden)

    Frank Guzman

    BACKGROUND: microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. RESULTS: Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. CONCLUSIONS: This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  13. A highly conserved repeated chromosomal sequence in the radioresistant bacterium Deinococcus radiodurans SARK.

    Science.gov (United States)

    Lennon, E; Gutman, P D; Yao, H L; Minton, K W

    1991-03-01

    A DNA fragment containing a portion of a DNA damage-inducible gene from Deinococcus radiodurans SARK hybridized to numerous fragments of SARK genomic DNA because of a highly conserved repetitive chromosomal element. The element is of variable length, ranging from 150 to 192 bp, depending on the absence or presence of one or two 21-bp sequences located internally. A putative translational start site of the damage-inducible gene is within the reiterated element. The element contains dyad symmetries that suggest modes of transcriptional and/or translational control.
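
    The length range quoted above is consistent with simple addition of the 21-bp sequences to a 150-bp core. A minimal sketch of that check follows; the names are illustrative, and it assumes the shortest form carries no internal 21-bp sequence and that each insertion adds exactly 21 bp.

```python
# Figures taken from the abstract; names are hypothetical.
CORE_LENGTH_BP = 150    # shortest reported form of the element
INSERT_LENGTH_BP = 21   # length of each internal sequence

def element_lengths(max_inserts=2):
    """Return the element length for 0..max_inserts internal 21-bp sequences."""
    return [CORE_LENGTH_BP + n * INSERT_LENGTH_BP for n in range(max_inserts + 1)]

print(element_lengths())  # [150, 171, 192]
```

    Under these assumptions, two internal sequences give the 192-bp upper bound reported in the abstract.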

  14. Draft genome sequence of Bacillus thuringiensis 147, a Brazilian strain with high insecticidal activity

    Science.gov (United States)

    Barbosa, Luiz Carlos Bertucci; Farias, Débora Lopes; Silva, Isabella de Moraes Guimarães; Melo, Fernando Lucas; Ribeiro, Bergmann Morais; Aguiar, Raimundo Wagner de Souza

    2015-01-01

    Bacillus thuringiensis is a ubiquitous Gram-positive and sporulating bacterium. Its crystals and secreted toxins are useful tools against larvae of diverse insect orders and, as a consequence, an alternative to recalcitrant chemical insecticides. We report here the draft genome sequence of B. thuringiensis 147, a strain isolated from Brazil and with high insecticidal activity. The assembled genome contained 6,167,994 bp and was distributed in seven replicons (a chromosome and 6 plasmids). We identified 12 coding regions, located in two plasmids, which encode insecticidal proteins. PMID:26517667

  15. Novel design of multicapillary arrays for high-throughput DNA sequencing.

    Science.gov (United States)

    Tsupryk, Andriy; Gorbovitski, Michael; Kabotyanski, Evgeni A; Gorfinkel, Vera

    2006-07-01

    A novel approach to design and optimize linear multicapillary arrays (LMCAs) for high-throughput DNA sequencing is proposed. A significant increase in the number of capillary lanes is obtained due to the use of composite insertions alternately placed between working capillaries of the array and a specific combination of refractive indices of the DNA separation matrix, capillary glass, the insertions and a medium which surrounds the capillary array. Theoretical and experimental studies showed that in conjunction with a dual-side laser illumination scheme, the proposed LMCA design allows a simultaneous uniform irradiation of as many as 550 working capillaries.

  16. New similarity search based glioma grading

    Energy Technology Data Exchange (ETDEWEB)

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)

    2012-08-15

    MR-based differentiation between low- and high-grade gliomas is predominantly based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences such as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity