WorldWideScience

Sample records for large genomic libraries

  1. Characterization of large-insert DNA libraries from soil for environmental genomic studies of Archaea

    DEFF Research Database (Denmark)

    Treusch, Alexander H; Kletzin, Arnulf; Raddatz, Guenter

    2004-01-01

    Complex genomic libraries are increasingly being used to retrieve complete genes, operons or large genomic fragments directly from environmental samples, without the need to cultivate the respective microorganisms. We report on the construction of three large-insert fosmid libraries in total...... (approximately 1% each) have been captured in our libraries. The diversity of putative protein-encoding genes, as reflected by their distribution into different COG clusters, was comparable to that encoded in complete genomes of cultivated microorganisms. A huge variety of genomic fragments has been captured...

  2. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    Science.gov (United States)

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  3. Enzymatically Generated CRISPR Libraries for Genome Labeling and Screening.

    Science.gov (United States)

    Lane, Andrew B; Strzelecka, Magdalena; Ettinger, Andreas; Grenfell, Andrew W; Wittmann, Torsten; Heald, Rebecca

    2015-08-10

    CRISPR-based technologies have emerged as powerful tools to alter genomes and mark chromosomal loci, but an inexpensive method for generating large numbers of RNA guides for whole genome screening and labeling is lacking. Using a method that permits library construction from any source of DNA, we generated guide libraries that label repetitive loci or a single chromosomal locus in Xenopus egg extracts and show that a complex library can target the E. coli genome at high frequency. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Inexpensive multiplexed library preparation for megabase-sized genomes.

    Directory of Open Access Journals (Sweden)

    Michael Baym

    Whole-genome sequencing has become an indispensable tool of modern biology. However, the cost of sample preparation relative to the cost of sequencing remains high, especially for small genomes where the former is dominant. Here we present a protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing. By carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps, we achieve a cost of library preparation of $8 per sample, approximately 6 times cheaper than the standard Nextera XT protocol. Furthermore, our procedure takes less than 5 hours for 96 samples. Several hundred samples can then be pooled on the same HiSeq lane via custom barcodes. Our method will be useful for re-sequencing of microbial or viral genomes, including those from evolution experiments, genetic screens, and environmental samples, as well as for other sequencing applications including large amplicon, open chromosome, artificial chromosomes, and RNA sequencing.

  5. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.
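    The abstract does not say how the minimum tiling path was chosen. One common way to pick such a path is a greedy interval cover, sketched below under stated assumptions: the circular plastid genome is treated as linearized, clone coordinates are already known, and the clone names and positions are invented for illustration.

```python
from typing import List, Tuple

# (name, start, end) in linearized genome coordinates, end exclusive.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], genome_length: int) -> List[Clone]:
    """Greedy interval cover: repeatedly take the clone that starts at or
    before the covered position and reaches furthest to the right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = 0  # everything in [0, covered) is already spanned
    i = 0
    while covered < genome_length:
        best = None
        # Scan all clones that overlap or abut the covered region.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical fosmid clones on a 155 kb plastid genome.
demo_clones = [
    ("A", 0, 40_000),
    ("B", 10_000, 45_000),
    ("C", 35_000, 80_000),
    ("D", 42_000, 90_000),
    ("E", 75_000, 150_000),
    ("F", 88_000, 155_000),
]
tiling = minimal_tiling_path(demo_clones, 155_000)
tiling_names = [name for name, _, _ in tiling]
```

    For this toy input the path skips clones B and D, because a neighbouring clone reaches further from the same covered position.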

  6. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Directory of Open Access Journals (Sweden)

    Kudrna David

    2011-03-01

    Background: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results: We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (EG_Bb) to 157 Kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single-copy gene of ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae.
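    The link between fold coverage and the chance of recovering a given single-copy gene can be illustrated with the standard Clarke-Carbon approximation, P ≈ 1 − e^(−c), where c is the coverage in genome equivalents. This is only a sketch: it assumes random, unbiased cloning, and the abstract does not state which formula the authors actually used. The function names are illustrative.

```python
import math


def prob_sequence_present(coverage: float) -> float:
    """Approximate probability that a given single-copy sequence is in the
    library, for a library with `coverage` genome equivalents."""
    return 1.0 - math.exp(-coverage)


def coverage_needed(probability: float) -> float:
    """Genome equivalents required to reach the target probability."""
    return -math.log(1.0 - probability)


p_eg_bb = prob_sequence_present(15)  # EG_Bb, 15 genome equivalents
p_eg_ba = prob_sequence_present(17)  # EG_Ba, 17 genome equivalents
c_for_9999 = coverage_needed(0.9999)  # about 9.2 genome equivalents
```

    Under this model both reported coverages sit well above the roughly 9.2 genome equivalents needed for a 99.99% probability, which is consistent with the figure quoted in the abstract.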

  7. Construction and Analysis of Siberian Tiger Bacterial Artificial Chromosome Library with Approximately 6.5-Fold Genome Equivalent Coverage

    Science.gov (United States)

    Liu, Changqing; Bai, Chunyu; Guo, Yu; Liu, Dan; Lu, Taofeng; Li, Xiangchen; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2014-01-01

    Bacterial artificial chromosome (BAC) libraries are extremely valuable for the genome-wide genetic dissection of complex organisms. The Siberian tiger, one of the most well-known wild primitive carnivores in China, is an endangered animal. In order to promote research on its genome, a high-redundancy BAC library of the Siberian tiger was constructed and characterized. The library is divided into two sub-libraries prepared from blood cells and two sub-libraries prepared from fibroblasts. This BAC library contains 153,600 individually archived clones; for PCR-based screening of the library, BACs were placed into 40 superpools of 10 × 384-deep well microplates. The average insert size of BAC clones was estimated to be 116.5 kb, representing approximately 6.46 genome equivalents of the haploid genome and affording a 98.86% statistical probability of obtaining at least one clone containing a unique DNA sequence. Screening the library with 19 microsatellite markers and an SRY sequence revealed that each of these markers was present in the library; the average number of positive clones per marker was 6.74 (range 2 to 12), consistent with 6.46-fold coverage of the tiger genome. Additionally, we identified 72 microsatellite markers that could potentially be used as genetic markers. This BAC library will serve as a valuable resource for physical mapping, comparative genomic study and large-scale genome sequencing in the tiger. PMID:24608928
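    The superpool layout in this abstract can be checked with a few lines of arithmetic: 40 superpools of 10 plates with 384 wells each account for exactly the 153,600 archived clones. The sketch below also maps a clone index to its superpool. The sequential numbering of clones across plates is an assumption made for illustration and is not described in the abstract.

```python
WELLS_PER_PLATE = 384
PLATES_PER_SUPERPOOL = 10
SUPERPOOLS = 40

total_clones = SUPERPOOLS * PLATES_PER_SUPERPOOL * WELLS_PER_PLATE


def locate(clone_index: int) -> tuple:
    """Map a 0-based clone index to (superpool, plate in superpool, well),
    all 0-based, assuming clones fill plates in order (hypothetical)."""
    if not 0 <= clone_index < total_clones:
        raise IndexError("clone index outside the library")
    plate, well = divmod(clone_index, WELLS_PER_PLATE)
    superpool, plate_in_pool = divmod(plate, PLATES_PER_SUPERPOOL)
    return superpool, plate_in_pool, well


first = locate(0)
last = locate(total_clones - 1)
```

    With this layout a PCR screen first tests 40 superpool templates and only then works through the 10 plates of each positive superpool.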

  8. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride.

    Science.gov (United States)

    Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.

  9. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride.

    Directory of Open Access Journals (Sweden)

    Marta Matvienko

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.

  10. Construction and Analysis of Siberian Tiger Bacterial Artificial Chromosome Library with Approximately 6.5-Fold Genome Equivalent Coverage

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2014-03-01

    Bacterial artificial chromosome (BAC) libraries are extremely valuable for the genome-wide genetic dissection of complex organisms. The Siberian tiger, one of the most well-known wild primitive carnivores in China, is an endangered animal. In order to promote research on its genome, a high-redundancy BAC library of the Siberian tiger was constructed and characterized. The library is divided into two sub-libraries prepared from blood cells and two sub-libraries prepared from fibroblasts. This BAC library contains 153,600 individually archived clones; for PCR-based screening of the library, BACs were placed into 40 superpools of 10 × 384-deep well microplates. The average insert size of BAC clones was estimated to be 116.5 kb, representing approximately 6.46 genome equivalents of the haploid genome and affording a 98.86% statistical probability of obtaining at least one clone containing a unique DNA sequence. Screening the library with 19 microsatellite markers and an SRY sequence revealed that each of these markers was present in the library; the average number of positive clones per marker was 6.74 (range 2 to 12), consistent with 6.46-fold coverage of the tiger genome. Additionally, we identified 72 microsatellite markers that could potentially be used as genetic markers. This BAC library will serve as a valuable resource for physical mapping, comparative genomic study and large-scale genome sequencing in the tiger.

  11. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    Science.gov (United States)

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated with this type of library, which makes the accurate annotation of SVs challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.).

    Science.gov (United States)

    Lee, Mi-Kyung; Zhang, Yang; Zhang, Meiping; Goebel, Mark; Kim, Hee Jin; Triplett, Barbara A; Stelly, David M; Zhang, Hong-Bin

    2013-03-28

    . raimondii contains a D genome (D5). The library represents the first BIBAC library in cotton and related species, thus providing tools useful for integrative physical mapping, large-scale genome sequencing and large-scale functional analysis of the Upland cotton genome. Comparative sequence analysis provides insights into the Upland cotton genome, and a possible mechanism underlying the divergence and evolution of polyploid Upland cotton from its diploid putative progenitor species, G. raimondii.

  13. Construction of BAC Libraries from Flow-Sorted Chromosomes.

    Science.gov (United States)

    Šafář, Jan; Šimková, Hana; Doležel, Jaroslav

    2016-01-01

    Cloned DNA libraries in bacterial artificial chromosome (BAC) are the most widely used form of large-insert DNA libraries. BAC libraries are typically represented by ordered clones derived from genomic DNA of a particular organism. In the case of large eukaryotic genomes, whole-genome libraries consist of a hundred thousand to a million clones, which make their handling and screening a daunting task. The labor and cost of working with whole-genome libraries can be greatly reduced by constructing a library derived from a smaller part of the genome. Here we describe construction of BAC libraries from mitotic chromosomes purified by flow cytometric sorting. Chromosome-specific BAC libraries facilitate positional gene cloning, physical mapping, and sequencing in complex plant genomes.

  14. Construction of a nurse shark (Ginglymostoma cirratum) bacterial artificial chromosome (BAC) library and a preliminary genome survey

    Directory of Open Access Journals (Sweden)

    Inoko Hidetoshi

    2006-05-01

    Background: Sharks are members of the taxonomic class Chondrichthyes, the oldest living jawed vertebrates. Genomic studies of this group, in comparison to representative species in other vertebrate taxa, will allow us to theorize about the fundamental genetic, developmental, and functional characteristics in the common ancestor of all jawed vertebrates. Aims: In order to obtain mapping and sequencing data for comparative genomics, we constructed a bacterial artificial chromosome (BAC) library for the nurse shark, Ginglymostoma cirratum. Results: The BAC library consists of 313,344 clones with an average insert size of 144 kb, covering ~4.5 × 10^10 bp and thus providing an 11-fold coverage of the haploid genome. BAC end sequence analyses revealed, in addition to LINEs and SINEs commonly found in other animal and plant genomes, two new groups of nurse shark-specific repetitive elements, NSRE1 and NSRE2, that seem to be major components of the nurse shark genome. Screening the library with single-copy or multi-copy gene probes showed 6–28 primary positive clones per probe of which 50–90% were true positives, demonstrating that the BAC library is representative of the different regions of the nurse shark genome. Furthermore, some BAC clones contained multiple genes, making physical mapping feasible. Conclusion: We have constructed a deep-coverage, high-quality, large insert, and publicly available BAC library for a cartilaginous fish. It will be very useful to the scientific community interested in shark genomic structure, comparative genomics, and functional studies. We found two new groups of repetitive elements specific to the nurse shark genome, which may contribute to the architecture and evolution of the nurse shark genome.

  15. Construction of a nurse shark (Ginglymostoma cirratum) bacterial artificial chromosome (BAC) library and a preliminary genome survey.

    Science.gov (United States)

    Luo, Meizhong; Kim, Hyeran; Kudrna, Dave; Sisneros, Nicholas B; Lee, So-Jeong; Mueller, Christopher; Collura, Kristi; Zuccolo, Andrea; Buckingham, E Bryan; Grim, Suzanne M; Yanagiya, Kazuyo; Inoko, Hidetoshi; Shiina, Takashi; Flajnik, Martin F; Wing, Rod A; Ohta, Yuko

    2006-05-03

    Sharks are members of the taxonomic class Chondrichthyes, the oldest living jawed vertebrates. Genomic studies of this group, in comparison to representative species in other vertebrate taxa, will allow us to theorize about the fundamental genetic, developmental, and functional characteristics in the common ancestor of all jawed vertebrates. In order to obtain mapping and sequencing data for comparative genomics, we constructed a bacterial artificial chromosome (BAC) library for the nurse shark, Ginglymostoma cirratum. The BAC library consists of 313,344 clones with an average insert size of 144 kb, covering ~4.5 × 10^10 bp and thus providing an 11-fold coverage of the haploid genome. BAC end sequence analyses revealed, in addition to LINEs and SINEs commonly found in other animal and plant genomes, two new groups of nurse shark-specific repetitive elements, NSRE1 and NSRE2, that seem to be major components of the nurse shark genome. Screening the library with single-copy or multi-copy gene probes showed 6-28 primary positive clones per probe of which 50-90% were true positives, demonstrating that the BAC library is representative of the different regions of the nurse shark genome. Furthermore, some BAC clones contained multiple genes, making physical mapping feasible. We have constructed a deep-coverage, high-quality, large insert, and publicly available BAC library for a cartilaginous fish. It will be very useful to the scientific community interested in shark genomic structure, comparative genomics, and functional studies. We found two new groups of repetitive elements specific to the nurse shark genome, which may contribute to the architecture and evolution of the nurse shark genome.
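    The coverage figures in this record follow from simple arithmetic: clone count times mean insert size gives the total cloned DNA, and dividing by the fold coverage gives the haploid genome size the authors must have assumed. The sketch below redoes that calculation. The implied genome size is derived here and is not a number stated in the abstract.

```python
clones = 313_344
mean_insert_bp = 144_000
reported_fold_coverage = 11

# Total cloned DNA, assuming every clone carries an average-sized insert.
total_cloned_bp = clones * mean_insert_bp

# Haploid genome size implied by the reported 11-fold coverage.
implied_genome_bp = total_cloned_bp / reported_fold_coverage

print(f"total cloned DNA: {total_cloned_bp:.3e} bp")
print(f"implied haploid genome: {implied_genome_bp:.2e} bp")
```

    The total comes out at about 4.5 × 10^10 bp, matching the abstract, and the implied genome size is roughly 4.1 Gb.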

  16. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    As one of the initial steps of olive (Olea europaea L.) genome analysis, a small insert genomic DNA library was constructed (digesting olive genomic DNA with SmaI and cloning the digestion products into pUC19 vector) and 83 randomly picked colonies were sequenced. Analysis of the insert sequences revealed 12 clones ...

  17. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
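    The idea of building a sequence library subset from a relational database can be sketched with Python's built-in sqlite3 module. This is not the seqdb_demo schema described in the unit: the table, column names, accessions and sequences below are all invented for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical protein table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein ("
        " acc TEXT PRIMARY KEY,"
        " taxon TEXT NOT NULL,"
        " seq TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO protein (acc, taxon, seq) VALUES (?, ?, ?)",
        [
            ("P00001", "Escherichia coli", "MKTAYIAK"),
            ("P00002", "Homo sapiens", "MEEPQSDP"),
            ("P00003", "Escherichia coli", "MSEQNNTE"),
        ],
    )
    return conn


def subset_fasta(conn: sqlite3.Connection, taxon: str) -> str:
    """Return a FASTA-formatted library holding only one taxon's proteins."""
    rows = conn.execute(
        "SELECT acc, seq FROM protein WHERE taxon = ? ORDER BY acc",
        (taxon,),
    )
    return "".join(f">{acc}\n{seq}\n" for acc, seq in rows)


conn = build_demo_db()
ecoli_fasta = subset_fasta(conn, "Escherichia coli")
```

    Searching against a smaller, targeted library like this reduces the number of comparisons, which is the mechanism by which the abstract says statistical significance can improve.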

  18. Construction of the BAC Library of Small Abalone (Haliotis diversicolor) for Gene Screening and Genome Characterization.

    Science.gov (United States)

    Jiang, Likun; You, Weiwei; Zhang, Xiaojun; Xu, Jian; Jiang, Yanliang; Wang, Kai; Zhao, Zixia; Chen, Baohua; Zhao, Yunfeng; Mahboob, Shahid; Al-Ghanim, Khalid A; Ke, Caihuan; Xu, Peng

    2016-02-01

    The small abalone (Haliotis diversicolor) is one of the most important aquaculture species in East Asia. To facilitate gene cloning and characterization, genome analysis, and genetic breeding of this species, we constructed a large-insert bacterial artificial chromosome (BAC) library, which is an important genetic tool for advanced genetics and genomics research. The small abalone BAC library includes 92,610 clones with an average insert size of 120 Kb, equivalent to approximately 7.6× of the small abalone genome. We set up three-dimensional pools and super pools of 18,432 BAC clones for target gene screening using a PCR method. To assess the approach, we screened 12 target genes in these 18,432 BAC clones and identified 16 positive BAC clones. Eight positive BAC clones were then sequenced and assembled with the next generation sequencing platform. The assembled contigs representing these 8 BAC clones spanned 928 Kb of the small abalone genome, providing the first batch of genome sequences for genome evaluation and characterization. The average GC content of the small abalone genome was estimated as 40.33%. A total of 21 protein-coding genes, including 7 target genes, were annotated in the 8 BACs, which proved the feasibility of the PCR screening approach with three-dimensional pools in the small abalone BAC library. One hundred fifty microsatellite loci were also identified from the sequences for marker development in the future. The BAC library and clone pools provided valuable resources and tools for genetic breeding and conservation of H. diversicolor.
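    Three-dimensional pooling can be illustrated with a small decoding sketch. The abstract gives only the total of 18,432 pooled clones; the layout assumed here (48 plates of 16 rows by 24 columns, with one pool per plate, per row and per column) is a guess that happens to multiply out to that total, and the super pools mentioned in the abstract are left out.

```python
from itertools import product

PLATES = range(1, 49)       # 48 plates (assumed)
ROWS = "ABCDEFGHIJKLMNOP"   # 16 rows of a 384-well plate
COLS = range(1, 25)         # 24 columns of a 384-well plate

pooled_clones = len(PLATES) * len(ROWS) * len(COLS)


def pools_for_clone(plate: int, row: str, col: int) -> set:
    """The three pools in which a given clone's DNA ends up."""
    return {("plate", plate), ("row", row), ("col", col)}


def candidate_clones(positive_pools: set) -> list:
    """All clone addresses consistent with a set of PCR-positive pools."""
    plates = sorted(v for kind, v in positive_pools if kind == "plate")
    rows = sorted(v for kind, v in positive_pools if kind == "row")
    cols = sorted(v for kind, v in positive_pools if kind == "col")
    return list(product(plates, rows, cols))


# One positive clone is resolved uniquely from three positive pools.
single = candidate_clones(pools_for_clone(7, "C", 12))

# Two positive clones leave 2 x 2 x 2 candidate addresses to re-test.
double = candidate_clones(
    pools_for_clone(7, "C", 12) | pools_for_clone(30, "K", 3)
)
```

    Under this assumed layout, 88 pool PCRs (48 + 16 + 24) stand in for 18,432 single-clone reactions, at the cost of a confirmation round when several clones are positive for the same gene.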

  19. Large-scale trends in the evolution of gene structures within 11 animal genomes.

    Directory of Open Access Journals (Sweden)

    Mark Yandell

    2006-03-01

    We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans) together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales--from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for "Comparative Genomics Library"). Our results demonstrate that change in intron-exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities.

  20. A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    Directory of Open Access Journals (Sweden)

    Schwartz Russell

    2008-08-01

    Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

  1. Construction and characterization of a yeast artificial chromosome library containing seven haploid human genome equivalents

    International Nuclear Information System (INIS)

    Albertsen, H.M.; Abderrahim, H.; Cann, H.M.; Dausset, J.; Le Paslier, D.; Cohen, D.

    1990-01-01

    Prior to constructing a library of yeast artificial chromosomes (YACs) containing very large human DNA fragments, the authors performed a series of preliminary experiments aimed at developing a suitable protocol. They found an inverse relationship between YAC insert size and transformation efficiency. Evidence of occasional rearrangement within YAC inserts was found resulting in clonally stable internal deletions or clonally unstable size variations. A protocol was developed for preparative electrophoretic enrichment of high molecular mass human DNA fragments from partial restriction digests and ligation with the YAC vector in agarose. A YAC library has been constructed from large fragments of DNA from an Epstein-Barr virus-transformed human lymphoblastoid cell line. The library presently contains 50,000 clones, 95% of which are greater than 250 kilobase pairs in size. The mean YAC size of the library, calculated from 132 randomly isolated clones, is 430 kilobase pairs. The library thus contains the equivalent of approximately seven haploid human genomes
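    The closing figure of about seven genome equivalents can be reproduced from the clone count and mean insert size. The haploid human genome size used below, roughly 3 × 10^9 bp, is a commonly cited value assumed here; it is not stated in the abstract.

```latex
\[
\text{genome equivalents}
  = \frac{N \,\bar{L}}{G}
  = \frac{50{,}000 \times 430\ \text{kb}}{3 \times 10^{6}\ \text{kb}}
  = \frac{2.15 \times 10^{7}\ \text{kb}}{3 \times 10^{6}\ \text{kb}}
  \approx 7.2
\]
```

    Here N is the number of clones, L-bar the mean YAC size and G the assumed haploid genome size.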

  2. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    Directory of Open Access Journals (Sweden)

    Feltus Frank A

    2011-07-01

    Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions: The construction of the first switchgrass BAC library and comparative analysis of

  3. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis; FINAL

    International Nuclear Information System (INIS)

    Simon, M. I.; Kim, U.-J.

    2002-01-01

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally, the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year.

  4. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.

    Science.gov (United States)

    Miller, Chase A; Anthony, Jon; Meyer, Michelle M; Marth, Gabor

    2013-02-01

    High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.

  5. Construction of a llama bacterial artificial chromosome library with approximately 9-fold genome equivalent coverage.

    Science.gov (United States)

    Airmet, K W; Hinckley, J D; Tree, L T; Moss, M; Blumell, S; Ulicny, K; Gustafson, A K; Weed, M; Theodosis, R; Lehnardt, M; Genho, J; Stevens, M R; Kooyman, D L

    2012-01-01

    The llama is an important agricultural livestock animal in much of South America. The llama is increasing in popularity in the United States as a companion animal. Little work has been done to improve llama production using modern technology. A paucity of information is available regarding the llama genome. We report the construction of a llama bacterial artificial chromosome (BAC) library of about 196,224 clones in the vector pECBAC1. Using flow cytometry and bovine, human, mouse, and chicken as controls, we determined the llama genome size to be 2.4 × 10⁹ bp. The average insert size of the library is 137.8 kb, corresponding to approximately 9-fold genome coverage. Additional studies are needed to further characterize the library and the llama genome. We anticipate that this new library will help facilitate future genomic studies in the llama.

  6. The first insight into the Salvia (Lamiaceae) genome via BAC library construction and high-throughput sequencing of target BAC clones

    International Nuclear Information System (INIS)

    Hao, D.C.; Vautrin, S.; Berges, H.; Chen, S.L.

    2015-01-01

    Salvia is a representative genus of Lamiaceae, a eudicot family with significant species diversity and population adaptability. One of the key goals of Salvia genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of medicinal plants to increase their health and productivity. Large-insert genomic libraries are a fundamental tool for achieving this purpose. We report herein the construction, characterization and screening of a gridded BAC library for Salvia officinalis (sage). The S. officinalis BAC library consists of 17,764 clones and the average insert size is 107 kb, corresponding to 3 haploid genome equivalents. Seventeen positive clones (average insert size 115 kb) containing five terpene synthase (TPS) genes were identified by PCR screening, and 12 of them were subjected to Illumina HiSeq 2000 sequencing, which yielded 28,097,480 90-bp raw reads (2.53 Gb). Scaffolds containing sabinene synthase (Sab), a Sab homolog, TPS3 (kaurene synthase-like 2), copalyl diphosphate synthase 2 and one cytochrome P450 gene were retrieved via de novo assembly and annotation; these scaffolds also contain flanking noncoding sequences, including predicted promoters and repeat sequences. Among 2,638 repeat sequences, there are 330 amplifiable microsatellites. This BAC library provides a new resource for Lamiaceae genomic studies, including microsatellite marker development, physical mapping, comparative genomics and genome sequencing. Characterization of positive clones provided insights into the structure of the Salvia genome. These sequences will be used in the assembly of a future genome sequence for S. officinalis. (author)

  7. Construction of a large-scale Burkholderia cenocepacia J2315 transposon mutant library

    Science.gov (United States)

    Wong, Yee-Chin; Pain, Arnab; Nathan, Sheila

    2014-09-01

    Burkholderia cenocepacia, a pathogenic member of the Burkholderia cepacia complex (Bcc), has emerged as a significant threat towards cystic fibrosis patients, where infection often leads to the fatal clinical manifestation known as cepacia syndrome. Many studies have investigated the pathogenicity of B. cenocepacia as well as its ability to become highly resistant towards many of the antibiotics currently in use. In addition, studies have also been undertaken to understand the pathogen's capacity to adapt and survive in a broad range of environments. Transposon-based mutagenesis has been widely used in creating insertional knock-out mutants and, coupled with recent advances in sequencing technology, robust tools to study gene function in a genome-wide manner have been developed based on the assembly of saturated transposon mutant libraries. In this study, we describe the construction of a large-scale library of B. cenocepacia transposon mutants. To create transposon mutants of B. cenocepacia strain J2315, electrocompetent bacteria were electrotransformed with the EZ-Tn5 transposome. Tetracycline-resistant colonies were harvested off selective agar and pooled. Mutants were generated in multiple batches with each batch consisting of ~20,000 to 40,000 mutants. Transposon insertion was validated by PCR amplification of the transposon region. In conclusion, a saturated B. cenocepacia J2315 transposon mutant library with an estimated total number of 500,000 mutants was successfully constructed. This mutant library can now be further exploited as a genetic tool to assess the function of every gene in the genome, facilitating the discovery of genes important for bacterial survival and adaptation, as well as virulence.

  8. Chromosome region-specific libraries for human genome analysis. Final progress report, 1 March 1991--28 February 1994

    Energy Technology Data Exchange (ETDEWEB)

    Kao, F.T.

    1994-04-01

    The objectives of this grant proposal include (1) development of a chromosome microdissection and PCR-mediated microcloning technology, (2) application of this microtechnology to the construction of region-specific libraries for human genome analysis. During this grant period, the authors have successfully developed this microtechnology and have applied it to the construction of microdissection libraries for the following chromosome regions: a whole chromosome 21 (21E), 2 region-specific libraries for the long arm of chromosome 2, 2q35-q37 (2Q1) and 2q33-q35 (2Q2), and 4 region-specific libraries for the entire short arm of chromosome 2, 2p23-p25 (2P1), 2p21-p23 (2P2), 2p14-p16 (2P3) and 2p11-p13 (2P4). In addition, 20--40 unique sequence microclones have been isolated and characterized for genomic studies. These region-specific libraries and the single-copy microclones from the library have been used as valuable resources for (1) isolating microsatellite probes in linkage analysis to further refine the disease locus; (2) isolating corresponding clones with large inserts, e.g. YAC, BAC, P1, cosmid and phage, to facilitate construction of contigs for high resolution physical mapping; and (3) isolating region-specific cDNA clones for use as candidate genes. These libraries are being deposited in the American Type Culture Collection (ATCC) for general distribution.

  9. Human genome libraries. Final progress report, February 1, 1994--August 31, 1997

    Energy Technology Data Exchange (ETDEWEB)

    Kao, Fa-Ten

    1998-01-01

    The goal of this program is to use a novel technology of chromosome microdissection and microcloning to construct chromosome region-specific libraries as resources for various human genome program studies. Region specific libraries have been constructed for the entire human chromosomes 2 and 18.

  10. Construction of bacterial artificial chromosome libraries for Zhikong Scallop Chlamys farreri

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yang; ZHANG Xiaojun; Chantel F.SCHEURING; ZHANG Hongbin; LI Fuhua; XIANG Jianhai

    2008-01-01

    Two large-insert genomic bacterial artificial chromosome (BAC) libraries of Zhikong scallop Chlamys farreri were constructed to promote our genetic and genomic research. High-quality megabase-sized DNA was isolated from the adductor muscle of the scallop and partially digested by BamH I and Mbo I, respectively. The BamH I library consisted of 53760 clones while the Mbo I library consisted of 7680 clones. Approximately 96% of the clones in the BamH I library contained nuclear DNA inserts with an average size of 100 kb, providing a coverage of 5.3 haploid genome equivalents. Similarly, the Mbo I library had an average insert size of 145 kb and no insert-empty clones, thus providing a genome coverage of 1.1 haploid genome equivalents.

  11. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    Directory of Open Access Journals (Sweden)

    Kumar Santosh

    2012-12-01

    Full Text Available Abstract Background Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from

  12. Targeted sequencing of large genomic regions with CATCH-Seq.

    Directory of Open Access Journals (Sweden)

    Kenneth Day

    Full Text Available Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce development of a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene, and resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.

  13. Phylogenetic distribution of large-scale genome patchiness

    Directory of Open Access Journals (Sweden)

    Hackenberg Michael

    2008-04-01

    Full Text Available Abstract Background The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
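
    The scaling exponent referred to in this record can be illustrated with a minimal sketch of Detrended Fluctuation Analysis in plain Python. This is not the authors' implementation: it computes a single global exponent rather than the local variations the record describes, and the numeric encoding of the sequence (+1 for G/C, -1 for A/T), the window sizes and all function names are assumptions made for illustration. An uncorrelated sequence is expected to give an exponent near 0.5, while compositionally patchy sequences tend to give larger values.

```python
import math
import random


def dna_to_steps(seq):
    """Map a DNA string to +1 (G/C) or -1 (A/T); this encoding is an assumption."""
    return [1.0 if base in "GC" else -1.0 for base in seq.upper()]


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluctuation(profile, window):
    """Root-mean-square residual after removing a linear trend in each window."""
    xs = list(range(window))
    total, count = 0.0, 0
    for start in range(0, len(profile) - window + 1, window):
        segment = profile[start:start + window]
        slope, intercept = linear_fit(xs, segment)
        total += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment))
        count += window
    return math.sqrt(total / count)


def dfa_exponent(steps, windows=(16, 32, 64, 128, 256)):
    """Estimate the DFA scaling exponent as the slope of log F(n) versus log n."""
    # Integrate the mean-centred series to obtain the "profile" (a random walk).
    mean = sum(steps) / len(steps)
    profile, running = [], 0.0
    for s in steps:
        running += s - mean
        profile.append(running)
    log_n = [math.log(w) for w in windows]
    log_f = [math.log(fluctuation(profile, w)) for w in windows]
    slope, _ = linear_fit(log_n, log_f)
    return slope


# Synthetic, uncorrelated test sequence (not real genome data).
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(8192))
alpha = dfa_exponent(dna_to_steps(random_seq))
print(f"DFA exponent for an uncorrelated random sequence: {alpha:.2f}")
```

    Applying the same function to sliding sub-windows of a chromosome would give one rough way to look at local variation in the exponent, although the choice of window sizes strongly affects the result.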

  14. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis

    OpenAIRE

    Ong, Shyue Ping; Richards, William Davidson; Jain, Anubhav; Hautier, Geoffroy; Kocher, Michael; Cholia, Shreyas; Gunter, Dan; Chevrier, Vincent L.; Persson, Kristin A.; Ceder, Gerbrand

    2012-01-01

    We present the Python Materials Genomics (pymatgen) library, a robust, open-source Python library for materials analysis. A key enabler in high-throughput computational materials science efforts is a robust set of software tools to perform initial setup for the calculations (e.g., generation of structures and necessary input files) and post-calculation analysis to derive useful material properties from raw calculated data. The pymatgen library aims to meet these needs by (1) defining core Pyt...

  15. A BAC-based physical map of the Drosophila buzzatii genome

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez, Josefa; Nefedov, Michael; Bosdet, Ian; Casals, Ferran; Calvete, Oriol; Delprat, Alejandra; Shin, Heesun; Chiu, Readman; Mathewson, Carrie; Wye, Natasja; Hoskins, Roger A.; Schein, JacquelineE.; de Jong, Pieter; Ruiz, Alfredo

    2005-03-18

    Large-insert genomic libraries facilitate cloning of large genomic regions, allow the construction of clone-based physical maps and provide useful resources for sequencing entire genomes. Drosophila buzzatii is a representative species of the repleta group in the Drosophila subgenus, which is being widely used as a model in studies of genome evolution, ecological adaptation and speciation. We constructed a Bacterial Artificial Chromosome (BAC) genomic library of D. buzzatii using the shuttle vector pTARBAC2.1. The library comprises 18,353 clones with an average insert size of 152 kb and a ~18X expected representation of the D. buzzatii euchromatic genome. We screened the entire library with six euchromatic gene probes and estimated the actual genome representation to be ~23X. In addition, we fingerprinted by restriction digestion and agarose gel electrophoresis a sample of 9,555 clones, and assembled them using Finger Printed Contigs (FPC) software and manual editing into 345 contigs (mean of 26 clones per contig) and 670 singletons. Finally, we anchored 181 large contigs (containing 7,788 clones) to the D. buzzatii salivary gland polytene chromosomes by in situ hybridization of 427 representative clones. The BAC library and a database with all the information regarding the high coverage BAC-based physical map described in this paper are available to the research community.

  16. Computational solution to automatically map metabolite libraries in the context of genome scale metabolic networks

    Directory of Open Access Journals (Sweden)

    Benjamin Merlet

    2016-02-01

    Full Text Available This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities, MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics, on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing covering statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues on programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in the genome scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single organism or a selection of organisms of interest and is thus not limited to large facilities.

  17. A highly redundant BAC library of Atlantic salmon (Salmo salar): an important tool for salmon projects

    Directory of Open Access Journals (Sweden)

    Koop Ben F

    2005-04-01

    Full Text Available Abstract Background As farming of Atlantic salmon is growing as an aquaculture enterprise, the need to identify the genomic mechanisms for specific traits is becoming more important in breeding and management of the animal. Traits of importance might be related to growth, disease resistance, food conversion efficiency, color or taste. To identify genomic regions responsible for specific traits, genomic large insert libraries have previously proven to be of crucial importance. These large insert libraries can be screened using gene or genetic markers in order to identify and map regions of interest. Furthermore, large-scale mapping can utilize highly redundant libraries in genome projects, and hence provide valuable data on the genome structure. Results Here we report the construction and characterization of a highly redundant bacterial artificial chromosome (BAC) library constructed from a Norwegian aquaculture strain male of Atlantic salmon (Salmo salar). The library consists of a total number of 305 557 clones, of which approximately 299 000 are recombinants. The average insert size of the library is 188 kbp, representing 18-fold genome coverage. High-density filters each consisting of 18 432 clones spotted in duplicate have been produced for hybridization screening, and are publicly available. To characterize the library, 15 expressed sequence tag (EST)-derived overgos and 12 oligo sequences derived from microsatellite markers were used in hybridization screening of the complete BAC library. Secondary hybridizations with individual probes were performed for the clones detected. The BACs positive for the EST probes were fingerprinted and mapped into contigs, yielding an average of 3 contigs for each probe. Clones identified using genomic probes were PCR verified using microsatellite specific primers. Conclusion Identification of genes and genomic regions of interest is greatly aided by the availability of the CHORI-214 Atlantic salmon BAC

  18. Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus subtilis.

    Science.gov (United States)

    Koo, Byoung-Mo; Kritikos, George; Farelli, Jeremiah D; Todor, Horia; Tong, Kenneth; Kimsey, Harvey; Wapinski, Ilan; Galardini, Marco; Cabal, Angelo; Peters, Jason M; Hachmann, Anna-Barbara; Rudner, David Z; Allen, Karen N; Typas, Athanasios; Gross, Carol A

    2017-03-22

    A systems-level understanding of Gram-positive bacteria is important from both an environmental and health perspective and is most easily obtained when high-quality, validated genomic resources are available. To this end, we constructed two ordered, barcoded, erythromycin-resistance- and kanamycin-resistance-marked single-gene deletion libraries of the Gram-positive model organism, Bacillus subtilis. The libraries comprise 3,968 and 3,970 genes, respectively, and overlap in all but four genes. Using these libraries, we update the set of essential genes known for this organism, provide a comprehensive compendium of B. subtilis auxotrophic genes, and identify genes required for utilizing specific carbon and nitrogen sources, as well as those required for growth at low temperature. We report the identification of enzymes catalyzing several missing steps in amino acid biosynthesis. Finally, we describe a suite of high-throughput phenotyping methodologies and apply them to provide a genome-wide analysis of competence and sporulation. Altogether, we provide versatile resources for studying gene function and pathway and network architecture in Gram-positive bacteria. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Study on the Mitochondrial Genome of Sea Island Cotton (Gossypium barbadense) by BAC Library Screening

    Institute of Scientific and Technical Information of China (English)

    SU Ai-guo; LI Shuang-shuang; LIU Guo-zheng; LEI Bin-bin; KANG Ding-ming; LI Zhao-hu; MA Zhi-ying; HUA Jin-ping

    2014-01-01

    The plant mitochondrial genome displays complex features, particularly in terms of cytoplasmic male sterility (CMS). Therefore, research on the cotton mitochondrial genome may provide important information for analyzing genome evolution and exploring the molecular mechanism of CMS. In this paper, we present a preliminary study on the mitochondrial genome of sea island cotton (Gossypium barbadense) based on positive clones from the bacterial artificial chromosome (BAC) library. Thirty-five primers designed with the conserved sequences of functional genes and exons of mitochondria were used to screen positive clones in the genome library of the sea island cotton variety called Pima 90-53. Ten BAC clones were obtained and verified for further study. A contig was obtained based on six overlapping clones and subsequently laid out primarily on the mitochondrial genome. One BAC clone, clone 6, which harbors an insert of approximately 115 kb of mtDNA sequence and from which fragments for more than 10 primers could be amplified, was sequenced and assembled using the Solexa strategy. Fifteen mitochondrial functional genes were revealed in clone 6 by gene annotation. The characteristics of the syntenic gene/exon of the sequences and RNA editing were preliminarily predicted.

  20. Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.

    Science.gov (United States)

    Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

    2013-11-01

    Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.

  1. A Computational Solution to Automatically Map Metabolite Libraries in the Context of Genome Scale Metabolic Networks.

    Science.gov (United States)

    Merlet, Benjamin; Paulhe, Nils; Vinson, Florence; Frainay, Clément; Chazalviel, Maxime; Poupin, Nathalie; Gloaguen, Yoann; Giacomoni, Franck; Jourdan, Fabien

    2016-01-01

    This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP) on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing covering statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues on programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in genome scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single or a selection of organisms of interest and is thus not limited to large facilities.
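
    The identifier-based matching step described in this record can be sketched roughly as follows. This is a minimal illustration that assumes exact string equality of InChIKeys; it is not the MetExplore pipeline or its API. The function name, the compound and metabolite identifiers, and the key assignments in the toy data are hypothetical and serve only as opaque example strings.

```python
def map_library_to_network(library, network):
    """Match library compounds to network metabolites on identical InChIKeys.

    Both arguments map a local identifier to an InChIKey string, or to None
    when the entry carries no InChIKey annotation.
    """
    # Index the network by InChIKey; several metabolites may share one key
    # (for example the same compound in different compartments).
    by_key = {}
    for metabolite_id, key in network.items():
        if key:
            by_key.setdefault(key, []).append(metabolite_id)

    mapping = {}
    unmapped = []
    for compound_id, key in library.items():
        hits = by_key.get(key, []) if key else []
        if hits:
            mapping[compound_id] = sorted(hits)
        else:
            unmapped.append(compound_id)

    # Coverage statistic: fraction of library compounds found in the network.
    coverage = len(mapping) / len(library) if library else 0.0
    return mapping, unmapped, coverage


# Hypothetical toy data; identifiers are invented for illustration.
library = {
    "lib_water": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    "lib_ethanol": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "lib_unannotated": None,
}
network = {
    "M_h2o_c": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    "M_h2o_m": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    "M_pyr_c": "LCTONWCANYUPML-UHFFFAOYSA-N",
}
mapping, unmapped, coverage = map_library_to_network(library, network)
print(mapping, unmapped, round(coverage, 2))
```

    Entries without an identifier can never be matched in this scheme, which is consistent with the record's remark that the identifiers must be specified in both libraries and networks.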

  2. Construction and Identification of Bacterial Artificial Chromosome Library for 0-613-2R in Upland Cotton

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A bacterial artificial chromosome (BAC) library containing a large genomic DNA insert is an important tool for genome physical mapping, map-based cloning, and genome sequencing. To isolate genes via a map-based cloning strategy and to perform physical mapping of the cotton genome, a high-quality BAC library containing large cotton DNA inserts is needed. We have developed a BAC library of the restoring line 0-613-2R for isolating the fertility restorer (Rf1) gene and genomic research in cotton (Gossypium hirsutum L.). The BAC library contains 97 825 clones stored in 255 384-well microtiter plates. Random samples of BACs digested with the NotI enzyme indicated that the average insert size is approximately 130 kb, with a range of 80-275 kb, and 95.7% of the BAC clones in the library have an insert size larger than 100 kb. Based on a cotton genome size of 2 250 Mb, library coverage is 5.7x haploid genome equivalents. Four clones were selected randomly from the library to determine the stability of the BAC clones. The fingerprints of each clone digested with the NotI and HindIII enzymes did not differ between generation 0 and generation 100. Thus, the stability of a single BAC clone can be sustained for at least 100 generations. Eight simple sequence repeat (SSR) markers flanking the Rf1 gene were chosen to screen pools of the BAC library by PCR, and 25 positive clones were identified, an average of 3.1 positive clones per SSR marker.
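
    The coverage figure quoted in this record follows from simple arithmetic: number of clones times mean insert size, divided by haploid genome size. The sketch below reproduces it and, as an addition that is not part of the record, also evaluates the commonly used Clarke-Carbon estimate P = 1 - (1 - I/G)^N for the chance that a given single-copy locus is present, under the simplifying assumption that all clones carry random inserts of the mean size. The function names are illustrative.

```python
def genome_equivalents(n_clones, mean_insert_kb, genome_mb):
    """Library coverage = total cloned DNA / haploid genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


def probability_of_recovery(n_clones, mean_insert_kb, genome_mb):
    """Clarke-Carbon estimate P = 1 - (1 - I/G)^N for a single-copy locus."""
    fraction = mean_insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** n_clones


# Values taken from the record: 97 825 clones, 130 kb mean insert, 2 250 Mb genome.
coverage = genome_equivalents(97825, 130, 2250)
p_locus = probability_of_recovery(97825, 130, 2250)
print(f"coverage: {coverage:.1f}x haploid genome equivalents")
print(f"estimated probability that a given locus is present: {p_locus:.4f}")
```

    The same two functions can be applied to the clone counts and insert sizes in the other BAC library records in this listing, keeping in mind that some of those records correct for empty clones or use an effective genome size.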

  3. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Lauren Coombe

    Full Text Available The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.

  4. Tenured Librarians in Large University Libraries.

    Science.gov (United States)

    Smith, Karen F.; And Others

    1984-01-01

    Based on a 1979 survey of 530 tenured librarians in 33 large academic libraries, this article examines characteristics of tenured librarians (sex, age, marital status, salary, degrees, rank, job titles), criteria and review procedures used in granting tenure, productivity before and after tenure, and mobility. Seven references are included. (EJS)

  5. Environmental genomics of "Haloquadratum walsbyi" in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species

    Directory of Open Access Journals (Sweden)

    Bolhuis Henk

    2006-07-01

    Full Text Available Abstract Background Mature saturated brine (crystallizer) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and the sequencing of its genome allow comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORFs that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. Particularly, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities.

  6. Whitefly (Bemisia tabaci) genome project: analysis of sequenced clones from egg, instar, and adult (viruliferous and non-viruliferous) cDNA libraries

    Directory of Open Access Journals (Sweden)

    Czosnek Henryk

    2006-04-01

    Full Text Available Abstract Background The past three decades have witnessed a dramatic increase in interest in the whitefly Bemisia tabaci, owing to its nature as a taxonomically cryptic species, the damage it causes to a large number of herbaceous plants because of its specialized feeding in the phloem, and to its ability to serve as a vector of plant viruses. Among the most important plant viruses to be transmitted by B. tabaci are those in the genus Begomovirus (family Geminiviridae). Surprisingly, little is known about the genome of this whitefly. The haploid genome size for male B. tabaci has been estimated to be approximately one billion bp by flow cytometry analysis, about five times the size of the fruitfly Drosophila melanogaster. The genes involved in whitefly development, in host range plasticity, and in begomovirus vector specificity and competency, are unknown. Results To address this general shortage of genomic sequence information, we have constructed three cDNA libraries from non-viruliferous whiteflies (eggs, immature instars, and adults) and two from adult insects that fed on tomato plants infected by two geminiviruses: Tomato yellow leaf curl virus (TYLCV) and Tomato mottle virus (ToMoV). In total, the sequence of 18,976 clones was determined. After quality control, and removal of 5,542 clones of mitochondrial origin, 9,110 sequences remained, which included 3,843 singletons and 1,017 contigs. Comparisons with public databases indicated that the libraries contained genes involved in cellular and developmental processes. In addition, approximately 1,000 bases aligned with the genome of the B. tabaci endosymbiotic bacterium Candidatus Portiera aleyrodidarum, originating primarily from the egg and instar libraries. Apart from the mitochondrial sequences, the longest and most abundant sequence encodes vitellogenin, which originated from whitefly adult libraries, indicating that much of the gene expression in this insect is directed toward the production

  7. CORALINA: a universal method for the generation of gRNA libraries for CRISPR-based screening.

    Science.gov (United States)

    Köferle, Anna; Worf, Karolina; Breunig, Christopher; Baumann, Valentin; Herrero, Javier; Wiesbeck, Maximilian; Hutter, Lukas H; Götz, Magdalena; Fuchs, Christiane; Beck, Stephan; Stricker, Stefan H

    2016-11-14

    The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for the development of easy, time- and cost-effective methods enabling large-scale screening approaches. Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for the generation of comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need of complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA, its reproducibility in covering the most relevant genomic features including regulatory, coding and non-coding sequences and confirm the functionality of CORALINA generated gRNAs. The simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexities obtainable with CORALINA libraries are a necessary pre-requisite for less biased large scale genomic and epigenomic screens.

  8. Methods for the preparation of large quantities of complex single-stranded oligonucleotide libraries.

    Science.gov (United States)

    Murgha, Yusuf E; Rouillard, Jean-Marie; Gulari, Erdogan

    2014-01-01

    Custom-defined oligonucleotide collections have a broad range of applications in the fields of synthetic biology, targeted sequencing, and cytogenetics. They are also used to encode information for technologies like RNA interference, protein engineering and DNA-encoded libraries. High-throughput parallel DNA synthesis technologies developed for the manufacture of DNA microarrays can produce libraries of large numbers of different oligonucleotides, but in very limited amounts. Here, we compare three approaches to prepare large quantities of single-stranded oligonucleotide libraries derived from microarray synthesized collections. The first approach, alkaline melting of double-stranded PCR amplified libraries with a biotinylated strand captured on streptavidin coated magnetic beads, results in little or no non-biotinylated ssDNA. The second method, wherein the phosphorylated strand of PCR amplified libraries is nucleolytically hydrolyzed, is recommended when small amounts of libraries are needed. The third method, combining in vitro transcription of PCR amplified libraries with reverse transcription of the RNA product into single-stranded cDNA, is our recommended method to produce large amounts of oligonucleotide libraries. Finally, we propose a method to remove any primer binding sequences introduced during library amplification.

  9. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    Science.gov (United States)

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  10. Virtual screening methods as tools for drug lead discovery from large chemical libraries.

    Science.gov (United States)

    Ma, X H; Zhu, F; Liu, X; Shi, Z; Zhang, J X; Yang, S Y; Wei, Y Q; Chen, Y Z

    2012-01-01

    Virtual screening methods have been developed and explored as useful tools for searching drug lead compounds from chemical libraries, including large libraries that have become publicly available. In this review, we discuss the new developments in exploring virtual screening methods for enhanced performance in searching large chemical libraries, their applications in screening libraries of ~1 million or more compounds in the last five years, the difficulties in their applications, and the strategies for further improving these methods.

  11. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

    Directory of Open Access Journals (Sweden)

    Bharti Arvind K

    2008-12-01

    Full Text Available Abstract Background Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of

  12. Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens.

    Science.gov (United States)

    Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio

    2017-10-24

    High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general purpose one billion DNA barcode library, the largest such library yet reported in literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
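
    The four constraints named in this abstract (GC content, homopolymer length, Hamming distance, blacklisted subsequences) can be illustrated with a simple rejection sampler. This is a sketch of the naïve baseline only, not the authors' framework; all thresholds, default values and function names are assumptions chosen for the example.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def max_homopolymer(seq):
    """Length of the longest run of identical bases."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        longest = max(longest, run)
    return longest

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def is_valid(seq, gc_min=0.4, gc_max=0.6, max_run=2, blacklist=()):
    """Per-barcode checks; the thresholds are illustrative assumptions."""
    return (gc_min <= gc_fraction(seq) <= gc_max
            and max_homopolymer(seq) <= max_run
            and not any(bad in seq for bad in blacklist))

def generate_barcodes(n, length, min_dist, blacklist=(), seed=0, max_tries=100_000):
    """Naive rejection sampling: each candidate is compared with every accepted
    barcode, so the cost grows quadratically with the library size."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if is_valid(cand, blacklist=blacklist) and all(
                hamming(cand, other) >= min_dist for other in accepted):
            accepted.append(cand)
    return accepted

if __name__ == "__main__":
    for barcode in generate_barcodes(5, 10, 3, blacklist=("GAATTC",)):
        print(barcode)
```

    The all-pairs Hamming check is what makes this approach impractical at the million-barcode scale the abstract reports, which is the cost a faster framework has to avoid.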

  13. A High-Quality Reference Genome for the Invasive Mosquitofish Gambusia affinis Using a Chicago Library.

    Science.gov (United States)

    Hoffberg, Sandra L; Troendle, Nicholas J; Glenn, Travis C; Mahmud, Ousman; Louha, Swarnali; Chalopin, Domitille; Bennetzen, Jeffrey L; Mauricio, Rodney

    2018-04-27

    The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes that have shorter total, exon, and intron length on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to those of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species. Copyright © 2018, G3: Genes, Genomes, Genetics.
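
    For readers unfamiliar with the N50 statistic quoted in this abstract: it is the length of the contig (or scaffold) at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch, with a toy input rather than the G. affinis data:

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one sequence length")

if __name__ == "__main__":
    # Toy contig lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```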

  14. Interpreting a sequenced genome: toward a cosmid transgenic library of Caenorhabditis elegans.

    Science.gov (United States)

    Janke, D L; Schein, J E; Ha, T; Franz, N W; O'Neil, N J; Vatcher, G P; Stewart, H I; Kuervers, L M; Baillie, D L; Rose, A M

    1997-10-01

    We have generated a library of transgenic Caenorhabditis elegans strains that carry sequenced cosmids from the genome of the nematode. Each strain carries an extrachromosomal array containing a single cosmid, sequenced by the C. elegans Genome Sequencing Consortium, and a dominant Rol-6 marker. More than 500 transgenic strains representing 250 cosmids have been constructed. Collectively, these strains contain approximately 8 Mb of sequence data, or approximately 8% of the C. elegans genome. The transgenic strains are being used to rescue mutant phenotypes, resulting in a high-resolution map alignment of the genetic, physical, and DNA sequence maps of the nematode. We have chosen the region of chromosome III deleted by sDf127 and not covered by the duplication sDp8(III;I) as a starting point for a systematic correlation of mutant phenotypes with nucleotide sequence. In this defined region, we have identified 10 new essential genes whose mutant phenotypes range from developmental arrest at early larva, to maternal effect lethal. To date, 8 of these 10 essential genes have been rescued. In this region, these rescues represent approximately 10% of the genes predicted by GENEFINDER and considerably enhance the map alignment. Furthermore, this alignment facilitates future efforts to physically position and clone other genes in the region. [Updated information about the Transgenic Library is available via the Internet at http://darwin.mbb.sfu.ca/imbb/dbaillie/cosmid.html.]

  15. A BAC Library and Paired-PCR Approach to Mapping and Completing the Genome Sequence of Sulfolobus Solfataricus P2

    DEFF Research Database (Denmark)

    She, Qunxin; Confalonieri, F.; Zivanovic, Y.

    2000-01-01

    The original strategy used in the Sulfolobus solfataricus genome project was to sequence non-overlapping, or minimally overlapping, cosmid or lambda inserts without constructing a physical map. However, after only about two thirds of the genome sequence was completed, this approach became counter......-productive because there was a high sequence bias in the cosmid and lambda libraries. Therefore, a new approach was devised for linking the sequenced regions which may be generally applicable. BAC libraries were constructed and terminal sequences of the clones were determined and used for both end mapping and PCR...

  16. EUPAN enables pan-genome studies of a large number of eukaryotic genomes.

    Science.gov (United States)

    Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun

    2017-08-01

    Pan-genome analyses are routinely carried out for bacteria to interpret the within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes due to the large sizes and higher complexities of their genomes. Here we propose EUPAN, a eukaryotic pan-genome analysis toolkit, enabling automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at a relatively low sequencing depth. In previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to the current re-sequencing projects primarily focusing on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C++. It is supported under Linux and preferred for a computer cluster with LSF and SLURM job scheduling system. EUPAN together with its standard operating procedure (SOP) is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html. Contact: ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  17. Construction of a Bacterial Artificial Chromosome Library of TM-1, a Standard Line for Genetics and Genomics in Upland Cotton

    Institute of Scientific and Technical Information of China (English)

    Yan Hu; Wang-Zhen Guo; Tian-Zhen Zhang

    2009-01-01

    A bacterial artificial chromosome (BAC) library was constructed for Gossypium hirsutum acc. TM-1, a genetic and genomic standard line for Upland cotton. The library consists of 147 456 clones with an average insert size of 122.8 kb ranging from 97 to 240 kb. About 96.0% of the clones have inserts over 100 kb. Therefore, this library represents theoretically 7.4 haploid genome equivalents based on an AD genome size of 2 425 Mb. Clones were stored in 384 384-well plates and arrayed into multiplex pools for rapid and reliable library screening. BAC screening was carried out by four-round polymerase chain reactions using 23 simple sequence repeat (SSR) markers, three sequence-related amplified polymorphism markers and one pair of primers for a gene associated with fiber development to test the quality of the library. Correspondingly, in total 92 positive BAC clones were identified with an average of four positive clones per SSR marker, ranging from one to eight hits. Additionally, since these SSR markers have been localized to chromosome 12 (A12) and 26 (D12) according to the genetic map, these BAC clones are expected to serve as seeds for the physical mapping of these two homologous chromosomes and, subsequently, for map-based cloning of quantitative trait loci or genes associated with important agronomic traits.

  18. Development of a large peptoid-DOTA combinatorial library.

    Science.gov (United States)

    Singh, Jaspal; Lopes, Daniel; Gomika Udugamasooriya, D

    2016-09-01

    Conventional one-bead one-compound (OBOC) library synthesis is typically used to identify molecules with therapeutic value. The design and synthesis of OBOC libraries that contain molecules with imaging or even potentially therapeutic and diagnostic capacities (e.g. theranostic agents) has been overlooked. The development of a therapeutically active molecule with a built-in imaging component for a certain target is a daunting task, and structure-based rational design might not be the best approach. We propose developing a combinatorial library with potentially therapeutic and imaging components fused together in each molecule. Molecules in such a library can be screened, identified, and validated as direct theranostic candidates against targets of interest. As the first step in achieving that aim, we developed an on-bead library of 153,600 Peptoid-DOTA compounds in which the peptoids are the target-recognizing and potentially therapeutic components and the DOTA is the imaging component. We attached the DOTA scaffold to TentaGel beads using one of the four arms of DOTA, and we built a diversified 6-mer peptoid library on the remaining three arms. We evaluated both the synthesis and the mass spectrometric sequencing capacities of the test compounds and of the final library. The compounds displayed unique ionization patterns including direct breakages of the DOTA scaffold into two units, allowing clear decoding of the sequences. Our approach provides a facile synthesis method for the complete on-bead development of large peptidomimetic-DOTA libraries for screening against biological targets for the identification of potential theranostic agents in the future. © 2016 The Authors. Biopolymers Published by Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 673-684, 2016.

  19. A kingdom-specific protein domain HMM library for improved annotation of fungal genomes

    Directory of Open Access Journals (Sweden)

    Oliver Stephen G

    2007-04-01

    Full Text Available Abstract Background Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover and it has previously been suggested that such general models may lack some of the specificity or selectivity that would be provided by kingdom-specific models. Results Here we present a general approach to create domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes, Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the Microsporidion Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate the performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered as an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well characterized domains, increase overall coverage and detect more domains per sequence with typically higher bitscores than Pfam for the same domain families. An evaluation of the effect of changing E-values on the coverage shows that the performance of FPfam is consistent over the range of E-values applied. Conclusion Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam and some of the families represented in the test set are not present in FPfam

  20. pileup.js: a JavaScript library for interactive and in-browser visualization of genomic data.

    Science.gov (United States)

    Vanderkam, Dan; Aksoy, B Arman; Hodes, Isaac; Perrone, Jaclyn; Hammerbacher, Jeff

    2016-08-01

    pileup.js is a new browser-based genome viewer. It is designed to facilitate the investigation of evidence for genomic variants within larger web applications. It takes advantage of recent developments in the JavaScript ecosystem to provide a modular, reliable and easily embedded library. The code and documentation for pileup.js is publicly available at https://github.com/hammerlab/pileup.js under the Apache 2.0 license. Contact: correspondence@hammerlab.org. © The Author 2016. Published by Oxford University Press.

  1. Meeting the Information Needs of Interdisciplinary Scholars: Issues for Administrators of Large University Libraries.

    Science.gov (United States)

    Searing, Susan E.

    1996-01-01

    Provides an overview of administrative issues in supporting interdisciplinary library use at large universities. Topics include information resources; cataloging and classification; library services to users, including library use education and reference services; library organization; the campus context; and the politics of interdisciplinarity.…

  2. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  3. From human monocytes to genome-wide binding sites--a protocol for small amounts of blood: monocyte isolation/ChIP-protocol/library amplification/genome wide computational data analysis.

    Directory of Open Access Journals (Sweden)

    Sebastian Weiterer

    Full Text Available Chromatin immunoprecipitation in combination with a genome-wide analysis via high-throughput sequencing is the state-of-the-art method to gain genome-wide representation of histone modification or transcription factor binding profiles. However, chromatin immunoprecipitation analysis in the context of human experimental samples is limited, especially in the case of blood cells. The typically extremely low yields of precipitated DNA are usually not compatible with library amplification for next generation sequencing. We developed a highly reproducible protocol that provides a guideline from the first step of isolating monocytes from a blood sample to analysing the distribution of histone modifications in a genome-wide manner. The protocol describes the whole workflow, from isolating monocytes from human blood samples through a high-sensitivity, small-scale chromatin immunoprecipitation assay, with guidance for generating libraries compatible with next generation sequencing from small amounts of immunoprecipitated DNA.

  4. GDC 2: Compression of large collections of genomes.

    Science.gov (United States)

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-06-25

    The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about.
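
    The "about 9,500 times" figure in this abstract can be checked from the two sizes it quotes. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the abstract does not specify; the function name is illustrative.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed data is than the original."""
    return original_bytes / compressed_bytes

if __name__ == "__main__":
    uncompressed = 6.7e12  # about 6.7 TB of FASTA files (decimal units assumed)
    compressed = 700e6     # about 700 MB after compression
    ratio = compression_ratio(uncompressed, compressed)
    print(f"about {ratio:,.0f}x, roughly {compressed / 1092 / 1e6:.2f} MB per genome")
```

    With the rounded sizes given in the abstract the ratio comes out slightly above 9,500, which is consistent with the reported figure.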

  5. A High-Quality Reference Genome for the Invasive Mosquitofish Gambusia affinis Using a Chicago Library

    Directory of Open Access Journals (Sweden)

    Sandra L. Hoffberg

    2018-06-01

    The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes that have shorter total, exon, and intron length on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to those of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species.
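
    The N50 statistic reported above is the length of the shortest sequence in the smallest set of longest contigs (or scaffolds) that together cover at least half of the assembly. A minimal sketch of that standard definition follows; the contig lengths in the example are hypothetical toy values, not data from the G. affinis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (total length 54): 10 + 9 + 8 = 27 reaches half.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```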

  6. Reliability analysis of the Ahringer Caenorhabditis elegans RNAi feeding library: a guide for genome-wide screens

    Directory of Open Access Journals (Sweden)

    Lu Yiming

    2011-03-01

    Background: The Ahringer C. elegans RNAi feeding library, prepared by cloning genomic DNA fragments, has been widely used in genome-wide analysis of gene function. However, the library has not been thoroughly validated by direct sequencing, and there are potential errors, including: (1) mis-annotation (the clone with the retired gene name should be remapped to the actual target gene); (2) nonspecific PCR amplification; (3) cross-RNAi; (4) mis-operation such as sample loading error, etc. Results: Here we performed a reliability analysis on the Ahringer C. elegans RNAi feeding library, which contains 16,256 bacterial strains, using a bioinformatics approach. Results demonstrated that most (98.3%) of the bacterial strains in the library are reliable. However, we also found that 2,851 (17.54%) bacterial strains need to be re-annotated even though they are reliable. Most of these bacterial strains are clones with retired gene names. In addition, 28 strains are grouped into the unreliable category and 226 strains are marginal because they probably express unrelated double-stranded RNAs (dsRNAs). The accuracy of the prediction was further confirmed by direct sequencing analysis of 496 bacterial strains. Finally, a freely accessible database named CelRNAi (http://biocompute.bmi.ac.cn/CelRNAi/) was developed as a valuable complementary resource for the feeding RNAi library by providing the predicted information on all bacterial strains. Moreover, submission of direct sequencing results or any other annotations for the bacterial strains to the database is allowed, and these will be integrated into the CelRNAi database to improve the accuracy of the library. In addition, we provide five candidate primer sets for each of the unreliable and marginal bacterial strains for users to construct an alternative vector for their own RNAi studies. Conclusions: Because of the potential unreliability of the Ahringer C. elegans RNAi feeding library, we strongly suggest the user examine

  7. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    Science.gov (United States)

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.

  8. Design of focused and restrained subsets from extremely large virtual libraries.

    Science.gov (United States)

    Jamois, Eric A; Lin, Chien T; Waldman, Marvin

    2003-11-01

    With the current and ever-growing offering of reagents along with the vast palette of organic reactions, virtual libraries accessible to combinatorial chemists can reach sizes of billions of compounds or more. Extracting practical size subsets for experimentation has remained an essential step in the design of combinatorial libraries. A typical approach to computational library design involves enumeration of structures and properties for the entire virtual library, which may be impractical for such large libraries. This study describes a new approach termed on-the-fly optimization (OTFO), where descriptors are computed as needed within the subset optimization cycle and without intermediate enumeration of structures. Results reported herein highlight the advantages of coupling an ultra-fast descriptor calculation engine to subset optimization capabilities. We also show that enumeration of properties for the entire virtual library may not only be impractical but also wasteful. Successful design of focused and restrained subsets can be achieved while sampling only a small fraction of the virtual library. We also investigate the stability of the method and compare results obtained from simulated annealing (SA) and genetic algorithms (GA).

  9. An improved yeast transformation method for the generation of very large human antibody libraries.

    Science.gov (United States)

    Benatuil, Lorenzo; Perez, Jennifer M; Belk, Jonathan; Hsieh, Chung-Ming

    2010-04-01

    Antibody library selection by yeast display technology is an efficient and highly sensitive method to identify binders to target antigens. This powerful selection tool, however, is often hampered by the typically modest size of yeast libraries (approximately 10^7) due to the limited yeast transformation efficiency, and the full potential of the yeast display technology for antibody discovery and engineering can only be realized if it can be coupled with a means to generate very large yeast libraries. We describe here a yeast transformation method by electroporation that allows for the efficient generation of large antibody libraries up to 10^10 in size. Multiple components and conditions including CaCl2, MgCl2, sucrose, sorbitol, lithium acetate, dithiothreitol, electroporation voltage, DNA input and cell volume have been tested to identify the best combination. By applying this developed protocol, we have constructed a 1.4 × 10^10 human spleen antibody library essentially in 1 day with a transformation efficiency of 1-1.5 × 10^8 transformants/µg vector DNA. Taken together, we have developed a highly efficient yeast transformation method that enables the generation of very large and productive human antibody libraries for antibody discovery, and we are now routinely making 10^9 libraries in a day for antibody engineering purposes.

  10. CRISPR library designer (CLD): software for multispecies design of single guide RNA libraries.

    Science.gov (United States)

    Heigwer, Florian; Zhan, Tianzuo; Breinig, Marco; Winter, Jan; Brügemann, Dirk; Leible, Svenja; Boutros, Michael

    2016-03-24

    Genetic screens using CRISPR/Cas9 are a powerful method for the functional analysis of genomes. Here we describe CRISPR library designer (CLD), an integrated bioinformatics application for the design of custom single guide RNA (sgRNA) libraries for all organisms with annotated genomes. CLD is suitable for the design of libraries using modified CRISPR enzymes and targeting non-coding regions. To demonstrate its utility, we perform a pooled screen for modulators of the TNF-related apoptosis inducing ligand (TRAIL) pathway using a custom library of 12,471 sgRNAs. CLD predicts a high fraction of functional sgRNAs and is publicly available at https://github.com/boutroslab/cld.

  11. DNA-encoded chemical libraries: advancing beyond conventional small-molecule libraries.

    Science.gov (United States)

    Franzini, Raphael M; Neri, Dario; Scheuermann, Jörg

    2014-04-15

    DNA-encoded chemical libraries (DECLs) represent a promising tool in drug discovery. DECL technology allows the synthesis and screening of chemical libraries of unprecedented size at moderate costs. In analogy to phage-display technology, where large antibody libraries are displayed on the surface of filamentous phage and are genetically encoded in the phage genome, DECLs feature the display of individual small organic chemical moieties on DNA fragments serving as amplifiable identification barcodes. The DNA-tag facilitates the synthesis and allows the simultaneous screening of very large sets of compounds (up to billions of molecules), because the hit compounds can easily be identified and quantified by PCR-amplification of the DNA-barcode followed by high-throughput DNA sequencing. Several approaches have been used to generate DECLs, differing both in the methods used for library encoding and for the combinatorial assembly of chemical moieties. For example, DECLs can be used for fragment-based drug discovery, displaying a single molecule on DNA or two chemical moieties at the extremities of complementary DNA strands. DECLs can vary substantially in the chemical structures and the library size. While ultralarge libraries containing billions of compounds have been reported containing four or more sets of building blocks, also smaller libraries have been shown to be efficient for ligand discovery. In general, it has been found that the overall library size is a poor predictor for library performance and that the number and diversity of the building blocks are rather important indicators. Smaller libraries consisting of two to three sets of building blocks better fulfill the criteria of drug-likeness and often have higher quality. In this Account, we present advances in the DECL field from proof-of-principle studies to practical applications for drug discovery, both in industry and in academia. DECL technology can yield specific binders to a variety of target

  12. Construction and characterization of two BAC libraries representing a deep-coverage of the genome of chicory (Cichorium intybus L., Asteraceae)

    Directory of Open Access Journals (Sweden)

    Gonthier Lucy

    2010-08-01

    Background: The Asteraceae represents an important plant family with respect to the numbers of species present in the wild and used by man. Nonetheless, genomic resources for Asteraceae species are relatively underdeveloped, hampering within species genetic studies as well as comparative genomics studies at the family level. So far, six BAC libraries have been described for the main crops of the family, i.e. lettuce and sunflower. Here we present the characterization of BAC libraries of chicory (Cichorium intybus L.) constructed from two genotypes differing in traits related to sexual and vegetative reproduction. Resolving the molecular mechanisms underlying traits controlling the reproductive system of chicory is a key determinant for hybrid development, and more generally will provide new insights into these traits, which are poorly investigated so far at the molecular level in Asteraceae. Findings: Two bacterial artificial chromosome (BAC) libraries, CinS2S2 and CinS1S4, were constructed from HindIII-digested high molecular weight DNA of the contrasting genotypes C15 and C30.01, respectively. C15 was hermaphrodite, non-embryogenic, and S2S2 for the S-locus implicated in self-incompatibility, whereas C30.01 was male sterile, embryogenic, and S1S4. The CinS2S2 and CinS1S4 libraries contain 89,088 and 81,408 clones. Mean insert sizes of the CinS2S2 and CinS1S4 clones are 90 and 120 kb, respectively, and provide together a coverage of 12.3 haploid genome equivalents. Contamination with mitochondrial and chloroplast DNA sequences was evaluated with four mitochondrial and four chloroplast specific probes, and was estimated to be 0.024% and 1.00% for the CinS2S2 library, and 0.028% and 2.35% for the CinS1S4 library. Using two single copy genes putatively implicated in somatic embryogenesis, screening of both libraries resulted in detection of 12 and 13 positive clones for each gene, in accordance with expected numbers. Conclusions: This

  13. Construction and characterization of two BAC libraries representing a deep-coverage of the genome of chicory (Cichorium intybus L., Asteraceae).

    Science.gov (United States)

    Gonthier, Lucy; Bellec, Arnaud; Blassiau, Christelle; Prat, Elisa; Helmstetter, Nicolas; Rambaud, Caroline; Huss, Brigitte; Hendriks, Theo; Bergès, Hélène; Quillet, Marie-Christine

    2010-08-11

    The Asteraceae represents an important plant family with respect to the numbers of species present in the wild and used by man. Nonetheless, genomic resources for Asteraceae species are relatively underdeveloped, hampering within species genetic studies as well as comparative genomics studies at the family level. So far, six BAC libraries have been described for the main crops of the family, i.e. lettuce and sunflower. Here we present the characterization of BAC libraries of chicory (Cichorium intybus L.) constructed from two genotypes differing in traits related to sexual and vegetative reproduction. Resolving the molecular mechanisms underlying traits controlling the reproductive system of chicory is a key determinant for hybrid development, and more generally will provide new insights into these traits, which are poorly investigated so far at the molecular level in Asteraceae. Two bacterial artificial chromosome (BAC) libraries, CinS2S2 and CinS1S4, were constructed from HindIII-digested high molecular weight DNA of the contrasting genotypes C15 and C30.01, respectively. C15 was hermaphrodite, non-embryogenic, and S2S2 for the S-locus implicated in self-incompatibility, whereas C30.01 was male sterile, embryogenic, and S1S4. The CinS2S2 and CinS1S4 libraries contain 89,088 and 81,408 clones. Mean insert sizes of the CinS2S2 and CinS1S4 clones are 90 and 120 kb, respectively, and provide together a coverage of 12.3 haploid genome equivalents. Contamination with mitochondrial and chloroplast DNA sequences was evaluated with four mitochondrial and four chloroplast specific probes, and was estimated to be 0.024% and 1.00% for the CinS2S2 library, and 0.028% and 2.35% for the CinS1S4 library. Using two single copy genes putatively implicated in somatic embryogenesis, screening of both libraries resulted in detection of 12 and 13 positive clones for each gene, in accordance with expected numbers. This indicated that both BAC libraries are valuable tools for molecular

  14. Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

    Science.gov (United States)

    Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

    2014-12-01

    Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.

  15. Largenet2: an object-oriented programming library for simulating large adaptive networks.

    Science.gov (United States)

    Zschaler, Gerd; Gross, Thilo

    2013-01-15

    The largenet2 C++ library provides an infrastructure for the simulation of large dynamic and adaptive networks with discrete node and link states. The library is released as free software. It is available at http://biond.github.com/largenet2. Largenet2 is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License. gerd@biond.org

  16. Democratizing Human Genome Project Information: A Model Program for Education, Information and Debate in Public Libraries.

    Science.gov (United States)

    Pollack, Miriam

    The "Mapping the Human Genome" project demonstrated that librarians can help whomever they serve in accessing information resources in the areas of biological and health information, whether it is the scientists who are developing the information or a member of the public who is using the information. Public libraries can guide library…

  17. Developing new microsatellite markers in walnut (Juglans regia L.) from Juglans nigra genomic GA enriched library

    Science.gov (United States)

    Hayat Topcu; Nergiz Coban; Keith Woeste; Mehmet Sutyemez; Salih Kafkas

    2015-01-01

    We attempted to develop new polymorphic SSR primer pairs in walnut using sequences derived from Juglans nigra L. genomic enriched library with GA repeat. The designed 94 SSR primer pairs were subjected to gradient PCR in 12 walnut cultivars to determine their optimum annealing temperatures and to determine whether they produce bands. Then, the...

  18. Long-Term Protective Immune Response Elicited by Vaccination with an Expression Genomic Library of Toxoplasma gondii

    OpenAIRE

    Fachado, Alberto; Rodriguez, Alexandro; Molina, Judith; Silvério, Jaline C.; Marino, Ana P. M. P.; Pinto, Luzia M. O.; Angel, Sergio O.; Infante, Juan F.; Traub-Cseko, Yara; Amendoeira, Regina R.; Lannes-Vieira, Joseli

    2003-01-01

    Immunization of BALB/c mice with an expression genomic library of Toxoplasma gondii induces a Th1-type immune response, with recognition of several T. gondii proteins (21 to 117 kDa) and long-term protective immunity against a lethal challenge. These results support further investigations to achieve a multicomponent anti-T. gondii DNA vaccine.

  19. Construction of a 7-fold BAC library and cytogenetic mapping of 10 genes in the giant panda (Ailuropoda melanoleuca)

    Directory of Open Access Journals (Sweden)

    Zhang Ying

    2006-11-01

    Background: The giant panda, one of the most primitive carnivores, is an endangered animal. Although it has been the subject of many interesting studies during recent years, little is known about its genome. In order to promote research on this genome, a bacterial artificial chromosome (BAC) library of the giant panda was constructed in this study. Results: This BAC library contains 198,844 clones with an average insert size of 108 kb, which represents approximately seven equivalents of the giant panda haploid genome. Screening the library with 15 genes and 8 microsatellite markers demonstrates that it is representative and has good genome coverage. Furthermore, ten BAC clones harbouring AGXT, GHR, FSHR, IRBP, SOX14, TTR, BDNF, NT-4, LH and ZFX1 were mapped to 8 pairs of giant panda chromosomes by fluorescence in situ hybridization (FISH). Conclusion: This is the first large-insert genomic DNA library for the giant panda, and will contribute to understanding this endangered species in the areas of genome sequencing, physical mapping, gene cloning and comparative genomic studies. We also identified the physical locations of ten genes on their relative chromosomes by FISH, providing a preliminary framework for further development of a high resolution cytogenetic map of the giant panda.
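
    The relationship between clone count, insert size and genome coverage quoted in this record can be worked through directly. The sketch below uses only the numbers from the abstract; the implied genome size is a back-calculation, not a figure reported by the authors, and the probability estimate relies on the commonly used Clarke-Carbon approximation, which is an assumption added here for illustration.

```python
import math


def total_insert_kb(n_clones, mean_insert_kb):
    """Total cloned DNA in the library, in kilobases."""
    return n_clones * mean_insert_kb


def implied_genome_size_gb(n_clones, mean_insert_kb, genome_equivalents):
    """Haploid genome size (Gb) implied by the stated coverage."""
    return total_insert_kb(n_clones, mean_insert_kb) / genome_equivalents / 1e6


def prob_locus_present(genome_equivalents):
    """Approximate chance that a given single-copy locus is in the library.

    Assumption: randomly distributed clones, so P is about 1 - exp(-coverage)
    (the large-library limit of the Clarke-Carbon formula).
    """
    return 1.0 - math.exp(-genome_equivalents)


# Numbers from the abstract: 198,844 clones, 108 kb mean insert, ~7x coverage.
panda_total_kb = total_insert_kb(198_844, 108)
panda_genome_gb = implied_genome_size_gb(198_844, 108, 7)
panda_p_locus = prob_locus_present(7)

print(f"total insert DNA: {panda_total_kb / 1e6:.2f} Gb")
print(f"implied haploid genome size: {panda_genome_gb:.2f} Gb")
print(f"chance a single-copy locus is represented: {panda_p_locus:.4f}")
```

    With about seven genome equivalents, a single-copy probe would be expected to hit roughly seven clones on average, which is the same reasoning behind the "expected numbers" of positive clones mentioned in BAC library screening reports.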

  20. A generic library for large scale solution of PDEs on modern heterogeneous architectures

    DEFF Research Database (Denmark)

    Glimberg, Stefan Lemvig; Engsig-Karup, Allan Peter

    2012-01-01

    Adapting to new programming models for modern multi- and many-core architectures requires code-rewriting and changing algorithms and data structures, in order to achieve good efficiency and scalability. We present a generic library for solving large scale partial differential equations (PDEs......), capable of utilizing heterogeneous CPU/GPU environments. The library can be used for fast proto-typing of PDE solvers, based on finite difference approximations of spatial derivatives in one, two, or three dimensions. In order to efficiently solve large scale problems, we keep memory consumption...... and memory access low, using a low-storage implementation of flexible-order finite difference operators. We will illustrate the use of library components by assembling such matrix-free operators to be used with one of the supported iterative solvers, such as GMRES, CG, Multigrid or Defect Correction...
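
    The idea of a matrix-free operator combined with an iterative solver can be illustrated in a few lines. The sketch below is not the API of the library described above; it is a plain-Python illustration, with hypothetical function names, of applying a second-order finite difference stencil without storing a matrix and passing that operator to a conjugate gradient (CG) solver for the 1-D problem -u'' = 1 with zero boundary values.

```python
def apply_neg_laplacian(u, h):
    """Apply (-u[i-1] + 2*u[i] - u[i+1]) / h**2 with zero Dirichlet boundaries.

    No matrix is assembled: the stencil is evaluated directly on the vector.
    """
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (2.0 * u[i] - left - right) / (h * h)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator `apply_a`."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Solve -u'' = 1 on (0, 1) with u(0) = u(1) = 0; exact solution x * (1 - x) / 2.
n = 31
h = 1.0 / (n + 1)
u = conjugate_gradient(lambda v: apply_neg_laplacian(v, h), [1.0] * n)
max_err = max(abs(u[i] - (i + 1) * h * (1 - (i + 1) * h) / 2) for i in range(n))
print(f"max error vs exact solution: {max_err:.2e}")
```

    Because the solver only ever calls `apply_a`, memory use stays proportional to the number of grid points, which is the motivation for the low-storage design the abstract describes.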

  1. Rapid Genome-wide Single Nucleotide Polymorphism Discovery in Soybean and Rice via Deep Resequencing of Reduced Representation Libraries with the Illumina Genome Analyzer

    Directory of Open Access Journals (Sweden)

    Stéphane Deschamps

    2010-07-01

    Massively parallel sequencing platforms have allowed for the rapid discovery of single nucleotide polymorphisms (SNPs) among related genotypes within a species. We describe the creation of reduced representation libraries (RRLs) using an initial digestion of nuclear genomic DNA with a methylation-sensitive restriction endonuclease followed by a secondary digestion with a 4-bp restriction endonuclease. This strategy allows for the enrichment of hypomethylated genomic DNA, which has been shown to be rich in genic sequences, and the secondary digestion serves to increase the number of common loci resequenced between individuals. Deep resequencing of these RRLs performed with the Illumina Genome Analyzer led to the identification of 2618 SNPs in rice and 1682 SNPs in soybean for two representative genotypes in each of the species. A subset of these SNPs was validated via Sanger sequencing, exhibiting validation rates of 96.4 and 97.0% in rice and soybean, respectively. Comparative analysis of the read distribution relative to annotated genes in the reference genome assemblies indicated that the RRL strategy was primarily sampling within genic regions for both species. The massively parallel sequencing of methylation-sensitive RRLs for genome-wide SNP discovery can be applied across a wide range of plant species having sufficient reference genomic sequence.

  2. Efficient assembly of de novo human artificial chromosomes from large genomic loci

    Directory of Open Access Journals (Sweden)

    Stromberg Gregory

    2005-07-01

    Background: Human Artificial Chromosomes (HACs) are potentially useful vectors for gene transfer studies and for functional annotation of the genome because of their suitability for cloning, manipulating and transferring large segments of the genome. However, development of HACs for the transfer of large genomic loci into mammalian cells has been limited by difficulties in manipulating high-molecular weight DNA, as well as by the low overall frequencies of de novo HAC formation. Indeed, to date, only a small number of large (>100 kb) genomic loci have been reported to be successfully packaged into de novo HACs. Results: We have developed novel methodologies to enable efficient assembly of HAC vectors containing any genomic locus of interest. We report here the creation of a novel, bimolecular system based on bacterial artificial chromosomes (BACs) for the construction of HACs incorporating any defined genomic region. We have utilized this vector system to rapidly design, construct and validate multiple de novo HACs containing large (100–200 kb) genomic loci including therapeutically significant genes for human growth hormone (HGH), polycystic kidney disease (PKD1) and β-globin. We report significant differences in the ability of different genomic loci to support de novo HAC formation, suggesting possible effects of cis-acting genomic elements. Finally, as a proof of principle, we have observed sustained β-globin gene expression from HACs incorporating the entire 200 kb β-globin genomic locus for over 90 days in the absence of selection. Conclusion: Taken together, these results are significant for the development of HAC vector technology, as they enable high-throughput assembly and functional validation of HACs containing any large genomic locus. We have evaluated the impact of different genomic loci on the frequency of HAC formation and identified segments of genomic DNA that appear to facilitate de novo HAC formation. These genomic loci

  3. Analysis of an RNA-seq Strand-Specific Library from an East Timorese Cucumber Sample Reveals a Complete Cucurbit aphid-borne yellows virus Genome.

    Science.gov (United States)

    Maina, Solomon; Edwards, Owain R; de Almeida, Luis; Ximenes, Abel; Jones, Roger A C

    2017-05-11

    Analysis of an RNA-seq library from cucumber leaf RNA extracted from a fast technology for analysis of nucleic acids (FTA) card revealed the first complete genome of Cucurbit aphid-borne yellows virus (CABYV) from East Timor. We compare it with 35 complete CABYV genomes from other world regions. It most resembled the genome of the South Korean isolate HD118. Copyright © 2017 Maina et al.

  4. Genomic Selection Using Extreme Phenotypes and Pre-Selection of SNPs in Large Yellow Croaker (Larimichthys crocea).

    Science.gov (United States)

    Dong, Linsong; Xiao, Shijun; Chen, Junwei; Wan, Liang; Wang, Zhiyong

    2016-10-01

    Genomic selection (GS) is an effective method to improve predictive accuracies of genetic values. However, high cost in genotyping will limit the application of this technology in some species. Therefore, it is necessary to find some methods to reduce the genotyping costs in genomic selection. Large yellow croaker is one of the most commercially important marine fish species in southeast China and Eastern Asia. In this study, genotyping-by-sequencing was used to construct the libraries for the NGS sequencing and find 29,748 SNPs in the genome. Two traits, eviscerated weight (EW) and the ratio between eviscerated weight and whole body weight (REW), were chosen to study. Two strategies to reduce the costs were proposed as follows: selecting extreme phenotypes (EP) for genotyping in reference population or pre-selecting SNPs to construct low-density marker panels in candidates. Three methods of pre-selection of SNPs, i.e., pre-selecting SNPs by absolute effects (SE), by single marker analysis (SMA), and by fixed intervals of sequence number (EL), were studied. The results showed that using EP was a feasible method to save the genotyping costs in reference population. Heritability did not seem to have obvious influences on the predictive abilities estimated by EP. Using SMA was the most feasible method to save the genotyping costs in candidates. In addition, the combination of EP and SMA in genomic selection also showed good results, especially for trait of REW. We also described how to apply the new methods in genomic selection and compared the genotyping costs before and after using the new methods. Our study may not only offer a reference for aquatic genomic breeding but also offer a reference for genomic prediction in other species including livestock and plants, etc.
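
    Two of the cost-saving ideas named in this abstract, genotyping only extreme phenotypes (EP) and thinning markers at fixed intervals of sequence number (EL), amount to simple selection steps. The sketch below shows one plausible reading of each; the function names, the 20% tail fraction, the step size and the toy phenotype values are all hypothetical and are not taken from the study.

```python
def select_extreme_phenotypes(phenotypes, fraction):
    """Return IDs from the lowest and highest `fraction` of the distribution.

    `phenotypes` maps individual ID -> trait value; `fraction` is the size of
    each tail (an assumed parameter, not a value from the paper).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(phenotypes, key=phenotypes.get)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k] + ranked[-k:]


def select_snps_fixed_interval(snp_ids, step):
    """Keep every `step`-th SNP in sequence order to build a low-density panel."""
    return snp_ids[::step]


# Hypothetical toy data: ten fish with made-up trait values.
toy_weights = {"a": 5.0, "b": 1.0, "c": 9.0, "d": 3.0, "e": 7.0,
               "f": 2.0, "g": 8.0, "h": 4.0, "i": 6.0, "j": 10.0}
to_genotype = select_extreme_phenotypes(toy_weights, 0.2)
low_density_panel = select_snps_fixed_interval(list(range(20)), 5)

print(to_genotype)         # two lightest, then two heaviest individuals
print(low_density_panel)   # every fifth marker index
```

    In this reading, only the individuals returned by the first function would be genotyped for the reference population, while the second function defines the reduced marker set used for candidates.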

  5. Ligation bias in Illumina next-generation DNA libraries: implications for sequencing ancient genomes.

    Directory of Open Access Journals (Sweden)

    Andaine Seguin-Orlando

    Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.

  6. Utilizing Web 2.0 Technologies for Library Web Tutorials: An Examination of Instruction on Community College Libraries' Websites Serving Large Student Bodies

    Science.gov (United States)

    Blummer, Barbara; Kenton, Jeffrey M.

    2015-01-01

    This is the second part of a series on Web 2.0 tools available from community college libraries' Websites. The first article appeared in an earlier volume of this journal and it illustrated the wide variety of Web 2.0 tools on community college libraries' Websites serving large student bodies (Blummer and Kenton 2014). The research found many of…

  7. GBParsy: A GenBank flatfile parser library with high speed

    Directory of Open Access Journals (Sweden)

    Kim Yeon-Ki

    2008-07-01

    Full Text Available Abstract Background GenBank flatfile (GBF) format is one of the most popular sequence file formats because of its detailed sequence features and ease of readability. To use the data in the file by a computer, a parsing process is required and is performed according to a given grammar for the sequence and the description in a GBF. Currently, several parser libraries for the GBF have been developed. However, with the accumulation of DNA sequence information from eukaryotic chromosomes, parsing a eukaryotic genome sequence with these libraries inevitably takes a long time, due to the large GBF file and its correspondingly large genomic nucleotide sequence and related feature information. Thus, there is a significant need to develop a parsing program with high speed and efficient use of system memory. Results We developed a library, GBParsy, which is C language-based and parses GBF files. The parsing speed was maximized by using content-specified functions in place of regular expressions that are flexible but slow. In addition, we optimized an algorithm related to memory usage so that it also increased parsing performance and efficiency of memory usage. GBParsy is at least 5-100× faster than current parsers in benchmark tests. Conclusion GBParsy is estimated to extract annotated information from almost 100 Mb of a GenBank flatfile for chromosomal sequence information within a second. Thus, it should be used for a variety of applications such as on-time visualization of a genome at a web site.
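
The contrast between regular expressions and content-specific functions described in this abstract can be illustrated with a minimal sketch. It is written in Python for brevity, whereas GBParsy itself is a C library, and it does not reproduce GBParsy's code or API. The LOCUS line and both function names are hypothetical, and the sketch makes no claim about relative speed.

```python
import re

# Hypothetical LOCUS line following the GenBank flatfile layout (illustrative only).
LOCUS_LINE = "LOCUS       AB000001                1234 bp    DNA     linear   PLN 01-JAN-2000"

# Flexible approach: one regular expression describes the whole line.
LOCUS_RE = re.compile(r"^LOCUS\s+(\S+)\s+(\d+)\s+bp\s+(\S+)")


def parse_locus_regex(line):
    """Parse name, length and molecule type with a regular expression."""
    m = LOCUS_RE.match(line)
    if m is None:
        raise ValueError("not a LOCUS line")
    return m.group(1), int(m.group(2)), m.group(3)


def parse_locus_split(line):
    """Content-specific approach: rely on the known keyword and token order."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    fields = line.split()
    # fields: ['LOCUS', name, length, 'bp', molecule type, ...]
    return fields[1], int(fields[2]), fields[4]


print(parse_locus_regex(LOCUS_LINE))
print(parse_locus_split(LOCUS_LINE))
```

Both functions return the same tuple for a well-formed line. The second one trades flexibility for a fixed assumption about token order, which is the kind of trade-off the abstract describes.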

  8. Observing copepods through a genomic lens

    Directory of Open Access Journals (Sweden)

    Johnson Stewart C

    2011-09-01

    Full Text Available Abstract Background Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to

  9. Observing copepods through a genomic lens

    Science.gov (United States)

    2011-01-01

    Background Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited. Discussion The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to provide genomics tools for

  10. Physical Analysis of the Complex Rye (Secale cereale L.) Alt4 Aluminium (Aluminum) Tolerance Locus Using a Whole-Genome BAC Library of Rye cv. Blanco

    Science.gov (United States)

    Rye is a diploid crop species with many outstanding qualities, and is also important as a source of new traits for wheat and triticale improvement. Here we describe a BAC library of rye cv. Blanco, representing a valuable resource for rye molecular genetic studies. The library provides a 6 × genome ...

  11. Technique for simultaneous adjustment of large nuclear data libraries

    International Nuclear Information System (INIS)

    Harris, D.R.; Wilson, W.B.

    1975-01-01

    Adjustment of the nuclear data base to agree with integral observations in design work has been limited in part by problems in the required inversion of matrices. It is shown that this inversion problem can be circumvented and arbitrarily large nuclear data libraries can be adjusted simultaneously when the basic data are uncorrelated. The technique is illustrated by adjusting nuclear data to integral observations made on fast reactor benchmark critical assemblies. 3 tables

  12. Chromosome microdissection and cloning in human genome and genetic disease analysis

    International Nuclear Information System (INIS)

    Kao, Faten; Yu, Jingwei

    1991-01-01

    A procedure has been described for microdissection and microcloning of human chromosomal DNA sequences in which universal amplification of the dissected fragments by Mbo I linker adaptor and polymerase chain reaction is used. A very large library comprising 700,000 recombinant plasmid microclones from 30 dissected chromosomes of human chromosome 21 was constructed. Colony hybridization showed that 42% of the clones contained repetitive sequences and 58% contained single or low-copy sequences. The insert sizes generated by complete Mbo I cleavage ranged from 50 to 1,100 base pairs with a mean of 416 base pairs. Southern blot analysis of microclones from the library confirmed their human origin and chromosome 21 specificity. Some of these clones have also been regionally mapped to specific sites of chromosome 21 by using a regional mapping panel of cell hybrids. This chromosome microtechnology can generate large numbers of microclones with unique sequences from defined chromosomal regions and can be used for processes such as (i) isolating corresponding yeast artificial chromosome clones with large inserts, (ii) screening various cDNA libraries for isolating expressed sequences, and (iii) constructing region-specific libraries of the entire human genome. The studies described here demonstrate the power of this technology for high-resolution genome analysis and explicate their use in an efficient search for disease-associated genes localized to specific chromosomal regions

  13. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    Full Text Available In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compound). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compound). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compound). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
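
As a rough illustration of what searching sequences for perfect simple repeats involves, the following Python sketch uses a back-referencing regular expression. It is not RepeatMasker and does not reproduce the study's settings. The function name, the unit-length range and the minimum copy number are all assumptions chosen for the example, and imperfect or compound repeats are not handled.

```python
import re


def find_perfect_repeats(seq, min_unit=2, max_unit=6, min_copies=5):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Thresholds are illustrative assumptions, not values from the study.
    """
    # A short unit (tried shortest first) followed by at least
    # min_copies - 1 further exact copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Hypothetical test sequence containing a (CA)8 and an (AT)6 repeat.
example = "GGG" + "CA" * 8 + "GCG" + "AT" * 6 + "GG"
for start, unit, copies in find_perfect_repeats(example):
    print(start, unit, copies)
```

Start positions are 0-based. The same repeat can be reported in a different phase (for example TA instead of AT) depending on the flanking bases.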

  14. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    Science.gov (United States)

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.

  15. Map it @ WSU: Development of a Library Mapping System for Large Academic Libraries

    Directory of Open Access Journals (Sweden)

    Paul Gallagher

    2010-06-01

    Full Text Available The Wayne State Library System launched its library mapping application in February 2010, designed to help locate materials in the five WSU libraries. The system works within the catalog to show the location of materials, as well as provides a web form for use at the reference desk. Developed using PHP and MySQL, it requires only minimal effort to update using a unique call number overlay mechanism. In addition to mapping shelved materials, the system provides information for any of the over three hundred collections held by the WSU Libraries. Patrons can do more than just locate a book on a shelf: they can learn where to locate reserve items, how to access closed collections, or get driving maps to extension center libraries. The article includes a discussion of the technology reviewed and chosen during development, an overview of the system architecture, and lessons learned during development.

  16. Estimating P-coverage of biosynthetic pathways in DNA libraries and screening by genetic selection: biotin biosynthesis in the marine microorganism Chromohalobacter.

    Science.gov (United States)

    Kim, Eun Jin; Angell, Scott; Janes, Jeff; Watanabe, Coran M H

    2008-06-01

    Traditional approaches to natural product discovery involve cell-based screening of natural product extracts followed by compound isolation and characterization. Their importance notwithstanding, continued mining leads to depletion of natural resources and the reisolation of previously identified metabolites. Metagenomic strategies aimed at localizing the biosynthetic cluster genes and expressing them in surrogate hosts offer one possible alternative. A fundamental question that naturally arises when pursuing such a strategy is, how large must the genomic library be to effectively represent the genome of an organism(s) and the biosynthetic gene clusters they harbor? Such an issue is certainly augmented in the absence of expensive robotics to expedite colony picking and/or screening of clones. We have developed an algorithm, named BPC (biosynthetic pathway coverage), supported by molecular simulations to deduce the number of BAC clones required to achieve proper coverage of the genome and their respective biosynthetic pathways. The strategy has been applied to the construction of a large-insert BAC library from a marine microorganism, Hon6 (isolated from Honokohau, Maui) thought to represent a new species. The genomic library is constructed with a BAC yeast shuttle vector pClasper lacZ paving the way for the culturing of libraries in both prokaryotic and eukaryotic hosts. Flow cytometric methods are utilized to estimate the genome size of the organism and BPC implemented to assess P-coverage or percent coverage. A genetic selection strategy is illustrated, applications of which could expedite screening efforts in the identification and localization of biosynthetic pathways from marine microbial consortia, offering a powerful complement to genome sequencing and degenerate probe strategies. Implementing this approach, we report on the biotin biosynthetic pathway from the marine microorganism Hon6.
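
The question of how many clones a library needs has a classical first-order answer, the Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is represented. The sketch below implements that textbook formula only. It is not the BPC algorithm from this abstract, which is not described in enough detail here to reproduce, and the genome and insert sizes are hypothetical.

```python
import math


def clones_needed(genome_size_bp, insert_size_bp, probability):
    """Clarke-Carbon estimate of the number of clones required so that a
    given locus is present in the library with the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    f = insert_size_bp / genome_size_bp  # fraction of genome per clone
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


# Hypothetical values: 4 Mb genome, 100 kb inserts, 99% probability.
print(clones_needed(4_000_000, 100_000, 0.99))
```

The estimate treats a locus as a single point. Capturing an entire multi-gene pathway on one clone is a stricter requirement, which is presumably why a pathway-aware method such as BPC is needed.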

  17. Genomic characterization of large heterochromatic gaps in the human genome assembly.

    Directory of Open Access Journals (Sweden)

    Nicolas Altemose

    2014-05-01

    Full Text Available The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.

  18. A microfluidic DNA library preparation platform for next-generation sequencing.

    Science.gov (United States)

    Kim, Hanyoup; Jebrail, Mais J; Sinha, Anupama; Bent, Zachary W; Solberg, Owen D; Williams, Kelly P; Langevin, Stanley A; Renzi, Ronald F; Van De Vreugde, James L; Meagher, Robert J; Schoeniger, Joseph S; Lane, Todd W; Branda, Steven S; Bartsch, Michael S; Patel, Kamlesh D

    2013-01-01

    Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  19. A microfluidic DNA library preparation platform for next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Hanyoup Kim

    Full Text Available Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

  20. Optimization and quality control of genome-wide Hi-C library preparation.

    Science.gov (United States)

    Zhang, Xiang-Yuan; He, Chao; Ye, Bing-Yu; Xie, De-Jian; Shi, Ming-Lei; Zhang, Yan; Shen, Wen-Long; Li, Ping; Zhao, Zhi-Hu

    2017-09-20

    High-throughput chromosome conformation capture (Hi-C) is one of the key assays for genome-wide chromatin interaction studies. It is a time-consuming process that involves many steps and many different kinds of reagents, consumables, and equipment. At present, the reproducibility is unsatisfactory. By optimizing the key steps of the Hi-C experiment, such as crosslinking, pretreatment of digestion, inactivation of restriction enzyme, and in situ ligation, we established a robust Hi-C procedure and prepared two biological replicates of Hi-C libraries from the GM12878 cells. After preliminary quality control by Sanger sequencing, the two replicates were high-throughput sequenced. The bioinformatics analysis of the raw sequencing data revealed the mapping-ability and pair-mate rate of the raw data were around 90% and 72%, respectively. Additionally, after removal of self-circular ligations and dangling-end products, more than 96% of the valid pairs were reached. Genome-wide interactome profiling shows clear topologically associating domains (TADs), which is consistent with previous reports. Further correlation analysis showed that the two biological replicates strongly correlate with each other in terms of both bin coverage and all bin pairs. All these results indicated that the optimized Hi-C procedure is robust and stable, which will be very helpful for the wide applications of the Hi-C assay.

  1. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high

  2. Cpf1-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cpf1.

    Science.gov (United States)

    Park, Jeongbin; Bae, Sangsu

    2018-03-15

    Following the type II CRISPR-Cas9 system, type V CRISPR-Cpf1 endonucleases have been found to be applicable for genome editing in various organisms in vivo. However, there are as yet no web-based tools capable of optimally selecting guide RNAs (gRNAs) among all possible genome-wide target sites. Here, we present Cpf1-Database, a genome-wide gRNA library design tool for LbCpf1 and AsCpf1, which have DNA recognition sequences of 5'-TTTN-3' at the 5' ends of target sites. Cpf1-Database provides a sophisticated but simple way to design gRNAs for AsCpf1 nucleases on the genome scale. One can easily access the data using a straightforward web interface, and using the powerful collections feature one can easily design gRNAs for thousands of genes in a short time. Free access at http://www.rgenome.net/cpf1-database/. Contact: sangsubae@hanyang.ac.kr.
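
The target-site rule stated in this abstract (a 5'-TTTN-3' recognition sequence at the 5' end of the target) can be sketched as a simple scan over both strands. This is only an illustration of the rule, not the Cpf1-Database implementation, and it does no scoring or off-target filtering. The function names, the example sequence and the 23-nt protospacer length are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cpf1_targets(seq, spacer_len=23):
    """Return (strand, pam_start, pam, protospacer) for each 5'-TTTN-3' site.

    spacer_len is an assumed protospacer length. For the '-' strand,
    pam_start is a 0-based coordinate on the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - 4 - spacer_len + 1):
            if s.startswith("TTT", i) and s[i + 3] in "ACGT":
                hits.append((strand, i, s[i:i + 4], s[i + 4:i + 4 + spacer_len]))
    return hits


# Hypothetical sequence with a single TTTA site on the forward strand.
example_target = "GG" + "TTTA" + "ACGT" * 5 + "ACG" + "CC"
for hit in find_cpf1_targets(example_target):
    print(hit)
```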

  3. Long span DNA paired-end-tag (DNA-PET) sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons.

    Directory of Open Access Journals (Sweden)

    Fei Yao

    Full Text Available Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
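
The claim that larger inserts give higher physical genome coverage for the same sequencing cost follows from simple arithmetic: the number of sequenced tag pairs fixes the cost, while physical coverage scales with the span between the two tags. The sketch below works through this with hypothetical numbers. The function name, pair count and genome size are assumptions for illustration, not figures from the study.

```python
def physical_coverage(n_pairs, insert_size_bp, genome_size_bp):
    """Average number of inserts spanning a given genomic position."""
    return n_pairs * insert_size_bp / genome_size_bp


# Hypothetical experiment: the same number of sequenced pairs for both
# libraries, i.e. the same sequencing cost, on a 3 Gb genome.
N_PAIRS = 1_000_000
GENOME_BP = 3_000_000_000

short_cov = physical_coverage(N_PAIRS, 1_000, GENOME_BP)   # 1 kb inserts
large_cov = physical_coverage(N_PAIRS, 10_000, GENOME_BP)  # 10 kb inserts

print(f"1 kb inserts:  {short_cov:.3f}x physical coverage")
print(f"10 kb inserts: {large_cov:.3f}x physical coverage")
```

With the read count held constant, physical coverage grows in direct proportion to insert size, so the 10 kb library spans ten times as much of the genome per sequenced pair.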

  4. Kernel methods for large-scale genomic data analysis

    Science.gov (United States)

    Xing, Eric P.; Schaid, Daniel J.

    2015-01-01

    Machine learning, particularly kernel methods, has been demonstrated as a promising new tool to tackle the challenges imposed by today’s explosive data growth in genomics. Kernel methods provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, to help reveal the complexity in the relationship between the genetic markers and the outcome of interest. In this review, we highlight the potential key role these methods will have in modern genomic data processing, especially with regard to integration with classical methods for gene prioritizing, prediction and data fusion. PMID:25053743

  5. Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS

    Directory of Open Access Journals (Sweden)

    You Frank M

    2010-06-01

    Full Text Available Abstract Background Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of the D-genome of hexaploid wheat (Triticum aestivum), Aegilops tauschii, is used as a resource for wheat genomics. The barley diploid genome also provides a good model for the Triticeae and T. aestivum since it is only slightly larger than the ancestor wheat D genome. Gene co-linearity between the grasses can be exploited by extrapolating from rice and Brachypodium distachyon to Ae. tauschii or barley, and then to wheat. Results We report the use of Ae. tauschii for the construction of the physical map of a large distal region of chromosome arm 3DS. A physical map of 25.4 Mb was constructed by anchoring BAC clones of Ae. tauschii with 85 ESTs on the Ae. tauschii and barley genetic maps. The 24 contigs were aligned to the rice and B. distachyon genomic sequences and a high density SNP genetic map of barley. As expected, the mapped region is highly collinear to the orthologous chromosome 1 in rice, chromosome 2 in B. distachyon and chromosome 3H in barley. However, the chromosome scale of the comparative maps presented provides new insights into grass genome organization. The disruptions of the Ae. tauschii-rice and Ae. tauschii-Brachypodium syntenies were identical. We observed chromosomal rearrangements between Ae. tauschii and barley. The comparison of Ae. tauschii physical and genetic maps showed that the recombination rate across the region dropped from 2.19 cM/Mb in the distal region to 0.09 cM/Mb in the proximal region. The size of the gaps between contigs was evaluated by comparing the recombination rate along the map with the local recombination rates calculated on single contigs. Conclusions The physical map reported here is the first physical map using fingerprinting of a complete

  6. Large inserts for big data: artificial chromosomes in the genomic era.

    Science.gov (United States)

    Tocchetti, Arianna; Donadio, Stefano; Sosio, Margherita

    2018-05-01

    The exponential increase in available microbial genome sequences coupled with predictive bioinformatic tools is underscoring the genetic capacity of bacteria to produce an unexpectedly large number of specialized bioactive compounds. Since most of the biosynthetic gene clusters (BGCs) present in microbial genomes are cryptic, i.e. not expressed under laboratory conditions, a variety of cloning systems and vectors have been devised to harbor DNA fragments large enough to carry entire BGCs and to allow their transfer in suitable heterologous hosts. This minireview provides an overview of the vectors and approaches that have been developed for cloning large BGCs, and successful examples of heterologous expression.

  7. Begin at the beginning: A BAC-end view of the passion fruit (Passiflora) genome.

    Science.gov (United States)

    Santos, Anselmo Azevedo; Penha, Helen Alves; Bellec, Arnaud; Munhoz, Carla de Freitas; Pedrosa-Harand, Andrea; Bergès, Hélène; Vieira, Maria Lucia Carneiro

    2014-09-26

    The passion fruit (Passiflora edulis) is a tropical crop of economic importance both for juice production and consumption as fresh fruit. The juice is also used in concentrate blends that are consumed worldwide. However, very little is known about the genome of the species. Therefore, improving our understanding of passion fruit genomics is essential and to some degree a pre-requisite if its genetic resources are to be used more efficiently. In this study, we have constructed a large-insert BAC library and provided the first view on the structure and content of the passion fruit genome, using BAC-end sequence (BES) data as a major resource. The library consisted of 82,944 clones and its levels of organellar DNA were very low. The library represents six haploid genome equivalents, and the average insert size was 108 kb. To check its utility for gene isolation, successful macroarray screening experiments were carried out with probes complementary to eight Passiflora gene sequences available in public databases. BACs harbouring those genes were used in fluorescent in situ hybridizations and unique signals were detected for four BACs in three chromosomes (n=9). Then, we explored 10,000 BES, identified reads likely to contain repetitive mobile elements (19.6% of all BES), simple sequence repeats and putative proteins, and estimated the GC content (~42%) of the reads. Around 9.6% of all BES were found to have high levels of similarity to plant genes and ontological terms were assigned to more than half of the sequences analysed (940). The vast majority of the top-hits made by our sequences were to Populus trichocarpa (24.8% of the total occurrences), Theobroma cacao (21.6%), Ricinus communis (14.3%), Vitis vinifera (6.5%) and Prunus persica (3.8%). We generated the first large-insert library for a member of Passifloraceae. This BAC library provides a new resource for genetic and genomic studies, and it represents a valuable tool for future whole genome
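
    The library figures quoted in this abstract (82,944 clones, 108 kb average insert, six haploid genome equivalents) can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: the function names are hypothetical, the implied genome size is derived from the rounded figures above rather than reported by the authors, and the probability estimate uses the standard Clarke and Carbon formula, which the abstract does not mention.

```python
def implied_genome_size_kb(n_clones, mean_insert_kb, genome_equivalents):
    """Haploid genome size implied by a stated library coverage (assumption:
    coverage = clones * mean insert / genome size, ignoring organellar clones)."""
    return n_clones * mean_insert_kb / genome_equivalents


def probability_of_recovery(n_clones, mean_insert_kb, genome_size_kb):
    """Clarke-Carbon estimate of the chance that a given locus is in the library:
    P = 1 - (1 - f)^N, with f the insert size as a fraction of the genome."""
    fraction = mean_insert_kb / genome_size_kb
    return 1.0 - (1.0 - fraction) ** n_clones


# Figures taken from the abstract above.
n_clones = 82_944
mean_insert_kb = 108
coverage = 6

total_kb = n_clones * mean_insert_kb
size_kb = implied_genome_size_kb(n_clones, mean_insert_kb, coverage)
p_found = probability_of_recovery(n_clones, mean_insert_kb, size_kb)

print(f"total cloned DNA: {total_kb / 1e6:.2f} Gb")
print(f"implied haploid genome size: {size_kb / 1e6:.2f} Gb")
print(f"probability a given locus is represented: {p_found:.4f}")
```

    Under these assumptions a six-fold library gives a better than 99% chance of containing any single-copy locus, which is consistent with the successful macroarray screening reported for all eight probes.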

  8. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    Energy Technology Data Exchange (ETDEWEB)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.

  9. Construction of a bacterial artificial chromosome library of S-type CMS maize mitochondria

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    To isolate mitochondrial genes easily, we developed a new method for constructing an S-type CMS maize mitochondrial gene library. The method embeds mitochondria in agarose and digests them enzymatically in situ, prepares mtDNA by electrophoresis, digests the LMP agarose with β-agarase, and then uses a BAC vector and electroporation. About 2,500 white clones of the Mo17 CMS-J mitochondrial gene library were obtained, with an average insert size of 18.24 kb (range 5 to 40 kb); 63.6% of the inserts came from the mitochondrial genome, representing 48 mitochondrial genome equivalents. All the probes detected positive clones in the gene library. The library should help elucidate the structure of the maize mitochondrial genome and the mechanism of S-type CMS, and may serve as a useful reference for constructing mitochondrial genome libraries of other plants.

  10. A protocol for large scale genomic DNA isolation for cacao genetics ...

    African Journals Online (AJOL)

    Advances in DNA technology, such as marker assisted selection, detection of quantitative trait loci and genomic selection also require the isolation of DNA from a large number of samples and the preservation of tissue samples for future use in cacao genome studies. The present study proposes a method for the ...

  11. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Full Text Available. Background: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10^-9 change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10^-9 change/site/year) was approximately half of the overall rate (1.9–2.0 × 10^-9 change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  12. Indexes of large genome collections on a PC.

    Directory of Open Access Journals (Sweden)

    Agnieszka Danek

    Full Text Available. The availability of thousands of individual genomes of one species should boost rapid progress in personalized medicine or understanding of the interaction between genotype and phenotype, to name a few applications. A key operation useful in such analyses is aligning sequencing reads against a collection of genomes, which is costly with existing algorithms due to their large memory requirements. We present MuGI, Multiple Genome Index, which reports all occurrences of a given pattern, under exact and approximate matching models, against a collection of thousand(s) of genomes. Its unique feature is the small index size, which is customisable. It fits in a standard computer with 16-32 GB, or even 8 GB, of RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is also fast. For example, exact matching queries (of average length 150 bp) are handled in an average time of 39 µs, and queries with up to 3 mismatches in 373 µs, on the test PC with an index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory, the respective times grow to 76 µs and 917 µs. Software is available at http://sun.aei.polsl.pl/mugi under a free license. Data S1 is available at PLOS One online.
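
    To make the query concrete (report every occurrence of a pattern across a collection of genomes), here is a deliberately naive sketch. It is not MuGI's data structure, which the abstract does not describe: it is a plain k-mer hash index with seed-and-verify lookup, it handles exact matching only, and its memory use grows with the total length of the collection, which is exactly the cost a compact index is meant to avoid. All names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genomes, k):
    """Map every k-mer to the list of (genome id, position) where it starts."""
    index = defaultdict(list)
    for genome_id, seq in genomes.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((genome_id, pos))
    return index


def find_exact(pattern, genomes, index, k):
    """Return all exact occurrences of pattern, using its first k-mer as a seed."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    hits = []
    for genome_id, pos in index.get(pattern[:k], []):
        # Verify the full pattern at each seed position.
        if genomes[genome_id].startswith(pattern, pos):
            hits.append((genome_id, pos))
    return sorted(hits)


# Toy collection of two short "genomes".
genomes = {"g1": "ACGTACGTTAGC", "g2": "TTACGTTAGCAA"}
index = build_kmer_index(genomes, k=4)
hits = find_exact("ACGTTAG", genomes, index, k=4)
print(hits)  # [('g1', 4), ('g2', 2)]
```

    The seed "ACGT" also occurs at position 0 of g1, but verification rejects it because the full pattern does not match there.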

  13. Understanding role of genome dynamics in host adaptation of gut commensal, L. reuteri

    Directory of Open Access Journals (Sweden)

    Shikha Sharma

    2017-10-01

    Full Text Available. Lactobacillus reuteri is a gram-positive gut commensal and exhibits noteworthy adaptation to its vertebrate hosts. Host adaptation is often driven by inter-strain genome dynamics like expansion of insertion sequences that lead to acquisition and loss of gene(s) and creation of large dynamic regions. In this regard we carried out in-house genome sequencing of a large number of L. reuteri strains originating from human, chicken, pig and rodents. We further used next-generation sequence data to understand the invasion and expansion of an IS element in shaping the genomes of strains belonging to the human-associated lineage. Finally, we share our experience in high-throughput genomic library preparation and in generating high-quality sequence data for a very low GC bacterium like L. reuteri.

  14. Genic regions of a large salamander genome contain long introns and novel genes

    Directory of Open Access Journals (Sweden)

    Bryant Susan V

    2009-01-01

    Full Text Available. Background: The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: c-value = 32 × 10^9 bp) were isolated and sequenced to characterize the structure of genic regions. Results: Annotation of genes within BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage specific genes in the axolotl genome than the human genome, but the great majority (86%) of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be incredibly large, approximately 2.8 gigabases! Conclusion: This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have incredibly long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes that are unique to salamanders.

  15. Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes

    Science.gov (United States)

    Peng, Qian; Alekseyev, Max A.; Tesler, Glenn; Pevzner, Pavel A.

    The existing synteny block reconstruction algorithms use anchors (e.g., orthologous genes) shared over all genomes to construct the synteny blocks for multiple genomes. This approach, while efficient for a few genomes, cannot be scaled to address the need to construct synteny blocks in many mammalian genomes that are currently being sequenced. The problem is that the number of anchors shared among all genomes quickly decreases with the increase in the number of genomes. Another problem is that many genomes (plant genomes in particular) had extensive duplications, which makes decoding of genomic architecture and rearrangement analysis in plants difficult. The existing synteny block generation algorithms in plants do not address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolution history of duplications. We present a new algorithm based on the A-Bruijn graph framework that overcomes these difficulties and provides a unified approach to synteny block reconstruction for multiple genomes, and for genomes with large duplications.

  16. Undermethylated DNA as a source of microsatellites from a conifer genome.

    Science.gov (United States)

    Zhou, Y; Bui, T; Auckland, L D; Williams, C G

    2002-02-01

    Developing microsatellites from the large, highly duplicated conifer genome requires special tools. To improve the efficiency of developing Pinus taeda L. microsatellites, undermethylated (UM) DNA fragments were used to construct a microsatellite-enriched copy library. A methylation-sensitive restriction enzyme, McrBC, was used to enrich for UM DNA before library construction. Digested DNA fragments larger than 9 kb were then excised and digested with RsaI and used to construct nine dinucleotide and trinucleotide libraries. A total of 1016 microsatellite-positive clones were detected among 11 904 clones and 620 of these were unique. Of 245 primer sets that produced a PCR product, 113 could be developed as UM microsatellite markers and 70 were polymorphic. Inheritance and marker informativeness were tested for a random sample of 36 polymorphic markers using a three-generation outbred pedigree. Thirty-one microsatellites (86%) had single-locus inheritance despite the highly duplicated nature of the P. taeda genome. Nineteen UM microsatellites had highly informative intercross mating type configurations. Allele number and frequency were estimated for eleven UM microsatellites using a population survey. Allele numbers for these UM microsatellites ranged from 3 to 12 with an average of 5.7 alleles/locus. Frequencies for the 63 alleles were mostly in the low-common range; only 14 of the 63 were in the rare allele (q < 0.05) class. Enriching for UM DNA was an efficient method for developing polymorphic microsatellites from a large plant genome.

  17. Assembling large genomes: analysis of the stick insect (Clitarchus hookeri) genome reveals a high repeat content and sex-biased genes associated with reproduction.

    Science.gov (United States)

    Wu, Chen; Twort, Victoria G; Crowhurst, Ross N; Newcomb, Richard D; Buckley, Thomas R

    2017-11-16

    Stick insects (Phasmatodea) have a high incidence of parthenogenesis and other alternative reproductive strategies, yet the genetic basis of reproduction is poorly understood. Phasmatodea includes nearly 3000 species, yet only the genome of Timema cristinae has been published to date. Clitarchus hookeri is a geographical parthenogenetic stick insect distributed across New Zealand. Sexual reproduction dominates in northern habitats but is replaced by parthenogenesis in the south. Here, we present a de novo genome assembly of a female C. hookeri and use it to detect candidate genes associated with gamete production and development in females and males. We also explore the factors underlying large genome size in stick insects. The C. hookeri genome assembly was 4.2 Gb, similar to the flow cytometry estimate, making it the second largest insect genome sequenced and assembled to date. Like the large genome of Locusta migratoria, the genome of C. hookeri is also highly repetitive and the predicted gene models are much longer than those from most other sequenced insect genomes, largely due to longer introns. Miniature inverted repeat transposable elements (MITEs), absent in the much smaller T. cristinae genome, are the most abundant repeat type in the C. hookeri genome assembly. Mapping RNA-Seq reads from female and male gonadal transcriptomes onto the genome assembly resulted in the identification of 39,940 gene loci, 15.8% and 37.6% of which showed female-biased and male-biased expression, respectively. The genes that were over-expressed in females were mostly associated with molecular transportation, developmental process, oocyte growth and reproductive process, whereas the male-biased genes were enriched in rhythmic process, molecular transducer activity and synapse. Several genes involved in the juvenile hormone synthesis pathway were also identified. The evolution of large insect genomes such as L. migratoria and C. hookeri genomes is most likely due to the

  18. Multidimensional scaling for large genomic data sets

    Directory of Open Access Journals (Sweden)

    Lu Henry

    2008-04-01

    Full Text Available. Background: Multi-dimensional scaling (MDS) aims to represent high-dimensional data in a low-dimensional space while preserving the similarities between data points. This reduction in dimensionality is crucial for analyzing and revealing the genuine structure hidden in the data. For noisy data, dimension reduction can effectively reduce the effect of noise on the embedded structure. For large data sets, dimension reduction can effectively reduce information retrieval complexity. Thus, MDS techniques are used in many applications of data mining and gene network research. However, although a number of studies have applied MDS techniques to genomics research, the number of analyzed data points was restricted by the high computational complexity of MDS. In general, a non-metric MDS method is faster than a metric MDS, but it does not preserve the true relationships. The computational complexity of most metric MDS methods exceeds O(N^2), so it is difficult to process a data set with a large number of genes N, such as whole genome microarray data. Results: We developed a new rapid metric MDS method with a low computational complexity, making metric MDS applicable to large data sets. Computer simulation showed that the new method of split-and-combine MDS (SC-MDS) is fast, accurate and efficient. Our empirical studies using microarray data on the yeast cell cycle showed that the performance of K-means in the reduced dimensional space is similar to or slightly better than that of K-means in the original space, and that the clustering results are obtained about three times faster. Our clustering results using SC-MDS are more stable than those in the original space. Hence, the proposed SC-MDS is useful for analyzing whole genome data. Conclusion: Our new method reduces the computational complexity from O(N^3) to O(N) when the dimension of the feature space is far less than the number of genes N, and it successfully
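
    For readers unfamiliar with metric MDS, the sketch below shows the classical (Torgerson) form in NumPy. It is the textbook baseline, not the split-and-combine SC-MDS method of this record, whose details the abstract does not give. The dense eigendecomposition of the N × N centred matrix generally costs O(N^3), which illustrates the kind of cost the authors set out to reduce. Function and variable names are hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed N points from an (N, N) matrix of pairwise Euclidean distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to recover a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Symmetric eigendecomposition; this is the O(N^3) step.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before taking roots.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy check: 50 random 2-D points should be recovered up to rotation/reflection,
# so their pairwise distances must be preserved.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))
dist = pairwise_distances(points)
embedding = classical_mds(dist, n_components=2)
max_error = np.abs(pairwise_distances(embedding) - dist).max()
print(f"largest distance error: {max_error:.2e}")
```

    The embedding is only defined up to rotation and reflection, which is why the check compares pairwise distances rather than coordinates.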

  19. Cell-free translational screening of an expression sequence tag library of Clonorchis sinensis for novel antigen discovery.

    Science.gov (United States)

    Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung

    2017-05-01

    The rapidly evolving cloning and sequencing technologies have enabled understanding of genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expression sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers. Biotechnol. Prog., 33:832-837, 2017.

  20. CERN Library | Mario Campanelli presents "Inside CERN's Large Hadron Collider" | 16 March

    CERN Multimedia

    CERN Library

    2016-01-01

    "Inside CERN's Large Hadron Collider" by Mario Campanelli. Presentation on Wednesday, 16 March at 4 p.m. in the Library (bldg 52-1-052) The book aims to explain the historical development of particle physics, with special emphasis on CERN and collider physics. It describes in detail the LHC accelerator and its detectors, describing the science involved as well as the sociology of big collaborations, culminating with the discovery of the Higgs boson.  Inside CERN's Large Hadron Collider  Mario Campanelli World Scientific Publishing, 2015  ISBN 9789814656641​

  1. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    Science.gov (United States)

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and large-scale genome organization. Could part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to replication and transcription and can also be used to assess the effects of temperature or supercoiling on chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, a study of large-scale genome organization for bacteriophage PHIX174 and the bacterium Caulobacter crescentus was also added. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge of chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. News from the Library: The 'long tail' Library

    CERN Multimedia

    CERN Library

    2012-01-01

    "The term 'long tail' has gained popularity in recent times as describing the retailing strategy of selling a large number of unique items with relatively small quantities sold of each usually in addition to selling fewer popular items in large quantities. The long tail was popularized by Chris Anderson, who mentioned Amazon.com, Apple and Yahoo! as examples of businesses applying this strategy." *   If we leave the business environment and move to the world of libraries, we still see this "long tail". Usually, only a small portion of a library's book collection accounts for the majority of its loans. On the other hand, there are a variety of "niche information needs" that might not be met, as libraries cannot afford to build up huge collections of documents available just-in-case. However, the networked environment of today's libraries can offer a solution. Online networks of libraries ca...

  3. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    Science.gov (United States)

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  4. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available

  5. Large-scale genomic 2D visualization reveals extensive CG-AT skew correlation in bird genomes

    Directory of Open Access Journals (Sweden)

    Deng Xuemei

    2007-11-01

    Full Text Available. Background: Bird genomes have a very different compositional structure compared with other warm-blooded animals. The variation in the base skew rules in vertebrate genomes remains puzzling, but it must relate somehow to large-scale genome evolution. Current research is inclined to relate base skew with mutations and their fixation. Here we wish to explore base skew correlations in bird genomes, to develop methods for displaying and quantifying such correlations at different scales, and to discuss possible explanations for the peculiarities of the bird genomes in skew correlation. Results: We have developed a method called Base Skew Double Triangle (BSDT) for exhibiting the genome-scale change of AT/CG skew as a two-dimensional square picture, showing base skews at many scales simultaneously in a single image. By this method we found that most chicken chromosomes have high AT/CG skew correlation (symmetry in the 2D picture), except for some microchromosomes. No other organisms studied (18 species) show such high skew correlations. This visualized high correlation was validated by three kinds of quantitative calculations with overlapping and non-overlapping windows, all indicating that chicken and birds in general have a special genome structure. Similar features were also found in some of the mammal genomes, but clearly much weaker than in chickens. We presume that the skew correlation feature evolved near the time that birds separated from other vertebrate lineages. When we eliminated the repeat sequences from the genomes, the AT and CG skew correlation increased for some mammal genomes, but was still clearly lower than in chickens. Conclusion: Our results suggest that BSDT is an expressive visualization method for AT and CG skew and enabled the discovery of the very high skew correlation in bird genomes; this peculiarity is worth further study. Computational analysis indicated that this correlation might be a compositional characteristic
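
    The kind of quantitative check mentioned in this abstract (skew correlation over overlapping or non-overlapping windows) can be sketched as follows. This is not the BSDT visualization itself, and the exact skew definitions and statistics used in the paper are not given in the abstract. The sketch assumes the common conventions AT skew = (A - T)/(A + T) and CG skew = (C - G)/(C + G) and uses a Pearson correlation; all function names and the toy sequence are hypothetical.

```python
from math import sqrt


def skew(x, y):
    """Normalised difference (x - y) / (x + y); defined as 0.0 when both counts are zero."""
    return (x - y) / (x + y) if (x + y) else 0.0


def window_skews(seq, window, step=None):
    """AT and CG skew per window; step == window gives non-overlapping windows."""
    step = step or window
    at, cg = [], []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        a, t, c, g = (chunk.count(base) for base in "ATCG")
        at.append(skew(a, t))
        cg.append(skew(c, g))
    return at, cg


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Toy sequence of four 8-bp windows in which A-rich windows are also C-rich.
seq = "AAAACCCC" + "TTTTGGGG" + "AATTCCGG" + "AAATCCCG"
at, cg = window_skews(seq, window=8)
r = pearson(at, cg)
print(at, cg, round(r, 3))
```

    Passing a `step` smaller than `window` gives overlapping windows, and repeating the calculation over a range of window sizes would probe the correlation at several scales, in the spirit of the multi-scale view described above.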

  6. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available. Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results: We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  7. Small genomes and large seeds: chromosome numbers, genome size and seed mass in diploid Aesculus species (Sapindaceae).

    Science.gov (United States)

    Krahulcová, Anna; Trávnícek, Pavel; Krahulec, František; Rejmánek, Marcel

    2017-04-01

    Aesculus L. (horse chestnut, buckeye) is a genus of 12-19 extant woody species native to the temperate Northern Hemisphere. This genus is known for unusually large seeds among angiosperms. While chromosome counts are available for many Aesculus species, only one has had its genome size measured. The aim of this study is to provide more genome size data and analyse the relationship between genome size and seed mass in this genus. Chromosome numbers in root tip cuttings were confirmed for four species and reported for the first time for three additional species. Flow cytometric measurements of 2C nuclear DNA values were conducted on eight species, and mean seed mass values were estimated for the same taxa. The same chromosome number, 2n = 40, was determined in all investigated taxa. Original measurements of 2C values for seven Aesculus species (eight taxa), added to just one reliable datum for A. hippocastanum, confirmed the notion that the genome size in this genus with relatively large seeds is surprisingly low, ranging from 0.955 pg 2C⁻¹ in A. parviflora to 1.275 pg 2C⁻¹ in A. glabra var. glabra. The chromosome number of 2n = 40 seems to be conclusively the universal 2n number for non-hybrid species in this genus. Aesculus genome sizes are relatively small, not only within its own family, Sapindaceae, but also within woody angiosperms. The genome sizes seem to be distinct and non-overlapping among the four major Aesculus clades. These results provide extra support for the most recent reconstruction of Aesculus phylogeny. The correlation between the 2C values and seed masses in examined Aesculus species is slightly negative and not significant. However, when the four major clades are treated separately, there is consistent positive association between larger genome size and larger seed mass within individual lineages. © The Author 2017. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For

  8. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    Science.gov (United States)

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progress in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility of improving the performance of machine learning methods in screening large libraries is discussed.

  9. Chicken microsatellite markers isolated from libraries enriched for simple tandem repeats.

    Science.gov (United States)

    Gibbs, M; Dawson, D A; McCamley, C; Wardle, A F; Armour, J A; Burke, T

    1997-12-01

    The total number of microsatellite loci is considered to be at least 10-fold lower in avian species than in mammalian species. Therefore, efficient large-scale cloning of chicken microsatellites, as required for the construction of a high-resolution linkage map, is facilitated by the construction of libraries using an enrichment strategy. In this study, a plasmid library enriched for tandem repeats was constructed from chicken genomic DNA by hybridization selection. Using this technique the proportion of recombinant clones that cross-hybridized to probes containing simple tandem repeats was raised to 16%, compared with < 0.1% in a non-enriched library. Primers were designed from 121 different sequences. Polymerase chain reaction (PCR) analysis of two chicken reference pedigrees enabled 72 loci to be localized within the collaborative chicken genetic map, and at least 30 of the remaining loci have been shown to be informative in these or other crosses.

  10. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing are within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  11. DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.

    Science.gov (United States)

    Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin

    2016-01-01

    The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

  12. Econometric Analysis Suggests Possible Crowding Out of Public Libraries by Book Superstores among Middle Income Families in the 1990s. A review of: Hemmeter, Jeffrey A. “Household Use of Public Libraries and Large Bookstores.” Library & Information Science Research 28.4 (Sept. 2006): 595–616.

    Directory of Open Access Journals (Sweden)

    Stephanie Hall

    2007-09-01

    Full Text Available Objective – To determine the effect of large bookstores (defined as those having 20 or more employees) on household library use. Design – Econometric analysis using cross-sectional data sets. Setting – The United States of America. Subjects – People in over 55,000 households across the U.S.A. Methods – Data from three 1996 studies were examined using logit and multinomial logit estimation procedures: the National Center for Education Statistics’ National Household Education Survey (NHES) and Public Library Survey (PLS), and the U.S. Census Bureau’s County Business Patterns (CBP). The county level results of the NHES telephone survey were merged with the county level data from the PLS and the CBP. Additionally, data on Internet use at the state level from the Statistical Abstract of the United States were incorporated into the data set. A logit regression model was used to estimate probability of library use based on several independent variables, evaluated at the mean. Main results – In general, Hemmeter found that “with regard to the impact of large bookstores on household library use, large bookstores do not appear to have an effect on overall library use among the general population” (613). While no significant changes in general library use were found among high and low income households where more large bookstores were present, nor in the population taken as a whole, middle income households (between $25,000 and $50,000 in annual income) showed notable declines in library use in these situations. These effects were strongest in the areas of borrowing (200% less likely) and recreational purposes (161%), but were also present in work-related use and job searching. Hemmeter also writes that “poorer households use the library more often for job search purposes. The probability of library use for recreation, work, and consumer information increases as income increases. This effect diminishes as households get richer” (611). Finally, home

  13. Construction of high quality Gateway™ entry libraries and their application to yeast two-hybrid for the monocot model plant Brachypodium distachyon

    Directory of Open Access Journals (Sweden)

    Kumimoto Roderick W

    2011-05-01

    Full Text Available Abstract Background Monocots, especially the temperate grasses, represent some of the most agriculturally important crops for both current food needs and future biofuel development. Because most of the agriculturally important grass species are difficult to study (e.g., they often have large, repetitive genomes and can be difficult to grow in laboratory settings), developing genetically tractable model systems is essential. Brachypodium distachyon (hereafter Brachypodium) is an emerging model system for the temperate grasses. To fully realize the potential of this model system, publicly accessible discovery tools are essential. High quality cDNA libraries that can be readily adapted for multiple downstream purposes are a needed resource. Additionally, yeast two-hybrid (Y2H) libraries are an important discovery tool for protein-protein interactions and are not currently available for Brachypodium. Results We describe the creation of two high quality, publicly available Gateway™ cDNA entry libraries and their derived Y2H libraries for Brachypodium. The first entry library represents cloned cDNA populations from both short day (SD, 8/16-h light/dark) and long day (LD, 20/4-h light/dark) grown plants, while the second library was generated from hormone treated tissues. Both libraries have extensive genome coverage (~5 × 10⁷ primary clones each) and average clone lengths of ~1.5 Kb. These entry libraries were then used to create two recombination-derived Y2H libraries. Initial proof-of-concept screens demonstrated that a protein with known interaction partners could readily re-isolate those partners, as well as novel interactors. Conclusions Accessible community resources are a hallmark of successful biological model systems. Brachypodium has the potential to be a broadly useful model system for the grasses, but still requires many of these resources. The Gateway™ compatible entry libraries created here will facilitate studies for multiple user

  14. BFAST: an alignment tool for large scale genome resequencing.

    Directory of Open Access Journals (Sweden)

    Nils Homer

    2009-11-01

    Full Text Available The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at (http://bfast.sourceforge.net).
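
    The gapped local alignment step mentioned in this abstract can be sketched in a few lines. This is a textbook Smith-Waterman score with a linear gap penalty, not BFAST's implementation; the scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# A read missing one base relative to the reference still aligns across the gap.
print(smith_waterman_score("ACGTT", "ACTT"))  # 6: four matches (8) minus one gap (2)
```

    Allowing gaps is what lets a small deletion in the read score higher than either ungapped half on its own, which is the property needed to detect small indels.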

  15. Identification of an extensive gene cluster among a family of PPOs in Trifolium pratense L. (red clover) using a large insert BAC library

    Directory of Open Access Journals (Sweden)

    Thomas Ann

    2009-07-01

    Full Text Available Abstract Background Polyphenol oxidase (PPO) activity in plants is a trait with potential economic, agricultural and environmental impact. In relation to the food industry, PPO-induced browning causes unacceptable discolouration in fruit and vegetables: from an agriculture perspective, PPO can protect plants against pathogens and environmental stress, improve ruminant growth by increasing nitrogen absorption and decreasing nitrogen loss to the environment through the animal's urine. The high PPO legume, red clover, has a significant economic and environmental role in sustaining low-input organic and conventional farms. Molecular markers for a range of important agricultural traits are being developed for red clover and improved knowledge of PPO genes and their structure will facilitate molecular breeding. Results A bacterial artificial chromosome (BAC) library comprising 26,016 BAC clones with an average 135 Kb insert size was constructed from Trifolium pratense L. (red clover), a diploid legume with a haploid genome size of 440–637 Mb. Library coverage of 6–8 genome equivalents ensured good representation of genes: the library was screened for polyphenol oxidase (PPO) genes. Two single copy PPO genes, PPO4 and PPO5, were identified to add to a family of three, previously reported, paralogous genes (PPO1–PPO3). Multiple PPO1 copies were identified and characterised revealing a subfamily comprising three variants PPO1/2, PPO1/4 and PPO1/5. Six PPO genes clustered within the genome: four separate BAC clones could be assembled onto a predicted 190–510 Kb single BAC contig. Conclusion A PPO gene family in red clover resides as a cluster of at least 6 genes. Three of these genes have high homology, suggesting a more recent evolutionary event. This PPO cluster covers a longer region of the genome than clusters detected in rice or previously reported in tomato. Full-length coding sequences from PPO4, PPO5, PPO1/5 and PPO1/4 will facilitate
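
    The coverage figure quoted in this abstract follows from simple arithmetic, sketched below. Coverage is taken as clones × mean insert size ÷ haploid genome size. The representation probability uses the standard Clarke-Carbon style estimate 1 - (1 - f)^N, which is an added illustration rather than something the abstract reports; the function names are hypothetical.

```python
def genome_equivalents(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Library coverage: total cloned DNA divided by haploid genome size."""
    return n_clones * insert_kb / (genome_mb * 1000.0)


def p_represented(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Probability that a given locus is in at least one clone: 1 - (1 - f)^N."""
    f = insert_kb / (genome_mb * 1000.0)   # fraction of the genome in one clone
    return 1.0 - (1.0 - f) ** n_clones


# Figures from the abstract: 26,016 clones, 135 kb inserts, 440-637 Mb genome.
for genome_mb in (440, 637):
    cov = genome_equivalents(26016, 135, genome_mb)
    prob = p_represented(26016, 135, genome_mb)
    print(f"{genome_mb} Mb genome: {cov:.1f} equivalents, P(locus present) = {prob:.3f}")
```

    With these inputs the coverage comes out at roughly 5.5 to 8.0 genome equivalents, in line with the 6–8 stated in the abstract.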

  16. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data.

    Science.gov (United States)

    de Andrade, Roberto R S; Vaslin, Maite F S

    2014-03-07

    Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping the vast amount of information obtained presents a bioinformatics challenge. To bypass the need for command-line use and basic bioinformatics knowledge, we developed mapping software with a graphical interface for assembling viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java version 7 using NetBeans IDE 7.1. The program also allows analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. The program compares each sequenced read in a library with a chosen reference genome. Reads with Hamming distances smaller than or equal to an allowed number of mismatches are selected as positives and used to assemble a long nucleotide genome sequence. To validate the software, separate analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. SearchSmallRNA was able to reconstruct viral genomes from NGS small RNA datasets with a high degree of reliability, so it should be a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/.
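
    The read-selection rule described in this abstract (keep reads whose Hamming distance to the reference is within an allowed number of mismatches) can be sketched as follows. This is a naive scan written in Python for illustration, not the SearchSmallRNA code, which is a Java program; the function names and the toy sequences are assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int):
    """Return (position, distance) for every ungapped placement within the limit."""
    hits = []
    k = len(read)
    for pos in range(len(reference) - k + 1):
        distance = hamming(read, reference[pos:pos + k])
        if distance <= max_mismatches:
            hits.append((pos, distance))
    return hits


# Toy example: the read matches two places in the reference with one mismatch each.
print(map_read("ACGA", "ACGTACGTTT", max_mismatches=1))  # [(0, 1), (4, 1)]
```

    Hamming distance only counts substitutions, so a read carrying an insertion or deletion would be rejected by this rule; that fits short small-RNA reads better than long reads.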

  17. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    Directory of Open Access Journals (Sweden)

    Philipp H. Schiffer

    2016-08-01

    Full Text Available Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR) genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic for ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate in a mechanism we term “run-away evolution”. This process might ultimately lead to the failure of genomic integrity and drive species to extinction.

  18. Draft genome of the gayal, Bos frontalis

    Science.gov (United States)

    Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping

    2017-01-01

    Abstract Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483
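
    The scaffold and contig N50 values quoted in this abstract are a length-weighted median of the assembly: the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch, with a hypothetical function name and made-up toy lengths:

```python
def n50(lengths) -> int:
    """Smallest length L such that sequences >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 of an empty assembly is undefined")


# Toy assembly: total 54, and the three longest contigs (10 + 9 + 8) reach 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

    Because the statistic is weighted by length, a few long scaffolds can raise N50 considerably even when most sequences in the assembly are short.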

  19. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

    Science.gov (United States)

    Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

    2016-10-11

    Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

  20. Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries.

    Science.gov (United States)

    Lam, Kathy N; Charles, Trevor C

    2015-01-01

    Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ(70) promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ(70) consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ(70) consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite

  1. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    Directory of Open Access Journals (Sweden)

    Herring Christopher D

    2007-08-01

    Full Text Available Abstract Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions.

  2. Surveys of Online Information Service in Large Public Libraries.

    Science.gov (United States)

    Woy, James B.

    1983-01-01

    Reports results of 1983 survey of 25 public libraries and 1981 survey of 11 public libraries, both of which focused on facets of online information services--user fees, databases, documentation, equipment, miscellaneous services, and subject areas searched. The 1983 questionnaire and seven sources are appended. (EJS)

  3. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    Science.gov (United States)

    Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

    2013-01-01

    Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  4. PANTHER: A Library of Protein Families and Subfamilies Indexed by Function

    OpenAIRE

    Thomas, Paul D.; Campbell, Michael J.; Kejariwal, Anish; Mi, Huaiyu; Karlak, Brian; Daverman, Robin; Diemer, Karen; Muruganujan, Anushya; Narechania, Apurva

    2003-01-01

    In the genomic era, one of the fundamental goals is to characterize the function of proteins on a large scale. We describe a method, PANTHER, for relating protein sequence relationships to function relationships in a robust and accurate way. PANTHER is composed of two main components: the PANTHER library (PANTHER/LIB) and the PANTHER index (PANTHER/X). PANTHER/LIB is a collection of “books,” each representing a protein family as a multiple sequence alignment, a Hidden Markov Model (HMM)...

  5. ORIGEN-S data libraries

    International Nuclear Information System (INIS)

    Ryman, J.C.

    1984-01-01

    There are five card-image nuclear data libraries: (1) a small light element library for 253 nuclides, (2) a large light element library for 687 nuclides, (3) an actinide library for 101 nuclides, (4) a small fission product library for 461 nuclides, and (5) a large fission product library for 821 nuclides. The data for each nuclide are contained on five card-image records. The first card image contains decay data (half-life, branching fractions, recoverable energy per decay and the fraction of recoverable energy from photons), percent natural abundance, and radioactivity concentration guides. The last four card images contain cross section and (for fission product nuclides) fission yield data for four reactor types (HTGR, LWR, LMFBR, and MSBR), with one card for each reactor type. The card-image nuclear data libraries are the basic libraries for ORIGEN-S. The code can be run using these libraries directly, or it can be run from a binary data library which (prior to any cross section or other nuclear data updating) was created by running the COUPLE code to convert one or more of these card-image libraries

  6. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries

    Directory of Open Access Journals (Sweden)

    Han Bucong

    2012-11-01

    Full Text Available Abstract Background Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical uses and trials for the treatment of leukemia and other cancers. These successes and appearances of drug resistance in some patients have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. Results We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested by 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53%~ 95.01% inhibitors and 99.81%~ 99.90% non-inhibitors in 5-fold cross validation studies. SVM trained by 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem, 1,496 (0.89%) of 168 K MDDR, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. Conclusions SVM showed comparable yield and reduced false hit rates in searching large compound libraries compared to the similarity-based and other machine-learning VS methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold from in-house chemical libraries not reported as Src inhibitor, one of which showed moderate activity. SVM may be potentially explored for searching Src inhibitors from large compound libraries at low false-hit rates.

  7. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries.

    Science.gov (United States)

    Han, Bucong; Ma, Xiaohua; Zhao, Ruiying; Zhang, Jingxian; Wei, Xiaona; Liu, Xianghui; Liu, Xin; Zhang, Cunlong; Tan, Chunyan; Jiang, Yuyang; Chen, Yuzong

    2012-11-23

    Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical uses and trials for the treatment of leukemia and other cancers. These successes and appearances of drug resistance in some patients have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested by 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53%~ 95.01% inhibitors and 99.81%~ 99.90% non-inhibitors in 5-fold cross validation studies. SVM trained by 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem, 1,496 (0.89%) of 168 K MDDR, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. SVM showed comparable yield and reduced false hit rates in searching large compound libraries compared to the similarity-based and other machine-learning VS methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold from in-house chemical libraries not reported as Src inhibitor, one of which showed moderate activity. SVM may be potentially explored for searching Src inhibitors from large compound libraries at low false-hit rates.
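
The hit rates quoted in the abstract above can be re-derived from the counts it gives. The following is a minimal sketch, assuming "13.56M" means 13.56 million compounds and "168 K" means 168,000; the dictionary keys and the function name are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract: (predicted inhibitors, library size).
# Library sizes are read from the rounded figures in the text (assumption).
screens = {
    "PubChem": (44_843, 13_560_000),
    "MDDR": (1_496, 168_000),
    "MDDR similar to known inhibitors": (719, 9_305),
}


def hit_rate_percent(hits, library_size):
    """Percentage of a library that the classifier flags as inhibitors."""
    return 100.0 * hits / library_size


rates = {
    name: round(hit_rate_percent(hits, size), 2)
    for name, (hits, size) in screens.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The rounded values agree with the percentages printed in the abstract, which suggests the parenthesised figures are fractions of each library rather than of the predicted set.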

  8. Large BRCA1 and BRCA2 genomic rearrangements in Danish high risk breast-ovarian cancer families

    DEFF Research Database (Denmark)

    Hansen, Thomas v O; Jønson, Lars; Albrechtsen, Anders

    2009-01-01

    BRCA1 and BRCA2 germ-line mutations predispose to breast and ovarian cancer. Large genomic rearrangements of BRCA1 account for 0-36% of all disease causing mutations in various populations, while large genomic rearrangements in BRCA2 are more rare. We examined 642 East Danish breast and/or ovaria...

  9. Construction of an American mink Bacterial Artificial Chromosome (BAC) library and sequencing candidate genes important for the fur industry

    Directory of Open Access Journals (Sweden)

    Christensen Knud

    2011-07-01

    Full Text Available Abstract Background Bacterial artificial chromosome (BAC) libraries continue to be invaluable tools for the genomic analysis of complex organisms. Complemented by the newly and fast growing deep sequencing technologies, they provide an excellent source of information in genomics projects. Results Here, we report the construction and characterization of the CHORI-231 BAC library constructed from a Danish-farmed, male American mink (Neovison vison). The library contains approximately 165,888 clones with an average insert size of 170 kb, representing approximately 10-fold coverage. High-density filters, each consisting of 18,432 clones spotted in duplicate, have been produced for hybridization screening and are publicly available. Overgo probes derived from expressed sequence tags (ESTs), representing 21 candidate genes for traits important for the mink industry, were used to screen the BAC library. These included candidate genes for coat coloring, hair growth and length, coarseness, and some receptors potentially involved in viral diseases in mink. The extensive screening yielded positive results for 19 of these genes. Thirty-five clones corresponding to 19 genes were sequenced using 454 Roche, and large contigs (184 kb on average) were assembled. Knowing the complete sequences of these candidate genes will enable confirmation of the association with a phenotype and the finding of causative mutations for the targeted phenotypes. Additionally, 1577 BAC clones were end sequenced; 2505 BAC end sequences (80% of BACs) were obtained. An excess of 2 Mb has been analyzed, thus giving a snapshot of the mink genome. Conclusions The availability of the CHORI-231 American mink BAC library will aid in identification of genes and genomic regions of interest. We have demonstrated how the library can be used to identify specific genes of interest, develop genetic markers, and for BAC end sequencing and deep sequencing of selected clones. To our knowledge, this is the
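
The library statistics in this abstract are linked by the usual coverage relation, coverage = (number of clones x average insert size) / genome size. The sketch below is illustrative only: it back-calculates the genome size implied by the quoted numbers rather than taking a figure from the paper, and the filter count is likewise derived here, not stated in the abstract. Variable names are hypothetical.

```python
# Figures quoted in the abstract.
clones = 165_888          # clones in the library
avg_insert_kb = 170       # average insert size, in kb
coverage_fold = 10        # approximate genome coverage
clones_per_filter = 18_432

# Total cloned DNA, then the genome size implied by ~10-fold coverage.
total_insert_kb = clones * avg_insert_kb
implied_genome_gb = total_insert_kb / coverage_fold / 1_000_000  # 1 Gb = 1e6 kb

# Number of high-density filters needed to hold every clone once (derived, not quoted).
filters_needed = clones // clones_per_filter

print(f"total cloned DNA: {total_insert_kb:,} kb")
print(f"implied genome size: {implied_genome_gb:.2f} Gb")
print(f"filters needed: {filters_needed}")
```

Under these assumptions the quoted clone count divides evenly into filters of 18,432 clones, and the implied genome size is a little under 3 Gb.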

  10. "My Library Was Dukedom Large Enough": Academic Libraries Mediating the Shakespeare Authorship Debate

    Directory of Open Access Journals (Sweden)

    Michael Quinn Dudley

    2013-11-01

    Full Text Available The "Shakespeare Authorship Question" regarding the identity of the poet-playwright has been debated for over 150 years. Now, with the growing list of signatories to the "Declaration of Reasonable Doubt", the creation of a Master's Degree program in Authorship Studies at Brunel University in London, the opening of the Shakespeare Authorship Research Studies Center at the Library of Concordia University in Portland, and the release of two competing high profile books both entitled Shakespeare Beyond Doubt, academic libraries are being presented with a unique and timely opportunity to participate in and encourage this debate, which has long been considered a taboo subject in the academy.

  11. Architectural Optimization of Digital Libraries

    Science.gov (United States)

    Biser, Aileen O.

    1998-01-01

    This work investigates performance and scaling issues relevant to large scale distributed digital libraries. Presently, performance and scaling studies focus on specific implementations of production or prototype digital libraries. Although useful information is gained to aid these designers and other researchers with insights to performance and scaling issues, the broader issues relevant to very large scale distributed libraries are not addressed. Specifically, no current studies look at the extreme or worst case possibilities in digital library implementations. A survey of digital library research issues is presented. Scaling and performance issues are mentioned frequently in the digital library literature but are generally not the focus of much of the current research. In this thesis a model for a Generic Distributed Digital Library (GDDL) and nine cases of typical user activities are defined. This model is used to facilitate some basic analysis of scaling issues. Specifically, the calculation of Internet traffic generated for different configurations of the study parameters and an estimate of the future bandwidth needed for a large scale distributed digital library implementation. This analysis demonstrates the potential impact a future distributed digital library implementation would have on the Internet traffic load and raises questions concerning the architecture decisions being made for future distributed digital library designs.

  12. Insertion Sequence-Caused Large Scale-Rearrangements in the Genome of Escherichia coli

    Science.gov (United States)

    2016-07-18

    affordable approach to genome-wide characterization of genetic variation in bacterial and eukaryotic genomes (1–3). In addition to small-scale...Paired-End Reads), that uses a graph-based algorithm (27) capable of detecting most large-scale variation involving repetitive regions, including novel...Avila,P., Grinsted,J. and De La Cruz,F. (1988) Analysis of the variable endpoints generated by one-ended transposition of Tn21. J. Bacteriol., 170

  13. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library

    Directory of Open Access Journals (Sweden)

    Salem Mohamed

    2009-11-01

    Full Text Available Abstract Background To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. Results The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In

  14. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    Science.gov (United States)

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the
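
The coverage and validation figures in this abstract follow from simple ratios of the quoted counts. This is a rough sketch, assuming each read samples one fragment end and ignoring reads that fail quality filters, which the abstract does not quantify; variable names are illustrative.

```python
# Figures quoted in the abstract.
reads = 2_000_000
fragment_ends = 300_000      # 150,000 fragments x 2 ends
snps_tested = 384
snps_validated = 183
snps_mapped = 167

# Average number of reads landing on each fragment end (upper bound,
# since unusable reads are not subtracted here).
reads_per_end = reads / fragment_ends

# Share of tested SNPs that validated, and share of validated SNPs placed on the map.
validation_rate = 100.0 * snps_validated / snps_tested
mapped_rate = 100.0 * snps_mapped / snps_validated

print(f"reads per fragment end: {reads_per_end:.1f}")
print(f"validation rate: {validation_rate:.1f}%")
print(f"validated SNPs placed on the map: {mapped_rate:.1f}%")
```

The raw ratio comes out slightly above the "6 fold" quoted, which is consistent with some reads being discarded before analysis, and the validation rate rounds to the 48% reported.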

  15. The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.

    Science.gov (United States)

    Argout, X; Martin, G; Droc, G; Fouet, O; Labadie, K; Rivals, E; Aury, J M; Lanaud, C

    2017-09-15

    Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. We used a NGS-based approach to significantly improve the assembly of the Belizian Criollo B97-61/B2 genome. We combined four Illumina large insert size mate paired libraries with 52x of Pacific Biosciences long reads to correct misassembled regions and reduced the number of scaffolds. We then used genotyping by sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2 while the scaffold N50 size has increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes compared to 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNAseq evidence. Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub ( http://cocoa-genome-hub.southgreen.fr ).
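
The scaffold N50 reported above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch follows, using made-up toy scaffold lengths rather than the cacao assembly data; the function name is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running total reaches half of the assembly size; the length of
    the scaffold that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (arbitrary units): joining the two largest scaffolds
# lowers the scaffold count and raises the N50.
fragmented = [8, 5, 4, 2, 1]
merged = [13, 4, 2, 1]

print(n50(fragmented), n50(merged))
```

This illustrates why reducing the scaffold count, as described for the V2 assembly, tends to go hand in hand with a larger N50 when the total assembly size stays about the same.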

  16. High-efficiency targeted editing of large viral genomes by RNA-guided nucleases.

    Science.gov (United States)

    Bi, Yanwei; Sun, Le; Gao, Dandan; Ding, Chen; Li, Zhihua; Li, Yadong; Cun, Wei; Li, Qihan

    2014-05-01

    A facile and efficient method for the precise editing of large viral genomes is required for the selection of attenuated vaccine strains and the construction of gene therapy vectors. The type II prokaryotic CRISPR-Cas (clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas)) RNA-guided nuclease system can be introduced into host cells during viral replication. The CRISPR-Cas9 system robustly stimulates targeted double-stranded breaks in the genomes of DNA viruses, where the non-homologous end joining (NHEJ) and homology-directed repair (HDR) pathways can be exploited to introduce site-specific indels or insert heterologous genes with high frequency. Furthermore, CRISPR-Cas9 can specifically inhibit the replication of the original virus, thereby significantly increasing the abundance of the recombinant virus among progeny virus. As a result, purified recombinant virus can be obtained with only a single round of selection. In this study, we used recombinant adenovirus and type I herpes simplex virus as examples to demonstrate that the CRISPR-Cas9 system is a valuable tool for editing the genomes of large DNA viruses.

  17. High-efficiency targeted editing of large viral genomes by RNA-guided nucleases.

    Directory of Open Access Journals (Sweden)

    Yanwei Bi

    2014-05-01

    Full Text Available A facile and efficient method for the precise editing of large viral genomes is required for the selection of attenuated vaccine strains and the construction of gene therapy vectors. The type II prokaryotic CRISPR-Cas (clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas)) RNA-guided nuclease system can be introduced into host cells during viral replication. The CRISPR-Cas9 system robustly stimulates targeted double-stranded breaks in the genomes of DNA viruses, where the non-homologous end joining (NHEJ) and homology-directed repair (HDR) pathways can be exploited to introduce site-specific indels or insert heterologous genes with high frequency. Furthermore, CRISPR-Cas9 can specifically inhibit the replication of the original virus, thereby significantly increasing the abundance of the recombinant virus among progeny virus. As a result, purified recombinant virus can be obtained with only a single round of selection. In this study, we used recombinant adenovirus and type I herpes simplex virus as examples to demonstrate that the CRISPR-Cas9 system is a valuable tool for editing the genomes of large DNA viruses.

  18. Teleporting the library?

    DEFF Research Database (Denmark)

    Heilesen, Simon

    2009-01-01

    In 2007, six Danish public libraries established a virtual library, Info Island DK, in Second Life. This article discusses the library project in terms of design. The design processes include the planning and implementation of the virtual library structure and its equipment, as well...... as the organizing and carrying out of activities in the virtual setting. It will be argued that, to a large extent, conventions have determined design and use of the virtual library, and also that design has had an impact on the attitudes and understanding of the participants....

  19. Library Information-Processing System

    Science.gov (United States)

    1985-01-01

    System works with Library of Congress MARC II format. System composed of subsystems that provide wide range of library information-processing capabilities. Format is American National Standards Institute (ANSI) format for machine-readable bibliographic data. Adaptable to any medium-to-large library.

  20. Isolation of BAC Clones Containing Conserved Genes from Libraries of Three Distantly Related Moths: A Useful Resource for Comparative Genomics of Lepidoptera

    Directory of Open Access Journals (Sweden)

    Yuji Yasukochi

    2011-01-01

    Full Text Available Lepidoptera, butterflies and moths, is the second largest animal order and includes numerous agricultural pests. To facilitate comparative genomics in Lepidoptera, we isolated BAC clones containing conserved and putative single-copy genes from libraries of three pests, Heliothis virescens, Ostrinia nubilalis, and Plutella xylostella, harboring the haploid chromosome number, n = 31, which are not closely related with each other or with the silkworm, Bombyx mori (n = 28), the sequenced model lepidopteran. A total of 108–184 clones representing 101–182 conserved genes were isolated for each species. For 79 genes, clones were isolated from more than two species, which will be useful as common markers for analysis using fluorescence in situ hybridization (FISH), as well as for comparison of genome sequence among multiple species. The PCR-based clone isolation method presented here is applicable to species which lack a sequenced genome but have a significant collection of cDNA or EST sequences.

  1. The detection of large deletions or duplications in genomic DNA.

    Science.gov (United States)

    Armour, J A L; Barton, D E; Cockburn, D J; Taylor, G R

    2002-11-01

    While methods for the detection of point mutations and small insertions or deletions in genomic DNA are well established, the detection of larger (>100 bp) genomic duplications or deletions can be more difficult. Most mutation scanning methods use PCR as a first step, but the subsequent analyses are usually qualitative rather than quantitative. Gene dosage methods based on PCR need to be quantitative (i.e., they should report molar quantities of starting material) or semi-quantitative (i.e., they should report gene dosage relative to an internal standard). Without some sort of quantitation, heterozygous deletions and duplications may be overlooked and therefore be under-ascertained. Gene dosage methods provide the additional benefit of reporting allele drop-out in the PCR. This could impact on SNP surveys, where large-scale genotyping may miss null alleles. Here we review recent developments in techniques for the detection of this type of mutation and compare their relative strengths and weaknesses. We emphasize that comprehensive mutation analysis should include scanning for large insertions and deletions and duplications. Copyright 2002 Wiley-Liss, Inc.

  2. Next-generation sampling: Pairing genomics with herbarium specimens provides species-level signal in Solidago (Asteraceae).

    Science.gov (United States)

    Beck, James B; Semple, John C

    2015-06-01

    The ability to conduct species delimitation and phylogeny reconstruction with genomic data sets obtained exclusively from herbarium specimens would rapidly enhance our knowledge of large, taxonomically contentious plant genera. In this study, the utility of genotyping by sequencing is assessed in the notoriously difficult genus Solidago (Asteraceae) by attempting to obtain an informative single-nucleotide polymorphism data set from a set of specimens collected between 1970 and 2010. Reduced representation libraries were prepared and Illumina-sequenced from 95 Solidago herbarium specimen DNAs, and resulting reads were processed with the nonreference Universal Network-Enabled Analysis Kit (UNEAK) pipeline. Multidimensional clustering was used to assess the correspondence between genetic groups and morphologically defined species. Library construction and sequencing were successful in 93 of 95 samples. The UNEAK pipeline identified 8470 single-nucleotide polymorphisms, and a filtered data set was analyzed for each of three Solidago subsections. Although results varied, clustering identified genomic groups that often corresponded to currently recognized species or groups of closely related species. These results suggest that genotyping by sequencing is broadly applicable to DNAs obtained from herbarium specimens. The data obtained and their biological signal suggest that pairing genomics with large-scale herbarium sampling is a promising strategy in species-rich plant groups.

  3. Hybridization Capture Using Short PCR Products Enriches Small Genomes by Capturing Flanking Sequences (CapFlank)

    DEFF Research Database (Denmark)

    Tsangaras, Kyriakos; Wales, Nathan; Sicheritz-Pontén, Thomas

    2014-01-01

    , a non-negligible fraction of the resulting sequence reads are not homologous to the bait. We demonstrate that during capture, the bait-hybridized library molecules add additional flanking library sequences iteratively, such that baits limited to targeting relatively short regions (e.g. few hundred...... nucleotides) can result in enrichment across entire mitochondrial and bacterial genomes. Our findings suggest that some of the off-target sequences derived in capture experiments are non-randomly enriched, and that CapFlank will facilitate targeted enrichment of large contiguous sequences with minimal prior...

  4. GENOMIC FEATURES OF COTESIA PLUTELLAE POLYDNAVIRUS

    Institute of Scientific and Technical Information of China (English)

    LIU Cai-ling; ZHU Xiang-xiong; FU Wen-jun; ZHAO Mu-jun

    2003-01-01

    Polydnavirus was purified from the calyx fluid of Cotesia plutellae ovary. The genomic features of C. plutellae polydnavirus (CpPDV) were investigated. The viral genome consists of at least 12 different segments, and the aggregate genome size is estimated to be at least 80 kbp. By partial digestion of CpPDV DNA with BamHI and subsequent ligation with BamHI-cut plasmid Bluescript, a representative library of the CpPDV genome was obtained.

  5. Environmental whole-genome amplification to access microbial populations in contaminated sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, Carl B [Diversa Corporation; Wyborski, Denise L. [Diversa Corporation; Garcia, Joseph A. [Diversa Corporation; Podar, Mircea [ORNL; Chen, Wenqiong [Diversa Corporation; Chang, Sherman H. [Diversa Corporation; Chang, Hwai W. [Diversa Corporation; Watson, David B [ORNL; Brodie, Eoin L. [Lawrence Berkeley National Laboratory (LBNL); Hazen, Terry [Lawrence Berkeley National Laboratory (LBNL); Keller, Martin [ORNL

    2006-05-01

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  6. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  7. An Engineered Virus Library as a Resource for the Spectrum-wide Exploration of Virus and Vector Diversity

    Directory of Open Access Journals (Sweden)

    Wenli Zhang

    2017-05-01

    Full Text Available Adenoviruses (Ads) are large human-pathogenic double-stranded DNA (dsDNA) viruses presenting an enormous natural diversity associated with a broad variety of diseases. However, only a small fraction of adenoviruses has been explored in basic virology and biomedical research, highlighting the need to develop robust and adaptable methodologies and resources. We developed a method for high-throughput direct cloning and engineering of adenoviral genomes from different sources utilizing advanced linear-linear homologous recombination (LLHR) and linear-circular homologous recombination (LCHR). We describe 34 cloned adenoviral genomes originating from clinical samples, which were characterized by next-generation sequencing (NGS). We anticipate that this recombineering strategy and the engineered adenovirus library will provide an approach to study basic and clinical virology. High-throughput screening (HTS) of the reporter-tagged Ad library in a panel of cell lines including osteosarcoma disease-specific cell lines revealed alternative virus types with enhanced transduction and oncolysis efficiencies. This highlights the usefulness of this resource.

  8. Construction and Screening of Marine Metagenomic Large Insert Libraries.

    Science.gov (United States)

    Weiland-Bräuer, Nancy; Langfeldt, Daniela; Schmitz, Ruth A

    2017-01-01

    The marine environment covers more than 70% of the world's surface. Marine microbial communities are highly diverse and have evolved during extended evolutionary processes of physiological adaptations under the influence of a variety of ecological conditions and selection pressures. They harbor an enormous diversity of microbes with still unknown and probably new physiological characteristics. In the past, marine microbes, mostly bacteria of microbial consortia attached to marine tissues of multicellular organisms, have proven to be a rich source of highly potent bioactive compounds, which represent a considerable number of drug candidates. However, to date, the biodiversity of marine microbes and the versatility of their bioactive compounds and metabolites have not been fully explored. This chapter describes sampling in the marine environment, construction of metagenomic large insert libraries from marine habitats, and, as an example, one function-based screen of metagenomic clones for identification of quorum quenching activities.

  9. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    Science.gov (United States)

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and troubleshooting approaches we implemented to optimize sequencing of multiplexed libraries on an RRBS background. PMID:23193365
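    As a rough illustration of what processing bisulfite-converted reads involves, the sketch below models the chemistry in silico: unmethylated cytosines read as T after conversion, while methylated cytosines remain C. It is not the pipeline described in the record above; the function names, the toy reference sequence, and the chosen methylated positions are all assumptions made for the example, and only the forward strand is considered.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate forward-strand bisulfite conversion.

    Unmethylated C is read as T; methylated C (by index) stays C.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, read):
    """Compare an aligned read to the reference at every reference C.

    Returns {position: True if the read kept C (methylated), False if it shows T}.
    Positions with any other read base are skipped as uninformative.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls


# Hypothetical toy data: cytosines at indices 1, 4, 5 and 10; two are methylated.
reference = "ACGTCCGGATCG"
methylated = {1, 5}

read = bisulfite_convert(reference, methylated)
calls = call_methylation(reference, read)
print(read)
print(calls)
```

    Real bisulfite aligners also have to handle the reverse strand (where G appears as A) and sequencing errors, which this sketch deliberately leaves out.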

  10. Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome.

    Science.gov (United States)

    Nicholson, Matthew J; Theodorou, Michael K; Brookman, Jayne L

    2005-01-01

    The anaerobic gut fungi occupy a unique niche in the intestinal tract of large herbivorous animals and are thought to act as primary colonizers of plant material during digestion. They are the only known obligately anaerobic fungi but molecular analysis of this group has been hampered by difficulties in their culture and manipulation, and by their extremely high A+T nucleotide content. This study begins to answer some of the fundamental questions about the structure and organization of the anaerobic gut fungal genome. Directed plasmid libraries using genomic DNA digested with highly or moderately rich AT-specific restriction enzymes (VspI and EcoRI) were prepared from a polycentric Orpinomyces isolate. Clones were sequenced from these libraries and the breadth of genomic inserts, both genic and intergenic, was characterized. Genes encoding numerous functions not previously characterized for these fungi were identified, including cytoskeletal, secretory pathway and transporter genes. A peptidase gene with no introns and having sequence similarity to a gene encoding a bacterial peptidase was also identified, extending the range of metabolic enzymes resulting from apparent trans-kingdom transfer from bacteria to fungi, as previously characterized largely for genes encoding plant-degrading enzymes. This paper presents the first thorough analysis of the genic, intergenic and rDNA regions of a variety of genomic segments from an anaerobic gut fungus and provides observations on rules governing intron boundaries, the codon biases observed with different types of genes, and the sequence of only the second anaerobic gut fungal promoter reported. Large numbers of retrotransposon sequences of different types were found and the authors speculate on the possible consequences of any such transposon activity in the genome. 
    The coding sequences identified included several orphan gene sequences, including one with regions strongly suggestive of structural proteins such as collagens

  11. Informative genomic microsatellite markers for efficient genotyping applications in sugarcane.

    Science.gov (United States)

    Parida, Swarup K; Kalia, Sanjay K; Kaul, Sunita; Dalal, Vivek; Hemaprabha, G; Selvi, Athiappan; Pandit, Awadhesh; Singh, Archana; Gaikwad, Kishor; Sharma, Tilak R; Srivastava, Prem Shankar; Singh, Nagendra K; Mohapatra, Trilochan

    2009-01-01

    Genomic microsatellite markers are capable of revealing a high degree of polymorphism. Sugarcane (Saccharum sp.), having a complex polyploid genome, requires a larger number of such informative markers for various applications in genetics and breeding. With the objective of generating a large set of microsatellite markers designated as Sugarcane Enriched Genomic MicroSatellite (SEGMS), 6,318 clones from genomic libraries of two hybrid sugarcane cultivars enriched with 18 different microsatellite repeat-motifs were sequenced to generate 4.16 Mb of high-quality sequence. Microsatellites were identified in 1,261 of the 5,742 non-redundant clones, which accounted for 22% enrichment of the libraries. Retro-transposon association was observed for 23.1% of the identified microsatellites. The utility of the microsatellite-containing genomic sequences was demonstrated by higher primer designing potential (90%) and PCR amplification efficiency (87.4%). A total of 1,315 markers including 567 class I microsatellite markers were designed and placed in the public domain for unrestricted use. The level of polymorphism detected by these markers among sugarcane species, genera, and varieties was 88.6%, while the cross-transferability rate was 93.2% within the Saccharum complex and 25% to cereals. Cloning and sequencing of size variant amplicons revealed that the variation in the number of repeat-units was the main source of SEGMS fragment length polymorphism. The high level of polymorphism and wide range of genetic diversity (0.16-0.82 with an average of 0.44) assayed with the SEGMS markers suggested their usefulness in various genotyping applications in sugarcane.
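    The 22% enrichment quoted above follows directly from the clone counts in the record. The short check below reproduces it; the share of redundant clones is derived here from the two clone totals and is not a figure reported in the abstract, and the variable names are illustrative.

```python
# Clone counts quoted in the abstract above.
clones_sequenced = 6318
non_redundant_clones = 5742
clones_with_microsatellites = 1261


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Enrichment: fraction of non-redundant clones that carry a microsatellite.
enrichment = percent(clones_with_microsatellites, non_redundant_clones)

# Derived (not reported in the abstract): share of sequenced clones that were redundant.
redundant_share = percent(clones_sequenced - non_redundant_clones, clones_sequenced)

print(f"enrichment: {enrichment:.1f}%")
print(f"redundant clones: {redundant_share:.1f}%")
```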

  12. Afghanistan Digital Library Initiative: Revitalizing an Integrated Library System

    Directory of Open Access Journals (Sweden)

    Yan HAN

    2007-12-01

    This paper describes an Afghanistan digital library initiative to build an integrated library system (ILS) for Afghanistan universities and colleges based on open-source software. As one of the goals of the Afghan eQuality Digital Libraries Alliance, the authors applied a systems analysis approach, evaluated different open-source ILSs, and customized the selected software to accommodate users’ needs. Improvements include Arabic and Persian language support, user interface changes, call number label printing, and ISBN-13 support. To our knowledge, this ILS is the first at a large academic library running on open-source software.

  13. Giant panda BAC library construction and assembly of a 650-kb contig spanning major histocompatibility complex class II region

    Directory of Open Access Journals (Sweden)

    Pan Hui-Juan

    2007-09-01

    Background: The giant panda is a rare and endangered species endemic to China. The low rates of reproductive success and infectious disease resistance have severely hampered the development of captive and wild populations of the giant panda. The major histocompatibility complex (MHC) plays important roles in immune response and in the reproductive system, such as mate choice and mother-fetus bio-compatibility. It is thus essential to understand genetic details of the giant panda MHC. Construction of a bacterial artificial chromosome (BAC) library will provide a new tool for panda genome physical mapping and thus facilitate understanding of panda MHC genes. Results: A giant panda BAC library consisting of 205,800 clones has been constructed. The average insert size was calculated to be 97 kb based on the examination of 174 randomly selected clones, indicating that the giant panda library contained 6.8-fold genome equivalents. Screening of the library with 16 giant panda PCR primer pairs revealed 6.4 positive clones per locus, in good agreement with an expected 6.8-fold genomic coverage of the library. Based on this BAC library, we constructed a contig map of the giant panda MHC class II region from BTNL2 to DAXX spanning about 650 kb by a three-step method: (1) PCR-based screening of the BAC library with primers from homologous MHC class II gene loci, end sequences and BAC clone shotgun sequences, (2) DNA sequencing validation of positive clones, and (3) restriction digest fingerprinting verification of inter-clone overlapping. Conclusion: The identification of genes and genomic regions of interest is greatly favored by the availability of this giant panda BAC library. The giant panda BAC library thus provides a useful platform for physical mapping, genome sequencing or complex analysis of targeted genomic regions. The 650 kb sequence-ready BAC contig map of the giant panda MHC class II region from BTNL2 to DAXX, verified by the three-step method, offers a
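    The genome-equivalent figure in the record above follows from the usual library coverage relation, coverage = (number of clones × mean insert size) / genome size. The abstract does not state the genome size it assumed, so the sketch below works backwards from the reported 6.8-fold coverage to the genome size that figure implies. The function names are illustrative, and the implied size is a derived value rather than one reported by the authors.

```python
def genome_equivalents(n_clones, mean_insert_kb, genome_size_kb):
    """Library coverage in genome equivalents: total cloned DNA / genome size."""
    return n_clones * mean_insert_kb / genome_size_kb


def implied_genome_size_kb(n_clones, mean_insert_kb, coverage):
    """Genome size (kb) implied by a reported coverage figure."""
    return n_clones * mean_insert_kb / coverage


# Values quoted in the abstract above.
n_clones = 205_800
mean_insert_kb = 97
reported_coverage = 6.8
observed_positives_per_locus = 6.4

total_cloned_kb = n_clones * mean_insert_kb
implied_gb = implied_genome_size_kb(n_clones, mean_insert_kb, reported_coverage) / 1e6

# For a single-copy locus, the expected number of positive clones is roughly
# equal to the coverage, so the observed/expected ratio is a quick sanity check.
hit_ratio = observed_positives_per_locus / reported_coverage

print(f"total cloned DNA: {total_cloned_kb / 1e6:.2f} Gb")
print(f"implied genome size: {implied_gb:.2f} Gb")
print(f"observed/expected positives per locus: {hit_ratio:.2f}")
```

    Plugging the implied size back into `genome_equivalents` returns the reported 6.8-fold coverage, which is the check to run when a different genome size estimate is substituted.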

  14. GenomeRNAi: a database for cell-based RNAi phenotypes.

    Science.gov (United States)

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.

  15. Isolation of a 97-kb minimal essential MHC B locus from a new reverse-4D BAC library of the golden pheasant.

    Directory of Open Access Journals (Sweden)

    Qing Ye

    The bacterial artificial chromosome (BAC) system is widely used in isolation of large genomic fragments of interest. Construction of a routine BAC library requires several months for picking clones and arraying BACs into superpools in order to employ 4D-PCR to screen positive BACs, which might be time-consuming and laborious. The major histocompatibility complex (MHC) is a cluster of genes involved in the vertebrate immune system, and the classical avian MHC-B locus is a minimal essential one, occupying a 100-kb genomic region. In this study, we constructed a more effective reverse-4D BAC library for the golden pheasant, which first creates sub-libraries and then only picks clones of positive sub-libraries, and identified several MHC clones within thirty days. The full sequencing of a 97-kb reverse-4D BAC demonstrated that the golden pheasant MHC-B locus contained 20 genes and showed good synteny with that of the chicken. The notable differences between these two species were the numbers of class II B loci and NK genes and the inversions of the TAPBP gene and the TAP1-TAP2 region. Furthermore, the inverse TAP2-TAP1 was unique in the golden pheasant in comparison with that of chicken, turkey, and quail. The newly defined genomic structure of the golden pheasant MHC will give an insight into the evolutionary history of the avian MHC.

  16. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    Science.gov (United States)

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  17. A First Insight into the Genome of the Filter-Feeder Mussel Mytilus galloprovincialis.

    Directory of Open Access Journals (Sweden)

    Maria Murgarella

    Mussels belong to the phylum Mollusca, one of the largest and most diverse taxa in the animal kingdom. Despite their importance in aquaculture and in biology in general, genomic resources from mussels are still scarce. To broaden and increase the genomic knowledge in this family, we carried out a whole-genome sequencing study of the cosmopolitan Mediterranean mussel (Mytilus galloprovincialis). We sequenced its genome (32X depth of coverage) on the Illumina platform using three paired-end libraries with different insert sizes. The large number of contigs obtained pointed to a highly complex genome of 1.6 Gb where repeated elements seem to be widespread (~30% of the genome), a feature that is also shared with other marine molluscs. Notwithstanding the limitations of our genome sequencing, we were able to reconstruct two mitochondrial genomes and predict 10,891 putative genes. A comparative analysis with other molluscs revealed a gene enrichment of gene ontology categories related to multixenobiotic resistance, glutamate biosynthetic process, and the maintenance of ciliary structures.

  18. Single-tube library preparation for degraded DNA

    DEFF Research Database (Denmark)

    Carøe, Christian; Gopalakrishnan, Shyam; Vinner, Lasse

    2018-01-01

    these obstacles and enable higher throughput are therefore of interest to researchers working with degraded DNA. 2. In this study, we compare four Illumina library preparation protocols, including two “single-tube” methods developed for this study with the explicit aim of improving data quality and reducing...... of chemically damaged and highly fragmented DNA molecules. In particular, the enzymatic reactions and DNA purification steps during library preparation can result in DNA template loss and sequencing biases, affecting downstream analyses. The development of library preparation methods that circumvent...... preparation time and expenses. The methods are tested on grey wolf (Canis lupus) museum specimens. 3. We found single-tube protocols increase library complexity, yield more reads that map uniquely to the reference genome, reduce processing time, and may decrease laboratory costs by 90%. 4. Given the advantages...

  19. Low frequency of large genomic rearrangements of BRCA1 and BRCA2 in western Denmark

    DEFF Research Database (Denmark)

    Thomassen, Mads; Gerdes, Anne-Marie; Cruger, Dorthe

    2006-01-01

    Germline mutations in BRCA1 and BRCA2 predispose female carriers to breast and ovarian cancer. The majority of mutations identified are small deletions or insertions or are nonsense mutations. Large genomic rearrangements in BRCA1 are found with varying frequencies in different populations......, but BRCA2 rearrangements have not been investigated thoroughly. The objective in this study was to determine the frequency of large genomic rearrangements in BRCA1 and BRCA2 in a large group of Danish families with increased risk of breast and ovarian cancer. A total of 617 families previously tested...... negative for mutations involving few bases were screened with multiplex ligation-dependent probe amplification (MLPA). Two deletions in BRCA1 were identified in three families; no large rearrangements were detected in BRCA2. The large deletions constitute 3.8% of the BRCA1 mutations identified, which...

  20. Techniques for Large-Scale Bacterial Genome Manipulation and Characterization of the Mutants with Respect to In Silico Metabolic Reconstructions.

    Science.gov (United States)

    diCenzo, George C; Finan, Turlough M

    2018-01-01

    The rate at which all genes within a bacterial genome can be identified far exceeds the ability to characterize these genes. To assist in associating genes with cellular functions, a large-scale bacterial genome deletion approach can be employed to rapidly screen tens to thousands of genes for desired phenotypes. Here, we provide a detailed protocol for the generation of deletions of large segments of bacterial genomes that relies on the activity of a site-specific recombinase. In this procedure, two recombinase recognition target sequences are introduced into known positions of a bacterial genome through single cross-over plasmid integration. Subsequent expression of the site-specific recombinase mediates recombination between the two target sequences, resulting in the excision of the intervening region and its loss from the genome. We further illustrate how this deletion system can be readily adapted to function as a large-scale in vivo cloning procedure, in which the region excised from the genome is captured as a replicative plasmid. We next provide a procedure for the metabolic analysis of bacterial large-scale genome deletion mutants using the Biolog Phenotype MicroArray™ system. Finally, a pipeline is described, and a sample Matlab script is provided, for the integration of the obtained data with a draft metabolic reconstruction for the refinement of the reactions and gene-protein-reaction relationships in a metabolic reconstruction.
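    The deletion and in vivo cloning logic described above can be pictured with a toy string model: recombination between two directly repeated target sites removes the intervening region as a circle carrying one copy of the site, and leaves the other copy in the genome. The sketch below is a conceptual aid only, not the protocol or the Matlab script from the record; the placeholder site string, gene labels, and function name are hypothetical.

```python
def recombine_direct_repeats(genome, site):
    """Model recombination between the first two directly repeated target sites.

    Returns (remaining_genome, excised_region). The excised region is a circle
    in vivo; it is written here as a linear string that starts with its single
    copy of the target site.
    """
    first = genome.find(site)
    second = genome.find(site, first + len(site)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("genome must contain two copies of the target site")

    # The stretch from the start of the first site up to (not including) the
    # second site leaves the chromosome; the second site copy stays behind.
    excised = genome[first:second]
    remaining = genome[:first] + genome[second:]
    return remaining, excised


SITE = "<TARGET>"  # hypothetical stand-in for a recombinase recognition sequence
genome = "geneA" + SITE + "geneBgeneC" + SITE + "geneD"

remaining, excised = recombine_direct_repeats(genome, SITE)
print(remaining)
print(excised)
```

    In the in vivo cloning variant, the excised circle would persist only if it carries an origin of replication, which is why the record describes capturing the region as a replicative plasmid; that detail is not modeled here.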

  1. Biological consequences of ancient gene acquisition and duplication in the large genome soil bacterium, "Solibacter usitatus" strain Ellin6076

    Energy Technology Data Exchange (ETDEWEB)

    Challacombe, Jean F [Los Alamos National Laboratory; Eichorst, Stephanie A [Los Alamos National Laboratory; Xie, Gary [Los Alamos National Laboratory; Kuske, Cheryl R [Los Alamos National Laboratory; Hauser, Loren [ORNL; Land, Miriam [ORNL

    2009-01-01

    Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.

  2. Resources of Canadian Academic and Research Libraries.

    Science.gov (United States)

    Downs, Robert B.

    Although it emphasizes academic libraries, this study also includes the national and provincial libraries, large public libraries, and special libraries that serve Canadian scholars, students, and research workers. With the data obtained from a questionnaire on library statistics and holdings, visits to the libraries, interviews with librarians…

  3. library use instruction and the pattern of utilization of library services ...

    African Journals Online (AJOL)

    Global Journal

    The data collected was analysed using descriptive statistics (simple percentage %). ... semesters and it should be an independent credit carrying course under the General Studies ... large and well stocked a library is, if the ..... Online. LIBRARY USE INSTRUCTION AND THE PATTERN OF UTILIZATION OF LIBRARY ...

  4. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome.

    Science.gov (United States)

    Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E

    2017-03-06

    Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.

  5. Specific single-cell isolation and genomic amplification of uncultured microorganisms

    DEFF Research Database (Denmark)

    Kvist, Thomas; Ahring, Birgitte Kiær; Lasken, R.S.

    2007-01-01

    In this study, we describe a new method for genomic studies of individual uncultured prokaryotic organisms, which was used for the isolation and partial genome sequencing of a soil archaeon. The diversity of Archaea in a soil sample was mapped by generating a clone library using group-specific primers in combination with a terminal restriction fragment length polymorphism profile. Intact cells were extracted from the environmental sample, and fluorescent in situ hybridization probing with Cy3-labeled probes designed from the clone library was subsequently used to detect the organisms...... of interest. Single cells with a bright fluorescent signal were isolated using a micromanipulator and the genome of the single isolated cells served as a template for multiple displacement amplification (MDA) using the Phi29 DNA polymerase. The generated MDA product was afterwards used for 16S rRNA gene...

  6. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  7. The JBEI quantitative metabolic modeling library (jQMM): a python library for modeling microbial metabolism

    DEFF Research Database (Denmark)

    Birkel, Garrett W.; Ghosh, Amit; Kumar, Vinay S.

    2017-01-01

    analysis, new methods for the effective use of the ever more readily available and abundant -omics data (i.e. transcriptomics, proteomics and metabolomics) are urgently needed. Results: The jQMM library presented here provides an open-source, Python-based framework for modeling internal metabolic fluxes......, it introduces the capability to use C-13 labeling experimental data to constrain comprehensive genome-scale models through a technique called two-scale C-13 Metabolic Flux Analysis (2S-C-13 MFA). In addition, the library includes a demonstration of a method that uses proteomics data to produce actionable...... insights to increase biofuel production. Finally, the use of the jQMM library is illustrated through the addition of several Jupyter notebook demonstration files that enhance reproducibility and provide the capability to be adapted to the user's specific needs. Conclusions: jQMM will facilitate the design...

  8. A Fast Solution to NGS Library Prep with Low Nanogram DNA Input

    Science.gov (United States)

    Liu, Pingfang; Lohman, Gregory J.S.; Cantor, Eric; Langhorst, Bradley W.; Yigit, Erbay; Apone, Lynne M.; Munafo, Daniela B.; Stewart, Fiona J.; Evans, Thomas C.; Nichols, Nicole; Dimalanta, Eileen T.; Davis, Theodore B.; Sumner, Christine

    2013-01-01

    Next Generation Sequencing (NGS) has significantly impacted human genetics, enabling a comprehensive characterization of the human genome as well as a better understanding of many genomic abnormalities. By delivering massive DNA sequences at unprecedented speed and cost, NGS promises to make personalized medicine a reality in the foreseeable future. To date, library construction with clinical samples has been a challenge, primarily due to the limited quantities of sample DNA available. Our objective here was to overcome this challenge by developing NEBNext® Ultra DNA Library Prep Kit, a fast library preparation method. Specifically, we streamlined the workflow utilizing novel NEBNext reagents and adaptors, including a new DNA polymerase that has been optimized to minimize GC bias. As a result of this work, we have developed a simple method for library construction from an amount of DNA as low as 5 ng, which can be used for both intact and fragmented DNA. Moreover, the workflow is compatible with multiple NGS platforms.

  9. Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library.

    Science.gov (United States)

    Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi

    2017-10-10

    We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers for arbitrary powers, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well-suited for large-scale calculations. The approach is particularly adapted for setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.

  10. Microcomputers in the Anesthesia Library.

    Science.gov (United States)

    Wright, A. J.

    The combination of computer technology and library operation is helping to alleviate such library problems as escalating costs, increasing collection size, deteriorating materials, unwieldy arrangement schemes, poor subject control, and the acquisition and processing of large numbers of rarely used documents. Small special libraries such as…

  11. Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

    Science.gov (United States)

    Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

    2015-10-16

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on presence (retention) vs. absence (deletion) marker assay. In this study, a 178-line RH panel was genotyped with SSRs and DArT markers to develop the first high resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with: 1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) a RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps ranged from 0.75 to 0.92, suggesting a good agreement in marker order. With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average

  12. Building an Undergraduate Book Approval Plan for a Large Academic Library

    Directory of Open Access Journals (Sweden)

    Denise Koufogiannakis

    2007-03-01

    Full Text Available The University of Alberta Libraries (UAL), working with two book vendors, created large-scale undergraduate book approval plans to deliver new publications. Detailed selections profiles were created for many subject areas, designed to deliver books that would have been obvious choices by subject selectors. More than 5800 monographs were received through the book approval plans during the pilot period. These volumes proved to be highly relevant to users, showing twice as much circulation as other monographs acquired during the same time period. Goals achieved through this project include: release of selectors’ time from routine work, systematic acquisition of a broadly based high-demand undergraduate collection and faster delivery of undergraduate materials. This successful program will be expanded and incorporated into UAL’s normal acquisitions processes for undergraduate materials.

  13. Creation of BAC genomic resources for cocoa (Theobroma cacao L.) for physical mapping of RGA containing BAC clones.

    Science.gov (United States)

    Clément, D; Lanaud, C; Sabau, X; Fouet, O; Le Cunff, L; Ruiz, E; Risterucci, A M; Glaszmann, J C; Piffanelli, P

    2004-05-01

    We have constructed and validated the first cocoa (Theobroma cacao L.) BAC library, with the aim of developing molecular resources to study the structure and evolution of the genome of this perennial crop. This library contains 36,864 clones with an average insert size of 120 kb, representing approximately ten haploid genome equivalents. It was constructed from the genotype Scavina-6 (Sca-6), a Forastero clone highly resistant to cocoa pathogens and a parent of existing mapping populations. Validation of the BAC library was carried out with a set of 13 genetically-anchored single copy and one duplicated markers. An average of nine BAC clones per probe was identified, giving an initial experimental estimation of the genome coverage represented in the library. Screening of the library with a set of resistance gene analogues (RGAs), previously mapped in cocoa and co-localizing with QTL for resistance to Phytophthora traits, confirmed at the physical level the tight clustering of RGAs in the cocoa genome and provided the first insights into the relationships between genetic and physical distances in the cocoa genome. This library represents an available BAC resource for structural genomic studies or map-based cloning of genes corresponding to important QTLs for agronomic traits such as resistance genes to major cocoa pathogens like Phytophthora spp. (palmivora and megakarya), Crinipellis perniciosa and Moniliophthora roreri.
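
    The coverage arithmetic behind the figures in this abstract can be sketched as follows. This is a rough illustration, not the authors' calculation: the haploid genome size is not stated in the abstract, so the value used here is only the size implied by "approximately ten haploid genome equivalents", and the function names are hypothetical. The second function uses the standard estimate 1 - (1 - f)^N for the chance that a given locus is present in at least one of N clones, where f is the fraction of the genome carried by one insert.

```python
def genome_equivalents(n_clones, insert_kb, genome_kb):
    """Library coverage expressed in haploid genome equivalents."""
    return n_clones * insert_kb / genome_kb

def prob_locus_present(n_clones, insert_kb, genome_kb):
    """Probability that a given locus is in at least one clone."""
    fraction_per_clone = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones

N_CLONES = 36_864      # from the abstract
INSERT_KB = 120        # average insert size from the abstract

# Assumption: genome size back-calculated from the stated ~10x coverage.
implied_genome_kb = N_CLONES * INSERT_KB / 10

coverage = genome_equivalents(N_CLONES, INSERT_KB, implied_genome_kb)
p_present = prob_locus_present(N_CLONES, INSERT_KB, implied_genome_kb)
```

    Under these assumptions the expected number of clones per single-copy probe is about ten, which is in line with the average of nine clones per probe reported from the experimental validation.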

  14. Library usage patterns in the electronic information environment. Electronic journals, Use studies, Libraries, Medical libraries

    Directory of Open Access Journals (Sweden)

    B. Franklin

    2004-01-01

    Full Text Available This paper examines the methodology and results from Web-based surveys of more than 15,000 networked electronic services users in the United States between July 1998 and June 2003 at four academic health sciences libraries and two large main campus libraries serving a variety of disciplines. A statistically valid methodology for administering simultaneous Web-based and print-based surveys using the random moments sampling technique is discussed and implemented. Results from the Web-based surveys showed that at the four academic health sciences libraries, there were approximately four remote networked electronic services users for each in-house user. This ratio was even higher for faculty, staff, and research fellows at the academic health sciences libraries, where more than five remote users for each in-house user were recorded. At the two main libraries, there were approximately 1.3 remote users for each in-house user of electronic information. Sponsored research (grant funded research) accounted for approximately 32% of the networked electronic services activity at the health sciences libraries and 16% at the main campus libraries. Sponsored researchers at the health sciences libraries appeared to use networked electronic services most intensively from on-campus, but not from in the library. The purpose of use for networked electronic resources by patrons within the library is different from the purpose of use of those resources by patrons using the resources remotely. The implications of these results on how librarians reach decisions about networked electronic resources and services are discussed.

  15. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome...... or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  17. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    Science.gov (United States)

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  18. "After the Genome 5, Conference to be held October 6-10, 1999, Jackson Hole, Wyoming"

    Energy Technology Data Exchange (ETDEWEB)

    Brent, Roger [Molecular Sciences Inst., Milpitas, CA (United States)]

    1999-10-06

    The postgenomic era is arriving faster than anyone had imagined-- sometime during 2000 we'll have a large fraction of the human genome sequence. Heretofore, our understanding of function has come from non-industrial experiments whose conclusions were largely framed in human language. The advent of large amounts of sequence data, and of "functional genomic" data types such as mRNA expression data, have changed this picture. These data share the feature that individual observations and measurements are typically relatively low value adding. Such data is now being generated so rapidly that the amount of information contained in it will surpass the amount of biological information collected by traditional means. It is tantalizing to envision using genomic information to create a quantitative biology with a very strong data component. Unfortunately, we are very early in our understanding of how to "compute on" genomic information so as to extract biological knowledge from it. In fact, some current efforts to come to grips with genomic information often resemble a computer savvy library science, where the most important issues concern categories, classification schemes, and information retrieval. When exploring new libraries, a measure of cataloging and inventory is surely inevitable. However, at some point we will need to move from library science to scholarship. We would like to achieve a quantitative and predictive understanding of biological function. We realize that making the bridge from knowledge of systems to the sets of abstractions that constitute computable entities is not easy. The After the Genome meetings were started in 1995 to help the biological community think about and prepare for the changes in biological research in the face of the oncoming flow of genomic information. The term "After the Genome" refers to a future in which complete inventories of the gene products of entire organisms become available. Since then, many more biologists have

  19. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 (CHD8) and LDL receptor-related protein 1 (LRP1), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.

  20. Construction of CRISPR Libraries for Functional Screening.

    Science.gov (United States)

    Carstens, Carsten P; Felts, Katherine A; Johns, Sarah E

    2018-01-01

    Identification of gene function has been aided by the ability to generate targeted gene knockouts or transcriptional repression using the CRISPR/CAS9 system. Using pooled libraries of guide RNA expression vectors that direct CAS9 to a specific genomic site allows identification of genes that are either enriched or depleted in response to a selection scheme, thus linking the affected gene to the chosen phenotype. The quality of the data generated by the screening is dependent on the quality of the guide RNA delivery library with regards to error rates and especially evenness of distribution of the guides. Here, we describe a method for constructing complex plasmid libraries based on pooled designed oligomers with high representation and tight distributions. The procedure allows construction of plasmid libraries of >60,000 members with a 95th/5th percentile ratio of less than 3.5.
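
    The evenness metric quoted at the end of this abstract, a 95th/5th percentile ratio, can be computed from per-guide read counts roughly as follows. This is a hedged sketch: the abstract does not say which percentile definition the authors used, so the nearest-rank definition here is an assumption, and the function names are hypothetical.

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (assumed definition; others exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def skew_ratio(read_counts, upper=95, lower=5):
    """Ratio of the upper to the lower percentile of guide read counts.

    A value close to 1 means guides are evenly represented; a lower
    percentile of zero (dropped-out guides) is reported as infinity.
    """
    low = percentile_nearest_rank(read_counts, lower)
    if low == 0:
        return math.inf
    return percentile_nearest_rank(read_counts, upper) / low

# Toy counts 1..100: 95th percentile is 95, 5th percentile is 5.
toy_ratio = skew_ratio(list(range(1, 101)))
```

    By this measure the library described above, with a ratio below 3.5, would have its 95th-percentile guide represented at less than 3.5 times the level of its 5th-percentile guide.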

  1. Building an Undergraduate Book Approval Plan for a Large Academic Library

    Directory of Open Access Journals (Sweden)

    Denise Koufogiannakis

    2007-05-01

    Full Text Available The University of Alberta Libraries (UAL), working with two book vendors, created large-scale undergraduate book approval plans to deliver new publications. Detailed selections profiles were created for many subject areas, designed to deliver books that would have been obvious choices by subject selectors. More than 5800 monographs were received through the book approval plans during the pilot period. These volumes proved to be highly relevant to users, showing twice as much circulation as other monographs acquired during the same time period. Goals achieved through this project include: release of selectors’ time from routine work, systematic acquisition of a broadly based high-demand undergraduate collection and faster delivery of undergraduate materials. This successful program will be expanded and incorporated into UAL’s normal acquisitions processes for undergraduate materials.

  2. Bioinformatics decoding the genome

    CERN Multimedia

    CERN. Geneva; Deutsch, Sam; Michielin, Olivier; Thomas, Arthur; Descombes, Patrick

    2006-01-01

    Extracting the fundamental genomic sequence from the DNA. From Genome to Sequence: Biology in the early 21st century has been radically transformed by the availability of the full genome sequences of an ever increasing number of life forms, from bacteria to major crop plants and to humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences is increasingly conditioned by the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds. Once the sequence is assembled, a major challenge is the mapping of biologically relevant information onto this sequence: promoters, introns and exons of protein-encoding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological ...

  3. Usability Testing of a Large, Multidisciplinary Library Database: Basic Search and Visual Search

    Directory of Open Access Journals (Sweden)

    Jody Condit Fagan

    2006-09-01

    Full Text Available Visual search interfaces have been shown by researchers to assist users with information search and retrieval. Recently, several major library vendors have added visual search interfaces or functions to their products. For public service librarians, perhaps the most critical area of interest is the extent to which visual search interfaces and text-based search interfaces support research. This study presents the results of eight full-scale usability tests of both the EBSCOhost Basic Search and Visual Search in the context of a large liberal arts university.

  4. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.

  5. Emory University: High-Throughput Protein-Protein Interaction Dataset for Lung Cancer-Associated Genes | Office of Cancer Genomics

    Science.gov (United States)

    To discover novel PPI signaling hubs for lung cancer, CTD2 Center at Emory utilized large-scale genomics datasets and literature to compile a set of lung cancer-associated genes. A library of expression vectors was generated for these genes and utilized for detecting pairwise PPIs with cell lysate-based TR-FRET assays in high-throughput screening format.

  6. Applying Shannon's information theory to bacterial and phage genomes and metagenomes

    Science.gov (United States)

    Akhter, Sajia; Bailey, Barbara A.; Salamon, Peter; Aziz, Ramy K.; Edwards, Robert A.

    2013-01-01

    All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available database, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
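
    The dependence on word size that the abstract mentions can be seen in a small calculation of Shannon's uncertainty, H = -sum(p * log2 p), over the words of a sequence. This is only an illustrative sketch: the abstract does not give the authors' exact formulation, so counting overlapping words of a fixed size is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(sequence, word_size=1):
    """Shannon uncertainty (bits) of the overlapping words in a sequence."""
    words = [sequence[i:i + word_size]
             for i in range(len(sequence) - word_size + 1)]
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A homopolymer carries no uncertainty; four equally frequent
# bases give the maximum of 2 bits per base.
h_low = shannon_entropy("AAAA")
h_high = shannon_entropy("ACGT")
```

    With a larger word size the number of possible words grows, so the attainable maximum rises as well, which is one reason a longer genome can reach higher values than a short one.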

  7. zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm.

    Science.gov (United States)

    Sand, Andreas; Kristiansen, Martin; Pedersen, Christian N S; Mailund, Thomas

    2013-11-22

    Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
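
    For reference, the plain forward algorithm whose cost the abstract describes (linear in the sequence length, quadratic in the number of states) can be sketched as below. This is the textbook version with per-step rescaling, written as an assumption-laden illustration: it is not the zipHMM interface and does not include the substring-reuse optimisation, and the function and parameter names are hypothetical.

```python
import math

def forward_log_likelihood(init, trans, emit, observations):
    """Log-likelihood of an observation sequence under a discrete HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of state i emitting symbol index o
    """
    if not observations:
        return 0.0
    n_states = len(init)
    alpha = [init[j] * emit[j][observations[0]] for j in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # One step costs O(K^2) for K states.
            alpha = [
                emit[j][observations[t]]
                * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf
        # Rescale to avoid underflow on long sequences.
        alpha = [a / scale for a in alpha]
        log_likelihood += math.log(scale)
    return log_likelihood
```

    Each observation triggers one pass over all state pairs, so a sequence of millions of observations repeats the same small computation many times; reusing the work for repeated substrings, as the abstract describes, targets exactly that repetition.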

  8. Human antibody fragments specific for the epidermal growth factor receptor selected from large non-immunised phage display libraries.

    Science.gov (United States)

    Souriau, Christelle; Rothacker, Julie; Hoogenboom, Hennie R; Nice, Edouard

    2004-09-01

    Antibodies to EGFR have been shown to display anti-tumour effects mediated in part by inhibition of cellular proliferation and angiogenesis, and by enhancement of apoptosis. Humanised antibodies are preferred for clinical use to reduce complications with HAMA and HAHA responses frequently seen with murine and chimaeric antibodies. We have used depletion and subtractive selection strategies on cells expressing the EGFR to sample two large antibody fragment phage display libraries for the presence of human antibodies which are specific for the EGFR. Four Fab fragments and six scFv fragments were identified, with affinities of up to 2.2nM as determined by BIAcore analysis using global fitting of the binding curves to obtain the individual rate constants (ka and kd). This overall approach offers a generic screening method for the identification of growth factor specific antibodies and antibody fragments from large expression libraries and has potential for the rapid development of new therapeutic and diagnostic reagents.

  9. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  10. Screening of a Brassica napus bacterial artificial chromosome library using highly parallel single nucleotide polymorphism assays

    Science.gov (United States)

    2013-01-01

    Background: Efficient screening of bacterial artificial chromosome (BAC) libraries with polymerase chain reaction (PCR)-based markers is feasible provided that a multidimensional pooling strategy is implemented. Single nucleotide polymorphisms (SNPs) can be screened in multiplexed format, therefore this marker type lends itself particularly well for medium- to high-throughput applications. Combining the power of multiplex-PCR assays with a multidimensional pooling system may prove to be especially challenging in a polyploid genome. In polyploid genomes two classes of SNPs need to be distinguished, polymorphisms between accessions (intragenomic SNPs) and those differentiating between homoeologous genomes (intergenomic SNPs). We have assessed whether the highly parallel Illumina GoldenGate® Genotyping Assay is suitable for the screening of a BAC library of the polyploid Brassica napus genome. Results: A multidimensional screening platform was developed for a Brassica napus BAC library which is composed of almost 83,000 clones. Intragenomic and intergenomic SNPs were included in Illumina’s GoldenGate® Genotyping Assay and both SNP classes were used successfully for screening of the multidimensional BAC pools of the Brassica napus library. An optimized scoring method is proposed which is especially valuable for SNP calling of intergenomic SNPs. Validation of the genotyping results by independent methods revealed a success of approximately 80% for the multiplex PCR-based screening regardless of whether intra- or intergenomic SNPs were evaluated. Conclusions: Illumina’s GoldenGate® Genotyping Assay can be efficiently used for screening of multidimensional Brassica napus BAC pools. SNP calling was specifically tailored for the evaluation of BAC pool screening data. The developed scoring method can be implemented independently of plant reference samples. It is demonstrated that intergenomic SNPs represent a powerful tool for BAC library screening of a polyploid genome

  11. Software engineering the mixed model for genome-wide association studies on large samples

    Science.gov (United States)

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  12. Functional genomics of tomato

    Indian Academy of Sciences (India)

    2014-10-20

    Oct 20, 2014 ... 1Repository of Tomato Genomics Resources, Department of Plant Sciences, School .... Due to its position at the crossroads of Sanger's sequencing .... replacement for the microarray-based expression profiling. .... during RNA fragmentation step prior to library construction, ...... tomato pollen as a test case.

  13. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    Science.gov (United States)

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationship between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.

  14. Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library.

    Science.gov (United States)

    Page, Roderic D M

    2011-05-23

    The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
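
    The matching step described in this abstract can be illustrated with a small sketch. It assumes a simple similarity ratio from Python's standard `difflib` module; the abstract does not say which approximate string matching algorithm BioStor uses, and the titles, the `normalise` helper and the `best_match` function are hypothetical.

```python
import re
from difflib import SequenceMatcher

def normalise(title):
    """Lowercase a title and drop punctuation so trivial differences are ignored."""
    kept = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return re.sub(r"\s+", " ", kept).strip()

def best_match(citation_title, candidates):
    """Return (score, candidate) for the candidate most similar to the cited title."""
    query = normalise(citation_title)
    scored = [
        (SequenceMatcher(None, query, normalise(c)).ratio(), c)
        for c in candidates
    ]
    return max(scored)

# Hypothetical archive titles, standing in for scanned-item metadata.
archive_titles = [
    "On a new species of Rana from Borneo.",
    "Notes on the birds of Sumatra",
    "A revision of the genus Rana",
]

score, title = best_match("On a New Species of Rana, from Borneo", archive_titles)
print(f"{score:.2f}  {title}")
```

    In practice a cutoff on the score would be needed to reject citations that have no counterpart in the archive; the right threshold is an empirical choice that this sketch does not attempt to make.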

  15. 15th ACM/IEEE-CS Joint Conference on Digital Libraries: Large, Dynamic and Ubiquitous - The Era of the Digital Library

    CERN Document Server

    2015-01-01

    Big Data is everywhere – from Computational Science to Digital Humanities, from Web Analytics to traditional libraries. While there do exist significant challenges in other areas, for many the biggest issue of all is a digital libraries one – How do we preserve big data collections? How do we provide access to big data collections? What new questions can we pose against our big data collections? These are all digital libraries questions. How can we, the digital libraries community, stand up in the face of these challenges and inform collection builders, curators, and interface developers how to best solve their challenges? What assumptions have we been working under that no longer hold in light of Big Data? These are some of the timely questions we hope to address at JCDL 2015.

  16. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing.

    Science.gov (United States)

    Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L

    2016-05-01

    Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina MiSeq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.

  17. Knowledgeability of Copyright Law among Librarians and Library Paraprofessionals Employed in Adult Services at a Large Public Library System.

    Science.gov (United States)

    Lavelle, Bridget M.

    Since public libraries contain copyrighted works in the form of print, electronic or audiovisual sources, librarians and library paraprofessionals need to possess sufficient knowledge of United States copyright law to meet the information needs of patrons successfully and legally. A literature review revealed that minimal works address this topic.…

  18. Unexpected observations after mapping LongSAGE tags to the human genome

    Directory of Open Access Journals (Sweden)

    Duret Laurent

    2007-05-01

    Background: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts. Results: Using a published error rate in SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that our dataset is highly reliable, we studied the tags that map once to the genome. 31% of these tags correspond to unannotated transcripts. The others map to known transcribed regions, but many of them (nearly half) are located either in antisense or in new variants of these known transcripts. Conclusion: We performed a comprehensive study of all publicly available human LongSAGE tags, and carefully verified the reliability of these data. We found the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been under-estimated. The frequency of tags matching the genome sequence once but not in an annotated exon suggests that the human transcriptome is much more complex than shown by the current human genome annotations, with many new splicing variants and antisense transcripts. SAGE data is appropriate to map new transcripts to the genome, as demonstrated by the high rate of cross

  19. Genomic profiling of plasmablastic lymphoma using array comparative genomic hybridization (aCGH): revealing significant overlapping genomic lesions with diffuse large B-cell lymphoma

    Directory of Open Access Journals (Sweden)

    Lu Xin-Yan

    2009-11-01

    Background: Plasmablastic lymphoma (PL) is a subtype of diffuse large B-cell lymphoma (DLBCL). Studies have suggested that tumors with PL morphology represent a group of neoplasms with clinicopathologic characteristics corresponding to different entities including extramedullary plasmablastic tumors associated with plasma cell myeloma (PCM). The goal of the current study was to evaluate the genetic similarities and differences among PL, DLBCL (AIDS-related and non AIDS-related) and PCM using array-based comparative genomic hybridization. Results: Examination of genomic data in PL revealed that the most frequent segmental gains (> 40%) include: 1p36.11-1p36.33, 1p34.1-1p36.13, 1q21.1-1q23.1, 7q11.2-7q11.23, 11q12-11q13.2 and 22q12.2-22q13.3. This correlated with segmental gains occurring in high frequency in DLBCL (AIDS-related and non AIDS-related) cases. There were some segmental gains and some segmental losses that occurred in PL but not in the other types of lymphoma, suggesting that these foci may contain genes responsible for the differentiation of this lymphoma. Additionally, some segmental gains and some segmental losses occurred only in PL and AIDS-associated DLBCL, suggesting that these foci may be associated with HIV infection. Furthermore, some segmental gains and some segmental losses occurred only in PL and PCM, suggesting that these lesions may be related to plasmacytic differentiation. Conclusion: To the best of our knowledge, the current study represents the first genomic exploration of PL. The genomic aberration pattern of PL appears to be more similar to that of DLBCL (AIDS-related or non AIDS-related) than to PCM. Our findings suggest that PL may remain best classified as a subtype of DLBCL at least at the genome level.

  20. The academic library network

    Directory of Open Access Journals (Sweden)

    Jacek Wojciechowski

    2012-01-01

    The efficiency of libraries, academic libraries in particular, necessitates organizational changes facilitating or even imposing co-operation. Any structure of any university has to have an integrated network of libraries, with an appropriate division of work, and one that is consolidated as much as possible into medium-size or large libraries. Within the network thus created, a chance arises to centralize the main library processes based on appropriate procedures in the main library, highly specialized, more effective and therefore cheaper in operation, including a co-ordination of all more important endeavours and tasks. Hierarchically subordinated libraries can thus be more focused on performing their routine service, more and more frequently providing for the whole of the university, and being able to adjust to changeable requirements and demands of patrons and of new tasks resulting from the new model of the university operation. Another necessary change seems to be a universal implementation of an overall programme framework that would include all services in the university’s library networks.

  1. Construction of a BAC library and identification of Dmrt1 gene of the rice field eel, Monopterus albus

    International Nuclear Information System (INIS)

    Jang Songhun; Zhou Fang; Xia Laixin; Zhao Wei; Cheng Hanhua; Zhou Rongjia

    2006-01-01

    A bacterial artificial chromosome (BAC) library was constructed using nuclear DNA from the rice field eel (Monopterus albus). The BAC library consists of a total of 33,000 clones with an average insert size of 115 kb. Based on the rice field eel haploid genome size of 600 Mb, the BAC library is estimated to contain approximately 6.3 genome equivalents and represents 99.8% of the genome of the rice field eel. This is the first BAC library constructed from this species. To estimate the possibility of isolating a specific clone, high-density colony hybridization-based library screening was performed using Dmrt1 cDNA of the rice field eel as a probe. Both library screening and PCR identification results revealed three positive BAC clones which overlapped and formed a contig of 195 kb covering the Dmrt1 gene. By sequence comparisons with the Dmrt1 cDNA and sequencing of the first four intron-exon junctions, the Dmrt1 gene of the rice field eel was predicted to contain four introns and five exons. The sizes of the first and second introns are 1.5 and 2.6 kb, respectively, and the sizes of the last two introns were predicted to be about 20 kb. The Dmrt1 gene structure was conserved in evolution. These results also indicate that the BAC library is a useful resource for BAC contig construction and molecular isolation of functional genes
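
    The coverage figures quoted in this abstract can be checked with a short calculation. The sketch below assumes the standard Clarke-Carbon estimate, P = 1 - (1 - f)^N with f = insert size / genome size; the abstract does not state which formula the authors used, and the function names are illustrative only.

```python
def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Total cloned DNA expressed as multiples of the haploid genome size."""
    return n_clones * insert_bp / genome_bp

def clarke_carbon_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence is present in at least one clone."""
    fraction_per_clone = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones

# Values taken from the abstract: 33,000 clones, 115 kb inserts, 600 Mb genome.
coverage = genome_equivalents(33_000, 115_000, 600_000_000)
prob = clarke_carbon_probability(33_000, 115_000, 600_000_000)
print(f"{coverage:.1f} genome equivalents, P = {prob:.1%}")
```

    Under this assumption the numbers agree with the reported 6.3 genome equivalents and 99.8% representation.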

  2. Complementation of radiation-sensitive Ataxia telangiectasia cells after transfection of cDNA expression libraries and cosmid clones from wildtype cells

    International Nuclear Information System (INIS)

    Fritz, E.

    1994-06-01

    In this Ph.D.-thesis, phenotypic complementation of AT-cells (AT5BIVA) by transfection of cDNA-expression-libraries was addressed: After stable transfection of cDNA-expression-libraries G418 resistant clones were selected for enhanced radioresistance by a fractionated X-ray selection. One surviving transfectant clone (clone 514) exhibited enhanced radiation resistance in dose-response experiments and further X-ray selections. Cell cycle analysis revealed complementation of untreated and irradiated 514-cells in cell cycle progression. The rate of DNA synthesis, however, is not diminished after irradiation but shows the reverse effect. A transfected cDNA-fragment (AT500-cDNA) was isolated from the genomic DNA of 514-cells and proved to be an unknown DNA sequence. A homologous sequence could be detected in genomic DNA from human cell lines, but not in DNA from other species. The cDNA-sequence could be localized to human chromosome 11. In human cells the cDNA sequence is part of two large mRNAs. 4 different cosmid clones containing high molecular genomic DNA from normal human cells could be isolated from a library, each hybridizing to the AT500-cDNA. After stable transfection into AT-cells, one cosmid-clone was able to confer enhanced radiation resistance both in X-ray selections and dose-response experiments. The results indicate that the cloned cDNA-fragment is based on an unknown gene from human chromosome 11 which partially complements the radiosensitivity and the defective cell cycle progression in AT5BIVA cells. (orig.) [de

  3. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  4. The TPAC Digital Library: A Web Application for Publishing Large Catalogs of Earth Science Data

    Science.gov (United States)

    Blain, P.; Pugh, T.

    2010-12-01

    The Tasmanian Partnership for Advanced Computing (TPAC) has developed a rich web-based application that publishes large catalogs of scientific datasets. The TPAC Digital Library provides a user interface for viewing, searching, and accessing the catalog data collections, as well as enabling data services for user access. The product also provides management functions for librarians of digital data collections. The search features allow files to be selected graphically based on geospatial extent, or by file name, variable name, attribute value, and by tag. Alternatively, there is a file manager style interface that provides a direct route to the data. The interface is specifically geared towards discovery and access of earth science data files, which makes it intuitive and easy to navigate. Files can be downloaded, or accessed through OPeNDAP, GridFTP, WCS, Matlab and other interfaces. The digital library can harvest metadata from THREDDS, Hyrax, IPCC catalogs and other instances of the digital library. The product is freely available under an open-source license, and is currently deployed by a small but active user base. It has existed since 2005, and remains under constant development by TPAC and other contributors (including the Australian Bureau of Meteorology). Current development initiatives will allow interoperability with library service protocols, as well as other data archive organizations and scientific bodies for data reference transparency. There is a project in progress that will allow the data collection’s owner to attach attribute information, access rights, and meta-data to the data collection to conform to various user community and service standards. Future releases will allow publishers to attach media rich information about the data collection, as well as additional information about scientific results, and papers and web pages that reference the data collection. The presentation will discuss the current implementation, and future directions.

  5. A borderless Library

    CERN Multimedia

    CERN Library

    2010-01-01

    The CERN Library has a large collection of documents in online or printed format in all disciplines needed by physicists, engineers and technicians. However,  users sometimes need to read documents not available at CERN. But don’t worry! Thanks to its Interlibrary loan and document delivery service, the CERN Library can still help you. Just fill in the online form or email us. We will then locate the document in other institutions and order it for you free of charge. The CERN Library cooperates with the largest libraries in Europe, such as ETH (Eidgenössische Technische Hochschule) in Zurich, TIB (Technische Informationsbibliothek) in Hanover and the British Library in London. Thanks to our network and our expertise in document search, most requests are satisfied in record time: articles are usually served in .pdf version a few hours after the order, and books or other printed materials are delivered within a few days. It is possible to ask for all types of documents suc...

  6. The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

    Directory of Open Access Journals (Sweden)

    Jason W. Sahl

    2014-04-01

    Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27–57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated
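
    The kind of matrix parsing mentioned in this abstract can be sketched as follows. The abstract does not define the score ratio, so the sketch assumes the usual definition: the bit score of a CDS aligned against a genome divided by the bit score of the CDS aligned against itself. All bit scores, genome names and the 0.8 / 0.4 thresholds are hypothetical, and this is not the LS-BSR code itself.

```python
def blast_score_ratio(query_bits, self_bits):
    """Ratio of a CDS's alignment bit score in a genome to its self-alignment score."""
    if self_bits <= 0:
        raise ValueError("self-alignment bit score must be positive")
    return query_bits / self_bits

# Hypothetical bit scores: one self score per CDS, one query score per genome.
self_scores = {"cdsA": 500.0, "cdsB": 300.0}
query_scores = {
    "cdsA": {"genome1": 500.0, "genome2": 490.0, "genome3": 100.0},
    "cdsB": {"genome1": 300.0, "genome2": 295.0, "genome3": 290.0},
}

# BSR matrix: rows are CDSs, columns are genomes.
bsr_matrix = {
    cds: {g: blast_score_ratio(bits, self_scores[cds]) for g, bits in row.items()}
    for cds, row in query_scores.items()
}

def clade_specific(matrix, in_group, out_group, present=0.8, absent=0.4):
    """CDSs conserved in every in-group genome and missing from every out-group genome."""
    return sorted(
        cds
        for cds, row in matrix.items()
        if all(row[g] >= present for g in in_group)
        and all(row[g] < absent for g in out_group)
    )

markers = clade_specific(bsr_matrix, ["genome1", "genome2"], ["genome3"])
print(markers)
```

    Here only cdsA qualifies, because cdsB is well conserved in the out-group genome as well.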

  7. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Abstract Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  8. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Directory of Open Access Journals (Sweden)

    Tran Duc

    2010-05-01

    Abstract Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the

  9. Millstone: software for multiplex microbial genome analysis and engineering.

    Science.gov (United States)

    Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  10. Professional medical library education in the United States in relation to the qualifications of medical library manpower in Ohio.

    Science.gov (United States)

    Rees, A M; Rothenberg, L; Denison, B

    1968-10-01

    The present system of education for medical library practice in the United States consists of four major components: graduate degree programs in library science with specialization in medical librarianship; graduate degree programs in library science with no such specialization; postgraduate internships in medical libraries; continuing education programs. Data are presented illustrating the flow of graduates along these several educational pathways into medical library practice. The relevance of these educational components to the current medical library work force is discussed with reference to manpower data compiled for Ohio. The total number of medical library personnel in Ohio in 1968 is 316. Of this total, only forty-two (approximately 14 percent) have received any formal library training. Seventy persons have only a high school education. From these figures, it is concluded that there is no standard or essential qualification which is universally accepted as educational preparation for work in medical libraries; that the comparative sophistication of the educational programs in medical librarianship has yet to be reflected widely in general medical library practice; that an increasingly large number of non-professional or ancillary personnel are being, and will continue to be, utilized in medical libraries; that large numbers of untrained persons have sole responsibility for medical libraries; and that appropriate educational programs will have to be designed specifically for this type of personnel.

  11. Jannovar: a java library for exome annotation.

    Science.gov (United States)

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
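
    The interval-tree lookup mentioned in this abstract can be sketched as below. This is a generic centered interval tree written for illustration, not Jannovar's Java implementation; the `Transcript` class, the coordinates and the transcript names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transcript:
    name: str
    start: int  # inclusive genomic start
    end: int    # inclusive genomic end

class IntervalTree:
    """Centered interval tree answering 'which intervals contain this position?'."""

    def __init__(self, intervals):
        # The center is a median endpoint, so at least one interval contains it
        # and both subtrees are strictly smaller than the input.
        points = sorted(p for iv in intervals for p in (iv.start, iv.end))
        self.center = points[len(points) // 2]
        here = [iv for iv in intervals if iv.start <= self.center <= iv.end]
        left = [iv for iv in intervals if iv.end < self.center]
        right = [iv for iv in intervals if iv.start > self.center]
        self.by_start = sorted(here, key=lambda iv: iv.start)
        self.by_end = sorted(here, key=lambda iv: iv.end, reverse=True)
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def query(self, pos):
        """Return every interval that contains pos."""
        hits = []
        if pos <= self.center:
            # Intervals stored here all end at or after the center,
            # so only the start needs checking.
            for iv in self.by_start:
                if iv.start > pos:
                    break
                hits.append(iv)
            if pos < self.center and self.left is not None:
                hits.extend(self.left.query(pos))
        else:
            # Intervals stored here all start at or before the center,
            # so only the end needs checking.
            for iv in self.by_end:
                if iv.end < pos:
                    break
                hits.append(iv)
            if self.right is not None:
                hits.extend(self.right.query(pos))
        return hits

tree = IntervalTree([
    Transcript("TX1", 100, 500),
    Transcript("TX2", 400, 900),
    Transcript("TX3", 1000, 1200),
])
affected = sorted(t.name for t in tree.query(450))
print(affected)
```

    A variant at position 450 falls inside both TX1 and TX2, while a position between transcripts, such as 950, returns an empty list.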

  12. I2G: code for conversion of isotope-ordered cross-section libraries into group-ordered cross-section libraries

    International Nuclear Information System (INIS)

    Resnik, W.M. II; Bosler, G.E.

    1977-09-01

    Many current reactor physics codes accept cross-section libraries in an isotope-ordered form, convert them with internal preprocessing routines to a group-ordered form, and then perform calculations using these group-ordered data. Occasionally, because of storage and time limitations, the preprocessing routines in these codes cannot convert very large multigroup isotope-ordered libraries. For this reason, the I2G code, i.e., ISOTXS to GRUPXS, was written to convert externally isotope-ordered cross section libraries in the standard file format called ISOTXS to group-ordered libraries in the standard format called GRUPXS. This code uses standardized multilevel data management routines which establish a strategy for the efficient conversion of large libraries. The I2G code is exportable contingent on access to, and an intimate familiarization with, the multilevel routines. These routines are machine dependent, and therefore must be provided by the importing facility. 6 figures, 3 tables

  13. A genomic library-based amplification approach (GL-PCR) for the mapping of multiple IS6110 insertion sites and strain differentiation of Mycobacterium tuberculosis.

    Science.gov (United States)

    Namouchi, Amine; Mardassi, Helmi

    2006-11-01

    Evidence suggests that insertion of the IS6110 element is not without consequence to the biology of Mycobacterium tuberculosis complex strains. Thus, mapping of multiple IS6110 insertion sites in the genome of biomedically relevant clinical isolates would result in a better understanding of the role of this mobile element, particularly with regard to transmission, adaptability and virulence. In the present paper, we describe a versatile strategy, referred to as GL-PCR, that amplifies IS6110-flanking sequences based on the construction of a genomic library. M. tuberculosis chromosomal DNA is fully digested with HincII and then ligated into a plasmid vector between T7 and T3 promoter sequences. The ligation reaction product is transformed into Escherichia coli, and selective PCR amplification targeting both 5' and 3' IS6110-flanking sequences is performed on the plasmid library DNA. For this purpose, four separate PCR reactions are performed, each combining an outward primer specific for one IS6110 end with either T7 or T3 primer. Determination of the nucleotide sequence of the PCR products generated from a single ligation reaction allowed mapping of 21 out of the 24 IS6110 copies of two 12-banded M. tuberculosis strains, yielding an overall sensitivity of 87.5%. Furthermore, by simply comparing the migration pattern of GL-PCR-generated products, the strategy proved to be as valuable as IS6110 RFLP for molecular typing of M. tuberculosis complex strains. Importantly, GL-PCR was able to discriminate between strains differing by a single IS6110 band.

  14. Reusable libraries for safety-critical Java

    DEFF Research Database (Denmark)

    Rios Rivas, Juan Ricardo; Schoeberl, Martin

    2014-01-01

    The large collection of Java class libraries is a main factor of the success of Java. However, these libraries assume that a garbage-collected heap is used. Safety-critical Java uses scope-based memory areas instead of a garbage-collected heap. Therefore, the Java class libraries are problematic...... to use in safety-critical Java. We have identified common programming patterns in the Java class libraries that make them unsuitable for safety-critical Java. We propose ways to improve the libraries to avoid the impact of the identified problematic patterns. We illustrate these changes by implementing...

  15. Design for manufacturability of a VDSM standard cell library

    International Nuclear Information System (INIS)

    Zhou Chong; Zeng Jianping; Chen Lan; Yin Minghui; Zhao Jie

    2012-01-01

    This paper presents a method of designing a 65 nm DFM standard cell library. By greatly reducing the size of the library, the process of optical proximity correction (OPC) becomes more efficient and the need for large storage is reduced. This library is more manufacturing-friendly, as each cell has been optimized according to the DFM rules and optical simulation. The area penalty is minor compared with a traditional library, and both timing and power performance are good. Furthermore, this library has passed the test from the Technology Design Department of Foundry. The results show that this DFM standard cell library has advantages that improve yield. (semiconductor integrated circuits)

  16. Generation of comprehensive transposon insertion mutant library for the model archaeon, Haloferax volcanii, and its use for gene discovery.

    Science.gov (United States)

    Kiljunen, Saija; Pajunen, Maria I; Dilks, Kieran; Storf, Stefanie; Pohlschroder, Mechthild; Savilahti, Harri

    2014-12-09

    Archaea share fundamental properties with bacteria and eukaryotes. Yet, they also possess unique attributes, which largely remain poorly characterized. Haloferax volcanii is an aerobic, moderately halophilic archaeon that can be grown in defined media. It serves as an excellent archaeal model organism to study the molecular mechanisms of biological processes and cellular responses to changes in the environment. Studies on haloarchaea have been impeded by the lack of efficient genetic screens that would facilitate the identification of protein functions and respective metabolic pathways. Here, we devised an insertion mutagenesis strategy that combined Mu in vitro DNA transposition and homologous-recombination-based gene targeting in H. volcanii. We generated an insertion mutant library, in which the clones contained a single genomic insertion. From the library, we isolated pigmentation-defective and auxotrophic mutants, and the respective insertions pinpointed a number of genes previously known to be involved in carotenoid and amino acid biosynthesis pathways, thus validating the performance of the methodologies used. We also identified mutants that had a transposon insertion in a gene encoding a protein of unknown or putative function, demonstrating that novel roles for non-annotated genes could be assigned. We have generated, for the first time, a random genomic insertion mutant library for a halophilic archaeon and used it for efficient gene discovery. The library will facilitate the identification of non-essential genes behind any specific biochemical pathway. It represents a significant step towards achieving a more complete understanding of the unique characteristics of halophilic archaea.

  17. Capturing the 'ome': the expanding molecular toolbox for RNA and DNA library construction.

    Science.gov (United States)

    Boone, Morgane; De Koker, Andries; Callewaert, Nico

    2018-04-06

    All sequencing experiments and most functional genomics screens rely on the generation of libraries to comprehensively capture pools of targeted sequences. In the past decade especially, driven by the progress in the field of massively parallel sequencing, numerous studies have comprehensively assessed the impact of particular manipulations on library complexity and quality, and characterized the activities and specificities of several key enzymes used in library construction. Fortunately, careful protocol design and reagent choice can substantially mitigate many of these biases, and enable reliable representation of sequences in libraries. This review aims to guide the reader through the vast expanse of literature on the subject to promote informed library generation, independent of the application.

  18. Pure chromosome-specific PCR libraries from single sorted chromosomes

    NARCIS (Netherlands)

    VanDevanter, D. R.; Choongkittaworn, N. M.; Dyer, K. A.; Aten, J. A.; Otto, P.; Behler, C.; Bryant, E. M.; Rabinovitch, P. S.

    1994-01-01

    Chromosome-specific DNA libraries can be very useful in molecular and cytogenetic genome mapping studies. We have developed a rapid and simple method for the generation of chromosome-specific DNA sequences that relies on polymerase chain reaction (PCR) amplification of a single flow-sorted

  19. Experience from large scale use of the EuroGenomics custom SNP chip in cattle

    DEFF Research Database (Denmark)

    Boichard, Didier A; Boussaha, Mekki; Capitan, Aurélien

    2018-01-01

    This article presents the strategy to evaluate candidate mutations underlying QTL or responsible for genetic defects, based upon the design and large-scale use of the Eurogenomics custom SNP chip set up for bovine genomic selection. Some variants under study originated from mapping genetic defect...

  20. Assessment of Insert Sizes and Adapter Content in Fastq Data from NexteraXT Libraries

    Directory of Open Access Journals (Sweden)

    Frances Susan Turner

    2014-01-01

    The Illumina NexteraXT transposon protocol is a cost-effective way to generate paired-end libraries. However, the resulting insert size is highly sensitive to the concentration of DNA used, and the variation of insert sizes is often large. One consequence of this is that some fragments may have an insert shorter than the length of a single read, particularly where the library is designed to produce overlapping paired-end reads in order to produce longer continuous sequences. Such small insert sizes mean fewer longer reads, and also result in the presence of adapter at the end of the read. A protocol is presented here that uses publicly available tools to identify read pairs with small insert sizes that are therefore likely to contain adapter, to check the sequence of the adapter, and to remove adapter sequence from the reads. This protocol does not require a reference genome or prior knowledge of the sequence to be trimmed. Whilst the presence of fragments with small insert sizes may be a particular problem for NexteraXT libraries, the principle can be applied to any Illumina dataset in which the presence of such small inserts is suspected.
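
    The principle behind detecting short inserts without a reference can be sketched in Python. This is not the published protocol, which relies on existing publicly available tools; it is a minimal illustration assuming error-free reads, and the function names, the `min_len` threshold, the toy insert and the adapter string are all placeholders. When an insert is shorter than the read length, the first k bases of read 1 equal the reverse complement of the first k bases of read 2, and everything after position k is adapter.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def short_insert_length(read1, read2, min_len=10):
    """Return the insert length if both reads run through a short insert, else None."""
    longest = min(len(read1), len(read2)) - 1
    for k in range(longest, min_len - 1, -1):
        if read1[:k] == revcomp(read2[:k]):
            return k
    return None


def trim_pair(read1, read2, min_len=10):
    """Trim both reads to the insert; also return the residue removed from read 1."""
    k = short_insert_length(read1, read2, min_len)
    if k is None:
        return read1, read2, ""
    return read1[:k], read2[:k], read1[k:]


# Toy pair: a 20-base insert sequenced with 30-base reads (all values are placeholders).
insert = "ACGTTGCAAGGCTTACCGTA"
adapter = "CTGTCTCTTATACACATCT"  # placeholder adapter string, an assumption
read_len = 30
read1 = (insert + adapter)[:read_len]
read2 = (revcomp(insert) + adapter)[:read_len]
trimmed1, trimmed2, residue = trim_pair(read1, read2)
```

    The residue recovered from many such pairs can be tallied to check what adapter sequence is actually present, which mirrors the "check the sequence of the adapter" step in the abstract. Real data would need mismatch tolerance, which this sketch omits.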

  1. Floating Collection in an Academic Library: An Audacious Experiment That Succeeded

    Science.gov (United States)

    Coopey, Barbara; Eshbach, Barbara; Notartomas, Trish

    2016-01-01

    Can a floating collection thrive in a large multicampus academic research library? Floating collections have been successful in public libraries for some time, but it is uncommon for academic libraries and unheard of for a large academic library system. This article will discuss the investigation into the feasibility of a floating collection at…

  2. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    OpenAIRE

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Abstract Background The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Res...

  3. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.

  4. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
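
    For readers unfamiliar with the assembly statistics quoted above, the Python sketch below shows how a scaffold N50 is conventionally computed and re-derives the 95 % figure from the two lengths given in the abstract. The scaffold lengths in the toy list are invented for illustration and the function name `n50` is an assumption; only the 1.31 Gb and 1.38 Gb values come from the abstract.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one scaffold with non-zero length")


# Invented toy scaffold lengths (arbitrary units), not olive data.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)

# Assembled fraction from the figures in the abstract.
assembled_gb = 1.31
estimated_genome_gb = 1.38
assembled_percent = 100 * assembled_gb / estimated_genome_gb
```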

  5. Construction of an American mink Bacterial Artificial Chromosome (BAC) library and sequencing candidate genes important for the fur industry

    DEFF Research Database (Denmark)

    Anistoroaei, Razvan Marian; Hallers, Boudewijn ten; Nefedov, Michael

    2011-01-01

    BACKGROUND: Bacterial artificial chromosome (BAC) libraries continue to be invaluable tools for the genomic analysis of complex organisms. Complemented by the newly and fast growing deep sequencing technologies, they provide an excellent source of information in genomics projects. RESULTS: Here, we...... report the construction and characterization of the CHORI-231 BAC library constructed from a Danish-farmed, male American mink (Neovison vison). The library contains approximately 165,888 clones with an average insert size of 170 kb, representing approximately 10-fold coverage. High-density filters, each...... consisting of 18,432 clones spotted in duplicate, have been produced for hybridization screening and are publicly available. Overgo probes derived from expressed sequence tags (ESTs), representing 21 candidate genes for traits important for the mink industry, were used to screen the BAC library...

  6. Leadership Diversity: A Study of Urban Public Libraries

    Science.gov (United States)

    Winston, Mark; Li, Haipeng

    2007-01-01

    Diversity has been identified as a priority in library and information services for some time. The limited published research on diversity programs in libraries, though, has focused on academic libraries. This article represents the results of a study of leadership diversity in large, urban public libraries. In the study of members of the Urban…

  7. Mixture-based combinatorial libraries from small individual peptide libraries: a case study on α1-antitrypsin deficiency.

    Science.gov (United States)

    Chang, Yi-Pin; Chu, Yen-Ho

    2014-05-16

    The design, synthesis and screening of diversity-oriented peptide libraries using a "libraries from libraries" strategy for the development of inhibitors of α1-antitrypsin deficiency are described. The major buttress of the biochemical approach presented here is the use of the well-established solid-phase split-and-mix method for the generation of mixture-based libraries. The combinatorial technique of iterative deconvolution was employed for library screening. While molecular diversity is the general consideration of combinatorial libraries, exquisite design through systematic screening of small individual libraries is a prerequisite for effective library screening and can avoid potential problems in some cases. This review will also illustrate how large peptide libraries were designed, as well as how a conformation-sensitive assay was developed based on the mechanism of the conformational disease. Finally, the combinatorially selected peptide inhibitor capable of blocking abnormal protein aggregation will be characterized by biophysical, cellular and computational methods.

  8. Construction of a 7-fold BAC library and cytogenetic mapping of 10 genes in the giant panda (Ailuropoda melanoleuca)

    OpenAIRE

    Liu, Wei; Zhao, Yonghui; Liu, Zhaoliang; Zhang, Ying; Lian, Zhengxing; Li, Ning

    2006-01-01

    Abstract Background The giant panda, one of the most primitive carnivores, is an endangered animal. Although it has been the subject of many interesting studies during recent years, little is known about its genome. In order to promote research on this genome, a bacterial artificial chromosome (BAC) library of the giant panda was constructed in this study. Results This BAC library contains 198,844 clones with an average insert size of 108 kb, which represents approximately seven equivalents o...
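
    What seven genome equivalents means in practice can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one clone and N is the number of clones. The abstract does not give this calculation; the Python sketch below is an added estimate that assumes random, unbiased cloning and takes f = coverage / N from the reported figures.

```python
import math

# Figures taken from the abstract.
clones = 198_844
coverage = 7.0  # "approximately seven equivalents"

# Fraction of the genome per clone: insert / genome = coverage / clones.
fraction_per_clone = coverage / clones

# Clarke-Carbon probability that a given locus is in at least one clone.
p_locus_present = 1.0 - (1.0 - fraction_per_clone) ** clones

# Poisson approximation of the same quantity for large libraries.
p_poisson = 1.0 - math.exp(-coverage)

print(f"P(locus represented) = {p_locus_present:.4f}")
```

    Real libraries deviate from this idealised figure because some regions clone poorly, so the result is best read as an upper estimate.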

  9. A Rapid and Efficient Method for Purifying High Quality Total RNA from Peaches (Prunus persica) for Functional Genomics Analyses

    Directory of Open Access Journals (Sweden)

    LEE MEISEL

    2005-01-01

    Prunus persica has been proposed as a genomic model for deciduous trees and the Rosaceae family. Optimized protocols for RNA isolation are necessary to further advance studies in this model species such that functional genomics analyses may be performed. Here we present an optimized protocol to rapidly and efficiently purify high-quality total RNA from peach fruits (Prunus persica). Isolating high-quality RNA from fruit tissue is often difficult due to large quantities of polysaccharides and polyphenolic compounds that accumulate in this tissue and co-purify with the RNA. Here we demonstrate that a modified version of the method used to isolate RNA from pine trees and the woody plant Cinnamomum tenuipilum is ideal for isolating high-quality RNA from the fruits of Prunus persica. This RNA may be used for many functional genomics experiments such as RT-PCR and the construction of large-insert cDNA libraries.

  10. Experience of Google's latest deep learning library, TensorFlow, in a large-scale WLCG cluster

    Energy Technology Data Exchange (ETDEWEB)

    Kawamura, Gen; Smith, Joshua Wyatt; Quadt, Arnulf [II. Physikalisches Institut, Georg-August-Universitaet Goettingen (Germany)

    2016-07-01

    The researchers at the Google Brain team released their second-generation deep learning library, TensorFlow, as an open-source package under the Apache 2.0 license in November 2015. Google has already deployed the first-generation library, DistBelief, in various systems such as Google Search, advertising systems, speech recognition systems, Google Images, Google Maps, Street View, Google Translate and many other recent products. In addition, many researchers in high energy physics have recently started to understand and use deep learning algorithms in their own research and analysis. We conceive a first use-case scenario of TensorFlow to create deep learning models from high-dimensional inputs like physics analysis data in a large-scale WLCG computing cluster. TensorFlow carries out computations using a dataflow model and graph structure on a wide variety of hardware platforms and systems, such as many CPU architectures, GPUs and smartphone platforms. Having a single library that can distribute the computations to create a model to the various platforms and systems would significantly simplify the use of deep learning algorithms in high energy physics. We deploy TensorFlow in Docker container environments and present the first use in our grid system.

  11. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    Science.gov (United States)

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.
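
    The scale of the figures above can be made concrete with a few lines of Python. This is only arithmetic on numbers quoted in the abstract; treating all 600 Gb of shotgun data as usable genomic sequence is an assumption, and "as much as 19 Gb" is an upper bound, so both results are rough ceilings rather than measured values.

```python
# Figures taken from the abstract.
shotgun_data_gb = 600
estimated_genome_gb = 32
single_copy_upper_gb = 19

# Mean sequencing depth if all shotgun data were usable genomic sequence.
mean_depth = shotgun_data_gb / estimated_genome_gb

# Upper bound on the fraction of the genome that may be single copy.
single_copy_fraction = single_copy_upper_gb / estimated_genome_gb

print(f"mean depth: about {mean_depth:.2f}x")
print(f"single-copy fraction: up to {single_copy_fraction:.0%}")
```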

  12. Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma

    NARCIS (Netherlands)

    Cerhan, James R.; Berndt, Sonja I.; Vijai, Joseph; Ghesquières, Hervé; McKay, James; Wang, Sophia S.; Wang, Zhaoming; Yeager, Meredith; Conde, Lucia; De Bakker, Paul I W; Nieters, Alexandra; Cox, David; Burdett, Laurie; Monnereau, Alain; Flowers, Christopher R.; De Roos, Anneclaire J.; Brooks-Wilson, Angela R.; Lan, Qing; Severi, Gianluca; Melbye, Mads; Gu, Jian; Jackson, Rebecca D.; Kane, Eleanor; Teras, Lauren R.; Purdue, Mark P.; Vajdic, Claire M.; Spinelli, John J.; Giles, Graham G.; Albanes, Demetrius; Kelly, Rachel S.; Zucca, Mariagrazia; Bertrand, Kimberly A.; Zeleniuch-Jacquotte, Anne; Lawrence, Charles; Hutchinson, Amy; Zhi, Degui; Habermann, Thomas M.; Link, Brian K.; Novak, Anne J.; Dogan, Ahmet; Asmann, Yan W.; Liebow, Mark; Thompson, Carrie A.; Ansell, Stephen M.; Witzig, Thomas E.; Weiner, George J.; Veron, Amelie S.; Zelenika, Diana; Tilly, Hervé; Haioun, Corinne; Molina, Thierry Jo; Hjalgrim, Henrik; Glimelius, Bengt; Adami, Hans Olov; Bracci, Paige M.; Riby, Jacques; Smith, Martyn T.; Holly, Elizabeth A.; Cozen, Wendy; Hartge, Patricia; Morton, Lindsay M.; Severson, Richard K.; Tinker, Lesley F.; North, Kari E.; Becker, Nikolaus; Benavente, Yolanda; Boffetta, Paolo; Brennan, Paul; Foretova, Lenka; Maynadie, Marc; Staines, Anthony; Lightfoot, Tracy; Crouch, Simon; Smith, Alex; Roman, Eve; Diver, W. Ryan; Offit, Kenneth; Zelenetz, Andrew; Klein, Robert J.; Villano, Danylo J.; Zheng, Tongzhang; Zhang, Yawei; Holford, Theodore R.; Kricker, Anne; Turner, Jenny; Southey, Melissa C.; Clavel, Jacqueline; Virtamo, Jarmo; Weinstein, Stephanie; Riboli, Elio; Vineis, Paolo; Kaaks, Rudolph; Trichopoulos, Dimitrios; Vermeulen, Roel C H; Boeing, Heiner; Tjonneland, Anne; Angelucci, Emanuele; Di Lollo, Simonetta; Rais, Marco; Birmann, Brenda M.; Laden, Francine; Giovannucci, Edward; Kraft, Peter; Huang, Jinyan; Ma, Baoshan; Ye, Yuanqing; Chiu, Brian C H; Sampson, Joshua; Liang, Liming; Park, Ju Hyun; Chung, Charles C.; Weisenburger, Dennis D.; Chatterjee, Nilanjan; Fraumeni, Joseph F.; Slager, Susan L.; Wu, Xifeng; De Sanjose, Silvia; Smedby, Karin E.; Salles, Gilles; Skibola, Christine F.; Rothman, Nathaniel; Chanock, Stephen J.

    2014-01-01

    Diffuse large B cell lymphoma (DLBCL) is the most common lymphoma subtype and is clinically aggressive. To identify genetic susceptibility loci for DLBCL, we conducted a meta-analysis of 3 new genome-wide association studies (GWAS) and 1 previous scan, totaling 3,857 cases and 7,666 controls of

  13. "After the Genome 5 Conference" to be held October 6-10, 1999 in Jackson Hole, Wyoming

    Energy Technology Data Exchange (ETDEWEB)

    Roger Brent

    1999-10-06

    OAK B139 The postgenomic era is arriving faster than anyone had imagined; sometime during 2000 we'll have a large fraction of the human genome sequence. Heretofore, our understanding of function has come from non-industrial experiments whose conclusions were largely framed in human language. The advent of large amounts of sequence data, and of "functional genomic" data types such as mRNA expression data, has changed this picture. These data share the feature that individual observations and measurements are typically relatively low value adding. Such data are now being generated so rapidly that the amount of information contained in them will surpass the amount of biological information collected by traditional means. It is tantalizing to envision using genomic information to create a quantitative biology with a very strong data component. Unfortunately, we are very early in our understanding of how to "compute on" genomic information so as to extract biological knowledge from it. In fact, some current efforts to come to grips with genomic information often resemble a computer-savvy library science, where the most important issues concern categories, classification schemes, and information retrieval. When exploring new libraries, a measure of cataloging and inventory is surely inevitable. However, at some point we will need to move from library science to scholarship. We would like to achieve a quantitative and predictive understanding of biological function. We realize that making the bridge from knowledge of systems to the sets of abstractions that constitute computable entities is not easy. The After the Genome meetings were started in 1995 to help the biological community think about and prepare for the changes in biological research in the face of the oncoming flow of genomic information. The term "After the Genome" refers to a future in which complete inventories of the gene products of

  14. High-throughput expression of animal venom toxins in Escherichia coli to generate a large library of oxidized disulphide-reticulated peptides for drug discovery.

    Science.gov (United States)

    Turchetto, Jeremy; Sequeira, Ana Filipa; Ramond, Laurie; Peysson, Fanny; Brás, Joana L A; Saez, Natalie J; Duhoo, Yoan; Blémont, Marilyne; Guerreiro, Catarina I P D; Quinton, Loic; De Pauw, Edwin; Gilles, Nicolas; Darbon, Hervé; Fontes, Carlos M G A; Vincentelli, Renaud

    2017-01-17

    Animal venoms are complex molecular cocktails containing a wide range of biologically active disulphide-reticulated peptides that target, with high selectivity and efficacy, a variety of membrane receptors. Disulphide-reticulated peptides have evolved to display improved specificity, low immunogenicity and to show much higher resistance to degradation than linear peptides. These properties make venom peptides attractive candidates for drug development. However, recombinant expression of reticulated peptides containing disulphide bonds is challenging, especially when associated with the production of large libraries of bioactive molecules for drug screening. To date, as an alternative to artificial synthetic chemical libraries, no comprehensive recombinant libraries of natural venom peptides are accessible for high-throughput screening to identify novel therapeutics. In the accompanying paper an efficient system for the expression and purification of oxidized disulphide-reticulated venom peptides in Escherichia coli is described. Here we report the development of a high-throughput automated platform, that could be adapted to the production of other families, to generate the largest ever library of recombinant venom peptides. The peptides were produced in the periplasm of E. coli using redox-active DsbC as a fusion tag, thus allowing the efficient formation of correctly folded disulphide bridges. TEV protease was used to remove fusion tags and recover the animal venom peptides in the native state. Globally, within nine months, out of a total of 4992 synthetic genes encoding a representative diversity of venom peptides, a library containing 2736 recombinant disulphide-reticulated peptides was generated. The data revealed that the animal venom peptides produced in the bacterial host were natively folded and, thus, are putatively biologically active. Overall this study reveals that high-throughput expression of animal venom peptides in E. coli can generate large

  15. Genomics Portals: integrative web-platform for mining genomics data.

    Science.gov (United States)

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  16. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada.

    Science.gov (United States)

    Kuzmina, Maria L; Braukmann, Thomas W A; Fazekas, Aron J; Graham, Sean W; Dewaard, Stephanie L; Rodrigues, Anuar; Bennett, Bruce A; Dickinson, Timothy A; Saarela, Jeffery M; Catling, Paul M; Newmaster, Steven G; Percy, Diana M; Fenneman, Erin; Lauron-Moreau, Aurélien; Ford, Bruce; Gillespie, Lynn; Subramanyam, Ragupathy; Whitton, Jeannette; Jennings, Linda; Metsger, Deborah; Warne, Connor P; Brown, Allison; Sears, Elizabeth; Dewaard, Jeremy R; Zakharov, Evgeny V; Hebert, Paul D N

    2017-12-01

    Constructing complete, accurate plant DNA barcode reference libraries can be logistically challenging for large-scale floras. Here we demonstrate the promise and challenges of using herbarium collections for building a DNA barcode reference library for the vascular plant flora of Canada. Our study examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada (98%). For 98% of the specimens, at least one of the DNA barcode regions was recovered from the plastid loci rbcL and matK and from the nuclear ITS2 region. We used beta regression to quantify the effects of age, type of preservation, and taxonomic affiliation (family) on DNA sequence recovery. Specimen age and method of preservation had significant effects on sequence recovery for all markers, but influenced some families more (e.g., Boraginaceae) than others (e.g., Asteraceae). Our DNA barcode library represents an unparalleled resource for metagenomic and ecological genetic research working on temperate and arctic biomes. An observed decline in sequence recovery with specimen age may be associated with poor primer matches, intragenomic variation (for ITS2), or inhibitory secondary compounds in some taxa.

  17. Versatile P(acman) BAC Libraries for Transgenesis Studies in Drosophila melanogaster

    Energy Technology Data Exchange (ETDEWEB)

    Venken, Koen J.T.; Carlson, Joseph W.; Schulze, Karen L.; Pan, Hongling; He, Yuchun; Spokony, Rebecca; Wan, Kenneth H.; Koriabine, Maxim; de Jong, Pieter J.; White, Kevin P.; Bellen, Hugo J.; Hoskins, Roger A.

    2009-04-21

    We constructed Drosophila melanogaster BAC libraries with 21-kb and 83-kb inserts in the P(acman) system. Clones representing 12-fold coverage and encompassing more than 95% of annotated genes were mapped onto the reference genome. These clones can be integrated into predetermined attP sites in the genome using Phi C31 integrase to rescue mutations. They can be modified through recombineering, for example to incorporate protein tags and assess expression patterns.

  18. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    Science.gov (United States)

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most

  19. Genomic resources for water yam (Dioscorea alata L.): analyses of EST-Sequences, De Novo sequencing and GBS libraries

    Science.gov (United States)

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources such as SSRs, SNPs and InDels in several model and non-model plant species. Yam (Dioscorea spp.) i...

  20. Genome-Wide Mutagenesis in Borrelia burgdorferi.

    Science.gov (United States)

    Lin, Tao; Gao, Lihui

    2018-01-01

    Signature-tagged mutagenesis (STM) is a functional genomics approach to identify bacterial virulence determinants and virulence factors by simultaneously screening multiple mutants in a single host animal, and has been utilized extensively for the study of bacterial pathogenesis, host-pathogen interactions, and spirochete and tick biology. Signature-tagged transposon mutagenesis has been developed to investigate virulence determinants and pathogenesis of Borrelia burgdorferi. Mutants in genes important in virulence are identified by negative selection, in which the mutants fail to colonize or disseminate in the animal host and tick vector. The STM procedure combined with Luminex Flex ® Map™ technology and next-generation sequencing (e.g., Tn-seq) provides powerful high-throughput tools for the determination of Borrelia burgdorferi virulence determinants. The assessment of multiple tissue sites and two DNA resources at two different time points using Luminex Flex ® Map™ technology provides a robust data set. B. burgdorferi transposon mutant screening indicates that a high proportion of genes are novel virulence determinants that are required for mouse and tick infection. In this protocol, an effective signature-tagged Himar1-based transposon suicide vector was developed and used to generate a sequence-defined library of nearly 4800 mutants in the infectious B. burgdorferi B31 clone. In STM, signature-tagged suicide vectors are constructed by inserting unique DNA sequences (tags) into the transposable elements. The signature-tagged transposon mutants are generated when transposon suicide vectors are transformed into an infectious B. burgdorferi clone, and the transposable element, carrying the signature tag, is transposed into a 5'-TA-3' sequence in the B. burgdorferi genome. The resulting transposon library consists of many sub-libraries, each containing several hundred mutants with the same tag. A group of mice or ticks are infected with a mixed

  1. A large synthetic peptide and phosphopeptide reference library for mass spectrometry–based proteomics

    NARCIS (Netherlands)

    Marx, H.; Lemeer, S.; Schliep, J.E.; Matheron, L.I.; Mohammed, S.; Cox, J.; Mann, M.; Heck, A.J.R.; Kuster, B.

    2013-01-01

    We present a peptide library and data resource of >100,000 synthetic, unmodified peptides and their phosphorylated counterparts with known sequences and phosphorylation sites. Analysis of the library by mass spectrometry yielded a data set that we used to evaluate the merits of different search

  2. A simple, rapid and efficient method for the extraction of genomic ...

    African Journals Online (AJOL)

    The isolation of intact, high-molecular-mass genomic DNA is essential for many molecular biology applications including long range PCR, endonuclease restriction digestion, southern blot analysis, and genomic library construction. Many protocols are available for the extraction of DNA from plant material, but obtain it is ...

  3. Large scale genomic reorganization of topological domains at the HoxD locus.

    Science.gov (United States)

    Fabre, Pierre J; Leleu, Marion; Mormann, Benjamin H; Lopez-Delisle, Lucille; Noordermeer, Daan; Beccari, Leonardo; Duboule, Denis

    2017-08-07

    The transcriptional activation of HoxD genes during mammalian limb development involves dynamic interactions with two topologically associating domains (TADs) flanking the HoxD cluster. In particular, the activation of the most posterior HoxD genes in developing digits is controlled by regulatory elements located in the centromeric TAD (C-DOM) through long-range contacts. To assess the structure-function relationships underlying such interactions, we measured compaction levels and TAD discreteness using a combination of chromosome conformation capture (4C-seq) and DNA FISH. We assessed the robustness of the TAD architecture by using a series of genomic deletions and inversions that impact the integrity of this chromatin domain and that remodel long-range contacts. We report multi-partite associations between HoxD genes and up to three enhancers. We find that the loss of native chromatin topology leads to the remodeling of TAD structure following distinct parameters. Our results reveal that the recomposition of TAD architectures after large genomic re-arrangements is dependent on a boundary-selection mechanism in which CTCF mediates the gating of long-range contacts in combination with genomic distance and sequence specificity. Accordingly, the building of a recomposed TAD at this locus depends on distinct functional and constitutive parameters.

  4. The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

    Science.gov (United States)

    González-Recio, O; Jiménez-Montero, J A; Alenda, R

    2013-01-01

    In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy
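The sequential fitting described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the weak learner is a single-marker least-squares regression on the current residuals, and the function name `boost` and the parameters `shrinkage` and `marker_fraction` are hypothetical names chosen for illustration. Setting `marker_fraction` below 1 mimics the "random boosting" idea of drawing a random subset of markers at each step.

```python
import numpy as np

def boost(X, y, n_iter=300, shrinkage=0.1, marker_fraction=1.0, seed=0):
    """L2-boosting with single-marker weak learners (illustrative sketch).

    X: (n, p) genotype matrix, y: (n,) phenotypes.
    marker_fraction < 1 draws a random subset of markers per iteration.
    Returns (intercept, coef) so that predictions are intercept + X @ coef.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centred markers
    ss = (Xc ** 2).sum(axis=0)           # per-marker sum of squares
    ss[ss == 0] = np.inf                 # monomorphic markers are never chosen
    coef = np.zeros(p)
    resid = y - y.mean()                 # residuals of the current committee
    k = max(1, int(round(marker_fraction * p)))
    for _ in range(n_iter):
        cand = rng.choice(p, size=k, replace=False)   # candidate markers
        num = Xc[:, cand].T @ resid
        gain = num ** 2 / ss[cand]       # drop in squared error per candidate
        best = int(np.argmax(gain))
        m = cand[best]
        slope = num[best] / ss[m]        # least-squares slope on the residuals
        coef[m] += shrinkage * slope     # shrunken update of the committee
        resid -= shrinkage * slope * Xc[:, m]
    intercept = y.mean() - x_mean @ coef
    return intercept, coef
```

Because each iteration scans only `k` markers, the per-iteration cost falls roughly in proportion to `marker_fraction`, which is the intuition behind the reported reduction in computing time; the actual speed-up depends on implementation details not given in the abstract.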

  5. Non PCR-amplified Transcripts and AFLP fragments as reduced representations of the quail genome for 454 Titanium sequencing

    Directory of Open Access Journals (Sweden)

    Leterrier Christine

    2010-07-01

    Background: SNP (Single Nucleotide Polymorphism) discovery is now routinely performed using high-throughput sequencing of reduced representation libraries. Our objective was to adapt 454 GS FLX based sequencing methodologies in order to obtain the largest possible dataset from two reduced representation libraries, produced by AFLP (Amplified Fragment Length Polymorphism) for genomic DNA, and EST (Expressed Sequence Tag) for the transcribed fraction of the genome. Findings: The expressed fraction was obtained by preparing cDNA libraries without PCR amplification from quail embryo and brain. To optimize the information content for SNP analyses, libraries were prepared from individuals selected in three quail lines and each individual in the AFLP library was tagged. Sequencing runs produced 399,189 sequence reads from cDNA and 373,484 from genomic fragments, covering close to 250 Mb of sequence in total. Conclusions: Both methods used to obtain reduced representations for high-throughput sequencing were successful after several improvements. The protocols may be used for several sequencing applications, such as de novo sequencing, tagged PCR fragments or long fragment sequencing of cDNA.

  6. Development of the adjusted nuclear cross-section library based on JENDL-3.2 for large FBR

    International Nuclear Information System (INIS)

    Yokoyama, Kenji; Ishikawa, Makoto; Numata, Kazuyuki

    1999-04-01

    JNC (and PNC) had developed an adjusted nuclear cross-section library that reflects the results of the JUPITER experiments. Using this adjusted library, a distinct improvement in the accuracy of the nuclear design of FBR cores was achieved. In recent research, JNC has been developing a database of other integral data in addition to the JUPITER experiments, aiming at further improvement in accuracy and reliability. In 1991, the adjusted library based on JENDL-2, JFS-3-J2(ADJ91R), was developed, and it has been used in design research for FBRs. However, JENDL-3.2 is now the evaluated nuclear library in use. Therefore, the authors developed an adjusted library based on JENDL-3.2, called JFS-3-J3.2(ADJ98). The adjusted library based on JENDL-2 is known to overestimate the sodium void reactivity worth by 10-20%, and the adjusted library based on JENDL-3.2 is expected to solve this problem. JFS-3-J3.2(ADJ98) was produced with the same method as JFS-3-J2(ADJ91R) and used more integral parameters from the JUPITER experiments. This report also describes the design accuracy estimation for a 600 MWe class FBR with JFS-3-J3.2(ADJ98). Its main nuclear design parameters (multiplication factor, burn-up reactivity loss, breeding ratio, etc.), except the sodium void reactivity worth, calculated with JFS-3-J3.2(ADJ98) are almost the same as those predicted with JFS-3-J2(ADJ91R). As for the sodium void reactivity, JFS-3-J3.2(ADJ98) gives an estimate about 4% smaller than JFS-3-J2(ADJ91R) because of the change of the basic nuclear library from JENDL-2 to JENDL-3.2. (author)

  7. A stochastic de novo assembly algorithm for viral-sized genomes obtains correct genomes and builds consensus

    NARCIS (Netherlands)

    Bucur, Doina

    2017-01-01

    A genetic algorithm with stochastic macro mutation operators which merge, split, move, reverse and align DNA contigs on a scaffold is shown to accurately and consistently assemble raw DNA reads from an accurately sequenced single-read library into a contiguous genome. A candidate solution is a

  8. D-GENIES: dot plot large genomes in an interactive, efficient and simple way.

    Science.gov (United States)

    Cabanettes, Floréal; Klopp, Christophe

    2018-01-01

    Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in the input sequence size. D-GENIES is a standalone and web application that performs large genome alignments using the minimap2 software package and generates interactive dot plots. It enables users to sort query sequences along the reference, zoom in on the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
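To make the idea of a dot plot concrete, the following minimal sketch lists the coordinates of shared k-mers between two sequences; plotting those coordinates gives the familiar diagonal lines. This is a naive exact-match approach for short sequences and is not how D-GENIES works (the abstract states that it relies on minimap2 alignments); the function name `kmer_dotplot` and the parameter `k` are hypothetical.

```python
def kmer_dotplot(seq_a, seq_b, k=4):
    """Return (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Index every k-mer position in seq_b once, then look up k-mers of seq_a.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    points = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            points.append((i, j))
    return points
```

Runs of points along a diagonal indicate conserved segments, off-diagonal parallel runs indicate repeats, and gaps or shifts in the diagonal indicate breaks. Detecting inversions would additionally require matching reverse-complemented k-mers, which this sketch omits.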

  9. Genomics Portals: integrative web-platform for mining genomics data

    Directory of Open Access Journals (Sweden)

    Ghosh Krishnendu

    2010-01-01

    Background: A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results: The Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion: The integrated access to primary genomics data, functional knowledge and analytical tools makes the Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  10. Nursing educator's satisfaction with library facilities.

    OpenAIRE

    Lenz, E R; Waltz, C F

    1982-01-01

    This study examined nursing faculty perceptions of the importance of adequate library facilities and their satisfaction with them. Library facilities ranked highest in importance among all job characteristics studied, with faculty who had been most productive in terms of publication assigning the highest value to them. A moderate level of satisfaction was found. Faculty most satisfied with library facilities were those teaching in large schools of nursing with graduate programs and open organ...

  11. Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines.

    Directory of Open Access Journals (Sweden)

    Mark Eppinger

    2006-07-01

    Helicobacter pylori infection of humans is so old that its population genetic structure reflects that of ancient human migrations. A closely related species, Helicobacter acinonychis, is specific for large felines, including cheetahs, lions, and tigers, whereas hosts more closely related to humans harbor more distantly related Helicobacter species. This observation suggests a jump between host species. But who ate whom and when did it happen? In order to resolve this question, we determined the genomic sequence of H. acinonychis strain Sheeba and compared it to genomes from H. pylori. The conserved core genes between the genomes are so similar that the host jump probably occurred within the last 200,000 (range 50,000-400,000) years. However, the Sheeba genome also possesses unique features that indicate the direction of the host jump, namely from early humans to cats. Sheeba possesses an unusually large number of highly fragmented genes, many encoding outer membrane proteins, which may have been destroyed in order to bypass deleterious responses from the feline host immune system. In addition, the few Sheeba-specific genes that were found include a cluster of genes encoding sialylation of the bacterial cell surface carbohydrates, which were imported by horizontal genetic exchange and might also help to evade host immune defenses. These results provide a genomic basis for elucidating molecular events that allow bacteria to adapt to novel animal hosts.

  12. Genome-wide mapping of autonomous promoter activity in human cells.

    Science.gov (United States)

    van Arensbergen, Joris; FitzPatrick, Vincent D; de Haas, Marcel; Pagie, Ludo; Sluimer, Jasper; Bussemaker, Harmen J; van Steensel, Bas

    2017-02-01

    Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of the sequences that could be tested. Here we present 'survey of regulatory elements' (SuRE), a method that assays more than 10^8 DNA fragments, each 0.2-2 kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library of random genomic fragments upstream of a 20-bp barcode is constructed, and decoded by paired-end sequencing. This library is used to transfect cells, and barcodes in transcribed RNA are quantified by high-throughput sequencing. When applied to the human genome, we achieve 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide in K562 cells. By computational modeling we delineate subregions within promoters that are relevant for their activity. We show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites.

  13. Genomics With Cloud Computing

    Directory of Open Access Journals (Sweden)

    Sukhamrit Kaur

    2015-04-01

    Abstract: Genomics is the study of genomes, which generates large amounts of data requiring large storage and computational power. These issues are addressed by cloud computing, which provides various cloud platforms for genomics. These platforms provide many services to users, such as easy access to data, easy sharing and transfer, storage in the hundreds of terabytes, and more computational power. Some cloud platforms are Google Genomics, DNAnexus and Globus Genomics. Cloud computing offers genomics features such as easy access to and sharing of data, data security, and lower cost of resources, but there are still some drawbacks, such as the long time needed to transfer data and limited network bandwidth.

  14. A novel genome-wide microsatellite resource for species of Eucalyptus with linkage-to-physical correspondence on the reference genome sequence.

    Science.gov (United States)

    Grattapaglia, Dario; Mamani, Eva M C; Silva-Junior, Orzenil B; Faria, Danielle A

    2015-03-01

    Keystone species in their native ranges, eucalypts are ecologically and genetically very diverse, growing naturally along extensive latitudinal and altitudinal ranges and variable environments. Besides their ecological importance, eucalypts are also the most widely planted trees for sustainable forestry in the world. We report the development of a novel collection of 535 microsatellites for species of Eucalyptus, 494 designed from ESTs and 41 from genomic libraries. A selected subset of 223 was evaluated for individual identification, parentage testing, and ancestral information content in the two most extensively studied species, Eucalyptus grandis and Eucalyptus globulus. Microsatellites showed high transferability and overlapping allele size range, suggesting that they arose in the common ancestor of the two species and confirming the extensive genome conservation between them. A consensus linkage map with 437 microsatellites, the most comprehensive microsatellite-only genetic map for Eucalyptus, was built by assembling segregation data from three mapping populations and anchored to the Eucalyptus genome. An overall colinearity between recombination-based and physical positioning of 84% of the mapped microsatellites was observed, with some ordering discrepancies and sporadic locus duplications, consistent with the recently described whole genome duplication events in Eucalyptus. The linkage map covered 95.2% of the 605.8-Mbp assembled genome sequence, placing one microsatellite every 1.55 Mbp on average, and an overall estimate of physical to recombination distance of 618 kbp/cM. The genetic parameters estimates together with linkage and physical position data for this large set of microsatellites should assist marker choice for genome-wide population genetics and comparative mapping in Eucalyptus. © 2014 John Wiley & Sons Ltd.

  15. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  16. Supports for libraries' restoration from the Great East Japan Earthquake: Challenges we address at Miyagi Prefectural Library

    Science.gov (United States)

    Kumagai, Shinichiro

    This article gives an overview of the damage to, and reconstruction of, mainly public libraries in Miyagi Prefecture about 9 months after the Great East Japan Earthquake. Serious damage to library buildings was caused not only by the tsunami but also by violent shaking, the latter being less reported by the media. We at the Miyagi Prefectural Library provided reconstruction assistance to regional public libraries in both direct and indirect ways. Among these efforts, we report in detail on the support we offered until the Minami-sanriku Town Library reopened its service. We highlight the role of a prefectural library acting between supporters and those receiving support, and consider the necessity of intermediary organizations. We clarify the challenges we face and examine how best to provide assistance in the event of large-scale disasters.

  17. The Amaranth Genome: Genome, Transcriptome, and Physical Map Assembly

    Directory of Open Access Journals (Sweden)

    J. W. Clouse

    2016-03-01

    Amaranth (Amaranthus hypochondriacus L.) is an emerging pseudocereal native to the New World that has garnered increased attention in recent years because of its nutritional quality, in particular its seed protein and more specifically its high levels of the essential amino acid lysine. It belongs to the Amaranthaceae family, is an ancient paleopolyploid that shows disomic inheritance (2n = 32), and has an estimated genome size of 466 Mb. Here we present a high-quality draft genome sequence of the grain amaranth. The genome assembly consisted of 377 Mb in 3518 scaffolds with an N50 of 371 kb. Repetitive element analysis predicted that 48% of the genome is comprised of repeat sequences, of which Copia-like elements were the most commonly classified retrotransposon. A de novo transcriptome consisting of 66,370 contigs was assembled from eight different amaranth tissue and abiotic stress libraries. Annotation of the genome identified 23,059 protein-coding genes. Seven grain amaranths (A. hypochondriacus, A. caudatus, and A. cruentus) and their putative progenitor (A. hybridus) were resequenced. A single nucleotide polymorphism (SNP) phylogeny supported the classification of A. hybridus as the progenitor species of the grain amaranths. Lastly, we generated a de novo physical map for A. hypochondriacus using the BioNano Genomics' Genome Mapping platform. The physical map spanned 340 Mb and a hybrid assembly using the BioNano physical maps nearly doubled the N50 of the assembly to 697 kb. Moreover, we analyzed synteny between amaranth and sugar beet (Beta vulgaris L.) and estimated, using Ks analysis, the age of the most recent polyploidization event in amaranth.

  18. The laughing librarian a history of American library humor

    CERN Document Server

    Smith, Jeanette C

    2012-01-01

    "Should be required reading for all librarians and library-school students"--Booklist; "a must have...recommend"--Library History Buff Blog; "charts the largely unexplored territory of library wit and satire, both inside and outside the profession"--C&RL News.

  19. Systematic Dissection of Sequence Elements Controlling σ70 Promoters Using a Genomically-Encoded Multiplexed Reporter Assay in E. coli.

    Science.gov (United States)

    Urtecho, Guillaume; Tripp, Arielle D; Insigne, Kimberly; Kim, Hwangbeom; Kosuri, Sriram

    2018-02-01

    Promoters are the key drivers of gene expression and are largely responsible for the regulation of cellular responses to time and environment. In E. coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange (RMCE) system that allows for the facile construction and testing of large libraries of genetic designs integrated into precise genomic locations. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight -35 elements, eight -10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the -35 and -10 sequence elements can explain approximately 74% of the variance in promoter strength within our dataset using a simple log-linear statistical model. Neural network models can explain greater than 95% of the variance in our dataset, and show the increased power is due to nonlinear interactions of other elements such as the spacer, background, and UP elements.
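A log-linear model of the kind mentioned in this abstract can be sketched as an additive regression on the logarithm of promoter strength, with one effect per -35 variant and one per -10 variant. The sketch below is an assumption about the general form only; the function name `fit_log_linear`, the integer coding of element variants and the reference-level encoding are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_log_linear(minus35, minus10, strength):
    """Fit log(strength) = mu + a[-35 variant] + b[-10 variant] by least squares.

    minus35, minus10: integer variant labels starting at 0.
    Returns (coef, r2), where r2 is the fraction of variance in
    log(strength) explained by the two additive element effects.
    """
    minus35 = np.asarray(minus35, dtype=int)
    minus10 = np.asarray(minus10, dtype=int)
    y = np.log(np.asarray(strength, dtype=float))
    n35 = int(minus35.max()) + 1
    n10 = int(minus10.max()) + 1
    # Design matrix: intercept plus indicator columns; variant 0 of each
    # element is the reference level and gets no column.
    X = np.zeros((len(y), n35 + n10 - 1))
    X[:, 0] = 1.0
    for row, (i, j) in enumerate(zip(minus35, minus10)):
        if i > 0:
            X[row, i] = 1.0
        if j > 0:
            X[row, n35 - 1 + j] = 1.0
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2
```

In such a model each element contributes a fixed multiplicative factor to promoter strength, so any variance left unexplained reflects measurement noise, other elements, or interactions between elements, which is the kind of signal the abstract attributes to the neural network models.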

  20. Twenty years of artificial directional selection have shaped the genome of the Italian Large White pig breed.

    Science.gov (United States)

    Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L

    2016-04-01

    In this study, we investigated at the genome-wide level if 20 years of artificial directional selection based on boar genetic evaluation obtained with a classical BLUP animal model shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars in eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a PBonferroni  selection program. The obtained results indicated that the genome of the Italian Large White pigs was shaped by a directional selection program derived by the application of methodologies assuming the infinitesimal model that captured a continuous trend of allele frequency changes in the boar population. © 2015 Stichting International Foundation for Animal Genetics.

  1. Functional assessment of human enhancer activities using whole-genome STARR-sequencing.

    Science.gov (United States)

    Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P

    2017-11-20

    Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP, as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to our ability to identify active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. When treated with the histone deacetylase inhibitor, Trichostatin A, genes nearby this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq provides an improved approach to current pipelines for analysis of high complexity genomes to gain a better understanding of the intricacies of transcriptional regulation.

  2. Genomics With Cloud Computing

    OpenAIRE

    Sukhamrit Kaur; Sandeep Kaur

    2015-01-01

    Abstract: Genomics is the study of genomes, which generates large amounts of data requiring large storage and computational power. These issues are addressed by cloud computing, which provides various cloud platforms for genomics. These platforms provide many services to users, such as easy access to data, easy sharing and transfer, storage in the hundreds of terabytes, and more computational power. Some cloud platforms are Google Genomics, DNAnexus and Globus Genomics. Various features of cloud computin...

  3. Quantitative linkage genome scan for atopy in a large collection of Caucasian families

    DEFF Research Database (Denmark)

    Webb, BT; van den Oord, E; Akkari, A

    2007-01-01

    adulthood, asthma is frequently associated also with quantitative measures of atopy. Genome wide quantitative multipoint linkage analysis was conducted for serum IgE levels and percentage of positive skin prick test (SPT(per)) using three large groups of families originally ascertained for asthma....... In this report, 438 and 429 asthma families were informative for linkage using IgE and SPT(per) which represents 690 independent families. Suggestive linkage (LOD >/= 2) was found on chromosomes 1, 3, and 8q with maximum LODs of 2.34 (IgE), 2.03 (SPT(per)), and 2.25 (IgE) near markers D1S1653, D3S2322-D3S1764...... represents one of the biggest genome scans so far reported for asthma related phenotypes. This study also demonstrates the utility of increased sample sizes and quantitative phenotypes in linkage analysis of complex disorders....

  4. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada

    Science.gov (United States)

    Kuzmina, Maria L.; Braukmann, Thomas W. A.; Fazekas, Aron J.; Graham, Sean W.; Dewaard, Stephanie L.; Rodrigues, Anuar; Bennett, Bruce A.; Dickinson, Timothy A.; Saarela, Jeffery M.; Catling, Paul M.; Newmaster, Steven G.; Percy, Diana M.; Fenneman, Erin; Lauron-Moreau, Aurélien; Ford, Bruce; Gillespie, Lynn; Subramanyam, Ragupathy; Whitton, Jeannette; Jennings, Linda; Metsger, Deborah; Warne, Connor P.; Brown, Allison; Sears, Elizabeth; Dewaard, Jeremy R.; Zakharov, Evgeny V.; Hebert, Paul D. N.

    2017-01-01

    Premise of the study: Constructing complete, accurate plant DNA barcode reference libraries can be logistically challenging for large-scale floras. Here we demonstrate the promise and challenges of using herbarium collections for building a DNA barcode reference library for the vascular plant flora of Canada. Methods: Our study examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada (98%). For 98% of the specimens, at least one of the DNA barcode regions was recovered from the plastid loci rbcL and matK and from the nuclear ITS2 region. We used beta regression to quantify the effects of age, type of preservation, and taxonomic affiliation (family) on DNA sequence recovery. Results: Specimen age and method of preservation had significant effects on sequence recovery for all markers, but influenced some families more (e.g., Boraginaceae) than others (e.g., Asteraceae). Discussion: Our DNA barcode library represents an unparalleled resource for metagenomic and ecological genetic research working on temperate and arctic biomes. An observed decline in sequence recovery with specimen age may be associated with poor primer matches, intragenomic variation (for ITS2), or inhibitory secondary compounds in some taxa. PMID:29299394

  5. Swabs to genomes: a comprehensive workflow

    Directory of Open Access Journals (Sweden)

    Madison I. Dunitz

    2015-05-01

    Full Text Available The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there is a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.

  6. Comparative genomic data of the Avian Phylogenomics Project.

    Science.gov (United States)

    Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

    2014-01-01

    The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000-17,000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses. Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for the 7 of the remaining 10 species, and provide a guideline of
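
    The record above groups assemblies by N50 scaffold size. As a rough illustration of how that statistic is commonly computed, the sketch below uses the standard definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly). The function name `n50` and the toy scaffold lengths are assumptions for illustration and are not taken from the Avian Phylogenomics Project data.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), not real data.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

    In this toy case the total is 300 bp, and the two longest scaffolds (100 + 80) are the first to reach half of it, so the value returned is the length of the second scaffold.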

  7. Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms.

    Science.gov (United States)

    Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele

    2018-06-01

    Information theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands a resort to parallel and distributed computing. Indeed, algorithms of this type have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. Contact: umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.
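
    The k-mer statistics described in the record above can be illustrated with a small single-machine sketch. This is not KCH and does not use Hadoop; it only mimics the map/reduce shape (count per sequence, then merge) with the Python standard library. The function names and the choice to skip k-mers containing symbols outside {A,C,G,T} are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

ALPHABET = set("ACGT")


def kmer_counts(sequence, k):
    """'Map' step: count every k-mer over {A,C,G,T} in one sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= ALPHABET:
            counts[kmer] += 1
    return counts


def collection_kmer_counts(sequences, k):
    """'Reduce' step: merge the per-sequence counts into one table."""
    return reduce(lambda a, b: a + b,
                  (kmer_counts(s, k) for s in sequences),
                  Counter())


# Hypothetical toy input, not a real genome.
print(kmer_counts("ACGTAC", 2))
print(collection_kmer_counts(["ACGT", "ACGN"], 2))
```

    A distributed implementation would run the per-sequence step on separate workers and merge the partial tables afterwards; the merge is associative, which is what makes the problem a natural fit for MapReduce.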

  8. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    Science.gov (United States)

    Christen, Matthias; Del Medico, Luca; Christen, Heinz; Christen, Beat

    2017-01-01

    Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.
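
    To make the idea of multi-level partitioning in the record above concrete, here is a deliberately simplified sketch that cuts a design into fixed-size segments and then cuts each segment into fixed-size subblocks. It is not the Genome Partitioner algorithm, which the abstract describes as generating an optimal retrosynthetic route. The function names are hypothetical, 1 kb is taken as 1000 bp, and overlaps and synthesis constraints are ignored. Only the sizes (783 kb design, 20 kb segments, 1 kb subblocks) come from the abstract.

```python
def partition(length, block_size):
    """Split [0, length) into consecutive half-open intervals of at most block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(start, min(start + block_size, length))
            for start in range(0, length, block_size)]


def multilevel_partition(length, segment_size, subblock_size):
    """Map each segment (start, end) to the list of subblocks inside it."""
    plan = {}
    for seg_start, seg_end in partition(length, segment_size):
        plan[(seg_start, seg_end)] = [
            (seg_start + s, seg_start + e)
            for s, e in partition(seg_end - seg_start, subblock_size)
        ]
    return plan


# Sizes from the abstract: 783 kb design, 20 kb segments, 1 kb subblocks.
plan = multilevel_partition(783_000, 20_000, 1_000)
print(len(plan), "segments")
print(len(plan[(0, 20_000)]), "subblocks in the first segment")
```

    With these fixed sizes the last segment is shorter than the others, because 783 kb is not a multiple of 20 kb.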

  9. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    Directory of Open Access Journals (Sweden)

    Matthias Christen

    Full Text Available Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.

  10. First Insights into the Large Genome of Epimedium sagittatum (Sieb. et Zucc) Maxim, a Chinese Traditional Medicinal Plant

    Directory of Open Access Journals (Sweden)

    Gong Xiao

    2013-06-01

    Full Text Available Epimedium sagittatum (Sieb. et Zucc) Maxim is a member of the Berberidaceae family of basal eudicot plants, widely distributed and used as a traditional medicinal plant in China for therapeutic effects on many diseases with a long history. Recent data shows that E. sagittatum has a relatively large genome, with a haploid genome size of ~4496 Mbp, divided into a small number of only 12 diploid chromosomes (2n = 2x = 12). However, little is known about Epimedium genome structure and composition. Here we present the analysis of 691 kb of high-quality genomic sequence derived from 672 randomly selected plasmid clones of E. sagittatum genomic DNA, representing ~0.0154% of the genome. The sampled sequences comprised at least 78.41% repetitive DNA elements and 2.51% confirmed annotated gene sequences, with a total GC% content of 39%. Retrotransposons represented the major class of transposable element (TE) repeats identified (65.37% of all TE repeats), particularly LTR (Long Terminal Repeat) retrotransposons (52.27% of all TE repeats). Chromosome analysis and Fluorescence in situ Hybridization of Gypsy-Ty3 retrotransposons were performed to survey the E. sagittatum genome at the cytological level. Our data provide the first insights into the composition and structure of the E. sagittatum genome, and will facilitate the functional genomic analysis of this valuable medicinal plant.

  11. First Insights into the Large Genome of Epimedium sagittatum (Sieb. et Zucc) Maxim, a Chinese Traditional Medicinal Plant

    Science.gov (United States)

    Liu, Di; Zeng, Shao-Hua; Chen, Jian-Jun; Zhang, Yan-Jun; Xiao, Gong; Zhu, Lin-Yao; Wang, Ying

    2013-01-01

    Epimedium sagittatum (Sieb. et Zucc) Maxim is a member of the Berberidaceae family of basal eudicot plants, widely distributed and used as a traditional medicinal plant in China for therapeutic effects on many diseases with a long history. Recent data shows that E. sagittatum has a relatively large genome, with a haploid genome size of ~4496 Mbp, divided into a small number of only 12 diploid chromosomes (2n = 2x = 12). However, little is known about Epimedium genome structure and composition. Here we present the analysis of 691 kb of high-quality genomic sequence derived from 672 randomly selected plasmid clones of E. sagittatum genomic DNA, representing ~0.0154% of the genome. The sampled sequences comprised at least 78.41% repetitive DNA elements and 2.51% confirmed annotated gene sequences, with a total GC% content of 39%. Retrotransposons represented the major class of transposable element (TE) repeats identified (65.37% of all TE repeats), particularly LTR (Long Terminal Repeat) retrotransposons (52.27% of all TE repeats). Chromosome analysis and Fluorescence in situ Hybridization of Gypsy-Ty3 retrotransposons were performed to survey the E. sagittatum genome at the cytological level. Our data provide the first insights into the composition and structure of the E. sagittatum genome, and will facilitate the functional genomic analysis of this valuable medicinal plant. PMID:23807511
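
    The sampling figure quoted in the record above (691 kb out of a ~4496 Mbp haploid genome, stated as ~0.0154%) can be checked with simple arithmetic. The sketch below is only such a check; the function name is hypothetical, and 1 kb and 1 Mbp are taken as 1,000 and 1,000,000 bp.

```python
def percent_sampled(sampled_bp, genome_bp):
    """Return the sampled sequence as a percentage of the genome size."""
    return 100 * sampled_bp / genome_bp


sampled_bp = 691_000        # 691 kb of sequence, from the abstract
genome_bp = 4_496_000_000   # ~4496 Mbp haploid genome, from the abstract

print(round(percent_sampled(sampled_bp, genome_bp), 4))
```

    Rounded to four decimal places, the result agrees with the ~0.0154% reported in the abstract.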

  12. How Users Search the Library from a Single Search Box

    Science.gov (United States)

    Lown, Cory; Sierra, Tito; Boyer, Josh

    2013-01-01

    Academic libraries are turning increasingly to unified search solutions to simplify search and discovery of library resources. Unfortunately, very little research has been published on library user search behavior in single search box environments. This study examines how users search a large public university library using a prominent, single…

  13. Analysis of 16S libraries of mouse gastrointestinal microflora reveals a large new group of mouse intestinal bacteria

    NARCIS (Netherlands)

    Salzman, NH; de Jong, H; Paterson, Y; Harmsen, HJM; Welling, GW; Bos, NA

    2002-01-01

    Total genomic DNA from samples of intact mouse small intestine, large intestine, caecum and faeces was used as template for PCR amplification of 16S rRNA gene sequences with conserved bacterial primers. Phylogenetic analysis of the amplification products revealed 40 unique 16S rDNA sequences. Of

  14. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    Science.gov (United States)

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The Manila clam, Ruditapes philippinarum, is an important bivalve species in worldwide aquaculture including Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors including viruses, microorganisms, parasites, and water conditions with subsequently declining production. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high quality whole-genome was assembled with various library construction methods. A total of 108,034 protein coding gene models were predicted and repetitive elements including simple sequence repeats and noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with the bivalve marine invertebrates uncovered that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues in order to support genome annotation and then identified 41,275 transcripts which were annotated. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, providing a reference genome for comparative analysis of bivalve species and for unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Simultaneous identification of long similar substrings in large sets of sequences

    Directory of Open Access Journals (Sweden)

    Wittig Burghardt

    2007-05-01

    Full Text Available Background: Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary, especially if errors must be considered. Results: We therefore present a new algorithm for the identification of almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a considered maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense but faster to calculate and often more appropriate than traditional alignments for genomic sequence comparisons, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion: The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogeneous alignment quality, as expected in case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled by only tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.

  16. Comparative genome analysis identifies two large deletions in the genome of highly-passaged attenuated Streptococcus agalactiae strain YM001 compared to the parental pathogenic strain HN016.

    Science.gov (United States)

    Wang, Rui; Li, Liping; Huang, Yan; Luo, Fuguang; Liang, Wanwen; Gan, Xi; Huang, Ting; Lei, Aiying; Chen, Ming; Chen, Lianfu

    2015-11-04

    Streptococcus agalactiae (S. agalactiae), also known as group B Streptococcus (GBS), is an important pathogen for neonatal pneumonia, meningitis, bovine mastitis, and fish meningoencephalitis. The global outbreaks of Streptococcus disease in tilapia cause huge economic losses and threaten human food hygiene safety as well. To investigate the mechanism of S. agalactiae pathogenesis in tilapia and develop an attenuated S. agalactiae vaccine, this study sequenced and comparatively analyzed the whole genomes of virulent wild-type S. agalactiae strain HN016 and its highly-passaged attenuated strain YM001 derived from tilapia. We performed Illumina sequencing of DNA prepared from strains HN016 and YM001. Sequenced reads were assembled, and nucleotide comparisons, single nucleotide polymorphisms (SNPs), and indels were analyzed between the draft genomes of HN016 and YM001. Clustered regularly interspaced short palindromic repeats (CRISPRs) and prophages were detected and analyzed in different S. agalactiae strains. The genome of S. agalactiae YM001 was 2,047,957 bp with a GC content of 35.61%; it contained 2044 genes and 88 RNAs. Meanwhile, the genome of S. agalactiae HN016 was 2,064,722 bp with a GC content of 35.66%; it had 2063 genes and 101 RNAs. Comparative genome analysis indicated that, compared with HN016, the YM001 genome had two significant large deletions of 5832 and 11,116 bp, respectively, resulting in the deletion of three rRNA and ten tRNA genes, as well as the deletion and functional damage of ten genes related to metabolism, transport, growth, anti-stress, etc. Besides these two large deletions, ten other deletions and 28 single nucleotide variations (SNVs) were also identified, mainly affecting the metabolism- and growth-related genes. The genome of attenuated S. agalactiae YM001 showed significant variations, resulting in the deletion of 10 functional genes, compared to the parental pathogenic strain HN016. The deleted and mutated functional genes all

  17. Shoestring Digital Library: If Existing Digital Library Software Doesn't Suit Your Needs, Create Your Own

    Science.gov (United States)

    Weber, Jonathan

    2006-01-01

    Creating a digital library might seem like a task best left to a large research collection with a vast staff and generous budget. However, tools for successfully creating digital libraries are getting easier to use all the time. The explosion of people creating content for the web has led to the availability of many high-quality applications and…

  18. How libraries use publisher metadata

    Directory of Open Access Journals (Sweden)

    Steve Shadle

    2013-11-01

    Full Text Available With the proliferation of electronic publishing, libraries are increasingly relying on publisher-supplied metadata to meet user needs for discovery in library systems. However, many publisher/content provider staff creating metadata are unaware of the end-user environment and how libraries use their metadata. This article provides an overview of the three primary discovery systems that are used by academic libraries, with examples illustrating how publisher-supplied metadata directly feeds into these systems and is used to support end-user discovery and access. Commonly seen metadata problems are discussed, with recommendations suggested. Based on a series of presentations given in Autumn 2012 to the staff of a large publisher, this article uses the University of Washington Libraries systems and services as illustrative examples. Judging by the feedback received from these presentations, publishers (specifically staff not familiar with the big picture of metadata standards work) would benefit from a better understanding of the systems and services libraries provide using the data that is created and managed by publishers.

  19. Multiplex engineering of industrial yeast genomes using CRISPRm.

    Science.gov (United States)

    Ryan, Owen W; Cate, Jamie H D

    2014-01-01

    Global demand has driven the use of industrial strains of the yeast Saccharomyces cerevisiae for large-scale production of biofuels and renewable chemicals. However, the genetic basis of desired domestication traits is poorly understood because robust genetic tools do not exist for industrial hosts. We present an efficient, marker-free, high-throughput, and multiplexed genome editing platform for industrial strains of S. cerevisiae that uses plasmid-based expression of the CRISPR/Cas9 endonuclease and multiple ribozyme-protected single guide RNAs. With this multiplex CRISPR (CRISPRm) system, it is possible to integrate DNA libraries into the chromosome for evolution experiments, and to engineer multiple loci simultaneously. The CRISPRm tools should therefore find use in many higher-order synthetic biology applications to accelerate improvements in industrial microorganisms.

  20. Adiabatic quantum-flux-parametron cell library adopting minimalist design

    Energy Technology Data Exchange (ETDEWEB)

    Takeuchi, Naoki, E-mail: takeuchi-naoki-kx@ynu.jp [Institute of Advanced Sciences, Yokohama National University, 79-5 Tokiwadai, Hodogaya, Yokohama 240-8501 (Japan); Yamanashi, Yuki; Yoshikawa, Nobuyuki [Institute of Advanced Sciences, Yokohama National University, 79-5 Tokiwadai, Hodogaya, Yokohama 240-8501 (Japan); Department of Electrical and Computer Engineering, Yokohama National University, 79-5 Tokiwadai, Hodogaya, Yokohama 240-8501 (Japan)

    2015-05-07

    We herein build an adiabatic quantum-flux-parametron (AQFP) cell library adopting minimalist design and a symmetric layout. In the proposed minimalist design, every logic cell is designed by arraying four types of building block cells: buffer, NOT, constant, and branch cells. Therefore, minimalist design enables us to effectively build and customize an AQFP cell library. The symmetric layout reduces unwanted parasitic magnetic coupling and ensures a large mutual inductance in an output transformer, which enables very long wiring between logic cells. We design and fabricate several logic circuits using the minimal AQFP cell library so as to test logic cells in the library. Moreover, we experimentally investigate the maximum wiring length between logic cells. Finally, we present an experimental demonstration of an 8-bit carry look-ahead adder designed using the minimal AQFP cell library and demonstrate that the proposed cell library is sufficiently robust to realize large-scale digital circuits.

  1. Adiabatic quantum-flux-parametron cell library adopting minimalist design

    International Nuclear Information System (INIS)

    Takeuchi, Naoki; Yamanashi, Yuki; Yoshikawa, Nobuyuki

    2015-01-01

    We herein build an adiabatic quantum-flux-parametron (AQFP) cell library adopting minimalist design and a symmetric layout. In the proposed minimalist design, every logic cell is designed by arraying four types of building block cells: buffer, NOT, constant, and branch cells. Therefore, minimalist design enables us to effectively build and customize an AQFP cell library. The symmetric layout reduces unwanted parasitic magnetic coupling and ensures a large mutual inductance in an output transformer, which enables very long wiring between logic cells. We design and fabricate several logic circuits using the minimal AQFP cell library so as to test logic cells in the library. Moreover, we experimentally investigate the maximum wiring length between logic cells. Finally, we present an experimental demonstration of an 8-bit carry look-ahead adder designed using the minimal AQFP cell library and demonstrate that the proposed cell library is sufficiently robust to realize large-scale digital circuits

  2. Transposon mutagenesis in Bifidobacterium breve: construction and characterization of a Tn5 transposon mutant library for Bifidobacterium breve UCC2003.

    Science.gov (United States)

    Ruiz, Lorena; Motherway, Mary O'Connell; Lanigan, Noreen; van Sinderen, Douwe

    2013-01-01

    Bifidobacteria are claimed to contribute positively to human health through a range of beneficial or probiotic activities, including amelioration of gastrointestinal and metabolic disorders, and therefore this particular group of gastrointestinal commensals has enjoyed increasing industrial and scientific attention in recent years. However, the molecular mechanisms underlying these probiotic mechanisms are still largely unknown, mainly due to the fact that molecular tools for bifidobacteria are rather poorly developed, with many strains lacking genetic accessibility. In this work, we describe the generation of transposon insertion mutants in two bifidobacterial strains, B. breve UCC2003 and B. breve NCFB2258. We also report the creation of the first transposon mutant library in a bifidobacterial strain, employing B. breve UCC2003 and a Tn5-based transposome strategy. The library was found to be composed of clones containing single transposon insertions which appear to be randomly distributed along the genome. The usefulness of the library to perform phenotypic screenings was confirmed through identification and analysis of mutants defective in D-galactose, D-lactose or pullulan utilization abilities.

  3. Transposon mutagenesis in Bifidobacterium breve: construction and characterization of a Tn5 transposon mutant library for Bifidobacterium breve UCC2003.

    Directory of Open Access Journals (Sweden)

    Lorena Ruiz

    Full Text Available Bifidobacteria are claimed to contribute positively to human health through a range of beneficial or probiotic activities, including amelioration of gastrointestinal and metabolic disorders, and therefore this particular group of gastrointestinal commensals has enjoyed increasing industrial and scientific attention in recent years. However, the molecular mechanisms underlying these probiotic mechanisms are still largely unknown, mainly due to the fact that molecular tools for bifidobacteria are rather poorly developed, with many strains lacking genetic accessibility. In this work, we describe the generation of transposon insertion mutants in two bifidobacterial strains, B. breve UCC2003 and B. breve NCFB2258. We also report the creation of the first transposon mutant library in a bifidobacterial strain, employing B. breve UCC2003 and a Tn5-based transposome strategy. The library was found to be composed of clones containing single transposon insertions which appear to be randomly distributed along the genome. The usefulness of the library to perform phenotypic screenings was confirmed through identification and analysis of mutants defective in D-galactose, D-lactose or pullulan utilization abilities.

  4. Security Risks Management in Selected Academic Libraries in Osun ...

    African Journals Online (AJOL)

    The survival of a library depends to a large extent on how secured its collections are. Security of collections constitutes a critical challenge facing academic libraries in Nigeria. It is against this background that this study investigated the security risks management in selected academic libraries in Osun State, Nigeria.

  5. Expression sequence tag library derived from peripheral blood mononuclear cells of the Chlorocebus sabaeus

    Directory of Open Access Journals (Sweden)

    Tchitchek Nicolas

    2012-06-01

    Full Text Available Background: African Green Monkeys (AGM) are amongst the most frequently used nonhuman primate models in clinical and biomedical research; nevertheless, only few genomic resources exist for this species. Such information would be essential for the development of dedicated new generation technologies in fundamental and pre-clinical research using this model, and would deliver new insights into primate evolution. Results: We have exhaustively sequenced an Expression Sequence Tag (EST) library made from a pool of Peripheral Blood Mononuclear Cells from sixteen Chlorocebus sabaeus monkeys. Twelve of them were infected with the Simian Immunodeficiency Virus. The mononuclear cells were either left unstimulated or stimulated in vitro with Concanavalin A, with lipopolysaccharides, or through mixed lymphocyte reaction in order to generate a representative and broad library of expressed sequences in immune cells. We report here 37,787 sequences, which were assembled into 14,410 contigs representing an estimated 12% of the C. sabaeus transcriptome. Using data from primate genome databases, 9,029 assembled sequences from C. sabaeus could be annotated. Sequences have been systematically aligned with ten cDNA references of primate species including Homo sapiens, Pan troglodytes, and Macaca mulatta to identify ortholog transcripts. For 506 transcripts, sequences were quasi-complete. In addition, 6,576 transcript fragments are potentially specific to C. sabaeus or correspond to not yet described primate genes. Conclusions: The EST library we provide here will prove useful in gene annotation efforts for future sequencing of the African Green Monkey genomes. Furthermore, this library, which particularly well represents immunological and hematological gene expression, will be an important resource for the comparative analysis of gene expression in clinically relevant nonhuman primate and human research.

  6. About the Library - Betty Petersen Memorial Library

    Science.gov (United States)

    branch library of the NOAA Central Library. The library serves the NOAA Science Center in Camp Springs, Maryland. History and Mission: Betty Petersen Memorial Library began as a reading room in the NOAA Science Science Center staff and advises the library on all aspects of the library program. Library Newsletters

  7. Service Quality: An Unobtrusive Investigation of Interlibrary Loan in Large Public Libraries in Canada.

    Science.gov (United States)

    Hebert, Francoise

    1994-01-01

    Describes a study that investigated the quality of interlibrary loan services in Canadian public libraries from the library's and the user's perspectives and then compared results. Measures of interlibrary loan performance are reviewed; an alternative conceptualization of service quality is discussed; and SERVQUAL, a measure of service quality, is…

  8. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons

    Directory of Open Access Journals (Sweden)

    Beatson Scott A

    2011-08-01

    Full Text Available Abstract Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons

  9. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.

    Science.gov (United States)

    Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A

    2011-08-08

    Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. There is a clear need for a user

  10. America's Star Libraries, 2010: Top-Rated Libraries

    Science.gov (United States)

    Lyons, Ray; Lance, Keith Curry

    2010-01-01

    The "LJ" Index of Public Library Service 2010, "Library Journal"'s national rating of public libraries, identifies 258 "star" libraries. Created by Ray Lyons and Keith Curry Lance, and based on 2008 data from the IMLS, it rates 7,407 public libraries. The top libraries in each group get five, four, or three stars. All included libraries, stars or…

  11. GIGGLE: a search engine for large-scale integrated genome analysis.

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-02-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
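    The kind of query described above, finding which interval files share the most loci with a set of query features, can be illustrated with a small sketch. This is not GIGGLE's own index or its significance statistics; it is a minimal Python example that assumes half-open [start, end) intervals on a single chromosome and ranks files by raw overlap count only. The names FileIndex, count_overlaps and rank_files are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left, bisect_right


class FileIndex:
    """Hypothetical per-file index: sorted start and end coordinates.

    Intervals are half-open [start, end) on one chromosome.
    """

    def __init__(self, intervals):
        self.starts = sorted(start for start, _ in intervals)
        self.ends = sorted(end for _, end in intervals)

    def count_overlaps(self, query_start, query_end):
        # An interval overlaps the query iff start < query_end and end > query_start.
        # Count intervals starting before the query ends, then subtract those
        # that have already ended at or before the query start.
        started = bisect_left(self.starts, query_end)
        finished = bisect_right(self.ends, query_start)
        return started - finished


def rank_files(indexes, query_intervals):
    """Rank files by total number of overlaps with the query intervals."""
    scores = {
        name: sum(index.count_overlaps(s, e) for s, e in query_intervals)
        for name, index in indexes.items()
    }
    # Highest overlap count first; ties broken by file name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up interval files, for illustration only.
    indexes = {
        "a.bed": FileIndex([(10, 20), (30, 40)]),
        "b.bed": FileIndex([(100, 200)]),
    }
    print(rank_files(indexes, [(15, 35)]))
```

    Each lookup costs two binary searches per query interval and file, which is why sorted coordinate arrays are a common starting point; a production search engine would also need per-chromosome handling and a statistical ranking rather than raw counts.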

  12. Featured Library: Parrish Library

    OpenAIRE

    Kirkwood, Hal P, Jr

    2015-01-01

    The Roland G. Parrish Library of Management & Economics is located within the Krannert School of Management at Purdue University. Between 2005 - 2007 work was completed on a white paper that focused on a student-centered vision for the Management & Economics Library. The next step was a massive collection reduction and a re-envisioning of both the services and space of the library. Thus began a 3 phase renovation from a 2 floor standard, collection-focused library into a single floor, 18,000s...

  13. GIGGLE: a search engine for large-scale integrated genome analysis

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061

  14. Large-scale image-based profiling of single-cell phenotypes in arrayed CRISPR-Cas9 gene perturbation screens.

    Science.gov (United States)

    de Groot, Reinoud; Lüthi, Joel; Lindsay, Helen; Holtackers, René; Pelkmans, Lucas

    2018-01-23

    High-content imaging using automated microscopy and computer vision allows multivariate profiling of single-cell phenotypes. Here, we present methods for the application of the CRISPR-Cas9 system in large-scale, image-based, gene perturbation experiments. We show that CRISPR-Cas9-mediated gene perturbation can be achieved in human tissue culture cells in a timeframe that is compatible with image-based phenotyping. We developed a pipeline to construct a large-scale arrayed library of 2,281 sequence-verified CRISPR-Cas9 targeting plasmids and profiled this library for genes affecting cellular morphology and the subcellular localization of components of the nuclear pore complex (NPC). We conceived a machine-learning method that harnesses genetic heterogeneity to score gene perturbations and identify phenotypically perturbed cells for in-depth characterization of gene perturbation effects. This approach enables genome-scale image-based multivariate gene perturbation profiling using CRISPR-Cas9. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.

  15. Isolation of Specific Clones from Nonarrayed BAC Libraries through Homologous Recombination

    Directory of Open Access Journals (Sweden)

    Mikhail Nefedov

    2011-01-01

    Full Text Available We have developed a new approach to screen bacterial artificial chromosome (BAC) libraries by recombination selection. To test this method, we constructed an orangutan BAC library using an E. coli strain (DY380) with temperature-inducible homologous recombination (HR) capability. We amplified one library segment, induced HR at 42 °C to make it recombination proficient, and prepared electrocompetent cells for transformation with a kanamycin cassette to target sequences in the orangutan genome through terminal recombineering homologies. Kanamycin-resistant colonies were tested for the presence of BACs containing the targeted genes by the use of a PCR-assay to confirm the presence of the kanamycin insertion. The results indicate that this is an effective approach for screening clones. The advantage of recombination screening is that it avoids the high costs associated with the preparation, screening, and archival storage of arrayed BAC libraries. In addition, the screening can be conceivably combined with genetic engineering to create knockout and reporter constructs for functional studies.

  16. Primary structure of the human follistatin precursor and its genomic organization

    International Nuclear Information System (INIS)

    Shimasaki, Shunichi; Koga, Makoto; Esch, F.

    1988-01-01

    Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution

  17. Multigroup cross section library; WIMS library

    International Nuclear Information System (INIS)

    Kannan, Umasankari

    2000-01-01

    The WIMS library has been extensively used in thermal reactor calculations. This multigroup constants library was originally developed from the UKNDL in the late 1960s and was updated in 1986. This library has been distributed with the WIMS-D code by NEA data bank. The references to WIMS library in literature are the 'old' which is the original as developed by the AEA Winfrith and the 'new' which is the current 1986 WIMS library. IAEA has organised a CRP where a new and fully updated WIMS library will soon be available. This paper gives an overview of the definitions of the group constants that go into any basic nuclear data library used for reactor calculations. This paper also outlines the contents of the WIMS library and some of its shortcomings

  18. Evaluation of library preparation methods for Illumina next generation sequencing of small amounts of DNA from foodborne parasites.

    Science.gov (United States)

    Nascimento, Fernanda S; Wei-Pridgeon, Yuping; Arrowood, Michael J; Moss, Delynn; da Silva, Alexandre J; Talundzic, Eldin; Qvarnstrom, Yvonne

    2016-11-01

    Illumina library preparation methods for ultra-low input amounts were compared using genomic DNA from two foodborne parasites (Angiostrongylus cantonensis and Cyclospora cayetanensis) as examples. The Ovation Ultralow method resulted in libraries with the highest concentration and produced quality sequencing data, even when the input DNA was in the picogram range. Published by Elsevier B.V.

  19. HGVA: the Human Genome Variation Archive.

    Science.gov (United States)

    Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio

    2017-07-03

    High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium, are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic data for key reference projects in a clean, fast and integrated fashion. HGVA provides an efficient and intuitive web-interface for easy data mining, a comprehensive RESTful API and client libraries in Python, Java and JavaScript for fast programmatic access to its knowledge base. HGVA calculates population frequencies for these projects and enriches their data with variant annotation provided by CellBase, a rich and fast annotation solution. HGVA serves as a proof-of-concept of the genome analysis developments being carried out by the University of Cambridge together with UK's 100 000 genomes project and the National Institute for Health Research BioResource Rare-Diseases, in particular, deploying open-source for Computational Biology (OpenCB) software platform for storing and analyzing massive genomic datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Wheat EST resources for functional genomics of abiotic stress

    Directory of Open Access Journals (Sweden)

    Links Matthew G

    2006-06-01

    Full Text Available Abstract Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in

  1. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  2. Advances in genome-wide RNAi cellular screens: a case study using the Drosophila JAK/STAT pathway

    Science.gov (United States)

    2012-01-01

    Background Genome-scale RNA-interference (RNAi) screens are becoming ever more common gene discovery tools. However, whilst every screen identifies interacting genes, less attention has been given to how factors such as library design and post-screening bioinformatics may be affecting the data generated. Results Here we present a new genome-wide RNAi screen of the Drosophila JAK/STAT signalling pathway undertaken in the Sheffield RNAi Screening Facility (SRSF). This screen was carried out using a second-generation, computationally optimised dsRNA library and analysed using current methods and bioinformatic tools. To examine advances in RNAi screening technology, we compare this screen to a biologically very similar screen undertaken in 2005 with a first-generation library. Both screens used the same cell line, reporters and experimental design, with the SRSF screen identifying 42 putative regulators of JAK/STAT signalling, 22 of which verified in a secondary screen and 16 verified with an independent probe design. Following reanalysis of the original screen data, comparison of the two gene lists allows us to make estimates of false discovery rates in the SRSF data and to conduct an assessment of off-target effects (OTEs) associated with both libraries. We discuss the differences and similarities between the resulting data sets and examine the relative improvements in gene discovery protocols. Conclusions Our work represents one of the first direct comparisons between first- and second-generation libraries and shows that modern library designs together with methodological advances have had a significant influence on genome-scale RNAi screens. PMID:23006893

  3. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely-related bacterial genomes.

  4. McDonald and Company Securities Library User Survey, 1996.

    Science.gov (United States)

    Wolfgram, Derek E.

    The library of McDonald and Company Securities is important to the success of the business and its employees. This study assesses the needs and expectations of the library users, and analyzes how well the current library services are meeting those needs and expectations. A questionnaire was distributed to a large random sample of the firm's…

  5. A complete mitochondrial genome of wheat (Triticum aestivum cv ...

    Indian Academy of Sciences (India)

    role in the development and reproduction of the plant. They occupy a specific ... for biosynthetic pathways relative to their free-living cousins. (Gray et al. 1999; Itoh ... A mitochondrial genome BAC library was constructed following a previously ...

  6. Research Guidelines in the Era of Large-scale Collaborations: An Analysis of Genome-wide Association Study Consortia

    Science.gov (United States)

    Austin, Melissa A.; Hair, Marilyn S.; Fullerton, Stephanie M.

    2012-01-01

    Scientific research has shifted from studies conducted by single investigators to the creation of large consortia. Genetic epidemiologists, for example, now collaborate extensively for genome-wide association studies (GWAS). The effect has been a stream of confirmed disease-gene associations. However, effects on human subjects oversight, data-sharing, publication and authorship practices, research organization and productivity, and intellectual property remain to be examined. The aim of this analysis was to identify all research consortia that had published the results of a GWAS analysis since 2005, characterize them, determine which have publicly accessible guidelines for research practices, and summarize the policies in these guidelines. A review of the National Human Genome Research Institute’s Catalog of Published Genome-Wide Association Studies identified 55 GWAS consortia as of April 1, 2011. These consortia were comprised of individual investigators, research centers, studies, or other consortia and studied 48 different diseases or traits. Only 14 (25%) were found to have publicly accessible research guidelines on consortia websites. The available guidelines provide information on organization, governance, and research protocols; half address institutional review board approval. Details of publication, authorship, data-sharing, and intellectual property vary considerably. Wider access to consortia guidelines is needed to establish appropriate research standards with broad applicability to emerging forms of large-scale collaboration. PMID:22491085

  7. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium.

    Science.gov (United States)

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.
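    GC content, one of the genome properties compared in the record above, is the fraction of G and C among the unambiguous bases of a sequence. A minimal Python sketch follows; the function name gc_content and the choice to ignore ambiguous bases such as N are assumptions made for illustration and are not taken from the study.

```python
def gc_content(sequence):
    """Return the G+C fraction of a DNA sequence (0.0 to 1.0).

    Assumption for this sketch: only A, C, G and T are counted, so
    ambiguous symbols (for example N) and gaps do not affect the result.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


if __name__ == "__main__":
    # Toy sequences, not real Photobacterium data.
    for name, seq in [("toy_low_gc", "ATATATGC"), ("toy_high_gc", "GCGCGCAT")]:
        print(name, round(gc_content(seq), 3))
```

    Comparing this value across whole-genome sequences is one simple way to separate strains into higher and lower GC groups before looking at how those groups line up with a phylogeny.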

  8. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    Analysis of the gerbera genome DNA ('Raon') general library showed that sequences of (AT), (AG), (AAG) and (AAT) repeats appeared most often, whereas (AC), (AAC) and (ACC) were the least frequent. Primer pairs were designed for 80 loci. Only eight primer pairs produced reproducible polymorphic bands in the 28 ...

  9. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    Science.gov (United States)

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Pre-genomic, genomic and post-genomic study of microbial communities involved in bioenergy.

    Science.gov (United States)

    Rittmann, Bruce E; Krajmalnik-Brown, Rosa; Halden, Rolf U

    2008-08-01

    Microorganisms can produce renewable energy in large quantities and without damaging the environment or disrupting food supply. The microbial communities must be robust and self-stabilizing, and their essential syntrophies must be managed. Pre-genomic, genomic and post-genomic tools can provide crucial information about the structure and function of these microbial communities. Applying these tools will help accelerate the rate at which microbial bioenergy processes move from intriguing science to real-world practice.

  11. Rebuilding the Mackintosh Library collection

    OpenAIRE

    Buri, David

    2015-01-01

    Summarises progress to date in rebuilding the former Mackintosh Library book and journal collections. Focuses on the need to restore the library's holdings of pre-1985 architectural magazines, and describes the large donation of such material by the Royal Incorporation of Architects in Scotland (RIAS) in early 2014. Looks at the history of some of these magazines and their contribution to the architectural profession, and invites further donations of material to help rebuild the Macki...

  12. Reconstruction of Oomycete Genome Evolution Identifies Differences in Evolutionary Trajectories Leading to Present-Day Large Gene Families

    NARCIS (Netherlands)

    Seidl, M.F.; Ackerveken, van den G.; Govers, F.; Snel, B.

    2012-01-01

    The taxonomic class of oomycetes contains numerous pathogens of plants and animals but is related to nonpathogenic diatoms and brown algae. Oomycetes have flexible genomes comprising large gene families that play roles in pathogenicity. The evolutionary processes that shaped the gene content have

  13. An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

    Science.gov (United States)

    Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

    2009-01-21

    The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. The EST collection will provide much needed genomic resources for

  14. Library 3.0 intelligent libraries and apomediation

    CERN Document Server

    Kwanya, Tom; Underwood, Peter

    2015-01-01

    The emerging generation of research and academic library users expect the delivery of user-centered information services. 'Apomediation' refers to the supporting role librarians can give users by stepping in when users need help. Library 3.0 explores the ongoing debates on the "point oh" phenomenon and its impact on service delivery in libraries. This title analyses Library 3.0 and its potential in creating intelligent libraries capable of meeting contemporary needs, and the growing role of librarians as apomediators. Library 3.0 is divided into four chapters. The first chapter introduces and places the topic in context. The second chapter considers "point oh" libraries. The third chapter covers library 3.0 librarianship, while the final chapter explores ways libraries can move towards '3.0'.

  15. SVA retrotransposon insertion-associated deletion represents a novel mutational mechanism underlying large genomic copy number changes with non-recurrent breakpoints

    Science.gov (United States)

    2014-01-01

    Background Genomic disorders are caused by copy number changes that may exhibit recurrent breakpoints processed by nonallelic homologous recombination. However, region-specific disease-associated copy number changes have also been observed which exhibit non-recurrent breakpoints. The mechanisms underlying these non-recurrent copy number changes have not yet been fully elucidated. Results We analyze large NF1 deletions with non-recurrent breakpoints as a model to investigate the full spectrum of causative mechanisms, and observe that they are mediated by various DNA double strand break repair mechanisms, as well as aberrant replication. Further, two of the 17 NF1 deletions with non-recurrent breakpoints, identified in unrelated patients, occur in association with the concomitant insertion of SINE/variable number of tandem repeats/Alu (SVA) retrotransposons at the deletion breakpoints. The respective breakpoints are refractory to analysis by standard breakpoint-spanning PCRs and are only identified by means of optimized PCR protocols designed to amplify across GC-rich sequences. The SVA elements are integrated within SUZ12P intron 8 in both patients, and were mediated by target-primed reverse transcription of SVA mRNA intermediates derived from retrotranspositionally active source elements. Both SVA insertions occurred during early postzygotic development and are uniquely associated with large deletions of 1 Mb and 867 kb, respectively, at the insertion sites. Conclusions Since active SVA elements are abundant in the human genome and the retrotranspositional activity of many SVA source elements is high, SVA insertion-associated large genomic deletions encompassing many hundreds of kilobases could constitute a novel and as yet under-appreciated mechanism underlying large-scale copy number changes in the human genome. PMID:24958239

  16. Managing intellectual capital in libraries beyond the balance sheet

    CERN Document Server

    Kostagiolas, Petros

    2012-01-01

    In the knowledge economy, professionals have to make decisions about non-tangible, non-monetary, and largely invisible resources. Information professionals need to understand the potential uses, contributions, value, structure, and creation of broadly intangible intellectual capital in libraries. In order to fully realize intellectual capital in libraries, new practices and skills are required for library management practitioners and researchers. Managing Intellectual Capital in Libraries provides research advances, guidelines, methods and techniques for managing intellectual capital in a libra

  17. Libraries of Synthetic TALE-Activated Promoters: Methods and Applications.

    Science.gov (United States)

    Schreiber, T; Tissier, A

    2016-01-01

    The discovery of proteins with programmable DNA-binding specificities triggered a whole array of applications in synthetic biology, including genome editing, regulation of transcription, and epigenetic modifications. Among those, transcription activator-like effectors (TALEs), due to their natural function as transcription regulators, are especially well-suited for the development of orthogonal systems for the control of gene expression. We describe here the construction and testing of libraries of synthetic TALE-activated promoters which are under the control of a single TALE with a given DNA-binding specificity. These libraries consist of a fixed DNA-binding element for the TALE, a TATA box, and variable sequences of 19 bases upstream and 43 bases downstream of the DNA-binding element. These libraries were cloned using a Golden Gate cloning strategy, making them usable as standard parts in a modular cloning system. The broad range of promoter activities detected and the versatility of these promoter libraries make them valuable tools for applications in the fine-tuning of expression in metabolic engineering projects or in the design and implementation of regulatory circuits.

  18. Working in the Library

    CERN Multimedia

    Maximilien Brice

    2009-01-01

    Head Librarian Jens Vigen seeking information on the first discussions concerning the construction of the Large Hadron Collider in the LEP Tunnel (1984), here assisted by two of the library apprentices, Barbara Veyre and Dina-Elisabeth Bimbu (seated).

  19. Molecular descriptor data explain market prices of a large commercial chemical compound library

    Science.gov (United States)

    Polanski, Jaroslaw; Kucia, Urszula; Duszkiewicz, Roksana; Kurczyk, Agata; Magdziarz, Tomasz; Gasteiger, Johann

    2016-06-01

    The relationship between the structure and a property of a chemical compound is an essential concept in chemistry, guiding, for example, drug design. However, economic considerations are also needed to fully understand the fate of drugs on the market. Here we explore, for the first time, quantitative structure-economy relationships (QSER) for a large dataset, a commercial building block library of over 2.2 million chemicals. This investigation provided molecular statistics showing that, on average, what we are paying for is the quantity of matter. On the other hand, the influence of synthetic availability scores is also revealed. Finally, we are buying substances by looking at the molecular graphs or molecular formulas. Thus, those molecules that have a higher number of atoms look more attractive and are, on average, also more expensive. Our study shows how data binning could be used as an informative method when analyzing big data in chemistry.

  20. Croatian library leaders’ views on (their) library quality

    Directory of Open Access Journals (Sweden)

    Kornelija Petr Balog

    2014-04-01

    The purpose of this paper is to determine and describe the library culture in Croatian public libraries. Semi-structured interviews with 14 library directors (ten public and four academic) were conducted. The tentative discussion topics were: definition of quality, responsibility for quality, satisfaction with library services, familiarization with the user perspective of the library and librarians, and monitoring of user expectations and opinions. These interviews incorporate some of the findings of the project Evaluation of library and information services: public and academic libraries. The project investigates library culture in Croatian public and academic libraries and their preparedness for activities of performance measurement. The interviews reveal that library culture has changed positively in the past few years and that library leaders have a positive attitude towards quality and evaluation activities. Library culture in Croatian libraries is a relatively new concept and as such was not actively developed and/or created. This article looks into the library culture of Croatian libraries, but at the same time investigates whether there is any trace of a culture of assessment in them. Also, this article brings the latest update on views, opinions and atmosphere in Croatian public and academic libraries.

  1. News from the Library

    CERN Multimedia

    CERN Library

    2010-01-01

    Even more books available electronically! For several years now, the Library has been offering a large collection of electronic books in a wide range of disciplines. The books can be accessed by all CERN users with a Nice account and, like printed books, can be borrowed for a given period. In a few clicks of the mouse, you can leaf through and read books and even print parts of them from your computer. The Library catalogue now comprises a total of more than 10,000 different e-books. The long-awaited electronic versions of O'Reilly book titles are now available: 70 titles have recently been added to the Library's collection and many others will follow in the coming weeks. This collection of books, mainly on IT subjects, is widely used in the development field. Their availability online is thus a clear bonus. But there's no need for fans of paper versions to worry: the Library will continue to expand its collection of printed books. The two collections exist side by side and even complement ea...

  2. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    Directory of Open Access Journals (Sweden)

    Johanna Rhodes

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungus Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  3. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    Science.gov (United States)

    Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C

    2014-01-01

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungus Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  4. Math Branding in a Community College Library

    Science.gov (United States)

    Brantz, Malcolm; Sadowski, Edward B.

    2010-01-01

    As a strategy to promote the Arapahoe Community College Library's collections and services, the Library undertook to brand itself as a math resource center. In promoting one area of expertise, math was selected to help address the problem of a large portion of high school graduates' inability to work at college-level math. A "Math…

  5. Modular Construction of Large Non-Immune Human Antibody Phage-Display Libraries from Variable Heavy and Light Chain Gene Cassettes.

    Science.gov (United States)

    Lee, Nam-Kyung; Bidlingmaier, Scott; Su, Yang; Liu, Bin

    2018-01-01

    Monoclonal antibodies and antibody-derived therapeutics have emerged as a rapidly growing class of biological drugs for the treatment of cancer, autoimmunity, infection, and neurological diseases. To support the development of human antibodies, various display techniques based on antibody gene repertoires have been constructed over the last two decades. In particular, scFv-antibody phage display has been extensively utilized to select lead antibodies against a variety of target antigens. To construct a scFv phage display that enables efficient antibody discovery and optimization, it is desirable to develop a system that allows modular assembly of highly diverse variable heavy chain and light chain (Vκ and Vλ) repertoires. Here, we describe modular construction of large non-immune human antibody phage-display libraries built on variable gene cassettes from heavy chain and light chain repertoires (Vκ and Vλ light chains can be made into independent cassettes). We describe the utility of such libraries in antibody discovery and optimization through chain shuffling.

  6. Random small interfering RNA library screen identifies siRNAs that induce human erythroleukemia cell differentiation.

    Science.gov (United States)

    Fan, Cuiqing; Xiong, Yuan; Zhu, Ning; Lu, Yabin; Zhang, Jiewen; Wang, Song; Liang, Zicai; Shen, Yan; Chen, Meihong

    2011-03-01

    Cancers are characterized by poor differentiation. Differentiation therapy is a strategy to alleviate malignant phenotypes by inducing cancer cell differentiation. Here we carried out a combinatorial high-throughput screen with a random siRNA library on human erythroleukemia K-562 cell differentiation. Two siRNAs screened from the library were validated to be able to induce erythroid differentiation to varying degrees, determined by CD235 and globin up-regulation, GATA-2 down-regulation, and cell growth inhibition. The screen we performed here is the first trial of screening cancer differentiation-inducing agents from a random siRNA library, demonstrating that a random siRNA library can be considered as a new resource in efforts to seek new therapeutic agents for cancers. As a random siRNA library has a broad coverage for the entire genome, including known/unknown genes and protein coding/non-coding sequences, screening using a random siRNA library can be expected to greatly augment the repertoire of therapeutic siRNAs for cancers.

  7. Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation

    Directory of Open Access Journals (Sweden)

    Li Sen

    2012-03-01

    Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years. Results: We evaluated the performance of the ABC approach for three 'population divergence' models, similar to the 'isolation with migration' model, when the data consist of several hundred thousand SNPs typed for multiple individuals, by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC.
    Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from
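
    The general idea behind ABC (draw parameters from a prior, simulate data, keep the draws whose summary statistics land close to the observed ones) can be illustrated with a minimal rejection-sampling sketch. Everything here is an assumption made for illustration: the simulator is a toy exponential model rather than any of the demographic models in the record above, and the function names, prior bounds, and tolerance are hypothetical.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from an exponential with mean theta."""
    return [rng.expovariate(1.0 / theta) for _ in range(n)]


def summary(data):
    """Summary statistics: sample mean and sample standard deviation."""
    return (statistics.fmean(data), statistics.stdev(data))


def distance(s1, s2):
    """Euclidean distance between two summary-statistic vectors."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5


def abc_rejection(observed, prior_low, prior_high, n_draws, epsilon, seed=0):
    """Return prior draws whose simulated summaries fall within epsilon of the data."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)         # draw from a uniform prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if distance(s_obs, s_sim) <= epsilon:               # keep only close matches
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(42)
    observed = simulate(3.0, 200, data_rng)                 # pseudo-observed data, true theta = 3
    posterior = abc_rejection(observed, 0.5, 10.0, 5000, 0.6)
    print(len(posterior), "accepted draws")
    print("approximate posterior mean:", statistics.fmean(posterior))
```

    The accepted draws approximate the posterior; combining several summary statistics, as the abstract discusses, corresponds here to lengthening the tuple returned by `summary`.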

  8. Diverse circovirus-like genome architectures revealed by environmental metagenomics.

    Science.gov (United States)

    Rosario, Karyna; Duffy, Siobain; Breitbart, Mya

    2009-10-01

    Single-stranded DNA (ssDNA) viruses with circular genomes are the smallest viruses known to infect eukaryotes. The present study identified 10 novel genomes similar to ssDNA circoviruses through data-mining of public viral metagenomes. The metagenomic libraries included samples from reclaimed water and three different marine environments (Chesapeake Bay, British Columbia coastal waters and Sargasso Sea). All the genomes have similarities to the replication (Rep) protein of circoviruses; however, only half have genomic features consistent with known circoviruses. Some of the genomes exhibit a mixture of genomic features associated with different families of ssDNA viruses (i.e. circoviruses, geminiviruses and parvoviruses). Unique genome architectures and phylogenetic analysis of the Rep protein suggest that these viruses belong to novel genera and/or families. Investigating the complex community of ssDNA viruses in the environment can lead to the discovery of divergent species and help elucidate evolutionary links between ssDNA viruses.

  9. A simple, high throughput method to locate single copy sequences from Bacterial Artificial Chromosome (BAC) libraries using High Resolution Melt analysis

    Directory of Open Access Journals (Sweden)

    Caligari Peter DS

    2010-05-01

    Background: The high-throughput anchoring of genetic markers into contigs is required for many ongoing physical mapping projects. Multidimensional BAC pooling strategies for PCR-based screening of large insert libraries are a widely used alternative to high density filter hybridisation of bacterial colonies. To date, concerns over reliability have led most if not all groups engaged in high throughput physical mapping projects to favour BAC DNA isolation prior to amplification by conventional PCR. Results: Here, we report the first combined use of Multiplex Tandem PCR (MT-PCR) and High Resolution Melt (HRM) analysis on bacterial stocks of BAC library superpools as a means of rapidly anchoring markers to BAC colonies and thereby integrating genetic and physical maps. We exemplify the approach using a BAC library of the model plant Arabidopsis thaliana. Superpools of twenty-five 384-well plates and two-dimension matrix pools of the BAC library were prepared for marker screening. The entire procedure only requires around 3 h to anchor one marker. Conclusions: A pre-amplification step during MT-PCR allows high multiplexing and increases the sensitivity and reliability of subsequent HRM discrimination. This simple gel-free protocol is more reliable, faster and far less costly than conventional PCR screening. The option to screen 3 genetic markers in parallel in one MT-PCR-HRM reaction, using templates from directly pooled bacterial stocks of BAC-containing bacteria, further reduces the time for anchoring markers in physical maps of species with large genomes.
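
    How a two-dimensional matrix pool narrows a positive signal down to individual wells can be sketched as follows. This is only an illustration under stated assumptions: a 384-well plate is taken as 16 rows (A to P) by 24 columns with one pool per row and one per column, the function names are hypothetical, and the actual pool design used in the study is not described in this record.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:16]      # rows A..P of a 384-well plate
COLS = range(1, 25)              # columns 1..24


def positive_pools(marker_wells):
    """Row and column pools that would test positive for the given true wells."""
    rows = {row for row, _ in marker_wells}
    cols = {col for _, col in marker_wells}
    return rows, cols


def candidate_wells(positive_rows, positive_cols):
    """Wells consistent with the pool results: every row/column intersection."""
    return sorted((r, c) for r in positive_rows for c in positive_cols)


if __name__ == "__main__":
    # One positive clone: the intersection is unambiguous.
    rows, cols = positive_pools({("B", 3)})
    print(candidate_wells(rows, cols))

    # Two positive clones: four intersections, two of which are false candidates
    # that a follow-up test on individual wells would have to rule out.
    rows, cols = positive_pools({("B", 3), ("K", 17)})
    print(candidate_wells(rows, cols))
```

    With 16 row pools and 24 column pools, 40 reactions stand in for 384 single-well reactions per plate, at the cost of ambiguity whenever more than one clone on the plate carries the marker.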

  10. NSUF Irradiated Materials Library

    Energy Technology Data Exchange (ETDEWEB)

    Cole, James Irvin [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The Nuclear Science User Facilities has been in the process of establishing an innovative Irradiated Materials Library concept for maximizing the value of previous and on-going materials and nuclear fuels irradiation test campaigns, including utilization of real-world components retrieved from current and decommissioned reactors. When the ATR national scientific user facility was established in 2007, one of the goals of the program was to establish a library of irradiated samples for users to access and conduct research through a competitively reviewed proposal process. As part of the initial effort, staff at the user facility identified legacy materials from previous programs that are still being stored in laboratories and hot-cell facilities at the INL. In addition, other materials of interest were identified that are being stored outside the INL and that the current owners have volunteered to enter into the library. Finally, over the course of the last several years, the ATR NSUF has irradiated more than 3500 specimens as part of NSUF competitively awarded research projects. The logistics of managing this large inventory of highly radioactive material pose unique challenges. This document will describe materials in the library, outline the policy for accessing these materials, and put forth a strategy for making new additions to the library, as well as establishing guidelines for the minimum pedigree needed to be included in the library, to limit the amount of material stored indefinitely without identified value.

  11. Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

    Directory of Open Access Journals (Sweden)

    Rachel A Mann

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings: strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria, comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway only present in Rubus-infecting strains.

  12. SOL: A Library for Scalable Online Learning Algorithms

    OpenAIRE

    Wu, Yue; Hoi, Steven C. H.; Liu, Chenghao; Lu, Jing; Sahoo, Doyen; Yu, Nenghai

    2016-01-01

    SOL is an open-source library for scalable online learning algorithms, and is particularly suitable for learning with high-dimensional data. The library provides a family of regular and sparse online learning algorithms for large-scale binary and multi-class classification tasks with high efficiency, scalability, portability, and extensibility. SOL was implemented in C++, and provided with a collection of easy-to-use command-line tools, python wrappers and library calls for users and develope...

  13. Analysis of transposons and repeat composition of the sunflower (Helianthus annuus L.) genome.

    Science.gov (United States)

    Cavallini, Andrea; Natali, Lucia; Zuccolo, Andrea; Giordani, Tommaso; Jurman, Irena; Ferrillo, Veronica; Vitacolonna, Nicola; Sarri, Vania; Cattonaro, Federica; Ceccarelli, Marilena; Cionini, Pier Giorgio; Morgante, Michele

    2010-02-01

    A sample-sequencing strategy combined with slot-blot hybridization and FISH was used to study the composition of the repetitive component of the sunflower genome. One thousand six hundred thirty-eight sequences for a total of 954,517 bp were analyzed. The fraction of sequences that can be classified as repetitive using computational and hybridization approaches amounts to 62% in total. Almost two thirds remain as yet uncharacterized in nature. Of those characterized, most belong to the gypsy superfamily of LTR-retrotransposons. Unlike in other species, where single families can account for large fractions of the genome, it appears that no transposon family has been amplified to very high levels in sunflower. All other known classes of transposable elements were also found. One family of unknown nature (contig 61) was the most repeated in the sunflower genome. The evolution of the repetitive component in the Helianthus genus and in other Asteraceae was studied by comparative analysis of the hybridization of total genomic DNAs from these species to the sunflower small-insert library and compared to gene-based phylogeny. Very little similarity is observed between Helianthus species and two related Asteraceae species outside of the genus. Most repetitive elements are similar in annual and perennial Helianthus species indicating that sequence amplification largely predates such divergence. Gypsy-like elements are more represented in the annuals than in the perennials, while copia-like elements are similarly represented, attesting a different amplification history of the two superfamilies of LTR-retrotransposons in the Helianthus genus.

  14. Quality evaluation of tandem mass spectral libraries.

    Science.gov (United States)

    Oberacher, Herbert; Weinmann, Wolfgang; Dresen, Sebastian

    2011-06-01

    Tandem mass spectral libraries are gaining more and more importance for the identification of unknowns in different fields of research, including metabolomics, forensics, toxicology, and environmental analysis. Particularly, the recent invention of reliable, robust, and transferable libraries has increased the general acceptance of these tools. Herein, we report on results obtained from thorough evaluation of the match reliabilities of two tandem mass spectral libraries: the MSforID library established by the Oberacher group in Innsbruck and the Weinmann library established by the Weinmann group in Freiburg. Three different experiments were performed: (1) Spectra of the libraries were searched against their corresponding library after excluding either this single compound-specific spectrum or all compound-specific spectra prior to searching; (2) the libraries were searched against each other using either library as reference set or sample set; (3) spectra acquired on different mass spectrometric instruments were matched to both libraries. Almost 13,000 tandem mass spectra were included in this study. The MSforID search algorithm was used for spectral matching. Statistical evaluation of the library search results revealed that principally both libraries enable the sensitive and specific identification of compounds. Due to higher mass accuracy of the QqTOF compared with the QTrap instrument, matches to the MSforID library were more reliable when comparing spectra with both libraries. Furthermore, only the MSforID library was shown to be efficiently transferable to different kinds of tandem mass spectrometers, including "tandem-in-time" instruments; this is due to the coverage of a large range of different collision energy settings, including the very low range, which is an outstanding characteristic of the MSforID library.

  15. BESST--efficient scaffolding of large fragmented assemblies.

    Science.gov (United States)

    Sahlin, Kristoffer; Vezzi, Francesco; Nystedt, Björn; Lundeberg, Joakim; Arvestad, Lars

    2014-08-15

    The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software's general performance. We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST compares favorably with the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide. We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding.

  16. The Switchgrass Genome: Tools and Strategies

    Directory of Open Access Journals (Sweden)

    Michael D. Casler

    2011-11-01

    Switchgrass (Panicum virgatum L.) is a perennial grass species receiving significant focus as a potential bioenergy crop. In the last 5 yr the switchgrass research community has produced a genetic linkage map, an expressed sequence tag (EST) database, a set of single nucleotide polymorphism (SNP) markers that are distributed across the 18 linkage groups, 4x sampling of the AP13 genome in 400-bp reads, and bacterial artificial chromosome (BAC) libraries containing over 200,000 clones. These studies have revealed close collinearity of the switchgrass genome with those of sorghum [Sorghum bicolor (L.) Moench], rice (Oryza sativa L.), and (L.) P. Beauv. Switchgrass researchers have also developed several microarray technologies for gene expression studies. Switchgrass genomic resources will accelerate the ability of plant breeders to enhance productivity, pest resistance, and nutritional quality. Because switchgrass is a relative newcomer to the genomics world, many secrets of the switchgrass genome have yet to be revealed. To continue to efficiently explore basic and applied topics in switchgrass, it will be critical to capture and exploit the knowledge of plant geneticists and breeders on the next logical steps in the development and utilization of genomic resources for this species. To this end, the community has established a switchgrass genomics executive committee and work group ( [verified 28 Oct. 2011].

  17. A Blumeria graminis f.sp. hordei BAC library - contig building and microsynteny studies

    DEFF Research Database (Denmark)

    Pedersen, C.; Wu, B.; Giese, H.

    2002-01-01

    A bacterial artificial chromosome (BAC) library of Blumeria graminis f.sp. hordei, containing 12,000 clones with an average insert size of 41 kb, was constructed. The library represents about three genome equivalents and BAC-end sequencing showed a high content of repetitive sequences, making...... contigs, at or close to avirulence loci, were constructed. Single nucleotide polymorphism (SNP) markers were developed from BAC-end sequences to link the contigs to the genetic maps. Two other BAC contigs were used to study microsynteny between B. graminis and two other ascomycetes, Neurospora crassa...
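    The clone count, insert size and coverage quoted in this record are tied together by simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the genome size of 164 Mb is merely what the abstract's figures imply (12,000 clones x 41 kb / 3 equivalents) and is not a value reported in the record, and the Clarke-Carbon estimate is a standard textbook formula that the authors may or may not have used.

```python
def implied_genome_mb(n_clones, insert_kb, equivalents):
    """Genome size (Mb) implied by a library's clone count, insert size and coverage."""
    return n_clones * insert_kb / (equivalents * 1000.0)


def genome_equivalents(n_clones, insert_kb, genome_mb):
    """Library coverage: total cloned DNA divided by genome size."""
    return n_clones * insert_kb / (genome_mb * 1000.0)


def capture_probability(n_clones, insert_kb, genome_mb):
    """Clarke-Carbon estimate of the chance that a given locus is in the library.

    P = 1 - (1 - f)^N, where f is the insert size as a fraction of the genome.
    """
    fraction = insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** n_clones


if __name__ == "__main__":
    size_mb = implied_genome_mb(12000, 41, 3)  # derived value, not from the record
    print(f"implied genome size: {size_mb:.0f} Mb")
    print(f"coverage: {genome_equivalents(12000, 41, size_mb):.1f}x")
    print(f"P(locus captured): {capture_probability(12000, 41, size_mb):.3f}")
```

    Under these assumptions a three-fold library captures any single locus with a probability of roughly 95%, which is why libraries of several genome equivalents are usually built.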

  18. AutoWIG: automatic generation of python bindings for C++ libraries

    Directory of Open Access Journals (Sweden)

    Pierre Fernique

    2018-04-01

    Full Text Available Most Python and R scientific packages incorporate compiled scientific libraries to speed up the code and reuse legacy libraries. While several semi-automatic solutions exist to wrap these compiled libraries, the process of wrapping a large library is cumbersome and time consuming. In this paper, we introduce AutoWIG, a Python package that automatically wraps compiled libraries into high-level languages using LLVM/Clang technologies and the Mako templating engine. Our approach is automatic, extensible, and applies to complex C++ libraries, composed of thousands of classes or incorporating modern meta-programming constructs.

  19. The ARL 2030 Scenarios: A User's Guide for Research Libraries

    Science.gov (United States)

    Association of Research Libraries, 2010

    2010-01-01

    This user's guide was developed to advance local planning at ARL member libraries. It is written for library leaders writ large and for anyone leading or contributing to research library planning processes. Users do not need advanced facilitation skills to benefit from this guide, but facilitators charged with supporting scenario planning will…

  20. Encyclopedia and library in the contemporary age

    Directory of Open Access Journals (Sweden)

    Paolo Traniello

    2017-06-01

    Full Text Available With the publication of the Encyclopédie of Diderot and D’Alembert, which dates from 1751, the concept of the library came to be framed in a negative sense, as a tool of closure and an impediment in the hands of the State, in contrast to the Encyclopedia, considered the book of the people. The establishment of the public library funded with public money (the British Public Libraries Act dates from 1850) brought a change, and both realities, Library and Encyclopedia, began to interact, contributing to the development of contemporary culture. Like the Library, the Encyclopedia also presupposes a work of organizing knowledge through classifications. From this point of view, the two institutions appear closer, united by a common purpose. In view of the information revolution, the Library should help reduce the complexity of digital information and should play, against the vastness of the network, a role similar to the one the Enlightenment Encyclopedia assigned to itself in the face of large library collections, which were seen as points of accumulation and also as points of dispersion of knowledge.

  1. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Full Text Available Abstract Background A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  2. A lightweight communication library for distributed computing

    International Nuclear Information System (INIS)

    Groen, Derek; Rieder, Steven; Zwart, Simon Portegies; Grosso, Paola; Laat, Cees de

    2010-01-01

    We present MPWide, a platform-independent communication library for performing message passing between computers. Our library allows coupling of several local message passing interface (MPI) applications through a long-distance network and is specifically optimized for such communications. The implementation is deliberately kept lightweight and platform independent, and the library can be installed and used without administrative privileges. The only requirements are a C++ compiler and at least one open port to a wide-area network on each site. In this paper we present the library, describe the user interface, present performance tests and apply MPWide in a large-scale cosmological N-body simulation on a network of two computers, one in Amsterdam and the other in Tokyo.

  3. Access to IEEE Electronic Library

    CERN Multimedia

    2007-01-01

    From 2007, the CERN Library now offers readers online access to the complete IEEE Electronic Library (Institute of Electrical and Electronics Engineers). This new licence gives unlimited online access to all IEEE and IET (previously IEE) journals and proceedings as well as all current IEEE standards and selected archived ones. Some of the titles offer volumes back to 1913. This service currently represents more than 1,400,000 full-text articles! This leading engineering information resource replaces the previous service, a sub-product of the IEEE database called 'IEEE Enterprise', which offered online access to the complete collection of IEEE journals and proceedings, but with limited features. The service had become so popular that the CERN Working Group for Acquisitions recommended that the Library subscribe to the complete IEEE Electronic Library for 2007. Usage statistics for recent months showed there was a demand for the service from a large community of CERN users and we were aware that many users h...

  4. Bacterial Artificial Chromosome Libraries of Pulse Crops: Characteristics and Applications

    Directory of Open Access Journals (Sweden)

    Kangfu Yu

    2012-01-01

    Full Text Available Pulse crops are considered minor on a global scale despite their nutritional value for human consumption. Therefore, they are relatively less extensively studied in comparison with the major crops. The need to improve pulse crop production and quality will increase with the increasing global demand for food security and people's awareness of nutritious food. The improvement of pulse crops will require fully utilizing all their genetic resources. Bacterial artificial chromosome (BAC) libraries of pulse crops are essential genomic resources that have the potential to accelerate gene discovery and enhance molecular breeding in these crops. Here, we review the availability, characteristics, applications, and potential applications of the BAC libraries of pulse crops.

  5. Bacterial Artificial Chromosome Libraries of Pulse Crops: Characteristics and Applications

    Science.gov (United States)

    Yu, Kangfu

    2012-01-01

    Pulse crops are considered minor on a global scale despite their nutritional value for human consumption. Therefore, they are relatively less extensively studied in comparison with the major crops. The need to improve pulse crop production and quality will increase with the increasing global demand for food security and people's awareness of nutritious food. The improvement of pulse crops will require fully utilizing all their genetic resources. Bacterial artificial chromosome (BAC) libraries of pulse crops are essential genomic resources that have the potential to accelerate gene discovery and enhance molecular breeding in these crops. Here, we review the availability, characteristics, applications, and potential applications of the BAC libraries of pulse crops. PMID:21811383

  6. E-book Reader Devices and Libraries

    Directory of Open Access Journals (Sweden)

    Pažur, I.

    2011-03-01

    Full Text Available Most library studies on electronic books do not consider the readers of electronic books. Only in recent years have librarians conducted studies to find out readers' opinions about the possibilities, advantages and disadvantages of reading on e-readers, as well as their possible application in libraries. User studies of e-readers have shown that opinion is generally positive, but a strong attachment to traditional books is still present, and e-readers are still seen only as an additional tool for reading. Sony, with its e-reader (the latest Reader model Daily) and its Reader Store online bookstore (http://ebookstore.sony.com/), is the only company that cooperates with libraries and has made lending electronic books possible. The cooperation was launched in 2009, and the New York Public Library was the first library to offer such a service. Cooperation between Sony and libraries clearly indicates what the near future could be if other online booksellers and publishers begin to follow the model of lending e-books through libraries over the network. However, it is possible that large online bookstores and publishers consider that further price reductions of e-readers and electronic books will constantly increase their sales, and in that case lending e-books will be unnecessary. Are the libraries ready for this scenario?

  7. User services in the central library of Juelich Research Center

    International Nuclear Information System (INIS)

    Lapp, E.

    1993-01-01

    The central library is a sci/tech special library providing information for the KFA researchers and staff. The library has a large collection of sci/tech materials to meet the information demands of the KFA employees and over 3,000 external users. Among the outside users are students from the universities and polytechnics of the Aachen, Cologne and Duesseldorf region, as well as industry. The library acquires about 8,000 volumes per year and subscribes to 2,000 journals. (orig.)

  8. Issues for bringing digital libraries into public use

    Science.gov (United States)

    Flater, David W.; Yesha, Yelena

    1993-01-01

    In much the same way that the field of artificial intelligence produced a cult which fervently believed that computers would soon think like human beings, the existence of electronic books has resurrected the paperless society as a utopian vision to some, an apocalyptic horror to others. In this essay we have attempted to provide realistic notions of what digital libraries are likely to become if they are a popular success. E-books are capable of subsuming most of the media we use today and have the potential for added functionality by being interactive. The environmental impact of having millions more computers will be offset to some degree, perhaps even exceeded, by the fact that televisions, stereos, VCR's, CD players, newspapers, magazines, and books will become part of the computer system or be made redundant. On the whole, large-scale use of digital libraries is likely to be a winning proposition. Whether or not this comes to pass depends on the directions taken by today's researchers and software developers. By involving the public, the effort being put into digital libraries can be leveraged into something which is big enough to make a real change for the better. If digital libraries remain the exclusive property of government, universities, and large research firms, then large parts of the world will remain without digital libraries for years to come, just as they have remained without digital phone service for far too long. If software companies try to scuttle the project by patenting crucial algorithms and using proprietary data formats, all of us will suffer. Let us reverse the errors of the past and create a truly open digital library system.

  9. Genomics and the human genome project: implications for psychiatry

    OpenAIRE

    Kelsoe, J R

    2004-01-01

    In the past decade the Human Genome Project has made extraordinary strides in understanding of fundamental human genetics. The complete human genetic sequence has been determined, and the chromosomal location of almost all human genes identified. Presently, a large international consortium, the HapMap Project, is working to identify a large portion of genetic variation in different human populations and the structure and relationship of these variants to each other. The Human Genome Project h...

  10. Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don

    Directory of Open Access Journals (Sweden)

    Dillon Shannon K

    2009-01-01

    Full Text Available Abstract Background Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine. Results Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of

  11. Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q.

    Science.gov (United States)

    Xie, Wen; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Yang, Xin; Wang, Dan; Chen, Ming; Huang, Jinqun; Wen, Yanan; Zeng, Yang; Liu, Yating; Xia, Jixing; Tian, Lixia; Cui, Hongying; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Li, Xianchun; Tan, Xinqiu; Ghanim, Murad; Qiu, Baoli; Pan, Huipeng; Chu, Dong; Delatte, Helene; Maruthi, M N; Ge, Feng; Zhou, Xueping; Wang, Xiaowei; Wan, Fanghao; Du, Yuzhou; Luo, Chen; Yan, Fengming; Preisser, Evan L; Jiao, Xiaoguo; Coates, Brad S; Zhao, Jinyang; Gao, Qiang; Xia, Jinquan; Yin, Ye; Liu, Yong; Brown, Judith K; Zhou, Xuguo Joe; Zhang, Youjun

    2017-05-01

    The sweetpotato whitefly Bemisia tabaci is a highly destructive agricultural and ornamental crop pest. It damages host plants through both phloem feeding and vectoring plant pathogens. Introductions of B. tabaci are difficult to quarantine and eradicate because of its high reproductive rates, broad host plant range, and insecticide resistance. A total of 791 Gb of raw DNA sequence from whole genome shotgun sequencing, and 13 BAC pooling libraries were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 437 kb, and a total length of 658 Mb. Annotation of repetitive elements and coding regions resulted in 265.0 Mb TEs (40.3%) and 20 786 protein-coding genes with putative gene family expansions, respectively. Phylogenetic analysis based on orthologs across 14 arthropod taxa suggested that MED/Q is clustered into a hemipteran clade containing A. pisum and is a sister lineage to a clade containing both R. prolixus and N. lugens. Genome completeness, as estimated using the CEGMA and Benchmarking Universal Single-Copy Orthologs pipelines, reached 96% and 79%. These MED/Q genomic resources lay a foundation for future 'pan-genomic' comparisons of invasive vs. noninvasive, invasive vs. invasive, and native vs. exotic Bemisia, which, in return, will open up new avenues of investigation into whitefly biology, evolution, and management. © The Author 2017. Published by Oxford University Press.

  12. Genome size analyses of Pucciniales reveal the largest fungal genomes.

    Science.gov (United States)

    Tavares, Sílvia; Ramos, Ana Paula; Pires, Ana Sofia; Azinheira, Helena G; Caldeirinha, Patrícia; Link, Tobias; Abranches, Rita; Silva, Maria do Céu; Voegele, Ralf T; Loureiro, João; Talhinhas, Pedro

    2014-01-01

    Rust fungi (Basidiomycota, Pucciniales) are biotrophic plant pathogens which exhibit diverse complexities in their life cycles and host ranges. The completion of genome sequencing of a few rust fungi has revealed the occurrence of large genomes. Sequencing efforts for other rust fungi have been hampered by uncertainty concerning their genome sizes. Flow cytometry was recently applied to estimate the genome size of a few rust fungi, and confirmed the occurrence of large genomes in this order (averaging 225.3 Mbp, while the average for Basidiomycota was 49.9 Mbp and was 37.7 Mbp for all fungi). In this work, we have used an innovative and simple approach to simultaneously isolate nuclei from the rust and its host plant in order to estimate the genome size of 30 rust species by flow cytometry. Genome sizes varied over 10-fold, from 70 to 893 Mbp, with an average genome size value of 380.2 Mbp. Compared to the genome sizes of over 1800 fungi, Gymnosporangium confusum possesses the largest fungal genome ever reported (893.2 Mbp). Moreover, even the smallest rust genome determined in this study is larger than the vast majority of fungal genomes (94%). The average genome size of the Pucciniales is now of 305.5 Mbp, while the average Basidiomycota genome size has shifted to 70.4 Mbp and the average for all fungi reached 44.2 Mbp. Despite the fact that no correlation could be drawn between the genome sizes, the phylogenomics or the life cycle of rust fungi, it is interesting to note that rusts with Fabaceae hosts present genomes clearly larger than those with Poaceae hosts. Although this study comprises only a small fraction of the more than 7000 rust species described, it seems already evident that the Pucciniales represent a group where genome size expansion could be a common characteristic. This is in sharp contrast to sister taxa, placing this order in a relevant position in fungal genomics research.

  13. Evaluation of microsatellite loci from libraries derived from the wild diploid 'Calcutta 4' and 'Ouro' banana cultivars.

    Science.gov (United States)

    Silva, P R O; Jesus, O N J; Creste, S; Figueira, A; Amorim, E P; Ferreira, C F

    2015-09-25

    Microsatellite markers have been widely used in the quantification of genetic variability and for genetic breeding in Musa spp. The objective of the present study was to evaluate the discriminatory power of microsatellite markers derived from 'Calcutta 4' and 'Ouro' genomic libraries, and to analyze the genetic variability among 30 banana accessions. Thirty-eight markers were used: 15 from the 'Ouro' library and 23 from the 'Calcutta 4' library. Genetic diversity was evaluated by considering SSR markers as both dominant markers because of the presence of triploid accessions, and co-dominant markers. For the dominant analysis, polymorphism information content (PIC) values for 44 polymorphic markers ranged from 0.063 to 0.533, with a mean value of 0.24. A dendrogram analysis separated the BGB-Banana accessions into 4 groups: the 'Ouro' and 'Muísa Tia' accessions were the most dissimilar (93% dissimilarity), while the most similar accessions were 'Pacovan' and 'Walha'. The mean genetic distance between samples was 0.74. For the analysis considering SSR markers as co-dominants, using only diploid accessions, two groups were separated based on their genome contents (A and B). The PIC values for the markers from the 'Calcutta 4' library varied from 0.4836 to 0.7886, whereas those from the 'Ouro' library ranged from 0.3800 to 0.7521. Given the high PIC values, the markers from both the libraries showed high discriminatory power, and can therefore be widely applied for analysis of genetic diversity, population structures, and linkage mapping in Musa spp.
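    For readers unfamiliar with the polymorphism information content (PIC) values quoted above, the following sketch computes PIC for a co-dominant marker from its allele frequencies. It assumes the widely used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the record does not state which formula the authors applied, and the function name and example frequencies are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content of a co-dominant marker.

    allele_freqs: allele frequencies at one locus; they must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings that are uninformative despite heterozygosity.
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    print(pic([0.5, 0.5]))                # two equally frequent alleles
    print(pic([0.25, 0.25, 0.25, 0.25]))  # four equally frequent alleles
```

    PIC rises with both the number of alleles and the evenness of their frequencies, which is why markers with values above roughly 0.5 are usually described as highly informative.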

  14. Dramatic improvement in genome assembly achieved using doubled-haploid genomes.

    Science.gov (United States)

    Zhang, Hong; Tan, Engkong; Suzuki, Yutaka; Hirose, Yusuke; Kinoshita, Shigeharu; Okano, Hideyuki; Kudoh, Jun; Shimizu, Atsushi; Saito, Kazuyoshi; Watabe, Shugo; Asakawa, Shuichi

    2014-10-27

    Improvement in de novo assembly of large genomes is still to be desired. Here, we improved draft genome sequence quality by employing doubled-haploid individuals. We sequenced wildtype and doubled-haploid Takifugu rubripes genomes, under the same conditions, using the Illumina platform and assembled contigs with SOAPdenovo2. We observed 5.4-fold and 2.6-fold improvement in the sizes of the N50 contig and scaffold of doubled-haploid individuals, respectively, compared to the wildtype, indicating that the use of a doubled-haploid genome aids in accurate genome analysis.
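    The N50 statistic used to compare these assemblies can be computed in a few lines. The sketch below uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly); the function name and the example lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    fragmented = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up lengths in kb
    merged = [12, 15, 27]                      # same total, fewer and longer pieces
    print(n50(fragmented), n50(merged))
```

    A fold improvement such as those reported above is simply the ratio of two N50 values computed this way for assemblies of comparable total length.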

  15. Exploiting Chemical Libraries, Structure, and Genomics in the Search for Kinase Inhibitors

    NARCIS (Netherlands)

    Gray, Nathanael S.; Wodicka, Lisa; Thunnissen, Andy-Mark W.H.; Norman, Thea C.; Kwon, Soojin; Espinoza, F. Hernan; Morgan, David O.; Barnes, Georjana; LeClerc, Sophie; Meijer, Laurent; Kim, Sung-Hou; Lockhart, David J.; Schultz, Peter G.

    1998-01-01

    Selective protein kinase inhibitors were developed on the basis of the unexpected binding mode of 2,6,9-trisubstituted purines to the adenosine triphosphate-binding site of the human cyclin-dependent kinase 2 (CDK2). By iterating chemical library synthesis and biological screening, potent inhibitors

  16. Breeding and Genetics Symposium: really big data: processing and analysis of very large data sets.

    Science.gov (United States)

    Cole, J B; Newman, S; Foertter, F; Aguilar, I; Coffey, M

    2012-03-01

    Modern animal breeding data sets are large and getting larger, due in part to recent availability of high-density SNP arrays and cheap sequencing technology. High-performance computing methods for efficient data warehousing and analysis are under development. Financial and security considerations are important when using shared clusters. Sound software engineering practices are needed, and it is better to use existing solutions when possible. Storage requirements for genotypes are modest, although full-sequence data will require greater storage capacity. Storage requirements for intermediate and results files for genetic evaluations are much greater, particularly when multiple runs must be stored for research and validation studies. The greatest gains in accuracy from genomic selection have been realized for traits of low heritability, and there is increasing interest in new health and management traits. The collection of sufficient phenotypes to produce accurate evaluations may take many years, and high-reliability proofs for older bulls are needed to estimate marker effects. Data mining algorithms applied to large data sets may help identify unexpected relationships in the data, and improved visualization tools will provide insights. Genomic selection using large data requires a lot of computing power, particularly when large fractions of the population are genotyped. Theoretical improvements have made possible the inversion of large numerator relationship matrices, permitted the solving of large systems of equations, and produced fast algorithms for variance component estimation. Recent work shows that single-step approaches combining BLUP with a genomic relationship (G) matrix have similar computational requirements to traditional BLUP, and the limiting factor is the construction and inversion of G for many genotypes. 
    A naïve algorithm for creating G for 14,000 individuals required almost 24 h to run, but custom libraries and parallel computing reduced that to
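    To make the cost of building G concrete, here is a minimal pure-Python sketch of a naive construction. It assumes one common definition, G = ZZ' / (2 * sum of p_j(1 - p_j)), with genotypes coded 0/1/2 and Z centred by twice the allele frequency; the record does not say which definition or code was used, and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Naive genomic relationship matrix from 0/1/2 genotype codes.

    genotypes: list of rows, one per individual, one column per marker.
    The double loop over individuals costs O(n^2 * m), which is the
    bottleneck the abstract refers to when n reaches many thousands.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    # Centre each genotype by its expected value 2p.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    g = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            value = sum(z[a][j] * z[b][j] for j in range(m)) / scale
            g[a][b] = g[b][a] = value
    return g


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]  # 3 individuals, 3 markers
    for row in genomic_relationship_matrix(toy):
        print([round(x, 3) for x in row])
```

    Optimised linear algebra libraries replace the inner loops with a single matrix product, which is the kind of speed-up the abstract attributes to custom libraries and parallel computing.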

  17. Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

    Science.gov (United States)

    Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro

    2014-01-01

    The increasing number of dengue virus (DENV) genome sequences available allows identifying the factors contributing to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
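    Two of the statistics named in this abstract, GC3 and RSCU, are straightforward to compute from a coding sequence. The sketch below assumes the standard genetic code and the usual RSCU definition (observed codon count divided by the mean count of all synonymous codons for that amino acid); it is not the authors' pipeline, and the function names and the short test sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split an in-frame coding sequence into codons (RNA 'U' is read as 'T')."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3(seq):
    """Fraction of codons with G or C at the third position."""
    codons = [c for c in split_codons(seq) if c in CODON_TABLE]
    if not codons:
        raise ValueError("no valid codons found")
    return sum(c[2] in "GC" for c in codons) / len(codons)


def rscu(seq):
    """Relative synonymous codon usage for every amino acid seen in seq."""
    counts = Counter(c for c in split_codons(seq) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    example = "GCTGCTGCCGCA"  # four alanine codons, made up for illustration
    print(gc3(example))
    print({c: v for c, v in rscu(example).items() if c.startswith("GC")})
```

    An RSCU of 1 means a codon is used as often as expected under equal usage of synonymous codons; values above 1 indicate preferred codons.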

  18. Libraries for users services in academic libraries

    CERN Document Server

    Alvite, Luisa

    2010-01-01

    This book reviews the quality and evolution of academic library services. It reviews the service trends offered by academic libraries and the challenge of enhancing traditional ones such as catalogues, repositories and digital collections, learning resources centres, virtual reference services, information literacy and 2.0 tools. It studies the role of the university library in the new educational environment of higher education, rethinks libraries in the academic context, and redefines roles for academic libraries.

  19. Is There a Global Role for Metropolitan City Libraries?

    Science.gov (United States)

    Mason, Marilyn Gell

    1994-01-01

    Discusses the potential for linking large metropolitan public libraries to international interlibrary loan networks. Issues involved in international networking, including funding, standards, network connectivity, and protectionism, are discussed. Examples of libraries capable of participating and brief descriptions of their collections are given.…

  20. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

    DEFF Research Database (Denmark)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo

    2012-01-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....

  1. Insights into the genome of large sulfur bacteria revealed by analysis of single filaments

    DEFF Research Database (Denmark)

    Mussmann, Marc; Hu, Fen Z.; Richter, Michael

    2007-01-01

    Beggiatoa to overcome non-overlapping availabilities of electron donors and acceptors while gliding between oxic and sulfidic zones. The first look into the genome of these filamentous sulfur-oxidizing bacteria substantially deepens the understanding of their evolution and their contribution to sulfur......Marine sediments are frequently covered by mats of the filamentous Beggiatoa and other large nitrate-storing bacteria that oxidize hydrogen sulfide using either oxygen or nitrate, which they store in intracellular vacuoles. Despite their conspicuous metabolic properties and their biogeochemical...

  2. Outsourcing the law firm library: the UK experience

    OpenAIRE

    Brown, Fiona

    2017-01-01

    Since 2009, a number of large and leading UK law firms have outsourced their in-house law library and research service to outsource service providers. Integreon, the leading provider of these services in the UK, commenced operations in Australia in 2011. Since that time, a number of other providers of outsourced law library and legal research services have attracted a number of top-tier Australian law firms as clients. These outsource providers are not currently providing law library and lega...

  3. Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing.

    Science.gov (United States)

    Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong

    2018-05-04

    Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.

  4. Segregation distortion causes large-scale differences between male and female genomes in hybrid ants.

    Science.gov (United States)

    Kulmuni, Jonna; Seifert, Bernhard; Pamilo, Pekka

    2010-04-20

    Hybridization in isolated populations can lead either to hybrid breakdown and extinction or in some cases to speciation. The basis of hybrid breakdown lies in genetic incompatibilities between diverged genomes. In social Hymenoptera, the consequences of hybridization can differ from those in other animals because of haplodiploidy and sociality. Selection pressures differ between sexes because males are haploid and females are diploid. Furthermore, sociality and group living may allow survival of hybrid genotypes. We show that hybridization in Formica ants has resulted in a stable situation in which the males form two highly divergent gene pools whereas all the females are hybrids. This causes an exceptional situation with large-scale differences between male and female genomes. The genotype differences indicate strong transmission ratio distortion depending on offspring sex, whereby the mother transmits some alleles exclusively to her daughters and other alleles exclusively to her sons. The genetic differences between the sexes and the apparent lack of multilocus hybrid genotypes in males can be explained by recessive incompatibilities which cause the elimination of hybrid males because of their haploid genome. Alternatively, differentiation between sexes could be created by prezygotic segregation into male-forming and female-forming gametes in diploid females. Differentiation between sexes is stable and maintained throughout generations. The present study shows a unique outcome of hybridization and demonstrates that hybridization has the potential of generating evolutionary novelties in animals.

  5. SCHOOL COMMUNITY PERCEPTION OF LIBRARY APPS AGAINST LIBRARY EMPOWERMENT

    Directory of Open Access Journals (Sweden)

    Achmad Riyadi Alberto

    2017-07-01

    Abstract. This research is motivated by the rapid development of information and communication technology (ICT) in the library world, which allows present-day libraries to develop their services into digital-based services. This study aims to find out the school community’s perception of library apps developed by Riche Cynthia Johan, Hana Silvana, and Holin Sulistyo and its influence on library empowerment at the library of SD Laboratorium Percontohan UPI Bandung. Library apps in this research belong to the context of m-libraries, that is, libraries that meet the needs of their users through mobile platforms such as smartphones, computers, and other mobile devices. Library empowerment is the best possible utilization of all aspects of library operation in order to achieve the expected goals. The analysis of the school community’s perception of library apps uses the Technology Acceptance Model (TAM), which includes ease of use, usefulness, usability, usage trends, and real-use conditions. Library empowerment includes the following aspects: information empowerment, empowerment of learning resources, empowerment of human resources, empowerment of library facilities, and library promotion. The research uses a descriptive method with a quantitative approach. The population and sample are the school community at SD Laboratorium Percontohan UPI Bandung. The sample was drawn by disproportionate stratified random sampling, giving 83 respondents. The data were analyzed with simple linear regression to measure the influence of the school community’s perception of library apps on library empowerment. The results show that the school community’s perception of library apps influences library empowerment at the library of SD Laboratorium Percontohan UPI Bandung, as evidenced by the library acceptance level and the improvement in library empowerment.

  6. Theoretical Foundations for Digital Libraries

    CERN Document Server

    Fox, Edward A; Shen, Rao

    2012-01-01

    In 1991, a group of researchers chose the term digital libraries to describe an emerging field of research, development, and practice. Since then, Virginia Tech has had funded research in this area, largely through its Digital Library Research Laboratory. This book is the first in a four book series that reports our key findings and current research investigations. Underlying this book series are five completed dissertations (Gonçalves, Kozievitch, Murthy, Shen, Torres), nine dissertations underway, and many masters theses. These reflect our experience with a long string of prototype or produc

  7. A high-density Diversity Arrays Technology (DArT) microarray for genome-wide genotyping in Eucalyptus

    Directory of Open Access Journals (Sweden)

    Myburg Alexander A

    2010-06-01

    Abstract. Background: A number of molecular marker technologies have allowed important advances in the understanding of the genetics and evolution of Eucalyptus, a genus that includes over 700 species, some of which are used worldwide in plantation forestry. Nevertheless, the average marker density achieved with current technologies remains at the level of a few hundred markers per population. Furthermore, the transferability of markers produced with most existing technology across species and pedigrees is usually very limited. High throughput, combined with wide genome coverage and high transferability, is necessary to increase the resolution, speed and utility of molecular marker technology in eucalypts. We report the development of a high-density DArT genome profiling resource and demonstrate its potential for genome-wide diversity analysis and linkage mapping in several species of Eucalyptus. Findings: After testing several genome complexity reduction methods, we identified the PstI/TaqI method as the most effective for Eucalyptus and developed 18 genomic libraries from PstI/TaqI representations of 64 different Eucalyptus species. A total of 23,808 cloned DNA fragments were screened and 13,300 (56%) were found to be polymorphic among 284 individuals. After a redundancy analysis, 6,528 markers were selected for the operational array and these were supplemented with 1,152 additional clones taken from a library made from the E. grandis tree whose genome has been sequenced. Performance validation for diversity studies revealed 4,752 polymorphic markers among 174 individuals. Additionally, 5,013 markers showed segregation when screened using six inter-specific mapping pedigrees, with an average of 2,211 polymorphic markers per pedigree and a minimum of 859 polymorphic markers that were shared between any two pedigrees. Conclusions: This operational DArT array will deliver 1,000-2,000 polymorphic markers for linkage mapping in most eucalypt pedigrees

  8. BAC end sequencing of Pacific white shrimp Litopenaeus vannamei: a glimpse into the genome of Penaeid shrimp

    Science.gov (United States)

    Zhao, Cui; Zhang, Xiaojun; Liu, Chengzhang; Huan, Pin; Li, Fuhua; Xiang, Jianhai; Huang, Chao

    2012-05-01

    Little is known about the genome of Pacific white shrimp (Litopenaeus vannamei). To address this, we conducted BAC (bacterial artificial chromosome) end sequencing of L. vannamei. We selected 7 812 BAC clones from the BAC library LvHE and sequenced both ends of their inserts by Sanger sequencing. After trimming and quality filtering, 11 279 BAC end sequences (BESs), including 4 609 paired-end BESs, were obtained. The total length of the BESs was 4 340 753 bp, representing 0.18% of the L. vannamei haploid genome. The lengths of the BESs ranged from 100 bp to 660 bp with an average length of 385 bp. Analysis of the BESs indicated that the L. vannamei genome is AT-rich and that the primary repeat patterns were simple sequence repeats (SSRs) and low complexity sequences. Dinucleotide and hexanucleotide repeats were the most common SSR types in the BESs. The most abundant transposable element was gypsy, which may contribute to the generation of the large genome size of L. vannamei. We successfully annotated 4 519 BESs by BLAST searching, including genes involved in immunity and sex determination. Our results provide an important resource for functional gene studies, map construction and integration, and complete genome assembly for this species.
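
    The summary figures in this abstract can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers quoted above; the function names are illustrative, and the genome-size figure it yields is only what the rounded 0.18% implies, not a value reported in the abstract.

```python
# Cross-check the BAC end sequence (BES) statistics quoted in the abstract.
TOTAL_BES_BP = 4_340_753   # total length of all BESs, in bp
N_BES = 11_279             # number of BESs after filtering
GENOME_FRACTION = 0.0018   # 0.18% of the haploid genome (rounded in the source)


def mean_bes_length(total_bp: int, n_bes: int) -> float:
    """Average BES length in bp."""
    return total_bp / n_bes


def implied_genome_size(total_bp: int, fraction: float) -> float:
    """Haploid genome size implied by the sampled fraction (approximate)."""
    return total_bp / fraction
```

    With these inputs the mean length rounds to the reported 385 bp, and the sampled fraction implies a haploid genome of roughly 2.4 Gbp.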

  9. Using scientific evidence to improve hospital library services: Southern Chapter/Medical Library Association journal usage study.

    Science.gov (United States)

    Dee, C R; Rankin, J A; Burns, C A

    1998-07-01

    Journal usage studies, which are useful for budget management and for evaluating collection performance relative to library use, have generally described a single library or subject discipline. The Southern Chapter/Medical Library Association (SC/MLA) study has examined journal usage at the aggregate data level with the long-term goal of developing hospital library benchmarks for journal use. Thirty-six SC/MLA hospital libraries, categorized for the study by size as small, medium, or large, reported current journal title use centrally for a one-year period following standardized data collection procedures. Institutional and aggregate data were analyzed for the average annual frequency of use, average costs per use and non-use, and average percent of non-used titles. Permutation F-type tests were used to measure difference among the three hospital groups. Averages were reported for each data set analysis. Statistical tests indicated no significant differences between the hospital groups, suggesting that benchmarks can be derived applying to all types of hospital libraries. The unanticipated lack of commonality among heavily used titles pointed to a need for uniquely tailored collections. Although the small sample size precluded definitive results, the study's findings constituted a baseline of data that can be compared against future studies.

  10. A CRISPR CASe for High-Throughput Silencing

    Directory of Open Access Journals (Sweden)

    Jacob Heintze

    2013-10-01

    Manipulation of gene expression on a genome-wide level is one of the most important systematic tools in the post-genome era. Such manipulations have largely been enabled by expression cloning approaches using sequence-verified cDNA libraries, large-scale RNA interference libraries (shRNA or siRNA) and zinc finger nuclease technologies. More recently, the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and CRISPR-associated (Cas)9-mediated gene editing technology has been described, which holds great promise for future use in genomic manipulation. It was suggested that the CRISPR system has the potential to be used in high-throughput, large-scale loss of function screening. Here we discuss some of the challenges in engineering of CRISPR/Cas genomic libraries and some of the aspects that need to be addressed in order to use this technology on a high-throughput scale.

  11. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome.

    Science.gov (United States)

    Klann, Tyler S; Black, Joshua B; Chellappan, Malathi; Safi, Alexias; Song, Lingyun; Hilton, Isaac B; Crawford, Gregory E; Reddy, Timothy E; Gersbach, Charles A

    2017-06-01

    Large genome-mapping consortia and thousands of genome-wide association studies have identified non-protein-coding elements in the genome as having a central role in various biological processes. However, decoding the functions of the millions of putative regulatory elements discovered in these studies remains challenging. CRISPR-Cas9-based epigenome editing technologies have enabled precise perturbation of the activity of specific regulatory elements. Here we describe CRISPR-Cas9-based epigenomic regulatory element screening (CERES) for improved high-throughput screening of regulatory element activity in the native genomic context. Using dCas9 KRAB repressor and dCas9 p300 activator constructs and lentiviral single guide RNA libraries to target DNase I hypersensitive sites surrounding a gene of interest, we carried out both loss- and gain-of-function screens to identify regulatory elements for the β-globin and HER2 loci in human cells. CERES readily identified known and previously unidentified regulatory elements, some of which were dependent on cell type or direction of perturbation. This technology allows the high-throughput functional annotation of putative regulatory elements in their native chromosomal context.

  12. Consortia for Electronic Library Provision in Belgium

    Directory of Open Access Journals (Sweden)

    Julien Van Borm

    2001-07-01

    E-libraries, just like the former paper-based libraries, will become increasingly essential and indispensable tools in research and education. Library consortia seem to be the way to get e-libraries started all over the world. However, it is not yet clear whether this is going to be a long-lasting workable model. The Belgian research libraries follow the international pattern and are rapidly becoming hybrid libraries, especially in business, science, applied sciences and biomedicine (the STM disciplines). Still, they have large paper-bound collections on board and no library is willing to replace these in the near future by a purely electronic collection of journals. The fear of losing the content and thus the "raison d'être" of the library and the concern for users not yet familiar with e-information sources are the cornerstone for a prudent, yet conservative policy. Increasingly, e-information and e-journals are being taken on board. Paper and electronic go side by side in new hybrid libraries, partly also due to the market policy set by the publishers in combining paper and electronic in an attempt to keep or improve the annual turnover reached during the past paper period. The transition from paper to electronic occurred in Belgium somewhat later than in other Western European countries. This confirms the position of Belgium, which often takes up an average position in Western Europe.

  13. Heroes and Holidays: The Status of Diversity Initiatives at Liberal Arts College Libraries

    Science.gov (United States)

    Gilbert, Julie

    2016-01-01

    Studies about diversity initiatives in academic libraries have primarily focused on large research libraries. But what kinds of diversity work occur at smaller libraries? This study examines the status of diversity initiatives, especially those aimed at students, at national liberal arts college libraries. Results from a survey of library…

  14. Software engineering the mixed model for genome-wide association studies on large samples.

    Science.gov (United States)

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.
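
    For orientation, the mixed model referred to in this abstract is commonly written in the textbook form below. The notation is an assumption added here and does not come from the abstract: y is the phenotype vector, X the design matrix for fixed effects such as markers and population structure, β the fixed-effect coefficients, Z the incidence matrix for random genetic effects u, K the kinship matrix, and σ_g² and σ_e² the genetic and residual variance components.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},\\
\mathbf{u} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^{2}\mathbf{K}\right), \qquad
\mathbf{e} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right).
\end{aligned}
```

    In this form, estimating σ_g² and σ_e² is the variance component step, and predicting u yields the best linear unbiased predictors mentioned in the abstract.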

  15. Fragment library design: using cheminformatics and expert chemists to fill gaps in existing fragment libraries.

    Science.gov (United States)

    Kutchukian, Peter S; So, Sung-Sau; Fischer, Christian; Waller, Chris L

    2015-01-01

    Fragment based screening (FBS) has emerged as a mainstream lead discovery strategy in academia, biotechnology start-ups, and large pharma. As a prerequisite of FBS, a structurally diverse library of fragments is desirable in order to identify chemical matter that will interact with the range of diverse target classes that are prosecuted in contemporary screening campaigns. In addition, it is also desirable to offer synthetically amenable starting points to increase the probability of a successful fragment evolution through medicinal chemistry. Herein we describe a method to identify biologically relevant chemical substructures that are missing from an existing fragment library (chemical gaps), and organize these chemical gaps hierarchically so that medicinal chemists can efficiently navigate the prioritized chemical space and subsequently select purchasable fragments for inclusion in an enhanced fragment library.

  16. Los Alamos Science: The Human Genome Project. Number 20, 1992

    Science.gov (United States)

    Cooper, N. G.; Shea, N. eds.

    1992-01-01

    This document provides a broad overview of the Human Genome Project, with particular emphasis on work being done at Los Alamos. It tries to emphasize the scientific aspects of the project, compared to the more speculative information presented in the popular press. There is a brief introduction to modern genetics, including a review of classic work. There is a broad overview of the Genome Project, describing what the project is, what are some of its major five-year goals, what are major technological challenges ahead of the project, and what can the field of biology, as well as society expect to see as benefits from this project. Specific results on the efforts directed at mapping chromosomes 16 and 5 are discussed. A brief introduction to DNA libraries is presented, bearing in mind that Los Alamos has housed such libraries for many years prior to the Genome Project. Information on efforts to do applied computational work related to the project are discussed, as well as experimental efforts to do rapid DNA sequencing by means of single-molecule detection using applied spectroscopic methods. The article introduces the Los Alamos staff which are working on the Genome Project, and concludes with brief discussions on ethical, legal, and social implications of this work; a brief glimpse of genetics as it may be practiced in the next century; and a glossary of relevant terms.

  17. Los Alamos Science: The Human Genome Project. Number 20, 1992

    Energy Technology Data Exchange (ETDEWEB)

    Cooper, N G; Shea, N [eds.]

    1992-01-01

    This article provides a broad overview of the Human Genome Project, with particular emphasis on work being done at Los Alamos. It tries to emphasize the scientific aspects of the project, compared to the more speculative information presented in the popular press. There is a brief introduction to modern genetics, including a review of classic work. There is a broad overview of the Genome Project, describing what the project is, what are some of its major five-year goals, what are major technological challenges ahead of the project, and what can the field of biology, as well as society expect to see as benefits from this project. Specific results on the efforts directed at mapping chromosomes 16 and 5 are discussed. A brief introduction to DNA libraries is presented, bearing in mind that Los Alamos has housed such libraries for many years prior to the Genome Project. Information on efforts to do applied computational work related to the project are discussed, as well as experimental efforts to do rapid DNA sequencing by means of single-molecule detection using applied spectroscopic methods. The article introduces the Los Alamos staff which are working on the Genome Project, and concludes with brief discussions on ethical, legal, and social implications of this work; a brief glimpse of genetics as it may be practiced in the next century; and a glossary of relevant terms.

  18. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    Energy Technology Data Exchange (ETDEWEB)

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-08-16

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with the size of sequencing data ranging from terabytes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For the input parallelization, the input data is divided into virtual fragments with nearly equal size, and the start position and end position of each fragment are automatically separated at the beginning of the reads. In k-mer graph construction, in order to improve the communication efficiency, the message size is kept constant between any two processes by proportionally increasing the number of nucleotides to the number of processes in the input parallelization step for each round. The memory usage is also decreased because only a small part of the input data is processed in each round. With graph simplification, the communication protocol reduces the number of communication loops from four to two loops and decreases the idle communication time. The optimized assembler is denoted as SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes project dataset of 4 terabytes (the largest dataset ever used for assembling) on the supercomputer Mira, the results show that SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
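
    The input parallelization step described above, near-equal fragments whose boundaries fall at the beginning of reads, can be sketched in Python as follows. This is not SWAP2's code: it assumes FASTA-style input held in memory, where every record starts with ">" at the beginning of a line, and the function name is hypothetical.

```python
from typing import List, Tuple


def split_at_record_starts(data: bytes, n_parts: int) -> List[Tuple[int, int]]:
    """Split FASTA-like bytes into n_parts (start, end) ranges of nearly
    equal size, moving each cut forward to the start of the next record."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = len(data)
    starts = [0]
    for i in range(1, n_parts):
        nominal = i * size // n_parts
        # Look for a newline followed by '>' at or just before the nominal cut.
        pos = data.find(b"\n>", max(nominal - 1, 0))
        cut = size if pos == -1 else pos + 1
        # Keep boundaries non-decreasing; a fragment may end up empty.
        starts.append(max(cut, starts[-1]))
    bounds = starts + [size]
    return [(bounds[i], bounds[i + 1]) for i in range(n_parts)]
```

    Each process would then read only its own byte range, so no read is split across two fragments. Real FASTQ input needs a more careful boundary test, because "@" can also appear in quality strings.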

  19. Encyclopedia of bacterial gene circuits whose presence or absence correlate with pathogenicity--a large-scale system analysis of decoded bacterial genomes.

    Science.gov (United States)

    Shestov, Maksim; Ontañón, Santiago; Tozeren, Aydin

    2015-10-13

    Bacterial infections comprise a global health challenge as the incidences of antibiotic resistance increase. Pathogenic potential of bacteria has been shown to be context dependent, varying in response to environment and even within the strains of the same genus. We used the KEGG repository and extensive literature searches to identify among the 2527 bacterial genomes in the literature those implicated as pathogenic to the host, including those which show pathogenicity in a context dependent manner. Using data on the gene contents of these genomes, we identified sets of genes highly abundant in pathogenic but relatively absent in commensal strains and vice versa. In addition, we carried out genome comparison within a genus for the seventeen largest genera in our genome collection. We projected the resultant lists of ortholog genes onto KEGG bacterial pathways to identify clusters and circuits, which can be linked to either pathogenicity or synergy. Gene circuits relatively abundant in nonpathogenic bacteria often mediated biosynthesis of antibiotics. Other synergy-linked circuits reduced drug-induced toxicity. Pathogen-abundant gene circuits included modules in one-carbon folate, two-component system, type-3 secretion system, and peptidoglycan biosynthesis. Antibiotic-resistant bacterial strains possessed genes modulating phagocytosis, vesicle trafficking, cytoskeletal reorganization, and regulation of the inflammatory response. Our study also identified bacterial genera containing a circuit, elements of which were previously linked to Alzheimer's disease. The present study produces, for the first time, a signature, in the form of a robust list of gene circuitry, whose presence or absence could potentially define the pathogenicity of a microbiome. An extensive literature search substantiated the bulk of the commensal and pathogenic circuitry in our predicted list. Scanning microbiome libraries for these circuitry motifs will provide further insights into the complex

  20. Construction, database integration, and application of an Oenothera EST library.

    Science.gov (United States)

    Mrácek, Jaroslav; Greiner, Stephan; Cho, Won Kyong; Rauwolf, Uwe; Braun, Martha; Umate, Pavan; Altstätter, Johannes; Stoppel, Rhea; Mlcochová, Lada; Silber, Martina V; Volz, Stefanie M; White, Sarah; Selmeier, Renate; Rudd, Stephen; Herrmann, Reinhold G; Meurer, Jörg

    2006-09-01

    Coevolution of cellular genetic compartments is a fundamental aspect in eukaryotic genome evolution that becomes apparent in serious developmental disturbances after interspecific organelle exchanges. The genus Oenothera represents a unique, at present the only available, resource to study the role of the compartmentalized plant genome in diversification of populations and speciation processes. An integrated approach involving cDNA cloning, EST sequencing, and bioinformatic data mining was chosen using Oenothera elata with the genetic constitution nuclear genome AA with plastome type I. The Gene Ontology system grouped 1621 unique gene products into 17 different functional categories. Application of arrays generated from a selected fraction of ESTs revealed significantly differing expression profiles among closely related Oenothera species possessing the potential to generate fertile and incompatible plastid/nuclear hybrids (hybrid bleaching). Furthermore, the EST library provides a valuable source of PCR-based polymorphic molecular markers that are instrumental for genotyping and molecular mapping approaches.

  1. Cross-section libraries and kerma factors

    International Nuclear Information System (INIS)

    Little, R.C.; MacFarlane, R.E.; Seamon, R.E.

    1991-01-01

    A large amount of data is required in order to accurately simulate various aspects of Cold Neutron Sources using radiation transport codes such as MCNP and TWODANT. In particular, the following types of data are needed: coupled neutron/photon transport libraries, neutron thermal S(α,β) data, response function data (including energy deposition), and proton interaction data. This paper concentrates on the coupled neutron/photon transport libraries and energy deposition. Data libraries available to radiation transport codes are obtained as a result of efforts in many areas, including differential and integral measurements, theoretical model codes, data evaluations, data processing, and data testing. A wide variety of data libraries are available to users of radiation transport codes, including pointwise and multigroup libraries. At Los Alamos, the authors generally recommend the use of data libraries derived from ENDF/B-V. It is often important to know how much energy is deposited in various regions of a device. This problem is typically modeled in radiation transport codes by folding the calculated fluences with an energy-dependent 'heating number'. The heating number represents the average energy deposited locally per collision. Calculation of these heating numbers from evaluated data libraries is fraught with difficulty. Many past difficulties related to energy deposition should be resolved by the release of ENDF/B-VI.

  2. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    Science.gov (United States)

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
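
    As a point of reference for the two operations this abstract names, counting k-mers and indexing their positions, here is a minimal single-process Python sketch. It illustrates the task only; it is not Kmerind's API, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def count_kmers(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq using a sliding window."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def index_kmers(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer to the list of 0-based positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)
```

    A distributed implementation would additionally partition the k-mers across processes, for example by hashing each k-mer to an owning rank, which this sketch leaves out.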

  3. Generation of Mouse Haploid Somatic Cells by Small Molecules for Genome-wide Genetic Screening

    Directory of Open Access Journals (Sweden)

    Zheng-Quan He

    2017-08-01

    The recent success of derivation of mammalian haploid embryonic stem cells (haESCs) has provided a powerful tool for large-scale functional analysis of the mammalian genome. However, haESCs rapidly become diploidized after differentiation, posing challenges for genetic analysis. Here, we show that the spontaneous diploidization of haESCs happens in metaphase due to mitotic slippage. Diploidization can be suppressed by small-molecule-mediated inhibition of CDK1 and ROCK. Through ROCK inhibition, we can generate haploid somatic cells of all three germ layers from haESCs, including terminally differentiated neurons. Using piggyBac transposon-based insertional mutagenesis, we generated a haploid neural cell library harboring genome-wide mutations for genetic screening. As a proof of concept, we screened for Mn2+-mediated toxicity and identified the Park2 gene. Our findings expand the applications of mouse haploid cell technology to somatic cell types and may also shed light on the mechanisms of ploidy maintenance.

  4. Analysis of genomic regions of Trichoderma harzianum IOC-3844 related to biomass degradation.

    Science.gov (United States)

    Crucello, Aline; Sforça, Danilo Augusto; Horta, Maria Augusta Crivelente; dos Santos, Clelton Aparecido; Viana, Américo José Carvalho; Beloti, Lilian Luzia; de Toledo, Marcelo Augusto Szymanski; Vincentz, Michel; Kuroshu, Reginaldo Massanobu; de Souza, Anete Pereira

    2015-01-01

    Trichoderma harzianum IOC-3844 secretes high levels of cellulolytic-active enzymes and is therefore a promising strain for use in biotechnological applications in second-generation bioethanol production. However, the T. harzianum biomass degradation mechanism has not been well explored at the genetic level. The present work investigates six genomic regions (~150 kbp each) in this fungus that are enriched with genes related to biomass conversion. A BAC library consisting of 5,760 clones was constructed, with an average insert length of 90 kbp. The assembled BAC sequences revealed 232 predicted genes, 31.5% of which were related to catabolic pathways, including those involved in biomass degradation. An expression profile analysis based on RNA-Seq data demonstrated that putative regulatory elements, such as membrane transport proteins and transcription factors, are located in the same genomic regions as genes related to carbohydrate metabolism and exhibit similar expression profiles. Thus, we demonstrate a rapid and efficient tool that focuses on specific genomic regions by combining a BAC library with transcriptomic data. This is the first BAC-based structural genomic study of the cellulolytic fungus T. harzianum, and its findings provide new perspectives regarding the use of this species in biomass degradation processes.
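
    The library dimensions given above (5,760 clones, 90 kbp average insert) allow a quick estimate of how much DNA the library holds. The Python sketch below also applies the standard Clarke-Carbon expression for the chance that a given locus is represented. The genome size is not stated in the abstract, so it is left as a parameter, and any value passed in is the caller's assumption.

```python
def total_insert_bp(n_clones: int, mean_insert_bp: int) -> int:
    """Total cloned DNA held in the library, in bp."""
    return n_clones * mean_insert_bp


def fold_coverage(n_clones: int, mean_insert_bp: int, genome_bp: int) -> float:
    """Genome equivalents represented by the library."""
    return n_clones * mean_insert_bp / genome_bp


def prob_locus_represented(n_clones: int, mean_insert_bp: int, genome_bp: int) -> float:
    """Clarke-Carbon estimate: P = 1 - (1 - I/G)^N."""
    return 1.0 - (1.0 - mean_insert_bp / genome_bp) ** n_clones
```

    With the abstract's numbers, the library holds about 518 Mbp of cloned DNA. Coverage and representation probability follow once a genome size is supplied.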

  5. The bacterial artificial chromosome (BAC) library of the narrow-leafed lupin (Lupinus angustifolius L.)

    Czech Academy of Sciences Publication Activity Database

    Kasprzak, A.; Šafář, Jan; Janda, Jaroslav; Doležel, Jaroslav; Wolko, B.; Naganowska, B.

    2006-01-01

    Vol. 11 (2006), pp. 396-407. ISSN 1425-8153. R&D Projects: GA MŠk(CZ) LC06004. Institutional research plan: CEZ:AV0Z50380511. Keywords: BAC; genomic DNA library; Lupinus angustifolius L. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 1.238 (2006)

  6. Generation and analysis of a barcode-tagged insertion mutant library in the fission yeast Schizosaccharomyces pombe

    Science.gov (United States)

    2012-01-01

    Background Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques. Results An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites. Conclusions This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning approaches. PMID:22554201
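
    The idea that relative growth can be read from barcode proportions can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: the function name `barcode_log2_fold_changes`, the pseudocount, and the read counts are all assumptions invented for the example.

```python
import math

def barcode_log2_fold_changes(before, after, pseudocount=0.5):
    """Log2 change in each barcode's share of the pool between two samples.

    `before` and `after` map barcode IDs to read counts. The pseudocount
    (an assumption of this sketch) avoids division by zero for barcodes
    that drop out of one sample.
    """
    barcodes = sorted(set(before) | set(after))
    n = len(barcodes)
    total_before = sum(before.get(b, 0) for b in barcodes) + pseudocount * n
    total_after = sum(after.get(b, 0) for b in barcodes) + pseudocount * n
    changes = {}
    for b in barcodes:
        share_before = (before.get(b, 0) + pseudocount) / total_before
        share_after = (after.get(b, 0) + pseudocount) / total_after
        changes[b] = math.log2(share_after / share_before)
    return changes

# Hypothetical read counts for three barcoded mutants (invented numbers).
counts_t0 = {"BC001": 1000, "BC002": 1000, "BC003": 1000}
counts_t1 = {"BC001": 2000, "BC002": 900, "BC003": 100}

changes = barcode_log2_fold_changes(counts_t0, counts_t1)
for barcode, lfc in changes.items():
    print(f"{barcode}\t{lfc:+.2f}")
```

    A positive value means the mutant's share of the pool grew between the two samples, and a negative value means it shrank. Real screens would also need replicate handling and count normalization, which this sketch leaves out.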

  7. Generation and analysis of a barcode-tagged insertion mutant library in the fission yeast Schizosaccharomyces pombe

    Directory of Open Access Journals (Sweden)

    Chen Bo-Ruei

    2012-05-01

    Abstract Background Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques. Results An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites. Conclusions This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. 
The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning

  8. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

    DEFF Research Database (Denmark)

    de Vries, Paul S; Sabater-Lleal, Maria; Chasman, Daniel I

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In...

  9. Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    Directory of Open Access Journals (Sweden)

    Lee Hae-Young

    2006-02-01

    Abstract Background Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton ESTs (64.54%). From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (Sus scrofa. Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). Conclusion The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the

  10. Welcome to the National Wetlands Research Center Library: Not Just Another Library-A Special Library

    Science.gov (United States)

    Broussard, Linda

    2007-01-01

    Libraries are grouped into four major types: public, school, academic, and special. The U.S. Geological Survey's (USGS) National Wetlands Research Center (NWRC) library is classified as a special library because it is sponsored by the Federal government, and the collections focus on a specific subject. The NWRC library is the only USGS library dedicated to wetland science. Library personnel offer expert research services to meet the informational needs of NWRC scientists, managers, and support personnel. The NWRC library participates in international cataloging and resource sharing, which allows libraries from throughout the world to borrow from its collections. This sharing facilitates the research of other governmental agencies, universities, and those interested in the study of wetlands.

  11. Interlibrary loan in primary access libraries: challenging the traditional view.

    Science.gov (United States)

    Dudden, R F; Coldren, S; Condon, J E; Katsh, S; Reiter, C M; Roth, P L

    2000-10-01

    Primary access libraries serve as the foundation of the National Network of Libraries of Medicine (NN/LM) interlibrary loan (ILL) hierarchy, yet few published reports directly address the important role these libraries play in the ILL system. This may reflect the traditional view that small, primary access libraries are largely users of ILL, rather than important contributors to the effectiveness and efficiency of the national ILL system. This study was undertaken to test several commonly held beliefs regarding ILL system use by primary access libraries. Three hypotheses were developed. H1: Colorado and Wyoming primary access libraries comply with the recommended ILL guideline of adhering to a hierarchical structure, emphasizing local borrowing. H2: The closures of two Colorado Council of Medical Librarians (CCML) primary access libraries in 1996 resulted in twenty-three Colorado primary access libraries' borrowing more from their state resource library in 1997. H3: The number of subscriptions held by Colorado and Wyoming primary access libraries is positively correlated with the number of items they loan and negatively correlated with the number of items they borrow. The hypotheses were tested using the 1992 and 1997 DOCLINE and OCLC data of fifty-four health sciences libraries, including fifty primary access libraries, two state resource libraries, and two general academic libraries in Colorado and Wyoming. The ILL data were obtained electronically and analyzed using Microsoft Word 98, Microsoft Excel 98, and JMP 3.2.2. CCML primary access libraries comply with the recommended guideline to emphasize local borrowing by supplying each other with the majority of their ILLs, instead of overburdening libraries located at higher levels in the ILL hierarchy (H1). 
The closures of two CCML primary access libraries appear to have affected the entire ILL system, resulting in a greater volume of ILL activity for the state resource library and other DOCLINE libraries higher

  12. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  13. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    Directory of Open Access Journals (Sweden)

    Karolina Chwialkowska

    2017-11-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation

  14. Assessing Library Automation and Virtual Library Development in Four Academic Libraries in Oyo, Oyo State, Nigeria

    Science.gov (United States)

    Gbadamosi, Belau Olatunde

    2011-01-01

    The paper examines the level of library automation and virtual library development in four academic libraries. A validated questionnaire was used to capture the responses from academic librarians of the libraries under study. The paper discovers that none of the four academic libraries is fully automated. The libraries make use of librarians with…

  15. Design of Concept Libraries for C++

    KAUST Repository

    Sutton, Andrew

    2012-01-01

    We present a set of concepts (requirements on template arguments) for a large subset of the ISO C++ standard library. The goal of our work is twofold: to identify a minimal and useful set of concepts required to constrain the library's generic algorithms and data structures and to gain insights into how best to support such concepts within C++. We start with the design of concepts rather than the design of supporting language features; the language design must be made to fit the concepts, rather than the other way around. A direct result of the experiment is the realization that to simply and elegantly support generic programming we need two kinds of abstractions: constraints are predicates on static properties of a type, and concepts are abstract specifications of an algorithm's syntactic and semantic requirements. Constraints are necessary building blocks of concepts. Semantic properties are represented as axioms. We summarize our approach: concepts = constraints + axioms. This insight is leveraged to develop a library containing only 14 concepts that encompass the functional, iterator, and algorithm components of the C++ Standard Library (the STL). The concepts are implemented as constraint classes and evaluated using Clang's and GCC's Standard Library test suites. © 2012 Springer-Verlag.

  16. Public libraries and lifelong learning

    DEFF Research Database (Denmark)

    Nielsen, Bo Gerner; Borlund, Pia

    2015-01-01

    society as a result of easy and free access to information. A basic understanding of the concept is ‘learning throughout life, either continuously or periodically’. This implies that learning is not restricted to educational institutions, but can also take place in, for example, the public library. Public...... libraries thus may play an important role in supporting the learning process, not least because lifelong learning is characterised by the inclusion of informal elements of learning, flexible learning opportunities, and a shift towards self-directed learning. This self-directed learning promotes active...... at teaching? The study reports on data from 12 interviews of purposively selected public librarians and a large-scale e-mail survey (questionnaire). The e-mail survey contained 28 questions and was sent to all staff members in public libraries in Denmark, and resulted in 986 responses. The results show...

  17. Medium-resolution Isaac Newton Telescope library of empirical spectra

    NARCIS (Netherlands)

    Sanchez-Blazquez, P.; Peletier, R. F.; Jimenez-Vicente, J.; Cardiel, N.; Cenarro, A. J.; Falcon-Barroso, J.; Gorgas, J.; Selam, S.; Vazdekis, A.

    2006-01-01

    A new stellar library developed for stellar population synthesis modelling is presented. The library consists of 985 stars spanning a large range in atmospheric parameters. The spectra were obtained at the 2.5-m Isaac Newton Telescope and cover the range λλ 3525-7500 Å at 2.3

  18. Library and information services: impact on patient care quality.

    Science.gov (United States)

    Marshall, Joanne Gard; Morgan, Jennifer Craft; Thompson, Cheryl A; Wells, Amber L

    2014-01-01

    The purpose of this paper is to explore library and information service impact on patient care quality. A large-scale critical incident survey was conducted of physicians and residents at 56 library sites serving 118 hospitals in the USA and Canada. Respondents were asked to base their answers on a recent incident in which they had used library resources to search for information related to a specific clinical case. Of 4,520 respondents, 75 percent said that they definitely or probably handled patient care differently using information obtained through the library. In a multivariate analysis, three summary clinical outcome measures were used as value and impact indicators: first, time saved; second, patient care changes; and third, adverse events avoided. The outcomes were examined in relation to four information access methods: first, asking a librarian for assistance; second, performing a search in a physical library; third, searching the library's web site; or fourth, searching library resources on an institutional intranet. All library access methods had consistently positive relationships with the clinical outcomes, providing evidence that library services have a positive impact on patient care quality. Electronic collections and services provided by the library and the librarian contribute to patient care quality.

  19. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    Science.gov (United States)

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. 
The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as

  20. Genomic research in Eucalyptus.

    Science.gov (United States)

    Poke, Fiona S; Vaillancourt, René E; Potts, Brad M; Reid, James B

    2005-09-01

    Eucalyptus L'Hérit. is a genus comprised of more than 700 species that is of vital importance ecologically to Australia and to the forestry industry world-wide, being grown in plantations for the production of solid wood products as well as pulp for paper. With the sequencing of the genomes of Arabidopsis thaliana and Oryza sativa and the recent completion of the first tree genome sequence, Populus trichocarpa, attention has turned to the current status of genomic research in Eucalyptus. For several eucalypt species, large segregating families have been established, high-resolution genetic maps constructed and large EST databases generated. Collaborative efforts have been initiated for the integration of diverse genomic projects and will provide the framework for future research including exploiting the sequence of the entire eucalypt genome which is currently being sequenced. This review summarises the current position of genomic research in Eucalyptus and discusses the direction of future research.

  1. A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana

    OpenAIRE

    Nowell, RW; Elsworth, B; Oostra, Vicencio; Zwaan, Bas J.; Wheat, Christopher West; Saastamoinen, Marjo Anna Kaarina; Saccheri, Ilik; Van't Hof, AE; Wasik, BR; Connahs, H; Kumar, S; Challis, RJ; Aslam, L; Monteiro, Antonia; Brakefield, Paul M.

    2017-01-01

    Background: The mycalesine butterfly Bicyclus anynana, the 'Squinting bush brown', is a model organism in the study of lepidopteran ecology, development and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Findings: Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology...

  2. Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

    Directory of Open Access Journals (Sweden)

    Edgar E. Lara-Ramírez

    2014-01-01

    The increasing number of dengue virus (DENV) genome sequences available allows identifying the contributing factors to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution.
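
    Two of the statistics named in this abstract, RSCU and GC3, have simple standard definitions that a short sketch can make concrete. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally; GC3 is the fraction of G or C at third codon positions. The code below is an illustration under stated assumptions, not the authors' analysis: the function names `rscu` and `gc3` and the toy sequence are invented here, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def _normalize(seq):
    """Upper-case the sequence and map RNA 'U' to 'T'."""
    return seq.upper().replace("U", "T")

def rscu(seq):
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU = count(codon) * (number of synonymous codons) / (total count for
    that amino acid). Stop codons and single-codon amino acids (Met, Trp)
    are skipped, as are amino acids absent from the sequence.
    """
    seq = _normalize(seq)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values

def gc3(seq):
    """Fraction of G or C at the third position of each complete codon."""
    third = _normalize(seq)[2::3]
    return sum(b in "GC" for b in third) / len(third) if third else 0.0

# Toy sequence of four alanine codons (invented, not DENV data).
toy = "GCTGCTGCCGCA"
toy_rscu = rscu(toy)
toy_gc3 = gc3(toy)
print({c: toy_rscu[c] for c in ("GCT", "GCC", "GCA", "GCG")}, toy_gc3)
```

    An RSCU of 1.0 means a codon is used as often as expected under equal synonymous usage, values above 1.0 indicate preference, and values below 1.0 indicate avoidance. The ENC and ENCp statistics mentioned in the abstract are not covered by this sketch.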

  3. MARKETING LIBRARY SERVICES IN ACADEMIC LIBRARIES: A ...

    African Journals Online (AJOL)

    MARKETING LIBRARY SERVICES IN ACADEMIC LIBRARIES: A TOOL FOR SURVIVAL IN THE ... This article discusses the concept of marketing library and information services as an ...

  4. STP: A mathematically and physically consistent library of steam properties

    International Nuclear Information System (INIS)

    Aguilar, F.; Hutter, A.C.; Tuttle, P.G.

    1982-01-01

    A new FORTRAN library of subroutines has been developed from the fundamental equation of Keenan et al. to evaluate a large set of water properties including derivatives such as sound speed and isothermal compressibility. The STP library uses the true saturation envelope of the Keenan et al. fundamental equation. The evaluation of the true envelope by a continuation method is explained. This envelope, along with other design features, imparts an exceptionally high degree of thermodynamic and mathematical consistency to the STP library, even at the critical point. Accuracy and smoothness, library self-consistency, and designed user convenience make the STP library a reliable and versatile water property package

  5. Up-date of the BCG code library

    International Nuclear Information System (INIS)

    Caldeira, A.D.; Garcia, R.D.M.

    1990-01-01

    Procedures for generating an updated material library for the BCG code were established. A new library was generated by processing ENDF/B-IV data with the 89-1 version of the LINEAR, RECENT and SIGMA1 programs. The effect of the library change on the neutron spectrum and effective multiplication factor of a fast reactor cell was analyzed. During the course of this study, an error was detected in the BCG code. Although localized in a narrow energy range, the discrepancies in neutron spectrum caused by the error were large enough to yield a difference of about 1% in the effective multiplication factor of the test cell. (author)

  6. Libraries and Accessibility: Istanbul Public Libraries Case

    Directory of Open Access Journals (Sweden)

    Gül Yücel

    2016-12-01

    In this study, accessibility was assessed in Istanbul public libraries, considered as public spaces. Public libraries across Turkey serve more than 20 million users in total, with more than one thousand branches in the centers and more than one million registered members. For the construction of new libraries, regulations set building principles and standards covering subjects such as site selection, the historical and architectural character of the region, distance to the center of population, and design that lets disabled people benefit fully from library services. For existing libraries, work such as access for the disabled and fire safety precautions is carried out within the scope of the related standards. Easy access for everyone is prioritized in public libraries, which have a significant role in lifelong learning. The purpose of the study is to develop suggested solutions for accessibility problems in public libraries. The study is based on visual inspections and assessments of accessibility carried out in the public libraries affiliated with the Library and Publications Department of the Istanbul Culture and Tourism Provincial Directorate, within the provincial borders of Istanbul. Arrangements such as reading halls, study areas and book shelves were examined within the frame of accessible building standards. Building entrances, ramps and staircases, and the horizontal and vertical circulation of the building were considered within the scope of accessible building standards. Subjects such as reading and study areas and book shelf arrangements for the library were assessed within the scope of specific buildings. There are a total of 34 public libraries affiliated with the Istanbul Culture and Tourism Provincial Directorate, of which 20 are in the

  7. Hyb-Seq: Combining Target Enrichment and Genome Skimming for Plant Phylogenomics

    Directory of Open Access Journals (Sweden)

    Kevin Weitemier

    2014-08-01

    Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics.

  8. Genome shuffling of Lactobacillus plantarum C88 improves adhesion.

    Science.gov (United States)

    Zhao, Yujuan; Duan, Cuicui; Gao, Lei; Yu, Xue; Niu, Chunhua; Li, Shengyu

    2017-01-01

    Genome shuffling is an important method for rapid improvement in microbial strains for desired phenotypes. In this study, ultraviolet irradiation and nitrosoguanidine were used as mutagens to enhance the adhesion of the wild-type Lactobacillus plantarum C88. Four strains with better adhesive properties were selected after mutagenesis to form a library of parent strains for three rounds of genome shuffling. Fusants F3-1, F3-2, F3-3, and F3-4 were selected as the improved strains. The in vivo and in vitro test results indicated that the population after three rounds of genome shuffling exhibited improved adhesive properties. Random Amplified Polymorphic DNA results showed significant differences between the parent strain and recombinant strains at the DNA level. These results suggest that the adhesive property of L. plantarum C88 can be significantly improved by genome shuffling. Improving the adhesive property of bacterial cells by genome shuffling enhances the colonization of probiotic strains, which in turn benefits their probiotic function.

  9. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    Full Text Available BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
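
    The k-means-based clustering idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the MGVD implementation (which the abstract says is written in C++); it only shows how plain one-dimensional k-means could group already-segmented intensities into loss, neutral, and gain classes. The function name `kmeans_1d`, the choice of three clusters, and the segment values are all hypothetical and chosen for illustration.

```python
from statistics import mean


def kmeans_1d(values, k=3, iters=50):
    """Plain Lloyd's k-means on 1-D data (k >= 2); returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical per-segment mean log2 ratios (made-up numbers, not from the paper).
segment_means = [-0.95, -1.05, 0.02, -0.03, 0.01, 0.55, 0.62, 0.00]
centers, labels = kmeans_1d(segment_means, k=3)

# In this example the centers stay in ascending order, so index 0 is the lowest cluster.
names = {0: "loss", 1: "neutral", 2: "gain"}
calls = [names[lab] for lab in labels]
print(calls)
```

    A multi-sample method would additionally need to align breakpoints across samples before clustering, which this sketch does not attempt.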

  10. The current connections between the library and bookselling activity with an overview of the public library development

    Directory of Open Access Journals (Sweden)

    Miha Kovač

    2006-01-01

    Full Text Available Among its goals for improving book consumption in Slovenia, the National Culture Program 2004-2007 includes encouraging the development of a network of bookshops and ‘quality’ bookshops that will be evenly distributed across Slovenia. The program suggests that they be formed within the framework of existing public cultural institutions. Book consumption in Slovenia has been characterised by a poorly developed retail book market and a well developed network of public libraries with a high number of book loans, which serves as a substitute for the paperback market as it is known in bigger language communities abroad. Along with surveying the possibilities of directing libraries towards book marketing (as well), the paper also examines the historical development of commercial publishing companies / bookshops / libraries, which existed in Europe in the 18th and 19th centuries and in some places existed side by side with public libraries until the middle of the 20th century. The paper shows that the present ratio between bookshops and libraries in some big book markets is different from that in Slovenia, as libraries are losing patrons as a consequence of the growth of big bookshop chains. A characteristic feature of book consumption in Slovenia can be seen in the use of the internet in libraries as well as bookshops. The ‘merging’ of the marketing segment and book lending is to a smaller extent already under way abroad, where large online bookshops encourage libraries, by means of a commission, to act as mediators in the sale of books. In this way, a library can turn into a bookshop with the help of connections via ISBN numbers, without having to jeopardise its original mission.

  11. Systematic hybrid LOH: a new method to reduce false positives and negatives during screening of yeast gene deletion libraries

    DEFF Research Database (Denmark)

    Alvaro, D.; Sunjevaric, I.; Reid, R. J.

    2006-01-01

    We have developed a new method, systematic hybrid loss of heterozygosity, to facilitate genomic screens utilizing the yeast gene deletion library. Screening is performed using hybrid diploid strains produced through mating the library haploids with strains from a different genetic background......, to minimize the contribution of unpredicted recessive genetic factors present in the individual library strains. We utilize a set of strains where each contains a conditional centromere construct on one of the 16 yeast chromosomes that allows the destabilization and selectable loss of that chromosome. After...... complementation of any spurious recessive mutations in the library strain, facilitating attribution of the observed phenotype to the documented gene deletion and dramatically reducing false positive results commonly obtained in library screens. The systematic hybrid LOH method can be applied to virtually any...

  12. Public libraries in the library regions in the year 2009

    Directory of Open Access Journals (Sweden)

    Milena Bon

    2011-01-01

    Full Text Available Purpose: Regional public libraries were initiated in 2003 to connect professional activities of libraries within regional networks and to ensure coordinated library development in a region in cooperation with the Library System Development Centre at the National and University Library performing a coordinating role. The article analyses the performance of public libraries and their integration in regional library networks in order to find out the level of development of conditions of performance of public libraries. Methodology/approach: Statistical data for the year 2009 were the basis for the overview of library activities of ten library regions with regard to applicable legislation and library standards. The level of regional library activities is compared to the socio-economic situation of statistical regions, thus representing a new approach to the presentation of Slovenian public libraries’ development. Results: Absolute values indicate better development of nine libraries in the central Slovenia region while relative values offer a totally different picture. Four libraries in the region of Nova Gorica show the highest level of development. Research limitation: Research is limited to the year 2009 and basic statistical analysis. Originality/practical implications: Findings of the analysis are useful for public libraries to plan their development strategy within a region and for financial bodies to provide adequate financing for library activities in a specific region. The basic condition for successful public library performance is the even and harmonized development of conditions of performance as recommended by library standards.

  13. Deep whole-genome sequencing of 90 Han Chinese genomes.

    Science.gov (United States)

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000

  14. High Throughput Screening in Duchenne Muscular Dystrophy: From Drug Discovery to Functional Genomics

    OpenAIRE

    Thomas J.J. Gintjee; Alvin S.H. Magh; Carmen Bertoni

    2014-01-01

    Centers for the screening of biologically active compounds and genomic libraries are becoming common in the academic setting and have enabled researchers devoted to developing strategies for the treatment of diseases or interested in studying a biological phenomenon to have unprecedented access to libraries that, until a few years ago, were accessible only by pharmaceutical companies. As a result, new drugs and genetic targets have now been identified for the treatment of Duchenne muscular dyst...

  15. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    Directory of Open Access Journals (Sweden)

    Ritland Carol

    2009-08-01

    Full Text Available Abstract Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene

  16. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    Science.gov (United States)

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The

  17. Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes.

    Science.gov (United States)

    Feschotte, Cédric; Keswani, Umeshkumar; Ranganathan, Nirmal; Guibotsy, Marcel L; Levine, David

    2009-07-23

    Eukaryotic genomes contain large amounts of repetitive DNA, most of which is derived from transposable elements (TEs). Progress has been made to develop computational tools for ab initio identification of repeat families, but there is an urgent need to develop tools to automate the annotation of TEs in genome sequences. Here we introduce REPCLASS, a tool that automates the classification of TE sequences. Using control repeat libraries, we show that the program can accurately classify virtually any known TE type. Combining REPCLASS with ab initio repeat finding in the genomes of Caenorhabditis elegans and Drosophila melanogaster allowed us to recover the contrasting TE landscape characteristic of these species. Unexpectedly, REPCLASS also uncovered several novel TE families in both genomes, augmenting the TE repertoire of these model species. When applied to the genomes of distant Caenorhabditis and Drosophila species, the approach revealed a remarkable conservation of TE composition profile within each genus, despite substantial interspecific covariations in genome size and in the number of TEs and TE families. Lastly, we applied REPCLASS to analyze 10 fungal genomes from a wide taxonomic range, most of which have not been analyzed for TE content previously. The results showed that TE diversity varies widely across the fungi "kingdom" and appears to positively correlate with genome size, in particular for DNA transposons. Together, these data validate REPCLASS as a powerful tool to explore the repetitive DNA landscapes of eukaryotes and to shed light onto the evolutionary forces shaping TE diversity and genome architecture.

  18. Conserved microstructure of the Brassica B Genome of Brassica nigra in relation to homologous regions of Arabidopsis thaliana, B. rapa and B. oleracea

    Science.gov (United States)

    2013-01-01

    Background The Brassica B genome is known to carry several important traits, yet there have been limited analyses of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. Results Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. Conclusions Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage. PMID:23586706

  19. High-throughput automated microfluidic sample preparation for accurate microbial genomics.

    Science.gov (United States)

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C

    2017-01-27

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications.

  20. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific groups of researchers, while a timely-updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  1. LRSim: A Linked-Reads Simulator Generating Insights for Better Genome Partitioning

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    Full Text Available Linked-read sequencing, using highly-multiplexed genome partitioning and barcoding, can span hundreds of kilobases to improve de novo assembly, haplotype phasing, and other applications. Based on our analysis of 14 datasets, we introduce LRSim that simulates linked-reads by emulating the library preparation and sequencing process with fine control over variants, linked-read characteristics, and the short-read profile. From the phasing and assembly of multiple datasets, we derive recommendations on coverage, fragment length, and partitioning for sequencing genomes of different sizes and complexities. These optimizations improve results by orders of magnitude, and enable the development of novel methods. LRSim is available at https://github.com/aquaskyline/LRSIM. Keywords: Linked-read, Molecular barcoding, Reads partitioning, Phasing, Reads simulation, Genome assembly, 10X Genomics

  2. Ocean biogeochemistry modeled with emergent trait-based genomics

    Science.gov (United States)

    Coles, V. J.; Stukel, M. R.; Brooks, M. T.; Burd, A.; Crump, B. C.; Moran, M. A.; Paul, J. H.; Satinsky, B. M.; Yager, P. L.; Zielinski, B. L.; Hood, R. R.

    2017-12-01

    Marine ecosystem models have advanced to incorporate metabolic pathways discovered with genomic sequencing, but direct comparisons between models and “omics” data are lacking. We developed a model that directly simulates metagenomes and metatranscriptomes for comparison with observations. Model microbes were randomly assigned genes for specialized functions, and communities of 68 species were simulated in the Atlantic Ocean. Unfit organisms were replaced, and the model self-organized to develop community genomes and transcriptomes. Emergent communities from simulations that were initialized with different cohorts of randomly generated microbes all produced realistic vertical and horizontal ocean nutrient, genome, and transcriptome gradients. Thus, the library of gene functions available to the community, rather than the distribution of functions among specific organisms, drove community assembly and biogeochemical gradients in the model ocean.

  3. Library Information System Time-Sharing (LISTS) Project. Final Report.

    Science.gov (United States)

    Black, Donald V.

    The Library Information System Time-Sharing (LISTS) experiment was based on three innovations in data processing technology: (1) the advent of computer time-sharing on third-generation machines, (2) the development of general-purpose file-management software and (3) the introduction of large, library-oriented data bases. The main body of the…

  4. Extreme-Scale De Novo Genome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Georganas, Evangelos [Intel Corporation, Santa Clara, CA (United States); Hofmeyr, Steven [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Egan, Rob [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Rokhsar, Daniel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.

    2017-09-26

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.
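
    As a rough illustration of what working with "short, overlapping, and potentially erroneous" reads involves, the sketch below counts k-mers across a handful of reads and keeps only those seen at least twice, a common way of discarding likely sequencing errors before assembly. The abstract does not describe the individual HipMer steps, so this is an assumption-laden toy, not the HipMer or Meraculous code; the function name `count_kmers`, the reads, and the threshold are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Made-up reads drawn from the toy sequence ACGTACG; the last read carries
# a single-base error (T -> A at its fourth position).
reads = ["ACGTAC", "ACGTAC", "CGTACG", "CGTACG", "ACGAAC"]

counts = count_kmers(reads, k=4)

# Keep k-mers observed at least twice; singletons are treated as likely errors.
min_count = 2
solid = sorted(kmer for kmer, n in counts.items() if n >= min_count)
print(solid)
```

    On a single machine a hash-based counter like this is trivial; at the scale the abstract describes, the counts would have to live in distributed data structures, which is where communication costs arise.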

  5. Functional Screening of Antibiotic Resistance Genes from a Representative Metagenomic Library of Food Fermenting Microbiota

    Directory of Open Access Journals (Sweden)

    Chiara Devirgiliis

    2014-01-01

    Full Text Available Lactic acid bacteria (LAB) represent the predominant microbiota in fermented foods. Foodborne LAB have received increasing attention as potential reservoir of antibiotic resistance (AR) determinants, which may be horizontally transferred to opportunistic pathogens. We have previously reported isolation of AR LAB from the raw ingredients of a fermented cheese, while AR genes could be detected in the final, marketed product only by PCR amplification, thus pointing at the need for more sensitive microbial isolation techniques. We turned therefore to construction of a metagenomic library containing microbial DNA extracted directly from the food matrix. To maximize yield and purity and to ensure that genomic complexity of the library was representative of the original bacterial population, we defined a suitable protocol for total DNA extraction from cheese which can also be applied to other lipid-rich foods. Functional library screening on different antibiotics allowed recovery of ampicillin and kanamycin resistant clones originating from Streptococcus salivarius subsp. thermophilus and Lactobacillus helveticus genomes. We report molecular characterization of the cloned inserts, which were fully sequenced and shown to confer AR phenotype to recipient bacteria. We also show that metagenomics can be applied to food microbiota to identify underrepresented species carrying specific genes of interest.

  6. Cloud computing for comparative genomics.

    Science.gov (United States)

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
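
    The figures quoted in this abstract allow a quick back-of-the-envelope estimate of unit costs. The sketch below treats the rounded numbers as exact and assumes that all 100 nodes were busy for the full 70 hours; neither assumption is confirmed by the abstract, so the results are indicative only.

```python
# Figures quoted in the abstract, treated as exact here even though the text
# says "more than 300,000" processes and "just under 70 hours".
processes = 300_000
nodes = 100
wall_hours = 70
total_cost_usd = 6302

# Assumption: every node ran for the whole wall-clock period.
node_hours = nodes * wall_hours
cost_per_node_hour = total_cost_usd / node_hours
cost_per_process = total_cost_usd / processes
processes_per_node_hour = processes / node_hours

print(f"node-hours: {node_hours}")
print(f"cost per node-hour: ${cost_per_node_hour:.2f}")
print(f"cost per process: ${cost_per_process:.3f}")
print(f"processes per node-hour: {processes_per_node_hour:.1f}")
```

    Under these assumptions the run works out to roughly $0.90 per node-hour and about two cents per RSD process.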

  7. Libraries Today, Libraries Tomorrow: Contemporary Library Practices and the Role of Library Space in the L

    Directory of Open Access Journals (Sweden)

    Ana Vogrinčič Čepič

    2013-09-01

    Full Text Available ABSTRACT Purpose: The article uses sociological concepts in order to rethink the changes in library practices. Contemporary trends are discussed with regard to the changing nature of working habits, referring mostly to the new technology, and the (emergence of the) third space phenomenon. The author does not regard libraries only as concrete public service institutions, but rather as complex cultural forms, taking into consideration the wider social context with a stress on users’ practices in relation to space. Methodology/approach: The article is based on the (self-)observation of public library use, and on the (discourse) analysis of internal library documents (i.e. annual reports and plans) and secondary sociological literature. As such, the cultural form approach represents a classic method of sociology of culture. Results: The study of relevant material in combination with direct personal experiences reveals socio-structural causes for the change of users’ needs and habits, and points at the difficulty of spatial redefinition of libraries as well as at the power of the discourse. Research limitations: The article is limited to an observation of users’ practices in some of the public libraries in Ljubljana and examines only a small number of annual reports; the discoveries are then further debated from the sociological perspective. Originality/practical implications: The article offers sociological insight into the current issues of library science and tries to suggest a wider explanation that could answer some of the challenges of contemporary librarianship.

  8. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium

    OpenAIRE

    Henrique Machado; Lone Gram

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationship...

  9. National Libraries Section. General Research Libraries Division. Papers.

    Science.gov (United States)

    International Federation of Library Associations, The Hague (Netherlands).

    Papers on national library services and activities, which were presented at the 1983 International Federation of Library Associations (IFLA) conference, include: (1) "The National Library of China in its Gradual Application of Modern Technology," a discussion by Zhu Nan and Zhu Yan (China) of microform usage and library automation; (2)…

  10. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome.

    Directory of Open Access Journals (Sweden)

    Tim van Opijnen

    2016-09-01

    Full Text Available The interaction between an antibiotic and bacterium is not merely restricted to the drug and its direct target, rather antibiotic induced stress seems to resonate through the bacterium, creating selective pressures that drive the emergence of adaptive mutations not only in the direct target, but in genes involved in many different fundamental processes as well. Surprisingly, it has been shown that adaptive mutations do not necessarily have the same effect in all species, indicating that the genetic background influences how phenotypes are manifested. However, to what extent the genetic background affects the manner in which a bacterium experiences antibiotic stress, and how this stress is processed is unclear. Here we employ the genome-wide tool Tn-Seq to construct daptomycin-sensitivity profiles for two strains of the bacterial pathogen Streptococcus pneumoniae. Remarkably, over half of the genes that are important for dealing with antibiotic-induced stress in one strain are dispensable in another. By confirming over 100 genotype-phenotype relationships, probing potassium-loss, employing genetic interaction mapping as well as temporal gene-expression experiments we reveal genome-wide conditionally important/essential genes, we discover roles for genes with unknown function, and uncover parts of the antibiotic's mode-of-action. Moreover, by mapping the underlying genomic network for two query genes we encounter little conservation in network connectivity between strains as well as profound differences in regulatory relationships. Our approach uniquely enables genome-wide fitness comparisons across strains, facilitating the discovery that antibiotic responses are complex events that can vary widely between strains, which suggests that in some cases the emergence of resistance could be strain specific and at least for species with a large pan-genome less predictable.

  11. [Genome editing of industrial microorganism].

    Science.gov (United States)

    Zhu, Linjiang; Li, Qi

    2015-03-01

    Genome editing is defined as the highly effective and precise modification of the cellular genome on a large scale. In recent years, such genome-editing methods have developed rapidly in the field of industrial strain improvement. These quickly evolving methods thoroughly change the old, inefficient mode of genetic modification, which is "one modification, one selection marker, and one target site". Highly effective modification modes have been developed, including simultaneous modification of multiple genes; highly effective insertion, replacement, and deletion of target genes at the genome scale; and cut-and-paste of large DNA fragments. These new tools for microbial genome editing will certainly be applied widely, increase the efficiency of industrial strain improvement, and promote the transformation of the traditional fermentation industry and the rapid development of novel industrial biotechnology such as the production of biofuels and biomaterials. This review summarizes the technological principles of these genome-editing methods and their applications, which can benefit the engineering and construction of industrial microorganisms.

  12. LIBRARY SKILL INSTRUCTION IN NIGERIAN ACADEMIC LIBRARIES

    African Journals Online (AJOL)

    DJFLEX

    P. C. Aziagba and E. H. Uzoezi. (Received 10 September 2009; Revision Accepted 8 February 2010). ABSTRACT. This survey was undertaken to portray the level of library involvement ...

  13. A Genome-Wide Landscape of Retrocopies in Primate Genomes.

    Science.gov (United States)

    Navarro, Fábio C P; Galante, Pedro A F

    2015-07-29

    Gene duplication is a key factor contributing to phenotype diversity across and within species. Although the availability of complete genomes has led to the extensive study of genomic duplications, the dynamics and variability of gene duplications mediated by retrotransposition are not well understood. Here, we predict mRNA retrotransposition and use comparative genomics to investigate their origin and variability across primates. Analyzing seven anthropoid primate genomes, we found a similar number of mRNA retrotranspositions (∼7,500 retrocopies) in Catarrhini (Old World Monkeys, including humans), but a surprisingly large number of retrocopies (∼10,000) in Platyrrhini (New World Monkeys), which may be a by-product of higher long interspersed nuclear element 1 activity in these genomes. By inferring retrocopy orthology, we dated most of the primate retrocopy origins, and estimated a decrease in the fixation rate in recent primate history, implying a smaller number of species-specific retrocopies. Moreover, using RNA-Seq data, we identified approximately 3,600 expressed retrocopies. As expected, most of these retrocopies are located near or within known genes, present tissue-specific and even species-specific expression patterns, and no expression correlation to their parental genes. Taken together, our results provide further evidence that mRNA retrotransposition is an active mechanism in primate evolution and suggest that retrocopies may not only introduce great genetic variability between lineages but also create a large reservoir of potentially functional new genomic loci in primate genomes. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Life-cycle and genome of OtV5, a large DNA virus of the pelagic marine unicellular green alga Ostreococcus tauri.

    Directory of Open Access Journals (Sweden)

    Evelyne Derelle

    Full Text Available Large DNA viruses are ubiquitous, infecting diverse organisms ranging from algae to man, and have probably evolved from an ancient common ancestor. In aquatic environments, such algal viruses control blooms and shape the evolution of biodiversity in phytoplankton, but little is known about their biological functions. We show that Ostreococcus tauri, the smallest known marine photosynthetic eukaryote, whose genome is completely characterized, is a host for large DNA viruses, and present an analysis of the life-cycle and 186,234 bp long linear genome of OtV5. OtV5 is a lytic phycodnavirus which unexpectedly does not degrade its host chromosomes before the host cell bursts. Analysis of its complete genome sequence confirmed that it lacks expected site-specific endonucleases, and revealed the presence of 16 genes whose predicted functions are novel to this group of viruses. OtV5 carries at least one predicted gene whose protein closely resembles its host counterpart and several other host-like sequences, suggesting that horizontal gene transfers between host and viral genomes may occur frequently on an evolutionary scale. Fifty seven percent of the 268 predicted proteins present no similarities with any known protein in Genbank, underlining the wealth of undiscovered biological diversity present in oceanic viruses, which are estimated to harbour 200Mt of carbon.

  15. E-library Implementation in Library University of Riau

    Science.gov (United States)

    Yuhelmi; Rismayeti

    2017-12-01

    This research examines the implementation of e-books in Library University of Riau and the obstacles to that implementation. In the era of globalization, digital libraries should be developed, or readers’ interest will decline. With recent advances in technology, digital libraries are one of the learning tools that can be used to find information through internet access; hence digital libraries, commonly known as e-libraries, greatly help students and the academic community in finding information. The methods used in this research are observation, interview, and literature study. The respondents are the staff involved in the digitization process in Library University of Riau. The results show that the implementation of the e-library in Library University of Riau already meets current user needs, although obstacles remain, such as technical problems with internet connection speed and with converting files from Microsoft Word .doc format to Adobe .pdf.

  16. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  17. Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.

    Science.gov (United States)

    Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

    2007-09-18

    Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.

  18. Comparison of techniques for quantification of next-generation sequencing libraries

    DEFF Research Database (Denmark)

    Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt

    2015-01-01

    by quantifying NGS libraries for the Ion Torrent™ and Illumina® platforms as well as dsDNA oligos with known DNA concentrations. Rather large variations in library concentration estimates were observed. The differences between the highest and lowest concentration estimates varied with a factor of 5...

  19. Marketing library and information services in academic libraries in ...

    African Journals Online (AJOL)

    This study was designed to investigate the marketing of library services in academic libraries in Niger state, ...

  20. Information Literacy and the Public Library

    DEFF Research Database (Denmark)

    Nielsen, Bo Gerner; Borlund, Pia

    2013-01-01

    This paper reports on the results of an empirical study of Danish public librarians’ conceptions of information literacy and user education in order to support and optimize lifelong learning of library users. The study builds on data from interviews of purposely selected public librarians...... and a large-scale e-mail survey (questionnaire). The results show that the public librarians consider the public library an important place for learning, but also that they do not share a common understanding of the concepts of information literacy and lifelong learning. The study further reveals a diversity...

  1. Systematic cloning of human minisatellites from ordered array charomid libraries.

    Science.gov (United States)

    Armour, J A; Povey, S; Jeremiah, S; Jeffreys, A J

    1990-11-01

    We present a rapid and efficient method for the isolation of minisatellite loci from human DNA. The method combines cloning a size-selected fraction of human MboI DNA fragments in a charomid vector with hybridization screening of the library in ordered array. Size-selection of large MboI fragments enriches for the longer, more variable minisatellites and reduces the size of the library required. The library was screened with a series of multi-locus probes known to detect a large number of hypervariable loci in human DNA. The gridded library allowed both the rapid processing of positive clones and the comparative evaluation of the different multi-locus probes used, in terms of both the relative success in detecting hypervariable loci and the degree of overlap between the sets of loci detected. We report 23 new human minisatellite loci isolated by this method, which map to 14 autosomes and the sex chromosomes.

  2. Assembly of the Boechera retrofracta Genome and Evolutionary Analysis of Apomixis-Associated Genes

    Directory of Open Access Journals (Sweden)

    Sergei Kliver

    2018-03-01

    Full Text Available Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid sexually reproducing species and is thought to be an ancestral parent species of apomictic species. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from 1 paired-end and 3 mate pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated to be 227 Mb. The N50 of the assembled scaffolds was 2.3 Mb. Using a hybrid approach that combines homology-based and de novo methods, 27,048 protein-coding genes were predicted. Repeats, transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were also annotated. Finally, genes of B. retrofracta and 6 other Brassicaceae species were used for phylogenetic tree reconstruction. In addition, we explored the histidine exonuclease APOLLO locus, related to apomixis in Boechera, and proposed a model of its evolution through a series of duplications. An assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species.
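    The abstract above cites two assembly statistics, a 23-mer distribution and a scaffold N50. The following is a minimal sketch of how such quantities are commonly computed; the function names `n50` and `kmer_spectrum` are hypothetical, and the k-mer counter assumes plain strand-specific counting (no reverse-complement canonicalization), which may differ from the tools the authors actually used.

```python
from collections import Counter


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def kmer_spectrum(sequences, k):
    """Map each multiplicity to the number of distinct k-mers seen that
    many times (strand-specific counting; an assumption of this sketch)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy scaffold lengths, not data from the study.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(dict(kmer_spectrum(["ACGTACGT"], 4)))
```

    In a real k-mer spectrum, a dominant single peak with only a small shoulder at half its depth is usually read as low heterozygosity, while peaks at two and three times the main depth point to duplicated and triplicated regions.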

  3. Genomic prediction using subsampling

    OpenAIRE

    Xavier, Alencar; Xu, Shizhong; Muir, William; Rainey, Katy Martin

    2017-01-01

    Background Genome-wide assisted selection is a critical tool for the genetic improvement of plants and animals. Whole-genome regression models in Bayesian framework represent the main family of prediction methods. Fitting such models with a large number of observations involves a prohibitive computational burden. We propose the use of subsampling bootstrap Markov chain in genomic prediction. Such method consists of fitting whole-genome regression models by subsampling observations in each rou...

  4. Systematic CpT (ApG) Depletion and CpG Excess Are Unique Genomic Signatures of Large DNA Viruses Infecting Invertebrates

    Science.gov (United States)

    Upadhyay, Mohita; Sharma, Neha; Vivekanandan, Perumal

    2014-01-01

    Differences in the relative abundance of dinucleotides, if any, may provide important clues on host-driven evolution of viruses. We studied dinucleotide frequencies of large DNA viruses infecting vertebrates (n = 105; viruses infecting mammals = 99; viruses infecting aves = 6; viruses infecting reptiles = 1) and invertebrates (n = 88; viruses infecting insects = 84; viruses infecting crustaceans = 4). We have identified systematic depletion of CpT(ApG) dinucleotides and over-representation of CpG dinucleotides as the unique genomic signature of large DNA viruses infecting invertebrates. Detailed investigation of this unique genomic signature suggests the existence of invertebrate host-induced pressures specifically targeting CpT(ApG) and CpG dinucleotides. The depletion of CpT dinucleotides among large DNA viruses infecting invertebrates is, at least in part, explained by non-canonical DNA methylation by the infected host. Our findings highlight the role of invertebrate host-related factors in shaping virus evolution and they also provide the necessary framework for future studies on evolution, epigenetics and molecular biology of viruses infecting this group of hosts. PMID:25369195
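    The abstract above does not say which statistic was used to call a dinucleotide depleted or over-represented. One common choice is the observed/expected odds ratio, rho_XY = f_XY / (f_X * f_Y), where values well below 1 suggest depletion and values well above 1 suggest excess. The sketch below illustrates that measure only; the function name is hypothetical, the input is assumed to contain just A, C, G and T, and the ratio is computed on a single strand rather than symmetrized over both strands.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: observed/expected ratio} for one strand.

    Expected frequency is the product of the two mononucleotide
    frequencies; the ratio is NaN when either base is absent.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx = mono[x] / n
            fy = mono[y] / n
            fxy = di[x + y] / (n - 1)
            ratios[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return ratios


if __name__ == "__main__":
    # Toy sequence, not a viral genome from the study.
    ratios = dinucleotide_odds_ratios("CGCGCGCG")
    print(ratios["CG"], ratios["GC"])
```

    Comparing such ratios between the vertebrate-infecting and invertebrate-infecting virus groups would still require a statistical test, which this sketch leaves out.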

  5. Personal Virtual Libraries

    Science.gov (United States)

    Pappas, Marjorie L.

    2004-01-01

    Virtual libraries are becoming more and more common. Most states have a virtual library. A growing number of public libraries have a virtual presence on the Web. Virtual libraries are a growing addition to school library media collections. The next logical step would be personal virtual libraries. A personal virtual library (PVL) is a collection…

  6. The Availability of Web 2.0 Tools from Community College Libraries' Websites Serving Large Student Bodies

    Science.gov (United States)

    Blummer, Barbara; Kenton, Jeffrey M.

    2014-01-01

    Web 2.0 tools offer academic libraries new avenues for delivering services and resources to students. In this research we report on a content analysis of 100 US community college libraries' Websites for the availability of Web 2.0 applications. We found Web 2.0 tools utilized by 97% of our sample population and many of these sites contained more…

  7. Ensembl 2002: accommodating comparative genomics.

    Science.gov (United States)

    Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

    2003-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.

  8. The Ensembl genome database project.

    Science.gov (United States)

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  9. Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

    Science.gov (United States)

    Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629

  10. America's Star Libraries

    Science.gov (United States)

    Lyons, Ray; Lance, Keith Curry

    2009-01-01

    "Library Journal"'s new national rating of public libraries, the "LJ" Index of Public Library Service, identifies 256 "star" libraries. It rates 7,115 public libraries. The top libraries in each group get five, four, or three Michelin guide-like stars. All included libraries, stars or not, can use their scores to learn from their peers and improve…

  11. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
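    The operator framework in the abstract above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: diagonal matrices are stored as lists of their diagonal entries, the sampling operator Sa is modelled as independent binomial thinning of each clone (an assumption; the abstract only says Sa is a random diagonal matrix), and CEN is restricted to 0/1 entries, so only complete elimination is modelled and not partial downsampling. All function names are hypothetical.

```python
import random


def sampling_operator(vec, fraction, rng):
    """Draw the diagonal of Sa for the copy-number vector `vec`.

    Each copy is kept independently with probability `fraction`; the
    diagonal entry is the kept share of that clone (0.0 if absent).
    """
    diag = []
    for copies in vec:
        kept = sum(1 for _ in range(copies) if rng.random() < fraction)
        diag.append(kept / copies if copies else 0.0)
    return diag


def sequence(n, cen, fraction, rng):
    """Apply Seq = Sa * CEN to the library vector n and return read counts."""
    censored = [c * v for c, v in zip(cen, n)]            # CEN acting on n
    sa = sampling_operator(censored, fraction, rng)       # random diagonal Sa
    return [round(s * v) for s, v in zip(sa, censored)]   # Sa acting on CEN n


if __name__ == "__main__":
    rng = random.Random(0)
    library = [100, 50, 0, 10]   # toy copy numbers n_i, not study data
    cen = [1, 0, 1, 1]           # second sequence is censored
    print(sequence(library, cen, 0.5, rng))
```

    Setting every CEN entry to 1 recovers the unbiased case Seq = Sa I_N, so a sequence that is present in the library but never observed across repeated runs is a candidate for a zero entry in CEN.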

  12. Generation of a BAC-based physical map of the melon genome

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2010-05-01

    Full Text Available Abstract Background Cucumis melo (melon) belongs to the Cucurbitaceae family, whose economic importance among horticulture crops is second only to Solanaceae. Melon has high intra-specific genetic variation, morphologic diversity and a small genome size (450 Mb), which make this species suitable for a great variety of molecular and genetic studies that can lead to the development of tools for breeding varieties of the species. A number of genetic and genomic resources have already been developed, such as several genetic maps and BAC genomic libraries. These tools are essential for the construction of a physical map, a valuable resource for map-based cloning, comparative genomics and assembly of whole genome sequencing data. However, no physical map of any Cucurbitaceae has yet been developed. A project has recently been started to sequence the complete melon genome following a whole-genome shotgun strategy, which makes use of massive sequencing data. A BAC-based melon physical map will be a useful tool to help assemble and refine the draft genome data that is being produced. Results A melon physical map was constructed using a 5.7 × BAC library and a genetic map previously developed in our laboratories. High-information-content fingerprinting (HICF) was carried out on 23,040 BAC clones, digesting with five restriction enzymes and SNaPshot labeling, followed by contig assembly with FPC software. The physical map has 1,355 contigs and 441 singletons, with an estimated physical length of 407 Mb (0.9 × coverage of the genome) and the longest contig being 3.2 Mb. The anchoring of 845 BAC clones to 178 genetic markers (100 RFLPs, 76 SNPs and 2 SSRs) also allowed the genetic positioning of 183 physical map contigs/singletons, representing 55 Mb (12% of the melon genome), to individual chromosomal loci. The melon FPC database is available for download at http://melonomics.upv.es/static/files/public/physical_map/. Conclusions Here we report the construction

  13. GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

    NARCIS (Netherlands)

    K. Estrada Gil (Karol); A. Abuseiris (Anis); F.G. Grosveld (Frank); A.G. Uitterlinden (André); T.A. Knoch (Tobias); F. Rivadeneira Ramirez (Fernando)

    2009-01-01

    The current fast growth of genome-wide association studies (GWAS) combined with now common computationally expensive imputation requires the online access of large user groups to high-performance computing resources capable of analyzing rapidly and efficiently millions of genetic

  14. Application of LOD Technology in German Libraries and Archives

    Directory of Open Access Journals (Sweden)

    Dong Jie

    2017-12-01

    Full Text Available [Purpose/significance] Linked Open Data (LOD) has been widely used in large industries, as well as in non-profit and government organizations. Libraries and archives are among the early adopters of LOD technology, and they promote its development. Germany is one of the countries with a well-developed library and archive sector, and there are many successful cases of LOD application in its libraries and archives. [Method/process] This paper analyzed the successful application of LOD technology in German libraries and archives by using the methods of document investigation, network survey and content analysis. [Result/conclusion] These cases reveal how research topics from the traditional field of computer science, such as artificial intelligence, databases and knowledge discovery, relate to libraries and archives. Summing up the characteristics and experience of German practice can provide a useful reference for the development of relevant practice in China.

  15. Research on the Impact of a Computerized Circulation System on the Performance of a Large College Library. Final Report.

    Science.gov (United States)

    Frohmberg, Katherine A.; Moffett, William A.

    In order to study the effects of introducing an automated circulation system at Oberlin College, Ohio, data were collected from September 1978 until June 1982 on book availability, usage of library facilities, attitudes of library users toward the library, and the efficiency of circulation activities. Data collection methods included circulation…

  16. The Facebook challenge for public libraries in Romania

    Directory of Open Access Journals (Sweden)

    Octavia-Luciana Madge

    2014-01-01

    Full Text Available Social networks have rapidly won their place in our life and in the activity of many private and public organizations and institutions. At library level, social networks have demonstrated their role as good marketing tools and their utility for a better communication with users. Public libraries in Romania have lately experienced great transformations in terms of the approach of their relationship with the users and the improvement of their activity. They have also launched a series of new services in order to meet the needs of the community they serve and to attract new users. Implementing and using new applications such as online social networks, more exactly Facebook was one of these recent developments. This paper analyzes the way in which social networks are used at the level of public libraries in Romania by the example of three large public libraries and the way in which these libraries advertise their services through Facebook.

  17. Cloud computing for comparative genomics

    Directory of Open Access Journals (Sweden)

    Pivovarov Rimma

    2010-05-01

    Full Text Available Abstract Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more t