Genome editing with engineered nucleases (zinc finger nucleases, TAL effector nucleases, and clustered regularly interspaced short palindromic repeats/CRISPR-associated systems) has recently been shown to have great promise in a variety of therapeutic and biotechnological applications. However, their exploitation in genetic analysis and clinical settings largely depends on their specificity for the intended genomic target. Large and complex genomes often contain highly homologous or repetitive sequences, which limit the specificity of genome editing tools and can result in off-target activity. Over the past few years, various computational approaches have been developed to assist the design process and to predict and reduce the off-target activity of these nucleases. These tools can be used to guide the design of constructs for engineered nucleases and to evaluate results after genome editing. This review provides a comprehensive overview of various databases, tools, web servers and resources for genome editing and compares their features and functionalities. It also describes tools that have been developed to analyse post-genome editing results, and discusses important design parameters to consider when designing these nucleases. This review is intended to be a quick reference guide for experimentalists as well as computational biologists working in the field of genome editing with engineered nucleases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com.
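The off-target problem described in this abstract can be illustrated with a minimal sketch. It is not one of the reviewed tools: it simply slides a window along a sequence and reports sites within a chosen number of mismatches of the target. The function names, the toy sequences and the mismatch threshold are all hypothetical, and the sketch ignores PAM requirements, the reverse strand and position-dependent mismatch weighting, which real design tools typically take into account.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(genome: str, guide: str, max_mismatches: int = 2):
    """Return (position, site, mismatches) for every forward-strand window
    of the genome that differs from the guide at no more than
    max_mismatches positions. Illustrative only: no PAM, no reverse strand."""
    genome = genome.upper()
    guide = guide.upper()
    k = len(guide)
    hits = []
    for i in range(len(genome) - k + 1):
        site = genome[i:i + k]
        d = hamming(site, guide)
        if d <= max_mismatches:
            hits.append((i, site, d))
    return hits


# Toy data (hypothetical): one perfect site and one site with a single mismatch.
genome = "TTGACCTGAAGGCTAACGTTGACCTGTAGGCTAAC"
guide = "GACCTGAAGGCT"
hits = find_off_targets(genome, guide, max_mismatches=1)
for position, site, mismatches in hits:
    print(position, site, mismatches)
```

A site with zero mismatches is the intended target; any further hit is a candidate off-target site that a design tool would penalise or flag.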
Kapopoulou, Adamandia; Lew, Jocelyne M; Cole, Stewart T
In this paper, we present the MycoBrowser portal (http://mycobrowser.epfl.ch/), a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. A central component of MycoBrowser is TubercuList (http://tuberculist.epfl.ch), which has recently benefited from a new data management system and web interface. These improvements were extended to all MycoBrowser databases. We provide an overview of the functionalities available and the different ways of interrogating the data, then discuss how both the new information and the latest features are helping the mycobacterial research communities. Copyright © 2010 Elsevier Ltd. All rights reserved.
The medaka Oryzias latipes is a small egg-laying freshwater teleost, and has become an excellent model system for developmental genetics and evolutionary biology. The medaka genome is relatively small in size, approximately 800 Mb, and the genome sequencing project was recently completed by Japanese research groups, providing a high-quality draft genome sequence of the inbred Hd-rR strain of medaka. In this review, I present an overview of the medaka genome project including genome resources, followed by specific findings obtained with the medaka draft genome. In particular, I focus on the analysis that was done by taking advantage of the medaka system, such as the sex chromosome differentiation and the regional history of medaka species using single nucleotide polymorphisms as genomic markers.
Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F; Li, Shuaicheng; Hu, Kailin
The sequences of the full set of pepper genomes, including nuclear, mitochondrial and chloroplast, are now available for use. However, the overall distribution of simple sequence repeats (SSR) in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes, with allele number and polymorphic information content value per marker ranging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum.
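As an illustration of what identifying SSR loci involves, the following is a minimal sketch that finds perfect tandem repeats with a regular expression. It is not the authors' workflow, which the abstract does not describe in detail. The function name and the minimum repeat counts per motif length are hypothetical choices made for this example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Find perfect SSR loci in a DNA sequence.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are assumptions chosen for illustration only.
    Returns a sorted list of (start, motif, repeat_count) tuples."""
    thresholds = min_repeats or {1: 10, 2: 6, 3: 5}
    seq = seq.upper()
    loci = []
    for k, n in sorted(thresholds.items()):
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" reported as a dinucleotide repeat).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            loci.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(loci)


# Toy sequence (hypothetical): an (AT)7 repeat and a (GAA)5 repeat.
seq = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TC"
print(find_ssrs(seq))
```

Unique flanking sequence around each reported locus is what a primer-design step would then use to build marker candidates.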
Madison I. Dunitz
The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there is a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.
Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X
The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.
Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H
Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
This page provides links to research resources, compiled by the Epidemiology and Genomics Research Program, that may be of interest to genetic epidemiologists conducting cancer research; the list is not exhaustive.
Azam Qureshi, Matloob; Rotenberg, Eva; Stærfeldt, Hans Henrik
with scripts and algorithms developed in a variety of programming languages at the Centre for Biological Sequence Analysis in order to create a three-tier software application for genome analysis. The results are made available via a web interface developed in Java, PHP and Perl CGI. User...
The Department of Energy (DOE) and its predecessor agencies have a long history of epidemiologic research programs. The main focus of these programs has been the Health and Mortality Study of the DOE work force. This epidemiologic study began in 1964 with a feasibility study of workers at the Hanford facility. Studies of other populations exposed to radiation have also been supported, including the classic epidemiologic study of radium dial painters and studies of atomic bomb survivors. From a scientific perspective, these epidemiologic research programs have been productive and highly credible, and have formed the bases for many radiological protection standards. Recently, there has been concern that, although research results were available, the data on which these results were based were not easily obtained by interested investigators outside DOE. Therefore, as part of an effort to integrate and broaden access to its epidemiologic information, the DOE has developed the Comprehensive Epidemiologic Data Resource (CEDR) Program. Included in this effort is the development of a computer information system for accessing the collection of CEDR data and its related descriptive information. The epidemiologic data currently available through the CEDR Program consist of analytic data sets, working data sets, and their associated documentation files. In general, data sets are the result of epidemiologic studies that have been conducted on various groups of workers at different DOE facilities during the past 30 years.
Robbertse, B.; Tatusova, T.
The National Center for Biotechnology Information (NCBI) is well known for the nucleotide sequence archive GenBank and the sequence analysis tool BLAST. However, NCBI integrates many types of biomolecular data from a variety of sources and makes them available to the scientific community as interactive web resources as well as organized releases of bulk data. These tools are available to explore and compare fungal genomes. Searching all databases with Fungi [organism] at http://www.ncbi.nlm.nih.gov/ is the quickest way to find resources of interest with fungal entries. Some tools, though, are resource-specific and can be accessed indirectly from a particular database in the Entrez system. These include graphical viewers and comparative analysis tools such as TaxPlot, TaxMap and UniGene DDD (found via the UniGene homepage). Gene and BioProject pages also serve as portals to external data such as community annotation websites, BioGrid and UniProt. There are many different ways of accessing genomic data at NCBI. Depending on the focus and goal of research projects or the level of interest, a user would select a particular route for accessing genomic databases and resources. This review article describes methods of accessing fungal genome data and provides examples that illustrate the use of analysis tools. PMID:22737589
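The Fungi [organism] search mentioned above can also be expressed as a programmatic query. The sketch below only builds a request URL for the NCBI E-utilities ESearch endpoint and does not contact the network. Using E-utilities is an assumption made for illustration, since the abstract describes the web interface rather than a programmatic route; the helper name, the choice of the bioproject database and the retmax value are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database (hypothetical helper).

    db     -- Entrez database name, for example "bioproject" or "gene"
    term   -- Entrez query string, for example "Fungi[organism]"
    retmax -- maximum number of record identifiers to return"""
    query = urlencode({"db": db, "term": term,
                       "retmax": retmax, "retmode": "json"})
    return f"{EUTILS_ESEARCH}?{query}"


url = build_esearch_url("bioproject", "Fungi[organism]")
print(url)
```

The square brackets of the field tag are percent-encoded by urlencode, so the query string can be passed to any HTTP client unchanged.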
The complete assembly of viral genomes from metagenomic datasets (short genomic sequences gathered from environmental samples) has proven to be challenging, so there are significant blind spots when we view viral genomes through the lens of metagenomics. One approach to overcoming this problem is to leverage the thousands of complete viral genomes that are publicly available. Here we describe our efforts to assemble a comprehensive resource that provides a quantitative snapshot of viral genomic trends – such as gene density, noncoding percentage, and abundances of functional gene categories – across thousands of viral genomes. We have also developed a coarse-grained method for visualizing viral genome organization for hundreds of genomes at once, and have explored the extent of the overlap between bacterial and bacteriophage gene pools. Existing viral classification systems were developed prior to the sequencing era, so we present our analysis in a way that allows us to assess the utility of the different classification systems for capturing genomic trends. PMID:29624169
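Two of the genomic trends named above, gene density and noncoding percentage, can be computed from gene coordinates alone. The following is a minimal sketch under stated assumptions: genes are given as half-open (start, end) intervals on a single sequence, overlapping genes are merged so that shared bases are counted once, and the function names and toy coordinates are hypothetical rather than taken from the resource described.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def genome_stats(genome_length, genes):
    """Compute gene density (genes per kb) and noncoding percentage.

    genes is a list of half-open (start, end) coordinates. Bases covered
    by more than one gene are counted once as coding."""
    coding = sum(end - start for start, end in merge_intervals(genes))
    return {
        "gene_density_per_kb": len(genes) / (genome_length / 1000),
        "noncoding_percent": 100 * (genome_length - coding) / genome_length,
    }


# Toy genome (hypothetical): 10 kb with three genes, two of them overlapping.
stats = genome_stats(10_000, [(100, 1_300), (1_250, 2_750), (3_000, 9_000)])
print(stats)
```

Merging before summing matters for viral genomes in particular, where overlapping genes are common and naive summation would overstate the coding fraction.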
Koshy, Remya; Ranawat, Anop; Scaria, Vinod
The Middle East and North Africa (MENA) region encompasses unique populations with a rich history and characteristic ethnic, linguistic and genetic diversity. The genetic diversity of the MENA region has been largely unknown. The recent availability of whole-exome and whole-genome sequences from the region has made it possible to collect population-specific allele frequencies. The integration of data sets from this region would provide insights into the landscape of genetic variants in this region. We systematically integrated genetic variants from multiple data sets available from this region to create a compendium of over 26 million genetic variations. The variants were systematically annotated, and their allele frequencies in the data sets were computed and made available through a web interface that enables quick queries. As a proof of principle for application of the compendium to genetic epidemiology, we analyzed the allele frequencies for variants in the transglutaminase 1 (TGM1) gene, associated with autosomal recessive lamellar ichthyosis. Our analysis revealed that the carrier frequency of selected variants differed widely, with significant interethnic differences. To the best of our knowledge, al mena is the first and most comprehensive repertoire of genetic variations from the Arab, Middle Eastern and North African region. We hope al mena will accelerate Precision Medicine in the region.
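The relationship between an allele frequency and a carrier frequency for an autosomal recessive variant can be sketched as follows. The abstract does not state how carrier frequencies were derived, so the use of Hardy-Weinberg proportions here is an assumption, and the function names and counts are hypothetical.

```python
def allele_frequency(alt_allele_count, total_alleles):
    """Allele frequency as alternate-allele count over alleles genotyped."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_allele_count / total_alleles


def carrier_frequency(q):
    """Expected heterozygous carrier frequency, 2pq with p = 1 - q,
    assuming Hardy-Weinberg equilibrium."""
    return 2 * q * (1 - q)


# Hypothetical counts: 4 alternate alleles among 1000 individuals (2000 alleles).
q = allele_frequency(4, 2000)
carriers = carrier_frequency(q)
print(f"allele frequency = {q:.4f}, carrier frequency = {carriers:.6f}")
```

Under this assumption the carrier frequency is roughly twice the allele frequency for rare variants, so differences in allele frequency between data sets carry over almost directly to carrier frequency.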
Shen, Lishuang; Attimonelli, Marcella; Bai, Renkui; Lott, Marie T; Wallace, Douglas C; Falk, Marni J; Gai, Xiaowu
Accurate mitochondrial DNA (mtDNA) variant annotation is essential for the clinical diagnosis of diverse human diseases. Substantial challenges to this process include the inconsistency in mtDNA nomenclatures, the existence of multiple reference genomes, and a lack of reference population frequency data. Clinicians need a simple bioinformatics tool that is user-friendly, and bioinformaticians need a powerful informatics resource for programmatic usage. Here, we report the development and functionality of the MSeqDR mtDNA Variant Tool set (mvTool), a one-stop mtDNA variant annotation and analysis Web service. mvTool is built upon the MSeqDR infrastructure (https://mseqdr.org), with contributions of expert curated data from MITOMAP (https://www.mitomap.org) and HmtDB (https://www.hmtdb.uniba.it/hmdb). mvTool supports all mtDNA nomenclatures, converts variants to standard rCRS- and HGVS-based nomenclatures, and annotates novel mtDNA variants. Besides generic annotations from dbNSFP and Variant Effect Predictor (VEP), mvTool provides allele frequencies in more than 47,000 germline mitogenomes, and disease and pathogenicity classifications from MSeqDR, Mitomap, HmtDB and ClinVar (Landrum et al., 2013). mvTool also provides mtDNA somatic variant annotations. "mvTool API" is implemented for programmatic access using inputs in VCF, HGVS, or classical mtDNA variant nomenclatures. The results are reported as hyperlinked HTML tables and in JSON, Excel, and VCF formats. MSeqDR mvTool is freely accessible at https://mseqdr.org/mvtool.php. © 2018 Wiley Periodicals, Inc.
L. Shen (Lishuang); M.A. Diroma (Maria Angela); M. Gonzalez (Michael); D. Navarro-Gomez (Daniel); J. Leipzig (Jeremy); M.T. Lott (Marie T.); M. van Oven (Mannis); D.C. Wallace; C.C. Muraresku (Colleen Clarke); Z. Zolkipli-Cunningham (Zarazuela); P.F. Chinnery (Patrick); M. Attimonelli (Marcella); S. Zuchner (Stephan); M.J. Falk (Marni J.); X. Gai (Xiaowu)
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes,
The Department of Energy has established the Comprehensive Epidemiologic Data Resource (CEDR) as a public-use data base with the goal of broadening independent access to data collected during studies of the health effects of exposure to radiation and other physical or chemical agents associated with the production of nuclear materials. This catalog is intended for use by any individual interested in obtaining information about, or access to, CEDR data. This catalog provides information that will help users identify and request data file sets of interest.
U.S. Department of Health & Human Services — The NIMH Repository and Genomics Resource (RGR) stores biosamples, genetic, pedigree and clinical data collected in designated NIMH-funded human subject studies. The...
Genomics and high-throughput phenotyping are ushering in a new era of accessing genetic diversity held in plant genetic resources, the cornerstone of both traditional and genomics-assisted breeding efforts of food legume crops. Acknowledged or not, yield plateaus must be broken given the daunting ...
Palkopoulou, Eleftheria; Lipson, Mark; Mallick, Swapan
Elephantids are the world's most iconic megafaunal family, yet there is no comprehensive genomic assessment of their relationships. We report a total of 14 genomes, including 2 from the American mastodon, which is an extinct elephantid relative, and 12 spanning all three extant and three extinct...
Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationships to enrich the annotation of genomic data and provides tools to perform powerful comparative analyses across a wide spectrum of plant species. It consists of an integrated portal for querying, visualizing and analyzing data for 44 plant reference genomes, genetic variation data sets for 12 species, expression data for 16 species, curated rice pathways and orthology-based pathway projections for 66 plant species including various crops. Here we briefly describe the functions and uses of the Gramene database.
Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul
The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno
Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.
Shabalov, Igor; Grigoriev, Igor
MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010 in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year, as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from the JGI Fungal Genomics Program and its users, and integration with external resources used by the fungal community.
Otto, Thomas D
Background: Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome wide analysis of Plasmodium gene regulation and function. Results: We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilized it to improve gene-model prediction and to provide quantitative, genome-wide, data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the 'Plasmodium interspersed repeat genes' (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family. Conclusions: Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.
Otsuka, Yuta; Muto, Ai; Takeuchi, Rikiya; Okada, Chihiro; Ishikawa, Motokazu; Nakamura, Koichiro; Yamamoto, Natsuko; Dose, Hitomi; Nakahigashi, Kenji; Tanishima, Shigeki; Suharnan, Sivasundaram; Nomura, Wataru; Nakayashiki, Toru; Aref, Walid G; Bochner, Barry R; Conway, Tyrrell; Gribskov, Michael; Kihara, Daisuke; Rudd, Kenneth E; Tohsato, Yukako; Wanner, Barry L; Mori, Hirotada
Comprehensive experimental resources, such as ORFeome clone libraries and deletion mutant collections, are fundamental tools for elucidation of gene function. Data sets from omics analyses using these resources provide key information for functional analysis, modeling and simulation in both individual and systematic approaches. With the long-term goal of complete understanding of a cell, we have over the past decade created a variety of clone and mutant sets for functional genomics studies of Escherichia coli K-12. We have made these experimental resources freely available to the academic community worldwide. Accordingly, these resources have now been used in numerous investigations of a multitude of cell processes. Quality control is extremely important for evaluating results generated by these resources. Because the annotation that we originally used for the construction has been changed since 2005, we have updated these genomic resources accordingly. Here, we describe GenoBase (http://ecoli.naist.jp/GB/), which contains key information about comprehensive experimental resources of E. coli K-12, their quality control and several omics data sets generated using these resources. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gay, Laurie M; Kim, Sungeun; Fedorchak, Kyle; Kundranda, Madappa; Odia, Yazmin; Nangia, Chaitali; Battiste, James; Colon-Otero, Gerardo; Powell, Steven; Russell, Jeffery; Elvin, Julia A; Vergilio, Jo-Anne; Suh, James; Ali, Siraj M; Stephens, Philip J; Miller, Vincent A; Ross, Jeffrey S
Esthesioneuroblastoma (ENB), also known as olfactory neuroblastoma, is a rare malignant neoplasm of the olfactory mucosa. Despite surgical resection combined with radiotherapy and adjuvant chemotherapy, ENB often relapses with rapid progression. Current multimodality, nontargeted therapy for relapsed ENB is of limited clinical benefit. We queried whether comprehensive genomic profiling (CGP) of relapsed or refractory ENB can uncover genomic alterations (GA) that could identify potential targeted therapies for these patients. CGP was performed on formalin-fixed, paraffin-embedded sections from 41 consecutive clinical cases of ENBs using a hybrid-capture, adaptor ligation based next-generation sequencing assay to a mean coverage depth of 593X. The results were analyzed for base substitutions, insertions and deletions, select rearrangements, and copy number changes (amplifications and homozygous deletions). Clinically relevant GA (CRGA) were defined as GA linked to drugs on the market or under evaluation in clinical trials. A total of 28 ENBs harbored GA, with a mean of 1.5 GA per sample. Approximately half of the ENBs (21, 51%) featured at least one CRGA, with an average of 1 CRGA per sample. The most commonly altered gene was TP53 (17%), with GA in PIK3CA, NF1, CDKN2A, and CDKN2C occurring in 7% of samples. We report comprehensive genomic profiles for 41 ENB tumors. CGP revealed potential new therapeutic targets, including targetable GA in the mTOR, CDK and growth factor signaling pathways, highlighting the clinical value of genomic profiling in ENB. Comprehensive genomic profiling of 41 relapsed or refractory ENBs reveals recurrent alterations or classes of mutation, including amplification of tyrosine kinases encoded on chromosome 5q and mutations affecting genes in the mTOR/PI3K pathway. Approximately half of the ENBs (21, 51%) featured at least one clinically relevant genomic alteration (CRGA), with an average of 1 CRGA per sample. The most commonly altered
Rangiah, Kannan; Mahesh, HB; Rajamani, Anantharamanan; Shirke, Meghana D.; Russiachand, Heikham; Loganathan, Ramya Malarini; Shankara Lingu, Chandana; Siddappa, Shilpa; Ramamurthy, Aishwarya; Sathyanarayana, BN
Neem (Azadirachta indica A. Juss) is one of the most versatile tropical evergreen tree species known in India since the Vedic period (1500 BC–600 BC). The neem tree is a rich source of limonoids, which have a wide spectrum of activity against insect pests and microbial pathogens. Complex tetranortriterpenoids such as azadirachtin, salanin and nimbin are the major active principles isolated from neem seed. Absolutely nothing is known about the biochemical pathways of these metabolites in the neem tree. To identify genes and pathways in neem, we sequenced neem genomes and transcriptomes using next generation sequencing technologies. Assembly of Illumina and 454 sequencing reads resulted in 267 Mb, which accounts for 70% of the estimated size of the neem genome. We predicted 44,495 genes in the neem genome, of which 32,278 genes were expressed in neem tissues. The neem genome consists of about 32.5% (87 Mb) repetitive DNA elements. The neem tree is phylogenetically related to citrus, Citrus sinensis. Comparative analysis anchored 62% (161 Mb) of assembled neem genomic contigs onto citrus chromosomes. An ultrahigh performance liquid chromatography-mass spectrometry-selected reaction monitoring (UHPLC-MS/SRM) method was used to quantify azadirachtin, nimbin, and salanin from neem tissues. Weighted Correlation Network Analysis (WGCNA) of expressed genes and metabolites resulted in the identification of possible candidate genes involved in the azadirachtin biosynthesis pathway. This study provides genomic and transcriptomic resources, together with quantification of the top three neem metabolites, which will accelerate basic research in neem to understand biochemical pathways. PMID:26290780
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; Oven, Mannis; Wallace, D.C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J.; Gai, Xiaowu
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR ...
Checcucci, Alice; Mengoni, Alessio
Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for the annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE) Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes sequenced by the Joint Genome Institute, together with other publicly available genomes. Genomes of strains belonging to the Archaea, Bacteria, and Eukarya domains are present, as well as those of viruses and plasmids. Here, we describe some essential features of the IMG system and a case study of pangenome analysis.
Yu, Jun; Wong, Gane Ka-Shu; Liu, Siqi; Wang, Jian; Yang, Huanming
In May 2000, the Beijing Institute of Genomics formally announced the launch of a comprehensive crop genome research project on rice genomics, the Chinese Superhybrid Rice Genome Project (SRGP). SRGP is not simply a sequencing project targeted to a single rice (Oryza sativa L.) genome, but a full-swing research effort with an ultimate goal of providing inclusive basic genomic information and molecular tools not only to understand the biology of rice, both as an important crop species and a model organism of cereals, but also to focus on a popular superhybrid rice landrace, LYP9. We have completed the first phase of SRGP and provide the rice research community with a finished genome sequence of an indica variety, 93-11 (the paternal cultivar of LYP9), together with ample data on subspecific (between subspecies) polymorphisms, transcriptomes and proteomes, useful for within-species comparative studies. In the second phase, we have acquired the genome sequence of the maternal cultivar, PA64S, together with detailed catalogues of genes uniquely expressed in the parental cultivars and the hybrid, as well as allele-specific markers that distinguish parental alleles. Although SRGP in China is not an open-ended research programme, it has been designed to pave the way for future plant genomics research and application, such as to interrogate fundamentals of plant biology, including genome duplication, polyploidy and hybrid vigour, as well as to provide genetic tools for crop breeding and to shoulder a social burden: leading the fight against world hunger. It began with genomics, the newly developed and industry-scale research field, and from the world's most populous country. In this review, we summarize our scientific goals and noteworthy discoveries that exploit new territories of systematic investigations on basic and applied biology of rice and other major cereal crops.
comprehensive genomic resource database of date palm. It can serve as a bioinformatics platform for date palm genomics, genetics, and molecular breeding. DRDB is freely available at http://drdb.big.ac.cn/home.
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.
The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at
Campylobacter species are phenotypically diverse in many aspects, including host habitats and pathogenicities, which demands comprehensive characterization of the entire Campylobacter genus to study their underlying genetic diversification. Up to now, 34 Campylobacter strains have been sequenced and published in public databases, providing a good opportunity to systematically analyze their genomic diversity. In this study, we first conducted genomic characterization, which includes genome-wide alignments, pan-genome analysis, and phylogenetic identification, to depict the genetic diversity of the Campylobacter genus. Afterward, we improved the tetranucleotide usage pattern-based naïve Bayesian classifier to identify abnormal composition fragments (ACFs, fragments whose tetranucleotide frequency profiles differ significantly from the genomic tetranucleotide frequency profile), including horizontal gene transfers (HGTs), to explore the mechanisms for the genetic diversity of this organism. Finally, we analyzed the HGTs transferred via bacteriophage transduction. To our knowledge, this study is the first to use single nucleotide polymorphism information to construct a reliable microevolution phylogeny of 21 Campylobacter jejuni strains. Combined with the phylogeny of all the collected Campylobacter species based on genome-wide core gene information, a comprehensive phylogenetic inference of all 34 Campylobacter organisms was determined. It was found that C. jejuni harbors a high fraction of ACFs, possibly through intraspecies recombination, whereas other Campylobacter members possess numerous ACFs, possibly via intragenus recombination. Furthermore, some Campylobacter strains have undergone significant ancient viral integration during their evolution. The improved method is a powerful tool for bacterial genomic analysis. Moreover, the findings provide useful information for future research on the Campylobacter genus.
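The idea of comparing a fragment's tetranucleotide frequency profile with the genome-wide profile can be illustrated with a short sketch. This is not the authors' improved naïve Bayesian classifier; it is a simplified stand-in that flags windows by an L1 distance, and the function names, window size, step and threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Return the relative frequency of each tetranucleotide in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Tetramers containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def profile_distance(p, q):
    """L1 distance between two frequency profiles (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def flag_windows(genome, window=1000, step=500, threshold=0.5):
    """List (start, distance) for windows whose profile deviates from the genome's.

    The threshold is a hypothetical cut-off, not a value from the study.
    """
    genome_profile = tetra_profile(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        d = profile_distance(tetra_profile(genome[start:start + window]), genome_profile)
        if d > threshold:
            flagged.append((start, d))
    return flagged
```

A real analysis would replace the fixed threshold with a statistical model of the expected profile variation, which is the role the Bayesian classifier plays in the study.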
Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).
Background: The number of completed eukaryotic genome sequences and cDNA projects has increased exponentially in the past few years, although most of them have not been published yet. In addition, many microarray analyses yielded thousands of sequenced EST and cDNA clones. For the researcher interested in single gene analyses (from a phylogenetic, a structural biology or other perspective), it is therefore important to have up-to-date knowledge about the various resources providing primary data. Description: The database is built around 3 central tables: species, sequencing projects and publications. The species table contains commonly and alternatively used scientific names, common names and the complete taxonomic information. For projects, the sequence type and links to species project web-sites and species homepages are stored. All publications are linked to projects. The web-interface provides comprehensive search modules with detailed options and three different views of the selected data. We have especially focused on developing an elaborate taxonomic tree search tool that allows the user to instantaneously identify, e.g., the closest relative to the organism of interest. Conclusion: We have developed a database, called diArk, to store, organize, and present the most relevant information about completed genome projects and EST/cDNA data from eukaryotes. Currently, diArk provides information about 415 eukaryotes, 823 sequencing projects, and 248 publications.
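The three-table layout described above (species, sequencing projects, publications, with publications linked to projects) might be sketched as follows. This is only an illustration using SQLite: the table and column names are assumptions rather than diArk's actual schema, and the rows are placeholders, not real records.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative, not diArk's own.
SCHEMA = """
CREATE TABLE species (
    id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    common_name TEXT,
    taxonomy TEXT
);
CREATE TABLE project (
    id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    sequence_type TEXT,
    url TEXT
);
CREATE TABLE publication (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    citation TEXT
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO species VALUES (1, 'Examplea demonstrans', 'example organism', 'Eukaryota')")
    conn.executemany(
        "INSERT INTO project VALUES (?, ?, ?, ?)",
        [(1, 1, "genome", None), (2, 1, "EST/cDNA", None)],
    )
    conn.execute("INSERT INTO publication VALUES (1, 1, 'placeholder citation')")
    return conn


def projects_for_species(conn, scientific_name):
    """Return (sequence_type, publication_count) for each project of a species."""
    query = """
        SELECT p.sequence_type, COUNT(pub.id)
        FROM species s
        JOIN project p ON p.species_id = s.id
        LEFT JOIN publication pub ON pub.project_id = p.id
        WHERE s.scientific_name = ?
        GROUP BY p.id
        ORDER BY p.id
    """
    return conn.execute(query, (scientific_name,)).fetchall()
```

Linking publications to projects rather than directly to species keeps one species with several sequencing efforts from duplicating citation rows.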
Lucía Martínez Vázquez; Eva María Iñesta Mena
Reading comprehension is a complex process, whose teaching involves multiple factors, as highlighted by Psychology, Didactics of languages, and other disciplines. Nevertheless, theoretical frameworks need to be applied by means of innovative practices and resources. The aim of this work is to present an innovation implemented in 2016-2017 in the third year of primary school, in the frame of an action-research, with the objective of reinforcing the learning of reading. In order to cope with t...
Jung, Sook; Jesudurai, Christopher; Staton, Margaret; Du, Zhidian; Ficklin, Stephen; Cho, Ilhyung; Abbott, Albert; Tomkins, Jeffrey; Main, Dorrie
Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.
Genome resource banking is the systematic collection, storage, and redistribution of biomaterials in an organized, logistical, and secure manner. Genome cryobanks usually contain biomaterials and associated genomic information essential for progression of biomedicine, human health, and research. In that regard, appropriate genome cryobanks could provide essential biomaterials for both current and future research projects in the form of various cell types and tissues, including sperm, oocytes, embryos, embryonic or adult stem cells, induced pluripotent stem cells, and gonadal tissues. In addition to cryobanked germplasm, cryobanking of DNA, serum, blood products, and tissues from scientifically, economically, and ecologically important species has become a common practice. For revitalization of the whole organism, cryopreserved germplasm in conjunction with assisted reproductive technologies offers a powerful approach for research model management, as well as assisting in animal production for agriculture, conservation, and human reproductive medicine. Recently, many developed and developing countries have allocated substantial resources to establish genome resource banks, which are responsible for safeguarding scientifically, economically, and ecologically important wild-type, mutant, and transgenic plants, fish, and local livestock breeds, as well as wildlife species. This review is dedicated to the memory of Dr. John K. Critser, who made profound contributions to the science of cryobiology and the establishment of genome research and resource centers for mice, rats, and swine. Emphasis will be given to the application of genome resource banks to species with substantial contributions to the advancement of biomedicine and human health. Copyright © 2012 Elsevier Inc. All rights reserved.
Background: Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization, of transposable elements in this economically important legume crop. Description: Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study. Conclusion: SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually
Lucía Martínez Vázquez
Reading comprehension is a complex process, whose teaching involves multiple factors, as highlighted by Psychology, Didactics of languages, and other disciplines. Nevertheless, theoretical frameworks need to be applied by means of innovative practices and resources. The aim of this work is to present an innovation implemented in 2016-2017 in the third year of primary school, in the frame of an action-research, with the objective of reinforcing the learning of reading. In order to cope with the comprehension difficulties involved in attention and concentration abilities, a didactic intervention was designed with the musical tale as a resource. Different approaches to this sort of text, integrated in diverse activities, facilitated the learning of active listening to tales and of expressive reading, and guided the attention of readers to metacognitive strategies. The experience makes it possible to better identify some difficulties in the reading process, and proves the usefulness of the musical tale as a meaningful resource to support the teaching and learning of reading.
Bacteria of the genus Methylobacterium are widespread in diverse habitats, including soil, water and plants (phyllosphere, rhizosphere and endosphere). In the present study, we generated an in-house genomic data resource of six type strains and combined it with fourteen database genomes of the Methylobacterium genus to carry out phylogenomic, taxonomic, comparative and ecological studies of this genus. Overall, the genus shows high diversity and genetic variation, primarily due to its ability to acquire genetic material from diverse sources through horizontal gene transfer. The majority of species identified in this study are plant-associated, with genomes equipped with methylotrophy- and photosynthesis-related genes along with genes for plant probiotic traits. Most of the species' genomes are equipped with genes for adaptation to and defense against UV radiation, oxidative stress and desiccation. The genus has an open pan-genome, and we predicted the role of gain/loss of prophages and CRISPR elements in diversity and evolution. Our genomic resource, with annotation and analysis, provides a platform for interspecies genomic comparisons in the genus Methylobacterium, for unraveling their natural genome diversity, and for studying how natural selection shapes their genomes through the adaptive mechanisms that allow them to adopt diverse habitat lifestyles. These type-strain genomic data display the power of Next Generation Sequencing in rapidly creating a resource that paves the way for studies on phylogeny and taxonomy as well as for basic and applied research on this important genus.
Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno
Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232
Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X
PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry-sorted chromosomes.
Chi, Yixia; Xue, Lianqing; Zhang, Hui
The purpose of a comprehensive benefit analysis of water resources is to maximize the combined social, economic and ecological-environmental benefits. To address the shortcomings of the traditional analytic hierarchy process (AHP) in water resources evaluation, this study proposes a comprehensive benefit evaluation index covering the social, economic and environmental systems; determines the index weights with an improved fuzzy AHP; calculates the relative index of comprehensive water resources benefit; and analyzes the comprehensive benefit of water resources in Xiangshui County with a multi-objective evaluation model. Based on water resources data for Xiangshui County, 20 main comprehensive benefit assessment factors were evaluated for 5 districts belonging to the county. The results showed that the comprehensive benefit of Xiangshui County was 0.7317, and that the social economy has room for further development under the current water resources situation.
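For readers unfamiliar with how an analytic hierarchy process turns pairwise judgments into index weights, the following sketch uses the classical row geometric mean method and a weighted sum. It is not the improved fuzzy AHP or the multi-objective model of the study; the comparison matrix and the criterion scores are invented purely for illustration.

```python
import math


def ahp_weights(matrix):
    """Derive priority weights from a pairwise comparison matrix.

    Uses the row geometric mean method of classical AHP: each row's
    geometric mean is normalized so that the weights sum to 1.
    """
    geometric_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


def composite_score(weights, scores):
    """Weighted sum of normalized criterion scores (each in [0, 1])."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical judgments: social vs economic vs environmental benefit.
# Entry [i][j] states how much more important criterion i is than j.
comparison = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparison)

# Hypothetical normalized scores for one district on the three criteria.
district_score = composite_score(weights, [0.8, 0.6, 0.7])
```

Because the example matrix is perfectly consistent, the weights come out as 4/7, 2/7 and 1/7; real judgment matrices are rarely consistent, which is one motivation for fuzzy variants of the method.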
Jung, Sook; Main, Dorrie
Recent technological advances in biology promise unprecedented opportunities for rapid and sustainable advancement of crop quality. Following this trend, the Rosaceae research community continues to generate large amounts of genomic, genetic and breeding data. These include annotated whole genome sequences, transcriptome and expression data, proteomic and metabolomic data, genotypic and phenotypic data, and genetic and physical maps. Analysis, storage, integration and dissemination of these data using bioinformatics tools and databases are essential to provide utility of the data for basic, translational and applied research. This review discusses the currently available genomics and bioinformatics resources for the Rosaceae family.
Chinese yam has been used both as a food and in traditional herbal medicine. Developing more effective genetic markers in this species is necessary to assess its genetic diversity and perform cultivar identification. In this study, new chloroplast genomic resources were developed using whole chloroplast genomes from six genotypes originating from different geographical locations. The Dioscorea polystachya chloroplast genome is a circular molecule consisting of two single-copy regions separated by a pair of inverted repeats. Comparative analyses of six D. polystachya chloroplast genomes revealed 141 single nucleotide polymorphisms (SNPs). Seventy simple sequence repeats (SSRs) were found in the six genotypes, including 24 polymorphic SSRs. Forty-three common indels and five small inversions were detected. Phylogenetic analysis based on the complete chloroplast genome provided the best resolution among the genotypes. Our evaluation of chloroplast genome resources among these genotypes led us to consider the complete chloroplast genome sequence of D. polystachya as a source of reliable and valuable molecular markers for revealing biogeographical structure and the extent of genetic variation in wild populations and for identifying different cultivars.
Romero-Lankao, P.; Bourgeron, P.; Gochis, D. J.; Rothman, D. S.; Wilhelmi, O.
During the past decades urbanization has proceeded at unprecedented - yet varied - rates across urban areas globally. The social and environmental transformations implied by urban development have put many regions at risk of transforming the very characteristics that make them attractive and healthy. Meanwhile, climate change is adding new sources of risk and an array of uncertainties to the mix. These changes create risks that vary according to the characteristics of the demographic, economic, ecological, built-environment (technological) and governance dimensions of urbanization and urban areas as socioecological systems. However, few studies have explored the variation in these dimensions across urban areas. I will present a comprehensive analytical framework that explores, in urban areas, patterns of interplay, synergy and tradeoff between socio-demographic, economic, technological, ecological, and governance (SETEG) factors as they shape two issues, traditionally analyzed by separate disciplinary domains: resource use and resilience to climate hazards. Three questions guide this effort: 1) What indicators can be used to socio-demographic, economic, technological, ecological, and governance (SETEG) determinants of urban populations' resource use and resilience to climate hazards? 2) What indicators are important? 3) What combinations (i.e., tradeoffs, synergies) of causal factors better explain urban populations' resource use and resilience to hazards? The interplay between these factors as they shape a population's resource use and resilience is not exempted from synergies and tradeoffs that require careful analysis. Consider population density, a key indicator of urban form. Scholars have found that while more compact cities are more energy efficient and emit less GHG, heat stress is much worse in more compact cities. This begs the question of which combination of urban form factors need to be considered by urban planners when designing effective urban
Berger Stephen A
GIDEON (Global Infectious Diseases and Epidemiology Network) is a web-based computer program designed for decision support and informatics in the field of Geographic Medicine. The first of four interactive modules generates a ranked differential diagnosis based on patient signs, symptoms, exposure history and country of disease acquisition. Additional options include syndromic disease surveillance capability and simulation of bioterrorism scenarios. The second module accesses detailed and current information regarding the status of 338 individual diseases in each of 220 countries. Over 50,000 disease images, maps and user-designed graphs may be downloaded for use in teaching and preparation of written materials. The third module is a comprehensive source on the use of 328 anti-infective drugs and vaccines, including a listing of over 9,500 international trade names. The fourth module can be used to characterize or identify any bacterium or yeast, based on laboratory phenotype. GIDEON is an up-to-date and comprehensive resource for Geographic Medicine.
Cooper, James W; Wilson, Michael H; Derks, Martijn F L; Smit, Sandra; Kunert, Karl J; Cullis, Christopher; Foyer, Christine H
Grain legume improvement is currently impeded by a lack of genomic resources. The paucity of genome information for faba bean can be attributed to the intrinsic difficulties of assembling/annotating its giant (~13 Gb) genome. In order to address this challenge, RNA-sequencing analysis was performed on faba bean (cv. Wizard) leaves. Read alignment to the faba bean reference transcriptome identified 16 300 high quality unigenes. In addition, Illumina paired-end sequencing was used to establish a baseline for genomic information assembly. Genomic reads were assembled de novo into contigs with a size range of 50-5000 bp. Over 85% of sequences did not align to known genes, of which ~10% could be aligned to known repetitive genetic elements. Over 26 000 of the reference transcriptome unigenes could be aligned to DNA-sequencing (DNA-seq) reads with high confidence. Moreover, this comparison identified 56 668 potential splice points in all identified unigenes. Sequence length data were extended at 461 putative loci through alignment of DNA-seq contigs to full-length, publicly available linkage marker sequences. Reads also yielded coverages of 3466× and 650× for the chloroplast and mitochondrial genomes, respectively. Inter- and intraspecies organelle genome comparisons established core legume organelle gene sets, and revealed polymorphic regions of faba bean organelle genomes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Background: Follicle stimulating hormone (FSH) is an important hormone responsible for growth, maturation and function of the human reproductive system. FSH regulates the synthesis of steroid hormones such as estrogen and progesterone, proliferation and maturation of follicles in the ovary and spermatogenesis in the testes. FSH is a glycoprotein heterodimer that binds and acts through the FSH receptor, a G-protein coupled receptor. Although online pathway repositories provide information about G-protein coupled receptor mediated signal transduction, the signaling events initiated specifically by FSH are not cataloged in any public database in a detailed fashion. Findings: We performed comprehensive curation of the published literature to identify the components of FSH signaling pathway and the molecular interactions that occur upon FSH receptor activation. Our effort yielded 64 reactions comprising 35 enzyme-substrate reactions, 11 molecular association events, 11 activation events and 7 protein translocation events that occur in response to FSH receptor activation. We also cataloged 265 genes, which were differentially expressed upon FSH stimulation in normal human reproductive tissues. Conclusions: We anticipate that the information provided in this resource will provide better insights into the physiological role of FSH in reproductive biology, its signaling mediators and aid in further research in this area. The curated FSH pathway data is freely available through NetPath (http://www.netpath.org), a pathway resource developed previously by our group.
Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron
Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first
GPSR: A Resource for Genomics Proteomics and Systems Biology. Small programs as building unit. Why PERL? Why not BioPerl? Why not PERL modules? Advantage of independent programs. Language independent; Can be run independently.
GPSR: A Resource for Genomics Proteomics and Systems Biology. A journey from simple computer programs to drug/vaccine informatics. Limitations of existing web services. History repeats (Web to Standalone); Graphics vs command mode. General purpose ...
GPSR: A Resource for Genomics Proteomics and Systems Biology · Simple Calculation Programs for Biology Immunological Methods · Simple Calculation Programs for Biology Methods in Molecular Biology · Simple Calculation Programs for Biology Other Methods · Prediction of ...
Gilchrist, Anthony Stuart; Shearman, Deborah C A; Frommer, Marianne; Raphael, Kathryn A; Deshpande, Nandan P; Wilkins, Marc R; Sherwin, William B; Sved, John A
The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations. We produced a draft de novo genome assembly of Australia's major tephritid pest species, Bactrocera tryoni. The male genome (650-700 Mbp) includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other Dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions. This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA
Background: The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Results: Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. It also provides a series of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, expression atlas in RiceGE and variation report in Gramene of each cDNA. Conclusion: The online rice indica cDNA database provides cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.
Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
The genome sequences of many important Triticeae species, including bread wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), remained uncharacterized for a long time because of their high repeat content, large sizes, and polyploidy. As a result of improvements in sequencing technologies and novel analysis strategies, several of these have recently been deciphered. These efforts have generated new insights into Triticeae biology and genome organization and have important implications for downstream usage by breeders, experimental biologists, and comparative genomicists. transPLANT is an EU-funded project aimed at constructing hardware, software, and data infrastructure for genome-scale research in the life sciences. Since the Triticeae data are intrinsically complex, heterogeneous, and distributed, the transPLANT consortium has undertaken efforts to develop common data formats and tools that enable the exchange and integration of data from distributed resources. Here we present an overview of the individual Triticeae genome resources hosted by transPLANT partners, introduce the objectives of transPLANT, and outline common developments and interfaces supporting integrated data access.
Sue K Kim
Among the legume family, mungbean (Vigna radiata) has become one of the important crops in Asia, showing a steady increase in global production. It provides a good source of protein and contains most notably folate and iron. Beyond the nutritional value of mungbean, certain features make it a well-suited model organism among legume plants: its small genome size, short life-cycle, self-pollination, and close genetic relationship to other legumes. In the past, there have been several efforts to develop molecular markers and linkage maps associated with agronomic traits for the genetic improvement of mungbean and, ultimately, breeding for cultivar development to increase the average yields of mungbean. The recent release of a reference genome of the cultivated mungbean (V. radiata var. radiata VC1973A) and an additional de novo sequencing of a wild relative mungbean (V. radiata var. sublobata) has provided a framework for mungbean genetic and genome research that can further be used for genome-wide association and functional studies to identify genes related to specific agronomic traits. Moreover, the diverse gene pool of wild mungbean comprises valuable genetic resources of beneficial genes that may be helpful in widening the genetic diversity of cultivated mungbean. This review paper covers the research progress on molecular and genomics approaches and the current status of breeding programs that have been developed to move toward the ultimate goal of mungbean improvement.
Mohr, Stephanie E.; Hu, Yanhui; Kim, Kevin; Housden, Benjamin E.; Perrimon, Norbert
Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003
Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.
Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263
Pennock, Kenneth [AWS Truepower, LLC, Albany, NY (United States)]; Makarov, Yuri V. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)]; Rajagopal, Sankaran [Siemens Energy, Erlangen (Germany)]; Loutan, Clyde [California Independent System Operator]; Etingov, Pavel V. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)]; Miller, Laurie E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)]; Lu, Bo [Siemens Energy, Erlangen (Germany)]; Mansingh, Ashmin [Siemens Energy, Erlangen (Germany)]; Zack, John [MESO, Inc., Raleigh, NC (United States)]; Sherick, Robert [Southern California Edison, Rosemead, CA (United States)]; Romo, Abraham [Southern California Edison]; Habibi-Ashrafi, Farrokh [Southern California Edison]; Johnson, Raymond [Southern California Edison]
The need for proactive closed-loop integration of uncertainty information into system operations and probability-based controls is widely recognized, but rarely implemented in system operations. Proactive integration for this project means that the information concerning expected uncertainty ranges for net load and balancing requirements, including required balancing capacity, ramping and ramp duration characteristics, will be fed back into the generation commitment and dispatch algorithms to modify their performance so that potential shortages of these characteristics can be prevented. This basic, yet important, premise is the motivating factor for this project. The achieved project goal is to demonstrate the benefit of such a system. The project quantifies future uncertainties, predicts additional system balancing needs including the prediction intervals for capacity and ramping requirements of future dispatch intervals, evaluates the impacts of uncertainties on transmission including the risk of overloads and voltage problems, and explores opportunities for intra-hour generation adjustments helping to provide more flexibility for system operators. The resulting benefits culminate in more reliable grid operation in the face of increased system uncertainty and variability caused by solar power. The project identifies that solar power does not require special separate penetration level restrictions or penalization for its intermittency. Ultimately, the collective consideration of all sources of intermittency distributed over a wide area unified with the comprehensive evaluation of various elements of balancing process, i.e. capacity, ramping, and energy requirements, help system operators more robustly and effectively balance generation against load and interchange. This project showed that doing so can facilitate more solar and other renewable resources on the grid without compromising reliability and control performance. Efforts during the project included
Richardson, Mark F; Sequeira, Fernando; Selechnik, Daniel; Carneiro, Miguel; Vallinoto, Marcelo; Reid, Jack G; West, Andrea J; Crossland, Michael R; Shine, Richard; Rollins, Lee A
Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publicly available in NCBI and GigaDB to serve as a resource for other researchers. © The Authors 2017. Published by Oxford University Press.
Kim, Jungeun; Weber, Jessica A; Jho, Sungwoong; Jang, Jinho; Jun, JeHoon; Cho, Yun Sung; Kim, Hak-Min; Kim, Hyunho; Kim, Yumi; Chung, OkSung; Kim, Chang Geun; Lee, HyeJin; Kim, Byung Chul; Han, Kyudong; Koh, InSong; Chae, Kyun Shik; Lee, Semin; Edwards, Jeremy S; Bhak, Jong
High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations, and provides a critical resource that can be used to more accurately identify pathogenic genetic variants. We report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding-SNVs not originally removed from the 1,000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.
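As a rough consistency check on the KoVariome figures quoted above, the novel fractions can be recomputed from the reported totals. This is only an illustrative sketch: the dictionary layout and the helper name `novel_percent` are assumptions made for this example, and the inputs are the rounded counts (in millions) given in the abstract, not the underlying database values.

```python
# Counts as quoted in the abstract, in millions of variants (rounded values).
reported = {
    "SNVs": {"total": 12.7, "novel": 2.4},
    "indels": {"total": 1.7, "novel": 0.4},
}


def novel_percent(total, novel):
    """Return the novel share of a variant class as a whole-number percentage."""
    return round(100 * novel / total)


# Hypothetical helper applied to each variant class.
novel_share = {name: novel_percent(**counts) for name, counts in reported.items()}
print(novel_share)
```

With the rounded inputs, the recomputed shares agree with the 19% (SNVs) and 24% (indels) stated in the abstract.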
Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh
The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.
Brinkley, James F.; Fisher, Shannon; Harris, Matthew P.; Holmes, Greg; Hooper, Joan E.; Wang Jabs, Ethylin; Jones, Kenneth L.; Kesselman, Carl; Klein, Ophir D.; Maas, Richard L.; Marazita, Mary L.; Selleri, Licia; Spritz, Richard A.; van Bakel, Harm; Visel, Axel; Williams, Trevor J.; Wysocka, Joanna
The FaceBase Consortium, funded by the National Institute of Dental and Craniofacial Research, National Institutes of Health, is designed to accelerate understanding of craniofacial developmental biology by generating comprehensive data resources to empower the research community, exploring high-throughput technology, fostering new scientific collaborations among researchers and human/computer interactions, facilitating hypothesis-driven research and translating science into improved health care to benefit patients. The resources generated by the FaceBase projects include a number of dynamic imaging modalities, genome-wide association studies, software tools for analyzing human facial abnormalities, detailed phenotyping, anatomical and molecular atlases, global and specific gene expression patterns, and transcriptional profiling over the course of embryonic and postnatal development in animal models and humans. The integrated data visualization tools, faceted search infrastructure, and curation provided by the FaceBase Hub offer flexible and intuitive ways to interact with these multidisciplinary data. In parallel, the datasets also offer unique opportunities for new collaborations and training for researchers coming into the field of craniofacial studies. Here, we highlight the focus of each spoke project and the integration of datasets contributed by the spokes to facilitate craniofacial research. PMID:27287806
Albert, Nikhila; Daniels, Jena; Schwartz, Jessey; Du, Michael; Wall, Dennis P
For individuals with autism spectrum disorder (ASD), finding resources can be a lengthy and difficult process. The difficulty in obtaining global, fine-grained autism epidemiological data hinders researchers from quickly and efficiently studying large-scale correlations among ASD, environmental factors, and geographical and cultural factors. The objective of this study was to define resource load and resource availability for families affected by autism and subsequently create a platform to enable a more accurate representation of prevalence rates and resource epidemiology. We created a mobile application, GapMap, to collect locational, diagnostic, and resource use information from individuals with autism to compute accurate prevalence rates and better understand autism resource epidemiology. GapMap is hosted on AWS S3, running on a React and Redux front-end framework. The backend framework is comprised of an AWS API Gateway and Lambda Function setup, with secure and scalable end points for retrieving prevalence and resource data, and for submitting participant data. Measures of autism resource scarcity, including resource load, resource availability, and resource gaps were defined and preliminarily computed using simulated or scraped data. The average distance from an individual in the United States to the nearest diagnostic center is approximately 182 km (50 miles), with a standard deviation of 235 km (146 miles). The average distance from an individual with ASD to the nearest diagnostic center, however, is only 32 km (20 miles), suggesting that individuals who live closer to diagnostic services are more likely to be diagnosed. This study confirmed that individuals closer to diagnostic services are more likely to be diagnosed and proposes GapMap, a means to measure and enable the alleviation of increasingly overburdened diagnostic centers and resource-poor areas where parents are unable to diagnose their children as quickly and easily as needed. GapMap will
Song, Yan-Chun; Yu, Dan
With social and economic development, the tensions among population, resources and environment are steadily worsening. As a result, the capacity of resources and environment has become a focal issue for many countries and regions. After investigating and analyzing the present situation and existing problems of resources and environment in the Poyang Lake Eco-economic Zone, seven factors were chosen as the evaluation criterion layer, namely, land resources, water resources, biological resources, mineral resources, ecological-geological environment, water environment and atmospheric environment. Based on the single-factor evaluation results and with the county as the evaluation unit, the comprehensive capacity of resources and environment in the Poyang Lake Eco-economic Zone was evaluated using the state space method. The results showed that the zone has abundant biological resources, good atmospheric and water environments, and a relatively stable geological environment, but is constrained by land, water and mineral resources. Although the comprehensive capacity of resources and environment in the zone was not overloaded as a whole, some counties/districts were already overloaded. The state space model, which gives clear indications and high accuracy, could serve as another approach to evaluating the comprehensive capacity of regional resources and environment.
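As an illustration of the state space method named in the abstract above, a minimal Python sketch follows. It assumes a commonly used formulation in which each evaluation unit is a point in a space with one axis per factor, and the capacity is measured as the weighted length of the vector from the origin to that point. The abstract gives no formula, weights or county values, so the function names, the equal weights and the sample numbers below are hypothetical.

```python
import math

# The seven criterion-layer factors listed in the abstract.
FACTORS = [
    "land resources",
    "water resources",
    "biological resources",
    "mineral resources",
    "ecological-geological environment",
    "water environment",
    "atmospheric environment",
]


def state_vector_length(indices, weights):
    """Weighted Euclidean length sqrt(sum(w_i * x_i**2)) of a state point."""
    if len(indices) != len(weights):
        raise ValueError("indices and weights must have the same length")
    return math.sqrt(sum(w * x * x for x, w in zip(indices, weights)))


def load_ratio(actual, ideal, weights):
    """Ratio of the actual state length to the ideal (reference) state length."""
    return state_vector_length(actual, weights) / state_vector_length(ideal, weights)


if __name__ == "__main__":
    # Hypothetical inputs: equal weights, an ideal state of 1.0 on every axis,
    # and made-up normalized single-factor scores for one county.
    weights = [1 / len(FACTORS)] * len(FACTORS)
    ideal = [1.0] * len(FACTORS)
    county = [1.2, 1.1, 0.6, 1.3, 0.8, 0.7, 0.6]
    print(round(load_ratio(county, ideal, weights), 3))
```

Whether a ratio above 1 means "overloaded" depends on how the single-factor indices are scaled (for example, whether larger values mean greater pressure); that convention is an assumption to settle before interpreting the output.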
Haury, David L.
This ERIC Digest identifies how the human genome project fits into the "National Science Education Standards" and lists Human Genome Project Web sites found on the World Wide Web. It is a resource companion to "Learning about the Human Genome. Part 1: Challenge to Science Educators" (Haury 2001). The Web resources and…
Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.
Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437
Jose L. Pruneda-Paz
Extensive transcriptional networks play major roles in cellular and organismal functions. Transcript levels are in part determined by the combinatorial and overlapping functions of multiple transcription factors (TFs) bound to gene promoters. Thus, TF-promoter interactions provide the basic molecular wiring of transcriptional regulatory networks. In plants, discovery of the functional roles of TFs is limited by an increased complexity of network circuitry due to a significant expansion of TF families. Here, we present the construction of a comprehensive collection of Arabidopsis TF clones created to provide a versatile resource for uncovering TF biological functions. We leveraged this collection by implementing a high-throughput DNA binding assay and identified direct regulators of a key clock gene (CCA1) that provide molecular links between different signaling modules and the circadian clock. The resources introduced in this work will significantly contribute to a better understanding of the transcriptional regulatory landscape of plant genomes.
Brat, Daniel J; Verhaak, Roel G W; Aldape, Kenneth D; Yung, W K Alfred; Salama, Sofie R; Cooper, Lee A D; Rheinbay, Esther; Miller, C Ryan; Vitucci, Mark; Morozova, Olena; Robertson, A Gordon; Noushmehr, Houtan; Laird, Peter W; Cherniack, Andrew D; Akbani, Rehan; Huse, Jason T; Ciriello, Giovanni; Poisson, Laila M; Barnholtz-Sloan, Jill S; Berger, Mitchel S; Brennan, Cameron; Colen, Rivka R; Colman, Howard; Flanders, Adam E; Giannini, Caterina; Grifford, Mia; Iavarone, Antonio; Jain, Rajan; Joseph, Isaac; Kim, Jaegil; Kasaian, Katayoon; Mikkelsen, Tom; Murray, Bradley A; O'Neill, Brian Patrick; Pachter, Lior; Parsons, Donald W; Sougnez, Carrie; Sulman, Erik P; Vandenberg, Scott R; Van Meir, Erwin G; von Deimling, Andreas; Zhang, Hailei; Crain, Daniel; Lau, Kevin; Mallery, David; Morris, Scott; Paulauskis, Joseph; Penny, Robert; Shelton, Troy; Sherman, Mark; Yena, Peggy; Black, Aaron; Bowen, Jay; Dicostanzo, Katie; Gastier-Foster, Julie; Leraas, Kristen M; Lichtenberg, Tara M; Pierson, Christopher R; Ramirez, Nilsa C; Taylor, Cynthia; Weaver, Stephanie; Wise, Lisa; Zmuda, Erik; Davidsen, Tanja; Demchok, John A; Eley, Greg; Ferguson, Martin L; Hutter, Carolyn M; Mills Shaw, Kenna R; Ozenberger, Bradley A; Sheth, Margi; Sofia, Heidi J; Tarnuzzer, Roy; Wang, Zhining; Yang, Liming; Zenklusen, Jean Claude; Ayala, Brenda; Baboud, Julien; Chudamani, Sudha; Jensen, Mark A; Liu, Jia; Pihl, Todd; Raman, Rohini; Wan, Yunhu; Wu, Ye; Ally, Adrian; Auman, J Todd; Balasundaram, Miruna; Balu, Saianand; Baylin, Stephen B; Beroukhim, Rameen; Bootwalla, Moiz S; Bowlby, Reanne; Bristow, Christopher A; Brooks, Denise; Butterfield, Yaron; Carlsen, Rebecca; Carter, Scott; Chin, Lynda; Chu, Andy; Chuah, Eric; Cibulskis, Kristian; Clarke, Amanda; Coetzee, Simon G; Dhalla, Noreen; Fennell, Tim; Fisher, Sheila; Gabriel, Stacey; Getz, Gad; Gibbs, Richard; Guin, Ranabir; Hadjipanayis, Angela; Hayes, D Neil; Hinoue, Toshinori; Hoadley, Katherine; Holt, Robert A; Hoyle, Alan P; Jefferys, Stuart 
R; Jones, Steven; Jones, Corbin D; Kucherlapati, Raju; Lai, Phillip H; Lander, Eric; Lee, Semin; Lichtenstein, Lee; Ma, Yussanne; Maglinte, Dennis T; Mahadeshwar, Harshad S; Marra, Marco A; Mayo, Michael; Meng, Shaowu; Meyerson, Matthew L; Mieczkowski, Piotr A; Moore, Richard A; Mose, Lisle E; Mungall, Andrew J; Pantazi, Angeliki; Parfenov, Michael; Park, Peter J; Parker, Joel S; Perou, Charles M; Protopopov, Alexei; Ren, Xiaojia; Roach, Jeffrey; Sabedot, Thaís S; Schein, Jacqueline; Schumacher, Steven E; Seidman, Jonathan G; Seth, Sahil; Shen, Hui; Simons, Janae V; Sipahimalani, Payal; Soloway, Matthew G; Song, Xingzhi; Sun, Huandong; Tabak, Barbara; Tam, Angela; Tan, Donghui; Tang, Jiabin; Thiessen, Nina; Triche, Timothy; Van Den Berg, David J; Veluvolu, Umadevi; Waring, Scot; Weisenberger, Daniel J; Wilkerson, Matthew D; Wong, Tina; Wu, Junyuan; Xi, Liu; Xu, Andrew W; Yang, Lixing; Zack, Travis I; Zhang, Jianhua; Aksoy, B Arman; Arachchi, Harindra; Benz, Chris; Bernard, Brady; Carlin, Daniel; Cho, Juok; DiCara, Daniel; Frazer, Scott; Fuller, Gregory N; Gao, JianJiong; Gehlenborg, Nils; Haussler, David; Heiman, David I; Iype, Lisa; Jacobsen, Anders; Ju, Zhenlin; Katzman, Sol; Kim, Hoon; Knijnenburg, Theo; Kreisberg, Richard Bailey; Lawrence, Michael S; Lee, William; Leinonen, Kalle; Lin, Pei; Ling, Shiyun; Liu, Wenbin; Liu, Yingchun; Liu, Yuexin; Lu, Yiling; Mills, Gordon; Ng, Sam; Noble, Michael S; Paull, Evan; Rao, Arvind; Reynolds, Sheila; Saksena, Gordon; Sanborn, Zack; Sander, Chris; Schultz, Nikolaus; Senbabaoglu, Yasin; Shen, Ronglai; Shmulevich, Ilya; Sinha, Rileen; Stuart, Josh; Sumer, S Onur; Sun, Yichao; Tasman, Natalie; Taylor, Barry S; Voet, Doug; Weinhold, Nils; Weinstein, John N; Yang, Da; Yoshihara, Kosuke; Zheng, Siyuan; Zhang, Wei; Zou, Lihua; Abel, Ty; Sadeghi, Sara; Cohen, Mark L; Eschbacher, Jenny; Hattab, Eyas M; Raghunathan, Aditya; Schniederjan, Matthew J; Aziz, Dina; Barnett, Gene; Barrett, Wendi; Bigner, Darell D; Boice, Lori; Brewer, 
Cathy; Calatozzolo, Chiara; Campos, Benito; Carlotti, Carlos Gilberto; Chan, Timothy A; Cuppini, Lucia; Curley, Erin; Cuzzubbo, Stefania; Devine, Karen; DiMeco, Francesco; Duell, Rebecca; Elder, J Bradley; Fehrenbach, Ashley; Finocchiaro, Gaetano; Friedman, William; Fulop, Jordonna; Gardner, Johanna; Hermes, Beth; Herold-Mende, Christel; Jungk, Christine; Kendler, Ady; Lehman, Norman L; Lipp, Eric; Liu, Ouida; Mandt, Randy; McGraw, Mary; Mclendon, Roger; McPherson, Christopher; Neder, Luciano; Nguyen, Phuong; Noss, Ardene; Nunziata, Raffaele; Ostrom, Quinn T; Palmer, Cheryl; Perin, Alessandro; Pollo, Bianca; Potapov, Alexander; Potapova, Olga; Rathmell, W Kimryn; Rotin, Daniil; Scarpace, Lisa; Schilero, Cathy; Senecal, Kelly; Shimmel, Kristen; Shurkhay, Vsevolod; Sifri, Suzanne; Singh, Rosy; Sloan, Andrew E; Smolenski, Kathy; Staugaitis, Susan M; Steele, Ruth; Thorne, Leigh; Tirapelli, Daniela P C; Unterberg, Andreas; Vallurupalli, Mahitha; Wang, Yun; Warnick, Ronald; Williams, Felicia; Wolinsky, Yingli; Bell, Sue; Rosenberg, Mara; Stewart, Chip; Huang, Franklin; Grimsby, Jonna L; Radenbaugh, Amie J; Zhang, Jianan
Diffuse low-grade and intermediate-grade gliomas (which together make up the lower-grade gliomas, World Health Organization grades II and III) have highly variable clinical behavior that is not adequately predicted on the basis of histologic class. Some are indolent; others quickly progress to glioblastoma. The uncertainty is compounded by interobserver variability in histologic diagnosis. Mutations in IDH, TP53, and ATRX and codeletion of chromosome arms 1p and 19q (1p/19q codeletion) have been implicated as clinically relevant markers of lower-grade gliomas. We performed genomewide analyses of 293 lower-grade gliomas from adults, incorporating exome sequence, DNA copy number, DNA methylation, messenger RNA expression, microRNA expression, and targeted protein expression. These data were integrated and tested for correlation with clinical outcomes. Unsupervised clustering of mutations and data from RNA, DNA-copy-number, and DNA-methylation platforms uncovered concordant classification of three robust, nonoverlapping, prognostically significant subtypes of lower-grade glioma that were captured more accurately by IDH, 1p/19q, and TP53 status than by histologic class. Patients who had lower-grade gliomas with an IDH mutation and 1p/19q codeletion had the most favorable clinical outcomes. Their gliomas harbored mutations in CIC, FUBP1, NOTCH1, and the TERT promoter. Nearly all lower-grade gliomas with IDH mutations and no 1p/19q codeletion had mutations in TP53 (94%) and ATRX inactivation (86%). The large majority of lower-grade gliomas without an IDH mutation had genomic aberrations and clinical behavior strikingly similar to those found in primary glioblastoma. The integration of genomewide data from multiple platforms delineated three molecular classes of lower-grade gliomas that were more concordant with IDH, 1p/19q, and TP53 status than with histologic class. Lower-grade gliomas with an IDH mutation either had 1p/19q codeletion or carried a TP53 mutation. Most
Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E
In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure high-quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237
Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro
The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293
Mu, John C; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Wong, Wing H; Lam, Hugo Y K
A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.
Alessandro M. Varani
The Xylella fastidiosa comparative genomic database is a scientific resource with the aim to provide a user-friendly interface for accessing high-quality manually curated genomic annotation and comparative sequence analysis, as well as for identifying and mapping prophage-like elements, a marked feature of Xylella genomes. Here we describe a database and tools for exploring the biology of this important plant pathogen. The hallmarks of this database are the high quality genomic annotation, the functional and comparative genomic analysis and the identification and mapping of prophage-like elements. It is available from web site http://www.xylella.lncc.br.
Links Matthew G
Background: Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large-scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results: We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a set of 75,488 unique sequences (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion: We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in
Heffner, Caleb S.; Herbert Pratt, C.; Babiuk, Randal P.; Sharma, Yashoda; Rockwood, Stephen F.; Donahue, Leah R.; Eppig, Janan T.; Murray, Stephen A.
Full realization of the value of the loxP-flanked alleles generated by the International Knockout Mouse Consortium will require a large set of well-characterized cre-driver lines. However, many cre driver lines display excision activity beyond the intended tissue or cell type, and these data are frequently unavailable to the potential user. Here we describe a high-throughput pipeline to extend characterization of cre driver lines to document excision activity in a wide range of tissues at multiple time points and disseminate these data to the scientific community. Our results show that the majority of cre strains exhibit some degree of unreported recombinase activity. In addition, we observe frequent mosaicism, inconsistent activity and parent-of-origin effects. Together, these results highlight the importance of deep characterization of cre strains, and provide the scientific community with a critical resource for cre strain information. PMID:23169059
Christopher J. Ricketts; Aguirre A. De Cubas; Huihui Fan; Christof C. Smith; Martin Lang; Ed Reznik; Reanne Bowlby; Ewan A. Gibb; Rehan Akbani; Rameen Beroukhim; Donald P. Bottaro; Toni K. Choueiri; Richard A. Gibbs; Andrew K. Godwin; Scott Haake
Summary: Renal cell carcinoma (RCC) is not a single disease, but several histologically defined cancers with different genetic drivers, clinical courses, and therapeutic responses. The current study evaluated 843 RCC from the three major histologic subtypes, including 488 clear cell RCC, 274 papillary RCC, and 81 chromophobe RCC. Comprehensive genomic and phenotypic analysis of the RCC subtypes reveals distinctive features of each subtype that provide the foundation for the development of sub...
Chavda, Kalyan D; Chen, Liang; Fouts, Derrick E; Sutton, Granger; Brinkac, Lauren; Jenkins, Stephen G; Bonomo, Robert A; Adams, Mark D; Kreiswirth, Barry N
Knowledge regarding the genomic structure of Enterobacter spp., the second most prevalent carbapenemase-producing Enterobacteriaceae, remains limited. Here we sequenced 97 clinical Enterobacter species isolates, both carbapenem susceptible and carbapenem resistant, from various geographic regions to decipher the molecular origins of carbapenem resistance and to understand the changing phylogeny of these emerging and drug-resistant pathogens. Of the carbapenem-resistant isolates, 30 possessed blaKPC-2, 40 had blaKPC-3, 2 had blaKPC-4, and 2 had blaNDM-1. Twenty-three isolates were carbapenem susceptible. Six genomes were sequenced to completion, and their sizes ranged from 4.6 to 5.1 Mbp. Phylogenomic analysis placed 96 of these genomes, 351 additional Enterobacter genomes downloaded from NCBI GenBank, and six newly sequenced type strains into 19 phylogenomic groups: 18 groups (A to R) in the Enterobacter cloacae complex, and Enterobacter aerogenes. Diverse mechanisms underlying the molecular evolutionary trajectory of these drug-resistant Enterobacter spp. were revealed, including the acquisition of an antibiotic resistance plasmid, followed by clonal spread, horizontal transfer of blaKPC-harboring plasmids between different phylogenomic groups, and repeated transposition of the blaKPC gene among different plasmid backbones. Group A, which comprises multilocus sequence type 171 (ST171), was the most commonly identified (23% of isolates). Genomic analysis showed that ST171 isolates evolved from a common ancestor and formed two different major clusters, each acquiring unique blaKPC-harboring plasmids, followed by clonal expansion. The data presented here represent the first comprehensive study of phylogenomic interrogation and the relationship between antibiotic resistance and plasmid discrimination among carbapenem-resistant Enterobacter spp., demonstrating the genetic diversity and complexity of the molecular mechanisms driving antibiotic resistance in this
Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling
Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers working on sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.
Eppig, Janan T
The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. © The Author 2017. Published by Oxford University Press.
The Molecular Screening Shared Resource (MSSR) offers a comprehensive range of leading-edge high throughput screening (HTS) services including drug discovery, chemical and functional genomics, and novel methods for nano and environmental toxicology. The MSSR is an open access environment with investigators from UCLA as well as from the entire globe. Industrial clients are equally welcome as are non-profit entities. The MSSR is a fee-for-service entity and does not retain intellectual property. In conjunction with the Center for Environmental Implications of Nanotechnology, the MSSR is unique in its dedicated and ongoing efforts towards high throughput toxicity testing of nanomaterials. In addition, the MSSR engages in technology development eliminating bottlenecks from the HTS workflow and enabling novel assays and readouts currently not available.
Law, MeiYee; Shaw, David R
Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) web resources provide free access to meticulously curated information about the laboratory mouse. MGI's primary goal is to help researchers investigate the genetic foundations of human diseases by translating information from mouse phenotype and disease model studies to human systems. MGI provides comprehensive phenotypes for over 50,000 mutant alleles in mice and provides experimental model descriptions for over 1500 human diseases. Curated data from scientific publications are integrated with those from high-throughput phenotyping and gene expression centers. Data are standardized using defined, hierarchical vocabularies such as the Mammalian Phenotype (MP) Ontology, Mouse Developmental Anatomy and the Gene Ontologies (GO). This chapter introduces you to Gene and Allele Detail pages and provides step-by-step instructions for simple searches and those that take advantage of the breadth of MGI data integration.
Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...
Singh Tej P
Background: The Ras superfamily plays an important role in the control of cell signalling and division. Mutations in the Ras genes convert them into active oncogenes. The Ras oncogenes form a major thrust of global cancer research as they are involved in the development and progression of tumors. This has resulted in the exponential growth of data on the Ras superfamily across different public databases and in the literature. However, no dedicated public resource is currently available for data mining and analysis on this family. The present database was developed to facilitate straightforward access, retrieval and analysis of information available on Ras oncogenes from one particular site. Description: We have developed the RAS Oncogene Database (RASOnD) as a comprehensive knowledgebase that provides integrated and curated information on a single platform for oncogenes of the Ras superfamily. RASOnD encompasses exhaustive genomics and proteomics data existing across diverse publicly accessible databases. This resource presently includes a total of 199,046 entries from 101 different species. It provides a search tool to generate information about their nucleotide and amino acid sequences, single nucleotide polymorphisms, chromosome positions, orthologies, motifs, structures, related pathways and associated diseases. We have implemented a number of user-friendly search interfaces and sequence analysis tools. At present the user can (i) browse the data, (ii) search any field through a simple or advanced search interface and (iii) perform a BLAST search and subsequently a CLUSTALW multiple sequence alignment by selecting sequences of Ras oncogenes. The Generic gene browser, GBrowse, JMOL for structural visualization and TREEVIEW for phylograms have been integrated for clear perception of retrieved data. External links to related databases have been included in RASOnD. Conclusions: This database is a resource and search tool dedicated to Ras oncogenes. It has
Background: Renowned for their fast growth, valuable wood properties and wide adaptability, Eucalyptus species are amongst the most planted hardwoods in the world, yet they are still at the early stages of domestication because conventional breeding is slow and costly. Thus, there is huge potential for marker-assisted breeding programs to improve traits such as wood properties. To this end, the sequencing, analysis and annotation of a large collection of expressed sequence tags (ESTs) from genes involved in wood formation in Eucalyptus would provide a valuable resource. Results: We report here the normalization and sequencing of a cDNA library from developing Eucalyptus secondary xylem, as well as the construction and sequencing of two subtractive libraries (juvenile versus mature wood and vice versa). A total of 9,222 high quality sequences were collected from about 10,000 cDNA clones. The EST assembly generated a set of 3,857 wood-related unigenes including 2,461 contigs (Cg) and 1,396 singletons (Sg) that we named 'EUCAWOOD'. About 65% of the EUCAWOOD sequences produced matches with poplar, grapevine, Arabidopsis and rice protein sequence databases. BlastX searches of the Uniref100 protein database allowed us to allocate gene ontology (GO) and protein family terms to the EUCAWOOD unigenes. This annotation of the EUCAWOOD set revealed key functional categories involved in xylogenesis. For instance, 422 sequences matched various gene families involved in biosynthesis and assembly of primary and secondary cell walls. Interestingly, 141 sequences were annotated as transcription factors, some of them being orthologs of regulators known to be involved in xylogenesis. The EUCAWOOD dataset was also mined for genomic simple sequence repeat markers, yielding a total of 639 putative microsatellites. Finally, a publicly accessible database was created, supporting multiple queries on the EUCAWOOD dataset. Conclusion: In this work, we have identified a
Cohen, Philip R; Tomson, Brett N; Elkin, Sheryl K; Marchlik, Erica; Carter, Jennifer L; Kurzrock, Razelle
Merkel cell carcinoma is an ultra-rare cutaneous neuroendocrine cancer for which approved treatment options are lacking. To better understand potential actionability, the genomic landscape of Merkel cell cancers was assessed. The molecular aberrations in 17 patients with Merkel cell carcinoma were, on physician request, tested in a Clinical Laboratory Improvement Amendments (CLIA) laboratory (Foundation Medicine, Cambridge, MA) using next-generation sequencing (182 or 236 genes) and analyzed by N-of-One, Inc. (Lexington, MA). There were 30 genes harboring aberrations and 60 distinct molecular alterations identified in this patient population. The most common abnormalities involved the TP53 gene (12/17 [71% of patients]) and the cell cycle pathway (CDKN2A/B, CDKN2C or RB1) (12/17 [71%]). Abnormalities also were observed in the PI3K/AKT/mTOR pathway (AKT2, FBXW7, NF1, PIK3CA, PIK3R1, PTEN or RICTOR) (9/17 [53%]) and DNA repair genes (ATM, BAP1, BRCA1/2, CHEK2, FANCA or MLH1) (5/17 [29%]). Possible cognate targeted therapies, including FDA-approved drugs, could be identified in most of the patients (16/17 [94%]). In summary, Merkel cell carcinomas were characterized by multiple distinct aberrations that were unique in the majority of analyzed cases. Most patients had theoretically actionable alterations. These results provide a framework for investigating tailored combinations of matched therapies in Merkel cell carcinoma patients.
Water resources carrying capacity is the maximum scale of social and economic development that the available water resources can support. Based on a survey and statistical analysis of the current state of water resources in Shandong Province, this paper selects 13 evaluation factors: per capita water resources, water resources utilization, water supply modulus, rainfall, per capita GDP, population density, per capita water consumption, water consumption per million yuan, water consumption of industrial output value, agricultural output value of farmland, irrigation rate of cultivated land, water consumption rate of the ecological environment and forest coverage rate. A fuzzy comprehensive evaluation model was then used to assess the status of the water resources carrying capacity. The comprehensive evaluation results for Shandong Province were lower than 0.6 in 2001-2009 and higher than 0.6 in 2010-2015, indicating that the water resources carrying capacity of the province has improved. In addition, the value was below 0.6 in most years and below 0.4 in individual years, with relatively large interannual changes, which shows that the water resources carrying capacity of Shandong Province is generally weak and varies considerably from year to year.
Feltus Frank A
Abstract Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and help isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions The construction of the first switchgrass BAC library and comparative analysis of
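The "approximately 10 times" coverage quoted in the switchgrass record can be checked with a few lines of arithmetic. The sketch below is not from the paper: it assumes that only the 95% of clones carrying inserts contribute, that every such clone carries the 120 kb average insert, and that 1 Gb = 1,000,000 kb. The function and variable names are illustrative.

```python
# Back-of-the-envelope check of the BAC library coverage figure.
# Assumptions (not stated in the abstract): only insert-containing clones
# count, and each one contributes the average insert size.

def library_coverage(n_clones, insert_fraction, mean_insert_kb, genome_gb):
    """Return fold coverage of a genome by a clone library."""
    total_insert_gb = n_clones * insert_fraction * mean_insert_kb / 1e6  # kb -> Gb
    return total_insert_gb / genome_gb

clones = 147_456          # clones in the library
insert_fraction = 0.95    # share of clones containing inserts
mean_insert_kb = 120      # average insert size in kb

effective_cov = library_coverage(clones, insert_fraction, mean_insert_kb, 1.6)
full_cov = library_coverage(clones, insert_fraction, mean_insert_kb, 3.2)

print(f"effective genome (1.6 Gb): {effective_cov:.1f}x")
print(f"full genome (3.2 Gb): {full_cov:.1f}x")
```

Under these assumptions the library covers the 1.6 Gb effective genome about 10.5 times, which is in line with the quoted figure; counting all clones regardless of insert status would give a slightly higher value.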
Eric R Gamazon
Microarray gene expression data has been used in genome-wide association studies to allow researchers to study gene regulation as well as other complex phenotypes including disease risks and drug response. To reach scientifically sound conclusions from these studies, however, it is necessary to get reliable summarization of gene expression intensities. Among various factors that could affect expression profiling using a microarray platform, single nucleotide polymorphisms (SNPs) in target mRNA may lead to reduced signal intensity measurements and result in spurious results. The recently released 1000 Genomes Project dataset provides an opportunity to evaluate the distribution of both known and novel SNPs in the International HapMap Project lymphoblastoid cell lines (LCLs). We mapped the 1000 Genomes Project genotypic data to the Affymetrix GeneChip Human Exon 1.0ST array (exon array), which had been used in our previous studies and for which gene expression data had been made publicly available. We also evaluated the potential impact of these SNPs on the differentially spliced probesets we had identified previously. Though the 1000 Genomes Project data allowed a comprehensive survey of the SNPs in this particular array, the same approach can certainly be applied to other microarray platforms. Furthermore, we present a detailed catalogue of SNP-containing probesets (exon-level) and transcript clusters (gene-level), which can be considered in evaluating findings using the exon array as well as benefit the design of follow-up experiments and data re-analysis.
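Mapping SNPs onto array probes amounts to an interval lookup: for each probe, find the SNP positions that fall inside its genomic span. The following is a minimal sketch of that idea, not the authors' pipeline. The probe tuples, coordinate convention (1-based, inclusive) and the function name `probes_with_snps` are assumptions made for illustration.

```python
from bisect import bisect_left

def probes_with_snps(probes, snp_positions):
    """Return {probe_id: [SNP positions]} for probes overlapping at least one SNP.

    probes: iterable of (probe_id, chrom, start, end), 1-based inclusive.
    snp_positions: dict mapping chrom -> sorted list of SNP positions.
    """
    hits = {}
    for probe_id, chrom, start, end in probes:
        positions = snp_positions.get(chrom, [])
        i = bisect_left(positions, start)   # first SNP at or after probe start
        inside = []
        while i < len(positions) and positions[i] <= end:
            inside.append(positions[i])
            i += 1
        if inside:
            hits[probe_id] = inside
    return hits

# Hypothetical 25-mer probes and SNP coordinates.
probes = [
    ("p1", "chr1", 100, 124),
    ("p2", "chr1", 200, 224),
    ("p3", "chr2", 50, 74),
]
snps = {"chr1": [99, 110, 124, 300], "chr2": [75]}

result = probes_with_snps(probes, snps)
print(result)  # {'p1': [110, 124]}
```

Probes flagged this way could then be masked or down-weighted before expression summarization; which of those a given study does is a separate design choice.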
McDonald, Sandra A; Mardis, Elaine R; Ota, David; Watson, Mark A; Pfeifer, John D; Green, Jonathan M
As part of the molecular revolution sweeping medicine, comprehensive genomic studies are adding powerful dimensions to medical research. However, their power exposes new regulatory, strategic, and quality assurance challenges for biorepositories. A key issue is that unlike other research techniques commonly applied to banked specimens, nucleic acid sequencing, if sufficiently extensive, yields data that could identify a patient. This evolving paradigm renders the concepts of anonymized and anonymous specimens increasingly outdated. The challenges for biorepositories in this new era include refined consent processes and wording, selection and use of legacy specimens, quality assurance procedures, institutional documentation, data sharing, and interaction with institutional review boards. Given current trends, biorepositories should consider these issues now, even if they are not currently experiencing sample requests for genomic analysis. We summarize our current experiences and best practices at Washington University Medical School, St Louis, MO, our perceptions of emerging trends, and recommendations.
Sasaki, Naobumi V.; Sato, Naoki
Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Protein...
Obesity and type 2 diabetes (T2D) are two major conditions that are related to metabolic disorders and affect a large population. Although there have been significant efforts to identify their therapeutic targets, few benefits have come from comprehensive molecular profiling. This limited availability of comprehensive molecular profiling of obesity and T2D may be due to multiple challenges, as these conditions involve multiple organs and collecting tissue samples from subjects is more difficult in obesity and T2D than in other diseases, where surgical treatments are popular choices. While there is no repository of comprehensive molecular profiling data for obesity and T2D, multiple existing data resources can be utilized to cover various aspects of these conditions. This review presents studies with available genomic data resources for obesity and T2D and discusses genome-wide association studies (GWAS), a knockout (KO)-based phenotyping study, and gene expression profiles. These studies, based on their assessed coverage and characteristics, can provide insights into how such data can be utilized to identify therapeutic targets for obesity and T2D.
Hemert, van S.; Ebbelaar, B.H.; Smits, M.A.; Rebel, J.M.J.
Expressed sequence tags (ESTs) and microarray resources have a great impact on the ability to study host response in mice and humans. Unfortunately, these resources are not yet available for domestic farm animals. The aim of this study was to provide genomic resources to study chicken intestinal
Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E
Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.
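The central step of the two-round strategy described above is building an updated reference that already carries the sample's substitutions and short indels, so that reads align better in the second round. The toy sketch below illustrates only that step on a short string. It is not the Drosophila Genome Nexus pipeline: the variant tuple layout (0-based position, reference allele, alternate allele) and the function name `apply_variants` are assumptions.

```python
def apply_variants(reference, variants):
    """Return a copy of `reference` with substitutions and short indels applied.

    variants: list of (pos, ref_allele, alt_allele) with 0-based positions.
    Variants must not overlap; the ref allele is checked against the sequence.
    """
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # substituted, inserted or shortened allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)

reference = "ACGTACGTAC"
variants = [
    (2, "G", "T"),     # substitution
    (4, "AC", "A"),    # 1-bp deletion
    (8, "A", "AGG"),   # 2-bp insertion
]
updated = apply_variants(reference, variants)
print(updated)  # ACTTAGTAGGC
```

In a real workflow the coordinates of second-round calls would also need to be translated back to the original reference, which this sketch leaves out.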
Genome references are foundational for high quality entomological research today. Species, sub populations and taxonomy are defined by gene flow and genome sequences. Gene content in arthropods is often directly reflective of life history, for example, diet and symbiont related gene loss is observed...
Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson
Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...
Bracken-Grissom, Heather; Collins, Allen G.; Collins, Timothy; Crandall, Keith; Distel, Daniel; Dunn, Casey; Giribet, Gonzalo; Haddock, Steven; Knowlton, Nancy; Martindale, Mark; Medina, Monica; Messing, Charles; O'Brien, Stephen J.; Paulay, Gustav; Putnam, Nicolas; Ravasi, Timothy; Rouse, Greg W.; Ryan, Joseph F.; Schulze, Anja; Worheide, Gert; Adamska, Maja; Bailly, Xavier; Breinholt, Jesse; Browne, William E.; Diaz, M. Christina; Evans, Nathaniel; Flot, Jean-Francois; Fogarty, Nicole; Johnston, Matthew; Kamel, Bishoy; Kawahara, Akito Y.; Laberge, Tammy; Lavrov, Dennis; Michonneau, Francois; Moroz, Leonid L.; Oakley, Todd; Osborne, Karen; Pomponi, Shirley A.; Rhodes, Adelaide; Rodriguez-Lanetty, Mauricio; Santos, Scott R.; Satoh, Nori; Thacker, Robert W.; Van de Peer, Yves; Voolstra, Christian R.; Welch, David Mark; Winston, Judith; Zhou, Xin
Over 95% of all metazoan (animal) species comprise the invertebrates, but very few genomes from these organisms have been sequenced. We have, therefore, formed a Global Invertebrate Genomics Alliance (GIGA). Our intent is to build a collaborative
A vast majority of the burden from neglected tropical diseases results from helminth infections (nematodes and platyhelminthes). Parasitic helminthes infect over 2 billion people, exerting a high collective burden that rivals high-mortality conditions such as AIDS or malaria, and cause devastation to crops and livestock. The challenges to improve control of parasitic helminth infections are multi-fold and no single category of approaches will meet them all. New information such as helminth genomics, functional genomics and proteomics coupled with innovative bioinformatic approaches provides fundamental molecular information about these parasites, accelerating both basic research and the development of effective diagnostics, vaccines and new drugs. To facilitate such studies we have developed an online resource, HelmCoP (Helminth Control and Prevention), built by integrating functional, structural and comparative genomic data from plant, animal and human helminthes, to enable researchers to develop strategies for drug, vaccine and pesticide prioritization, while also providing a useful comparative genomics platform. HelmCoP encompasses genomic data from several hosts, including model organisms, along with a comprehensive suite of structural and functional annotations, to assist in comparative analyses and to study host-parasite interactions. The HelmCoP interface, with a sophisticated query engine as a backbone, allows users to search for multi-factorial combinations of properties and serves readily accessible information that will assist in the identification of various genes of interest. HelmCoP is publicly available at: http://www.nematode.net/helmcop.html.
Ball, M.P.; Thakuria, J.V.; Zaranek, A.W.; Clegg, T.; Rosenbaum, A.M.; Wu, X.; Angrist, M.; Bhak, J.; Bobe, J.; Callow, M.J.; Cano, C.; Chou, M.F.; Chung, W.K.; Douglas, S.M.; Estep, P.W.; Gore, A.; Hulick, P.; Labarga, A.; Lee, J.-H.; Lunshof, J.E.; Kim, B.C.; Kim, J.L.; Li, Z.; Murray, M.F.; Nilsen, G.B.; Peters, B.A.; Raman, A.M.; Rienhoff, H.Y.; Robasky, K.; Wheeler, M.T.; Vandewege, W.; Vorhaus, D.B.; Yang, Y.L.; Yang, L.; Aach, J.; Ashley, E.A.; Drmanac, R.; Kim, S.-J.; Li, J.B.; Peshkin, L.; Seidman, S.E.; Seo, J.-S.; Zhang, K.; Rehm, H.L.; Church, G.M.
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the
This article presents the basic activities and experience of the Federal Resource Center for Organizing Comprehensive Support for Children with ASD of Moscow State University of Psychology & Education, amassed during 22 years of practice. Some statistical data on the center's activity are presented. Emphasis is placed on multidirectional work and on developing ways of interdepartmental and network interaction in order to establish a system of comprehensive support for autistic children in the Russian Federation.
Lee, Mikyung; Kim, Yangseok
Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression level of genes due to altered copy number in the corresponding region of the chromosome. An accumulating body of evidence supports the possibility that strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, there are many limitations in their performance of comprehensive integrated analysis using published software because of limitations in implemented algorithms and visualization modules. To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alteration by applying a threshold based method or SW-ARRAY algorithm and investigates whether the detected alteration is phenotype specific or not. On the other hand, the integrative analysis module measures the genomic alteration's influence on gene expression. It is divided into two separate parts. The first part calculates overall correlation between comparative genomic hybridization ratio and gene expression level by applying following three statistical methods: simple linear regression, Spearman rank correlation and Pearson's correlation. In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test and Chi square
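The first part of the integrative module described above relates copy-number (CGH) ratios to expression levels with simple linear regression, Pearson's correlation and Spearman rank correlation. The sketch below shows those three statistics in plain Python on made-up values. It is not CHESS code (CHESS is a Java program), and the data, variable names and function names are hypothetical.

```python
from math import sqrt

def _moments(x, y):
    """Return means and centred sums of squares/products for two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy

def pearson(x, y):
    _, _, sxy, sxx, syy = _moments(x, y)
    return sxy / sqrt(sxx * syy)

def linear_fit(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my, sxy, sxx, _ = _moments(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical log2 CGH ratios and expression levels for one gene across samples.
cgh_ratio = [-1.0, -0.5, 0.0, 0.5, 1.0]
expression = [1.0, 2.0, 2.5, 4.0, 8.0]

r = pearson(cgh_ratio, expression)
rho = spearman(cgh_ratio, expression)
slope, intercept = linear_fit(cgh_ratio, expression)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, slope = {slope:.2f}")
```

Because the toy expression values rise monotonically but not linearly with copy number, Spearman's rho reaches 1 while Pearson's r stays below it, which is one reason to report both measures.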
Voytas Daniel F
Abstract Background Zinc Finger Nucleases (ZFNs) have tremendous potential as tools to facilitate genomic modifications, such as precise gene knockouts or gene replacements by homologous recombination. ZFNs can be used to advance both basic research and clinical applications, including gene therapy. Recently, the ability to engineer ZFNs that target any desired genomic DNA sequence with high fidelity has improved significantly with the introduction of rapid, robust, and publicly available techniques for ZFN design such as the Oligomerized Pool ENgineering (OPEN) method. The motivation for this study is to make resources for genome modifications using OPEN-generated ZFNs more accessible to researchers by creating a user-friendly interface that identifies and provides quality scores for all potential ZFN target sites in the complete genomes of several model organisms. Description ZFNGenome is a GBrowse-based tool for identifying and visualizing potential target sites for OPEN-generated ZFNs. ZFNGenome currently includes a total of more than 11.6 million potential ZFN target sites, mapped within the fully sequenced genomes of seven model organisms: S. cerevisiae, C. reinhardtii, A. thaliana, D. melanogaster, D. rerio, C. elegans, and H. sapiens. These sites can be visualized within the flexible GBrowse environment. Additional model organisms will be included in future updates. ZFNGenome provides information about each potential ZFN target site, including its chromosomal location and position relative to transcription initiation site(s). Users can query ZFNGenome using several different criteria (e.g., gene ID, transcript ID, target site sequence). Tracks in ZFNGenome also provide "uniqueness" and ZiFOpT (Zinc Finger OPEN Targeter) "confidence" scores that estimate the likelihood that a chosen ZFN target site will function in vivo. ZFNGenome is dynamically linked to ZiFDB, allowing users access to all available information about zinc finger reagents, such as the
Sasaki, Naobumi V; Sato, Naoki
Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.
Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu
The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.
Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin
The development of next-generation sequencing (NGS) technology makes it possible to sequence whole exomes or genomes. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analysis, which translates into delays and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that runs on smartphones or other mobile devices. MGE can handle all the steps of genetic analysis, such as sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program, called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.
Subhash C Verma
Kaposi's sarcoma associated herpesvirus (KSHV) is tightly linked to multiple human malignancies including Kaposi's sarcoma (KS), Primary Effusion Lymphoma (PEL) and Multicentric Castleman's Disease (MCD). KSHV, like other herpesviruses, establishes life-long latency in the infected host by persisting as chromatin and tethering to host chromatin through the virally encoded protein Latency Associated Nuclear Antigen (LANA). LANA, a multifunctional protein, is capable of binding to a large number of cellular proteins responsible for transcriptional regulation of various cellular and viral pathways involved in blocking cell death and promoting cell proliferation. This leads to enhanced cell division and replication of the viral genome, which segregates faithfully in the dividing tumor cells. The mechanism of genome segregation is well known and the binding of LANA to nucleosomal proteins, throughout the cell cycle, suggests that these interactions play an important role in efficient segregation. Various biochemical methods have identified a large number of LANA binding proteins, including histone H2A/H2B, histone H1, MeCP2, DEK, CENP-F, NuMA, Bub1, HP-1, and Brd4. These nucleosomal proteins may have various functions in tethering of the viral genome during specific phases of the viral life cycle. Therefore, we performed a comprehensive analysis of their interaction with LANA using a number of different assays. We show that LANA binds to core nucleosomal histones and also associates with other host chromatin proteins including histone H1 and high mobility group proteins (HMGs). We used various biochemical assays including co-immunoprecipitation and in-vivo localization by split GFP and fluorescence resonance energy transfer (FRET) to demonstrate their association.
We created a public, searchable DNA sequence resource for sheep that contained approximately 14x whole genome sequence of 96 rams. The animals represent 10 popular U.S. breeds and share minimal pedigree relationships, making the resource suitable for viewing gene variants in the user-friendly Integ...
Sarah Grace Prescott
A comparative review of several companies that offer similar kits or services that allow students to isolate DNA (human and others), amplify it by PCR, and in some cases sequence the resulting sample. The companies include: Carolina® Biological Supply Company, Bio-Rad®, Edvotek® Inc., Hiram Genomics Store, and 23andMe.
Oyster aquaculture is an important sector of world food production. As such, it is imperative to develop a high quality reference genome for the eastern oyster, Crassostrea virginica, to assist in the elucidation of the genomic basis of commercially important traits. All genetic, gene expression and...
Over 95% of all metazoan (animal) species comprise the invertebrates, but very few genomes from these organisms have been sequenced. We have, therefore, formed a Global Invertebrate Genomics Alliance (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site has been launched to facilitate this collaborative venture.
Microsporidia have attracted considerable attention because they infect a wide range of hosts, from invertebrates to vertebrates, and cause serious human diseases and major economic losses in the livestock industry. There are no prospective drugs to counteract this pathogen. Eukaryotic protein kinases (ePKs) play a central role in regulating many essential cellular processes and are therefore potential drug targets. In this study, a comprehensive summary and comparative analysis of the protein kinases in four microsporidia (Enterocytozoon bieneusi, Encephalitozoon cuniculi, Nosema bombycis and Nosema ceranae) was performed. The results show that there are 34 ePKs and 4 atypical protein kinases (aPKs) in E. bieneusi, 29 ePKs and 6 aPKs in E. cuniculi, 41 ePKs and 5 aPKs in N. bombycis, and 27 ePKs and 4 aPKs in N. ceranae. These data support the previous conclusion that the microsporidian kinome is the smallest eukaryotic kinome. Microsporidian kinomes contain only serine-threonine kinases and do not contain receptor-like and tyrosine kinases. Many of the kinases related to nutrient and energy signaling and the stress response have been lost in microsporidian kinomes. However, cell cycle-, development- and growth-related kinases, which are important to parasites, are well conserved. This reduction of the microsporidian kinome is in good agreement with genome compaction, but kinome density is negatively correlated with proteome size. Furthermore, the protein kinases in each microsporidian genome are under strong purifying selection pressure. No remarkable differences in kinase family classification, domain features, gain and/or loss, and selective pressure were observed in these four species. Although microsporidia adapt to different host types, the coevolution of microsporidia and their hosts was not clearly reflected in the protein kinases. Overall, this study enriches and updates the microsporidian protein kinase database and may provide
Background: Medicago truncatula has been chosen as a model species for genomic studies. It is closely related to an important legume, alfalfa. Transporters are a large group of membrane-spanning proteins. They deliver essential nutrients, eject waste products, and assist the cell in sensing environmental conditions by forming a complex system of pumps and channels. Although studies have effectively characterized individual M. truncatula transporters in several databases, until now there has been no available systematic database that includes all transporters in M. truncatula. Description: The M. truncatula transporter database (MTDB) contains comprehensive information on the transporters in M. truncatula. Based on the TransportTP method, we have presented a novel prediction pipeline. A total of 3,665 putative transporters have been annotated based on International Medicago Genome Annotated Group (IMGAG) V3.5 V3 and the M. truncatula Gene Index (MTGI) V10.0 releases and assigned to 162 families according to the transporter classification system. These families were further classified into seven types according to their transport mode and energy coupling mechanism. Extensive annotations referring to each protein were generated, including basic protein function, expressed sequence tag (EST) mapping, genome locus, three-dimensional template prediction, transmembrane segment, and domain annotation. A chromosome distribution map and text-based Basic Local Alignment Search Tools were also created. In addition, we have provided a way to explore the expression of putative M. truncatula transporter genes under stress treatments. Conclusions: In summary, the MTDB enables the exploration and comparative analysis of putative transporters in M. truncatula. A user-friendly web interface and regular updates make MTDB valuable to researchers in related fields. The MTDB is freely available now to all users at http://bioinformatics.cau.edu.cn/MtTransporter/.
Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R
The STINGRAY system was conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system may be faster with Sanger data, since large NGS datasets could slow down MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.
Sokan-Adeaga, Adewale Allen; Ana, Godson R E E
The quest for biofuels in Nigeria, no doubt, represents a legitimate ambition. This is so because the focus on biofuel production has assumed a global dimension, and the benefits that may accrue from such effort may turn out to be enormous if the preconditions are adequately satisfied. As a member of the global community, it has become exigent for Nigeria to explore other potential means of bettering her already impoverished economy. Biomass is the major energy source in Nigeria, contributing about 78% of Nigeria's primary energy supply. In this paper, a comprehensive review of the potential of biomass resources and biofuel production in Nigeria is given. The study adopted a desk review of existing literature on major energy crops produced in Nigeria. A brief description of the current biofuel developmental activities in the country is also given. A variety of biomass resources exist in the country in large quantities with opportunities for expansion. Biomass resources considered include agricultural crops, agricultural crop residues, forestry resources, municipal solid waste, and animal waste. However, the prospects of achieving this giant stride appear not to be feasible in Nigeria. Although the focus on biofuel production may be a worthwhile endeavor in view of Nigeria's development woes, the paper argues that because Nigeria is yet to adequately satisfy the preconditions for such a program, the effort may be designed to fail after all. To avoid this, the government must address key areas of concern such as food insecurity, environmental crisis, and blatant corruption in all quarters. It is concluded that given the large availability of biomass resources in Nigeria, there is immense potential for biofuel production from these biomass resources. With the very high potential for biofuel production, the governments as well as private investors are therefore encouraged to take practical steps toward investing in agriculture for the production of energy crops and the
Wenger, A. M.; Clarke, S. L.; Guturu, H.; Chen, J.; Schaar, B. T.; McLean, C. Y.; Bejerano, G.
The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
The objective of this study was to evaluate the usefulness of comprehensive chromosome screening (CCS) using array comparative genomic hybridization (aCGH). The study included 1420 CCS cycles for recurrent miscarriage (n=203); repetitive implantation failure (n=188); severe male factor (n=116); previous trisomic pregnancy (n=33); and advanced maternal age (n=880). CCS was performed in cycles with fresh oocytes and embryos (n=774); mixed cycles with fresh and vitrified oocytes (n=320); mixed cycles with fresh and vitrified day-2 embryos (n=235); and mixed cycles with fresh and vitrified day-3 embryos (n=91). Day-3 embryo biopsy was performed and analyzed by aCGH followed by day-5 embryo transfer. Consistent implantation (range: 40.5–54.2%) and pregnancy rates per transfer (range: 46.0–62.9%) were obtained for all the indications and independently of the origin of the oocytes or embryos. However, a lower delivery rate per cycle was achieved in women aged over 40 years (18.1%) due to the higher percentage of aneuploid embryos (85.3%) and lower number of cycles with at least one euploid embryo available per transfer (40.3%). We concluded that aneuploidy is one of the major factors which affect embryo implantation.
An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs) is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS; http://sevens.cbrc.jp/) that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and overviewed the sequence and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorants, pheromones, etc. differs significantly among species. The latter receptors tend to be of the single-exon or few-exon type and account for a high proportion of all GPCRs, whereas some families, such as Class B and Class C receptors, are long owing to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM) helices, while residues characteristic of each subfamily are found in the extracellular half regions. Sixty-nine Protein Data Bank (PDB) entries of complete or fragmentary structures could be mapped onto the TM/loop regions of Class A GPCRs, covering 14 subfamilies.
Mitochondrial disorders have the highest incidence among congenital metabolic disorders characterized by biochemical respiratory chain complex deficiencies. They occur at a rate of 1 in 5,000 births, and show phenotypic and genetic heterogeneity. Mutations in about 1,500 nuclear encoded mitochondrial proteins may cause mitochondrial dysfunction of energy production and mitochondrial disorders. More than 250 genes that cause mitochondrial disorders have been reported to date. However, the exact genetic diagnosis for patients has remained largely unknown. To reveal this heterogeneity, we performed comprehensive genomic analyses for 142 patients with childhood-onset mitochondrial respiratory chain complex deficiencies. The approach includes whole mtDNA and exome analyses using high-throughput sequencing, and chromosomal aberration analyses using high-density oligonucleotide arrays. We identified 37 novel mutations in known mitochondrial disease genes and 3 mitochondria-related genes (MRPS23, QRSL1, and PNPLA4) as novel causative genes. We also identified 2 genes known to cause monogenic diseases (MECP2 and TNNI3) and 3 chromosomal aberrations (6q24.3-q25.1, 17p12, and 22q11.21) as causes in this cohort. Our approaches enhance the ability to identify pathogenic gene mutations in patients with biochemically defined mitochondrial respiratory chain complex deficiencies in clinical settings. They also underscore clinical and genetic heterogeneity and will improve patient care of this complex disorder.
Bruford, Michael W.; Ginja, Catarina; Hoffmann, Irene; Joost, Stéphane; Orozco-terWengel, Pablo; Alberto, Florian J.; Amaral, Andreia J.; Barbato, Mario; Biscarini, Filippo; Colli, Licia; Costa, Mafalda; Curik, Ino; Duruz, Solange; Ferenčaković, Maja; Fischer, Daniel; Fitak, Robert; Groeneveld, Linn F.; Hall, Stephen J. G.; Hanotte, Olivier; Hassan, Faiz-ul; Helsen, Philippe; Iacolina, Laura; Kantanen, Juha; Leempoel, Kevin; Lenstra, Johannes A.; Ajmone-Marsan, Paolo; Masembe, Charles; Megens, Hendrik-Jan; Miele, Mara; Neuditschko, Markus; Nicolazzi, Ezequiel L.; Pompanon, François; Roosen, Jutta; Sevane, Natalia; Smetko, Anamarija; Štambuk, Anamaria; Streeter, Ian; Stucki, Sylvie; Supakorn, China; Telo Da Gama, Luis; Tixier-Boichard, Michèle; Wegmann, Daniel; Zhan, Xiangjiang
Livestock conservation practice is changing rapidly in light of policy developments, climate change and diversifying market demands. The last decade has seen a step change in technology and analytical approaches available to define, manage and conserve Farm Animal Genomic Resources (FAnGR). However, these rapid changes pose challenges for FAnGR conservation in terms of technological continuity, analytical capacity and integrative methodologies needed to fully exploit new, multidimensional data. The final conference of the ESF Genomic Resources program aimed to address these interdisciplinary problems in an attempt to contribute to the agenda for research and policy development directions during the coming decade. By 2020, according to the Convention on Biodiversity's Aichi Target 13, signatories should ensure that “…the genetic diversity of …farmed and domesticated animals and of wild relatives …is maintained, and strategies have been developed and implemented for minimizing genetic erosion and safeguarding their genetic diversity.” However, the real extent of genetic erosion is very difficult to measure using current data. Therefore, this challenging target demands better coverage, understanding and utilization of genomic and environmental data, the development of optimized ways to integrate these data with social and other sciences and policy analysis to enable more flexible, evidence-based models to underpin FAnGR conservation. At the conference, we attempted to identify the most important problems for effective livestock genomic resource conservation during the next decade. Twenty priority questions were identified that could be broadly categorized into challenges related to methodology, analytical approaches, data management and conservation. It should be acknowledged here that while the focus of our meeting was predominantly around genetics, genomics and animal science, many of the practical challenges facing conservation of genomic resources are
Signor, Sarah; Seher, Thaddeus; Kopp, Artyom
The development of genomic resources in non-model taxa is essential for understanding the genetic basis of biological diversity. Although the genomes of many Drosophila species have been sequenced, most of the phenotypic diversity in this genus remains to be explored. To facilitate the genetic analysis of interspecific and intraspecific variation, we have generated new genomic resources for seven species and subspecies in the D. ananassae species subgroup. We have generated large amounts of transcriptome sequence data for D. ercepeae, D. merina, D. bipectinata, D. malerkotliana malerkotliana, D. m. pallens, D. pseudoananassae pseudoananassae, and D. p. nigrens. De novo assembly resulted in contigs covering more than half of the predicted transcriptome and matching an average of 59% of annotated genes in the complete genome of D. ananassae. Most contigs, corresponding to an average of 49% of D. ananassae genes, contain sequence polymorphisms that can be used as genetic markers. Subsets of these markers were validated by genotyping the progeny of inter- and intraspecific crosses. The ananassae subgroup is an excellent model system for examining the molecular basis of speciation and phenotypic evolution. The new genomic resources will facilitate the genetic analysis of inter- and intraspecific differences in this lineage. Transcriptome sequencing provides a simple and cost-effective way to identify molecular markers at nearly single-gene density, and is equally applicable to any non-model taxon.
Agassiz's desert tortoise (Gopherus agassizii) is a long-lived species native to the Mojave Desert and is listed as threatened under the US Endangered Species Act. To aid conservation efforts for preserving the genetic diversity of this species, we generated a whole genome reference sequence with an annotation based on deep transcriptome sequences of adult skeletal muscle, lung, brain, and blood. The draft genome assembly for G. agassizii has a scaffold N50 length of 252 kbp and a total length of 2.4 Gbp. Genome annotation reveals 20,172 protein-coding genes in the G. agassizii assembly, and that gene structure is more similar to chicken than other turtles. We provide a series of comparative analyses demonstrating (1) that turtles are among the slowest-evolving genome-enabled reptiles, (2) amino acid changes in genes controlling desert tortoise traits such as shell development, longevity and osmoregulation, and (3) fixed variants across the Gopherus species complex in genes related to desert adaptations, including circadian rhythm and innate immune response. This G. agassizii genome reference and annotation is the first such resource for any tortoise, and will serve as a foundation for future analysis of the genetic basis of adaptations to the desert environment, allow for investigation into genomic factors affecting tortoise health, disease and longevity, and serve as a valuable resource for additional studies in this species complex.
Wegner, Molly F.
As students begin middle school, they are expected to possess and apply a wide array of nonfiction reading strategies if they are to comprehend new concepts from nonfiction texts. Although strategies and resource guides for fiction reading are available, an effective nonfiction reading comprehension resource guide tailored to middle school science teachers is lacking. The conceptual framework guiding this study is based on schema theory that supports the use of prior knowledge as a foundation for learning. The purpose of this project study was to address this local problem by providing middle school science teachers with a user-friendly resource for nonfiction reading comprehension strategies in a science context. The research question examined nonfiction reading comprehension strategies that could supplement middle school science teachers' instructional practices to increase student comprehension in science, as reflected on the results of state standardized tests. This project study consulted science and language arts teachers using a Delphi questionnaire technique to achieve a consensus through multiple iterations of questionnaires. Science teachers identified 7 areas of concern as students read nonfiction texts, and language arts teachers suggested effective reading comprehension strategies to address these areas. Based on the consensus of reading comprehension strategies and review of literature, a resource guide for middle school science teachers was created. By improving reading comprehension in content areas, teachers may not only increase student learning, but also underscore the importance of literacy relating to life-long learning through future occupations, academic endeavors, and society as well.
Parasteatoda tepidariorum is an increasingly popular model for the study of spider development and the evolution of development more broadly. However, fully understanding the regulation and evolution of P. tepidariorum development in comparison to other animals requires a genomic perspective. Although research on P. tepidariorum has provided major new insights, gene analysis to date has been limited to candidate gene approaches. Furthermore, the few available EST collections are based on embryonic transcripts, which have not been systematically annotated and are unlikely to contain transcripts specific to post-embryonic stages of development. We therefore generated cDNA from pooled embryos representing all described embryonic stages, as well as post-embryonic stages including nymphs, larvae and adults, and using Illumina HiSeq technology obtained a total of 625,076,514 100-bp paired end reads. We combined these data with 24,360 ESTs available in GenBank, and 1,040,006 reads newly generated from 454 pyrosequencing of a mixed-stage embryo cDNA library. The combined sequence data were assembled using a custom de novo assembly strategy designed to optimize assembly product length, number of predicted transcripts, and proportion of raw reads incorporated into the assembly. The de novo assembly generated 446,427 contigs with an N50 of 1,875 bp. These sequences obtained 62,799 unique BLAST hits against the NCBI non-redundant protein database, including putative orthologs to 8,917 Drosophila melanogaster genes based on best reciprocal BLAST hit identity compared with the D. melanogaster proteome. Finally, we explored the utility of the transcriptome for RNA-Seq studies, and showed that this resource can be used as a mapping scaffold to detect differential gene expression in different cDNA libraries. This resource will therefore provide a platform for future genomic, gene expression and functional approaches using P. tepidariorum.
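For readers unfamiliar with the N50 statistic quoted for this assembly, the following is a minimal sketch of how N50 is commonly computed from a list of contig lengths. It is not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions, and the definition used is the usual one (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy data, not taken from the study: total length 1500, half is 750.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

With the toy lengths, the two longest contigs (500 and 400) already cover more than half of the total, so the function returns the shorter of the two.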
Goyal, A.; Tyagi, H.; Gosain, A. K.; Khosa, R.
Hydrological systems across the globe are getting increasingly water stressed with each passing season due to climate variability & snowballing water demand. Hence, to safeguard food, livelihood & economic security, it becomes imperative to employ scientific studies for holistic management of indispensable resource like water. However, hydrological study of any scale & purpose is heavily reliant on various spatio-temporal datasets which are not only difficult to discover/access but are also tough to use & manage. Besides, owing to diversity of water sector agencies & dearth of standard operating procedures, seamless information exchange is challenging for collaborators. Extensive research is being done worldwide to address these issues but regrettably not much has been done in developing countries like India. Therefore, the current study endeavours to develop a Hydrological Information System framework in a Web-GIS environment for empowering Indian water resources systems. The study attempts to harmonize the standards for metadata, terminology, symbology, versioning & archiving for effective generation, processing, dissemination & mining of data required for hydrological studies. Furthermore, modelers with humble computing resources at their disposal, can consume this standardized data in high performance simulation modelling using cloud computing within the developed Web-GIS framework. They can also integrate the inputs-outputs of different numerical models available on the platform and integrate their results for comprehensive analysis of the chosen hydrological system. Thus, the developed portal is an all-in-one framework that can facilitate decision makers, industry professionals & researchers in efficient water management.
Zhang, Xiaomeng; Wu, Deng; Chen, Liqun; Li, Xiang; Yang, Jinxurong; Fan, Dandan; Dong, Tingting; Liu, Mingyue; Tan, Puwen; Xu, Jintian; Yi, Ying; Wang, Yuting; Zou, Hua; Hu, Yongfei; Fan, Kaili; Kang, Juanjuan; Huang, Yan; Miao, Zhengqiang; Bi, Miaoman; Jin, Nana; Li, Kongning; Li, Xia; Xu, Jianzhen; Wang, Dong
Transcriptomic analyses have revealed an unexpected complexity in the eukaryote transcriptome, which includes not only protein-coding transcripts but also an expanding catalog of noncoding RNAs (ncRNAs). Diverse coding and noncoding RNAs (ncRNAs) perform functions through interaction with each other in various cellular processes. In this project, we have developed RAID (http://www.rna-society.org/raid), an RNA-associated (RNA–RNA/RNA–protein) interaction database. RAID intends to provide the scientific community with all-in-one resources for efficient browsing and extraction of the RNA-associated interactions in human. This version of RAID contains more than 6100 RNA-associated interactions obtained by manually reviewing more than 2100 published papers, including 4493 RNA–RNA interactions and 1619 RNA–protein interactions. Each entry contains detailed information on an RNA-associated interaction, including RAID ID, RNA/protein symbol, RNA/protein categories, validated method, expressing tissue, literature references (Pubmed IDs), and detailed functional description. Users can query, browse, analyze, and manipulate RNA-associated (RNA–RNA/RNA–protein) interaction. RAID provides a comprehensive resource of human RNA-associated (RNA–RNA/RNA–protein) interaction network. Furthermore, this resource will help in uncovering the generic organizing principles of cellular function network. PMID:24803509
Copy number variations (CNV) include net gains or losses of part or whole chromosomal regions. They differ from copy neutral loss of heterozygosity (cn-LOH) events which do not induce any net change in the copy number and are often associated with uniparental disomy. These phenomena have long been reported to be associated with diseases and particularly in cancer. Losses/gains of genomic regions are often correlated with lower/higher gene expression. On the other hand, loss of heterozygosity (LOH) and cn-LOH are common events in cancer and may be associated with the loss of a functional tumor suppressor gene. Therefore, identifying recurrent CNV and cn-LOH events can be important as they may highlight common biological components and give insights into the development or mechanisms of a disease. However, no currently available tools allow a comprehensive whole-genome visualization of recurrent CNVs and cn-LOH in groups of samples providing absolute quantification of the aberrations leading to the loss of potentially important information. To overcome these limitations, we developed aCNViewer (Absolute CNV Viewer), a visualization tool for absolute CNVs and cn-LOH across a group of samples. aCNViewer proposes three graphical representations: dendrograms, bi-dimensional heatmaps showing chromosomal regions sharing similar abnormality patterns, and quantitative stacked histograms facilitating the identification of recurrent absolute CNVs and cn-LOH. We illustrated aCNViewer using publicly available hepatocellular carcinomas (HCCs) Affymetrix SNP Array data (Fig 1A). Regions 1q and 8q present a similar percentage of total gains but significantly different copy number gain categories (p-value of 0.0103 with a Fisher exact test), validated by another cohort of HCCs (p-value of 5.6e-7) (Fig 2B). aCNViewer is implemented in python and R and is available with a GNU GPLv3 license on GitHub (https://github.com/FJD-CEPH/aCNViewer) and Docker (https://hub.docker.com/r/fjdceph/acnviewer/). Contact: aCNViewer@cephb.fr.
Renault, Victor; Tost, Jörg; Pichon, Fabien; Wang-Renault, Shu-Fang; Letouzé, Eric; Imbeaud, Sandrine; Zucman-Rossi, Jessica; Deleuze, Jean-François; How-Kit, Alexandre
Copy number variations (CNV) include net gains or losses of part or whole chromosomal regions. They differ from copy neutral loss of heterozygosity (cn-LOH) events which do not induce any net change in the copy number and are often associated with uniparental disomy. These phenomena have long been reported to be associated with diseases and particularly in cancer. Losses/gains of genomic regions are often correlated with lower/higher gene expression. On the other hand, loss of heterozygosity (LOH) and cn-LOH are common events in cancer and may be associated with the loss of a functional tumor suppressor gene. Therefore, identifying recurrent CNV and cn-LOH events can be important as they may highlight common biological components and give insights into the development or mechanisms of a disease. However, no currently available tools allow a comprehensive whole-genome visualization of recurrent CNVs and cn-LOH in groups of samples providing absolute quantification of the aberrations leading to the loss of potentially important information. To overcome these limitations, we developed aCNViewer (Absolute CNV Viewer), a visualization tool for absolute CNVs and cn-LOH across a group of samples. aCNViewer proposes three graphical representations: dendrograms, bi-dimensional heatmaps showing chromosomal regions sharing similar abnormality patterns, and quantitative stacked histograms facilitating the identification of recurrent absolute CNVs and cn-LOH. We illustrated aCNViewer using publically available hepatocellular carcinomas (HCCs) Affymetrix SNP Array data (Fig 1A). Regions 1q and 8q present a similar percentage of total gains but significantly different copy number gain categories (p-value of 0.0103 with a Fisher exact test), validated by another cohort of HCCs (p-value of 5.6e-7) (Fig 2B). aCNViewer is implemented in python and R and is available with a GNU GPLv3 license on GitHub https://github.com/FJD-CEPH/aCNViewer and Docker https
Bruford, Michael W; Ginja, Catarina; Hoffmann, Irene; Joost, Stéphane; Orozco-terWengel, Pablo; Alberto, Florian J; Amaral, Andreia J; Barbato, Mario; Biscarini, Filippo; Colli, Licia; Costa, Mafalda; Curik, Ino; Duruz, Solange; Ferenčaković, Maja; Fischer, Daniel; Fitak, Robert; Groeneveld, Linn F; Hall, Stephen J G; Hanotte, Olivier; Hassan, Faiz-Ul; Helsen, Philippe; Iacolina, Laura; Kantanen, Juha; Leempoel, Kevin; Lenstra, Johannes A; Ajmone-Marsan, Paolo; Masembe, Charles; Megens, Hendrik-Jan; Miele, Mara; Neuditschko, Markus; Nicolazzi, Ezequiel L; Pompanon, François; Roosen, Jutta; Sevane, Natalia; Smetko, Anamarija; Štambuk, Anamaria; Streeter, Ian; Stucki, Sylvie; Supakorn, China; Telo Da Gama, Luis; Tixier-Boichard, Michèle; Wegmann, Daniel; Zhan, Xiangjiang
Livestock conservation practice is changing rapidly in light of policy developments, climate change and diversifying market demands. The last decade has seen a step change in technology and analytical approaches available to define, manage and conserve Farm Animal Genomic Resources (FAnGR). However,
Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.
We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,
Keane, Michael; Craig, Thomas; Alföldi, Jessica; Berlin, Aaron M; Johnson, Jeremy; Seluanov, Andrei; Gorbunova, Vera; Di Palma, Federica; Lindblad-Toh, Kerstin; Church, George M; de Magalhães, João Pedro
The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. © The Author 2014. Published by Oxford University Press.
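For readers unfamiliar with the N50 statistic mentioned in this abstract, the following minimal Python sketch shows one common way to compute it: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name and the example lengths are illustrative assumptions, not values from the naked mole rat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in base pairs (illustrative only).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

A higher N50 means that fewer, longer sequences account for half of the assembly, which is why it is often quoted as a rough indicator of assembly contiguity.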
Christopher J. Ricketts
Summary: Renal cell carcinoma (RCC) is not a single disease, but several histologically defined cancers with different genetic drivers, clinical courses, and therapeutic responses. The current study evaluated 843 RCC from the three major histologic subtypes, including 488 clear cell RCC, 274 papillary RCC, and 81 chromophobe RCC. Comprehensive genomic and phenotypic analysis of the RCC subtypes reveals distinctive features of each subtype that provide the foundation for the development of subtype-specific therapeutic and management strategies for patients affected with these cancers. Somatic alteration of BAP1, PBRM1, and PTEN and altered metabolic pathways correlated with subtype-specific decreased survival, while CDKN2A alteration, increased DNA hypermethylation, and increases in the immune-related Th2 gene expression signature correlated with decreased survival within all major histologic subtypes. CIMP-RCC demonstrated an increased immune signature, and a uniform and distinct metabolic expression pattern identified a subset of metabolically divergent (MD) ChRCC that associated with extremely poor survival. Ricketts et al. find distinctive features of each RCC subtype, providing the foundation for development of subtype-specific therapeutic and management strategies. Somatic alteration of BAP1, PBRM1, and metabolic pathways correlates with subtype-specific decreased survival, while CDKN2A alteration, DNA hypermethylation, and Th2 immune signature correlate with decreased survival within all subtypes. Keywords: clear cell renal cell carcinoma, papillary renal cell carcinoma, chromophobe renal cell carcinoma, CDKN2A, DNA hypermethylation, immune signature, chromatin remodeling, TCGA, PanCanAtlas
Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong
Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
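The "genome equivalents" reported for a clone library follow from simple arithmetic: the number of clones times the average insert size, divided by the genome size. The sketch below illustrates that calculation together with the standard estimate of the probability that a given locus is present in at least one clone. Discounting empty-vector and organellar clones is an assumption made here for illustration, and all numbers in the example are hypothetical rather than taken from the ZMAP libraries.

```python
def genome_equivalents(n_clones, mean_insert_kb, genome_size_mb,
                       empty_rate=0.0, organellar_rate=0.0):
    """Estimate library coverage in genome equivalents.

    empty_rate and organellar_rate are fractions (0 to 1) of clones that
    carry no insert or carry organellar DNA; discounting them this way is
    an illustrative assumption.
    """
    useful_clones = n_clones * (1 - empty_rate) * (1 - organellar_rate)
    return useful_clones * mean_insert_kb / (genome_size_mb * 1000)


def probability_locus_covered(n_clones, mean_insert_kb, genome_size_mb):
    """Probability that a given locus is present in at least one clone."""
    fraction_per_clone = mean_insert_kb / (genome_size_mb * 1000)
    return 1 - (1 - fraction_per_clone) ** n_clones


# Hypothetical library: 100,000 clones, 120 kb inserts, 2,400 Mb genome.
coverage = genome_equivalents(100_000, 120, 2_400)
p_covered = probability_locus_covered(100_000, 120, 2_400)
print(f"{coverage:.1f}-fold coverage, P(locus covered) = {p_covered:.4f}")
```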
Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori
Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.
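The species-level decision described in this abstract can be pictured as a final thresholding step applied to an ANI value. The sketch below is only that last step: it assumes per-fragment identity percentages have already been produced by an alignment tool, averages them, and compares the result with the 95% threshold quoted above. The function names and input values are hypothetical and do not reflect how DFAST computes ANI internally.

```python
def average_nucleotide_identity(fragment_identities):
    """Mean percent identity over aligned genome fragments."""
    if not fragment_identities:
        raise ValueError("at least one aligned fragment is required")
    return sum(fragment_identities) / len(fragment_identities)


def same_species(ani_percent, threshold=95.0):
    """Apply the commonly used 95% ANI species boundary."""
    return ani_percent >= threshold


# Hypothetical per-fragment identities (percent) for one genome pair.
identities = [97.2, 96.1, 98.4, 95.3]
ani = average_nucleotide_identity(identities)
print(f"ANI = {ani:.2f}%, same species: {same_species(ani)}")
```

Under this rule, two subgroups whose pairwise ANI falls below 95% would be flagged as candidates for separate species, which is the situation the abstract reports for Lactobacillus gasseri and Lactobacillus jensenii.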
Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R
Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P
The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.
Reeve, Wayne [Murdoch University]
Wayne Reeve of Murdoch University on "Genomics Encyclopedia of Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB): a resource for microsymbiont genomes" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.
Michael William Bruford
Livestock conservation practice is changing rapidly in light of policy, climate change and market demands. The last decade saw a step change in technological and analytical approaches to define, manage and conserve Farm Animal Genomic Resources (FAnGR). These changes pose challenges for FAnGR conservation in terms of technological continuity, analytical capacity and the methodologies needed to exploit new, multidimensional data. The ESF Genomic Resources program final conference addressed these problems, attempting to contribute to the development of the research and policy agenda for the next decade. We broadly identified four areas related to methodological and analytical challenges, data management and conservation. The overall conclusion is that there is a need for the use of current state-of-the-art tools to characterise the state of genomic resources in non-commercial and local breeds. The livestock genomic sector, which has been relatively well-organised in applying such methodologies so far, needs to make a concerted effort in the coming decade to enable the democratisation of the powerful tools that are now at its disposal, and to ensure that they are applied in the context of breed conservation as well as development.
Background: Accurate modeling of electrostatic potential and corresponding energies becomes increasingly important for understanding properties of biological macromolecules and their complexes. However, this is not an easy task due to the irregular shape of biological entities and the presence of water and mobile ions. Results: Here we report a comprehensive suite for the well-known Poisson-Boltzmann solver, DelPhi, enriched with additional features to facilitate DelPhi usage. The suite allows for easy download of both DelPhi executable files and source code along with a makefile for local installations. The users can obtain the DelPhi manual and parameter files required for the corresponding investigation. Non-experienced researchers can download examples containing all necessary data to carry out DelPhi runs on a set of selected examples illustrating various DelPhi features and demonstrating DelPhi’s accuracy against analytical solutions. Conclusions: The DelPhi suite offers not only the DelPhi executable and source files, examples and parameter files, but also provides links to third party developed resources either utilizing DelPhi or providing plugins for DelPhi. In addition, the users and developers are offered a forum to share ideas, resolve issues, report bugs and seek help with respect to the DelPhi package. The resource is available free of charge for academic users from URL: http://compbio.clemson.edu/DelPhi.php.
Zhang, Jie; Chowdhury, Souma; Messac, Achille
Highlights: • A more comprehensive metric is developed to accurately assess the quality of wind resources at a site. • WPP exploits the joint distribution of wind speed and direction, and yields more credible estimates. • WPP investigates the effect of wind distribution on the optimal net power generation of a farm. • The results show that WPD and WPP follow different trends. - Abstract: Currently, the quality of available wind energy at a site is assessed using wind power density (WPD). This paper proposes to use a more comprehensive metric: the wind power potential (WPP). While the former accounts for only wind speed information, the latter exploits the joint distribution of wind speed and wind direction and yields more credible estimates. The WPP investigates the effect of wind velocity distribution on the optimal net power generation of a farm. A joint distribution of wind speed and direction is used to characterize the stochastic variation of wind conditions. Two joint distribution methods are adopted in this paper: bivariate normal distribution and anisotropic lognormal method. The net power generation for a particular farmland size and installed capacity is maximized for different distributions of wind speed and wind direction, using the Unrestricted Wind Farm Layout Optimization (UWFLO) framework. A response surface is constructed to represent the computed maximum wind farm capacity factor as a function of the parameters of the wind distribution. Two different response surface methods are adopted in this paper: (i) the adaptive hybrid functions (AHF), and (ii) the quadratic response surface method (QRSM). Toward this end, for any farm site, we can (i) estimate the parameters of the joint distribution using recorded wind data (for bivariate normal or anisotropic lognormal distributions) and (ii) predict the maximum capacity factor for a specified farm size and capacity using this response surface. The WPP metric is illustrated using recorded wind
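To make the contrast drawn in this abstract concrete, the sketch below computes wind power density from its standard definition, one half of the air density times the mean cubed wind speed, in watts per square metre. Because only speeds enter the formula, two sites with the same speed records but different wind directions receive the same WPD, which is the limitation the proposed WPP metric is meant to address. The air density default and the sample records are illustrative assumptions; the WPP calculation itself (layout optimization plus a response surface) is not reproduced here.

```python
def wind_power_density(speeds_m_s, air_density=1.225):
    """Wind power density in W/m^2: 0.5 * rho * mean(v^3).

    air_density defaults to 1.225 kg/m^3, a commonly used sea-level
    value (an assumption for this sketch).
    """
    if not speeds_m_s:
        raise ValueError("at least one wind speed record is required")
    mean_cubed_speed = sum(v ** 3 for v in speeds_m_s) / len(speeds_m_s)
    return 0.5 * air_density * mean_cubed_speed


# Hypothetical (speed in m/s, direction in degrees) records for two sites.
site_a = [(4.0, 90.0), (6.0, 90.0), (8.0, 90.0)]    # steady direction
site_b = [(4.0, 10.0), (6.0, 170.0), (8.0, 300.0)]  # variable direction

wpd_a = wind_power_density([speed for speed, _ in site_a])
wpd_b = wind_power_density([speed for speed, _ in site_b])
print(f"site A: {wpd_a:.1f} W/m^2, site B: {wpd_b:.1f} W/m^2")
```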
The Cool Season Food Legume Genome database (CSFL, www.coolseasonfoodlegume.org) is an online resource for genomics, genetics, and breeding research for chickpea, lentil, pea, and faba bean. The user-friendly and curated website allows for all publicly available map, marker, trait, gene, transcript, ger...
Qureshi, Abid; Thakur, Nishant; Monga, Isha; Thakur, Anamika; Kumar, Manoj
Viral microRNAs (miRNAs) regulate gene expression of viral and/or host genes to benefit the virus. Hence, miRNAs play a key role in host-virus interactions and pathogenesis of viral diseases. Lately, miRNAs have also shown potential as important targets for the development of novel antiviral therapeutics. Although several repositories of miRNAs and their targets are available for human and other organisms, a dedicated resource on viral miRNAs and their targets is lacking. Therefore, we have developed a comprehensive viral miRNA resource harboring 9133 entries in three subdatabases. This includes 1308 experimentally validated miRNA sequences with their isomiRs encoded by 44 viruses in the viral miRNA subdatabase 'VIRMIRNA' and 7283 of their target genes in 'VIRMIRTAR'. Additionally, there is information on 542 antiviral miRNAs encoded by the host against 24 viruses in the antiviral miRNA subdatabase 'AVIRMIR'. The web interface was developed using the Linux-Apache-MySQL-PHP (LAMP) software bundle. User-friendly browse, search, advanced search and useful analysis tools are also provided on the web interface. VIRmiRNA is the first specialized resource of experimentally proven virus-encoded miRNAs and their associated targets. This database would enhance the understanding of viral/host gene regulation and may also prove beneficial in the development of antiviral therapeutics. Database URL: http://crdd.osdd.net/servers/virmirna. © The Author(s) 2014. Published by Oxford University Press.
Clasen, Frederick Johannes; Pierneef, Rian Ewald; Slippers, Bernard; Reva, Oleg
Genomic islands (GIs) are inserts of foreign DNA that have potentially arisen through horizontal gene transfer (HGT). There is evidence that GIs can contribute significantly to the evolution of prokaryotes. The acquisition of GIs through HGT in eukaryotes has, however, been largely unexplored. In this study, the previously developed GI prediction tool, SeqWord Gene Island Sniffer (SWGIS), is modified to predict GIs in eukaryotic chromosomes. Artificial simulations are used to estimate the rates of false positive and false negative GI predictions by inserting GIs into different test chromosomes and running the SWGIS v2.0 algorithm. Using SWGIS v2.0, GIs are then identified in 36 fungal, 22 protozoan and 8 invertebrate genomes. SWGIS v2.0 predicts GIs in large eukaryotic chromosomes based on the atypical nucleotide composition of these regions. Average rates of false negative and false positive GI predictions were 20.1% and 11.01%, respectively. A total of 10,550 GIs were identified in 66 eukaryotic species, with 5299 of these GIs coding for at least one functional protein. The EuGI web-resource, freely accessible at http://eugi.bi.up.ac.za, was developed to allow browsing of the database created from identified GIs and genes within GIs through an interactive and visual interface. SWGIS v2.0, along with the EuGI database, which houses GIs identified in 66 different eukaryotic species, and the EuGI web-resource, provides the first comprehensive resource for studying HGT in eukaryotes.
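The idea of detecting islands by atypical nucleotide composition can be illustrated with a generic sliding-window scan. The sketch below compares the dinucleotide frequencies of each window with those of the whole sequence and flags windows whose distance exceeds a cutoff. It is a simplified illustration of composition-based detection in general, not the SWGIS v2.0 algorithm; the choice of k, the distance measure, the window size, the threshold and the toy sequence are all assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible DNA k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts.get(kmer, 0) / total for kmer in kmers}


def atypical_windows(seq, window=100, step=50, k=2, threshold=1.0):
    """Return (start, end, distance) for windows with unusual composition.

    Distance is the sum of absolute differences between the window's
    k-mer frequencies and those of the whole sequence.
    """
    background = kmer_frequencies(seq, k)
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        local = kmer_frequencies(seq[start:start + window], k)
        distance = sum(abs(local[m] - background[m]) for m in background)
        if distance > threshold:
            flagged.append((start, start + window, distance))
    return flagged


# Toy sequence: an AT-rich "host" with a GC-rich insert at 400-500.
toy_genome = "AT" * 200 + "GC" * 50 + "AT" * 200
for start, end, distance in atypical_windows(toy_genome):
    print(start, end, round(distance, 2))
```

On real chromosomes the threshold would need calibration, for example with the kind of simulated insertions the abstract describes for estimating false positive and false negative rates.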
Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh
With "integration" as its guiding direction, Shenzhen Comprehensive Transport Planning integrates the planning, construction and management of all transport modes in the transport system, and integrates transport with social, economic and environmental development. The planning specifies the strategic targets, key indicators, development strategies and major policies of the comprehensive transport system, exploring an alternative way toward sustainable urban transport development under the limited-resource conditions of Shenzhen.
Muñoz-Amatriaín, María; Mirebrahim, Hamid; Xu, Pei; Wanamaker, Steve I; Luo, MingCheng; Alhakami, Hind; Alpert, Matthew; Atokple, Ibrahim; Batieno, Benoit J; Boukar, Ousmane; Bozdag, Serdar; Cisse, Ndiaga; Drabo, Issa; Ehlers, Jeffrey D; Farmer, Andrew; Fatokun, Christian; Gu, Yong Q; Guo, Yi-Ning; Huynh, Bao-Lam; Jackson, Scott A; Kusi, Francis; Lawley, Cynthia T; Lucas, Mitchell R; Ma, Yaqin; Timko, Michael P; Wu, Jiajie; You, Frank; Barkley, Noelle A; Roberts, Philip A; Lonardi, Stefano; Close, Timothy J
Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought-prone climates, and a primary source of protein in sub-Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K-499-35 include a whole-genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi-parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited-input small-holder farming and climate stress. © 2016 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.
Soto, Julio G.
A lesson was designed for a lower-division, general education, non-major, lecture-only biology course that included the historical and scientific context, some of the skills used to study the human genome, results, conclusions and ethical considerations. Students learn to examine and compare the published Human Genome maps, and employ the strategies…
Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi
Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and
Nestor Luis Lopez Corrales
The genomic integrity of two human pluripotent stem cell lines and their derived neuroprogenitor cell lines was studied, applying a combination of high-resolution genetic methodologies. The usefulness of combining array-comparative genomic hybridization (aCGH) and multiplex fluorescence in situ hybridization (M-FISH) techniques should be delineated to exclude/detect a maximum of possible genomic structural aberrations. Interestingly, partly different genomic imbalances at chromosomal and subchromosomal levels were detected in pluripotent stem cells and their derivatives. Some of the copy number variations were inherited from the original cell line, whereas other modifications were presumably acquired during the differentiation and manipulation procedures. These results underline the necessity to study both pluripotent stem cells and their differentiated progeny by as many approaches as possible in order to assess their genomic stability before using them in clinical therapies.
Alkhasov, A. B.
Technology for the integrated development of low-temperature geothermal resources using the thermal and water potentials for various purposes is proposed. The heat of the thermal waters is utilized in a low-temperature district heating system and for heating the water in a hot water supply system. The water cooled in heat exchangers enters a chemical treatment system where it is conditioned to potable water quality and then forwarded to the household and potable water supply system. Efficient technologies for removal of arsenic and organic contaminants from the water have been developed. For an uninterrupted supply of power to consumers, technologies that use two or more types of renewable energy sources (RESs) have the best prospects. Technology for processing organic waste using geothermal energy has been proposed. According to this technology, the geothermal water is divided into two flows, one of which is delivered to a biomass conversion system and the other is directed to a geothermal steam-gas power plant (GSGP). The wastewater arrives at the pump station from which it is pumped back into the bed. Upon drying, the biogas from the conversion system is delivered into the combustion chamber of a gas-turbine plant (GTP). The heat of the turbine exhaust gases is used in the GSGP to evaporate and reheat the low-boiling working medium. The working medium is heated in the GSGP to the evaporation temperature using the heat of the thermal water. High-temperature geothermal brines are the most promising for comprehensive processing. According to the proposed technology, the heat energy of the brines is utilized to generate electric power at a binary geothermal power station; the electric power is then used to extract the dissolved chemical components from the rest of the brine. The comprehensive utilization of high-temperature brines of the East-Precaucasian Artesian Basin will make it possible to completely satisfy the demand of Russia for lithium
diCenzo, George C; Zamani, Maryam; Milunovic, Branislava; Finan, Turlough M
The lack of an appropriate genomic platform has precluded the use of gain-of-function approaches to study the rhizobium-legume symbiosis, preventing the establishment of the genes necessary and sufficient for symbiotic nitrogen fixation (SNF) and potentially hindering synthetic biology approaches aimed at engineering this process. Here, we describe the development of an appropriate system by reverse engineering Sinorhizobium meliloti. Using a novel in vivo cloning procedure, the engA-tRNA-rmlC (ETR) region, essential for cell viability and symbiosis, was transferred from Sinorhizobium fredii to the ancestral location on the S. meliloti chromosome, rendering the ETR region on pSymB redundant. A derivative of this strain lacking both the large symbiotic replicons (pSymA and pSymB) was constructed. Transfer of pSymA and pSymB back into this strain restored symbiotic capabilities with alfalfa. To delineate the location of the single-copy genes essential for SNF on these replicons, we screened a S. meliloti deletion library, representing > 95% of the 2900 genes of the symbiotic replicons, for their phenotypes with alfalfa. Only four loci, accounting for < 12% of pSymA and pSymB, were essential for SNF. These regions will serve as our preliminary target of the minimal set of horizontally acquired genes necessary and sufficient for SNF. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
Zhou, Dan; Xu, Yang; Zhang, Cheng; Hu, Meng-Xue; Huang, Yun; Sun, Yan; Ma, Lei; Shen, Bo; Zhu, Chang-Liang
Anopheles sinensis is an important malaria vector in Southeast Asia. The widespread emergence of insecticide resistance in this mosquito species poses a serious threat to the efficacy of malaria control measures, particularly in China. Recently, the whole-genome sequencing and de novo assembly of An. sinensis (China strain) has been finished. A series of insecticide resistance studies in An. sinensis have also been reported. There is a growing need to integrate these valuable data to provide a comprehensive database for further studies on insecticide resistance management of An. sinensis. A bioinformatics database named An. sinensis genome database (ASGDB) was built. In addition to being a searchable database of published An. sinensis genome sequences and annotation, ASGDB provides in-depth analytical platforms for further understanding of the genomic and genetic data, including visualization of genomic data, orthologous relationship analysis, GO analysis, pathway analysis, expression analysis and resistance-related gene analysis. Moreover, ASGDB provides a panoramic view of insecticide resistance studies in An. sinensis in China. In total, 551 insecticide-resistant phenotypic and genotypic reports on An. sinensis distributed in Chinese malaria-endemic areas since the mid-1980s have been collected, manually edited into the same format and integrated into an OpenLayers map-based interface, which allows the international community to assess and exploit the high volume of scattered data much more easily. The database has been given the URL: http://www.asgdb.org/. ASGDB was built to help users mine data from the genome sequence of An. sinensis easily and effectively, especially with its advantages in insecticide resistance surveillance and control.
The fastest growing use of maize is for the production of fuel ethanol, both through the enzymatic conversion of corn starch to glucose and then to ethanol and through the conversion of the cellulosic (non-food) parts of maize to ethanol. However for the production...
Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh
Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.
Bhattarai, M D
The role of self-management education in diabetes and other major non-communicable diseases (NCDs) is clearly evident. To care for and educate people with diabetes and other major NCDs under the supervision of medical professionals, and to educate other health care professionals, Comprehensive Diabetes and NCD (CDNCD) Educators are needed in routine service in peripheral health clinics and hospitals. The areas of training of a CDNCD Educator should match the cost-effective interventions for diabetes and other major NCDs that are feasible and planned for implementation in primary care in low-resource settings. Most such interventions are already part of the diabetes education required for Diabetes Self-Management Education programmes and the traditional Diabetes Educator. Adding the use of inhaled steroids and bronchodilators in chronic respiratory disease and the identification of presenting features of cancer, both also relevant to the many people with diabetes who have such common co-morbidities, extends the training areas of the traditional Diabetes Educator to those of the CDNCD Educator. Staff nurses and health assistants, who already provide routine clinical service to all patients, including those with diabetes and major NCDs, in peripheral health clinics and hospitals, are the most appropriate candidates for CDNCD Educator training. The training of a CDNCD Educator, like that of a traditional Diabetes Educator, requires sufficient hours of supervised practical work experience and achievement of the essential competencies, entailing at least 6 months of intensive training, before candidates are eligible to sit the final certifying examination.
Du, Peina; Huang, Peide; Huang, Xuanlin
Oesophageal carcinoma is the fourth leading cause of cancer-related death in China, and more than 90% of these tumours are oesophageal squamous cell carcinoma (ESCC). Although several ESCC genomic sequencing studies have identified mutated somatic genes, the number of samples in each study...
Shariati J, Vahid; Malboobi, Mohammad Ali; Tabrizi, Zeinab; Tavakol, Elahe; Owilia, Parviz; Safari, Maryam
In this study, we provide a comparative genomic analysis of Pantoea agglomerans strain P5 and 10 closely related strains based on phylogenetic analyses. A next-generation shotgun strategy was implemented using the Illumina HiSeq 2500 technology followed by core- and pan-genome analysis. The genome of P. agglomerans strain P5 contains an assembly size of 5082485 bp with 55.4% G + C content. P. agglomerans consists of 2981 core and 3159 accessory genes for Coding DNA Sequences (CDSs) based on the pan-genome analysis. Strain P5 can be grouped closely with strains PG734 and 299 R using pan and core genes, respectively. All the predicted and annotated gene sequences were allocated to KEGG pathways. Accordingly, genes involved in plant growth-promoting (PGP) ability, including phosphate solubilization, IAA and siderophore production, acetoin and 2,3-butanediol synthesis and bacterial secretion, were assigned. This study provides an in-depth view of the PGP characteristics of strain P5, highlighting its potential use in agriculture as a biofertilizer.
Custom genome editing has become an essential element of molecular biology. In particular, the generation of fusion constructs with epitope tags or fluorescent proteins at the genomic locus facilitates the analysis of protein expression, localization, and interaction partners at physiologic levels. Following up on our initial publication, we now describe a considerably simplified, more efficient, and readily scalable experimental workflow for PCR-based genome editing in cultured Drosophila melanogaster cells. Our analysis at the act5C locus suggests that PCR-based homology arms of 60 bp are sufficient to reach targeting efficiencies of up to 80% after selection; extension to 80 bp (PCR) or 500 bp (targeting vector) did not further improve the yield. We have expanded our targeting system to N-terminal epitope tags; this also allows the generation of cell populations with heterologous expression control of the tagged locus via the copper-inducible mtnDE promoter. We present detailed, quantitative data on editing efficiencies for several genomic loci that may serve as positive controls or benchmarks in other laboratories. While our first PCR-based editing approach offered only blasticidin-resistance for selection, we now introduce puromycin-resistance as a second, independent selection marker; it is thus possible to edit two loci (e.g., for coimmunoprecipitation) without marker removal. Finally, we describe a modified FLP recombinase expression plasmid that improves the efficiency of marker cassette FLP-out. In summary, our technique and reagents enable a flexible, robust, and cloning-free genome editing approach that can be parallelized for scale-up.
Wheat fulfills 20% of the global caloric requirement. The world will need 60% more wheat for a population of 9 billion by 2050, but climate change with increasing temperature is projected to affect wheat productivity adversely. Trait improvement and management of wheat germplasm require genomic resources. Simple Sequence Repeats (SSRs), being highly polymorphic and ubiquitously distributed in the genome, can be a marker of choice, but there is no structured marker database with options to generate primer pairs for genotyping at a desired chromosome/physical location. Markers previously associated with different wheat traits are also not available in any database. Limitations of in vitro SSR discovery can be overcome by genome-wide in silico mining of SSRs. The Triticum aestivum SSR database (TaSSRDb) is an integrated online database with three-tier architecture, developed using PHP and MySQL and accessible at http://webtom.cabgrid.res.in/wheatssr/. For genotyping, Primer3 standalone code computes primers on user request. Chromosome-wise SSR calling for all three sub-genomes, along with a choice of motif types, is provided in addition to primer generation for the desired marker. We report here a database of the highest number of SSRs (476,169) from the complex, hexaploid wheat genome (~17 GB), along with 268 previously reported SSR markers associated with 11 traits. The highest (116.93 SSRs/Mb) and lowest (74.57 SSRs/Mb) SSR densities were found on chromosomes 2D and 3A, respectively. To obtain homozygous loci, e-PCR was done. Thirty such loci were randomly selected for PCR validation in a panel of 18 wheat Advance Varietal Trial (AVT) lines. TaSSRDb can be a valuable genomic resource tool for linkage mapping, gene/QTL (quantitative trait locus) discovery, diversity analysis, traceability and variety identification. Varietal specific profiling and differentiation can supplement DUS (Distinctiveness, Uniformity, and Stability) testing, EDV (Essentially Derived Variety)/IV (Initial Variety) disputes, seed
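The in silico SSR mining and the per-chromosome density figures (SSRs/Mb) mentioned in this record can be illustrated with a minimal Python sketch. It is not the TaSSRDb pipeline, whose mining tool and thresholds are not given here: the function names, the restriction to perfect repeats with 2-6 nt motifs, and the `min_repeats=4` cutoff are all assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect SSRs with 2-6 nt motifs.

    Hypothetical helper: the thresholds are illustrative, not TaSSRDb's.
    """
    seq = seq.upper()
    # Lazy {2,6}? tries the shortest motif first; \1 requires exact repeats.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as TTTTTTTT reported as (TT)n.
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def ssr_density_per_mb(n_ssrs, length_bp):
    """SSR count normalised to one megabase of sequence."""
    return n_ssrs / (length_bp / 1_000_000)


# Toy sequence: an (AG)5 repeat, a spacer, an (ACG)4 repeat, and a poly-T run.
demo = "TT" + "AG" * 5 + "CCGT" + "ACG" * 4 + "T" * 10
for start, motif, count in find_ssrs(demo):
    print(start, motif, count)
```

Flanking sequence around each reported start position could then be passed to a primer design tool such as Primer3; that step is omitted here because its configuration is not described in the record.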
Guan, Guiquan; Korhonen, Pasi K; Young, Neil D; Koehler, Anson V; Wang, Tao; Li, Youquan; Liu, Zhijie; Luo, Jianxun; Yin, Hong; Gasser, Robin B
Babesiosis is a socioeconomically important tick-borne disease of animals (including humans) caused by haemoprotozoan parasites. The severity of babesiosis relates to host and parasite factors, particularly virulence/pathogenicity. Although Babesia bovis is a particularly pathogenic species of cattle, there are species of Babesia of ruminants that have limited pathogenicity. For instance, the operational taxonomic unit Babesia sp. Xinjiang (abbreviated here as Bx) of sheep from China is substantially less virulent/pathogenic than B. bovis is in cattle. Although the reason for this distinctiveness is presently unknown, it is possible that Bx has a reduced ability to adhere to cells or evade/suppress immune responses, which might relate to particular proteins, such as the variant erythrocyte surface antigens (VESAs). We sequenced and annotated the 8.4 Mb nuclear draft genome of Bx and compared it with those of B. bovis and B. bigemina by synteny analysis; we also investigated the genetic relationship of Bx with selected Babesia species and related apicomplexans for which genomic datasets are available, and explored the VESA complement in Bx. The availability of the Bx genome now provides unique opportunities to elucidate aspects of the molecular biology, biochemistry and physiology of Bx, and to explore the reason(s) for its limited virulence and/or apparent ability to evade immune attack by the host animal. Moreover, the present genomic resource and an in vitro culture system for Bx raises the prospect of establishing a functional genomic platform to explore essential genes as new intervention targets against babesiosis.
Zhang, Yingxiao; Iaffaldano, Brian J; Zhuang, Xiaofeng; Cardina, John; Cornish, Katrina
Rubber dandelion (Taraxacum kok-saghyz, TK) is being developed as a domestic source of natural rubber to meet increasing global demand. However, the domestication of TK is complicated by its colocation with two weedy dandelion species, Taraxacum brevicorniculatum (TB) and the common dandelion (Taraxacum officinale, TO). TB is often present as a seed contaminant within TK accessions, while TO is a pandemic weed, which may have the potential to hybridize with TK. To discriminate these species at the molecular level, and facilitate gene flow studies between the potential rubber crop, TK, and its weedy relatives, we generated genomic and marker resources for these three dandelion species. Complete chloroplast genome sequences of TK (151,338 bp), TO (151,299 bp), and TB (151,282 bp) were obtained using the Illumina GAII and MiSeq platforms. Chloroplast sequences were analyzed and annotated for all the three species. Phylogenetic analysis within Asteraceae showed that TK has a closer genetic distance to TB than to TO and Taraxacum species were most closely related to lettuce (Lactuca sativa). By sequencing multiple genotypes for each species and testing variants using gel-based methods, four chloroplast Single Nucleotide Polymorphism (SNP) variants were found to be fixed between TK and TO in large populations, and between TB and TO. Additionally, Expressed Sequence Tag (EST) resources developed for TO and TK permitted the identification of five nuclear species-specific SNP markers. The availability of chloroplast genomes of these three dandelion species, as well as chloroplast and nuclear molecular markers, will provide a powerful genetic resource for germplasm differentiation and purification, and the study of potential gene flow among Taraxacum species.
We report a comprehensive analysis of 412 muscle-invasive bladder cancers characterized by multiple TCGA analytical platforms. Fifty-eight genes were significantly mutated, and the overall mutational load was associated with APOBEC-signature mutagenesis. Clustering by mutation signature identified a high-mutation subset with 75% 5-year survival.
Hall, Neil; Karras, Marianna; Raine, J Dale; Carlton, Jane M; Kooij, Taco W A; Berriman, Matthew; Florens, Laurence; Janssen, Christoph S; Pain, Arnab; Christophides, Georges K; James, Keith; Rutherford, Kim; Harris, Barbara; Harris, David; Churcher, Carol; Quail, Michael A; Ormond, Doug; Doggett, Jon; Trueman, Holly E; Mendoza, Jacqui; Bidwell, Shelby L; Rajandream, Marie-Adele; Carucci, Daniel J; Yates, John R; Kafatos, Fotis C; Janse, Chris J; Barrell, Bart; Turner, C Michael R; Waters, Andrew P; Sinden, Robert E
Plasmodium berghei and Plasmodium chabaudi are widely used model malaria species. Comparison of their genomes, integrated with proteomic and microarray data, with the genomes of Plasmodium falciparum and Plasmodium yoelii revealed a conserved core of 4500 Plasmodium genes in the central regions of the 14 chromosomes and highlighted genes evolving rapidly because of stage-specific selective pressures. Four strategies for gene expression are apparent during the parasites' life cycle: (i) housekeeping; (ii) host-related; (iii) strategy-specific related to invasion, asexual replication, and sexual development; and (iv) stage-specific. We observed posttranscriptional gene silencing through translational repression of messenger RNA during sexual development, and a 47-base 3' untranslated region motif is implicated in this process.
Skvortsova, Ksenia; Zotenko, Elena; Luu, Phuc-Loi; Gould, Cathryn M; Nair, Shalima S; Clark, Susan J; Stirzaker, Clare
The discovery that 5-methylcytosine (5mC) can be oxidized to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (TET) proteins has prompted wide interest in the potential role of 5hmC in reshaping the mammalian DNA methylation landscape. The gold-standard bisulphite conversion technologies to study DNA methylation do not distinguish between 5mC and 5hmC. However, new approaches to mapping 5hmC genome-wide have advanced rapidly, although it is unclear how the different methods compare in accurately calling 5hmC. In this study, we provide a comparative analysis on brain DNA using three 5hmC genome-wide approaches, namely whole-genome bisulphite/oxidative bisulphite sequencing (WG Bis/OxBis-seq), Infinium HumanMethylation450 BeadChip arrays coupled with oxidative bisulphite (HM450K Bis/OxBis) and antibody-based immunoprecipitation and sequencing of hydroxymethylated DNA (hMeDIP-seq). We also perform loci-specific TET-assisted bisulphite sequencing (TAB-seq) for validation of candidate regions. We show that whole-genome single-base resolution approaches are advantaged in providing precise 5hmC values but require high sequencing depth to accurately measure 5hmC, as this modification is commonly in low abundance in mammalian cells. HM450K arrays coupled with oxidative bisulphite provide a cost-effective representation of 5hmC distribution, at CpG sites with 5hmC levels >~10%. However, 5hmC analysis is restricted to the genomic location of the probes, which is an important consideration as 5hmC modification is commonly enriched at enhancer elements. Finally, we show that the widely used hMeDIP-seq method provides an efficient genome-wide profile of 5hmC and shows high correlation with WG Bis/OxBis-seq 5hmC distribution in brain DNA. However, in cell line DNA with low levels of 5hmC, hMeDIP-seq-enriched regions are not detected by WG Bis/OxBis or HM450K, either suggesting misinterpretation of 5hmC calls by hMeDIP or lack of sensitivity of the latter methods. We
Lo Giudice, Claudio; Pesole, Graziano; Picardi, Ernesto
RNA editing is an important epigenetic mechanism by which genome-encoded transcripts are modified by substitutions, insertions and/or deletions. It was first discovered in kinetoplastid protozoa and has since been reported in a wide range of organisms. In plants, RNA editing occurs mostly by cytidine (C) to uridine (U) conversion in translated regions of organelle mRNAs and tends to modify affected codons, restoring evolutionarily conserved amino acid residues. RNA editing has also been described in non-protein coding regions such as group II introns and structural RNAs. Despite its impact on organellar transcriptome and proteome complexity, current primary databases still do not provide a specific field for RNA editing events. To overcome these limitations, we developed REDIdb, a specialized database for RNA editing modifications in plant organelles. Here we describe its third release, containing more than 26,000 events in a completely novel web interface that accommodates RNA editing in its genomic, biological and evolutionary context through whole genome maps and multiple sequence alignments. REDIdb is freely available at http://srv00.recas.ba.infn.it/redidb/index.html.
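The effect of C-to-U editing on codons, as described in this record, can be shown with a small Python sketch. The toy transcript and the edited positions are hypothetical and are not taken from REDIdb; the sketch also assumes the standard genetic code applies to the transcript in question.

```python
# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def apply_c_to_u(mrna, positions):
    """Convert C to U at the given 0-based positions (hypothetical edit sites)."""
    bases = list(mrna)
    for pos in positions:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not C")
        bases[pos] = "U"
    return "".join(bases)


def translate(mrna):
    """Translate complete codons, one-letter amino acids, '*' for stop."""
    usable = len(mrna) - len(mrna) % 3
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3))


unedited = "AUGCCAUCAUGA"                 # AUG CCA UCA UGA
edited = apply_c_to_u(unedited, [4, 7])   # AUG CUA UUA UGA
print(translate(unedited))  # MPS*
print(translate(edited))    # MLL*
```

In this toy case both edits fall on second codon positions and change proline and serine codons into leucine codons, which is the kind of codon-level change that editing databases need to record alongside the genomic sequence.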
Background: Maize is a major crop plant, grown for human and animal nutrition, as well as a renewable resource for bioenergy. When looking at the problems of limited fossil fuels, the growth of the world’s population or the world’s climate change, it is important to find ways to increase the yield and biomass of maize and to study how it reacts to specific abiotic and biotic stress situations. Within the OPTIMAS systems biology project maize plants were grown under a large set of controlled stress conditions, phenotypically characterised and plant material was harvested to analyse the effect of specific environmental conditions or developmental stages. Transcriptomic, metabolomic, ionomic and proteomic parameters were measured from the same plant material allowing the comparison of results across different omics domains. A data warehouse was developed to store experimental data as well as analysis results of the performed experiments. Description: The OPTIMAS Data Warehouse (OPTIMAS-DW) is a comprehensive data collection for maize and integrates data from different data domains such as transcriptomics, metabolomics, ionomics, proteomics and phenomics. Within the OPTIMAS project, a 44K oligo chip was designed and annotated to describe the functions of the selected unigenes. Several treatment- and plant growth stage experiments were performed and measured data were filled into data templates and imported into the data warehouse by a Java based import tool. A web interface allows users to browse through all stored experiment data in OPTIMAS-DW including all data domains. Furthermore, the user can filter the data to extract information of particular interest. All data can be exported into different file formats for further data analysis and visualisation. The data analysis integrates data from different data domains and enables the user to find answers to different systems biology questions. Finally, maize specific pathway information is
Puranik, Swati; Sahu, Pranav Pankaj; Mandal, Sambhu Nath; B, Venkata Suresh; Parida, Swarup Kumar; Prasad, Manoj
The NAC proteins represent a major plant-specific transcription factor family with established, enormously diverse roles in various plant processes. Aided by the availability of complete genomes, several members of this family have been identified in Arabidopsis, rice, soybean and poplar. However, no comprehensive investigation has been presented for the recently sequenced, naturally stress-tolerant crop Setaria italica (foxtail millet), which is regarded as a model crop for bioenergy research. In this study, we identified 147 putative NAC domain-encoding genes from foxtail millet by systematic sequence analysis and physically mapped them onto nine chromosomes. Genomic organization suggested that inter-chromosomal duplications may have been responsible for expansion of this gene family in foxtail millet. Phylogenetically, they were arranged into 11 distinct sub-families (I-XI), with duplicated genes fitting into one cluster and possessing conserved motif compositions. Comparative mapping with other grass species revealed some orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of genes. The evolutionary significance of the duplication and divergence of NAC genes was assessed from their amino acid substitution rates. Expression profiling against various stresses and phytohormones provides novel insights into specific and/or overlapping expression patterns of SiNAC genes, which may be responsible for functional divergence among individual members in this crop. Further, we performed structure modeling and molecular simulation of a stress-responsive protein, SiNAC128, providing an initial framework for understanding its molecular function. Taken together, this genome-wide identification and expression profiling opens new avenues for systematic functional analysis of novel NAC gene family candidates, which may be applied to improving stress adaptation in plants.
Clunie, Lauren; Morris, Neil P; Joynes, Viktoria C T; Pickering, James D
Anatomy education is at the forefront of integrating innovative technologies into its curricula. However, despite this rise in technology, numerous authors have commented on the shortfall in efficacy studies to assess the impact such technology-enhanced learning (TEL) resources have on learning. To assess the range of evaluation approaches to TEL across anatomy education, a systematic review was conducted using MEDLINE, the Educational Resources Information Centre (ERIC), Scopus, and Google Scholar, with a total of 3,345 articles retrieved. Following the PRISMA method for reporting items, 153 articles were identified and reviewed against a published framework, the technology-enhanced learning evaluation model (TELEM). The model allowed published reports to be categorized according to evaluations at the level of (1) learner satisfaction, (2) learning gain, (3) learner impact, and (4) institutional impact. The results of this systematic review reveal that most evaluation studies into TEL within anatomy curricula were based on learner satisfaction, followed by module or course learning outcomes. Randomized controlled studies assessing learning gain with a specific TEL resource were in a minority, with no studies reporting a comprehensive assessment on the overall impact of introducing a specific TEL resource (e.g., return on investment). This systematic review has provided clear evidence that anatomy education is engaged in evaluating the impact of TEL resources on student education, although it remains at a level that fails to provide comprehensive causative evidence. Anat Sci Educ 11: 303-319. © 2017 American Association of Anatomists.
Harb, Omar S; Roos, David S
Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources, including fast Internet access, have resulted in an explosion of large genome-scale data sets ("big data"). While such data are readily available for download and personal use and analysis from a variety of repositories, such analysis often requires computational skills that are seldom available. As a result, a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond a basic knowledge of Internet browser use. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.
Riedelsheimer, Christian; Melchinger, Albrecht E
We developed a universally applicable planning tool for optimizing the allocation of resources for one cycle of genomic selection in a biparental population. The framework combines selection theory with constrained numerical optimization and considers genotype × environment interactions. Genomic selection (GS) is increasingly implemented in plant breeding programs to increase selection gain, but little is known about how to optimally allocate the resources under a given budget. We investigated this problem with model calculations by combining quantitative genetic selection theory with constrained numerical optimization. We assumed one selection cycle where both the training and prediction sets comprised doubled haploid (DH) lines from the same biparental population. Grain yield for testcrosses of maize DH lines was used as a model trait, but all parameters can be adjusted in a freely available software implementation. An extension of the expected selection accuracy given by Daetwyler et al. (2008) was developed to correctly balance the number of environments for phenotyping the training set against its population size in the presence of genotype × environment interactions. Under a small budget, genotyping costs mainly determine whether GS is superior to phenotypic selection. With increasing budget, flexibility in resource allocation increases greatly but selection gain levels off quickly, which requires balancing the number of populations against the budget spent on each population. The use of an index combining phenotypic and GS predicted values in the training set was especially beneficial under limited resources and large genotype × environment interactions. Once a sufficiently high selection accuracy is achieved in the prediction set, further selection gain can be achieved most efficiently by massively expanding its size. Thus, with increasing budget, reducing the costs for producing a DH line becomes increasingly crucial for successfully exploiting the
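The trade-off between training-set size and number of phenotyping environments can be sketched in Python as follows. This is not the authors' tool or their extended accuracy formula, which is not given in this record: it uses only the basic Daetwyler et al. (2008) expectation, r = sqrt(N h² / (N h² + Mₑ)), combined with the textbook entry-mean heritability for one plot per environment. The function names, cost figures, variance components and the value of Mₑ are all hypothetical.

```python
import math


def expected_accuracy(n_train, h2, m_e):
    """Basic Daetwyler et al. (2008) expectation: sqrt(N*h2 / (N*h2 + Me))."""
    return math.sqrt(n_train * h2 / (n_train * h2 + m_e))


def entry_mean_h2(var_g, var_ge, var_e, n_env):
    """Entry-mean heritability with one plot per environment (assumed design)."""
    return var_g / (var_g + var_ge / n_env + var_e / n_env)


def best_allocation(budget, cost_geno, cost_plot, var_g, var_ge, var_e,
                    m_e, max_env=10):
    """Grid search over environments; returns (n_env, n_train, accuracy)."""
    best = None
    for n_env in range(1, max_env + 1):
        # Each training line is genotyped once and phenotyped in n_env plots.
        n_train = int(budget // (cost_geno + n_env * cost_plot))
        if n_train < 1:
            break
        h2 = entry_mean_h2(var_g, var_ge, var_e, n_env)
        r = expected_accuracy(n_train, h2, m_e)
        if best is None or r > best[2]:
            best = (n_env, n_train, r)
    return best


# Hypothetical budget, costs and variance components, for illustration only.
n_env, n_train, r = best_allocation(
    budget=10_000, cost_geno=20, cost_plot=10,
    var_g=1.0, var_ge=1.0, var_e=2.0, m_e=50,
)
print(n_env, n_train, round(r, 3))
```

A plain grid search is used here instead of a numerical optimizer because the only free variable is a small integer; a fuller treatment would also allocate budget to the prediction set and to multiple populations, as the record describes.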
Tian, Wenlan; Paudel, Dev
Jatropha (Jatropha curcas L.) is an economically important species with great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total, 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembly and clustering resulted in 115,611 uniquely assembled sequences (UASs), including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs was annotated: 59,903 (51.81%) were assigned gene ontology (GO) terms, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The set also contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered, with AG/CT as the most frequent (45.8%) SSR motif type. A further 38,693 SNPs were detected, of which 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
The differences between countries in national income, growth, human development and many other factors are used to classify countries into developed and developing countries. There are several classification systems that use different sets of measures and criteria. The most common classifications are the United Nations (UN) and the World Bank (WB) systems. The UN classification system uses the UN Human Development Index (HDI), an indicator that uses statistics on life expectancy, education, and income per capita to classify countries, while the WB system uses gross national income (GNI) per capita, calculated using the World Bank Atlas method. According to the UN and WB classification systems, there are 151 and 134 developing countries, respectively, with 89% overlap between the two systems. Developing countries have limited human development and limited expenditure on education and research, among several other limitations. The biggest challenge facing genomic researchers and clinicians is limited resources. As a result, genomic tools, specifically genome sequencing technologies, which are rapidly becoming indispensable, are not widely available. In this report, we explore the current status of sequencing technologies in developing countries, describe the associated challenges and emphasize potential solutions.
Liang, Ruoyu; Song, Shuai; Shi, Yajing; Shi, Yajuan; Lu, Yonglong; Zheng, Xiaoqi; Xu, Xiangbo; Wang, Yurong; Han, Xuesong
An excess or deficiency of selenium in soils can cause adverse effects on crops and even threaten human health, so selenium resources need to be assessed with a rigorous scientific appraisal. Previous selenium resource assessments were usually carried out using a single-index evaluation. In this study, a multi-index evaluation method (the analytic hierarchy process) was used to establish a comprehensive assessment system that considers selenium content, soil nutrients and soil environmental quality. The criteria for the comprehensive assessment system were classified by summing critical values in the standards with weights, and a Geographical Information System was used to map the regional distribution of the assessment results. Boshan, a representative region for developing selenium-rich agriculture, was taken as a case area and classified into Zones I-V, which suggested priority areas for developing selenium-rich agriculture. Most parts of the north and midlands of Boshan were relatively suitable for the development of selenium-rich agriculture. Soils in parts of the south were contaminated by Cd, PAHs, HCHs and DDTs, and farming there was forbidden. This study is expected to provide a basis for developing selenium-rich agriculture and an example for the comprehensive evaluation of relevant resources in a region. Copyright © 2017 Elsevier B.V. All rights reserved.
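To make the multi-index idea concrete, the sketch below derives analytic hierarchy process (AHP) weights from a pairwise comparison matrix and combines three index scores into one weighted value. It uses the row geometric-mean approximation of the AHP priority vector, which coincides with the principal-eigenvector weights when the matrix is perfectly consistent. The pairwise judgments, the index scores and the function names are hypothetical; the abstract does not report the actual matrix or weights used for Boshan.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


def weighted_score(scores, weights):
    """Weighted sum of per-criterion scores."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical judgments for three criteria, in this order:
# selenium content, soil nutrients, soil environmental quality.
# Entry [i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized scores (0 to 1) for one sampling site.
site_score = weighted_score([0.8, 0.6, 0.9], weights)
```

For this consistent example matrix the weights come out as 4/7, 2/7 and 1/7. A site score of this kind could then be binned into zones, which is where the classification thresholds from the relevant standards would enter.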
Xu, Jiawei; Bao, Xiao; Peng, Zhaofeng; Wang, Linlin; Du, Linqing; Niu, Wenbin; Sun, Yingpu
Polycystic ovary syndrome (PCOS) affects approximately 7% of reproductive-age women. A growing body of evidence indicates that epigenetic mechanisms contribute to the development of PCOS. The role of DNA modification in human ovarian granulosa cells in PCOS progression is still unknown. Global DNA methylation and hydroxymethylation were compared between granulosa cells from PCOS patients and controls. Genome-wide DNA methylation was profiled to investigate the putative function of DNA methylation. Expression of selected genes was compared between granulosa cells from PCOS patients and controls. Our results showed that global DNA methylation in granulosa cells of PCOS patients was significantly higher than in controls. Global DNA hydroxymethylation was low and showed no statistical difference between PCOS and control. A total of 6936 differentially methylated CpG sites were identified between the control and PCOS-obesity groups, 12245 between the control and PCOS-nonobesity groups, and 5202 between the PCOS-obesity and PCOS-nonobesity groups. Our results showed that DNA methylation, not hydroxymethylation, was altered genome-wide in PCOS granulosa cells. The differentially methylated genes were enriched in development protein, transcription factor activity, alternative splicing, sequence-specific DNA binding and embryonic morphogenesis. YWHAQ, NCF2, DHRS9 and SCNA were up-regulated in PCOS-obesity patients, with no significant difference between control and PCOS-nonobesity patients, and this up-regulation may be driven by lower DNA methylation. Global and genome-wide DNA methylation alterations may contribute to differential gene expression and PCOS clinical pathology.
Johnson, Adrienne; Severson, Eric; Gay, Laurie; Vergilio, Jo-Anne; Elvin, Julia; Suh, James; Daniel, Sugganth; Covert, Mandy; Frampton, Garrett M; Hsu, Sigmund; Lesser, Glenn J; Stogner-Underwood, Kimberly; Mott, Ryan T; Rush, Sarah Z; Stanke, Jennifer J; Dahiya, Sonika; Sun, James; Reddy, Prasanth; Chalmers, Zachary R; Erlich, Rachel; Chudnovsky, Yakov; Fabrizio, David; Schrock, Alexa B; Ali, Siraj; Miller, Vincent; Stephens, Philip J; Ross, Jeffrey; Crawford, John R; Ramkissoon, Shakti H
Pediatric brain tumors are the leading cause of death for children with cancer in the U.S. Incorporating next-generation sequencing data for both pediatric low-grade (pLGGs) and high-grade gliomas (pHGGs) can inform diagnostic, prognostic, and therapeutic decision-making. We performed comprehensive genomic profiling on 282 pediatric gliomas (157 pHGGs, 125 pLGGs), sequencing 315 cancer-related genes and calculating the tumor mutational burden (TMB; mutations per megabase [Mb]). In pLGGs, we detected genomic alterations (GA) in 95.2% (119/125) of tumors. BRAF was most frequently altered (48%; 60/125), and FGFR1 missense (17.6%; 22/125), NF1 loss of function (8.8%; 11/125), and TP53 (5.6%; 7/125) mutations were also detected. Rearrangements were identified in 35% of pLGGs, including KIAA1549-BRAF, QKI-RAF1, FGFR3-TACC3, CEP85L-ROS1, and GOPC-ROS1 fusions. Among pHGGs, GA were identified in 96.8% (152/157). The genes most frequently mutated were TP53 (49%; 77/157), H3F3A (37.6%; 59/157), ATRX (24.2%; 38/157), NF1 (22.2%; 35/157), and PDGFRA (21.7%; 34/157). Interestingly, most H3F3A mutations (81.4%; 35/43) were the variant K28M. Midline tumor analysis revealed H3F3A mutations (40%; 40/100) consisted solely of the K28M variant. Pediatric high-grade gliomas harbored oncogenic EML4-ALK, DGKB-ETV1, ATG7-RAF1, and EWSR1-PATZ1 fusions. Six percent (9/157) of pHGGs were hypermutated (TMB >20 mutations per Mb; range 43-581 mutations per Mb), harboring mutations deleterious for DNA repair in MSH6, MSH2, MLH1, PMS2, POLE, and POLD1 genes (78% of cases). Comprehensive genomic profiling of pediatric gliomas provides objective data that promote diagnostic accuracy and enhance clinical decision-making. Additionally, TMB could be a biomarker to identify pediatric glioblastoma (GBM) patients who may benefit from immunotherapy. By providing objective data to support diagnostic, prognostic, and therapeutic decision-making, comprehensive genomic profiling is necessary for
Carcinogenesis is a complex multifactorial, multistage process, but the precise mechanisms are not well understood. In this study, we performed a genome-wide analysis of the copy number variation (CNV), breakpoint region (BPR) and fragile sites in 2,737 tumor samples from eight tumor entities and in 432 normal samples. CNV detection and BPR identification revealed that BPRs tended to accumulate in specific genomic regions in tumor samples whereas they were dispersed genome-wide in the normal samples. Hotspots were observed at which segments with similar alterations in copy number overlapped and BPRs were adjacently clustered. Evaluation of BPR occurrence frequency showed that at least one BPR was detected in about 15% or more of the samples for each tumor entity, while BPR frequency reached at most 12% in the normal samples. 127 of 2,716 tumor-relevant BPRs (termed 'common BPRs') also exhibited a noticeable occurrence frequency in the normal samples. Colocalization assessment identified 20,077 CNV-affecting genes, 169 of which are known tumor-related genes. The most noteworthy genes are KIAA0513, important for immunologic, synaptic and apoptotic signal pathways; the intergenic non-coding RNA RP11-115C21.2, possibly acting as an oncogene or tumor suppressor by changing the structure of chromatin; ADAM32, likely important in cancer cell proliferation and progression through ectodomain shedding of diverse growth factors; and the well-known tumor suppressor gene p53. The BPR distributions indicate that CNV mutations are likely non-random in tumor genomes. The marked recurrence of BPRs at specific regions supports common progression mechanisms in tumors. The presence of hotspots together with common BPRs, despite their small group size, implies a relation between fragile sites and cancer-gene alteration. Our data further suggest that both protein-coding and non-coding genes possessing a range of biological functions might play a causative or functional role in tumor
Tzika, Athanasia C; Ullate-Agote, Asier; Grbic, Djordje; Milinkovitch, Michel C
Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages, which are substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the "Reptilian Transcriptomes Database 2.0," which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be queried using a wwwblast server installed at the University of Geneva. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Metabolism is an emerging stem cell hallmark tied to cell fate, pluripotency, and self-renewal, yet systems-level understanding of stem cell metabolism has been limited by the lack of genome-scale network models. Here, we develop a systems approach to integrate time-course metabolomics data with a computational model of metabolism to analyze the metabolic state of naive and primed murine pluripotent stem cells. Using this approach, we find that one-carbon metabolism involving phosphoglycerate dehydrogenase, folate synthesis, and nucleotide synthesis is a key pathway that differs between the two states, resulting in differential sensitivity to anti-folates. The model also predicts that the pluripotency factor Lin28 regulates this one-carbon metabolic pathway, which we validate using metabolomics data from Lin28-deficient cells. Moreover, we identify and validate metabolic reactions related to S-adenosyl-methionine production that can differentially impact histone methylation in naive and primed cells. Our network-based approach provides a framework for characterizing metabolic changes influencing pluripotency and cell fate. Chandrasekaran et al. use computational modeling, metabolomics, and metabolic inhibitors to discover metabolic differences between various pluripotent stem cell states and infer their impact on stem cell fate decisions. Keywords: systems biology, stem cell biology, metabolism, genome-scale modeling, pluripotency, histone methylation, naive (ground) state, primed state, cell fate, metabolic network
Spriggs, Andrew; Henderson, Steven T; Hand, Melanie L; Johnson, Susan D; Taylor, Jennifer M; Koltunow, Anna
Cowpea (Vigna unguiculata (L.) Walp) is an important legume crop for food security in areas of low-input and smallholder farming throughout Africa and Asia. Genetic improvements are required to increase yield and resilience to biotic and abiotic stress and to enhance cowpea crop performance. An integrated cowpea genomic and gene expression data resource has the potential to greatly accelerate breeding and the delivery of novel genetic traits for cowpea. Extensive genomic resources for cowpea have been absent from the public domain; however, a recent early release reference genome for IT97K-499-35 (Vigna unguiculata v1.0, NSF, UCR, USAID, DOE-JGI, http://phytozome.jgi.doe.gov/) has now been established in a collaboration between the Joint Genome Institute (JGI) and the University of California (UC) Riverside. Here we release supporting genomic and transcriptomic data for IT97K-499-35 and a second transformable cowpea variety, IT86D-1010. The transcriptome resource includes six tissue-specific datasets for each variety, with particular emphasis on reproductive tissues, that extend and support the V. unguiculata v1.0 reference. Annotations have been included in our resource to allow direct mapping to the v1.0 cowpea reference. Access to this resource is supported by raw and assembled data downloads.
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Lu, Hongzhong; Cao, Weiqiang; Ouyang, Liming; Xia, Jianye; Huang, Mingzhi; Chu, Ju; Zhuang, Yingping; Zhang, Siliang; Noorman, Henk
Aspergillus niger is one of the most important cell factories for industrial enzyme and organic acid production. A comprehensive, high-quality genome-scale metabolic network model (GSMM) is crucial for efficient strain improvement and process optimization. The lack of accurate reaction equations and gene-protein-reaction associations (GPRs) in the current best model of A. niger, named GSMM iMA871, however, limits its application scope. To overcome these limitations, we updated the A. niger GSMM by combining the latest genome annotation and literature mining technology. Compared with iMA871, the number of reactions in iHL1210 was increased from 1,380 to 1,764, and the number of unique ORFs from 871 to 1,210. With the aid of our transcriptomics analysis, the existence of 63% of the ORFs and 68% of the reactions in iHL1210 can be verified when glucose is used as the only carbon source. Physiological data from chemostat cultivations, 13C-labeled and molecular experiments from the published literature were further used to check the performance of iHL1210. The average correlation coefficients between the predicted fluxes and estimated fluxes from 13C-labeling data were sufficiently high (above 0.89) and the prediction of cell growth on most of the reported carbon and nitrogen sources was consistent. Using the updated genome-scale model, we evaluated gene essentiality on synthetic and yeast extract medium, as well as the effects of NADPH supply on glucoamylase production in A. niger. In summary, the new A. niger GSMM iHL1210 contains significant improvements with respect to the metabolic coverage and prediction performance, which paves the way for systematic metabolic engineering of A. niger. Biotechnol. Bioeng. 2017;114: 685-695. © 2016 Wiley Periodicals, Inc.
... above. Ronald Gluck, Assistant Section Chief, Environmental Enforcement Section Environment and Natural... Department of the Interior's Natural Resource Damage Assessment and Restoration Fund, which can be used to.... Comments should be addressed to the Assistant Attorney General, Environment and Natural Resources Division...
Guo, Jing; Cheng, Gang; Gou, Xiang-Yong; Xing, Feng; Li, Sen; Han, Yi-Chao; Wang, Long; Song, Jia-Ming; Shu, Cheng-Cheng; Chen, Shou-Wen; Chen, Ling-Ling
The updated genome of Bacillus licheniformis WX-02 comprises a circular chromosome of 4286821 base-pairs containing 4512 protein-coding genes. We applied strand-specific RNA-sequencing to explore the transcriptome profiles of B. licheniformis WX-02 under normal and high-salt conditions (NaCl 6%). We identified 2381 co-expressed gene pairs constituting 871 operon structures. In addition, 1169 antisense transcripts and 90 small RNAs were detected. Systematic comparison of differentially expressed genes under different conditions revealed that genes involved in multiple functions were significantly repressed during the long-term high-salt adaptation process. Genes related to promotion of glutamic acid synthesis were activated by 6% NaCl, potentially explaining the high yield of γ-PGA under high-salt conditions. This study will be useful for the optimization of crucial metabolic activities in this bacterium. Copyright © 2015. Published by Elsevier B.V.
Saha, Rajib; Suthers, Patrick F.; Maranas, Costas D.
The scope and breadth of genome-scale metabolic reconstructions have continued to expand over the last decade. Herein, we introduce a genome-scale model for a plant with direct applications to food and bioenergy production (i.e., maize). Maize annotation is still underway, which introduces significant challenges in the association of metabolic functions to genes. The developed model is designed to meet rigorous standards on gene-protein-reaction (GPR) associations, elementally and charge-balanced reactions and a biomass reaction abstracting the relative contribution of all biomass constituents. The metabolic network contains 1,563 genes and 1,825 metabolites involved in 1,985 reactions from primary and secondary maize metabolism. For approximately 42% of the reactions, direct literature evidence for the participation of the reaction in maize was found. As many as 445 reactions and 369 metabolites are unique to the maize model compared to the AraGEM model for A. thaliana. 674 metabolites and 893 reactions are present in Zea mays iRS1563 that are not accounted for in maize C4GEM. All reactions are elementally and charge-balanced and localized into six different compartments (i.e., cytoplasm, mitochondrion, plastid, peroxisome, vacuole and extracellular). GPR associations are also established based on the functional annotation information and homology prediction accounting for monofunctional, multifunctional and multimeric proteins, isozymes and protein complexes. We describe results from performing flux balance analysis under different physiological conditions (i.e., photosynthesis, photorespiration and respiration) of a C4 plant and also explore model predictions against experimental observations for two naturally occurring mutants (i.e., bm1 and bm3). The developed model corresponds to the largest and most complete effort to date at cataloguing metabolism for a plant species. PMID:21755001
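The requirement that every reaction be elementally and charge balanced can be checked mechanically. The sketch below does so for a single reaction, using the textbook hexokinase step as a stand-in. The reaction, the metabolite formulas and charges (protonation states near neutral pH, as commonly used in metabolic reconstructions), and all function names are assumptions for illustration and are not taken from the iRS1563 model files.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reaction, metabolites):
    """Return (unbalanced elements, net charge) for one reaction.

    reaction maps metabolite id -> stoichiometric coefficient
    (negative for substrates, positive for products);
    metabolites maps metabolite id -> (formula, charge).
    """
    atoms = Counter()
    charge = 0
    for met, coeff in reaction.items():
        formula, met_charge = metabolites[met]
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * met_charge
    return {e: n for e, n in atoms.items() if n != 0}, charge


# Assumed formulas and charges for the example metabolites.
metabolites = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}

# glucose + ATP -> glucose 6-phosphate + ADP + H+
hexokinase = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
balanced_result = imbalance(hexokinase, metabolites)

# The same reaction with the proton omitted is off by one H and one charge.
missing_proton = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}
unbalanced_result = imbalance(missing_proton, metabolites)
```

An empty element dictionary together with a net charge of zero indicates a balanced reaction. Running such a check over every reaction in a reconstruction is one simple way to enforce the standard the abstract describes.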
Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda
The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis
Advocates for Youth, 2015
Decades of research have identified dozens of programs that are effective in helping young people reduce their risk for pregnancy, HIV, and STDs. These evidence-based programs utilize strategies that include the provision of accurate, honest information about abstinence as well as contraception and can serve as the foundation for comprehensive sex…
The historical origins and development of comprehensive schooling have seldom been analyzed systematically and comparatively. However, there is a rich comparative and historically grounded literature on the development of welfare states, which focuses on many relevant policies, but ignores the education system. In particular, the power resources…
Nelson, J.; Jones, N.; Ames, D. P.
Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
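To give a sense of what queuing a batch of model runs on HTCondor involves, the sketch below assembles a plain-text HTCondor submit description of the kind that tools built on HTCondor generate on the user's behalf. It deliberately does not call the TethysCluster or CondorPy APIs, whose interfaces are not described in the abstract; the helper name, file names and arguments are hypothetical, and only standard submit-description commands are used.

```python
def build_submit_description(executable, arguments, input_files, job_name, queue=1):
    """Return the text of an HTCondor submit description for a simple job."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Ship inputs to the execute node and bring outputs back on exit.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(input_files)}",
        # $(Cluster) and $(Process) are expanded by HTCondor per submitted job.
        f"output = {job_name}.$(Cluster).$(Process).out",
        f"error = {job_name}.$(Cluster).$(Process).err",
        f"log = {job_name}.$(Cluster).log",
        f"queue {queue}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: three runs of a watershed model script.
submit_text = build_submit_description(
    executable="run_model.sh",
    arguments="--scenario $(Process)",
    input_files=["basin.cfg", "rainfall.csv"],
    job_name="watershed",
    queue=3,
)
```

Writing `submit_text` to a file and passing that file to `condor_submit` would queue three jobs, each receiving its own process number as the scenario index. A higher-level Python wrapper spares the modeler from writing and tracking such files by hand.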
Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine
MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269
Ross, Jeffrey S; Wang, Kai; Khaira, Depinder; Ali, Siraj M; Fisher, Huge A G; Mian, Badar; Nazeer, Tipu; Elvin, Julia A; Palma, Norma; Yelensky, Roman; Lipson, Doron; Miller, Vincent A; Stephens, Philip J; Subbiah, Vivek; Pal, Sumanta K
In the current study, the authors present a comprehensive genomic profile (CGP)-based study of advanced urothelial carcinoma (UC) designed to detect clinically relevant genomic alterations (CRGAs). DNA was extracted from 40 µm of formalin-fixed, paraffin-embedded sections from 295 consecutive cases of recurrent/metastatic UC. CGP was performed on hybridization-captured, adaptor ligation-based libraries to a mean coverage depth of 688X for all coding exons of 236 cancer-related genes plus 47 introns from 19 genes frequently rearranged in cancer, using process-matched normal control samples as a reference. CRGAs were defined as GAs linked to drugs on the market or currently under evaluation in mechanism-driven clinical trials. All 295 patients assessed were classified with high-grade (International Society of Urological Pathology classification) and advanced stage (stage III/IV American Joint Committee on Cancer) disease, and 294 of 295 patients (99.7%) had at least 1 GA on CGP with a mean of 6.4 GAs per UC (61% substitutions/insertions/deletions, 37% copy number alterations, and 2% fusions). Furthermore, 275 patients (93%) had at least 1 CRGA involving 75 individual genes with a mean of 2.6 CRGAs per UC. The most common CRGAs involved cyclin-dependent kinase inhibitor 2A (CDKN2A) (34%), fibroblast growth factor receptor 3 (FGFR3) (21%), phosphatidylinositol 3-kinase catalytic subunit alpha (PIK3CA) (20%), and ERBB2 (17%). FGFR3 GAs were of diverse types and included 10% fusions. ERBB2 GAs were equally divided between amplifications and substitutions. ERBB2 substitutions were predominantly within the extracellular domain and were highly enriched in patients with micropapillary UC (38% of 32 cases vs 5% of 263 nonmicropapillary UC cases; P... Cancer 2016;122:702-711. © 2015 American Cancer Society.
Sahajpal, Ruchika; Kandoi, Gaurav; Dhiman, Heena; Raj, Sweety; Scaria, Vinod; Bhartiya, Deeksha; Hasija, Yasha
Tuberculosis (TB) is an infectious disease caused by the fastidious pathogen Mycobacterium tuberculosis. TB has emerged as one of the major causes of mortality in the developing world. The role of host genetic factors that modulate disease susceptibility has not been studied widely. Recent studies have reported a few genetic loci that provide impetus to this area of research. The availability of tools has enabled genome-wide scans for disease susceptibility loci associated with infectious dis...
Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen
Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335
Bile salt hydrolase (BSH) is a well-known enzyme that has been commonly characterized in probiotic bacteria, as it has cholesterol-lowering effects. However, molecular investigations of BSH are scarce. Here, we build a local database of BSH sequences from Lactobacillaceae (BSH–SDL), and phylogenetic analysis and homology searches were employed to elucidate their comparability and distinctiveness among species. Evolutionary study demonstrates that BSH sequences in BSH–SDL are divided into five groups, named BSH A, B, C, D and E here, which can be the genetic basis for BSH classification and nomenclature. Sequence analysis clearly reveals differences between BSH-active and BSH-inactive proteins, especially at site 82. In addition, a total of 551 BSHs from 107 species are identified from 451 genomes of 158 Lactobacillaceae species. Interestingly, those bacteria carrying various copies of BSH A or B can be predicted to be potential cholesterol-lowering probiotics, based on the results of phylogenetic analysis and the subtypes that those previously reported BSH-active probiotics possess. In summary, this study elaborates the molecular basis of BSH in Lactobacillaceae systematically, and provides a novel methodology as well as a consistent standard for the identification of the BSH subtype. We believe that high-throughput screening can be efficiently applied to the selection of promising candidate BSH-active probiotics, which will advance the development of healthcare products in cholesterol metabolism.
Charalambous, Marika; da Rocha, Simão Teixeira; Ferguson-Smith, Anne C
Genes subject to genomic imprinting are predominantly expressed from one of the two parental chromosomes, are often clustered in the genome, and their activity and repression are epigenetically regulated. The role of imprinted genes in growth control has been apparent since the discovery of imprinting in the early 1980s. Drawing from studies in the mouse, we propose three distinct classes of imprinted genes - those expressed, imprinted and acting predominantly within the placenta, those with no associated foetal growth effects that act postnatally to regulate metabolic processes, and those expressed in the embryo and placenta that programme the development of organs participating in metabolic processes. Members of this latter class may interact in functional networks regulating the interaction between the mother and the foetus, affecting generalized foetal well-being, growth and organ development; they may also coordinately regulate the development of particular organ systems. The mono-allelic behaviour and sensitivity to changes in regional epigenetic states renders imprinted genes adaptable and vulnerable; in all cases, their perturbed dosage can compromise prenatal and/or postnatal control of nutritional resources. This finding has implications for understanding the relationships between prenatal events and diseases later in life.
Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J; Auffray, Charles
The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/. The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
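As a rough illustration of the JSON-to-SIF conversion step mentioned in this abstract, the following Python sketch turns a list of edge records into SIF lines (one tab-separated "node, relation, node" triple per line). The record layout (`source`, `interaction`, `target` keys) and the example node names are assumptions made for illustration only; they are not taken from the framework's actual query output, and the original parser is written in Java, not Python.

```python
import json


def edges_to_sif(json_text):
    """Convert a JSON array of edge records into SIF text.

    Each record is assumed (hypothetically) to carry 'source',
    'interaction' and 'target' keys.
    """
    records = json.loads(json_text)
    lines = []
    for rec in records:
        # SIF stores one interaction per line: node, relation, node
        lines.append(f"{rec['source']}\t{rec['interaction']}\t{rec['target']}")
    return "\n".join(lines)


# Hypothetical query result with two edges around one reaction node
example = json.dumps([
    {"source": "glucose", "interaction": "consumed_by", "target": "HEX1"},
    {"source": "HEX1", "interaction": "produces", "target": "glucose_6_phosphate"},
])
sif_text = edges_to_sif(example)
print(sif_text)
```

A flat edge list like this discards the stoichiometry and compartment information that SBML can carry, which may be one reason the framework offers both output formats.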
Sun, Tingting; Li, Mingjun; Shao, Yun; Yu, Lingyan; Ma, Fengwang
Elemental phosphorus (Pi) is essential to plant growth and development. The family of phosphate transporters (PHTs) mediates the uptake and translocation of Pi inside the plants. Members include five sub-cellular phosphate transporters that play different roles in Pi uptake and transport. We searched the Genome Database for Rosaceae and identified five clusters of phosphate transporters in apple (Malus domestica), including 37 putative genes. The MdPHT1 family contains 14 genes while MdPHT2 has two, MdPHT3 has seven, MdPHT4 has 11, and MdPHT5 has three. Our overview of this gene family focused on structure, chromosomal distribution and localization, phylogenies, and motifs. These genes displayed differential expression patterns in various tissues. For example, expression was high for MdPHT1;12, MdPHT3;6, and MdPHT3;7 in the roots, and was also increased in response to low-phosphorus conditions. In contrast, MdPHT4;1, MdPHT4;4, and MdPHT4;10 were expressed only in the leaves while transcript levels of MdPHT1;4, MdPHT1;12, and MdPHT5;3 were highest in flowers. In general, these 37 genes were regulated significantly in either roots or leaves in response to the imposition of phosphorus and/or drought stress. The results suggest that members of the PHT family function in plant adaptations to adverse growing environments. Our study will lay a foundation for better understanding the PHT family evolution and exploring genes of interest for genetic improvement in apple.
Makarova Kira S
Background. The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in various forms of stress response and management of prokaryotic population. Several distinct TAS of type 1, where the toxin is a protein and the antitoxin is an antisense RNA, and numerous, unrelated TAS of type 2, in which both the toxin and the antitoxin are proteins, have been experimentally characterized, and it is suspected that many more remain to be identified. Results. We report a comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems in prokaryotes. Using sensitive methods for distant sequence similarity search, genome context analysis and a new approach for the identification of mobile two-component systems, we identified numerous, previously unnoticed protein families that are homologous to toxins and antitoxins of known type 2 TAS. In addition, we predict 12 new families of toxins and 13 families of antitoxins, and also predict a TAS or TAS-like activity for several gene modules that were not previously suspected to function in that capacity. In particular, we present indications that the two-gene module that encodes a minimal nucleotidyl transferase and the accompanying HEPN protein, and is extremely abundant in many archaea and bacteria, especially thermophiles, might comprise a novel TAS. We present a survey of previously known and newly predicted TAS in 750 complete genomes of archaea and bacteria, quantitatively demonstrate the exceptional mobility of the TAS, and explore the network of toxin-antitoxin pairings that combines plasticity with selectivity. Conclusion. The defining properties of the TAS, namely, the typically small size of the toxin and antitoxin genes, fast evolution, and
Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for replicons and cells in which they reside, but also possess "normal" cellular functions in various forms of stress response and management of prokaryotic population. Several distinct TAS of type 1, where the toxin is a protein and the antitoxin is an antisense RNA, and numerous, unrelated TAS of type 2, in which both the toxin and the antitoxin are proteins, have been experimentally characterized, and it is suspected that many more remain to be identified. We report a comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems in prokaryotes. Using sensitive methods for distant sequence similarity search, genome context analysis and a new approach for the identification of mobile two-component systems, we identified numerous, previously unnoticed protein families that are homologous to toxins and antitoxins of known type 2 TAS. In addition, we predict 12 new families of toxins and 13 families of antitoxins, and also, predict a TAS or TAS-like activity for several gene modules that were not previously suspected to function in that capacity. In particular, we present indications that the two-gene module that encodes a minimal nucleotidyl transferase and the accompanying HEPN protein, and is extremely abundant in many archaea and bacteria, especially, thermophiles might comprise a novel TAS. We present a survey of previously known and newly predicted TAS in 750 complete genomes of archaea and bacteria, quantitatively demonstrate the exceptional mobility of the TAS, and explore the network of toxin-antitoxin pairings that combines plasticity with selectivity. The defining properties of the TAS, namely, the typically small size of the toxin and antitoxin genes, fast evolution, and extensive horizontal mobility, make the task of
Frank, David G.; Wallace, Alan R.; Schneider, Jill L.
Minerals in the environment and products manufactured from mineral materials are all around us and we use and come into contact with them every day. They impact our way of life and the health of all that lives. Minerals are critical to the Nation's economy and knowing where future mineral resources will come from is important for sustaining the Nation's economy and national security. The U.S. Geological Survey (USGS) Mineral Resources Program (MRP) provides scientific information for objective resource assessments and unbiased research results on mineral resource potential, production and consumption statistics, as well as environmental consequences of mining. The MRP conducts this research to provide information needed for land planners and decisionmakers about where mineral commodities are known and suspected in the earth's crust and about the environmental consequences of extracting those commodities. As part of the MRP scientists of the Western Mineral and Environmental Resources Science Center (WMERSC or 'Center' herein) coordinate the development of national, geologic, geochemical, geophysical, and mineral-resource databases and the migration of existing databases to standard models and formats that are available to both internal and external users. The unique expertise developed by Center scientists over many decades in response to mineral-resource-related issues is now in great demand to support applications such as public health research and remediation of environmental hazards that result from mining and mining-related activities. Western Mineral and Environmental Resources Science Center Results of WMERSC research provide timely and unbiased analyses of minerals and inorganic materials to (1) improve stewardship of public lands and resources; (2) support national and international economic and security policies; (3) sustain prosperity and improve our quality of life; and (4) protect and improve public health, safety, and environmental quality. The MRP
Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng
Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely-updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.
C. L. Laxmipathi Gowda
Both chickpea (Cicer arietinum L.) and pigeonpea [Cajanus cajan (L.) Millsp.] are important dietary sources of protein, while groundnut (Arachis hypogaea L.) is one of the major oil crops. Globally, approximately 1.1 million grain legume accessions are conserved in genebanks, of which the ICRISAT genebank holds 49,485 accessions of cultivated species and wild relatives of chickpea, pigeonpea, and groundnut from 133 countries. These genetic resources are reservoirs of many useful genes for present and future crop improvement programs. Representative subsets in the form of core and mini core collections have been used to identify trait-specific genetically diverse germplasm for use in breeding and genomic studies in these crops. Chickpea, groundnut, and pigeonpea have moved from “orphan” to “genomic resources rich crops.” The chickpea and pigeonpea genomes have been decoded, and the sequences of the groundnut genome will soon be available. With the availability of these genomic resources, germplasm curators, breeders, and molecular biologists will have abundant opportunities to enhance the efficiency of genebank operations, mine allelic variations in germplasm collections, identify genetically diverse germplasm with beneficial traits, broaden the cultigen’s genepool, and accelerate cultivar development to address new challenges to production, particularly with respect to climate change and variability. Marker-assisted breeding approaches have already been initiated for some traits in chickpea and groundnut, which should lead to enhanced efficiency and efficacy of crop improvement. Resistance to some pests and diseases has been successfully transferred from wild relatives to cultivated species.
Clément, D; Lanaud, C; Sabau, X; Fouet, O; Le Cunff, L; Ruiz, E; Risterucci, A M; Glaszmann, J C; Piffanelli, P
We have constructed and validated the first cocoa (Theobroma cacao L.) BAC library, with the aim of developing molecular resources to study the structure and evolution of the genome of this perennial crop. This library contains 36,864 clones with an average insert size of 120 kb, representing approximately ten haploid genome equivalents. It was constructed from the genotype Scavina-6 (Sca-6), a Forastero clone highly resistant to cocoa pathogens and a parent of existing mapping populations. Validation of the BAC library was carried out with a set of 13 genetically-anchored single copy and one duplicated markers. An average of nine BAC clones per probe was identified, giving an initial experimental estimation of the genome coverage represented in the library. Screening of the library with a set of resistance gene analogues (RGAs), previously mapped in cocoa and co-localizing with QTL for resistance to Phytophthora traits, confirmed at the physical level the tight clustering of RGAs in the cocoa genome and provided the first insights into the relationships between genetic and physical distances in the cocoa genome. This library represents an available BAC resource for structural genomic studies or map-based cloning of genes corresponding to important QTLs for agronomic traits such as resistance genes to major cocoa pathogens like Phytophthora spp. (palmivora and megakarya), Crinipellis perniciosa and Moniliophthora roreri.
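The coverage figures in this abstract can be related with a short calculation. The Python sketch below back-computes the haploid genome size implied by the stated clone count, insert size and roughly tenfold coverage, then applies the standard Clarke-Carbon estimate of the chance that a given single-copy locus is present in the library. The implied genome size is an inference from the abstract's numbers, not a value the authors report, and the function name is hypothetical.

```python
N_CLONES = 36_864   # clones in the library (from the abstract)
INSERT_KB = 120     # average insert size in kb (from the abstract)
COVERAGE = 10       # "approximately ten haploid genome equivalents"

# Haploid genome size implied by the three figures above
# (an inference, not a number stated in the abstract).
genome_kb = N_CLONES * INSERT_KB / COVERAGE


def prob_locus_present(n_clones, insert_kb, genome_kb):
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome) ** n_clones."""
    return 1.0 - (1.0 - insert_kb / genome_kb) ** n_clones


p = prob_locus_present(N_CLONES, INSERT_KB, genome_kb)
print(f"implied genome size: {genome_kb / 1000:.0f} Mb")
print(f"P(single-copy locus in library): {p:.5f}")
```

Under these assumptions a single-copy probe would be expected to hit about ten clones on average, which is in line with the nine clones per probe reported from the experimental screening.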
Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten
The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org PMID:23624946
Duku, Moses Hensley [School of Engineering Sciences, University of Southampton, Southampton, S017 1BJ (United Kingdom); Institute of Industrial Research, Council for Scientific and Industrial Research, P. Box LG 576, Legon (Ghana); Gu, Sai [School of Engineering Sciences, University of Southampton, Southampton, S017 1BJ (United Kingdom); Hagan, Essel Ben [Institute of Industrial Research, Council for Scientific and Industrial Research, P. Box LG 576, Legon (Ghana)
Biomass is the major energy source in Ghana contributing about 64% of Ghana's primary energy supply. In this paper, an assessment of biomass resources and biofuels production potential in Ghana is given. The broad areas of energy crops, agricultural crop residues, forest products residues, urban wastes and animal wastes are included. Animal wastes are limited to those produced by domesticated livestock. Agricultural residues included those generated from sugarcane, maize, rice, cocoa, oil palm, coconut, sorghum and millet processing. The urban category is subdivided into municipal solid waste, food waste, sewage sludge or bio-solids and waste grease. The availability of these types of biomass, together with a brief description of possible biomass conversion routes, sustainability measures, and current research and development activities in Ghana is given. It is concluded that a large availability of biomass in Ghana gives a great potential for biofuels production from these biomass resources. (author)
The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources such as SSRs, SNPs and InDels in several model and non-model plant species. Yam (Dioscorea spp.) i...
Commercial and experimental genetic resources were used to investigate genetic pleiotropic factors that influence age at puberty, litter-size and reproductive longevity. The phenotypes were complemented by high-density genotyping and whole genome and RNA sequencing. The SNPs from Porcine SNP60 BeadA...
Monk, Ellen Fischer
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Enterprise Resource Planning (ERP) systems are very large and complex software packages that run every aspect of an organization. Increasingly, ERP systems are used in higher education to teach business processes, essential knowledge for students competing in today’s business environment. Past research attempting to measure learning business processes with ERP has been inconclusive and lac...
Mamaeva, A. A.; Shaykhislamov, D. I.; Voevodin, Vad V.; Zhumatiy, S. A.
One of the main problems of modern supercomputers is the low efficiency of their usage, which leads to significant idle time of computational resources and, in turn, to a decrease in the speed of scientific research. This paper presents three approaches to studying the efficiency of supercomputer resource usage based on monitoring data analysis. The first approach performs an analysis of computing resource utilization statistics, which makes it possible to identify different typical classes of programs, to explore the structure of the supercomputer job flow and to track overall trends in the supercomputer behavior. The second approach is aimed specifically at analyzing off-the-shelf software packages and libraries installed on the supercomputer, since efficiency of their usage is becoming an increasingly important factor for the efficient functioning of the entire supercomputer. Within the third approach, abnormal jobs – jobs with abnormally inefficient behavior that differs significantly from the standard behavior of the overall supercomputer job flow – are detected. For each approach, the results obtained in practice in the Supercomputer Center of Moscow State University are demonstrated.
Grattapaglia, Dario; Mamani, Eva M C; Silva-Junior, Orzenil B; Faria, Danielle A
Keystone species in their native ranges, eucalypts are ecologically and genetically very diverse, growing naturally along extensive latitudinal and altitudinal ranges and variable environments. Besides their ecological importance, eucalypts are also the most widely planted trees for sustainable forestry in the world. We report the development of a novel collection of 535 microsatellites for species of Eucalyptus, 494 designed from ESTs and 41 from genomic libraries. A selected subset of 223 was evaluated for individual identification, parentage testing, and ancestral information content in the two most extensively studied species, Eucalyptus grandis and Eucalyptus globulus. Microsatellites showed high transferability and overlapping allele size range, suggesting they arose in the common ancestor of the two species and confirming the extensive genome conservation between them. A consensus linkage map with 437 microsatellites, the most comprehensive microsatellite-only genetic map for Eucalyptus, was built by assembling segregation data from three mapping populations and anchored to the Eucalyptus genome. An overall colinearity between recombination-based and physical positioning of 84% of the mapped microsatellites was observed, with some ordering discrepancies and sporadic locus duplications, consistent with the recently described whole genome duplication events in Eucalyptus. The linkage map covered 95.2% of the 605.8-Mbp assembled genome sequence, placing one microsatellite every 1.55 Mbp on average, and an overall estimate of physical to recombination distance of 618 kbp/cM. The genetic parameter estimates, together with linkage and physical position data for this large set of microsatellites, should assist marker choice for genome-wide population genetics and comparative mapping in Eucalyptus. © 2014 John Wiley & Sons Ltd.
Lorenz, Aaron J
Allocating resources between population size and replication affects both genetic gain through phenotypic selection and quantitative trait loci detection power and effect estimation accuracy for marker-assisted selection (MAS). It is well known that because alleles are replicated across individuals in quantitative trait loci mapping and MAS, more resources should be allocated to increasing population size compared with phenotypic selection. Genomic selection is a form of MAS using all marker information simultaneously to predict individual genetic values for complex traits and has widely been found superior to MAS. No studies have explicitly investigated how resource allocation decisions affect success of genomic selection. My objective was to study the effect of resource allocation on response to MAS and genomic selection in a single biparental population of doubled haploid lines by using computer simulation. Simulation results were compared with previously derived formulas for the calculation of prediction accuracy under different levels of heritability and population size. Response of prediction accuracy to resource allocation strategies differed between genomic selection models (ridge regression best linear unbiased prediction [RR-BLUP], BayesCπ) and multiple linear regression using ordinary least-squares estimation (OLS), leading to different optimal resource allocation choices between OLS and RR-BLUP. For OLS, it was always advantageous to maximize population size at the expense of replication, but a high degree of flexibility was observed for RR-BLUP. Prediction accuracy of doubled haploid lines included in the training set was much greater than of those excluded from the training set, so there was little benefit to phenotyping only a subset of the lines genotyped. Finally, observed prediction accuracies in the simulation compared well to calculated prediction accuracies, indicating these theoretical formulas are useful for making resource allocation
Mohan, Viswanathan; Radha, Venkatesan; Nguyen, Thong T; Stawiski, Eric W; Pahuja, Kanika Bajaj; Goldstein, Leonard D; Tom, Jennifer; Anjana, Ranjit Mohan; Kong-Beltran, Monica; Bhangale, Tushar; Jahnavi, Suresh; Chandni, Radhakrishnan; Gayathri, Vijay; George, Paul; Zhang, Na; Murugan, Sakthivel; Phalke, Sameer; Chaudhuri, Subhra; Gupta, Ravi; Zhang, Jingli; Santhosh, Sam; Stinson, Jeremy; Modrusan, Zora; Ramprasad, V L; Seshagiri, Somasekar; Peterson, Andrew S
Maturity-onset diabetes of the young (MODY) is an early-onset, autosomal dominant form of non-insulin dependent diabetes. Genetic diagnosis of MODY can transform patient management. Earlier data on the genetic predisposition to MODY have come primarily from familial studies in populations of European origin. In this study, we carried out a comprehensive genomic analysis of 289 individuals from India that included 152 clinically diagnosed MODY cases to identify variants in known MODY genes. Further, we have analyzed exome data to identify putative MODY relevant variants in genes previously not implicated in MODY. Functional validation of MODY relevant variants was also performed. We found MODY 3 (HNF1A; 7.2%) to be most frequently mutated followed by MODY 12 (ABCC8; 3.3%). They together account for ~ 11% of the cases. In addition to known MODY genes, we report the identification of variants in RFX6, WFS1, AKT2, NKX6-1 that may contribute to development of MODY. Functional assessment of the NKX6-1 variants showed that they are functionally impaired. Our findings showed HNF1A and ABCC8 to be the most frequently mutated MODY genes in south India. Further we provide evidence for additional MODY relevant genes, such as NKX6-1, and these require further validation.
Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analyses of KoRV integration patterns in modern koalas demonstrate that they share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next generation sequencing (NGS) and a newly customized sequence-clustering based computational pipeline to determine the integration sites for 10 museum Queensland and New South Wales (NSW) koala samples collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small.
The Nuclear Technology and Education Center (NuTEC) of the Japan Atomic Energy Agency (JAEA) aims at comprehensive nuclear education and training activities, which cover 1) education and training for national nuclear engineers, 2) cooperation with universities and 3) international contribution and cooperation. The main feature of NuTEC's training programs is that the curricula place emphasis on laboratory exercises with well-equipped training facilities, including research reactors, and on the expertise of lecturers, mostly from JAEA. A wide spectrum of cooperative activities has been pursued with universities and also with international organizations, such as IAEA, ENEN, CEA/INSTN and FNCA countries. The present paper describes the overall HRD activities of NuTEC, especially in the nuclear power field. (author)
Krupke, Debra M; Begley, Dale A; Sundberg, John P; Richardson, Joel E; Neuhauser, Steven B; Bult, Carol J
Research using laboratory mice has led to fundamental insights into the molecular genetic processes that govern cancer initiation, progression, and treatment response. Although thousands of scientific articles have been published about mouse models of human cancer, collating information and data for a specific model is hampered by the fact that many authors do not adhere to existing annotation standards when describing models. The interpretation of experimental results in mouse models can also be confounded when researchers do not factor in the effect of genetic background on tumor biology. The Mouse Tumor Biology (MTB) database is an expertly curated, comprehensive compendium of mouse models of human cancer. Through the enforcement of nomenclature and related annotation standards, MTB supports aggregation of data about a cancer model from diverse sources and assessment of how genetic background of a mouse strain influences the biological properties of a specific tumor type and model utility. Cancer Res; 77(21); e67-70. ©2017 American Association for Cancer Research.
Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A
Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons is available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes. RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in
Rodriguez Romero, Paulo Cesar; Cubillos Gonzalez, Alexander
The predominance of economic assessments regarding the value of natural resources has caused a sub-valuing of the real benefits which societies can obtain from nature. This is due to a lack of knowledge about the complexity of ecological functions, as well as a dismissal of the integrated relations of the sub-systems which make up the environment. It is therefore necessary to establish conceptual bridges between environmental sciences to fill in the gaps in economic valuation methods by recurring to diverse measuring scales, participation from the different actors involved, and a principle of precaution regarding the limits of nature. This paper explores the concepts of value and economic valuation methods from the perspectives of Environmental Economics and Ecological Economics. It then proposes an integration of valuing methodologies which take into account how complementary and complex natures value relations are. This proposal of valuing integrally ecosystem goods and services contributes to adjusting political decisions more accordingly to real environmental conditions.
Lea, Amanda J.; Altmann, Jeanne; Alberts, Susan C.; Tung, Jenny
Variation in resource availability commonly exerts strong effects on fitness-related traits in wild animals. However, we know little about the molecular mechanisms that mediate these effects, or about their persistence over time. To address these questions, we profiled genome-wide whole blood DNA methylation levels in two sets of wild baboons: (i) ‘wild-feeding’ baboons that foraged naturally in a savanna environment and (ii) ‘Lodge’ baboons that had ready access to spatially concentrated human food scraps, resulting in high feeding efficiency and low daily travel distances. We identified 1,014 sites (0.20% of sites tested) that were differentially methylated between wild-feeding and Lodge baboons, providing the first evidence that resource availability shapes the epigenome in a wild mammal. Differentially methylated sites tended to occur in contiguous stretches (i.e., in differentially methylated regions or DMRs), in promoters and enhancers, and near metabolism-related genes, supporting their functional importance in gene regulation. In agreement, reporter assay experiments confirmed that methylation at the largest identified DMR, located in the promoter of a key glycolysis-related gene, was sufficient to causally drive changes in gene expression. Intriguingly, all dispersing males carried a consistent epigenetic signature of their membership in a wild-feeding group, regardless of whether males dispersed into or out of this group as adults. Together, our findings support a role for DNA methylation in mediating ecological effects on phenotypic traits in the wild, and emphasize the dynamic environmental sensitivity of DNA methylation levels across the life course. PMID:26508127
Comprehensive Genomic Profiling Facilitates Implementation of the National Comprehensive Cancer Network Guidelines for Lung Cancer Biomarker Testing and Identifies Patients Who May Benefit From Enrollment in Mechanism-Driven Clinical Trials.
Suh, James H; Johnson, Adrienne; Albacker, Lee; Wang, Kai; Chmielecki, Juliann; Frampton, Garrett; Gay, Laurie; Elvin, Julia A; Vergilio, Jo-Anne; Ali, Siraj; Miller, Vincent A; Stephens, Philip J; Ross, Jeffrey S
The National Comprehensive Cancer Network (NCCN) guidelines for patients with metastatic non-small cell lung cancer (NSCLC) recommend testing for EGFR, BRAF, ERBB2, and MET mutations; ALK, ROS1, and RET rearrangements; and MET amplification. We investigated the feasibility and utility of comprehensive genomic profiling (CGP), a hybrid capture-based next-generation sequencing (NGS) test, in clinical practice. CGP was performed to a mean coverage depth of 576× on 6,832 consecutive cases of NSCLC (2012-2015). Genomic alterations (GAs) (point mutations, small indels, copy number changes, and rearrangements) involving EGFR, ALK, BRAF, ERBB2, MET, ROS1, RET, and KRAS were recorded. We also evaluated lung adenocarcinoma (AD) cases without GAs, involving these eight genes. The median age of the patients was 64 years (range: 13-88 years) and 53% were female. Among the patients studied, 4,876 (71%) harbored at least one GA involving EGFR (20%), ALK (4.1%), BRAF (5.7%), ERBB2 (6.0%), MET (5.6%), ROS1 (1.5%), RET (2.4%), or KRAS (32%). In the remaining cohort of lung AD without these known drivers, 273 cancer-related genes were altered in at least 0.1% of cases, including STK11 (21%), NF1 (13%), MYC (9.8%), RICTOR (6.4%), PIK3CA (5.4%), CDK4 (4.3%), CCND1 (4.0%), BRCA2 (2.5%), NRAS (2.3%), BRCA1 (1.7%), MAP2K1 (1.2%), HRAS (0.7%), NTRK1 (0.7%), and NTRK3 (0.2%). CGP is practical and facilitates implementation of the NCCN guidelines for NSCLC by enabling simultaneous detection of GAs involving all seven driver oncogenes and KRAS. Furthermore, without additional tissue use or cost, CGP identifies patients with "pan-negative" lung AD who may benefit from enrollment in mechanism-driven clinical trials. National Comprehensive Cancer Network guidelines for patients with metastatic non-small cell lung cancer (NSCLC) recommend testing for several genomic alterations (GAs). The feasibility and utility of comprehensive genomic profiling were studied in NSCLC and in lung adenocarcinoma
Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W
We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p=6×10−4). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302
Zhao, Wenming; Wang, Jing; He, Ximiao
Rice is a major food staple for the world's population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoting itself to sequencing, information analysis and biological research of the rice and other crop genomes. In order to facilitate....... Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (http://rise.genomics.org.cn/). Publication date: 2004-Jan-1...
Clovers (genus Trifolium) are a large and widespread genus of legumes. A number of clovers are of agricultural importance as forage crops in grassland agriculture, particularly in temperate areas. White clover (Trifolium repens L.) is used in grazed pasture and red clover (T. pratense L.) is widely cut and conserved as a winter feed. For the diploid red clover, genetic and genomic tools and resources have developed rapidly over the last five years, including genetic and physical maps, BAC (bacterial artificial chromosome) end sequence and transcriptome sequence information. This has paved the way for the use of genome wide selection and high throughput phenotyping in germplasm development. For the allotetraploid white clover, progress has been slower, although marker assisted selection is in use and relatively robust genetic maps and QTL (quantitative trait locus) information now exist. For both species the sequencing of the model legume Medicago truncatula gene space is an important development to aid genomic, biological and evolutionary studies. The first genetic maps of another species, subterranean clover (Trifolium subterraneum L.), have also been published and its comparative genomics with red clover and M. truncatula conducted. Next generation sequencing brings the potential to revolutionize clover genomics, but international consortia and effective use of germplasm, novel population structures and phenomics will be required to carry out effective translation into breeding. Another avenue for clover genomic and genetic improvement is interspecific hybridization. This approach has considerable potential with regard to crop improvement but also opens windows of opportunity for studies of biological and evolutionary processes.
Optimizing Hybrid de Novo Transcriptome Assembly and Extending Genomic Resources for Giant Freshwater Prawns (Macrobrachium rosenbergii): The Identification of Genes and Markers Associated with Reproduction
The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean, is currently the world’s most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely on other freshwater prawn species in the genus Macrobrachium.
Li, Dan; Zhou, Zhongliang; Si, Yafei; Xu, Yongjian; Shen, Chi; Wang, Yiyang; Wang, Xiao
of health human resource. The tough issue of HHR inequality should be addressed by comprehensive measures from a multidisciplinary perspective.
White, Richard A.; Brown, Joseph M.; Colby, Sean M.; Overall, Christopher C.; Lee, Joon-Yong; Zucker, Jeremy D.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.
ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy to install through bioconda, maintained as open-source on GitHub, and implemented in Snakemake for modular customizable workflows.
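The contig-level taxonomy step described above (majority voting over open reading frames with an LCA-style roll-up) can be illustrated with a small sketch. This is not the ATLAS implementation: the function name `contig_taxonomy`, the `min_fraction` parameter, and the tuple-based lineage format are assumptions made for illustration only. The sketch walks down the ranks and keeps the deepest lineage prefix that more than `min_fraction` of all ORFs on the contig agree on.

```python
from collections import Counter


def contig_taxonomy(orf_lineages, min_fraction=0.5):
    """Hypothetical helper: deepest lineage prefix backed by a majority of ORFs.

    orf_lineages: list of tuples ordered from highest to lowest rank,
    e.g. ("Bacteria", "Proteobacteria", "Escherichia").
    """
    if not orf_lineages:
        return ()
    consensus = ()
    depth = 0
    while True:
        depth += 1
        # Only ORFs that agree with the consensus so far, and that are
        # annotated at this depth, can vote for the next rank.
        prefixes = [
            tuple(lineage[:depth])
            for lineage in orf_lineages
            if len(lineage) >= depth and tuple(lineage[:depth - 1]) == consensus
        ]
        if not prefixes:
            return consensus
        prefix, votes = Counter(prefixes).most_common(1)[0]
        # Votes are measured against all ORFs on the contig, so missing or
        # conflicting annotations pull the assignment up to a higher rank.
        if votes / len(orf_lineages) <= min_fraction:
            return consensus
        consensus = prefix


# Toy lineages, invented for illustration.
orfs = [
    ("Bacteria", "Proteobacteria", "Escherichia"),
    ("Bacteria", "Proteobacteria", "Salmonella"),
    ("Bacteria", "Proteobacteria", "Escherichia"),
    ("Bacteria", "Firmicutes", "Bacillus"),
]
print(contig_taxonomy(orfs))  # ('Bacteria', 'Proteobacteria')
```

In this toy case only two of four ORFs agree at the genus level, so the contig is assigned at the phylum level instead, which is the conservative behaviour an LCA-style roll-up is meant to give.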
Brown, T. A. (Terence A.)
... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...
Turinsky Andrei L
Background. The Bluejay genome browser has been developed over several years to address the challenges posed by the ever increasing number of data types as well as the increasing volume of data in genome research. Beginning with a browser capable of rendering views of XML-based genomic information and providing scalable vector graphics output, we have now completed version 1.0 of the system with many additional features. Our development efforts were guided by our observation that biologists who use both gene expression profiling and comparative genomics gain functional insights above and beyond those provided by traditional per-gene analyses. Results. Bluejay 1.0 is a genome viewer integrating genome annotation with: (i) gene expression information; and (ii) comparative analysis with an unlimited number of other genomes in the same view. This allows the biologist to see a gene not just in the context of its genome, but also its regulation and its evolution. Bluejay now has rich provision for personalization by users: (i) numerous display customization features; (ii) the availability of waypoints for marking multiple points of interest on a genome and subsequently utilizing them; and (iii) the ability to take user relevance feedback of annotated genes or textual items to offer personalized recommendations. Bluejay 1.0 also embeds the Seahawk browser for the Moby protocol, enabling users to seamlessly invoke hundreds of Web Services on genomic data of interest without any hard-coding. Conclusion. Bluejay offers a unique set of customizable genome-browsing features, with the goal of allowing biologists to quickly focus on, analyze, compare, and retrieve related information on the parts of the genomic data they are most interested in. We expect these capabilities of Bluejay to benefit the many biologists who want to answer complex questions using the information available from completely sequenced genomes.
Iquebal, M A; Jaiswal, Sarika; Mahato, Ajay Kumar; Jayaswal, Pawan K; Angadi, U B; Kumar, Neeraj; Sharma, Nimisha; Singh, Anand K; Srivastav, Manish; Prakash, Jai; Singh, S K; Khan, Kasim; Mishra, Rupesh K; Rajan, Shailendra; Bajpai, Anju; Sandhya, B S; Nischita, Puttaraju; Ravishankar, K V; Dinesh, M R; Rai, Anil; Kumar, Dinesh; Sharma, Tilak R; Singh, Nagendra K
To understand the resource features and geology in the deep Jinchuan nickel deposit, difficult geological conditions were systematically analyzed, including high stress, fragmentized ore rock, prevalent deformation, difficult tunnel support, complicated rock mechanics, and low mining recovery. An integrated technology package was built for safe, efficient, and continuous mining in a deep, massive, and complex nickel and cobalt mine. This was done by the invention of a large-area continuous mining method with honeycomb drives; the establishment of ground control theory and a technology package for high-stress and fragmented ore rock; and the development of a new type of backfilling cement material, along with a deep backfilling technology that comprises the pipeline transport of high-density slurry with coarse aggregates. In this way, good solutions to existing problems were found to permit the efficient exploitation and comprehensive utilization of the resources in the deep Jinchuan nickel mine. In addition, a technological demonstration in an underground mine was performed using the cemented undercut-and-fill mining method for stressful, fragmented, and rheological rock.
Huang, Jingshan; Eilbeck, Karen; Smith, Barry; Blake, Judith A; Dou, Dejing; Huang, Weili; Natale, Darren A; Ruttenberg, Alan; Huan, Jun; Zimmermann, Michael T; Jiang, Guoqian; Lin, Yu; Wu, Bin; Strachan, Harrison J; He, Yongqun; Zhang, Shaojie; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming
In recent years, sequencing technologies have enabled the identification of a wide range of non-coding RNAs (ncRNAs). Unfortunately, annotation and integration of ncRNA data has lagged behind their identification. Given the large quantity of information being obtained in this area, there emerges an urgent need to integrate what is being discovered by a broad range of relevant communities. To this end, the Non-Coding RNA Ontology (NCRO) is being developed to provide a systematically structured and precisely defined controlled vocabulary for the domain of ncRNAs, thereby facilitating the discovery, curation, analysis, exchange, and reasoning of data about structures of ncRNAs, their molecular and cellular functions, and their impacts upon phenotypes. The goal of NCRO is to serve as a common resource for annotations of diverse research in a way that will significantly enhance integrative and comparative analysis of the myriad resources currently housed in disparate sources. It is our belief that the NCRO ontology can perform an important role in the comprehensive unification of ncRNA biology and, indeed, fill a critical gap in both the Open Biological and Biomedical Ontologies (OBO) Library and the National Center for Biomedical Ontology (NCBO) BioPortal. Our initial focus is on the ontological representation of small regulatory ncRNAs, which we see as the first step in providing a resource for the annotation of data about all forms of ncRNAs. The NCRO ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/ncro.owl.
Nakagawa, So; Takahashi, Mahoko Ueda
In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.
Sharpton, Thomas J; Jospin, Guillaume; Wu, Dongying; Langille, Morgan G I; Pollard, Katherine S; Eisen, Jonathan A
New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as "Sifting Families," or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology-based analyses. We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
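The "sifting" idea in the abstract above (first remove homologs of known families, then cluster only what remains) can be sketched as follows. This is a toy illustration under stated assumptions, not the SFams pipeline: the function names, the 0.8 threshold, the greedy clustering, and the use of `difflib.SequenceMatcher` as a stand-in for a global alignment score are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for a global alignment score (assumption, not the authors' method).
    return SequenceMatcher(None, a, b).ratio()


def sift_and_cluster(sequences, known_families, threshold=0.8):
    """Hypothetical sketch: sift known homologs out, then cluster the rest.

    known_families maps a family name to its member sequences; the first
    member is treated as the family representative.
    """
    families = {name: list(members) for name, members in known_families.items()}

    # Step 1: sift. Sequences similar to a known representative join that
    # family and never enter the more expensive de novo clustering step.
    leftover = []
    for seq in sequences:
        for members in families.values():
            if similarity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            leftover.append(seq)

    # Step 2: greedy de novo clustering of the sequences that remain.
    new_families = {}
    for seq in leftover:
        for members in new_families.values():
            if similarity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            new_families[f"new_{len(new_families) + 1}"] = [seq]

    families.update(new_families)
    return families


# Toy peptide strings, invented for illustration.
known = {"famA": ["MKTAYIAKQR"]}
result = sift_and_cluster(["MKTAYIAKQK", "GGGGSSSSPP", "GGGGSSSSPA"], known)
print(sorted(result))
```

The point of the design is that the de novo step only sees `leftover`, so its cost depends on the number of genuinely novel sequences and not on the full input size.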
Gupta, Sonal; Nawaz, Kashif; Parween, Sabiha; Roy, Riti; Sahu, Kamlesh; Kumar Pole, Anil; Khandal, Hitaishi; Srivastava, Rishi; Kumar Parida, Swarup; Chattopadhyay, Debasis
Cicer reticulatum L. is the wild progenitor of the fourth most important legume crop chickpea (C. arietinum L.). We assembled short-read sequences into 416 Mb draft genome of C. reticulatum and anchored 78% (327 Mb) of this assembly to eight linkage groups. Genome annotation predicted 25,680 protein-coding genes covering more than 90% of predicted gene space. The genome assembly shared a substantial synteny and conservation of gene orders with the genome of the model legume Medicago truncatula. Resistance gene homologs of wild and domesticated chickpeas showed high sequence homology and conserved synteny. Comparison of gene sequences and nucleotide diversity using 66 wild and domesticated chickpea accessions suggested that the desi type chickpea was genetically closer to the wild species than the kabuli type. Comparative analyses predicted gene flow between the wild and the cultivated species during domestication. Molecular diversity and population genetic structure determination using 15,096 genome-wide single nucleotide polymorphisms revealed an admixed domestication pattern among cultivated (desi and kabuli) and wild chickpea accessions belonging to three population groups reflecting significant influence of parentage or geographical origin for their cultivar-specific population classification. The assembly and the polymorphic sequence resources presented here would facilitate the study of chickpea domestication and targeted use of wild Cicer germplasms for agronomic trait improvement in chickpea. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
MicroRNAs (miRNAs) are important regulators of many cellular processes and exist in a wide range of eukaryotes. High-throughput sequencing is a mainstream method of miRNA identification through which it is possible to obtain the complete small RNA profile of an organism. Currently, most approaches to miRNA identification rely on a reference genome for the prediction of hairpin structures. However, many species of economic and phylogenetic importance are non-model organisms without complete genome sequences, and this limits miRNA discovery. Here, to overcome this limitation, we have developed a contig-based miRNA identification strategy. We applied this method to a triploid species of edible banana (GCTCV-119, Musa spp. AAA group) and identified 180 pre-miRNAs and 314 mature miRNAs, which is three times more than were predicted by the available dataset-based methods (represented by EST+GSS). Based on the recently published miRNA data set of Musa acuminata, the recall rate and precision of our strategy are estimated to be 70.6% and 92.2%, respectively, significantly better than those of the EST+GSS-based strategy (10.2% and 50.0%, respectively). Our novel, efficient and cost-effective strategy facilitates the study of the functional and evolutionary role of miRNAs, as well as miRNA-based molecular breeding, in non-model species of economic or evolutionary interest.
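For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how recall and precision are conventionally computed when a set of predictions is compared against a reference set. The function name and the miRNA identifiers are hypothetical, and the toy numbers are not the study's data.

```python
def precision_recall(predicted, reference):
    """Compare predicted identifiers against a reference set.

    precision = true positives / all predictions
    recall    = true positives / all reference entries
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy identifiers, invented for illustration.
predicted = {"miR-a", "miR-b", "miR-c", "miR-x"}
reference = {"miR-a", "miR-b", "miR-c", "miR-d", "miR-e"}
precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

A strategy with high precision but low recall makes few false calls yet misses many known miRNAs, which is why the abstract reports both figures for each method.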
Background. Proteinaceous toxins are observed across all levels of inter-organismal and intra-genomic conflicts. These include recently discovered prokaryotic polymorphic toxin systems implicated in intra-specific conflicts. They are characterized by a remarkable diversity of C-terminal toxin domains generated by recombination with standalone toxin-coding cassettes. Prior analysis revealed a striking diversity of nuclease and deaminase domains among the toxin modules. We systematically investigated polymorphic toxin systems using comparative genomics, sequence and structure analysis. Results. Polymorphic toxin systems are distributed across all major bacterial lineages and are delivered by at least eight distinct secretory systems. In addition to type-II, these include type-V, VI, VII (ESX), and the poorly characterized “Photorhabdus virulence cassettes (PVC)”, PrsW-dependent and MuF phage-capsid-like systems. We present evidence that trafficking of these toxins is often accompanied by autoproteolytic processing catalyzed by HINT, ZU5, PrsW, caspase-like, papain-like, and a novel metallopeptidase associated with the PVC system. We identified over 150 distinct toxin domains in these systems. These span an extraordinary catalytic spectrum to include 23 distinct clades of peptidases, numerous previously unrecognized versions of nucleases and deaminases, ADP-ribosyltransferases, ADP ribosyl cyclases, RelA/SpoT-like nucleotidyltransferases, glycosyltransferases and other enzymes predicted to modify lipids and carbohydrates, and a pore-forming toxin domain. Several of these toxin domains are shared with host-directed effectors of pathogenic bacteria. Over 90 families of immunity proteins might neutralize anywhere between a single to at least 27 distinct types of toxin domains. In some organisms multiple tandem immunity genes or immunity protein domains are organized into polyimmunity loci or polyimmunity proteins. Gene-neighborhood-analysis of
Webb, Kristen M; Rosenthal, Benjamin M
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence was found to exceed 99.3%. Genome content and gene order was not found to be significantly different from that of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple
Wang, Daxi; Korhonen, Pasi K; Gasser, Robin B; Young, Neil D
Clonorchis sinensis (family Opisthorchiidae) is an important foodborne parasite that has a major socioeconomic impact on ~35 million people predominantly in China, Vietnam, Korea and the Russian Far East. In humans, infection with C. sinensis causes clonorchiasis, a complex hepatobiliary disease that can induce cholangiocarcinoma (CCA), a malignant cancer of the bile ducts. Central to understanding the epidemiology of this disease is knowledge of genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species, evidence of karyotypic variation within C. sinensis and cryptic species within a related opisthorchiid fluke (Opisthorchis viverrini) emphasise the importance of studying and comparing the genes and genomes of geographically distinct isolates of C. sinensis. Recently, we sequenced, assembled and characterised a draft nuclear genome of a C. sinensis isolate from Korea and compared it with a published draft genome of a Chinese isolate of this species using a bioinformatic workflow established for comparing draft genome assemblies and their gene annotations. We identified that 50.6% and 51.3% of the Korean and Chinese C. sinensis genomic scaffolds were syntenic, respectively. Within aligned syntenic blocks, the genomes had a high level of nucleotide identity (99.1%) and encoded 15 variable proteins likely to be involved in diverse biological processes. Here, we review current technical challenges of using draft genome assemblies to undertake comparative genomic analyses to quantify genetic variation between isolates of the same species. Using a workflow that overcomes these challenges, we report on a high-quality draft genome for C. sinensis from Korea and comparative genomic analyses, as a basis for future investigations of the genetic structures of C. sinensis populations, and discuss the biotechnological implications of these explorations. Copyright © 2018
Lepidopterans (butterflies and moths) are a rich and diverse order of insects, which, despite their economic impact and unusual biological properties, are relatively underrepresented in terms of genomic resources. The genome of the silkworm Bombyx mori has been fully sequenced, but comparative lepidopteran genomics has been hampered by the scarcity of information for other species. This is especially striking for butterflies, even though they have diverse and derived phenotypes (such as color vision and wing color patterns) and are considered prime models for the evolutionary and developmental analysis of ecologically relevant, complex traits. We focus on Bicyclus anynana butterflies, a laboratory system for studying the diversification of novelties and serially repeated traits. With a panel of 12 small families and a biphasic mapping approach, we first assigned 508 expressed genes to segregation groups and then ordered 297 of them within individual linkage groups. We also coarsely mapped seven color pattern loci. This is the richest gene-based map available for any butterfly species and allowed for a broad-coverage analysis of synteny with the lepidopteran reference genome. Based on 462 pairs of mapped orthologous markers in Bi. anynana and Bo. mori, we observed strong conservation of gene assignment to chromosomes, but also evidence for numerous large- and small-scale chromosomal rearrangements. With gene collections growing for a variety of target organisms, the ability to place those genes in their proper genomic context is paramount. Methods to map expressed genes and to compare maps with relevant model systems are crucial to extend genomic-level analysis outside classical model species. Maps with gene-based markers are useful for comparative genomics and to resolve mapped genomic regions to a tractable number of candidate genes, especially if there is synteny with related model species. This is discussed in relation to the identification of
Background: Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results: Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion: We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data
Pujolar, J.M.; Jacobsen, M.W.; Frydenberg, J.
Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the Eu...... 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome...
Hüser Andrea T
Background: The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. Results: To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: known regulatory networks are confirmed, and CoryneCenter additionally points out further regulatory interactions. Conclusion: CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focusses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data.
The system is publicly available at http://www.CoryneCenter.de.
Bauer, Patricia J; Blue, Shala N; Xu, Aoxiang; Esposito, Alena G
We investigated 7- to 10-year-old children's productive extension of semantic memory through self-generation of new factual knowledge derived through integration of separate yet related facts learned through instruction or through reading. In Experiment 1, an experimenter read the to-be-integrated facts. Children successfully learned and integrated the information and used it to further extend their semantic knowledge, as evidenced by high levels of correct responses in open-ended and forced-choice testing. In Experiment 2, on half of the trials, the to-be-integrated facts were read by an experimenter (as in Experiment 1) and on half of the trials, children read the facts themselves. Self-generation performance was high in both conditions (experimenter- and self-read); in both conditions, self-generation of new semantic knowledge was related to an independent measure of children's reading comprehension. In Experiment 3, the way children deployed cognitive resources during reading was predictive of their subsequent recall of newly learned information derived through integration. These findings indicate self-generation of new semantic knowledge through integration in school-age children as well as relations between this productive means of extension of semantic memory and cognitive processes engaged during reading. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Tello-Ruiz, Marcela K; Naithani, Sushma; Stein, Joshua C; Gupta, Parul; Campbell, Michael; Olson, Andrew; Wei, Sharon; Preece, Justin; Geniza, Matthew J; Jiao, Yinping; Lee, Young Koung; Wang, Bo; Mulvaney, Joseph; Chougule, Kapeel; Elser, Justin
Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversi...
Keane, Michael; Craig, Thomas; Alfoldi, Jessica; Berlin, Aaron M; Johnson, Jeremy; Seluanov, Andrei; Gorbunova, Vera; Di Palma, Federica; Lindblad-Toh, Kerstin; Church, George M; de Magalhaes, Joao Pedro
MOTIVATION: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. RESULTS: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, inc...
Rebell, Michael A.; Wolff, Jessica R.
This fifth installment in a five-part series states that, if comprehensive educational opportunity is conceived as a right, then the state must commit to providing it and must develop a policy infrastructure to assure broad access, uniform quality, regularized funding, and firm accountability strictures to ensure all students a meaningful opportunity to…
Jayakumar, Vasanthan; Sakakibara, Yasubumi
Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.
Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original, recently developed method (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to maintaining accurate resources. © The Author 2017. Published by Oxford University Press.
Shan, Wenbo; Jiang, Yanqin; Han, Jinlei; Wang, Kai
Cotton is the world’s most important natural fiber crop. It is also a model system for studying polyploidization, genomic organization, and genome-size variation. Integrating the cytological characterization of cotton with its genetic map will be essential for understanding its genome structure and evolution, as well as for performing further genetic-map based mapping and cloning. In this study, we isolated a complete set of bacterial artificial chromosome clones anchored to each of the 52 chromosome arms of the tetraploid cotton Gossypium hirsutum. Combining these with telomere and centromere markers, we constructed a standard karyotype for the G. hirsutum inbred line TM-1. We dissected the chromosome arm localizations of the 45S and 5S rDNA and suggest a centromere repositioning event in the homoeologous chromosomes AT09 and DT09. By integrating a systematic karyotype analysis with the genetic linkage map, we observed different genome sizes and chromosomal structures between the subgenomes of the tetraploid cotton and those of its diploid ancestors. Using evidence of conserved coding sequences, we suggest that the different evolutionary paths of non-coding retrotransposons account for most of the variation in size between the subgenomes of tetraploid cotton and its diploid ancestors. These results provide insights into the cotton genome and will facilitate further genome studies in G. hirsutum.
Stenson, Peter D; Mort, Matthew; Ball, Edward V; Shaw, Katy; Phillips, Andrew; Cooper, David N
The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.
Perry, George H; Reeves, Darryl; Melsted, Páll; Ratan, Aakrosh; Miller, Webb; Michelini, Katelyn; Louis, Edward E; Pritchard, Jonathan K; Mason, Christopher E; Gilad, Yoav
We present a high-coverage draft genome assembly of the aye-aye (Daubentonia madagascariensis), a highly unusual nocturnal primate from Madagascar. Our assembly totals ~3.0 billion bp (3.0 Gb), roughly the size of the human genome, comprised of ~2.6 million scaffolds (N50 scaffold size = 13,597 bp) based on short paired-end sequencing reads. We compared the aye-aye genome sequence data with four other published primate genomes (human, chimpanzee, orangutan, and rhesus macaque) as well as with the mouse and dog genomes as nonprimate outgroups. Unexpectedly, we observed strong evidence for a relatively slow substitution rate in the aye-aye lineage compared with these and other primates. In fact, the aye-aye branch length is estimated to be ~10% shorter than that of the human lineage, which is known for its low substitution rate. This finding may be explained, in part, by the protracted aye-aye life-history pattern, including late weaning and age of first reproduction relative to other lemurs. Additionally, the availability of this draft lemur genome sequence allowed us to polarize nucleotide and protein sequence changes to the ancestral primate lineage, a critical period in primate evolution for which the relevant fossil record is sparse. Finally, we identified 293,800 high-confidence single nucleotide polymorphisms in the donor individual for our aye-aye genome sequence, a captive-born individual from two wild-born parents. The resulting heterozygosity estimate of 0.051% is the lowest of any primate studied to date, which is understandable considering the aye-aye's extensive home-range size and relatively low population densities. Yet this level of genetic diversity also suggests that conservation efforts benefiting this unusual species should be prioritized, especially in the face of the accelerating degradation and fragmentation of Madagascar's forests.
Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org.
Human mitochondrial DNA (mtDNA) encodes a set of 37 genes which are essential structural and functional components of the electron transport chain. Variations in these genes have been implicated in a broad spectrum of diseases and are extensively reported in literature and various databases. In this study, we describe MitoLSDB, an integrated platform to catalogue disease association studies on mtDNA (http://mitolsdb.igib.res.in). The main goal of MitoLSDB is to provide a central platform for direct submissions of novel variants that can be curated by the Mitochondrial Research Community. MitoLSDB provides access to standardized and annotated data from literature and databases encompassing information from 5231 individuals, 675 populations and 27 phenotypes. This platform is developed using the Leiden Open (source) Variation Database (LOVD) software. MitoLSDB houses information on all 37 genes in each population, amounting to 132397 variants, of which 5147 are unique. For each variant, its genomic location as per the Revised Cambridge Reference Sequence, codon and amino acid change for variations in protein-coding regions, frequency, disease/phenotype, population, reference and remarks are also listed. MitoLSDB curators have also reported errors documented in literature, which include 94 phantom mutations, 10 NUMTs, six documentation errors and one artefactual recombination. MitoLSDB is the largest repository of mtDNA variants systematically standardized and presented using the LOVD platform. We believe that this is a good starting resource to curate mtDNA variants and will facilitate direct submissions enhancing data coverage, annotation in context of pathogenesis and quality control by ensuring non-redundancy in reporting novel disease associated variants.
Background: Because of its size, allohexaploid nature and high repeat content, the wheat genome has always been perceived as too complex for efficient molecular studies. We recently constructed the first physical map of a wheat chromosome (3B). However, gene mapping is still laborious in wheat because of high redundancy between the three homoeologous genomes. In contrast, in the closely related diploid species, barley, numerous gene-based markers have been developed. This study aims at combining the unique genomic resources developed in wheat and barley to decipher the organisation of gene space on wheat chromosome 3B. Results: Three dimensional pools of the minimal tiling path of wheat chromosome 3B physical map were hybridised to a barley Agilent 15K expression microarray. This led to the fine mapping of 738 barley orthologous genes on wheat chromosome 3B. In addition, comparative analyses revealed that 68% of the genes identified were syntenic between the wheat chromosome 3B and barley chromosome 3H and 59% between wheat chromosome 3B and rice chromosome 1, together with some wheat-specific rearrangements. Finally, it indicated an increasing gradient of gene density from the centromere to the telomeres positively correlated with the number of genes clustered in islands on wheat chromosome 3B. Conclusion: Our study shows that novel structural genomics resources now available in wheat and barley can be combined efficiently to overcome specific problems of genetic anchoring of physical contigs in wheat and to perform high-resolution comparative analyses with rice for deciphering the organisation of the wheat gene space.
Sharpton Thomas J
Background: New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. Results: We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as “Sifting Families,” or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology-based analyses. Conclusions: We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
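The sifting idea in this abstract (remove homologs of known families first, then cluster only what remains) can be illustrated with a toy Python sketch of a single round. Everything in it is an assumption made for illustration: the function names, the 0.8 threshold, the greedy clustering, and the use of `difflib.SequenceMatcher` as a stand-in for a global-alignment similarity score. It is not the authors' pipeline.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a global-alignment similarity score (0 to 1)."""
    return SequenceMatcher(None, a, b).ratio()


def sift_and_cluster(proteins, known_families, threshold=0.8):
    """One sifting round.

    known_families maps a family name to one representative sequence.
    Returns (assignments, new_families): sequences matched to a known
    family, and greedy clusters built only from the unmatched leftovers.
    """
    assignments = {}
    leftovers = []
    for seq in proteins:
        # Sift: a sequence similar to any known representative is removed
        # from the search for new families.
        hit = next(
            (name for name, rep in known_families.items()
             if similarity(seq, rep) >= threshold),
            None,
        )
        if hit is None:
            leftovers.append(seq)
        else:
            assignments[seq] = hit

    new_families = []  # each family is a list; its first member is the representative
    for seq in leftovers:
        for family in new_families:
            if similarity(seq, family[0]) >= threshold:
                family.append(seq)
                break
        else:
            new_families.append([seq])
    return assignments, new_families


# Made-up sequences, not data from the study.
known = {"famA": "MKTAYIAKQR"}
proteins = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP", "GGGGSSSSPA", "WWWWHHHHCC"]
assignments, new_families = sift_and_cluster(proteins, known)
```

The saving comes from the second loop: the all-against-representatives comparison runs only over the leftovers, so the cost of discovering new families shrinks as the catalog of known families grows.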
Lin, Douglas I; Chudnovsky, Yakov; Duggan, Bridget; Zajchowski, Deborah; Greenbowe, Joel; Ross, Jeffrey S; Gay, Laurie M; Ali, Siraj M; Elvin, Julia A
Small cell carcinoma of the ovary, hypercalcemic-type (SCCOHT) is a rare, extremely aggressive neoplasm that usually occurs in young women and is characterized by deleterious germline or somatic SMARCA4 mutations. We performed comprehensive genomic profiling (CGP) to potentially identify additional clinically and pathophysiologically relevant genomic alterations in SCCOHT. CGP assessment of all classes of coding alterations in up to 406 genes commonly altered in cancer and intronic regions for up to 31 genes commonly rearranged in cancer was performed on 18 SCCOHT cases (16 exhibiting classic morphology and 2 cases exhibiting exclusively large cell variant morphology). In addition, a retrospective database search for clinically advanced ovarian tumors with genomic profiles similar to SCCOHT yielded 3 additional cases originally diagnosed as non-SCCOHT. CGP revealed inactivating SMARCA4 alterations and low tumor mutational burden (TMB) (<6 mutations/Mb) in 94% (15/16) of SCCOHT with classic morphology. In contrast, both (2/2) cases exhibiting only large cell variant morphology were hypermutated (TMB scores of 90 and 360 mutations/Mb) and were wildtype for SMARCA4. In our retrospective search, an index ovarian cancer patient harboring inactivating SMARCA4 alterations, initially diagnosed as endometrioid carcinoma, was re-classified as SCCOHT and responded to an SCCOHT chemotherapy regimen. The vast majority of SCCOHT demonstrate genomic SMARCA4 loss with only rare co-occurring alterations. Our data support a role for CGP in the diagnosis and management of SCCOHT and of other lesions with overlapping histological and clinical features, since identifying the former by genomic profile suggests benefit from an appropriate regimen and treatment decisions, as illustrated by an index patient. Copyright © 2017 Elsevier Inc. All rights reserved.
Background: De novo sequencing the entire genome of a large complex plant genome like the one of barley (Hordeum vulgare L.) is a major challenge both in terms of experimental feasibility and costs. The emergence and breathtaking progress of next generation sequencing technologies has put this goal into focus and a clone based strategy combined with the 454/Roche technology is conceivable. Results: To test the feasibility, we sequenced 91 barcoded, pooled, gene containing barley BACs using the GS FLX platform and assembled the sequences under iterative change of parameters. The BAC assemblies were characterized by N50 of ~50 kb (N80 ~31 kb, N90 ~21 kb) and a Q40 of 94%. For ~80% of the clones, the best assemblies consisted of fewer than 10 contigs at 24-fold mean sequence coverage. Moreover, we show that gene containing regions seem to assemble completely and uninterrupted, thus making the approach suitable for detecting complete and positionally anchored genes. By comparing the assemblies of four clones to their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences as well as the consistency and reliability of the assemblies. Conclusion: The described multiplex 454 sequencing of barcoded BACs leads to sequence consensi highly representative for the clones. Assemblies are correct for the majority of contigs. Though the resolution of complex repetitive structures requires additional experimental efforts, our approach paves the way for a clone based strategy of sequencing the barley genome.
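For readers unfamiliar with the N50, N80 and N90 statistics quoted in this abstract, the following Python sketch shows the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches x% of all assembled bases. The function name `nx` and the example lengths are illustrative assumptions, not values or code from the study.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of contig lengths.

    Contigs are taken longest first; the result is the length of the contig
    at which the running total reaches at least x% of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Hypothetical contig lengths (bp), not data from the study.
contigs = [2, 3, 4, 5, 6]
n50, n80, n90 = nx(contigs, 50), nx(contigs, 80), nx(contigs, 90)
```

Because a higher x requires more of the assembly to be covered, Nx can only stay the same or decrease as x grows, which matches the ordering N50 > N80 > N90 reported above.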
Kremkow, Benjamin G; Baik, Jong Youn; MacDonald, Madolyn L; Lee, Kelvin H
Chinese hamster ovary (CHO) cells are a major host cell line for the production of therapeutic proteins, and CHO cell and Chinese hamster (CH) genomes have recently been sequenced using next-generation sequencing methods. CHOgenome.org was launched in 2011 (version 1.0) to serve as a database repository and to provide bioinformatics tools for the CHO community. CHOgenome.org (version 1.0) maintained GenBank CHO-K1 genome data, identified CHO-omics literature, and provided a CHO-specific BLAST service. Recent major updates to CHOgenome.org (version 2.0) include new sequence and annotation databases for both CHO and CH genomes, a more user-friendly website, and new research tools, including a proteome browser and a genome viewer. CHO cell-line specific sequences and annotations facilitate cell line development opportunities, several of which are discussed. Moving forward, CHOgenome.org will host the increasing amount of CHO-omics data and continue to make useful bioinformatics tools available to the CHO community. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Varshney, R.K.; Song, C.; Saxena, R.K.; Azam, S.; Doležel, Jaroslav; Cook, D.R.
Vol. 31, No. 3 (2013), pp. 240-246. ISSN 1087-0156. Grant - others: GA MŠk (CZ) ED0007/01/01. Program: ED. Institutional research plan: CEZ:AV0Z50380511. Keywords: POPULATION-STRUCTURE * L. GENOME * ARABIDOPSIS. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 39.080, year: 2013
Background The Nile tilapia (Oreochromis niloticus) is the second most farmed fish species worldwide. It is also an important model for studies of fish physiology, particularly because of its broad tolerance to an array of environments. It is a good model to study evolutionary mechanisms in vertebrates, because of its close relationship to haplochromine cichlids, which have undergone rapid speciation in East Africa. The existing genomic resources for Nile tilapia include a genetic map, BAC end sequences and ESTs, but comparative genome analysis and maps of quantitative trait loci (QTL) are still limited. Results We have constructed a high-resolution radiation hybrid (RH) panel for the Nile tilapia and genotyped 1358 markers consisting of 850 genes, 82 markers corresponding to BAC end sequences, 154 microsatellites and 272 single nucleotide polymorphisms (SNPs). From these, 1296 markers could be associated in 81 RH groups, while 62 were not linked. The total size of the RH map is 34,084 cR3500 and 937,310 kb. It covers 88% of the entire genome with an estimated inter-marker distance of 742 Kb. Mapping of microsatellites enabled integration to the genetic map. We have merged LG8 and LG24 into a single linkage group, and confirmed that LG16-LG21 are also merged. The orientation and association of RH groups to each chromosome and LG was confirmed by chromosomal in situ hybridizations (FISH) of 55 BACs. Fifty RH groups were localized on the 22 chromosomes while 31 remained small orphan groups. Synteny relationships were determined between Nile tilapia, stickleback, medaka and pufferfish. Conclusion The RH map and associated FISH map provide a valuable gene-ordered resource for gene mapping and QTL studies. All genetic linkage groups with their corresponding RH groups now have a corresponding chromosome which can be identified in the karyotype. Placement of conserved segments indicated that multiple inter-chromosomal rearrangements have occurred between Nile tilapia
Miscanthus × giganteus is widely cultivated around the world as a potential biofuel feedstock; however, its narrow genetic basis and sterility limit its utilization. M. sinensis, a progenitor of M. × giganteus, is widely distributed across East Asia and shows good abiotic stress tolerance. To enrich the M. sinensis genomic databases and resources, we sequenced and annotated the transcriptome of M. sinensis using an Illumina HiSeq 2000 platform. Approximately 316 million high-quality trimmed reads were generated from 349 million raw reads, and a total of 114,747 unigenes were obtained after de novo assembly. Furthermore, 95,897 (83.57%) unigenes were annotated to at least one database including NR, Swiss-Prot, KEGG, COG, GO, and NT, supporting that the sequences obtained were annotated properly. Differentially expressed gene (DEG) analysis indicates that 15 days of drought stress could be a critical period for the response of M. sinensis to drought. The high-throughput transcriptome sequencing of M. sinensis under drought stress has greatly enriched the currently available genomic resources. The comparison of DEGs under different periods of drought stress identified a wealth of candidate genes involved in drought tolerance regulatory networks, which will facilitate further genetic improvement and molecular studies of M. sinensis.
Kodama, Yuichi; Mashima, Jun; Kaminuma, Eli; Gojobori, Takashi; Ogasawara, Osamu; Takagi, Toshihisa; Okubo, Kousaku; Nakamura, Yasukazu
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
Wenning Zheng; Naresh V.R. Mutha; Hamed Heydari; Avirup Dutta; Cheuk Chuen Siow; Nicholas S. Jakubovics; Wei Yee Wee; Shi Yang Tan; Mia Yang Ang; Guat Jah Wong; Siew Woh Choo
Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genome...
Schwarzer, David; Buettner, Falk F R; Browning, Christopher; Nazarov, Sergey; Rabsch, Wolfgang; Bethe, Andrea; Oberbeck, Astrid; Bowman, Valorie D; Stummeyer, Katharina; Mühlenhoff, Martina; Leiman, Petr G; Gerardy-Schahn, Rita
Bacteriophage phi92 is a large, lytic myovirus isolated in 1983 from pathogenic Escherichia coli strains that carry a polysialic acid capsule. Here we report the genome organization of phi92, the cryoelectron microscopy reconstruction of its virion, and the reinvestigation of its host specificity. The genome consists of a linear, double-stranded 148,612-bp DNA sequence containing 248 potential open reading frames and 11 putative tRNA genes. Orthologs were found for 130 of the predicted proteins. Most of the virion proteins showed significant sequence similarities to proteins of myoviruses rv5 and PVP-SE1, indicating that phi92 is a new member of the novel genus of rv5-like phages. Reinvestigation of phi92 host specificity showed that the host range is not limited to polysialic acid-encapsulated Escherichia coli but includes most laboratory strains of Escherichia coli and many Salmonella strains. Structure analysis of the phi92 virion demonstrated the presence of four different types of tail fibers and/or tailspikes, which enable the phage to use attachment sites on encapsulated and nonencapsulated bacteria. With this report, we provide the first detailed description of a multivalent, multispecies phage armed with a host cell adsorption apparatus resembling a nanosized Swiss army knife. The genome, structure, and, in particular, the organization of the baseplate of phi92 demonstrate how a bacteriophage can evolve into a multi-pathogen-killing agent.
Simon, M. I.; Kim, U.-J.
We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year
Jacquin, Laval; Cao, Tuong-Vi; Ahmadi, Nourollah
One objective of this study was to provide readers with a clear and unified understanding of the parametric statistical and kernel methods used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Several parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were then compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction, followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
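The reformulation of Ridge regression in terms of a kernel, as described in the abstract above, can be illustrated with a small numerical sketch. It is written in Python with NumPy rather than with the KRMM R package, and it does not reproduce KRMM's interface; the function names, the simulated marker matrix, the toy epistatic trait, the penalty `lam` and the Gaussian bandwidth are all assumptions made for illustration. The model has no intercept, to keep the algebra short.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2)) for all row pairs."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_dual(K, y, lam):
    """Solve (K + lam * I) alpha = y, the dual (kernel) form of ridge regression."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


rng = np.random.default_rng(0)
X_train = rng.integers(0, 3, size=(60, 20)).astype(float)  # simulated 0/1/2 markers
X_test = rng.integers(0, 3, size=(5, 20)).astype(float)
# Toy trait with a marker-by-marker interaction (epistasis) plus noise.
y_train = X_train[:, 0] * X_train[:, 1] + rng.normal(0.0, 0.1, size=60)
lam = 1.0

# Primal ridge regression: marker effects beta = (X'X + lam I)^-1 X'y.
p = X_train.shape[1]
beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p), X_train.T @ y_train)
pred_primal = X_test @ beta

# Dual form with the linear kernel K = XX' gives the same predictions.
alpha_lin = fit_dual(X_train @ X_train.T, y_train, lam)
pred_dual = (X_test @ X_train.T) @ alpha_lin

# Swapping in a Gaussian kernel turns the same solver into an RKHS regression.
alpha_rkhs = fit_dual(gaussian_kernel(X_train, X_train, 3.0), y_train, lam)
pred_rkhs = gaussian_kernel(X_test, X_train, 3.0) @ alpha_rkhs

print(np.allclose(pred_primal, pred_dual))
```

The point of the sketch is that only the kernel matrix changes between the linear (ridge/GBLUP-like) fit and the nonlinear RKHS fit; the solver stays the same, which is what makes non-additive architectures accessible without enumerating interaction terms.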
Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.
Brauchli, Rebecca; Jenny, Gregor J.; Füllemann, Désirée; Bauer, Georg F.
Studies using the Job Demands-Resources (JD-R) model commonly have a heterogeneous focus concerning the variables they investigate: selective job demands and resources as well as burnout and work engagement. The present study applies the rationale of the JD-R model to expand the relevant outcomes of job demands and job resources by linking the JD-R model to the logic of a generic health development framework predicting more broadly positive and negative health. The resulting JD-R health model ...
Parrotia subaequalis is an endangered palaeoendemic tree from disjunct montane sites in eastern China. Due to the lack of effective genomic resources, the genetic diversity and population structure of this endangered species are not clearly understood. In this study, we conducted paired-end shotgun sequencing (2 × 125 bp) of genomic DNA for two individuals of P. subaequalis on the Illumina HiSeq platform. Based on the resulting sequences, we successfully assembled the complete chloroplast genome of P. subaequalis and identified polymorphic chloroplast microsatellites (cpSSRs), nuclear microsatellites (nSSRs) and mutational hotspots of the chloroplast. Ten polymorphic cpSSR loci and 12 polymorphic nSSR loci were used to genotype 96 individuals of P. subaequalis from six populations to estimate genetic diversity and population structure. Our results revealed that P. subaequalis exhibited abundant genetic diversity (e.g., cpSSRs: Hcp = 0.862; nSSRs: HT = 0.559) and high genetic differentiation (e.g., cpSSRs: RST = 0.652; nSSRs: RST = 0.331), and was characterized by a low pollen-to-seed migration ratio (r ≈ 1.78). These genetic patterns are attributable to its long evolutionary history and low levels of contemporary inter-population gene flow by pollen and seed. In addition, the lack of an isolation-by-distance pattern and the strong population genetic structuring in both marker systems suggest that long-term isolation and/or habitat fragmentation as well as genetic drift may also have contributed to the geographic differentiation of P. subaequalis. Therefore, long-term habitat protection is the most important method to prevent further loss of genetic variation and a decrease in effective population size. Furthermore, both cpSSRs and nSSRs revealed that P. subaequalis populations consisted of three genetic clusters, which should be considered as separate conservation units.
Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold
Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377
Wheat, like many other staple cereals, contains low levels of the essential micronutrients iron and zinc. Up to two billion people worldwide suffer from iron and zinc deficiencies, particularly in regions with predominantly cereal-based diets. Although wheat flour is commonly fortified during processing, an attractive and more sustainable solution is biofortification, which requires developing new varieties of wheat with inherently higher iron and zinc content in their grains. Until now most studies aimed at increasing iron and zinc content in wheat grains have focused on discovering natural variation in progenitor or related species. However, recent developments in genomics and transformation have led to a step change in targeted research on wheat at a molecular level. We discuss promising approaches to improve iron and zinc content in wheat using knowledge gained in model grasses. We explore how the latest resources developed in wheat, including sequenced genomes and mutant populations, can be exploited for biofortification. We also highlight the key research and practical challenges that remain in improving iron and zinc content in wheat.
Jayakodi, Murukarthick; Selvan, Sreedevi Ghokhilamani; Natesan, Senthil; Muthurajan, Raveendran; Duraisamy, Raghu; Ramineni, Jana Jeevan; Rathinasamy, Sakthi Ambothi; Karuppusamy, Nageswari; Lakshmanan, Pugalenthi; Chokkappan, Mohan
The goal of our research is to establish a unique portal that presents the outcomes of research on the cassava crop. The Biogen base for cassava documents the variation in different traits of the germplasm maintained at the Tapioca and Castor Research Station, Tamil Nadu Agricultural University. Phenotypic and genotypic variations of the accessions are depicted so that users can browse and interpret them using the microsatellite markers. The database (BIOGEN BASE - CASSAVA) is designed using PHP and MySQL and is equipped with extensive search options. It is user-friendly and publicly available, and aims to improve cassava research and development by making a wealth of genetics and genomics data available through an open, common, worldwide forum for all individuals interested in the field. The database is available for free at http://www.tnaugenomics.com/biogenbase/casava.php.
Re-emergence of schistosomiasis in regions of China where control programs have ceased requires development of molecular-genetic tools to track gene flow and assess genetic diversity of Schistosoma populations. We identified many microsatellite loci in the draft genome of Schistosoma japonicum using defined search criteria and selected a subset for further analysis. From an initial panel of 50 loci, 20 new microsatellites were selected for eventual optimization and application to a panel of worms from endemic areas. All but one of the selected microsatellites contain simple tri-nucleotide repeats. Moderate to high levels of polymorphism were detected. Numbers of alleles ranged from 6 to 14 and observed heterozygosity was always >0.6. The loci reported here will facilitate high resolution population-genetic studies on schistosomes in re-emergent foci.
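The two per-locus summaries reported in the abstract above, number of alleles and observed heterozygosity, are simple counts over diploid genotypes. The sketch below shows the arithmetic; the function name `locus_summary` and the fragment sizes are hypothetical and are not taken from the study.

```python
def locus_summary(genotypes):
    """Summarize one microsatellite locus scored in diploid individuals.

    `genotypes` is a list of (allele_a, allele_b) pairs, e.g. fragment sizes
    in bp. Returns (number of distinct alleles, observed heterozygosity),
    where observed heterozygosity is the fraction of heterozygous individuals.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    alleles = set()
    heterozygotes = 0
    for allele_a, allele_b in genotypes:
        alleles.update((allele_a, allele_b))
        if allele_a != allele_b:
            heterozygotes += 1
    return len(alleles), heterozygotes / len(genotypes)


# Hypothetical fragment sizes (bp) for five worms at one tri-nucleotide locus.
toy_locus = [(150, 153), (150, 150), (153, 156), (156, 159), (150, 159)]
n_alleles, h_obs = locus_summary(toy_locus)
print(n_alleles, h_obs)
```

In this toy locus four of the five individuals carry two different alleles, so the observed heterozygosity is 0.8 with four distinct alleles.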
Islam, Md Shiful; Choudhury, Mouraj; Majlish, Al-Nahian Khan; Islam, Tahmina; Ghosh, Ajit
Glutathione S-transferases (GSTs) are ubiquitous enzymes that perform versatile functions including cellular detoxification and stress tolerance. In this study, a comprehensive genome-wide identification of the GST gene family was carried out in potato (Solanum tuberosum L.). The results demonstrated the presence of at least 90 GST genes in potato, which is greater than in any other reported species. According to the phylogenetic analyses of Arabidopsis, rice and potato GST members, GSTs could be subdivided into ten different classes, and each class was found to be highly conserved. The largest class of the potato GST family is tau with 66 members, followed by phi and lambda. The chromosomal localization analysis revealed a highly uneven distribution of StGST genes across the potato genome. Transcript profiling of 55 StGST genes showed tissue-specific expression for most of the members. Moreover, expression of StGST genes was mainly repressed in response to abiotic stresses, while largely induced in response to biotic and hormonal elicitations. Further analysis of StGST gene promoters identified the presence of various stress-responsive cis-regulatory elements. Moreover, one of the highly stress-responsive StGST members, StGSTU46, showed strong affinity towards flurazole, with the lowest binding energy of -7.6 kcal/mol, which could be used as an antidote to protect crops against herbicides. These findings will facilitate the further functional and evolutionary characterization of GST genes in potato. Copyright © 2017 Elsevier B.V. All rights reserved.
Ruttink, Tom; Roldán-Ruiz, Isabel; Asp, Torben
To advance the application of molecular breeding in Lolium perenne, we have generated a sequence resource to facilitate gene discovery and SNP marker development. Illumina GAII transcriptome sequencing was performed on meristem-enriched samples of 14 Lolium genotypes. De novo assemblies for indiv... of SNP markers in selected candidate genes. In parallel, a germplasm collection of 602 Lolium genotypes was established and is being phenotyped for plant architecture, reproductive characteristics, flowering time, and forage quality traits. We will test through association genetics whether phenotypic...
The sequencing of the full nuclear genome of sesame (Sesamum indicum L.) provides the platform for functional analyses of genome components and their application in breeding programs. Although the importance of microsatellite markers or simple sequence repeats (SSRs) in crop genotyping, genetics, and breeding applications is well established, little information exists concerning SSRs at the whole-genome level in sesame. In addition, SSRs represent a suitable marker type for sesame molecular breeding in developing countries, where it is mainly grown. In this study, we identified 138,194 genome-wide SSRs, of which 76.5% were physically mapped onto the 13 pseudo-chromosomes. Among these SSRs, up to three primer pairs were supplied for 101,930 SSRs and used to amplify in silico the reference genome together with two newly sequenced sesame accessions. A total of 79,957 SSRs (78%) were polymorphic between the three genomes, suggesting their promising use in different genomics-assisted breeding applications. From these polymorphic SSRs, 23 were selected and validated to have high polymorphic potential in 48 sesame accessions from different growing areas of Africa. Furthermore, we have developed an online user-friendly database, SisatBase (http://www.sesame-bioinfo.org/SisatBase/), which provides free access to SSR data as well as an integrated platform for functional analyses. Altogether, the reference SSRs and SisatBase should serve as useful resources for genetic assessment, genomic studies, and breeding advancement in sesame, especially in developing countries.
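Genome-wide SSR identification of the kind described above is commonly done by scanning the sequence for short motifs repeated in tandem a minimum number of times. The sketch below illustrates the idea with regular expressions; it is not the pipeline used for SisatBase, and the function names and the minimum-repeat thresholds in `MIN_REPEATS` are assumptions, since the abstract does not state the search criteria.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs, 0-based start.

    Simplification: only perfect repeats are reported, and a discarded
    non-primitive match is not rescanned for SSRs overlapping it.
    """
    seq = sequence.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        # One captured motif followed by at least n_min - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


toy_sequence = "GGC" + "AT" * 7 + "CCGTC" + "GAA" * 5 + "TTC"
print(find_ssrs(toy_sequence))
```

Each reported start position and motif would then feed primer design in the flanking sequence, which is the step the abstract refers to when it mentions primer pairs supplied per SSR.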
Vanwesenbeeck, Ine; Westeneng, Judith; de Boer, Thilly; Reinders, Jo; van Zorge, Ruth
Today, more than half of the world population is under the age of 25 years and one in four is under age 18. The urgency of expanding access to Comprehensive Sexuality Education (CSE) notably for children and young people in Africa and Asia is greater than ever before. However, many challenges to the implementation and delivery of CSE in resource…
Identification and elucidation of functions of plant genes is valuable for both basic and applied research. In addition to natural variation in model plants, numerous loss-of-function resources have been produced by mutagenesis with chemicals, irradiation, or insertions of transposable elements or T-DNA. However, we may be unable to observe loss-of-function phenotypes for genes with functionally redundant homologs, and for those essential for growth and development. To offset such disadvantages, gain-of-function transgenic resources have been exploited. Activation-tagged lines have been generated using obligatory overexpression of endogenous genes by random insertion of an enhancer. Recent progress in DNA sequencing technology and bioinformatics has enabled the preparation of genome-wide collections of full-length cDNAs (fl-cDNAs) in some model species. Using the fl-cDNA clones, a novel gain-of-function strategy, the Fl-cDNA OvereXpressor gene (FOX)-hunting system, has been developed. A mutant phenotype in a FOX line can be directly attributed to the overexpressed fl-cDNA. Investigating a large population of FOX lines could reveal important genes conferring favorable phenotypes for crop breeding. Alternatively, a unique loss-of-function approach, Chimeric REpressor gene Silencing Technology (CRES-T), has been developed. In CRES-T, overexpression of a chimeric repressor, composed of the coding sequence of a transcription factor (TF) and a short peptide designated as the repression domain, could interfere with the action of the endogenous TF in plants. Although plant TFs usually belong to gene families, CRES-T is effective, in principle, even for TFs with functional redundancy. In this review, we focus on the current status of the gene-overexpression strategies and resources for identifying and elucidating novel functions of cereal genes. We discuss the potential of these research tools for identifying useful genes and phenotypes for application in crop
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.
The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640
Goron, Travis L; Raizada, Manish N
Small millets are nutrient-rich food sources traditionally grown and consumed by subsistence farmers in Asia and Africa. They include finger millet (Eleusine coracana), foxtail millet (Setaria italica), kodo millet (Paspalum scrobiculatum), proso millet (Panicum miliaceum), barnyard millet (Echinochloa spp.), and little millet (Panicum sumatrense). Local farmers value the small millets for their nutritional and health benefits, tolerance to extreme stress including drought, and ability to grow under low nutrient input conditions, ideal in an era of climate change and steadily depleting natural resources. Little scientific attention has been paid to these crops, hence they have been termed "orphan cereals." Despite this challenge, an advantageous quality of the small millets is that they continue to be grown in remote regions of the world which has preserved their biodiversity, providing breeders with unique alleles for crop improvement. The purpose of this review, first, is to highlight the diverse traits of each small millet species that are valued by farmers and consumers which hold potential for selection, improvement or mechanistic study. For each species, the germplasm, genetic and genomic resources available will then be described as potential tools to exploit this biodiversity. The review will conclude with noting current trends and gaps in the literature and make recommendations on how to better preserve and utilize diversity within these species to accelerate a New Green Revolution for subsistence farmers in Asia and Africa.
Cooper, Laurel; Meier, Austin; Laporte, Marie-Angélique; Elser, Justin L; Mungall, Chris; Sinn, Brandon T; Cavaliere, Dario; Carbon, Seth; Dunn, Nathan A; Smith, Barry; Qu, Botong; Preece, Justin; Zhang, Eugene; Todorovic, Sinisa; Gkoutos, Georgios; Doonan, John H; Stevenson, Dennis W; Arnaud, Elizabeth
The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository. PMID:29186578
The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.
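As a rough illustration of what a downloaded VCF file allows, the sketch below reads VCF-style lines and keeps only single-nucleotide variants. It relies only on the standard fixed VCF columns (CHROM, POS, ID, REF, ALT); the sample records, chromosome names, and the helper names `read_variants` and `is_snp` are hypothetical and are not taken from OryzaGenome.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: List[str]


def read_variants(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # The first five tab-separated VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield Variant(chrom, int(pos), ref, alt.split(","))


def is_snp(variant: Variant) -> bool:
    """True when REF and every ALT allele are a single nucleotide."""
    bases = {"A", "C", "G", "T"}
    return variant.ref.upper() in bases and all(
        a.upper() in bases for a in variant.alt
    )


# Hypothetical records, for illustration only.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr01\t1203\t.\tA\tG\t50\tPASS\t.
chr01\t2210\t.\tAT\tA\t40\tPASS\t.
chr02\t77\t.\tC\tT,G\t60\tPASS\t.
"""

snps = [v for v in read_variants(EXAMPLE_VCF.splitlines()) if is_snp(v)]
for v in snps:
    print(v.chrom, v.pos, v.ref, ",".join(v.alt))
```

In this made-up input the second record is a deletion, so only the first and third records pass the filter.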
Eppinger, Mark; Pearson, Talima; Koenig, Sara S. K.
In this genomic epidemiology study, we have applied high-resolution whole-genome-based sequence typing methodologies on a comprehensive set of genome sequences that have become available in the aftermath of the Haitian cholera epidemic. These sequence resources enabled us to reassess the degree...
Brall, Caroline; Maeckelberghe, Els; Porz, Rouven; Makhoul, Jihad; Schröder-Bäck, Peter
Research ethics has gained renewed importance due to the changing scientific landscape and increasing demands and competition in the academic field. These changes are further exacerbated by scarce(r) resources in some countries on the one hand and advances in genomics on the other. In this paper,
Lane, Alexander; Boecklemann, Astrid; Woronuk, Grant N; Sarker, Lukman; Mahmoud, Soheil S
We are developing Lavandula angustifolia (lavender) as a model system for investigating molecular regulation of essential oil (a mixture of mono- and sesquiterpenes) production in plants. As an initial step toward building the necessary 'genomics toolbox' for this species, we constructed two cDNA libraries from lavender leaves and flowers, and obtained sequence information for 14,213 high-quality expressed sequence tags (ESTs). Based on homology to sequences present in GenBank, our EST collection contains orthologs for genes involved in the 1-deoxy-D-xylulose-5-phosphate (DXP) and the mevalonic acid (MVA) pathways of terpenoid biosynthesis, and for known terpene synthases and prenyl transferases. To gain insight into the regulation of terpene metabolism in lavender flowers, we evaluated the transcriptional activity of the genes encoding 1-deoxy-D-xylulose-5-phosphate synthase (DXS) and HMG-CoA reductase (HMGR), which represent regulatory steps of the DXP and MVA pathways, respectively, in glandular trichomes (oil glands) by real-time PCR. While HMGR transcripts were barely detectable, DXS was heavily expressed in this tissue, indicating that essential oil constituents are predominantly produced through the DXP pathway in lavender glandular trichomes. As anticipated, linalool synthase (LinS), the gene responsible for the production of linalool, a major constituent of lavender essential oil, was also strongly expressed in glands. Surprisingly, the most abundant transcript in floral glandular trichomes corresponded to a sesquiterpene synthase (cadinene synthase, CadS), although sesquiterpenes are minor constituents of lavender essential oils. This result, coupled to the weak activity of the MVA pathway (the main route for sesquiterpene production) in trichomes, indicates that precursor supply may represent a bottleneck in the biosynthesis of sesquiterpenes in lavender flowers.
Brauchli, Rebecca; Jenny, Gregor J; Füllemann, Désirée; Bauer, Georg F
Studies using the Job Demands-Resources (JD-R) model commonly have a heterogeneous focus concerning the variables they investigate-selective job demands and resources as well as burnout and work engagement. The present study applies the rationale of the JD-R model to expand the relevant outcomes of job demands and job resources by linking the JD-R model to the logic of a generic health development framework predicting more broadly positive and negative health. The resulting JD-R health model was operationalized and tested with a generalizable set of job characteristics and positive and negative health outcomes among a heterogeneous sample of 2,159 employees. Applying a theory-driven and a data-driven approach, measures which were generally relevant for all employees were selected. Results from structural equation modeling indicated that the model fitted the data. Multiple group analyses indicated invariance across six organizations, gender, job positions, and three times of measurement. Initial evidence was found for the validity of an expanded JD-R health model. Thereby this study contributes to the current research on job characteristics and health by combining the core idea of the JD-R model with the broader concepts of salutogenic and pathogenic health development processes as well as both positive and negative health outcomes.
Sorghum is the second cereal crop to have a full genome completely sequenced (Nature (2009), 457:551). This achievement is widely recognized as a scientific milestone for grass genetics and genomics in general. However, the true worth of genetic information lies in translating the sequence informa...
and the previously published chickpea intraspecific map, integration of maps was performed which revealed improvement of marker density and saturation of the region in the vicinity of sfl (double-podding) gene thereby bringing about an advancement of the current map. Conclusion: An arsenal of 181 new chickpea STMS markers was reported. The developed intraspecific linkage map defined map positions of 138 markers which included 101 new locations. Map integration with a previously published map was carried out which revealed an advanced map with improved density. This study is a major contribution towards providing advanced genomic resources which will facilitate chickpea geneticists and molecular breeders in developing superior genotypes with improved traits.
Gaur, Rashmi; Sethy, Niroj K; Choudhary, Shalu; Shokeen, Bhumika; Gupta, Varsha; Bhatia, Sabhyata
intraspecific map, integration of maps was performed which revealed improvement of marker density and saturation of the region in the vicinity of sfl (double-podding) gene thereby bringing about an advancement of the current map. An arsenal of 181 new chickpea STMS markers was reported. The developed intraspecific linkage map defined map positions of 138 markers which included 101 new locations. Map integration with a previously published map was carried out which revealed an advanced map with improved density. This study is a major contribution towards providing advanced genomic resources which will facilitate chickpea geneticists and molecular breeders in developing superior genotypes with improved traits.
The paper reviews the history of animal genetic resources (AnGRs) and claims that over the course of history they have been conceptually transformed from economic, ecological and scientific life forms into political objects. This is reflected in the way in which any valuation of AnGRs today is inherently imbued with national politics and its values, enacted by legally binding global conventions. Historically, the first calls for conservation were based on the economic, ecological and scientific values of AnGRs. While the historical arguments are valid and are still commonly proposed as values for conservation, AnGRs have become highly politicized since the adoption of the Convention on Biological Diversity (CBD), the subsequent Interlaken Declaration, the Global Plan of Action (GPA) and the Nagoya Protocol. The scientific and political definitions of AnGRs were creatively reshuffled within these documents, and the key criteria by which they are now identified and valued were essentially redefined. The criterion of "in situ condition" has become the necessary starting point for all valuation efforts of AnGRs, effectively transforming their previous nature as natural property and global genetic commons into objects of national concern pertaining to territorially discrete national genetic landscapes, regulated by the sovereign powers of the parties to the global conventions.
Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong
Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
The Cancer Analysis Virtual Machine (CAVM) project will leverage cloud technology, the UCSC Cancer Genomics Browser, and the Galaxy analysis workflow system to provide investigators with a flexible, scalable platform for hosting, visualizing and analyzing their own genomic data.
Chen, Jinying; Jagannatha, Abhyuday N; Fodeh, Samah J; Yu, Hong
Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms. Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target
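The abstract above reports average precision for a ranked list of candidate terms but does not spell out the computation. The sketch below shows the standard definition (the mean of precision-at-k taken at each rank where an important term appears), which is assumed here rather than confirmed by the source; the scores, labels, and function name are hypothetical.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision for labels already sorted by descending score.

    ranked_labels holds 1 for an important term and 0 otherwise.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranker scores and expert labels, for illustration only.
scores = [0.91, 0.80, 0.75, 0.40, 0.10]
labels = [1, 0, 1, 1, 0]

# Sort the labels by descending score before scoring the ranking.
ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
ap = average_precision(ranked)
print(round(ap, 4))
```

Because the metric rewards placing important terms near the top of the list, it suits the stated goal of prioritizing terms for annotation.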
Júlia Bálint Čeh
The paper presents the analysis of existing bilingual Slovenian-Hungarian dictionaries, which was made as part of the project aiming to design a concept for a new comprehensive Slovenian-Hungarian dictionary. First, a short historical overview of Slovenian-Hungarian lexicography is provided, including first collections of dialect vocabulary, glossaries, and collections and dictionaries of idioms. Then, an overview of Slovenian-Hungarian and Hungarian-Slovenian dictionaries is made, the first one being published in 1961. The paper then focuses on a comparison of three Slovenian-Hungarian dictionaries, which are currently used by the majority of users, namely the Slovenian-Hungarian part of the dictionary by Elizabeta Bernjak (1995), the Slovenian-Hungarian dictionary by Jože Hradil (1996), and the Slovenian-Hungarian part of Hradil's bidirectional dictionary. The dictionaries are compared in terms of size, headword list, coverage, headword presentation, grammar information, as well as in terms of other elements of dictionary microstructure such as translations and examples. The discussion section includes an analysis of the coverage offered by the dictionaries of the vocabulary compiled by teachers at bilingual schools in Prekmurje. The results indicate that the coverage of various levels of vocabulary, frequent or rare, is rather poor; as the dictionaries are medium-sized and outdated, this is to be expected. However, as the analysis shows, some basic concepts are also often not covered (e.g. research, death, allergy). The second part of the discussion is dedicated to the presentation of selected examples of good practice in bilingual lexicography, such as the Comprehensive English-Slovenian dictionary Oxford-DZS as the first bilingual dictionary in Slovenia to use the corpus-based approach, as well as offer much more contextual information on the headwords. Also presented are English-Spanish online dictionaries by Oxford University Press and Collins, the focus
Background: MicroRNAs (miRNAs) regulate several biological processes through post-transcriptional gene silencing. The efficiency of binding of miRNAs to target transcripts depends on the sequence as well as the intramolecular structure of the transcript. Single nucleotide polymorphisms (SNPs) can contribute to alterations in the structure of regions flanking them, thereby influencing the accessibility for miRNA binding. Description: The entire human genome was analyzed for SNPs in and around predicted miRNA target sites. Polymorphisms within 200 nucleotides that could alter the intramolecular structure at the target site, thereby altering regulation, were annotated. Collated information was ported into a MySQL database with a user-friendly interface accessible through the URL: http://miracle.igib.res.in/dbSMR. Conclusion: The database has a user-friendly interface where the information can be queried using either the gene name, microRNA name, polymorphism ID or transcript ID. Combination queries using 'AND' or 'OR' are also possible, along with specifying the degree of change of intramolecular bonding with and without the polymorphism. Such a resource would enable researchers to address questions like the role of regulatory SNPs in the 3' UTRs and population-specific regulatory modulations in the context of microRNA targets.
Yang, Qing; Sun, Fanyue; Yang, Zhi; Li, Hongjun
Calanus sinicus Brodsky (Copepoda, Crustacea) is a dominant zooplanktonic species widely distributed in the marginal seas of the Northwest Pacific Ocean. In this study, we utilized an RNA-Seq-based approach to develop molecular resources for C. sinicus. Adult samples were sequenced using the Illumina HiSeq 2000 platform. The sequencing data generated 69,751 contigs from 58.9 million filtered reads. The assembled contigs had an average length of 928.8 bp. Gene annotation allowed the identification of 43,417 unigene hits against the NCBI database. Gene ontology (GO) and KEGG pathway mapping analysis revealed various functional genes related to diverse biological functions and processes. Transcripts potentially involved in stress response and lipid metabolism were identified among these genes. Furthermore, 4,871 microsatellites and 110,137 single nucleotide polymorphisms (SNPs) were identified in the C. sinicus transcriptome sequences. SNP validation by the melting temperature (Tm)-shift method suggested that 16 primer pairs amplified target products and showed biallelic polymorphism among 30 individuals. The present work demonstrates the power of Illumina-based RNA-Seq for the rapid development of molecular resources in nonmodel species. The validated SNP set from our study is currently being utilized in an ongoing ecological analysis to support a future study of C. sinicus population genetics. PMID:24982883
Brekke, L. D.; Pruitt, T.; Gangopadhyay, S.; Raff, D. A.
The SECURE Water Act § 9503(b)(2) authorizes the U.S. Department of Interior's Bureau of Reclamation to assess climate change risks for water and environmental resources in eight "major Reclamation river basins" in the Western United States (i.e. Colorado, Columbia, Klamath, Missouri, Rio Grande, Sacramento, San Joaquin, and Truckee basins). The legislation calls for Reclamation to provide periodic reports on implications for water supplies, water deliveries, hydropower generation, fish and wildlife, water quality, flood control, ecological resiliency, and recreation. Reclamation is developing a framework for consistently characterizing risks in Western U.S. river basins through the West-Wide Climate Risk Assessments, part of the Basin Study Program. One initial activity within this framework is focused on characterizing hydrologic and water supply implications of climate change. The centerpiece of this activity is the development of a west-wide ensemble of hydrologic projections, tiering from information in the online archive "Bias Corrected and Downscaled WCRP CMIP3 Climate Projections" (http://gdo-dcp.ucllnl.org/downscaled_cmip3_projections/dcpInterface.html) and utilizing a network of hydrologic model applications featured in the University of Washington and Princeton University's "Experimental National Hydrologic Prediction System" (http://www.hydro.washington.edu/forecast/westwide/index.shtml). The resulting hydrologic information has the same space and time attributes as the underlying downscaled climate information: 112 projections of monthly downscaled CMIP3 conditions from 1950-2099 at 1/8° resolution over the Western U.S. (nested within the underlying archive’s contiguous U.S. domain). Such attributes permit a time-evolving, risk-based portrayal of hydrologic conditions, which is useful for climate change adaptation discussions where the timing of impacts matters in relation to the initiation and investment of adaptation or mitigation measures
Druzinsky, Robert E; Balhoff, James P; Crompton, Alfred W; Done, James; German, Rebecca Z; Haendel, Melissa A; Herrel, Anthony; Herring, Susan W; Lapp, Hilmar; Mabee, Paula M; Muller, Hans-Michael; Mungall, Christopher J; Sternberg, Paul W; Van Auken, Kimberly; Vinyard, Christopher J; Williams, Susan H; Wall, Christine E
In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results
Szczecińska, Monika; Sawicki, Jakub
The European continent is presently colonized by nine species of the genus Pulsatilla, five of which are encountered only in mountainous regions of southwest and south-central Europe. The remaining four species inhabit lowlands in the north-central and eastern parts of the continent. Most plants of the genus Pulsatilla are rare and endangered, which is why most research efforts have focused on their biology, ecology and hybridization. The objective of this study was to develop genomic resources, including complete plastid genomes and nuclear rRNA clusters, for three sympatric Pulsatilla species that are most commonly found in Central Europe. The results will supply valuable information about genetic variation, which can be used in the process of designing primers for population studies and conservation genetics research. The complete plastid genomes together with the nuclear rRNA cluster can serve as a useful tool in hybridization studies. Six complete plastid genomes and nuclear rRNA clusters were sequenced from three species of Pulsatilla using the Illumina sequencing technology. Four junctions between single copy regions and inverted repeats and junctions between the identified locally-collinear blocks (LCB) were confirmed by Sanger sequencing. The Pulsatilla plastid genomes contained 120 unique genes and had a total length of approximately 161-162 kb; 21 genes were duplicated in the inverted repeat (IR) region. Comparison of the plastid genomes of the newly-sequenced Pulsatilla with the previously-identified plastomes of Aconitum and Ranunculus species belonging to the family Ranunculaceae revealed several variations in the structure of the genome, but the gene content remained constant. The nuclear rRNA cluster (18S-ITS1-5.8S-ITS2-26S) of the studied Pulsatilla species is 5795 bp long. Among five analyzed regions of the rRNA cluster, only Internal Transcribed Spacer 2 (ITS2) enabled the molecular delimitation of closely-related Pulsatilla patens and Pulsatilla vernalis. The determination of complete
Zhao, Ying; Thammannagowda, Shivegowda; Staton, Margaret; Tang, Sha; Xia, Xinli; Yin, Weilun; Liang, Haiying
The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economical value as a rare relict species, genomic resources of Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned with Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high confidence SNPs, were identified. This is the first large-scale expressed sequence tags ever generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.
Metabolic syndrome (MetS) is a complex disorder related to insulin resistance, obesity, and inflammation. Genetic and environmental factors also contribute to the development of MetS, and through genome-wide association studies (GWASs), important susceptibility loci have been identified. However, GWASs focus more on individual single-nucleotide polymorphisms (SNPs), explaining only a small portion of genetic heritability. To overcome this limitation, pathway analyses are being applied to GWAS datasets. The aim of this study is to elucidate the biological pathways involved in the pathogenesis of MetS through pathway analysis. Cohort data from the Korea Associated Resource (KARE) were used for analysis, which include 8,842 individuals (age, 52.2 ± 8.9 years; body mass index, 24.6 ± 3.2 kg/m^2). A total of 312,121 autosomal SNPs were obtained after quality control. Pathway analysis was conducted using Meta-analysis Gene-Set Enrichment of Variant Associations (MAGENTA) to discover the biological pathways associated with MetS. In the discovery phase, SNPs from chromosome 12, including rs11066280, rs2074356, and rs12229654, were associated with MetS (p < 5 × 10^-6), and rs11066280 satisfied the Bonferroni-corrected cutoff (unadjusted p < 1.38 × 10^-7, Bonferroni-adjusted p < 0.05). Through pathway analysis, biological pathways, including electron carrier activity, signaling by platelet-derived growth factor (PDGF), the mitogen-activated protein kinase kinase kinase cascade, PDGF binding, peroxisome proliferator-activated receptor (PPAR) signaling, and DNA repair, were associated with MetS. Through pathway analysis of MetS, pathways related with PDGF, mitogen-activated protein kinase, and PPAR signaling, as well as nucleic acid binding, protein secretion, and DNA repair, were identified. Further studies will be needed to clarify the genetic pathogenesis leading to MetS.
Sills, Eric Scott; Yang, Zhihong; Walsh, David J; Salem, Shala A
The unacceptable multiple gestation rate currently associated with in vitro fertilization (IVF) would be substantially alleviated if the routine practice of transferring more than one embryo were reconsidered. While transferring a single embryo is an effective method to reduce the clinical problem of multiple gestation, rigid adherence to this approach has been criticized for negatively impacting clinical pregnancy success in IVF. In general, single embryo transfer is viewed cautiously by IVF patients although greater acceptance would result from a more effective embryo selection method. Selection of one embryo for fresh transfer on the basis of chromosomal normalcy should achieve the dual objective of maintaining satisfactory clinical pregnancy rates and minimizing the multiple gestation problem, because embryo aneuploidy is a major contributing factor in implantation failure and miscarriage in IVF. The initial techniques for preimplantation genetic screening unfortunately lacked sufficient sensitivity and did not yield the expected results in IVF. However, newer molecular genetic methods could be incorporated with standard IVF to bring the goal of single embryo transfer within reach. Aiming to make multiple embryo transfers obsolete and unnecessary, and recognizing that array comparative genomic hybridization (aCGH) will typically require an additional 12 h of laboratory time to complete, we propose adopting aCGH for mainstream use in clinical IVF practice. As aCGH technology continues to develop and becomes increasingly available at lower cost, it may soon be considered unusual for IVF laboratories to select a single embryo for fresh transfer without regard to its chromosomal competency. In this report, we provide a rationale supporting aCGH as the preferred methodology to provide a comprehensive genetic assessment of the single embryo before fresh transfer in IVF. The logistics and cost of integrating aCGH with IVF to enable fresh embryo transfer are also
Tilahun, Binyam; Fritz, Fleur
Electronic medical record (EMR) systems are increasingly being implemented in hospitals of developing countries to improve patient care and clinical service. However, only limited evaluation studies are available concerning the level of adoption and determinant factors of success in those settings. The objective of this study was to assess the usage pattern, user satisfaction level, and determinants of health professionals' satisfaction towards a comprehensive EMR system implemented in Ethiopia where parallel documentation using the EMR and the paper-based medical records is in practice. A quantitative, cross-sectional study design was used to assess the usage pattern, user satisfaction level, and determinant factors of an EMR system implemented in Ethiopia based on the DeLone and McLean model of information system success. Descriptive statistical methods were applied to analyze the data and a binary logistic regression model was used to identify determinant factors. Health professionals (N=422) from five hospitals were approached and 406 responded to the survey (96.2% response rate). Out of the respondents, 76.1% (309/406) started to use the system immediately after implementation and user training, but only 31.7% (98/309) of the professionals reported using the EMR during the study (after 3 years of implementation). Of the 12 core EMR functions, 3 were never used by most respondents, and they were also unaware of 4 of the core EMR functions. It was found that 61.4% (190/309) of the health professionals reported overall dissatisfaction with the EMR (median=4, interquartile range (IQR)=1) on a 5-level Likert scale. Physicians were more dissatisfied (median=5, IQR=1) when compared to nurses (median=4, IQR=1) and the health management information system (HMIS) staff (median=2, IQR=1). Of all the participants, 64.4% (199/309) believed that the EMR had no positive impact on the quality of care. The participants indicated an agreement with the system and information
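The percentages in this abstract use two different denominators, which is easy to miss on a first read. The short sketch below recomputes four of them from the counts given in the text; the variable names and the helper `pct` are illustrative and not part of the study.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the abstract.
approached = 422         # health professionals approached
responded = 406          # survey respondents
early_users = 309        # started using the EMR right after training
current_users = 98       # still using the EMR during the study
no_quality_impact = 199  # believed the EMR had no positive impact on care

response_rate = pct(responded, approached)       # denominator: all approached
early_use_rate = pct(early_users, responded)     # denominator: respondents
current_use_rate = pct(current_users, early_users)       # denominator: early users
no_impact_rate = pct(no_quality_impact, early_users)     # denominator: early users

print(response_rate, early_use_rate, current_use_rate, no_impact_rate)
```

The usage and satisfaction figures are therefore relative to the 309 early users, not to all 406 respondents.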
Cardone Maria Francesca
Grapevine is one of the most important crop plants in the world. Genomic resources for the grapevine genome have expanded greatly in recent years, supporting growing efforts in molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find the responsible genes selected by cross-breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs) on plant genomes, few data are available on copy number variation (CNV). Furthermore, association between structural variations and phenotypes has been described in only a few cases. We combined high-throughput biotechnologies and bioinformatics tools to reveal the first inter-varietal atlas of structural variation (SV) for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We found that roughly 8% of the grapevine genome is affected by genomic variation. Taking into account the phenotypic differences among the studied varieties, we compared SVs among them and against the reference, and then performed an in-depth analysis of the gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.
Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin
Ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents in this plant. Given the importance of this plant to humans, an integrated omics resource is indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database (http://ginsengdb.snu.ac.kr/), the first open-access platform to provide comprehensive genomic resources of P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (approximately 3000 Mbp of scaffold sequences) along with the structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order as well as for plant research communities in general. The Ginseng Genome Database can be accessed at http://ginsengdb.snu.ac.kr/.
Complete mitochondrial genomes and nuclear ribosomal RNA operons of two species of Diplostomum (Platyhelminthes: Trematoda): a molecular resource for taxonomy and molecular epidemiology of important fish pathogens.
Brabec, Jan; Kostadinova, Aneta; Scholz, Tomáš; Littlewood, D Timothy J
The genus Diplostomum (Platyhelminthes: Trematoda: Diplostomidae) is a diverse group of freshwater parasites with complex life-cycles and global distribution. The larval stages are important pathogens causing eye fluke disease implicated in substantial impacts on natural fish populations and losses in aquaculture. However, the problematic species delimitation and difficulties in the identification of larval stages hamper the assessment of the distributional and host ranges of Diplostomum spp. and their transmission ecology. Total genomic DNA was isolated from adult worms and shotgun sequenced using Illumina MiSeq technology. Mitochondrial (mt) genomes and nuclear ribosomal RNA (rRNA) operons were assembled using established bioinformatic tools and fully annotated. Mt protein-coding genes and nuclear rRNA genes were subjected to phylogenetic analysis by maximum likelihood and the resulting topologies compared. We characterised novel complete mt genomes and nuclear rRNA operons of two closely related species, Diplostomum spathaceum and D. pseudospathaceum. Comparative mt genome assessment revealed that the cox1 gene and its 'barcode' region used for molecular identification are the most conserved regions; instead, nad4 and nad5 genes were identified as most promising molecular diagnostic markers. Using the novel data, we provide the first genome wide estimation of the phylogenetic relationships of the order Diplostomida, one of the two fundamental lineages of the Digenea. Analyses of the mitogenomic data invariably recovered the Diplostomidae as a sister lineage of the order Plagiorchiida rather than as a basal lineage of the Diplostomida as inferred in rDNA phylogenies; this was concordant with the mt gene order of Diplostomum spp. exhibiting closer match to the conserved gene order of the Plagiorchiida. Complete sequences of the mt genome and rRNA operon of two species of Diplostomum provide a valuable resource for novel genetic markers for species delineation and
Mofiz, Ehtesham; Holt, Deborah C; Seemann, Torsten; Currie, Bart J; Fischer, Katja; Papenfuss, Anthony T
The scabies mite, Sarcoptes scabiei, is a parasitic arachnid and cause of the infectious skin disease scabies in humans and mange in other animal species. Scabies infections are a major health problem, particularly in remote Indigenous communities in Australia, where secondary group A streptococcal and Staphylococcus aureus infections of scabies sores are thought to drive the high rate of rheumatic heart disease and chronic kidney disease. We sequenced the genome of two samples of Sarcoptes scabiei var. hominis obtained from unrelated patients with crusted scabies located in different parts of northern Australia using the Illumina HiSeq. We also sequenced samples of Sarcoptes scabiei var. suis from a pig model. Because of the small size of the scabies mite, these data are derived from pools of thousands of mites and are metagenomic, including host and microbiome DNA. We performed cleaning and de novo assembly and present Sarcoptes scabiei var. hominis and var. suis draft reference genomes. We have constructed a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins. We have developed extensive genomic resources for the scabies mite, including reference genomes and a preliminary annotation.
... Comprehensive Care ... Understand the importance of comprehensive MS care ... A complex disease requires a comprehensive approach. Today multiple sclerosis (MS) is not a ...
Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Sferruzzi-Perri, Amanda N.; López-Tello, Jorge; Fowden, Abigail L.; Constancia, Miguel
Pregnancy success and life-long health depend on a cooperative interaction between the mother and the fetus in the allocation of resources. As the site of materno-fetal nutrient transfer, the placenta is central to this interplay; however, the relative importance of the maternal versus fetal genotypes in modifying the allocation of resources to the fetus is unknown. Using genetic inactivation of the growth and metabolism regulator, Pik3ca (encoding PIK3CA also known as p110α, α/+), we examined the interplay between the maternal genome and the fetal genome on placental phenotype in litters of mixed genotype generated through reciprocal crosses of WT and α/+ mice. We demonstrate that placental growth and structure were impaired and associated with reduced growth of α/+ fetuses. Despite its defective development, the α/+ placenta adapted functionally to increase the supply of maternal glucose and amino acid to the fetus. The specific nature of these changes, however, depended on whether the mother was α/+ or WT and related to alterations in endocrine and metabolic profile induced by maternal p110α deficiency. Our findings thus show that the maternal genotype and environment program placental growth and function and identify the placenta as critical in integrating both intrinsic and extrinsic signals governing materno-fetal resource allocation. PMID:27621448
Skrzypek, Marek S; Nash, Robert S; Wong, Edith D; MacPherson, Kevin A; Hellerstedt, Sage T; Engel, Stacia R; Karra, Kalpana; Weng, Shuai; Sheppard, Travis K; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Cherry, J Michael
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and...
Background: The European continent is presently colonized by nine species of the genus Pulsatilla, five of which are encountered only in mountainous regions of southwest and south-central Europe. The remaining four species inhabit lowlands in the north-central and eastern parts of the continent. Most plants of the genus Pulsatilla are rare and endangered, which is why most research efforts have focused on their biology, ecology and hybridization. The objective of this study was to develop genomic resources, including complete plastid genomes and nuclear rRNA clusters, for three sympatric Pulsatilla species that are most commonly found in Central Europe. The results will supply valuable information about genetic variation, which can be used in the process of designing primers for population studies and conservation genetics research. The complete plastid genomes together with the nuclear rRNA cluster can serve as a useful tool in hybridization studies. Methodology/principal findings: Six complete plastid genomes and nuclear rRNA clusters were sequenced from three species of Pulsatilla using the Illumina sequencing technology. Four junctions between single copy regions and inverted repeats and junctions between the identified locally-collinear blocks (LCB) were confirmed by Sanger sequencing. The Pulsatilla plastid genomes, containing 120 unique genes, had a total length of approximately 161–162 kb, and 21 genes were duplicated in the inverted repeat (IR) region. Comparison of the newly sequenced Pulsatilla plastid genomes with the previously identified plastomes of Aconitum and Ranunculus species belonging to the family Ranunculaceae revealed several variations in the structure of the genome, but the gene content remained constant. The nuclear rRNA cluster (18S-ITS1-5.8S-ITS2-26S) of the studied Pulsatilla species is 5795 bp long. Among five analyzed regions of the rRNA cluster, only Internal Transcribed Spacer 2 (ITS2) enabled the molecular delimitation of closely-related Pulsatilla
Harr, Bettina; Karakoc, Emre; Neme, Rafik; Teschke, Meike; Pfeifle, Christine; Pezer, Željka; Babiker, Hiba; Linnenbrink, Miriam; Montero, Inka; Scavetta, Rick; Abai, Mohammad Reza; Molins, Marta Puente; Schlegel, Mathias; Ulrich, Rainer G.; Altmüller, Janine; Franitza, Marek; Büntge, Anna; Künzel, Sven; Tautz, Diethard
Wild populations of the house mouse (Mus musculus) represent the raw genetic material for the classical inbred strains in biomedical research and are a major model system for evolutionary biology. We provide whole genome sequencing data of individuals representing natural populations of M. m. domesticus (24 individuals from 3 populations), M. m. helgolandicus (3 individuals), M. m. musculus (22 individuals from 3 populations) and M. spretus (8 individuals from one population). We use a single pipeline to map and call variants for these individuals and also include 10 additional individuals of M. m. castaneus for which genomic data are publicly available. In addition, RNAseq data were obtained from 10 tissues of up to eight adult individuals from each of the three M. m. domesticus populations for which genomic data were collected. Data and analyses are presented via tracks viewable in the UCSC or IGV genome browsers. We also provide information on available outbred stocks and instructions on how to keep them in the laboratory. PMID:27622383
Dornback, M.; Hourigan, T.; Etnoyer, P.; McGuinn, R.; Cross, S. L.
Research on deep-sea corals has expanded rapidly over the last two decades, as scientists began to realize their value as long-lived structural components of high biodiversity habitats and archives of environmental information. The NOAA Deep Sea Coral Research and Technology Program's National Database for Deep-Sea Corals and Sponges is a comprehensive resource for georeferenced data on these organisms in U.S. waters. The National Database currently includes more than 220,000 deep-sea coral records representing approximately 880 unique species. Database records from museum archives, commercial and scientific bycatch, and from journal publications provide baseline information with relatively coarse spatial resolution dating back as far as 1842. These data are complemented by modern, in-situ submersible observations with high spatial resolution, from surveys conducted by NOAA and NOAA partners. Management of high volumes of modern high-resolution observational data can be challenging. NOAA is working with our data partners to incorporate these occurrence data into the National Database, along with images and associated information related to geoposition, time, biology, taxonomy, environment, provenance, and accuracy. NOAA is also working to link associated datasets collected by our program's research, to properly archive them to the NOAA National Data Centers, to build a robust metadata record, and to establish a standard protocol to simplify the process. Access to the National Database is provided through an online mapping portal. The map displays point-based records from the database. Records can be refined by taxon, region, time, and depth. The queries and extent used to view the map can also be used to download subsets of the database. The database, map, and website are already in use by NOAA, regional fishery management councils, and regional ocean planning bodies, but we envision it as a model that can expand to accommodate data on a global scale.
Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio
Janda, Jaroslav; Šafář, Jan; Kubaláková, Marie; Bartoš, Jan; Kovářová, Pavlína; Suchánková, Pavla; Pateyron, S.; Čihalíková, Jarmila; Sourdille, P.; Šimková, Hana; Faivre-Rampant, P.; Hřibová, Eva; Bernard, M.; Lukaszewski, A.; Doležel, Jaroslav; Chalhoub, B.
Vol. 47 (2006), pp. 977-986. ISSN 0960-7412. R&D Projects: GA ČR GA521/04/0607; GA ČR GP521/05/P257; GA ČR GD521/05/H013; GA MŠk LC06004. Institutional research plan: CEZ:AV0Z50380511. Keywords: wheat * genomics * chromosome sorting. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 6.565, year: 2006
Drenkhan, Fabian; Huggel, Christian; Salzmann, Nadine; Giráldez, Claudia; Suarez, Wilson; Rohrer, Mario; Molina, Edwin; Montoya, Nilton; Miñan, Fiorella
Glaciers have long been important to Andean societies and livelihoods as a direct freshwater supply for agricultural irrigation, hydropower generation and mining. Peru's largely remote population in the Central Andes has to cope with strong seasonal variation in precipitation and river runoff, on which El Niño impacts are superimposed interannually. Direct glacier and lake water discharge thus constitutes a vital continuous water supply and acts as a regulating buffer against hydrological variability. This crucial buffer effect is gradually being altered by accelerated glacier retreat, which most likely leads to increased annual river runoff variability. Furthermore, 'peak water' is expected to be crossed in the near future, after which the previously enhanced streamflow decreases and levels out towards a new, still unknown minimum discharge. Consequently, a sustainable future water supply, especially during the low-runoff dry season, might not be guaranteed while Peru's water demand increases significantly. Here we present a comprehensive review of the current conditions and perspectives for water resources in the Cusco area, with a focus on the Vilcanota River, Cordillera Vilcanota, Southern Peru. At 279 km², the Cordillera Vilcanota is the second largest glacierized mountain range in the tropics. Especially since the second half of the 1980s, it has been strongly affected by massive ice loss, with a glacier area decline of around 30% to date. Furthermore, glacier loss triggers the formation of new lakes and rising lake levels and is therefore a key driver of hazardous mass movements related to deglaciation. More profound hydrological studies of the Vilcanota River are still lacking. It is likely that its peak water has already been crossed or will be crossed in the near future. This has strong implications for the population of the Cusco department (Cusco city), which is still growing at 0.9% (2.2%) annually. People mostly
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been define...
Goodstein, David; Batra, Sajeev; Carlson, Joseph; Hayes, Richard; Phillips, Jeremy; Shu, Shengqiang; Schmutz, Jeremy; Rokhsar, Daniel
The Dept. of Energy Joint Genome Institute is a genomics user facility supporting DOE mission science in the areas of Bioenergy, Carbon Cycling, and Biogeochemistry. The Plant Program at the JGI applies genomic, analytical, computational and informatics platforms and methods to: (1) understand and accelerate the improvement (domestication) of bioenergy crops; (2) characterize and moderate plant response to climate change; (3) use comparative genomics to identify constrained elements and infer gene function; (4) build high quality genomic resource platforms of JGI Plant Flagship genomes for functional and experimental work; and (5) expand functional genomic resources for Plant Flagship genomes.
Noda, Hiroaki; Kawai, Sawako; Koizumi, Yoko; Matsui, Kageaki; Zhang, Qiang; Furukawa, Shigetoyo; Shimomura, Michihiko; Mita, Kazuei
The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is a serious insect pest of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST) analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. More than 37,000 high-quality ESTs, excluding sequences of the mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with an average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest.
Wangler, Michael F.; Hu, Yanhui
Human genome-wide association studies (GWAS) have successfully identified thousands of susceptibility loci for common diseases with complex genetic etiologies. Although the susceptibility variants identified by GWAS usually have only modest effects on individual disease risk, they contribute to a substantial burden of trait variation in the overall population. GWAS also offer valuable clues to disease mechanisms that have long proven to be elusive. These insights could lead the way to breakthrough treatments; however, several challenges hinder progress, making innovative approaches to accelerate the follow-up of results from GWAS an urgent priority. Here, we discuss the largely untapped potential of the fruit fly, Drosophila melanogaster, for functional investigation of findings from human GWAS. We highlight selected examples where strong genomic conservation with humans along with the rapid and powerful genetic tools available for flies have already facilitated fine mapping of association signals, elucidated gene mechanisms, and revealed novel disease-relevant biology. We emphasize current research opportunities in this rapidly advancing field, and present bioinformatic analyses that systematically explore the applicability of Drosophila for interrogation of susceptibility signals implicated in more than 1000 human traits, based on all GWAS completed to date. Thus, our discussion is targeted at both human geneticists seeking innovative strategies for experimental validation of findings from GWAS and the Drosophila research community, by whom ongoing investigations of the implicated genes will powerfully inform our understanding of human disease. PMID:28151408
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.
Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency concerns the biological coherence of annotations, while completeness concerns the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/. PMID:23424620
Brall, Caroline; Maeckelberghe, Els; Porz, Rouven; Makhoul, Jihad; Schröder-Bäck, Peter
Research ethics anew gained importance due to the changing scientific landscape and increasing demands and competition in the academic field. These changes are further exaggerated because of scarce(r) resources in some countries on the one hand and advances in genomics on the other. In this paper, we will highlight the current challenges thereof to scientific integrity. To mark key developments in research ethics, we will distinguish between what we call research ethics 1.0 and research ethics 2.0. Whereas research ethics 1.0 focuses on individual integrity and informed consent, research ethics 2.0 entails social scientific integrity within a broader perspective of a research network. This research network can be regarded as a network of responsibilities in which every stakeholder involved has to jointly meet the ethical challenges posed to research. © 2017 S. Karger AG, Basel.
Knies, David; Wittmüß, Philipp; Appel, Sebastian; Sawodny, Oliver; Ederer, Michael; Feuer, Ronny
The coccolithophorid unicellular alga Emiliania huxleyi is known to form large blooms, which have a strong effect on the marine carbon cycle. As a photosynthetic organism, it is subjected to a circadian rhythm due to the changing light conditions throughout the day. For a better understanding of the metabolic processes under these periodically-changing environmental conditions, a genome-scale model based on a genome reconstruction of the E. huxleyi strain CCMP 1516 was created. It comprises 410 reactions and 363 metabolites. Biomass composition is variable based on the differentiation into functional biomass components and storage metabolites. The model is analyzed with a flux balance analysis approach called diurnal flux balance analysis (diuFBA) that was designed for organisms with a circadian rhythm. It allows storage metabolites to accumulate or be consumed over the diurnal cycle, while keeping the structure of a classical FBA problem. A feature of this approach is that the production and consumption of storage metabolites is not defined externally via the biomass composition, but the result of optimal resource management adapted to the diurnally-changing environmental conditions. The model in combination with this approach is able to simulate the variable biomass composition during the diurnal cycle in proximity to literature data.
Toxoplasma gondii is an important protozoan parasite that infects all warm-blooded animals and causes opportunistic infections in immuno-compromised humans. Its closest relative, Neospora caninum, is an important veterinary pathogen that causes spontaneous abortion in livestock. Comparative genomics of these two closely related coccidians has been of particular interest to identify genes that contribute to varied host cell specificity and disease. Here, we describe a manual evaluation of these genomes based on strand-specific RNA sequencing and shotgun proteomics from the invasive tachyzoite stages of these two parasites. We have corrected predicted structures of over one third of the previously annotated gene models and have annotated untranslated regions (UTRs) in over half of the predicted protein-coding genes. We observe distinctly long UTRs in both the organisms, almost four times longer than other model eukaryotes. We have also identified a putative set of cis-natural antisense transcripts (cis-NATs) and long intergenic non-coding RNAs (lincRNAs). We have significantly improved the annotation quality in these genomes that would serve as a manually curated dataset for Toxoplasma and Neospora research communities.
Lu, Yan; Zhang, Chenglin; Wu, Xiaobing; Han, Haitang; Zhao, Yaofeng; Ren, Liming
Crocodilians are evolutionarily distinct reptiles that are distantly related to lizards and are thought to be the closest relatives of birds. Compared with birds and mammals, few studies have investigated the Ig light chain of crocodilians. Here, employing an Alligator sinensis genomic bacterial artificial chromosome (BAC) library and available genome data, we characterized the genomic organization of the Alligator sinensis IgL gene loci. The Alligator sinensis has two IgL isotypes, λ and κ, the same as Anolis carolinensis. The Igλ locus contains 6 Cλ genes, each preceded by a Jλ gene, and 86 potentially functional Vλ genes upstream of (Jλ-Cλ)n. The Igκ locus contains a single Cκ gene, 6 Jκs and 62 functional Vκs. All VL genes are classified into a total of 31 families: 19 Vλ families and 12 Vκ families. Based on an analysis of the chromosomal location of the light chain genes among mammals, birds, lizards and frogs, the data further confirm that there are two IgL isotypes in the Alligator sinensis: Igλ and Igκ. By analyzing the cloned Igλ/κ cDNA, we identified a biased usage pattern of V families in the expressed Vλ and Vκ. An analysis of the junctions of the recombined VJ revealed the presence of N and P nucleotides in both expressed λ and κ sequences. Phylogenetic analysis of the V genes revealed V families shared by mammals, birds, reptiles and Xenopus, suggesting that these conserved V families are orthologous and have been retained during the evolution of IgL. Our data suggest that the Alligator sinensis IgL gene repertoire is highly diverse and complex and provide insight into immunoglobulin gene evolution in vertebrates. PMID:26901135
Background: Cytokine-activated transcription factors from the STAT (Signal Transducers and Activators of Transcription) family control common and context-specific genetic programs. It is not clear to what extent cell-specific features determine the binding capacity of seven STAT members and to what degree they share genetic targets. Molecular insight into the biology of STATs was gained from a meta-analysis of 29 available ChIP-seq data sets covering genome-wide occupancy of STATs 1, 3, 4, 5A, 5B and 6 in several cell types. Results: We determined that the genomic binding capacity of STATs is primarily defined by the cell type and to a lesser extent by individual family members. For example, the overlap of shared binding sites between STATs 3 and 5 in T cells is greater than that between STAT5 in T cells and non-T cells. Even for the top 1,000 highly enriched STAT binding sites, ~15% of STAT5 binding sites in mouse female liver are shared by other STATs in different cell types while in T cells ~90% of STAT5 binding sites are co-occupied by STAT3, STAT4 and STAT6. In addition, we identified 116 cis-regulatory modules (CRM), which are recognized by all STAT members across cell types defining a common JAK-STAT signature. Lastly, in liver STAT5 binding significantly coincides with binding of the cell-specific transcription factors HNF4A, FOXA1 and FOXA2 and is associated with cell-type specific gene transcription. Conclusions: Our results suggest that genomic binding of STATs is primarily determined by the cell type and further specificity is achieved in part by juxtaposed binding of cell-specific transcription factors.
Background: The human genome carries a high load of proviral-like sequences, called Human Endogenous Retroviruses (HERVs), which are the genomic traces of ancient infections by active retroviruses. These elements are in most cases defective, but open reading frames can still be found for the retroviral envelope gene, with sixteen such genes identified so far. Several of them are conserved during primate evolution, having possibly been co-opted by their host for a physiological role. Results: To characterize further their status, we presently sequenced 12 of these genes from a panel of 91 Caucasian individuals. Genomic analyses reveal strong sequence conservation (only two non-synonymous single nucleotide polymorphisms [SNPs]) for the two HERV-W and HERV-FRD envelope genes, i.e. for the two genes specifically expressed in the placenta and possibly involved in syncytiotrophoblast formation. We further show – using an ex vivo fusion assay for each allelic form – that none of these SNPs impairs the fusogenic function. The other envelope proteins disclose variable polymorphisms, with the occurrence of a stop codon and/or frameshift for most – but not all – of them. Moreover, the sequence conservation analysis of the orthologous genes that can be found in primates shows that three env genes have been maintained in a fully coding state throughout evolution including envW and envFRD. Conclusion: Altogether, the present study strongly suggests that some but not all envelope encoding sequences are bona fide genes. It also provides new tools to elucidate the possible role of endogenous envelope proteins as susceptibility factors in a number of pathologies where HERVs have been suspected to be involved.
Cormier, Catherine Y.; Mohr, Stephanie E.; Zuo, Dongmei; Hu, Yanhui; Rolfs, Andreas; Kramer, Jason; Taycher, Elena; Kelley, Fontina; Fiacco, Michael; Turnbull, Greggory; LaBaer, Joshua
The Protein Structure Initiative Material Repository (PSI-MR; http://psimr.asu.edu) provides centralized storage and distribution for the protein expression plasmids created by PSI researchers. These plasmids are a resource that allows the research community to dissect the biological function of proteins whose structures have been identified by the PSI. The plasmid annotation, which includes the full length sequence, vector information and associated publications, is stored in a freely available, searchable database called DNASU (http://dnasu.asu.edu). Each PSI plasmid is also linked to a variety of additional resources, which facilitates cross-referencing of a particular plasmid to protein annotations and experimental data. Plasmid samples can be requested directly through the website. We have also developed a novel strategy to avoid the most common concern encountered when distributing plasmids namely, the complexity of material transfer agreement (MTA) processing and the resulting delays this causes. The Expedited Process MTA, in which we created a network of institutions that agree to the terms of transfer in advance of a material request, eliminates these delays. Our hope is that by creating a repository of expression-ready plasmids and expediting the process for receiving these plasmids, we will help accelerate the accessibility and pace of scientific discovery. PMID:19906724
Genomics is the study of genomes, which generates large amounts of data that require substantial storage and computational power. Cloud computing addresses these needs through various cloud platforms for genomics. These platforms provide users with many services, such as easy access to data, easy sharing and transfer, storage in the hundreds of terabytes, and greater computational power. Examples of such platforms are Google Genomics, DNAnexus and Globus Genomics. The benefits of cloud computing for genomics include easy access to and sharing of data, data security and lower resource costs, but there are still drawbacks, such as the long time needed to transfer data and limited network bandwidth.
Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.
The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934
Szmant Alina M
-scleractinian cnidarians Nematostella vectensis and Hydra magnipapillata. Conclusion: Partial sequencing of 5 cDNA libraries each for A. palmata and M. faveolata has produced a rich set of candidate genes (4,980 genes from A. palmata, and 1,732 genes from M. faveolata) that we can use as a starting point for examining the life history and symbiosis of these two species, as well as to further expand the dataset of cnidarian genes for comparative genomics and evolutionary studies.
Schwarz, Jodi A; Brokstein, Peter B; Voolstra, Christian; Terry, Astrid Y; Manohar, Chitra F; Miller, David J; Szmant, Alina M; Coffroth, Mary Alice; Medina, Mónica
Hydra magnipapillata. Partial sequencing of 5 cDNA libraries each for A. palmata and M. faveolata has produced a rich set of candidate genes (4,980 genes from A. palmata, and 1,732 genes from M. faveolata) that we can use as a starting point for examining the life history and symbiosis of these two species, as well as to further expand the dataset of cnidarian genes for comparative genomics and evolutionary studies.
Moolhuijzen, P; Cakir, M; Hunter, A; Schibeci, D; Macgregor, A; Smith, C; Francki, M; Jones, M G K; Appels, R; Bellgard, M
The identification of markers in legume pasture crops, which can be associated with traits such as protein and lipid production, disease resistance, and reduced pod shattering, is generally accepted as an important strategy for improving the agronomic performance of these crops. It has been demonstrated that many quantitative trait loci (QTLs) identified in one species can be found in other plant species. Detailed legume comparative genomic analyses can characterize the genome organization between model legume species (e.g., Medicago truncatula, Lotus japonicus) and economically important crops such as soybean (Glycine max), pea (Pisum sativum), chickpea (Cicer arietinum), and lupin (Lupinus angustifolius), thereby identifying candidate gene markers that can be used to track QTLs in lupin and pasture legume breeding. LegumeDB is a Web-based bioinformatics resource for legume researchers. LegumeDB analysis of Medicago truncatula expressed sequence tags (ESTs) has identified novel simple sequence repeat (SSR) markers (16 tested), some of which have been putatively linked to symbiosome membrane proteins in root nodules and cell-wall proteins important in plant-pathogen defence mechanisms. These novel markers by preliminary PCR assays have been detected in Medicago truncatula and detected in at least one other legume species, Lotus japonicus, Glycine max, Cicer arietinum, and (or) Lupinus angustifolius (15/16 tested). Ongoing research has validated some of these markers to map them in a range of legume species that can then be used to compile composite genetic and physical maps. In this paper, we outline the features and capabilities of LegumeDB as an interactive application that provides legume genetic and physical comparative maps, and the efficient feature identification and annotation of the vast tracks of model legume sequences for convenient data integration and visualization. LegumeDB has been used to identify potential novel cross-genera polymorphic legume
Major, Sylvia M; Nishizuka, Satoshi; Morita, Daisaku; Rowland, Rick; Sunshine, Margot; Shankavaram, Uma; Washburn, Frank; Asin, Daniel; Kouros-Mehr, Hosein; Kane, David; Weinstein, John N
Monoclonal antibodies are used extensively throughout the biomedical sciences for detection of antigens, either in vitro or in vivo. We, for example, have used them for quantitation of proteins on "reverse-phase" protein lysate arrays. For those studies, we quality-controlled > 600 available monoclonal antibodies and also needed to develop precise information on the genes that encode their antigens. Translation among the various protein and gene identifier types proved non-trivial because of one-to-many and many-to-one relationships. To organize the antibody, protein, and gene information, we initially developed a relational database in Filemaker for our own use. When it became apparent that the information would be useful to many other researchers faced with the need to choose or characterize antibodies, we developed it further as AbMiner, a fully relational web-based database under MySQL, programmed in Java. AbMiner is a user-friendly, web-based relational database of information on > 600 commercially available antibodies that we validated by Western blot for protein microarray studies. It includes many types of information on the antibody, the immunogen, the vendor, the antigen, and the antigen's gene. Multiple gene and protein identifier types provide links to corresponding entries in a variety of other public databases, including resources for phosphorylation-specific antibodies. AbMiner also includes our quality-control data against a pool of 60 diverse cancer cell types (the NCI-60) and also protein expression levels for the NCI-60 cells measured using our high-density "reverse-phase" protein lysate microarrays for a selection of the listed antibodies. Some other available database resources give information on antibody specificity for one or a couple of cell types. In contrast, the data in AbMiner indicate specificity with respect to the antigens in a pool of 60 diverse cell types from nine different tissues of origin. AbMiner is a relational database that
This document is the continuation of the article entitled: “The Expansion of El Coco Coastal Urban Space and Its Relationship with Vulnerability to Pollution of Water Resources, Nicoya Peninsula, Costa Rica,” included in the Central American Geographic Magazine, Issue No.50, I Semester 2013. The conditions of water resources in El Coco urban coastal space are questioned depending on factors, categories, impact indicators, vulnerability ranges, and those involved in the decision-making process...
The vetch (Vicia sativa) is one of the most important annual forage legumes globally due to its multiple uses and high nutritional content. Despite these agronomical benefits, many drawbacks, including cyano-alanine toxin, have reduced the agronomic value of vetch varieties. Here, we used 454 technology to sequence the two V. sativa subspecies (ssp. sativa and ssp. nigra) to enrich functional information and genetic marker resources for the vetch research community. A total of 86,532 and 47,103 reads produced 35,202 and 18,808 unigenes with average lengths of 735 and 601 bp for V. sativa sativa and V. sativa nigra, respectively. Gene Ontology annotations and the cluster of orthologous gene classes were used to annotate the function of the Vicia transcriptomes. The Vicia transcriptome sequences were then mined for simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers. About 13% and 3% of the Vicia unigenes contained putative SSR and SNP sequences, respectively. Among those SSRs, 100 were chosen for validation and polymorphism testing using the Vicia germplasm set. Thus, our approach takes advantage of the utility of transcriptomic data to expedite a vetch breeding program.
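To make the SSR-mining step concrete, here is a minimal sketch of how perfect simple sequence repeats could be located in a transcript with a regular expression. It is not the pipeline used in the study: the function name `find_ssrs` and the minimum-repeat thresholds are assumptions chosen for illustration, and only perfect repeats with 2 to 6 bp motifs are reported.

```python
import re

# Assumed thresholds (motif length -> minimum number of repeats); not from the study.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for k, n in sorted(thresholds.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "ACGT" + "AG" * 8 + "CCGTA" + "CTT" * 5 + "GGA"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

The primitive-motif check keeps a dinucleotide run such as (AG)8 from being reported a second time as a tetranucleotide repeat; imperfect or compound repeats would need a more elaborate approach.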
Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms is arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Kim, Chang-Kug; Seol, Young-Joo; Perumal, Sampath; Lee, Jonghoon; Waminal, Nomar Espinosa; Jayakodi, Murukarthick; Lee, Sang-Choon; Jin, Seungwoo; Choi, Beom-Soon; Yu, Yeisoo; Ko, Ho-Cheol; Choi, Ji-Weon; Ryu, Kyoung-Yul; Sohn, Seong-Han; Parkin, Isobel; Yang, Tae-Jin
The concept of U's triangle, which revealed the importance of polyploidization in plant genome evolution, described natural allopolyploidization events in Brassica using three diploids [B. rapa (A genome), B. nigra (B), and B. oleracea (C)] and derived allotetraploids [B. juncea (AB genome), B. napus (AC), and B. carinata (BC)]. However, comprehensive understanding of Brassica genome evolution has not been fully achieved. Here, we performed low-coverage (2-6×) whole-genome sequencing of 28 accessions of Brassica as well as of Raphanus sativus [R genome] to explore the evolution of six Brassica species based on chloroplast genome and ribosomal DNA variations. Our phylogenomic analyses led to two main conclusions. (1) Intra-species-level chloroplast genome variations are low in the three allotetraploids (2~7 SNPs), but rich and variable in each diploid species (7~193 SNPs). (2) Three allotetraploids maintain two 45SnrDNA types derived from both ancestral species with maternal dominance. Furthermore, this study sheds light on the maternal origin of the AC chloroplast genome. Overall, this study clarifies the genetic relationships of U's triangle species based on a comprehensive genomics approach and provides important genomic resources for correlative and evolutionary studies.
Title 18, Conservation of Power and Water Resources (2010-04-01) ... POLICIES § 801.5 Comprehensive plan. (a) The Compact requires that the Commission formulate and adopt a comprehensive plan for the immediate and long-range development and use of the water resources of the basin. (1...
Yu, Hong; Soler, Marçal; San Clemente, Hélène; Mila, Isabelle; Paiva, Jorge A P; Myburg, Alexander A; Bouzayen, Mondher; Grima-Pettenati, Jacqueline; Cassan-Wang, Hua
Auxin plays a pivotal role in various plant growth and development processes, including vascular differentiation. The modulation of auxin responsiveness through the auxin perception and signaling machinery is believed to be a major regulatory mechanism controlling cambium activity and wood formation. To gain more insights into the roles of key Aux/IAA gene regulators of the auxin response in these processes, we identified and characterized members of the Aux/IAA family in the genome of Eucalyptus grandis, a tree of worldwide economic importance. We found that the gene family in Eucalyptus is slightly smaller than that in Populus and Arabidopsis, but all phylogenetic groups are represented. High-throughput expression profiling of different organs and tissues highlighted several Aux/IAA genes expressed in vascular cambium and/or developing xylem, some showing differential expression in response to developmental (juvenile vs. mature) and/or to environmental (tension stress) cues. Based on the expression profiles, we selected a promising candidate gene, EgrIAA4, for functional characterization. We showed that EgrIAA4 protein is localized in the nucleus and functions as an auxin-responsive repressor. Overexpressing a stabilized version of EgrIAA4 in Arabidopsis dramatically impeded plant growth and fertility and induced auxin-insensitive phenotypes such as inhibition of primary root elongation, lateral root emergence and agravitropism. Interestingly, the lignified secondary walls of the interfascicular fibers appeared very late, whereas those of the xylary fibers were virtually undetectable, suggesting that EgrIAA4 may play crucial roles in fiber development and secondary cell wall deposition. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: email@example.com.
Zimmerman, Rebekah S; Jalas, Chaim; Tao, Xin; Fedick, Anastasia M; Kim, Julia G; Pepe, Russell J; Northrop, Lesley E; Scott, Richard T; Treff, Nathan R
Objective: To develop a novel and robust protocol for multifactorial preimplantation genetic testing of trophectoderm biopsies using quantitative polymerase chain reaction (qPCR). Design: Prospective and blinded. Setting: Not applicable. Patient(s): Couples indicated for preimplantation genetic diagnosis (PGD). Intervention(s): None. Main outcome measure(s): Allele dropout (ADO) and failed amplification rate, genotyping consistency, chromosome screening success rate, and clinical outcomes of qPCR-based screening. Result(s): The ADO frequency on a single cell from a fibroblast cell line was 1.64% (18/1,096). When two or more cells were tested, the ADO frequency dropped to 0.02% (1/4,426). The rate of amplification failure was 1.38% (55/4,000) overall, with 2.5% (20/800) for single cells and 1.09% (35/3,200) for samples that had two or more cells. Among 152 embryos tested in 17 cases by qPCR-based PGD and CCS, 100% were successfully given a diagnosis, with 0% ADO or amplification failure. Genotyping consistency with reference laboratory results was >99%. Another 304 embryos from 43 cases were included in the clinical application of qPCR-based PGD and CCS, for which 99.7% (303/304) of the embryos were given a definitive diagnosis, with only 0.3% (1/304) having an inconclusive result owing to recombination. In patients receiving a transfer with follow-up, the pregnancy rate was 82% (27/33). Conclusion(s): This study demonstrates that the use of qPCR for PGD testing delivers consistent and more reliable results than existing methods and that single gene disorder PGD can be run concurrently with CCS without the need for additional embryo biopsy or whole genome amplification. Copyright © 2016 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
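The percentages quoted in this abstract can be recomputed from the counts given alongside them. The short check below is only an arithmetic sketch: the row labels and helper names are made up for illustration, and it assumes each reported figure was rounded half-up to the number of decimals shown.

```python
from decimal import Decimal, ROUND_HALF_UP

# (label, numerator, denominator, percentage as printed in the abstract)
REPORTED = [
    ("ADO, single cell", 18, 1096, "1.64"),
    ("ADO, two or more cells", 1, 4426, "0.02"),
    ("amplification failure, overall", 55, 4000, "1.38"),
    ("amplification failure, single cell", 20, 800, "2.5"),
    ("amplification failure, two or more cells", 35, 3200, "1.09"),
    ("definitive diagnosis", 303, 304, "99.7"),
    ("inconclusive result", 1, 304, "0.3"),
    ("pregnancy rate", 27, 33, "82"),
]


def recompute(numerator, denominator, reported):
    """Percentage rounded half-up to the same number of decimals as `reported`."""
    value = Decimal(100 * numerator) / Decimal(denominator)
    return value.quantize(Decimal(reported), rounding=ROUND_HALF_UP)


def mismatches(rows):
    """Labels of rows whose recomputed percentage differs from the printed one."""
    return [
        label
        for label, n, d, reported in rows
        if recompute(n, d, reported) != Decimal(reported)
    ]


if __name__ == "__main__":
    print(mismatches(REPORTED) or "all reported percentages match their counts")
```

Decimal arithmetic is used instead of floats so that a borderline value such as 55/4,000 = 1.375% rounds the same way every time.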
The Hsp20 genes are present in all plant species and play important roles in alleviating heat stress and enhancing plant thermotolerance by preventing the irreversible aggregation of denaturing proteins. However, very little is known about the CaHsp20 gene family in pepper (Capsicum annuum L.), an important vegetable crop that is temperate but thermosensitive. In this study, a total of 35 putative pepper Hsp20 genes (CaHsp20s) were identified and renamed on the basis of their molecular weight, and their gene structure, genome location, gene duplication, phylogenetic relationships and interaction network were analyzed. The expression patterns of CaHsp20 genes in four different tissues (root, stem, leaf and flower) from the thermotolerant line R9 under heat stress were measured using semi-quantitative RT-PCR. The transcripts of most CaHsp20 genes remained at a low level in all four tissues under normal temperature conditions but were highly induced by heat stress, whereas the expression of CaHsp16.6b, 16.7 and 23.8 was detected only in specific tissues and was less sensitive to heat stress than that of other CaHsp20 genes. In addition, compared with the thermotolerant line R9, the expression peak of most CaHsp20 genes in the thermosensitive line B6 under heat stress was hysteretic, and several CaHsp20 genes (CaHsp16.4, 18.2a, 18.7, 21.2, 22.0, 25.8 and 25.9) showed higher expression levels in both line B6 and R9. These data suggest that the CaHsp20 genes may be involved in heat stress and defense responses in pepper, which provides the basis for further functional analyses of CaHsp20s in the formation of acquired thermotolerance in pepper.
Saunders, Rebecca E; Instrell, Rachael; Rispoli, Rossella; Jiang, Ming; Howell, Michael
High-throughput screening (HTS) uses technologies such as RNA interference to generate loss-of-function phenotypes on a genomic scale. As these technologies become more popular, many research institutes have established core facilities of expertise to deal with the challenges of large-scale HTS experiments. As the efforts of core facility screening projects come to fruition, focus has shifted towards managing the results of these experiments and making them available in a useful format that can be further mined for phenotypic discovery. The HTS-DB database provides a public view of data from screening projects undertaken by the HTS core facility at the CRUK London Research Institute. All projects and screens are described with comprehensive assay protocols, and datasets are provided with complete descriptions of analysis techniques. This format allows users to browse and search data from large-scale studies in an informative and intuitive way. It also provides a repository for additional measurements obtained from screens that were not the focus of the project, such as cell viability, and groups these data so that it can provide a gene-centric summary across several different cell lines and conditions. All datasets from our screens that can be made available can be viewed interactively and mined for further hit lists. We believe that in this format, the database provides researchers with rapid access to results of large-scale experiments that might facilitate their understanding of genes/compounds identified in their own research. DATABASE URL: http://hts.cancerresearchuk.org/db/public.
Pigeonpea is an important pulse crop grown predominantly in the tropical and sub-tropical regions of the world. Although the pigeonpea growing area has increased considerably, yield has remained stagnant for the last six decades, mainly because the crop is exposed to various biotic and abiotic constraints. In addition, a low level of genetic variability and limited genomic resources have been serious impediments to pigeonpea crop improvement through modern breeding approaches. In recent years, however, the availability of next generation sequencing and high-throughput genotyping technologies has changed the scenario tremendously. Reduced sequencing costs, which enabled the decoding of the pigeonpea genome, have led to the development of various genomic resources including molecular markers, transcript sequences and comprehensive genetic maps. Mapping of some important traits, including resistance to Fusarium wilt and sterility mosaic disease, fertility restoration and determinacy, along with other agronomically important traits, has paved the way for applying genomics-assisted breeding (GAB) through marker assisted selection as well as genomic selection. This would accelerate the development and improvement of both varieties and hybrids in pigeonpea. Particularly for the hybrid breeding programme, mitochondrial genomes of cytoplasmic male sterile lines, maintainers and hybrids have also been sequenced to identify genes responsible for cytoplasmic male sterility. Furthermore, several diagnostic molecular markers have been developed to assess the purity of commercial hybrids. In summary, pigeonpea has become a genomic resources-rich crop and efforts have already been initiated to integrate these resources in pigeonpea breeding.
Manichaikul, Ani; Hoffman, Eric A.; Smolonska, Joanna; Gao, Wei; Cho, Michael H.; Baumhauer, Heather; Budoff, Matthew; Austin, John H. M.; Washko, George R.; Carr, J. Jeffrey; Kaufman, Joel D.; Pottinger, Tess; Powell, Charles A.; Wijmenga, Cisca; Zanen, Pieter; Groen, Harry J. M.; Postma, Dirkje S.; Wanner, Adam; Rouhani, Farshid N.; Brantly, Mark L.; Powell, Rhea; Smith, Benjamin M.; Rabinowitz, Dan; Raffel, Leslie J.; Hinckley Stukovsky, Karen D.; Crapo, James D.; Beaty, Terri H.; Hokanson, John E.; Silverman, Edwin K.; Dupuis, Josée; O’Connor, George T.; Boezen, H. Marike; Rich, Stephen S.
Rationale: Pulmonary emphysema overlaps partially with spirometrically defined chronic obstructive pulmonary disease and is heritable, with moderately high familial clustering. Objectives: To complete a genome-wide association study (GWAS) for the percentage of emphysema-like lung on computed tomography in the Multi-Ethnic Study of Atherosclerosis (MESA) Lung/SNP Health Association Resource (SHARe) Study, a large, population-based cohort in the United States. Methods: We determined percent emphysema and upper-lower lobe ratio in emphysema defined by lung regions less than −950 HU on cardiac scans. Genetic analyses were reported combined across four race/ethnic groups: non-Hispanic white (n = 2,587), African American (n = 2,510), Hispanic (n = 2,113), and Chinese (n = 704) and stratified by race and ethnicity. Measurements and Main Results: Among 7,914 participants, we identified regions at genome-wide significance for percent emphysema in or near SNRPF (rs7957346; P = 2.2 × 10−8) and PPT2 (rs10947233; P = 3.2 × 10−8), both of which replicated in an additional 6,023 individuals of European ancestry. Both single-nucleotide polymorphisms were previously implicated as genes influencing lung function, and analyses including lung function revealed independent associations for percent emphysema. Among Hispanics, we identified a genetic locus for upper-lower lobe ratio near the α-mannosidase–related gene MAN2B1 (rs10411619; P = 1.1 × 10−9; minor allele frequency [MAF], 4.4%). Among Chinese, we identified single-nucleotide polymorphisms associated with upper-lower lobe ratio near DHX15 (rs7698250; P = 1.8 × 10−10; MAF, 2.7%) and MGAT5B (rs7221059; P = 2.7 × 10−8; MAF, 2.6%), which acts on α-linked mannose. Among African Americans, a locus near a third α-mannosidase–related gene, MAN1C1 (rs12130495; P = 9.9 × 10−6; MAF, 13.3%) was associated with percent emphysema. Conclusions: Our results suggest that some genes previously identified as
Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is
In Kazakhstan, during the transition to a market economy, with coal production falling and costs in the coal sector rising, the rational utilization of coal resources has become a vital issue. The thesis investigates theoretical and methodological aspects of the socio-economic efficiency of utilizing fuel and energy resources. Different fields of use of coal and coal wastes are studied; mechanical and thermo-chemical methods of coal processing are evaluated economically in the context of introducing resource-saving technologies; the national-economic efficiency of using the products as technological raw material and energy fuel is determined; and the influence of refining on widening the raw-material base of industry, improving the economic results of production and lowering environmental pollution is examined. It was estimated that the extracted coal of the region includes 1020 thousand tonnes of aluminium oxide and 996 thousand tonnes of sulphur; in the course of coal extraction and processing 3650 thousand tonnes of solid wastes appeared; during the extraction of Ehkibastuz coal, 90970 thousand tonnes, and of Karaganda coal, 40040 thousand tonnes. The coal components and wastes mentioned above should be considered not only as a source of environmental pollution but also as a potential resource for the production of industrial goods, according to their qualitative characteristics and the availability of processing technologies. Implementing these presuppositions under the conditions of the emerging market economy will make it possible to use the organic part of coal more competently, to involve the other useful components of coal in production, to utilize gaseous and solid wastes and, on that basis, to expand the resource base of some branches of industry and reduce environmental pollution. It will be also accompanied by the needs in capital investments for the industrial
Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping
Abstract Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483
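The scaffold and contig N50 values reported above are standard assembly contiguity statistics: N50 is the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the gayal assembly pipeline.

```python
# Minimal N50 calculation under the common definition:
# sort lengths from longest to shortest and return the length at which
# the running total first reaches half of the summed length.
# The function name and the toy input are hypothetical.
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2*running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety

if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # total = 32
    print(n50(toy_contigs))
```

Applied to the real scaffold and contig length lists of an assembly, the same function would yield figures comparable to the 2.74 Mb and 14.41 kb quoted in the abstract, assuming the authors used this conventional definition.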
Santiago Moreno, J.; Lopez Sebastian, A.; Castano, C.; Coloma, M. A.; Gomez Brunet, A.; Toledano Diaz, A.; Prieto, M. T.; Campo, J. L.
Semen was collected from 10 Black Castellana roosters and the classic sperm variables (ejaculate volume, sperm concentration and sperm motility) were examined. In addition, the hypo-osmotic swelling test was used to investigate sperm cell membrane integrity, and acidic aniline blue staining was used to screen for morphological abnormalities (including acrosome integrity) and to examine the condensation status of the chromatin. The latter was also examined by Gram staining. Large and small semen volumes were associated with high and low sperm concentrations, respectively (R2=0.04, P<0.05). The percentage of motile spermatozoa correlated strongly with the percentage of sperm cells showing an intact acrosome (R2=0.13, P<0.001) and with the percentage of morphologically normal spermatozoa (R2=0.04, P<0.05). The percentage of Gram positive spermatozoa was positively correlated with semen appearance (R2=0.12, P<0.05), sperm cell concentration (R2=0.13, P<0.05), and with the sperm motility variables studied (R2=0.14, P<0.05 for percentage mobility, and R2=0.12, P<0.05 for quality of movement). Only three of the 10 roosters, all with fertilisation potentials of 80-90%, were considered potential sperm donors for genome resource banking purposes. The remaining birds were all of low fertility (. 50%); in fact, some produced semen volumes too small to perform fertility tests. Semen volume and membrane integrity were found to be the best variables for predicting the fertilisation potential of rooster ejaculates. (Author) 37 refs.
U.S. Department of Health & Human Services — MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human...
International research cooperation project. Assessment report on the R and D of the comprehensive development/utilization technology of energy of gas hydrate resource; Gas hydrate shigen no energy sogo kaihatsu riyo gijutsu no kenkyu kaihatsu hyoka hokokusho
A third-party assessment of 'the R and D of the comprehensive development/utilization technology of gas hydrate resource' was conducted and is reported here. This R and D is a timely project aimed at establishing the basic technology for gas hydrate through both fundamental and practical research. For the development of gas hydrate resources in the tundra zone, the development of measuring methods for thermal conductivity and dielectric constants advanced the establishment of a guide for exploration and the possibility of assessing the amount of the resource. In development/production, the knowledge gained on exchanging the methane in gas hydrate with CO2 means that no new supply of heat is needed, and the exchange also contributes to the isolation of CO2. As to the utilization technology, the results were also rated very highly internationally: a quantitative, molecular-level evaluation method for the gas included in hydrate was pursued using Raman spectroscopy, with the aim of establishing an industrial gas separation method that uses the low-temperature environment of the tundra zone. (NEDO)
Report on final evaluation of industrial science and technology research and development system. Comprehensive basic technologies for development of ocean resources. Manganese nodule exploitation system; Kaiyo shigen sogo kiban gijutsu (mangan dankai saiko system). Saishu hyoka hokokusho
Described herein are the final evaluation results of the basic research and development of the system for exploiting manganese nodules as one of ocean resources. A 9-year project was started in the FY 1981 to establish the techniques to efficiently, economically exploit Mn nodules on a commercial basis, which are occurring on deep sea bottoms (4,000 to 6,000 m deep), in order to stably supply non-ferrous metallic resources, e.g., Ni, Cu, Co and Mn, which are essential for economic activities of Japan. Originally, the UN convention related to ocean laws raised development of unique exploitation techniques as the prerequisite condition for obtaining the right to develop Mn nodules. However, the situations around development of Mn nodules were changed since then, to devalue objects, significance and urgency of this project. The fourth amendment of the basic plans decided to suspend the comprehensive ocean tests in 1996, and to implement only the ocean/land tests in which part of the individual elementary techniques were combined. Therefore, the technological validation of the overall system could not be done sufficiently, and degree of achievement of the project is low, viewed from insufficient prospects of the commercial production. However, this project produced good results in individual elementary techniques, which are of significance for the resources policies. (NEDO)
The deep-sea manganese nodule mining system in this report collects efficiently and continuously a large quantity of manganese nodules in existence in the 5,000m-deep sea floor. The aim of the project is to develop and build an experimental system for a real mining machine and to perform a comprehensive marine test to find out if such a real machine is technologically and economically feasible. The system under this project is divided into four systems, which are a nodule mining system, nodule lifting system, nodule handling system, and a measurement control system. The nodule mining system travels on the sea bottom efficiently collecting manganese nodules and forwarding them into the nodule lifting system. Only a few systems of this kind have so far been developed, however, and therefore much endeavor needs to be exerted for the development of technologies involved. The nodule lifting system is divided into a pump lift unit, air lift unit, and a nodule lifting pipe. The pump lift unit and air lift unit elevate manganese nodules supplied from the nodule collecting unit to the sea surface. The nodule lifting pipe provides a passage for nodules to run through upward. (NEDO)
Background: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results: To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. Nine of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety, including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion: Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of
Hagen, J; Lee, E F; Fairlie, W D; Kalinna, B H
As research on parasitic helminths is moving into the post-genomic era, an enormous effort is directed towards deciphering gene function and to achieve gene annotation. The sequences that are available in public databases undoubtedly hold information that can be utilized for new interventions and control but the exploitation of these resources has until recently remained difficult. Only now, with the emergence of methods to genetically manipulate and transform parasitic worms will it be possible to gain a comprehensive understanding of the molecular mechanisms involved in nutrition, metabolism, developmental switches/maturation and interaction with the host immune system. This review focuses on functional genomics approaches in parasitic helminths that are currently used, to highlight potential applications of these technologies in the areas of cell biology, systems biology and immunobiology of parasitic helminths. © 2011 Blackwell Publishing Ltd.
Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.
The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293
Manichaikul, Ani; Hoffman, Eric A.; Smolonska, Joanna; Gao, Wei; Cho, Michael H.; Baumhauer, Heather; Budoff, Matthew; Austin, John H. M.; Washko, George R.; Carr, J. Jeffrey; Kaufman, Joel D.; Pottinger, Tess; Powell, Charles A.; Wijmenga, Cisca; Zanen, Pieter; Groen, Harry J.M.; Postma, Dirkje S.; Wanner, Adam; Rouhani, Farshid N.; Brantly, Mark L.; Powell, Rhea; Smith, Benjamin M.; Rabinowitz, Dan; Raffel, Leslie J.; Stukovsky, Karen D. Hinckley; Crapo, James D.; Beaty, Terri H.; Hokanson, John E.; Silverman, Edwin K.; Dupuis, Josee; O'Connor, George T.; Boezen, Hendrika; Rich, Stephen S.; Barr, R. Graham
Rationale: Pulmonary emphysema overlaps partially with spirometrically defined chronic obstructive pulmonary disease and is heritable, with moderately high familial clustering. Objectives: To complete a genome-wide association study (GWAS) for the percentage of emphysema-like lung on computed
Report on comprehensive surveys of nationwide geothermal resources in fiscal 1979. Conceptual design of a database system; 1979 nendo zenkoku chinetsu shigen sogo chosa hokokusho. Database system gainen sekkei
Conceptual design was made on a database system as part of the comprehensive surveys of nationwide geothermal resources. Underground hot water in depths of several kilometers close to the ground surface is a utilizable geothermal energy. Exploration using the ground surface survey is much less expensive than the test drilling survey, but has greater error in estimation because of being an indirect method. However, integrating data by freely using a number of exploration methods can improve the accuracy of estimation on the whole. In performing the conceptual design of a geothermal resource information system, the functions of this large scale database were used as the framework. Further data collection, distribution and interactive type man-machine communication, modeling, and environment surveillance functions were incorporated. Considerations were also given on further diversified utilization patterns and on support to users in remote areas and end users. What is important in designing the system is that constituting elements of hardware and software should function while being combined organically as one system, rather than the elements work independently. In addition, sufficient expandability and flexibility are indispensable. (NEDO)
Hybrid Capture-Based Comprehensive Genomic Profiling Identifies Lung Cancer Patients with Well-Characterized Sensitizing Epidermal Growth Factor Receptor Point Mutations That Were Not Detected by Standard of Care Testing.
Suh, James H; Schrock, Alexa B; Johnson, Adrienne; Lipson, Doron; Gay, Laurie M; Ramkissoon, Shakti; Vergilio, Jo-Anne; Elvin, Julia A; Shakir, Abdur; Ruehlman, Peter; Reckamp, Karen L; Ou, Sai-Hong Ignatius; Ross, Jeffrey S; Stephens, Philip J; Miller, Vincent A; Ali, Siraj M
In our recent study, of cases positive for epidermal growth factor receptor (EGFR) exon 19 deletions using comprehensive genomic profiling (CGP), 17/77 (22%) patients with prior standard of care (SOC) EGFR testing results available were previously negative for exon 19 deletion. Our aim was to compare the detection rates of CGP versus SOC testing for well-characterized sensitizing EGFR point mutations (pm) in our 6,832-patient cohort. DNA was extracted from 40 microns of formalin-fixed paraffin-embedded sections from 6,832 consecutive cases of non-small cell lung cancer (NSCLC) of various histologies (2012-2015). CGP was performed using a hybrid capture, adaptor ligation-based next-generation sequencing assay to a mean coverage depth of 576×. Genomic alterations (pm, small indels, copy number changes and rearrangements) involving EGFR were recorded for each case and compared with prior testing results if available. Overall, there were 482 instances of EGFR exon 21 L858R (359) and L861Q (20), exon 18 G719X (73) and exon 20 S768I (30) pm, of which 103 unique cases had prior EGFR testing results that were available for review. Of these 103 cases, CGP identified 22 patients (21%) with sensitizing EGFR pm that were not detected by SOC testing, including 9/75 (12%) patients with L858R, 4/7 (57%) patients with L861Q, 8/20 (40%) patients with G719X, and 4/7 (57%) patients with S768I pm (some patients had multiple EGFR pm). In cases with available clinical data, benefit from small molecule inhibitor therapy was observed. CGP, even when applied to low tumor purity clinical-grade specimens, can detect well-known EGFR pm in NSCLC patients that would otherwise not be detected by SOC testing. Taken together with EGFR exon 19 deletions, over 20% of patients who are positive for EGFR-activating mutations using CGP are previously negative by SOC EGFR mutation testing, suggesting that thousands of such patients per year in the U.S. alone could experience improved clinical
Lu, Jianguo; Peatman, Eric; Yang, Qing; Wang, Shaolin; Hu, Zhiliang; Reecy, James; Kucuktas, Huseyin; Liu, Zhanjiang
The catfish genome database, cBARBEL (abbreviated from catfish Breeder And Researcher Bioinformatics Entry Location) is an online open-access database for genome biology of ictalurid catfish (Ictalurus spp.). It serves as a comprehensive, integrative platform for all aspects of catfish genetics, genomics and related data resources. cBARBEL provides BLAST-based, fuzzy and specific search functions, visualization of catfish linkage, physical and integrated maps, a catfish EST contig viewer with SNP information overlay, and GBrowse-based organization of catfish genomic data based on sequence similarity with zebrafish chromosomes. Subsections of the database are tightly related, allowing a user with a sequence or search string of interest to navigate seamlessly from one area to another. As catfish genome sequencing proceeds and ongoing quantitative trait loci (QTL) projects bear fruit, cBARBEL will allow rapid data integration and dissemination within the catfish research community and to interested stakeholders. cBARBEL can be accessed at http://catfishgenome.org.
Richardson, Sylvia; Tseng, George C.; Sun, Wei
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
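To make "horizontal integration" concrete, one widely used way to aggregate the same type of result across studies is fixed-effect inverse-variance meta-analysis. The sketch below is only an illustration of that general idea; the abstract does not say which methods the article reviews, and the function name and toy numbers are assumptions.

```python
# Fixed-effect inverse-variance meta-analysis: a simple, common form of
# horizontal integration (pooling one effect estimate across studies).
# Function name and example values are hypothetical.
import math
from typing import Sequence, Tuple

def fixed_effect_meta(estimates: Sequence[float],
                      std_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (pooled estimate, pooled standard error)."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one positive standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se * se) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

if __name__ == "__main__":
    # Two toy studies with equal precision: the pooled effect is their mean.
    effect, se = fixed_effect_meta([1.0, 3.0], [1.0, 1.0])
    print(effect, se)
```

More precise studies (smaller standard errors) receive larger weights, which is why the pooled standard error is always smaller than that of any single contributing study.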
Bolger, Marie; Gundlach, Heidrun; Scholz, Uwe; Mayer, Klaus; Usadel, Björn; Schwacke, Rainer; Schmutzer, Thomas; Chen, Jinbo; Arend, Daniel; Oppermann, Markus; Weise, Stephan; Lange, Matthias; Fiorani, Fabio; Spannagl, Manuel
Recent advances in sequencing technologies have greatly accelerated the rate of plant genome and applied breeding research. Despite this advancing trend, plant genomes continue to present numerous difficulties to the standard tools and pipelines, not only for genome assembly but also for gene annotation and downstream analysis. Here we give a perspective on tools, resources and services necessary to assemble and analyze plant genomes and link them to plant phenotypes.
Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi
This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.
Vaez Barzani, Ahmad
In this thesis we present an overview of bioinformatics-based approaches for genomic association mapping, with emphasis on human quantitative traits and their contribution to complex diseases. We aim to provide a comprehensive walk-through of the classic steps of genomic association mapping
Lepidoptera, butterflies and moths, is the second largest animal order and includes numerous agricultural pests. To facilitate comparative genomics in Lepidoptera, we isolated BAC clones containing conserved and putative single-copy genes from libraries of three pests, Heliothis virescens, Ostrinia nubilalis, and Plutella xylostella, which harbor the haploid chromosome number n = 31 and are not closely related to each other or to the silkworm, Bombyx mori (n = 28), the sequenced model lepidopteran. A total of 108–184 clones representing 101–182 conserved genes were isolated for each species. For 79 genes, clones were isolated from more than two species, which will be useful as common markers for analysis using fluorescence in situ hybridization (FISH), as well as for comparison of genome sequence among multiple species. The PCR-based clone isolation method presented here is applicable to species which lack a sequenced genome but have a significant collection of cDNA or EST sequences.
Chen, Wei-Hua; van Noort, Vera; Lluch-Senar, Maria; Hennrich, Marco L.; H. Wodke, Judith A.; Yus, Eva; Alibés, Andreu; Roma, Guglielmo; Mende, Daniel R.; Pesavento, Christina; Typas, Athanasios; Gavin, Anne-Claude; Serrano, Luis; Bork, Peer
We developed a comprehensive resource for the genome-reduced bacterium Mycoplasma pneumoniae comprising 1748 consistently generated ‘-omics’ data sets, and used it to quantify the power of antisense non-coding RNAs (ncRNAs), lysine acetylation, and protein phosphorylation in predicting protein abundance (11%, 24% and 8%, respectively). These factors taken together are four times more predictive of the proteome abundance than of mRNA abundance. In bacteria, post-translational modifications (PTMs) and ncRNA transcription were both found to increase with decreasing genomic GC-content and genome size. Thus, the evolutionary forces constraining genome size and GC-content modify the relative contributions of the different regulatory layers to proteome homeostasis, and impact more genomic and genetic features than previously appreciated. Indeed, these scaling principles will enable us to develop more informed approaches when engineering minimal synthetic genomes. PMID:26773059
Verma, Mohit; Kumar, Vinay; Patel, Ravi K.; Garg, Rohini; Jain, Mukesh
Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database fea...
Comprehensive Hard Materials deals with the production, uses and properties of the carbides, nitrides and borides of these metals and those of titanium, as well as tools of ceramics, the superhard boron nitrides and diamond and related compounds. Articles include the technologies of powder production (including their precursor materials), milling, granulation, cold and hot compaction, sintering, hot isostatic pressing, hot-pressing, injection moulding, as well as on the coating technologies for refractory metals, hard metals and hard materials. The characterization, testing, quality assurance and applications are also covered. Comprehensive Hard Materials provides meaningful insights on materials at the leading edge of technology. It aids continued research and development of these materials and as such it is a critical information resource to academics and industry professionals facing the technological challenges of the future. Hard materials operate at the leading edge of technology, and continued res...
Modern logistics comprises operative logistics, analytical logistics and management of logistic networks. Central task of operative logistics is the efficient supply of required goods at the right place within the right time. Tasks of analytical logistics are designing optimal networks and systems, developing strategies for planning, scheduling and operation, and organizing efficient order and performance processes. Logistic management plans, implements and operates logistic networks and schedules orders, stocks and resources. This reference-book offers a unique survey of modern logistics. It contains proven strategies, rules and tools for the solution of a multitude of logistic problems. The analytically derived algorithms and formulas can be used for the computer-based planning of logistic systems and for the dynamic scheduling of orders and resources in supply networks. They enable significant improvements of performance, quality and costs. Their application is demonstrated by several examples from industr...
Chan, Esther T; Cherry, J Michael
The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled, for example, global analyses of transcription profiling and the assembly of maps of transcription factor occupancy and higher order chromatin organization at nucleotide-level resolution. With this growing influx of published genome-scale data come new challenges for their storage, display, analysis and integration. Here, we describe SGD's progress in the creation of a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org. DATABASE URL: http://www.yeastgenome.org.
Gao, Yangchun; Li, Shiguo; Zhan, Aibin
Invasive species cause huge damage to ecology, environment and economy globally. A comprehensive understanding of invasion mechanisms, particularly the genetic bases of micro-evolutionary processes responsible for invasion success, is essential for reducing the potential damage caused by invasive species. The golden star tunicate, Botryllus schlosseri, has become a model species in invasion biology, mainly owing to its high invasiveness and small, well-sequenced genome. However, genome-wide genetic markers have not been well developed in this highly invasive species, thus limiting the comprehensive understanding of the genetic mechanisms of invasion success. Using restriction site-associated DNA (RAD) tag sequencing, here we developed a high-quality resource of 14,119 out of 158,821 SNPs for B. schlosseri. These SNPs were relatively evenly distributed across the chromosomes. SNP annotations showed that the majority of SNPs (63.20%) were located in intergenic regions, and 21.51% and 14.58% were located in introns and exons, respectively. In addition, the potential use of the developed SNPs for population genomics studies was preliminarily assessed through estimates of observed heterozygosity (H_O), expected heterozygosity (H_E), nucleotide diversity (π), Wright's inbreeding coefficient (F_IS) and effective population size (Ne). Our SNP resource provides future studies with genome-wide genetic markers for genetic and genomic investigations, such as the genetic bases of micro-evolutionary processes responsible for invasion success.
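The summary statistics named in this abstract can be illustrated for a single biallelic SNP. The sketch below is not the authors' pipeline (the abstract does not say which software or estimators were used); it assumes the textbook definitions without small-sample correction, and the function name and genotype counts are hypothetical.

```python
import math


def snp_diversity(n_hom_ref, n_het, n_hom_alt):
    """Return (H_O, H_E, F_IS) for one biallelic SNP from genotype counts.

    Assumed textbook definitions (no small-sample correction):
      H_O  = fraction of heterozygous individuals
      H_E  = 2 * p * (1 - p), with p the reference allele frequency
      F_IS = 1 - H_O / H_E
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each individual carries two alleles, so there are 2n alleles in total.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    h_obs = n_het / n
    h_exp = 2 * p * (1 - p)
    # F_IS is undefined for a monomorphic site (H_E == 0).
    f_is = 1 - h_obs / h_exp if h_exp > 0 else math.nan
    return h_obs, h_exp, f_is


# Hypothetical counts: 40 homozygous reference, 20 heterozygous, 40 homozygous alternate.
h_obs, h_exp, f_is = snp_diversity(40, 20, 40)
print(f"H_O={h_obs:.2f} H_E={h_exp:.2f} F_IS={f_is:.2f}")
```

A positive F_IS, as in the hypothetical counts above, indicates fewer heterozygotes than expected under random mating; population-level values are usually averaged over many SNPs.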
Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel
The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.
During the meeting in Arlington, USA in 2001, the scientists grouped in PROMUSA agreed to launch the Global Musa Genomics Consortium. The Consortium aims to apply genomics technologies to the improvement of this important crop. These genome projects make banana the third model species, after Arabidopsis and rice, to be analyzed and sequenced. Compared to Arabidopsis and rice, the banana genome provides a unique and powerful insight into structural and functional genomics that could not be found in those two species. This paper discusses these subjects, including the importance of banana as the fourth main food in the world, and the evolution and biodiversity of this genetic resource and its parasite.
Complete mitochondrial genomes and nuclear ribosomal RNA operons of two species of Diplostomum (Platyhelminthes: Trematoda): a molecular resource for taxonomy and molecular epidemiology of important fish pathogens
Brabec, Jan; Kostadinova, Aneta; Scholz, Tomáš; Littlewood, D. T. J.
Vol. 8, JUN 19 2015 (2015), p. 336 ISSN 1756-3305 R&D Projects: GA MŠk(CZ) EE2.3.30.0032; GA ČR(CZ) GA15-14198S Grant - others: GA MŠk(CZ) LM2010005 Institutional support: RVO:60077344 Keywords: Diplostomum (Platyhelminthes: Trematoda) * fish pathogens * mitochondrial genome * ribosomal RNA * Illumina next-generation sequencing * phylogeny Subject RIV: EG - Zoology Impact factor: 3.234, year: 2015
Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V
The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of 'index' orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Dembowski, Jill A; DeLuca, Neal A
during infection and provides a comprehensive view of how HSV-1 selectively utilizes cellular resources.
Niche adaptation has long been recognized to drive intra-species differentiation and speciation, yet knowledge about its relationship with hereditary variation of microbial genomes is relatively limited. Using Leptospirillum ferriphilum species as a case study, we present a detailed analysis of genomic features of five recognized strains. Genome-to-genome distance calculation preliminarily determined the roles of spatial distance and environmental heterogeneity that potentially contribute to intra-species variation within L. ferriphilum species at the genome level. Mathematical models were further constructed to extrapolate the expansion of L. ferriphilum genomes (an ‘open’ pan-genome), indicating the emergence of novel genes with newly sequenced genomes. The identification of diverse mobile genetic elements (MGEs) (such as transposases, integrases, and phage-associated genes) revealed the prevalence of horizontal gene transfer events, which is an important evolutionary mechanism that provides avenues for the recruitment of novel functionalities and further for the genetic divergence of microbial genomes. Comprehensive analysis also demonstrated that genome reduction by gene loss in a broad sense might contribute to the observed diversification. We thus inferred a plausible explanation to address this observation: community-dependent adaptation that potentially economizes the limiting resources of the entire community. Given that the introduction of new genes is accompanied by a parallel abandonment of some other ones, our results provide snapshots of the biological fitness cost of environmental adaptation within the L. ferriphilum genomes. In short, our genome-wide analyses link the genetic variation of L. ferriphilum with its evolutionary adaptation.
Droc, Gaëtan; Larivière, Delphine; Guignon, Valentin; Yahiaoui, Nabila; This, Dominique; Garsmeur, Olivier; Dereeper, Alexis; Hamelin, Chantal; Argout, Xavier; Dufayard, Jean-François; Lengelle, Juliette; Baurens, Franc-Christophe; Cenci, Alberto; Pitollat, Bertrand; D’Hont, Angélique; Ruiz, Manuel; Rouard, Mathieu; Bocs, Stéphanie
Banana is one of the world’s favorite fruits and one of the most important crops for developing countries. The banana reference genome sequence (Musa acuminata) was recently released. Given the taxonomic position of Musa, the completed genomic sequence has particular comparative value to provide fresh insights about the evolution of the monocotyledons. The study of the banana genome has been enhanced by a number of tools and resources that allow harnessing its sequence. First, we set up essential tools such as a Community Annotation System, phylogenomics resources and metabolic pathways. Then, to support post-genomic efforts, we improved existing banana systems (e.g. web front end, query builder), we integrated available Musa data into generic systems (e.g. markers and genetic maps, synteny blocks), we made other existing systems containing Musa data (e.g. transcriptomics, rice reference genome, workflow manager) interoperable with the banana hub and, finally, we generated new results from sequence analyses (e.g. SNP and polymorphism analysis). Several use cases illustrate how the Banana Genome Hub can be used to study gene families. Overall, with this collaborative effort, we discuss the importance of interoperability for data integration between existing information systems. Database URL: http://banana-genome.cirad.fr/ PMID:23707967
Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A
The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.
Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B
The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.
Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E
The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Rice (Oryza sativa L.) is the leading genomics system among the crop plants. The sequence of the rice genome, the first cereal plant genome, was published in 2005. This review summarizes progress made in rice genome annotation, comparative genomics, and functional genomics research. It also maps out the status of rice genomics globally and provides a vision of future research directions and resource building.
Background: Genome-wide screening in human and mouse cells using RNA interference and open reading frame over-expression libraries is rapidly becoming a viable experimental approach for many research labs. A variety of gene expression modulation libraries are commercially available; however, detailed and validated protocols as well as the reagents necessary for deconvolving genome-scale gene screens using these libraries are lacking. As a solution, we designed a comprehensive platform for highly multiplexed functional genetic screens in human, mouse and yeast cells using popular, commercially available gene modulation libraries. The Gene Modulation Array Platform (GMAP) is a single microarray-based detection solution for deconvolution of loss- and gain-of-function pooled screens. Results: Experiments with specially constructed lentiviral-based plasmid pools containing ~78,000 shRNAs demonstrated that the GMAP is capable of deconvolving genome-wide shRNA "dropout" screens. Further experiments with a larger, ~90,000 shRNA pool demonstrated that equivalent results are obtained from plasmid pools and from genomic DNA derived from lentivirus-infected cells. Parallel testing of large shRNA pools using GMAP and next-generation sequencing methods revealed that the two methods provide valid and complementary approaches to deconvolution of genome-wide shRNA screens. Additional experiments demonstrated that GMAP is equivalent to similar microarray-based products when used for deconvolution of open reading frame over-expression screens. Conclusion: Herein, we demonstrate four major applications for the GMAP resource, including deconvolution of pooled RNAi screens in cells with at least 90,000 distinct shRNAs. We also provide detailed methodologies for pooled shRNA screen readout using GMAP and compare next-generation sequencing to GMAP (i.e. microarray-based) deconvolution methods.
Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W
The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants, associated with cancer, cardiac or neurodegenerative phenotypes, remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. © 2018 Joule Inc. or its licensors.
Hall, R.D.; Beale, M.; Fiehn, O.; Hardy, N.; Summer, L.; Bino, R.
After the establishment of technologies for high-throughput DNA sequencing (genomics), gene expression analysis (transcriptomics), and protein analysis (proteomics), the remaining functional genomics challenge is that of metabolomics. Metabolomics is the term coined for essentially comprehensive,
Archibald, A.L.; Bottema, C.D.; Brauning, R.; Burgess, S.C.; Burt, D.W.; Casas, E.; Cheng, H.H.; Clarke, L.; Couldrey, C.; Dalrymple, B.P.; Elsik, C.G.; Foissac, S.; Giuffra, E.; Groenen, M.A.M.; Hayes, B.J.; Huang, L.S.; Khatib, H.; Kijas, J.W.; Kim, H.; Lunney, J.K.; McCarthy, F.M.; McEwan, J.; Moore, S.; Nanduri, B.; Notredame, C.; Palti, Y.; Plastow, G.S.; Reecy, J.M.; Rohrer, G.; Sarropoulou, E.; Schmidt, C.J.; Silverstein, J.; Tellam, R.L.; Tixier-Boichard, M.; Tosser-klopp, G.; Tuggle, C.K.; Vilkki, J.; White, S.N.; Zhao, S.; Zhou, H.
We describe the organization of a nascent international effort, the Functional Annotation of Animal Genomes (FAANG) project, whose aim is to produce comprehensive maps of functional elements in the genomes of domesticated animal species.
Anthon, Christian; Tafer, Hakim; Havgaard, Jakob H
BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However......, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure...... lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome...
Rodent malaria parasites (RMPs) serve as tractable models for experimental genetics, and as valuable tools to study malaria parasite biology and host-parasite-vector interactions. Plasmodium vinckei, one of four RMPs adapted to laboratory mice, is the most geographically widespread species and displays considerable phenotypic and genotypic diversity amongst its subspecies and strains. The phenotypes and genotypes of P. vinckei isolates have been less well characterized than those of other RMPs, hampering its use as an experimental model for malaria. Here, we have studied the phenotypes and sequenced the genomes and transcriptomes of ten P. vinckei isolates including representatives of all five subspecies, all of which were collected from wild thicket rats (Thamnomys rutilans) in sub-Saharan Central Africa between the late 1940s and mid 1960s. We have generated a comprehensive resource for P. vinckei comprising five high-quality reference genomes, growth profiles and genotypes of P. vinckei isolates, and expression profiles of genes across the intra-erythrocytic developmental stages of the parasite. We observe significant phenotypic and genotypic diversity among P. vinckei isolates, making them particularly suitable for classical genetics and genomics-driven studies on malaria parasite biology. As part of a proof of concept study, we have shown that experimental genetic crosses can be performed between P. vinckei parasites to potentially identify genotype-phenotype relationships. We have also shown that they are amenable to genetic manipulation in the laboratory.
Stano, Matej; Klucar, Lubos
phiGENOME is a web-based genome browser generating dynamic and interactive graphical representation of phage genomes stored in the phiSITE, database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and it was optimised for visualisation of phage genomes with the emphasis on the gene regulatory elements. phiGENOME consists of three components: (i) genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level and (iii) regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing 542 complete genome sequences accompanied with their rich annotations and references, makes phiGENOME a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.
Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation, and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.
Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong
The manila clam, Ruditapes philippinarum, is an important bivalve species in worldwide aquaculture including Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors including viruses, microorganisms, parasites, and water conditions with subsequently declining production. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high quality whole-genome was assembled with various library construction methods. A total of 108,034 protein coding gene models were predicted and repetitive elements including simple sequence repeats and noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with other bivalve marine invertebrates uncovered that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues in order to support genome annotation and then identified 41,275 transcripts which were annotated. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, serving as a reference genome for comparative analysis of bivalve species and for unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Oct 20, 2014 ... 1Repository of Tomato Genomics Resources, Department of Plant Sciences, School .... Due to its position at the crossroads of Sanger's sequencing .... replacement for the microarray-based expression profiling. .... during RNA fragmentation step prior to library construction, ...... tomato pollen as a test case.
Wordley, Claire; Slate, Jon; Stapley, Jessica
Online sequence databases can provide valuable resources for the development of cross-species genetic markers. In particular, mining expressed tag sequences (EST) for microsatellites and developing conserved cross-species microsatellite markers can provide a rapid and relatively inexpensive method to develop new markers for a range of species. Here, we adopt this approach to develop cross-species microsatellite markers in Anolis lizards, which is a model genus in evolutionary biology and ecology. Using EST sequences from Anolis carolinensis, we identified 127 microsatellites that satisfied our criteria, and tested 49 of these in five species of Anolis (carolinensis, distichus, apletophallus, porcatus and sagrei). We identified between 8 and 25 new variable genetic markers for five Anolis species. These markers will be a valuable resource for studies of population genetics, comparative mapping, mating systems, behavioural ecology and adaptive radiations in this diverse lineage. © 2010 Blackwell Publishing Ltd.
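The EST-mining step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `find_microsatellites`, the motif length range (2 to 6 bp) and the minimum repeat count (5) are assumptions chosen for the example, since the abstract does not state the selection criteria that were used.

```python
import re

def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    The thresholds are illustrative assumptions, not the criteria of the study.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A motif of length k followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. ACAC, TT).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical EST fragment containing an (AC)6 repeat.
example = "GGG" + "AC" * 6 + "TTT"
result = find_microsatellites(example)
```

Only perfect repeats are reported here; interrupted or compound microsatellites, which dedicated tools also handle, would need a more permissive search.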
Molineris, I.; Sales, G.
The amount of information about genomes, both in the form of complete sequences and annotations, has been exponentially increasing in the last few years. As a result there is the need for tools providing a graphical representation of such information that should be comprehensive and intuitive. Visual representation is especially important in the comparative genomics field since it should provide a combined view of data belonging to different genomes. We believe that existing tools are limited in this respect as they focus on a single genome at a time (conservation histograms) or compress alignment representation to a single dimension. We have therefore developed a web-based tool called Comparative Genome Viewer (Cgv): it integrates a bidimensional representation of alignments between two regions, both at small and big scales, with the richness of annotations present in other genome browsers. We give access to our system through a web-based interface that provides the user with an interactive representation that can be updated in real time using the mouse to move from region to region and to zoom in on interesting details.
Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang
The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research a...
Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V
The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
Uhlén, Mathias; Hallström, Björn M.; Lindskog, Cecilia
a framework for defining the molecular constituents of the human body as well as for generating comprehensive lists of proteins expressed across tissues or in a tissue-restricted manner. Here, we review publicly available human transcriptome resources and discuss body-wide data from independent genome......Quantifying the differential expression of genes in various human organs, tissues, and cell types is vital to understand human physiology and disease. Recently, several large-scale transcriptomics studies have analyzed the expression of protein-coding genes across tissues. These datasets provide...
Saina, Josphat K; Gichira, Andrew W; Li, Zhi-Zhong; Hu, Guang-Wan; Wang, Qing-Feng; Liao, Kuo
The plant chloroplast (cp) genome is a highly conserved structure which is beneficial for evolution and systematic research. Currently, numerous complete cp genome sequences have been reported due to high throughput sequencing technology. However, there is no complete chloroplast genome of genus Dodonaea that has been reported before. To better understand the molecular basis of Dodonaea viscosa chloroplast, we used Illumina sequencing technology to sequence its complete genome. The whole length of the cp genome is 159,375 base pairs (bp), with a pair of inverted repeats (IRs) of 27,099 bp separated by a large single copy (LSC) 87,204 bp, and small single copy (SSC) 17,972 bp. The annotation analysis revealed a total of 115 unique genes of which 81 were protein coding, 30 tRNA, and four ribosomal RNA genes. Comparative genome analysis with other closely related Sapindaceae members showed conserved gene order in the inverted and single copy regions. Phylogenetic analysis clustered D. viscosa with other species of Sapindaceae with strong bootstrap support. Finally, a total of 249 SSRs were detected. Moreover, a comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates in D. viscosa showed very low values. The availability of cp genome reported here provides a valuable genetic resource for comprehensive further studies in genetic variation, taxonomy and phylogenetic evolution of Sapindaceae family. In addition, SSR markers detected will be used in further phylogeographic and population structure studies of the species in this genus.
Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas
Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. PMID:25324309. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
DeLong, Edward F
The complete genome sequence of Thermoplasma acidophilum, an acid- and heat-loving archaeon, has recently been reported. Comparative genomic analysis of this 'extremophile' is providing new insights into the metabolic machinery, ecology and evolution of thermophilic archaea.
Fahy, E.; Subramaniam, S.; Brown, H.A.; Glass, C.K.; Merrill, A.H.; Murphy, R.C.; Raetz, C.R.H.; Russell, D.W.; Seyama, Y.; Shaw, W.; Shimizu, T.; Spener, F.; van Meer, G.; VanNieuwenhze, M.S.; White, S.H.; Witztum, J.; Dennis, E.A.
Lipids are produced, transported, and recognized by the concerted actions of numerous enzymes, binding proteins, and receptors. A comprehensive analysis of lipid molecules, “lipidomics,” in the context of genomics and proteomics is crucial to understanding cellular physiology and pathology;
Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya
For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...
... Comprehensive Psychiatric Evaluation No. 52; Updated October 2017 Evaluation ... with serious emotional and behavioral problems need a comprehensive psychiatric evaluation. Comprehensive psychiatric evaluations usually require a ...
Segers, P.C.J.; Segers, E.; Broek, P. van den
The present chapter gives an overview of the literature on hypertext comprehension, children's hypertext comprehension and individual variation therein, ending with a perspective for future research. Hypertext comprehension requires the reader to make bridging inferences between the different parts
Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong
Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to advances in high-throughput sequencing technologies, family genome sequencing is becoming more and more prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibriums, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org.
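To make the idea of de novo mutation detection in a family concrete, the following Python sketch flags sites where a child's genotype is not consistent with Mendelian inheritance from the two parents. It is a simplified illustration and not the FGB's algorithm, which the abstract does not describe: the function names, the site labels and the genotype encoding (a pair of alleles per individual at an autosomal site) are all assumptions.

```python
def is_de_novo(child, father, mother):
    """True if the child genotype cannot be explained by one allele from each parent.

    Genotypes are 2-tuples of alleles, e.g. ("A", "G"); autosomal sites are assumed.
    """
    c1, c2 = child
    # Mendelian-consistent if one allele can come from the father and the other from the mother.
    consistent = (c1 in father and c2 in mother) or (c2 in father and c1 in mother)
    return not consistent

def candidate_de_novo(trio_calls):
    """Return site identifiers whose child genotype is Mendelian-inconsistent."""
    return [site for site, (child, father, mother) in trio_calls.items()
            if is_de_novo(child, father, mother)]

# Hypothetical trio genotype calls: site -> (child, father, mother).
calls = {
    "chr1:100": (("A", "G"), ("A", "A"), ("A", "A")),  # G is absent from both parents
    "chr1:200": (("A", "G"), ("A", "A"), ("G", "G")),  # ordinary inheritance
    "chr2:50":  (("C", "C"), ("C", "T"), ("T", "T")),  # mother cannot supply C
}
candidates = candidate_de_novo(calls)
```

Mendelian inconsistency only yields candidates: genotyping errors and low coverage produce the same signal, so real analyses add quality and depth filters before calling a mutation de novo.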
Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner
Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...
Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.
Background The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID
Trent Harold F
Background The Marine Genomics project is a functional genomics initiative developed to provide a pipeline for the curation of Expressed Sequence Tags (ESTs) and gene expression microarray data for marine organisms. It provides a unique clearing-house for marine specific EST and microarray data and is currently available at http://www.marinegenomics.org. Description The Marine Genomics pipeline automates the processing, maintenance, storage and analysis of EST and microarray data for an increasing number of marine species. It currently contains 19 species databases (over 46,000 EST sequences) that are maintained by registered users from local and remote locations in Europe and South America in addition to the USA. A collection of analysis tools is implemented. These include a pipeline upload tool for EST FASTA files, sequence trace files and microarray data, an annotative text search, automated sequence trimming, sequence quality control (QA/QC) editing, sequence BLAST capabilities and a tool for interactive submission to GenBank. Another feature of this resource is the integration with a scientific computing analysis environment implemented by MATLAB. Conclusion The conglomeration of multiple marine organisms with integrated analysis tools enables users to focus on the comprehensive descriptions of transcriptomic responses to typical marine stresses. This cross species data comparison and integration enables users to contain their research within a marine-oriented data management and analysis environment.
Decision making for water resource planning is often related to social, economic and environmental factors. There are various methods for making decisions about water resource planning alternatives and measures with various shortcomings. A comprehensive entropy weight observability-controllability risk analysis ...
Jo, Jihoon; Oh, Jooseong; Lee, Hyun-Gwan; Hong, Hyun-Hee; Lee, Sung-Gwon; Cheon, Seongmin; Kern, Elizabeth M A; Jin, Soyeong; Cho, Sung-Jin; Park, Joong-Ki; Park, Chungoo
The Japanese sea cucumber (Apostichopus japonicus Selenka 1867) is an economically important species as a source of seafood and ingredient in traditional medicine. It is mainly found off the coasts of northeast Asia. Recently, substantial exploitation and widespread biotic diseases in A. japonicus have generated increasing conservation concern. However, the genomic knowledge base and resources available for researchers to use in managing this natural resource and to establish genetically based breeding systems for sea cucumber aquaculture are still in a nascent stage. A total of 312 Gb of raw sequences were generated using the Illumina HiSeq 2000 platform and assembled to a final size of 0.66 Gb, which is about 80.5% of the estimated genome size (0.82 Gb). We observed nucleotide-level heterozygosity within the assembled genome to be 0.986%. The resulting draft genome assembly comprising 132 607 scaffolds with an N50 value of 10.5 kb contains a total of 21 771 predicted protein-coding genes. We identified 6.6-14.5 million heterozygous single nucleotide polymorphisms in the assembled genome of the three natural color variants (green, red, and black), resulting in an estimated nucleotide diversity of 0.00146. We report the first draft genome of A. japonicus and provide a general overview of the genetic variation in the three major color variants of A. japonicus. These data will help provide a comprehensive view of the genetic, physiological, and evolutionary relationships among color variants in A. japonicus, and will be invaluable resources for sea cucumber genomic research. © The Author 2017. Published by Oxford University Press.
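The two assembly statistics quoted in this abstract, the N50 and the assembled fraction of the estimated genome, can be reproduced with a short Python sketch. The scaffold lengths in the example are invented for illustration; only the 0.66 Gb and 0.82 Gb figures come from the abstract, and the function name `n50` is an assumption.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical scaffold lengths in kb, not taken from the A. japonicus assembly.
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_scaffolds)

# Assembled size relative to estimated genome size, using the figures in the abstract.
assembled_pct = round(0.66 / 0.82 * 100, 1)
```

With the abstract's figures, `assembled_pct` evaluates to 80.5, matching the reported "about 80.5%".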
A comprehensive analysis of three Asiatic black bear mitochondrial genomes (subspecies ussuricus, formosanus and mupinensis), with emphasis on the complete mtDNA sequence of Ursus thibetanus ussuricus (Ursidae).
Hwang, Dae-Sik; Ki, Jang-Seu; Jeong, Dong-Hyuk; Kim, Bo-Hyun; Lee, Bae-Keun; Han, Sang-Hoon; Lee, Jae-Seong
In the present paper, we describe the mitochondrial genome sequence of the Asiatic black bear (Ursus thibetanus ussuricus), with particular emphasis on the control region (CR), and compare it with other mitochondrial genomes to examine molecular relationships among the bears. The mitochondrial genome sequence of U. thibetanus ussuricus was 16,700 bp in size with mostly conserved structures (e.g. 13 protein-coding genes, two rRNA genes and 22 tRNA genes). The CR consisted of several typical conserved domains such as F, E, D, and C boxes, and a conserved sequence block. Nucleotide sequences and the repeated motifs in the CR differed among the bear species, and their copy numbers also varied among populations, even within F1 generations of U. thibetanus ussuricus. Comparative analyses showed that the CR D1 region was highly informative for discrimination within the bear family. These findings suggest that nucleotide sequences of both repeated motifs and CR D1 in the bear family are good markers for species discrimination.
Myelodysplastic syndromes (MDS) are characterized by clonal proliferation of hematopoietic stem/progenitor cells and their apoptosis, and show a propensity to progress to acute myelogenous leukemia (AML). Although MDS are recognized as neoplastic diseases caused by genomic aberrations of hematopoietic cells, the details of the genetic abnormalities underlying disease development have not as yet been fully elucidated due to difficulties in analyzing chromosomal abnormalities. Recent advances in comprehensive analyses of disease genomes including whole-genome sequencing technologies have revealed the genomic abnormalities in MDS. Surprisingly, gene mutations were found in approximately 80-90% of cases with MDS, and the novel mutations discovered with these technologies included previously unknown, MDS-specific, mutations such as those of the genes in the RNA-splicing machinery. It is anticipated that these recent studies will shed new light on the pathophysiology of MDS due to genomic aberrations.
Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan
mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...
Ambrosini, Giovanna; Dreos, René; Kumar, Sunil; Bucher, Philipp
ChIP-seq and related high-throughput chromatin profiling assays generate ever increasing volumes of highly valuable biological data. To make sense of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data. Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory-efficiency and speed, thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade. The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/ .
Kiper, Ilkser Erdem; Bloomer, Paulette; Borsa, Philippe; Hoareau, Thierry Bernard
Rabbitfishes are reef-associated fishes that support local fisheries throughout the Indo-West Pacific region. Sound management of the resource requires the development of molecular tools for appropriate stock delimitation of the different species in the family. Microsatellite markers were developed for the cordonnier, Siganus sutor, and their potential for cross-amplification was investigated in 12 congeneric species. A library of 792 repeat-containing sequences was built. Nineteen sets of newly developed primers, and 14 universal finfish microsatellites were tested in S. sutor. Amplification success of the 19 Siganus-specific markers ranged from 32 to 79% in the 12 other Siganus species, slightly decreasing when the genetic distance of the target species to S. sutor increased. Seventeen of these markers were polymorphic in S. sutor and were further assayed in S. luridus, S. rivulatus, and S. spinus, of which respectively 9, 10 and 8 were polymorphic. Statistical power analysis and an analysis of molecular variance showed that subtle genetic differentiation can be detected using these markers, highlighting their utility for the study of genetic diversity and population genetic structure in rabbitfishes.
the proposed project : 1. To continue to acquire a comprehensive understanding of prostate cancer genomics . 2. To develop an understanding of... Genetics I • ECEV 35901 Evolutionary Genomics • Fundamentals of Clinical Research • HGEN 47400 Introduction to Probability and Statistics for Geneticists...Marc Gillard,2 David M. Hatcher,5 Westin R. Tom,5 Walter M. Stadler2 and Kevin P. White1,2,3 1Institute for Genomics and Systems Biology , Departments of
Lewis, Nathan E; Liu, Xin; Li, Yuxiang
stymied by the lack of a unifying genomic resource for CHO cells. Here we report a 2.4-Gb draft genome sequence of a female Chinese hamster, Cricetulus griseus, harboring 24,044 genes. We also resequenced and analyzed the genomes of six CHO cell lines from the CHO-K1, DG44 and CHO-S lineages...
Hickey, Glenn; Paten, Benedict; Earl, Dent; Zerbino, Daniel; Haussler, David
Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. However, current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. Supplementary data are available at Bioinformatics online.
Shi, Jiaqin; Huang, Shunmou; Zhan, Jiepeng; Yu, Jingyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong
Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development/utilization of microsatellite markers has still been limited/inefficient in Brassica, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conservative in Brassica. The genomic distribution of microsatellites was highly non-uniform and positively or negatively correlated with genes or transposable elements, respectively. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic study/breeding in different Brassica species.
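The marker densities quoted above (per Mb) and the average spacings (in kb) are two views of the same quantity, and the kind of scan that underlies genome-wide SSR discovery can be illustrated in a few lines. The following is a minimal sketch only: the function names `spacing_kb` and `find_ssrs`, the motif length range (2 to 6 bp) and the minimum repeat count (4) are assumptions chosen for illustration and are not the parameters or software used in the study.

```python
import re


def spacing_kb(markers_per_mb: float) -> float:
    """Average distance between markers in kb, given a density per Mb (1 Mb = 1000 kb)."""
    return 1000.0 / markers_per_mb


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 4):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq.

    Illustrative thresholds only (hypothetical, not taken from the abstract).
    The lazy quantifier makes the regex try the shortest repeat unit first.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


if __name__ == "__main__":
    # Densities reported in the abstract for the three Brassica genomes.
    for density in (408.2, 343.8, 356.2):
        print(density, "per Mb ->", round(spacing_kb(density), 2), "kb between markers")
    # A toy sequence containing a single (TA)4 repeat starting at position 3.
    print(find_ssrs("CCGTATATATATGGC"))
```

Run on the reported densities, the conversion reproduces the spacings given in the abstract (2.45, 2.91 and 2.81 kb). The regex scan finds perfect repeats only; a real pipeline would also need to handle imperfect repeats, homopolymer runs and both strands.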
Jang, Yeongjun; Choi, Taekjin; Kim, Jongho; Park, Jisub; Seo, Jihae; Kim, Sangok; Kwon, Yeajee; Lee, Seungjae; Lee, Sanghyuk
Increasing affordability of next-generation sequencing (NGS) has created an opportunity for realizing genomically-informed personalized cancer therapy as a path to precision oncology. However, the complex nature of genomic information presents a huge challenge for clinicians in interpreting the patient's genomic alterations and selecting the optimum approved or investigational therapy. An elaborate and practical information system is urgently needed to support clinical decision as well as to test clinical hypotheses quickly. Here, we present an integrated clinical and genomic information system (CGIS) based on NGS data analyses. Major components include modules for handling clinical data, NGS data processing, variant annotation and prioritization, drug-target-pathway analysis, and population cohort explorer. We built a comprehensive knowledgebase of genes, variants, drugs by collecting annotated information from public and in-house resources. Structured reports for molecular pathology are generated using standardized terminology in order to help clinicians interpret genomic variants and utilize them for targeted cancer therapy. We also implemented many features useful for testing hypotheses to develop prognostic markers from mutation and gene expression data. Our CGIS software is an attempt to provide useful information for both clinicians and scientists who want to explore genomic information for precision oncology.
Hugenholtz, Philip; Skarshewski, Adam; Parks, Donovan H
Reconstructing the complete evolutionary history of extant life on our planet will be one of the most fundamental accomplishments of scientific endeavor, akin to the completion of the periodic table, which revolutionized chemistry. The road to this goal is via comparative genomics because genomes are our most comprehensive and objective evolutionary documents. The genomes of plant and animal species have been systematically targeted over the past decade to provide coverage of the tree of life. However, multicellular organisms only emerged in the last 550 million years of more than three billion years of biological evolution and thus comprise a small fraction of total biological diversity. The bulk of biodiversity, both past and present, is microbial. We have only scratched the surface in our understanding of the microbial world, as most microorganisms cannot be readily grown in the laboratory and remain unknown to science. Ground-breaking, culture-independent molecular techniques developed over the past 30 years have opened the door to this so-called microbial dark matter with an accelerating momentum driven by exponential increases in sequencing capacity. We are on the verge of obtaining representative genomes across all life for the first time. However, historical use of morphology, biochemical properties, behavioral traits, and single-marker genes to infer organismal relationships mean that the existing highly incomplete tree is riddled with taxonomic errors. Concerted efforts are now needed to synthesize and integrate the burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy. Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved.
Robert J. Redden
Pea (Pisum sativum L.) was the original model organism used in Mendel’s discovery (1866) of the laws of inheritance, making it the foundation of modern plant genetics. However, subsequent progress in pea genomics has lagged behind many other plant species. Although the size and repetitive nature of the pea genome has so far restricted its sequencing, comprehensive genomic and post genomic resources already exist. These include BAC libraries, several types of molecular marker sets, both transcriptome and proteome datasets and mutant populations for reverse genetics. The availability of the full genome sequences of three legume species has offered significant opportunities for genome wide comparison revealing synteny and co-linearity to pea. A combination of a candidate gene and colinearity approach has successfully led to the identification of genes underlying agronomically important traits including virus resistances and plant architecture. Some of this knowledge has already been applied to marker assisted selection (MAS) programs, increasing precision and shortening the breeding cycle. Yet, complete translation of marker discovery to pea breeding is still to be achieved. Molecular analysis of pea collections has shown that although substantial variation is present within the cultivated genepool, wild material offers the possibility to incorporate novel traits that may have been inadvertently eliminated. Association mapping analysis of diverse pea germplasm promises to identify genetic variation related to desirable agronomic traits, which are historically difficult to breed for in a traditional manner. The availability of high throughput ‘omics’ methodologies offers great promise for the development of novel, highly accurate selective breeding tools for improved pea genotypes that are sustainable under current and future climates and farming systems.
Sauer, Sascha; Konthur, Zoltán; Lehrach, Hans
The problems we face today in public health as a result of the -- fortunately -- increasing age of people and the requirements of developing countries create an urgent need for new and innovative approaches in medicine and in agronomics. Genomic and functional genomic approaches have a great potential to at least partially solve these problems in the future. Important progress has been made by procedures to decode genomic information of humans, but also of other key organisms. The basic comprehension of genomic information (and its transfer) should now give us the possibility to pursue the next important step in life science eventually leading to a basic understanding of biological information flow; the elucidation of the function of all genes and correlative products encoded in the genome, as well as the discovery of their interactions in a molecular context and the response to environmental factors. As a result of the sequencing projects, we are now able to ask important questions about sequence variation and can start to comprehensively study the function of expressed genes on different levels such as RNA, protein or the cell in a systematic context including underlying networks. In this article we review and comment on current trends in large-scale systematic biological research. A particular emphasis is put on technology developments that can provide means to accomplish the tasks of future lines of functional genomics.
Umen, James G.; Olson, Bradley J.S.C.
Volvocine algae are a group of chlorophytes that together comprise a unique model for evolutionary and developmental biology. The species Chlamydomonas reinhardtii and Volvox carteri represent extremes in morphological diversity within the Volvocine clade. Chlamydomonas is unicellular and reflects the ancestral state of the group, while Volvox is multicellular and has evolved numerous innovations including germ-soma differentiation, sexual dimorphism, and complex morphogenetic patterning. The Chlamydomonas genome sequence has shed light on several areas of eukaryotic cell biology, metabolism and evolution, while the Volvox genome sequence has enabled a comparison with Chlamydomonas that reveals some of the underlying changes that enabled its transition to multicellularity, but also underscores the subtlety of this transition. Many of the tools and resources are in place to further develop Volvocine algae as a model for evolutionary genomics. PMID:25883411
Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most of the aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplicati