WorldWideScience

Sample records for integrating genomic indel

  1. An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

    Science.gov (United States)

    Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

    2017-10-06

    Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.

  2. Restricted DCJ-indel model: sorting linear genomes with DCJ and indels

    Science.gov (United States)

    2012-01-01

    Background The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model many circular chromosomes can coexist in some intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which a circular chromosome is reincorporated immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, it is known that the genomic distance for the restricted DCJ model is the same as the distance for the general model. If the genomes have unequal contents, in addition to DCJ it is necessary to consider indels, which are insertions and deletions of DNA segments. Linear time algorithms were proposed to compute the distance and to find a sorting scenario in a general, unrestricted DCJ-indel model that considers DCJ and indels. Results In the present work we consider the restricted DCJ-indel model for sorting linear genomes with unequal contents. We allow DCJ operations and indels with the following constraint: if a circular chromosome is created by a DCJ, it has to be reincorporated in the next step (no other DCJ or indel can be applied between the creation and the reincorporation of a circular chromosome). We then develop a sorting algorithm and give a tight upper bound for the restricted DCJ-indel distance. Conclusions We have given a tight upper bound for the restricted DCJ-indel distance. The question whether this bound can be reduced so that both the general and the restricted DCJ-indel distances are equal remains open. PMID:23281630
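
    For the equal-content case mentioned in this abstract, the DCJ distance is commonly computed from the adjacency graph as d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths. The sketch below illustrates only that equal-content formula for linear genomes; it is not the DCJ-indel algorithm of the paper, and the function names and the signed-integer genome encoding are assumptions made for the example.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene as (id, 't'/'h') pairs."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_map(genome):
    """Map each extremity to its neighbour in a genome of linear chromosomes.

    Telomeric extremities have no neighbour and are simply absent from the map.
    """
    partner = {}
    for chrom in genome:
        ends = [e for gene in chrom for e in extremities(gene)]
        # ends = [left1, right1, left2, right2, ...]; right_i is adjacent to left_{i+1}
        for i in range(1, len(ends) - 1, 2):
            partner[ends[i]] = ends[i + 1]
            partner[ends[i + 1]] = ends[i]
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance for two linear genomes with identical gene content, no duplicates."""
    genes = {abs(g) for chrom in genome_a for g in chrom}
    if genes != {abs(g) for chrom in genome_b for g in chrom}:
        raise ValueError("this sketch only handles genomes with equal content")
    pa, pb = adjacency_map(genome_a), adjacency_map(genome_b)
    seen = set()
    cycles = odd_paths = 0
    for g in genes:
        for side in "th":
            start = (g, side)
            if start in seen:
                continue
            # Explore one connected component of the graph on extremities.
            seen.add(start)
            stack, size, is_cycle = [start], 0, True
            while stack:
                x = stack.pop()
                size += 1
                for partner in (pa, pb):
                    y = partner.get(x)
                    if y is None:
                        is_cycle = False  # x is a telomere in one genome: a path
                    elif y not in seen:
                        seen.add(y)
                        stack.append(y)
            if is_cycle:
                cycles += 1
            elif size % 2 == 1:
                odd_paths += 1  # odd number of extremities = odd path
    return len(genes) - (cycles + odd_paths // 2)
```

    As a usage example, dcj_distance([[1, 2, 3]], [[1, -2, 3]]) counts the single inversion that separates the two genomes.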

  3. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection

    Science.gov (United States)

    Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael

    2015-01-01

    With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710

  4. On Sorting Genomes with DCJ and Indels

    Science.gov (United States)

    Braga, Marília D. V.

    A previous work of Braga, Willing and Stoye compared two genomes with unequal content, but without duplications, and presented a new linear time algorithm to compute the genomic distance, considering double cut and join (DCJ) operations, insertions and deletions. Here we derive from this approach an algorithm to sort one genome into another one also using DCJ, insertions and deletions. The optimal sorting scenarios can have different compositions and we compare two types of sorting scenarios: one that maximizes and one that minimizes the number of DCJ operations with respect to the number of insertions and deletions.

  5. Indel Group in Genomes (IGG) Molecular Genetic Markers

    Science.gov (United States)

    Burkart-Waco, Diana; Kuppu, Sundaram; Britt, Anne; Chetelat, Roger

    2016-01-01

    Genetic markers are essential when developing or working with genetically variable populations. Indel Group in Genomes (IGG) markers are primer pairs that amplify single-locus sequences that differ in size for two or more alleles. They are attractive for their ease of use for rapid genotyping and their codominant nature. Here, we describe a heuristic algorithm that uses a k-mer-based approach to search two or more genome sequences to locate polymorphic regions suitable for designing candidate IGG marker primers. As input to the IGG pipeline software, the user provides genome sequences and the desired amplicon sizes and size differences. Primer sequences flanking polymorphic insertions/deletions are produced as output. IGG marker files for three sets of genomes, Solanum lycopersicum/Solanum pennellii, Arabidopsis (Arabidopsis thaliana) Columbia-0/Landsberg erecta-0 accessions, and S. lycopersicum/S. pennellii/Solanum tuberosum (three-way polymorphic) are included. PMID:27436831
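
    The k-mer-based search described in this abstract can be illustrated with a toy example. The sketch below is not the IGG pipeline: it assumes a much simpler scheme in which k-mers that occur exactly once in each of two sequences serve as anchors, and a size polymorphism is reported wherever two consecutive collinear anchors are spaced differently in the two sequences. The function names, the default k and the return format are all hypothetical.

```python
from collections import Counter


def unique_kmer_positions(seq, k):
    """Return {kmer: position} for k-mers that occur exactly once in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def candidate_indels(seq_a, seq_b, k=5, min_diff=1):
    """List (anchor1_pos_in_a, anchor2_pos_in_a, size_diff) for candidate indels.

    size_diff > 0 means the region between the two anchors is longer in seq_a.
    """
    pos_a = unique_kmer_positions(seq_a, k)
    pos_b = unique_kmer_positions(seq_b, k)
    # Anchors are k-mers unique in both sequences, ordered by position in seq_a.
    anchors = sorted((pos_a[m], pos_b[m]) for m in pos_a.keys() & pos_b.keys())
    found = []
    for (a1, b1), (a2, b2) in zip(anchors, anchors[1:]):
        if b2 <= b1:
            continue  # anchors are not collinear; skip rather than guess
        diff = (a2 - a1) - (b2 - b1)
        if abs(diff) >= min_diff:
            found.append((a1, a2, diff))
    return found


if __name__ == "__main__":
    left, right = "ACGTCAGG", "CTAGGATC"
    print(candidate_indels(left + "TTT" + right, left + right, k=4))
```

    In a marker-design setting, primers would then be placed in the conserved flanks around each reported interval; that step is outside this sketch.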

  6. Identification of genomic indels and structural variations using split reads

    Directory of Open Access Journals (Sweden)

    Urban Alexander E

    2011-07-01

    Abstract Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole
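
    The calibration idea in this abstract, using sensitivity and positive predictive value (PPV) from simulations to correct an observed call count, can be sketched as follows. This is a simplified illustration and not the SRiC procedure itself: it assumes a single event class, and all counts used are made-up placeholders rather than values from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of simulated (true) events that the caller recovered."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of calls that correspond to a simulated (true) event."""
    return true_pos / (true_pos + false_pos)


def corrected_event_count(observed_calls, sens, ppv):
    """Estimate the true number of events from an observed call count.

    observed_calls * ppv approximates the true positives among the calls;
    dividing by sensitivity scales that up to account for missed events.
    """
    return observed_calls * ppv / sens


if __name__ == "__main__":
    # Hypothetical simulation outcome: 90 true positives, 30 misses, 10 false calls.
    sens = sensitivity(true_pos=90, false_neg=30)
    ppv = positive_predictive_value(true_pos=90, false_pos=10)
    print(corrected_event_count(observed_calls=1000, sens=sens, ppv=ppv))
```

    In practice the correction would be applied separately per event class and length range, since the abstract notes that sensitivity and PPV differ between, for example, long deletions and short insertions.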

  7. Indel-II region deletion sizes in the white spot syndrome virus genome correlate with shrimp disease outbreaks in southern Vietnam

    NARCIS (Netherlands)

    Tran Thi Tuyet, H.; Zwart, M.P.; Phuong, N.T.; Oanh, D.T.H.; Jong, de M.C.M.; Vlak, J.M.

    2012-01-01

    Sequence comparisons of the genomes of white spot syndrome virus (WSSV) strains have identified regions containing variable-length insertions/deletions (i.e. indels). Indel-I and Indel-II, positioned between open reading frames (ORFs) 14/15 and 23/24, respectively, are the largest and the most

  8. Genome-wide indel markers shared by diverse Asian rice cultivars compared to Japanese rice cultivar 'Koshihikari'

    OpenAIRE

    Yonemaru, Jun-ichi; Choi, Sun Hee; Sakai, Hiroaki; Ando, Tsuyu; Shomura, Ayahiko; Yano, Masahiro; Wu, Jianzhong; Fukuoka, Shuichi

    2015-01-01

    Insertion-deletion (indel) polymorphisms, such as simple sequence repeats, have been widely used as DNA markers to identify QTLs and genes and to facilitate rice breeding. Recently, next-generation sequencing has produced deep sequences that allow genome-wide detection of indels. These polymorphisms can potentially be used to develop high-accuracy polymerase chain reaction (PCR)-based markers. Here, re-sequencing of 5 indica, 2 aus, and 3 tropical japonica cultivars and Japanese elite cultiva...

  9. ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly.

    Science.gov (United States)

    Yang, Rendong; Nelson, Andrew C; Henzler, Christine; Thyagarajan, Bharat; Silverstein, Kevin A T

    2015-12-07

    Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.

  10. Development of novel InDel markers and genetic diversity in Chenopodium quinoa through whole-genome re-sequencing.

    Science.gov (United States)

    Zhang, Tifu; Gu, Minfeng; Liu, Yuhe; Lv, Yuanda; Zhou, Ling; Lu, Haiyan; Liang, Shuaiqiang; Bao, Huabin; Zhao, Han

    2017-09-05

    Quinoa (Chenopodium quinoa Willd.) is a balanced nutritional crop, but its breeding improvement has been limited by the lack of information on its genetics and genomics. Therefore, it is necessary to obtain knowledge on genomic variation, population structure, and genetic diversity and to develop novel Insertion/Deletion (InDel) markers for quinoa by whole-genome re-sequencing. We re-sequenced 11 quinoa accessions and obtained a coverage depth of approximately 7× to 23× the quinoa genome. Based on the 1453-megabase (Mb) assembly from the reference accession Riobamba, 8,441,022 filtered bi-allelic single nucleotide polymorphisms (SNPs) and 842,783 filtered InDels were identified, with an estimated SNP and InDel density of 5.81 and 0.58 per kilobase (kb). From the genomic InDel variations, 85 dimorphic InDel markers were newly developed and validated. Together with the 62 simple sequence repeat (SSR) markers reported, a total of 147 markers were used for genotyping the 129 quinoa accessions. Molecular grouping analysis showed classification into two major groups, the Andean highland (composed of the northern and southern highland subgroups) and Chilean coastal, based on combined STRUCTURE, phylogenetic tree and PCA (Principal Component Analysis) analyses. Further analysis of the genetic diversity exhibited a decreasing tendency from the Chilean coast group to the Andean highland group, and the gene flow between subgroups was more frequent than that between the two subgroups and the Chilean coastal group. The majority of the variations (approximately 70%) were found through an analysis of molecular variation (AMOVA) due to the diversity between the groups. This was congruent with the observation of a highly significant F_ST value (0.705) between the groups, demonstrating significant genetic differentiation between the Andean highland type of quinoa and the Chilean coastal type. Moreover, a core set of 16 quinoa germplasms that capture all 362 alleles was
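
    The variant densities quoted in this abstract follow directly from the reported counts and the 1453 Mb assembly size. The short check below reproduces that arithmetic; the helper name is an assumption for the example, and 1 Mb is taken as 1,000 kb.

```python
def density_per_kb(variant_count, assembly_mb):
    """Variants per kilobase, given a count and an assembly size in megabases."""
    return variant_count / (assembly_mb * 1000)


if __name__ == "__main__":
    assembly_mb = 1453  # reference assembly size reported in the abstract
    print(round(density_per_kb(8_441_022, assembly_mb), 2))  # SNPs per kb
    print(round(density_per_kb(842_783, assembly_mb), 2))    # InDels per kb
    print(85 + 62)  # new InDel markers plus previously reported SSR markers
```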

  11. KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses.

    Science.gov (United States)

    Kim, Jungeun; Weber, Jessica A; Jho, Sungwoong; Jang, Jinho; Jun, JeHoon; Cho, Yun Sung; Kim, Hak-Min; Kim, Hyunho; Kim, Yumi; Chung, OkSung; Kim, Chang Geun; Lee, HyeJin; Kim, Byung Chul; Han, Kyudong; Koh, InSong; Chae, Kyun Shik; Lee, Semin; Edwards, Jeremy S; Bhak, Jong

    2018-04-04

    High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations, and provides a critical resource that can be used to more accurately identify pathogenic genetic variants. We report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding-SNVs not originally removed from the 1,000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.

  12. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated for each animal 10.7 to 16.4-fold genome coverage. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis was performed and revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  13. Safeguarding genome integrity

    DEFF Research Database (Denmark)

    Sørensen, Claus Storgaard; Syljuåsen, Randi G

    2012-01-01

    Mechanisms that preserve genome integrity are highly important during the normal life cycle of human cells. Loss of genome protective mechanisms can lead to the development of diseases such as cancer. Checkpoint kinases function in the cellular surveillance pathways that help cells to cope with D...

  14. Statistical Methods in Integrative Genomics

    Science.gov (United States)

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531

  15. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing

    DEFF Research Database (Denmark)

    Skovgaard, Ole; Bak, Mads; Løbner-Olesen, Anders

    2011-01-01

    a combination of WGS and genome copy number analysis, for the identification of mutations that suppress the growth deficiency imposed by excessive initiations from the Escherichia coli origin of replication, oriC. The E. coli chromosome, like the majority of bacterial chromosomes, is circular, and DNA...... replication is initiated by assembling two replication complexes at the origin, oriC. These complexes then replicate the chromosome bidirectionally toward the terminus, ter. In a population of growing cells, this results in a copy number gradient, so that origin-proximal sequences are more frequent than...... origin-distal sequences. Major rearrangements in the chromosome are, therefore, readily identified by changes in copy number, i.e., certain sequences become over- or under-represented. Of the eight mutations analyzed in detail here, six were found to affect a single gene only, one was a large chromosomal...

  16. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    2014-06-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  17. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    Science.gov (United States)

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  18. Integrating genomics into evolutionary medicine.

    Science.gov (United States)

    Rodríguez, Juan Antonio; Marigorta, Urko M; Navarro, Arcadi

    2014-12-01

    The application of the principles of evolutionary biology into medicine was suggested long ago and is already providing insight into the ultimate causes of disease. However, a full systematic integration of medical genomics and evolutionary medicine is still missing. Here, we briefly review some cases where the combination of the two fields has proven profitable and highlight two of the main issues hindering the development of evolutionary genomic medicine as a mature field, namely the dissociation between fitness and health and the still considerable difficulties in predicting phenotypes from genotypes. We use publicly available data to illustrate both problems and conclude that new approaches are needed for evolutionary genomic medicine to overcome these obstacles. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. The integrated microbial genome resource of analysis.

    Science.gov (United States)

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for the annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE) Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes, including those sequenced by the Joint Genome Institute and other publicly available genomes. Genomes of strains belonging to the Archaea, Bacteria, and Eukarya domains are present, as well as those of viruses and plasmids. Here, we describe some essential features of the IMG system and present a case study of pangenome analysis.

  20. Genomics Portals: integrative web-platform for mining genomics data.

    Science.gov (United States)

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  1. Genomics Portals: integrative web-platform for mining genomics data

    Directory of Open Access Journals (Sweden)

    Ghosh Krishnendu

    2010-01-01

    Abstract Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  2. phiGENOME: an integrative navigation throughout bacteriophage genomes.

    Science.gov (United States)

    Stano, Matej; Klucar, Lubos

    2011-11-01

    phiGENOME is a web-based genome browser generating dynamic and interactive graphical representation of phage genomes stored in the phiSITE, database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and it was optimised for visualisation of phage genomes with the emphasis on the gene regulatory elements. phiGENOME consists of three components: (i) genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level and (iii) regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing 542 complete genome sequences accompanied with their rich annotations and references, makes phiGENOME a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Characterization and potential functional significance of human-chimpanzee large INDEL variation

    Directory of Open Access Journals (Sweden)

    Polavarapu Nalini

    2011-10-01

    Abstract Background Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six million years ago, their genomes are more than 98.5% identical at protein-coding loci. This modest degree of nucleotide divergence is not sufficient to explain the extensive phenotypic differences between the two species. It has been hypothesized that the genetic basis of the phenotypic differences lies at the level of gene regulation and is associated with the extensive insertion and deletion (INDEL) variation between the two species. To test the hypothesis that large INDELs (80 to 12,000 bp) may have contributed significantly to differences in gene regulation between the two species, we categorized human-chimpanzee INDEL variation mapping in or around genes and determined whether this variation is significantly correlated with previously determined differences in gene expression. Results Extensive, large INDEL variation exists between the human and chimpanzee genomes. This variation is primarily attributable to retrotransposon insertions within the human lineage. There is a significant correlation between differences in gene expression and large human-chimpanzee INDEL variation mapping in genes or in proximity to them. Conclusions The results presented herein are consistent with the hypothesis that large INDELs, particularly those associated with retrotransposons, have played a significant role in human-chimpanzee regulatory evolution.

  4. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    Full Text Available The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  5. Transcription as a Threat to Genome Integrity.

    Science.gov (United States)

    Gaillard, Hélène; Aguilera, Andrés

    2016-06-02

    Genomes undergo different types of sporadic alterations, including DNA damage, point mutations, and genome rearrangements, that constitute the basis for evolution. However, these changes may occur at high levels as a result of cell pathology and trigger genome instability, a hallmark of cancer and a number of genetic diseases. In the last two decades, evidence has accumulated that transcription constitutes an important natural source of DNA metabolic errors that can compromise the integrity of the genome. Transcription can create the conditions for high levels of mutations and recombination by its ability to open the DNA structure and remodel chromatin, making it more accessible to DNA insulting agents, and by its ability to become a barrier to DNA replication. Here we review the molecular basis of such events from a mechanistic perspective with particular emphasis on the role of transcription as a genome instability determinant.

  6. Integrating genomics into undergraduate nursing education.

    Science.gov (United States)

    Daack-Hirsch, Sandra; Dieter, Carla; Quinn Griffin, Mary T

    2011-09-01

    To prepare the next generation of nurses, faculty are now faced with the challenge of incorporating genomics into curricula. Here we discuss how to meet this challenge. Steps to initiate curricular changes to include genomics are presented along with a discussion on creating a genomic curriculum thread versus a standalone course. Ideas for use of print material and technology on genomic topics are also presented. Information is based on review of the literature and curriculum change efforts by the authors. In recognition of advances in genomics, the nursing profession is increasing an emphasis on the integration of genomics into professional practice and educational standards. Incorporating genomics into nurses' practices begins with changes in our undergraduate curricula. Information given in didactic courses should be reinforced in clinical practica, and Internet-based tools such as WebQuest, Second Life, and wikis offer attractive, up-to-date platforms to deliver this now crucial content. To provide information that may assist faculty to prepare the next generation of nurses to practice using genomics. © 2011 Sigma Theta Tau International.

  7. GAPIT: genome association and prediction integrated tool.

    Science.gov (United States)

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. Availability: http://www.maizegenetics.net/GAPIT. Contact: zhiwu.zhang@cornell.edu. Supplementary data are available at Bioinformatics online.

  8. Fast and sensitive detection of indels induced by precise gene targeting

    DEFF Research Database (Denmark)

    Yang, Zhang; Steentoft, Catharina; Hauge, Camilla

    2015-01-01

    The nuclease-based gene editing tools are rapidly transforming capabilities for altering the genome of cells and organisms with great precision and in high throughput studies. A major limitation in application of precise gene editing lies in lack of sensitive and fast methods to detect...... and characterize the induced DNA changes. Precise gene editing induces double-stranded DNA breaks that are repaired by error-prone non-homologous end joining leading to introduction of insertions and deletions (indels) at the target site. These indels are often small and difficult and laborious to detect...

  9. Genomic integrity and the ageing brain.

    Science.gov (United States)

    Chow, Hei-man; Herrup, Karl

    2015-11-01

    DNA damage is correlated with and may drive the ageing process. Neurons in the brain are postmitotic and are excluded from many forms of DNA repair; therefore, neurons are vulnerable to various neurodegenerative diseases. The challenges facing the field are to understand how and when neuronal DNA damage accumulates, how this loss of genomic integrity might serve as a 'time keeper' of nerve cell ageing and why this process manifests itself as different diseases in different individuals.

  10. Simple Detection of Large InDeLS by DHPLC: The ACE Gene as a Model

    Directory of Open Access Journals (Sweden)

    Renata Guedes Koyama

    2008-01-01

    Full Text Available Insertion-deletion polymorphism (InDeL) is the second most frequent type of genetic variation in the human genome. For the detection of large InDeLs, researchers usually resort to either PCR gel analysis or RFLP, but these are time consuming and dependent on human interpretation. Therefore, a more efficient method for genotyping this kind of genetic variation is needed. In this report, we describe a method that can detect large InDeLs by DHPLC (denaturing high-performance liquid chromatography) using the angiotensin-converting enzyme (ACE) gene I/D polymorphism as a model. The InDeL targeted in this study is characterized by a 288 bp Alu element insertion (I). We used DHPLC at nondenaturing conditions to analyze the PCR product with a flow through the chromatographic column under two different gradients based on the differences between D and I sequences. The analysis described is quick and easy, making this technique a suitable and efficient means for DHPLC users to screen InDeLs in genetic epidemiological studies.

  11. On peculiar Šindel sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2010-01-01

    Vol. 17, No. 2 (2010), pp. 129-140. ISSN 0972-5555. R&D Projects: GA AV ČR (CZ) IAA100190803. Institutional research plan: CEZ:AV0Z10190503. Keywords: quadratic residue; Chinese remainder theorem; primitive Šindel sequences; Prague clock sequence. Subject RIV: BA - General Mathematics. http://www.pphmj.com/abstract/5095.htm

  12. Variant Review with the Integrative Genomics Viewer.

    Science.gov (United States)

    Robinson, James T; Thorvaldsdóttir, Helga; Wenger, Aaron M; Zehir, Ahmet; Mesirov, Jill P

    2017-11-01

    Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV's variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org. Cancer Res; 77(21); e31-34. ©2017 AACR.

  13. Population Genomics of Infectious and Integrated Wolbachia pipientis Genomes in Drosophila ananassae

    Science.gov (United States)

    Choi, Jae Young; Bubnell, Jaclyn E.; Aquadro, Charles F.

    2015-01-01

    Coevolution between Drosophila and its endosymbiont Wolbachia pipientis has many intriguing aspects. For example, Drosophila ananassae hosts two forms of W. pipientis genomes: One being the infectious bacterial genome and the other integrated into the host nuclear genome. Here, we characterize the infectious and integrated genomes of W. pipientis infecting D. ananassae (wAna), by genome sequencing 15 strains of D. ananassae that have either the infectious or integrated wAna genomes. Results indicate evolutionarily stable maternal transmission for the infectious wAna genome suggesting a relatively long-term coevolution with its host. In contrast, the integrated wAna genome showed pseudogene-like characteristics accumulating many variants that are predicted to have deleterious effects if present in an infectious bacterial genome. Phylogenomic analysis of sequence variation together with genotyping by polymerase chain reaction of large structural variations indicated several wAna variants among the eight infectious wAna genomes. In contrast, only a single wAna variant was found among the seven integrated wAna genomes examined in lines from Africa, south Asia, and south Pacific islands suggesting that the integration occurred once from a single infectious wAna genome and then spread geographically. Further analysis revealed that for all D. ananassae we examined with the integrated wAna genomes, the majority of the integrated wAna genomic regions is represented in at least two copies suggesting a double integration or single integration followed by an integrated genome duplication. The possible evolutionary mechanism underlying the widespread geographical presence of the duplicate integration of the wAna genome is an intriguing question remaining to be answered. PMID:26254486

  14. Integrative Genomics Viewer (IGV) | Informatics Technology for Cancer Research (ITCR)

    Science.gov (United States)

    The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

  15. MycoCosm, an Integrated Fungal Genomics Resource

    Energy Technology Data Exchange (ETDEWEB)

    Shabalov, Igor; Grigoriev, Igor

    2012-03-16

    MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010, in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from JGI Fungal Genomics Program, its users, and integration with external resources used by fungal community.

  16. IMG: the integrated microbial genomes database and comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640

  17. Evolutionary inference via the Poisson Indel Process.

    Science.gov (United States)

    Bouchard-Côté, Alexandre; Jordan, Michael I

    2013-01-22

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.

  18. Integrating cancer genomic data into electronic health records

    Directory of Open Access Journals (Sweden)

    Jeremy L. Warner

    2016-10-01

    Full Text Available Abstract The rise of genomically targeted therapies and immunotherapy has revolutionized the practice of oncology in the last 10–15 years. At the same time, new technologies and the electronic health record (EHR) in particular have permeated the oncology clinic. Initially designed as billing and clinical documentation systems, EHR systems have not anticipated the complexity and variety of genomic information that needs to be reviewed, interpreted, and acted upon on a daily basis. Improved integration of cancer genomic data with EHR systems will help guide clinician decision making, support secondary uses, and ultimately improve patient care within oncology clinics. Some of the key factors relating to the challenge of integrating cancer genomic data into EHRs include: the bioinformatics pipelines that translate raw genomic data into meaningful, actionable results; the role of human curation in the interpretation of variant calls; and the need for consistent standards with regard to genomic and clinical data. Several emerging paradigms for integration are discussed in this review, including: non-standardized efforts between individual institutions and genomic testing laboratories; “middleware” products that portray genomic information, albeit outside of the clinical workflow; and application programming interfaces that have the potential to work within clinical workflow. The critical need for clinical-genomic knowledge bases, which can be independent or integrated into the aforementioned solutions, is also discussed.

  19. Perspectives of Integrative Cancer Genomics in Next Generation Sequencing Era

    Directory of Open Access Journals (Sweden)

    So Mee Kwon

    2012-06-01

    Full Text Available The explosive development of genomics technologies including microarrays and next generation sequencing (NGS) has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding of the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.

  20. G-InforBIO: integrated system for microbial genomics

    Directory of Open Access Journals (Sweden)

    Abe Takashi

    2006-08-01

    Full Text Available Abstract Background Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. There are few tools for integrated analyses of genomic data; therefore, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs. Results The G-InforBIO system is a novel tool for genome data management and sequence analysis. The system can import genome data encoded as eXtensible Markup Language documents as formatted text documents, including annotations and sequences, from DNA Data Bank of Japan and GenBank encoded as flat files. The genome database is constructed automatically after importing, and the database can be exported as documents formatted with eXtensible Markup Language or tab-delimited text. Users can retrieve data from the database by keyword searches, edit annotation data of genes, and process data with G-InforBIO. In addition, information in the G-InforBIO database can be analyzed seamlessly with nine different software programs, including programs for clustering and homology analyses. Conclusion The G-InforBIO system simplifies genome analyses by integrating several available software programs to allow efficient handling and manipulation of genome data. G-InforBIO is freely available from the download site.

  1. Integrated proteomic and genomic analysis of colorectal cancer

    Science.gov (United States)

    Investigators who analyzed 95 human colorectal tumor samples have determined how gene alterations identified in previous analyses of the same samples are expressed at the protein level. The integration of proteomic and genomic data, or proteogenomics, pro

  2. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Jizhong [Univ. of Oklahoma, Norman, OK (United States); He, Zhili [Univ. of Oklahoma, Norman, OK (United States)

    2014-04-08

    As a part of the Shewanella Federation project, we have used integrated genomic, proteomic and computational technologies to study various aspects of energy metabolism of two Shewanella strains from a systems-level perspective.

  3. An InDel in the Promoter of Al-ACTIVATED MALATE TRANSPORTER9 Selected during Tomato Domestication Determines Fruit Malate Contents and Aluminum Tolerance [OPEN]

    Science.gov (United States)

    Wang, Xin; Hu, Tixu; Zhang, Fengxia; Wang, Bing; Li, Changxin; Yang, Tianxia; Li, Hanxia; Lu, Yongen; Ye, Zhibiao

    2017-01-01

    Deciphering the mechanism of malate accumulation in plants would contribute to a greater understanding of plant chemistry, which has implications for improving flavor quality in crop species and enhancing human health benefits. However, the regulation of malate metabolism is poorly understood in crops such as tomato (Solanum lycopersicum). Here, we integrated a metabolite-based genome-wide association study with linkage mapping and gene functional studies to characterize the genetics of malate accumulation in a global collection of tomato accessions with broad genetic diversity. We report that TFM6 (tomato fruit malate 6), which corresponds to Al-ACTIVATED MALATE TRANSPORTER9 (Sl-ALMT9 in tomato), is the major quantitative trait locus responsible for variation in fruit malate accumulation among tomato genotypes. A 3-bp indel in the promoter region of Sl-ALMT9 was linked to high fruit malate content. Further analysis indicated that this indel disrupts a W-box binding site in the Sl-ALMT9 promoter, which prevents binding of the WRKY transcription repressor Sl-WRKY42, thereby alleviating the repression of Sl-ALMT9 expression and promoting high fruit malate accumulation. Evolutionary analysis revealed that this highly expressed Sl-ALMT9 allele was selected for during tomato domestication. Furthermore, vacuole membrane-localized Sl-ALMT9 increases in abundance following Al treatment, thereby elevating malate transport and enhancing Al resistance. PMID:28814642

  4. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...... the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously...... preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...
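
    The two indel treatments contrasted in this abstract can be illustrated with a minimal sketch. It scores only the gap characters in a single aligned sequence; it is not the direct-optimization procedure used by the authors. The function names, the cost values, and the affine convention (one opening cost per run plus an extension cost for each additional position) are assumptions chosen for illustration.

```python
import re


def gap_runs(aligned_seq):
    """Return the length of every run of contiguous '-' characters."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_seq)]


def fifth_state_cost(aligned_seq, gap_cost):
    """Gaps as a fifth character state: every gap position is charged separately."""
    return gap_cost * sum(gap_runs(aligned_seq))


def affine_cost(aligned_seq, open_cost, extend_cost):
    """Contiguous gaps as single events.

    Assumed convention: each run pays open_cost once, plus extend_cost
    for every position after the first.
    """
    return sum(open_cost + extend_cost * (n - 1) for n in gap_runs(aligned_seq))


# Hypothetical aligned sequence with one 3-position gap and one 1-position gap.
example = "AC---GT-A"
print(gap_runs(example))                 # run lengths of the two gaps
print(fifth_state_cost(example, 2))      # 4 gap positions, each charged 2
print(affine_cost(example, 2, 1))        # two events, the longer one extended twice
```

    Under the fifth-state treatment a long gap is penalised in proportion to its length, whereas the affine treatment charges it mostly once, which is why the two treatments can favour different alignments of length-variable regions.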

  5. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information

    Science.gov (United States)

    2012-01-01

    Background Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. Results In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. Conclusion This study will serve as a valuable genomic resource
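
    The map statistics quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given above. How the authors computed the average inter-locus distance is not stated here, so dividing by the number of intervals (loci minus linkage groups) is an assumption; the variable names are illustrative.

```python
# Figures quoted in the abstract.
previous_loci = 2247
added_loci = 1167
linkage_groups = 26
map_length_cm = 3667.62

total_loci = previous_loci + added_loci

# Assumption: spacing is averaged over the intervals between adjacent loci,
# and each linkage group with n loci contributes n - 1 intervals.
intervals = total_loci - linkage_groups
mean_per_interval = map_length_cm / intervals

# Alternative: divide the map length by the number of loci.
mean_per_locus = map_length_cm / total_loci

print(total_loci)
print(round(mean_per_interval, 2))
print(round(mean_per_locus, 2))
```

    The locus total matches the 3,414 reported, and the per-interval average rounds to the reported 1.08 cM, while dividing by the locus count gives 1.07 cM.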

  6. INE: a rice genome database with an integrated map view.

    Science.gov (United States)

    Sakata, K; Antonio, B A; Mukai, Y; Nagasaki, H; Sakai, Y; Makino, K; Sasaki, T

    2000-01-01

    The Rice Genome Research Program (RGP) launched a large-scale rice genome sequencing in 1998 aimed at decoding all genetic information in rice. A new genome database called INE (INtegrated rice genome Explorer) has been developed in order to integrate all the genomic information that has been accumulated so far and to correlate these data with the genome sequence. A web interface based on Java applet provides a rapid viewing capability in the database. The first operational version of the database has been completed which includes a genetic map, a physical map using YAC (Yeast Artificial Chromosome) clones and PAC (P1-derived Artificial Chromosome) contigs. These maps are displayed graphically so that the positional relationships among the mapped markers on each chromosome can be easily resolved. INE incorporates the sequences and annotations of the PAC contig. A site on low quality information ensures that all submitted sequence data comply with the standard for accuracy. As a repository of rice genome sequence, INE will also serve as a common database of all sequence data obtained by collaborating members of the International Rice Genome Sequencing Project (IRGSP). The database can be accessed at http://www.dna.affrc.go.jp:82/giot/INE.html or its mirror site at http://www.staff.or.jp/giot/INE.html

  7. Incorporating indel information into phylogeny estimation for rapidly emerging pathogens

    Directory of Open Access Journals (Sweden)

    Suchard Marc A

    2007-03-01

    Full Text Available Abstract Background Phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short times since divergence. To improve resolution of such phylogenies we propose using insertion and deletion (indel) information in addition to substitution information. We accomplish this through joint estimation of alignment and phylogeny in a Bayesian framework, drawing inference using Markov chain Monte Carlo. Joint estimation of alignment and phylogeny sidesteps biases that stem from conditioning on a single alignment by taking into account the ensemble of near-optimal alignments. Results We introduce a novel Markov chain transition kernel that improves computational efficiency by proposing non-local topology rearrangements and by block sampling alignment and topology parameters. In addition, we extend our previous indel model to increase biological realism by placing indels preferentially on longer branches. We demonstrate the ability of indel information to increase phylogenetic resolution in examples drawn from within-host viral sequence samples. We also demonstrate the importance of taking alignment uncertainty into account when using such information. Finally, we show that codon-based substitution models can significantly affect alignment quality and phylogenetic inference by unrealistically forcing indels to begin and end between codons. Conclusion These results indicate that indel information can improve phylogenetic resolution of recently diverged pathogens and that alignment uncertainty should be considered in such analyses.

  8. Improving Microbial Genome Annotations in an Integrated Database Context

    Science.gov (United States)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620

  9. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/.

  10. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Background The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results We have created a user-friendly Java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  11. Human Papillomavirus Genome Integration and Head and Neck Cancer.

    Science.gov (United States)

    Pinatti, L M; Walline, H M; Carey, T E

    2018-06-01

    We conducted a critical review of human papillomavirus (HPV) integration into the host genome in oral/oropharyngeal cancer, reviewed the literature for HPV-induced cancers, and obtained current data for HPV-related oral and oropharyngeal cancers. In addition, we performed studies to identify HPV integration sites and the relationship of integration to viral-host fusion transcripts and whether integration is required for HPV-associated oncogenesis. Viral integration of HPV into the host genome is not required for the viral life cycle and might not be necessary for cellular transformation, yet HPV integration is frequently reported in cervical and head and neck cancer specimens. Studies of large numbers of early cervical lesions revealed frequent viral integration into gene-poor regions of the host genome with comparatively rare integration into cellular genes, suggesting that integration is a stochastic event and that site of integration may be largely a function of chance. However, more recent studies of head and neck squamous cell carcinomas (HNSCCs) suggest that integration may represent an additional oncogenic mechanism through direct effects on cancer-related gene expression and generation of hybrid viral-host fusion transcripts. In HNSCC cell lines as well as primary tumors, integration into cancer-related genes leading to gene disruption has been reported. The studies have shown that integration-induced altered gene expression may be associated with tumor recurrence. Evidence from several studies indicates that viral integration into genic regions is accompanied by local amplification, increased expression in some cases, interruption of gene expression, and likely additional oncogenic effects. Similarly, reported examples of viral integration near microRNAs suggest that altered expression of these regulatory molecules may also contribute to oncogenesis. Future work is indicated to identify the mechanisms of these events on cancer cell behavior.

  12. Integrative Genomic Analysis of Coincident Cancer Foci Implicates CTNNB1 and PTEN Alterations in Ductal Prostate Cancer.

    Science.gov (United States)

    Gillard, Marc; Lack, Justin; Pontier, Andrea; Gandla, Divya; Hatcher, David; Sowalsky, Adam G; Rodriguez-Nieves, Jose; Vander Griend, Donald; Paner, Gladell; VanderWeele, David

    2017-12-08

    Ductal adenocarcinoma of the prostate is an aggressive subtype, with high rates of biochemical recurrence and overall poor prognosis. It is frequently found coincident with conventional acinar adenocarcinoma. The genomic features driving evolution to its ductal histology and the biology associated with its poor prognosis remain unknown. To characterize genomic features distinguishing ductal adenocarcinoma from coincident acinar adenocarcinoma foci from the same patient. Ten patients with coincident acinar and ductal prostate cancer underwent prostatectomy. Laser microdissection was used to separately isolate acinar and ductal foci. DNA and RNA were extracted, and used for integrative genomic and transcriptomic analyses. Single nucleotide mutations, small indels, copy number estimates, and expression profiles were identified. Phylogenetic relationships between coincident foci were determined, and characteristics distinguishing ductal from acinar foci were identified. Exome sequencing, copy number estimates, and fusion genes demonstrated coincident ductal and acinar adenocarcinoma diverged from a common progenitor, yet they harbored distinct alterations unique to each focus. AR expression and activity were similar in both histologies. Nine of 10 cases had mutually exclusive CTNNB1 hotspot mutations or phosphatase and tensin homolog (PTEN) alterations in the ductal component, and these were absent in the acinar foci. These alterations were associated with changes in expression in WNT- and PI3K-pathway genes. Coincident ductal and acinar histologies typically are clonally related and thus arise from the same cell of origin. Ductal foci are enriched for cases with either a CTNNB1 hotspot mutation or a PTEN alteration, and are associated with WNT- or PI3K-pathway activation. These alterations are mutually exclusive and may represent distinct subtypes. The aggressive subtype ductal adenocarcinoma is closely related to conventional acinar prostate cancer. Ductal foci

  13. Integrated Genomic Characterization of Papillary Thyroid Carcinoma

    Science.gov (United States)

    Agrawal, Nishant; Akbani, Rehan; Aksoy, B. Arman; Ally, Adrian; Arachchi, Harindra; Asa, Sylvia L.; Auman, J. Todd; Balasundaram, Miruna; Balu, Saianand; Baylin, Stephen B.; Behera, Madhusmita; Bernard, Brady; Beroukhim, Rameen; Bishop, Justin A.; Black, Aaron D.; Bodenheimer, Tom; Boice, Lori; Bootwalla, Moiz S.; Bowen, Jay; Bowlby, Reanne; Bristow, Christopher A.; Brookens, Robin; Brooks, Denise; Bryant, Robert; Buda, Elizabeth; Butterfield, Yaron S.N.; Carling, Tobias; Carlsen, Rebecca; Carter, Scott L.; Carty, Sally E.; Chan, Timothy A.; Chen, Amy Y.; Cherniack, Andrew D.; Cheung, Dorothy; Chin, Lynda; Cho, Juok; Chu, Andy; Chuah, Eric; Cibulskis, Kristian; Ciriello, Giovanni; Clarke, Amanda; Clayman, Gary L.; Cope, Leslie; Copland, John; Covington, Kyle; Danilova, Ludmila; Davidsen, Tanja; Demchok, John A.; DiCara, Daniel; Dhalla, Noreen; Dhir, Rajiv; Dookran, Sheliann S.; Dresdner, Gideon; Eldridge, Jonathan; Eley, Greg; El-Naggar, Adel K.; Eng, Stephanie; Fagin, James A.; Fennell, Timothy; Ferris, Robert L.; Fisher, Sheila; Frazer, Scott; Frick, Jessica; Gabriel, Stacey B.; Ganly, Ian; Gao, Jianjiong; Garraway, Levi A.; Gastier-Foster, Julie M.; Getz, Gad; Gehlenborg, Nils; Ghossein, Ronald; Gibbs, Richard A.; Giordano, Thomas J.; Gomez-Hernandez, Karen; Grimsby, Jonna; Gross, Benjamin; Guin, Ranabir; Hadjipanayis, Angela; Harper, Hollie A.; Hayes, D. Neil; Heiman, David I.; Herman, James G.; Hoadley, Katherine A.; Hofree, Matan; Holt, Robert A.; Hoyle, Alan P.; Huang, Franklin W.; Huang, Mei; Hutter, Carolyn M.; Ideker, Trey; Iype, Lisa; Jacobsen, Anders; Jefferys, Stuart R.; Jones, Corbin D.; Jones, Steven J.M.; Kasaian, Katayoon; Kebebew, Electron; Khuri, Fadlo R.; Kim, Jaegil; Kramer, Roger; Kreisberg, Richard; Kucherlapati, Raju; Kwiatkowski, David J.; Ladanyi, Marc; Lai, Phillip H.; Laird, Peter W.; Lander, Eric; Lawrence, Michael S.; Lee, Darlene; Lee, Eunjung; Lee, Semin; Lee, William; Leraas, Kristen M.; Lichtenberg, Tara M.; Lichtenstein, Lee; Lin, Pei; Ling, Shiyun; Liu, Jinze; Liu, Wenbin; Liu, Yingchun; LiVolsi, Virginia A.; Lu, Yiling; Ma, Yussanne; Mahadeshwar, Harshad S.; Marra, Marco A.; Mayo, Michael; McFadden, David G.; Meng, Shaowu; Meyerson, Matthew; Mieczkowski, Piotr A.; Miller, Michael; Mills, Gordon; Moore, Richard A.; Mose, Lisle E.; Mungall, Andrew J.; Murray, Bradley A.; Nikiforov, Yuri E.; Noble, Michael S.; Ojesina, Akinyemi I.; Owonikoko, Taofeek K.; Ozenberger, Bradley A.; Pantazi, Angeliki; Parfenov, Michael; Park, Peter J.; Parker, Joel S.; Paull, Evan O.; Pedamallu, Chandra Sekhar; Perou, Charles M.; Prins, Jan F.; Protopopov, Alexei; Ramalingam, Suresh S.; Ramirez, Nilsa C.; Ramirez, Ricardo; Raphael, Benjamin J.; Rathmell, W. Kimryn; Ren, Xiaojia; Reynolds, Sheila M.; Rheinbay, Esther; Ringel, Matthew D.; Rivera, Michael; Roach, Jeffrey; Robertson, A. Gordon; Rosenberg, Mara W.; Rosenthall, Matthew; Sadeghi, Sara; Saksena, Gordon; Sander, Chris; Santoso, Netty; Schein, Jacqueline E.; Schultz, Nikolaus; Schumacher, Steven E.; Seethala, Raja R.; Seidman, Jonathan; Senbabaoglu, Yasin; Seth, Sahil; Sharpe, Samantha; Mills Shaw, Kenna R.; Shen, John P.; Shen, Ronglai; Sherman, Steven; Sheth, Margi; Shi, Yan; Shmulevich, Ilya; Sica, Gabriel L.; Simons, Janae V.; Sipahimalani, Payal; Smallridge, Robert C.; Sofia, Heidi J.; Soloway, Matthew G.; Song, Xingzhi; Sougnez, Carrie; Stewart, Chip; Stojanov, Petar; Stuart, Joshua M.; Tabak, Barbara; Tam, Angela; Tan, Donghui; Tang, Jiabin; Tarnuzzer, Roy; Taylor, Barry S.; Thiessen, Nina; Thorne, Leigh; Thorsson, Vésteinn; Tuttle, R. Michael; Umbricht, Christopher B.; Van Den Berg, David J.; Vandin, Fabio; Veluvolu, Umadevi; Verhaak, Roel G.W.; Vinco, Michelle; Voet, Doug; Walter, Vonn; Wang, Zhining; Waring, Scot; Weinberger, Paul M.; Weinstein, John N.; Weisenberger, Daniel J.; Wheeler, David; Wilkerson, Matthew D.; Wilson, Jocelyn; Williams, Michelle; Winer, Daniel A.; Wise, Lisa; Wu, Junyuan; Xi, Liu; Xu, Andrew W.; Yang, Liming; Yang, Lixing; Zack, Travis I.; Zeiger, Martha A.; Zeng, Dong; Zenklusen, Jean Claude; Zhao, Ni; Zhang, Hailei; Zhang, Jianhua; Zhang, Jiashan (Julia); Zhang, Wei; Zmuda, Erik; Zou, Lihua

    2014-01-01

    Summary Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. Here, we describe the genomic landscape of 496 PTCs. We observed a low frequency of somatic alterations (relative to other carcinomas) and extended the set of known PTC driver alterations to include EIF1AX, PPM1D and CHEK2 and diverse gene fusions. These discoveries reduced the fraction of PTC cases with unknown oncogenic driver from 25% to 3.5%. Combined analyses of genomic variants, gene expression, and methylation demonstrated that different driver groups lead to different pathologies with distinct signaling and differentiation characteristics. Similarly, we identified distinct molecular subgroups of BRAF-mutant tumors and multidimensional analyses highlighted a potential involvement of oncomiRs in less-differentiated subgroups. Our results propose a reclassification of thyroid cancers into molecular subtypes that better reflect their underlying signaling and differentiation properties, which has the potential to improve their pathological classification and better inform the management of the disease. PMID:25417114

  14. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on actively ongoing structural genomics of membrane protein initiatives, with a focus on those implementing techniques intended to increase the rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. Integrative Genome Comparison of Primary and Metastatic Melanomas

    Science.gov (United States)

    Feng, Bin; Nazarian, Rosalynn M.; Bosenberg, Marcus; Wu, Min; Scott, Kenneth L.; Kwong, Lawrence N.; Xiao, Yonghong; Cordon-Cardo, Carlos; Granter, Scott R.; Ramaswamy, Sridhar; Golub, Todd; Duncan, Lyn M.; Wagner, Stephan N.; Brennan, Cameron; Chin, Lynda

    2010-01-01

    A cardinal feature of malignant melanoma is its metastatic propensity. An incomplete view of the genetic events driving metastatic progression has been a major barrier to rational development of effective therapeutics and prognostic diagnostics for melanoma patients. In this study, we conducted global genomic characterization of primary and metastatic melanomas to examine the genomic landscape associated with metastatic progression. In addition to uncovering three genomic subclasses of metastatic melanomas, we delineated 39 focal and recurrent regions of amplification and deletions, many of which encompassed resident genes that have not been implicated in cancer or metastasis. To identify progression-associated metastasis gene candidates, we applied a statistical approach, Integrative Genome Comparison (IGC), to define 32 genomic regions of interest that were significantly altered in metastatic relative to primary melanomas, encompassing 30 resident genes with statistically significant expression deregulation. Functional assays on a subset of these candidates, including MET, ASPM, AKAP9, IMP3, PRKCA, RPA3, and SCAP2, validated their pro-invasion activities in human melanoma cells. Validity of the IGC approach was further reinforced by tissue microarray analysis of Survivin showing significant increased protein expression in thick versus thin primary cutaneous melanomas, and a progression correlation with lymph node metastases. Together, these functional validation results and correlative analysis of human tissues support the thesis that integrated genomic and pathological analyses of staged melanomas provide a productive entry point for discovery of melanoma metastases genes. PMID:20520718

  16. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization.

    Science.gov (United States)

    Wang, Qingguo; Jia, Peilin; Zhao, Zhongming

    2015-01-01

    Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.

  17. The Genome-Scale Integrated Networks in Microorganisms

    Directory of Open Access Journals (Sweden)

    Tong Hao

    2018-02-01

    The genome-scale cellular network has become a necessary tool in the systematic analysis of microbes. In a cell, there are several layers (i.e., types) of molecular networks, for example, the genome-scale metabolic network (GMN), transcriptional regulatory network (TRN), and signal transduction network (STN). Predictions based on a single-layer network alone are known to be limited and inaccurate. Therefore, integrated networks constructed from all three types are attracting growing interest. The function of a biological process in living cells is usually performed through the interaction of biological components. It is therefore necessary to integrate and analyze all the related components at the systems level in order to understand physiological function in living organisms comprehensively and correctly. In this review, we discuss three representative genome-scale cellular networks: GMN, TRN, and STN, representing different levels (i.e., metabolism, gene regulation, and cellular signaling) of a cell’s activities. Furthermore, we discuss the integration of the three types of networks. As understanding of the complexity of microbial cells grows, the development of integrated networks has become an inevitable trend in analyzing genome-scale cellular networks of microorganisms.

  18. KAIKObase: An integrated silkworm genome database and data mining tool

    Directory of Open Access Journals (Sweden)

    Nagaraju Javaregowda

    2009-10-01

    Background The silkworm, Bombyx mori, is one of the most economically important insects in many developing countries owing to its large-scale cultivation for silk production. With the development of genomic and biotechnological tools, B. mori has also become an important bioreactor for production of various recombinant proteins of biomedical interest. In 2004, two genome sequencing projects for B. mori were reported independently by Chinese and Japanese teams; however, the datasets were insufficient for building long genomic scaffolds which are essential for unambiguous annotation of the genome. Now, both the datasets have been merged and assembled through a joint collaboration between the two groups. Description Integration of the two data sets of silkworm whole-genome-shotgun sequencing by the Japanese and Chinese groups together with newly obtained fosmid- and BAC-end sequences produced the best continuity (~3.7 Mb in N50 scaffold size) among the sequenced insect genomes and provided a high degree of nucleotide coverage (88%) of all 28 chromosomes. In addition, a physical map of BAC contigs constructed by fingerprinting BAC clones and a SNP linkage map constructed using BAC-end sequences were available. In parallel, proteomic data from two-dimensional polyacrylamide gel electrophoresis in various tissues and developmental stages were compiled into a silkworm proteome database. Finally, a Bombyx trap database was constructed for documenting insertion positions and expression data of transposon insertion lines. Conclusion For efficient usage of genome information for functional studies, genomic sequences, physical and genetic map information and EST data were compiled into KAIKObase, an integrated silkworm genome database which consists of 4 map viewers, a gene viewer, and sequence, keyword and position search systems to display results and data at the level of nucleotide sequence, gene, scaffold and chromosome. Integration of the

  19. Integrating genomic selection into dairy cattle breeding programmes: a review.

    Science.gov (United States)

    Bouquet, A; Juga, J

    2013-05-01

    Extensive genetic progress has been achieved in dairy cattle populations on many traits of economic importance because of efficient breeding programmes. Success of these programmes has relied on progeny testing of the best young males to accurately assess their genetic merit and hence their potential for breeding. Over the last few years, the integration of dense genomic information into statistical tools used to make selection decisions, commonly referred to as genomic selection, has enabled gains in predicting accuracy of breeding values for young animals without own performance. The possibility to select animals at an early stage allows defining new breeding strategies aimed at boosting genetic progress while reducing costs. The first objective of this article was to review methods used to model and optimize breeding schemes integrating genomic selection and to discuss their relative advantages and limitations. The second objective was to summarize the main results and perspectives on the use of genomic selection in practical breeding schemes, on the basis of the example of dairy cattle populations. Two main designs of breeding programmes integrating genomic selection were studied in dairy cattle. Genomic selection can be used either for pre-selecting males to be progeny tested or for selecting males to be used as active sires in the population. The first option produces moderate genetic gains without changing the structure of breeding programmes. The second option leads to large genetic gains, up to double those of conventional schemes because of a major reduction in the mean generation interval, but it requires greater changes in breeding programme structure. The literature suggests that genomic selection becomes more attractive when it is coupled with embryo transfer technologies to further increase selection intensity on the dam-to-sire pathway. The use of genomic information also offers new opportunities to improve preservation of genetic variation. 
However

  20. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.

    Science.gov (United States)

    Thorvaldsdóttir, Helga; Robinson, James T; Mesirov, Jill P

    2013-03-01

    Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today's sequencing and array-based profiling methods present major challenges to visualization tools. The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues. To that end, IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems. IGV is freely available for download from http://www.broadinstitute.org/igv, under a GNU LGPL open-source license.

  1. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species.

    Science.gov (United States)

    Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan

    2012-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.

  2. Integration of genomic information with biological networks using Cytoscape.

    Science.gov (United States)

    Bauer-Mehren, Anna

    2013-01-01

    Cytoscape is an open-source software for visualizing, analyzing, and modeling biological networks. This chapter explains how to use Cytoscape to analyze the functional effect of sequence variations in the context of biological networks such as protein-protein interaction networks and signaling pathways. The chapter is divided into five parts: (1) obtaining information about the functional effect of sequence variation in a Cytoscape readable format, (2) loading and displaying different types of biological networks in Cytoscape, (3) integrating the genomic information (SNPs and mutations) with the biological networks, and (4) analyzing the effect of the genomic perturbation onto the network structure using Cytoscape built-in functions. Finally, we briefly outline how the integrated data can help in building mathematical network models for analyzing the effect of the sequence variation onto the dynamics of the biological system. Each part is illustrated by step-by-step instructions on an example use case and visualized by many screenshots and figures.

  3. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research

    Directory of Open Access Journals (Sweden)

    Ficklin Stephen

    2004-09-01

    Background Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. Description The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. Conclusions The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  4. GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research.

    Science.gov (United States)

    Jung, Sook; Jesudurai, Christopher; Staton, Margaret; Du, Zhidian; Ficklin, Stephen; Cho, Ilhyung; Abbott, Albert; Tomkins, Jeffrey; Main, Dorrie

    2004-09-09

    Peach is being developed as a model organism for Rosaceae, an economically important family that includes fruits and ornamental plants such as apple, pear, strawberry, cherry, almond and rose. The genomics and genetics data of peach can play a significant role in the gene discovery and the genetic understanding of related species. The effective utilization of these peach resources, however, requires the development of an integrated and centralized database with associated analysis tools. The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database. GDR contains comprehensive data of the genetically anchored peach physical map, an annotated peach EST database, Rosaceae maps and markers and all publicly available Rosaceae sequences. Annotations of ESTs include contig assembly, putative function, simple sequence repeats, and anchored position to the peach physical map where applicable. Our integrated map viewer provides graphical interface to the genetic, transcriptome and physical mapping information. ESTs, BACs and markers can be queried by various categories and the search result sites are linked to the integrated map viewer or to the WebFPC physical map sites. In addition to browsing and querying the database, users can compare their sequences with the annotated GDR sequences via a dedicated sequence similarity server running either the BLAST or FASTA algorithm. To demonstrate the utility of the integrated and fully annotated database and analysis tools, we describe a case study where we anchored Rosaceae sequences to the peach physical and genetic map by sequence similarity. The GDR has been initiated to meet the major deficiency in Rosaceae genomics and genetics research, namely a centralized web database and bioinformatics tools for data storage, analysis and exchange. GDR can be accessed at http://www.genome.clemson.edu/gdr/.

  5. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  6. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Andrei L. Osterman, Ph.D.

    2012-12-17

    Integration of bioinformatics and experimental techniques was applied to mapping and characterization of the key components (pathways, enzymes, transporters, regulators) of the core metabolic machinery in Shewanella oneidensis and related species, with the main focus on metabolic and regulatory pathways involved in utilization of various carbon and energy sources. Among the main accomplishments reflected in ten joint publications with other participants of Shewanella Federation are: (i) A systems-level reconstruction of carbohydrate utilization pathways in the genus of Shewanella (19 species). This analysis yielded reconstruction of 18 sugar utilization pathways including 10 novel pathway variants and prediction of > 60 novel protein families of enzymes, transporters and regulators involved in these pathways. Selected functional predictions were verified by focused biochemical and genetic experiments. Observed growth phenotypes were consistent with bioinformatic predictions providing strong validation of the technology and (ii) Global genomic reconstruction of transcriptional regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors, 8 riboswitches and 6 translational attenuators. Of those, 45 regulons were inferred directly from the genome context analysis, whereas others were propagated from previously characterized regulons in other species. Selected regulatory predictions were experimentally tested. Integration of this analysis with microarray data revealed overall consistency and provided an additional layer of interactions between regulons. All the results were captured in the new database RegPrecise, which is a joint development with the LBNL team. A more detailed analysis of the individual subsystems, pathways and regulons in Shewanella spp included bioinformatics-based prediction and experimental characterization of: (i) N-Acetylglucosamine catabolic pathway; (ii) Lactate utilization machinery; (iii) Novel Nrt

  7. Visualization of RNA structure models within the Integrative Genomics Viewer.

    Science.gov (United States)

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  8. Structured Matrix Completion with Applications to Genomic Data Integration.

    Science.gov (United States)

    Cai, Tianxi; Cai, T Tony; Zhang, Anru

    2016-01-01

    Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive a lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.
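
    The setting described above, in which some rows and some columns are fully observed and the remaining block is missing, can be illustrated in its noiseless form. The sketch below is not the authors' SMC estimator, which targets approximately low-rank and noisy matrices. It only shows the standard linear-algebra identity for an exactly rank-r matrix split into blocks A11, A12, A21, A22: if the observed r x r block A11 is invertible, the missing block is A22 = A21 A11^-1 A12. The function names and the toy matrix are illustrative assumptions.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with exact fractions.

    `a` must be square and invertible; a singular `a` raises StopIteration
    when no non-zero pivot can be found.
    """
    n = len(a)
    aug = [[Fraction(v) for v in ra] + [Fraction(v) for v in rb]
           for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def complete_block(a11, a12, a21):
    """Recover the missing block A22 = A21 @ inv(A11) @ A12 (exact low-rank case)."""
    return matmul(a21, solve(a11, a12))


# Toy rank-2 matrix; only the first two rows and first two columns are "observed".
# Full matrix rows: [1,2,0,1], [0,1,1,3], [1,3,1,4], [2,5,1,5].
a11 = [[1, 2], [0, 1]]
a12 = [[0, 1], [1, 3]]
a21 = [[1, 3], [2, 5]]
a22_hat = complete_block(a11, a12, a21)
```

    With noisy or only approximately low-rank data, inverting A11 directly is unstable, which is one reason a dedicated estimator with error guarantees is needed.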

  9. An integrated semiconductor device enabling non-optical genome sequencing.

    Science.gov (United States)

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  10. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with or without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of the large number of genomic alterations detected in cancer genomes.

  11. STINGRAY: system for integrated genomic resources and analysis.

    Science.gov (United States)

    Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R

    2014-03-07

    The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system may be faster with Sanger data, since large NGS datasets could slow down MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.

  12. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Margrethe H. Serres

    2012-06-29

    Shewanella oneidensis MR-1 is a motile, facultative γ-Proteobacterium with remarkable respiratory versatility; it can utilize a range of organic and inorganic compounds as terminal electron acceptors for anaerobic metabolism. The ability to effectively reduce nitrate, S⁰, polyvalent metals and radionuclides has established MR-1 as an important model dissimilatory metal-reducing microorganism for genome-based investigations of biogeochemical transformation of metals and radionuclides that are of concern to the U.S. Department of Energy (DOE) sites nationwide. Metal-reducing bacteria such as Shewanella also have a highly developed capacity for extracellular transfer of respiratory electrons to solid phase Fe and Mn oxides as well as directly to anode surfaces in microbial fuel cells. More broadly, Shewanellae are recognized free-living microorganisms and members of microbial communities involved in the decomposition of organic matter and the cycling of elements in aquatic and sedimentary systems. To function and compete in environments that are subject to spatial and temporal environmental change, Shewanella must be able to sense and respond to such changes and therefore requires relatively robust sensing and regulation systems. The overall goal of this project is to apply the tools of genomics, leveraging the availability of genome sequence for 18 additional strains of Shewanella, to better understand the ecophysiology and speciation of respiratory-versatile members of this important genus. To understand these systems we propose to use genome-based approaches to investigate Shewanella as a system of integrated networks; first describing key cellular subsystems - those involved in signal transduction, regulation, and metabolism - then building towards understanding the function of whole cells and, eventually, cells within populations. As a general approach, this project will employ complementary "top-down" - bioinformatics-based genome functional predictions, high

  13. Pathogenesis comparison between the United States porcine epidemic diarrhoea virus prototype and S-INDEL-variant strains in conventional neonatal piglets.

    Science.gov (United States)

    Chen, Qi; Gauger, Phillip C; Stafne, Molly R; Thomas, Joseph T; Madson, Darin M; Huang, Haiyan; Zheng, Ying; Li, Ganwu; Zhang, Jianqiang

    2016-05-01

    At least two genetically different porcine epidemic diarrhoea virus (PEDV) strains have been identified in the USA: US PEDV prototype and S-INDEL-variant strains. The objective of this study was to compare the pathogenicity differences of the US PEDV prototype and S-INDEL-variant strains in conventional neonatal piglets under experimental infections. Fifty PEDV-negative 5-day-old pigs were divided into five groups of ten pigs each and were inoculated orogastrically with three US PEDV prototype isolates (IN19338/2013, NC35140/2013 and NC49469/2013), an S-INDEL-variant isolate (IL20697/2014), and virus-negative culture medium, respectively, with virus titres of 10⁴ TCID50 ml⁻¹, 10 ml per pig. All three PEDV prototype isolates tested in this study, regardless of their phylogenetic clades, had similar pathogenicity and caused severe enteric disease in 5-day-old pigs as evidenced by clinical signs, faecal virus shedding, and gross and histopathological lesions. Compared with pigs inoculated with the three US PEDV prototype isolates, pigs inoculated with the S-INDEL-variant isolate had significantly diminished clinical signs, virus shedding in faeces, gross lesions in small intestines, caeca and colons, histopathological lesions in small intestines, and immunohistochemistry staining in ileum. However, the US PEDV prototype and the S-INDEL-variant strains induced similar viraemia levels in inoculated pigs. Whole genome sequences of the PEDV prototype and S-INDEL-variant strains were determined, but the molecular basis of virulence differences between these PEDV strains remains to be elucidated using a reverse genetics approach.

  14. Data integration to prioritize drugs using genomics and curated data.

    Science.gov (United States)

    Louhimo, Riku; Laakso, Marko; Belitskin, Denis; Klefström, Juha; Lehtonen, Rainer; Hautaniemi, Sampsa

    2016-01-01

    Genomic alterations affecting drug target proteins occur in several tumor types and are prime candidates for patient-specific tailored treatments. Increasingly, patients likely to benefit from targeted cancer therapy are selected based on molecular alterations. The selection of a precision therapy benefiting most patients is challenging but can be enhanced with integration of multiple types of molecular data. Data integration approaches for drug prioritization have successfully integrated diverse molecular data but do not take full advantage of existing data and literature. We have built a knowledge-base which connects data from public databases with molecular results from over 2200 tumors, signaling pathways and drug-target databases. Moreover, we have developed a data mining algorithm to effectively utilize this heterogeneous knowledge-base. Our algorithm is designed to facilitate retargeting of existing drugs by stratifying samples and prioritizing drug targets. We analyzed 797 primary tumors from The Cancer Genome Atlas breast and ovarian cancer cohorts using our framework. FGFR, CDK and HER2 inhibitors were prioritized in breast and ovarian data sets. Estrogen receptor positive breast tumors showed potential sensitivity to targeted inhibitors of FGFR due to activation of FGFR3. Our results suggest that computational sample stratification selects potentially sensitive samples for targeted therapies and can aid in precision medicine drug repositioning. Source code is available from http://csblcanges.fimm.fi/GOPredict/.

  15. Integrated genomic and gene expression profiling identifies two major genomic circuits in urothelial carcinoma.

    Directory of Open Access Journals (Sweden)

    David Lindgren

    Similar to other malignancies, urothelial carcinoma (UC) is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21), and BCL2L1 (20q11). We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E2F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.

  16. The Proteins API: accessing key integrated protein and genome information.

    Science.gov (United States)

    Nightingale, Andrew; Antunes, Ricardo; Alpi, Emanuele; Bursteinas, Borisas; Gonzales, Leonardo; Liu, Wudong; Luo, Jie; Qi, Guoying; Turner, Edd; Martin, Maria

    2017-07-03

    The Proteins API provides searching and programmatic access to protein and associated genomics data such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large scale data sources (LSS). Using the coordinates service, researchers are able to retrieve the genomic sequence coordinates for proteins in UniProtKB. The LSS genomics and proteomics data for UniProt proteins are programmatically available only through this service. A Swagger UI has been implemented to provide documentation and an interface that lets users with little or no programming experience 'talk' to the services, quickly and easily formulate queries, and obtain dynamically generated source code for popular programming languages such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a scalable, reliable, fast and easy-to-use set of RESTful services that provides a broad protein information resource, allowing users to ask questions based upon their field of expertise and to gain an integrated overview of the protein annotations available, aiding their understanding of proteins in biological processes. The Proteins API is available at (http://www.ebi.ac.uk/proteins/api/doc). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
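
    As a rough illustration of the programmatic access described above, the sketch below builds a request to the coordinates service and asks for JSON. The abstract only gives the documentation address, so the base URL, the `/coordinates/{accession}` path and the example accession are assumptions that should be checked against the Swagger documentation before use. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL, derived from the documentation address in the abstract.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"


def coordinates_url(accession: str) -> str:
    """Build the (assumed) coordinates-service URL for a UniProtKB accession."""
    return f"{BASE_URL}/coordinates/{urllib.parse.quote(accession, safe='')}"


def build_request(accession: str) -> urllib.request.Request:
    """Prepare a GET request asking for a JSON response."""
    return urllib.request.Request(
        coordinates_url(accession),
        headers={"Accept": "application/json"},
    )


def fetch_coordinates(accession: str, timeout: float = 30.0):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_request(accession), timeout=timeout) as response:
        return json.load(response)


# Example accession chosen for illustration only; nothing is sent until
# fetch_coordinates() is called.
request = build_request("P05067")
```

    Calling `fetch_coordinates("P05067")` would perform the actual query; changing the `Accept` header is the usual way to request one of the other formats the abstract mentions, assuming the service supports content negotiation for that endpoint.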

  17. Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.).

    Directory of Open Access Journals (Sweden)

    Gómez Marcela

    2009-12-01

    Background Expressed sequence tags (ESTs) are an important source of gene-based markers such as those based on insertion-deletions (Indels) or single-nucleotide polymorphisms (SNPs). Several gel based methods have been reported for the detection of sequence variants, however they have not been widely exploited in common bean, an important legume crop of the developing world. The objectives of this project were to develop and map EST based markers using analysis of single strand conformation polymorphisms (SSCPs), to create a transcript map for common bean and to compare synteny of the common bean map with sequenced chromosomes of other legumes. Results A set of 418 EST based amplicons were evaluated for parental polymorphisms using the SSCP technique and 26% of these presented a clear conformational or size polymorphism between Andean and Mesoamerican genotypes. The amplicon based markers were then used for genetic mapping with segregation analysis performed in the DOR364 × G19833 recombinant inbred line (RIL) population. A total of 118 new marker loci were placed into an integrated molecular map for common bean consisting of 288 markers. Of these, 218 were used for synteny analysis and 186 presented homology with segments of the soybean genome with an e-value lower than 7 × 10⁻¹². The synteny analysis with soybean showed a mosaic pattern of syntenic blocks with most segments of any one common bean linkage group associated with two soybean chromosomes. The analysis with Medicago truncatula and Lotus japonicus presented fewer syntenic regions consistent with the more distant phylogenetic relationship between the galegoid and phaseoloid legumes. Conclusion The SSCP technique is a useful and inexpensive alternative to other SNP or Indel detection techniques for saturating the common bean genetic map with functional markers that may be useful in marker assisted selection. In addition, the genetic markers based on ESTs allowed the construction

  18. Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.).

    Science.gov (United States)

    Galeano, Carlos H; Fernández, Andrea C; Gómez, Marcela; Blair, Matthew W

    2009-12-23

    Expressed sequence tags (ESTs) are an important source of gene-based markers such as those based on insertion-deletions (Indels) or single-nucleotide polymorphisms (SNPs). Several gel based methods have been reported for the detection of sequence variants, however they have not been widely exploited in common bean, an important legume crop of the developing world. The objectives of this project were to develop and map EST based markers using analysis of single strand conformation polymorphisms (SSCPs), to create a transcript map for common bean and to compare synteny of the common bean map with sequenced chromosomes of other legumes. A set of 418 EST based amplicons were evaluated for parental polymorphisms using the SSCP technique and 26% of these presented a clear conformational or size polymorphism between Andean and Mesoamerican genotypes. The amplicon based markers were then used for genetic mapping with segregation analysis performed in the DOR364 × G19833 recombinant inbred line (RIL) population. A total of 118 new marker loci were placed into an integrated molecular map for common bean consisting of 288 markers. Of these, 218 were used for synteny analysis and 186 presented homology with segments of the soybean genome with an e-value lower than 7 × 10⁻¹². The synteny analysis with soybean showed a mosaic pattern of syntenic blocks with most segments of any one common bean linkage group associated with two soybean chromosomes. The analysis with Medicago truncatula and Lotus japonicus presented fewer syntenic regions consistent with the more distant phylogenetic relationship between the galegoid and phaseoloid legumes. The SSCP technique is a useful and inexpensive alternative to other SNP or Indel detection techniques for saturating the common bean genetic map with functional markers that may be useful in marker assisted selection. In addition, the genetic markers based on ESTs allowed the construction of a transcript map and given their high conservation

  19. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    KAUST Repository

    Alam, Intikhab; Antunes, André; Kamau, Allan; Ba Alawi, Wail; Kalkatawi, Manal M.; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.

  20. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    KAUST Repository

    Alam, Intikhab

    2013-12-06

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.

  1. GapCoder automates the use of indel characters in phylogenetic analysis.

    Science.gov (United States)

    Young, Nelson D; Healy, John

    2003-02-19

    Several ways of incorporating indels into phylogenetic analysis have been suggested. Simple indel coding has two strengths: (1) biological realism and (2) efficiency of analysis. In the method, each indel with different start and/or end positions is considered to be a separate character. The presence/absence of these indel characters is then added to the data set. We have written a program, GapCoder, to automate this procedure. The program can input PIR format aligned datasets, find the indels and add the indel-based characters. The output is a NEXUS format file, which includes a table showing what region each indel character is based on. If regions are excluded from analysis, this table makes it easy to identify the corresponding indel characters for exclusion. Manual implementation of the simple indel coding method can be very time-consuming, especially in data sets where indels are numerous and/or overlapping. GapCoder automates this method and is therefore particularly useful during procedures where phylogenetic analyses need to be repeated many times, such as when different alignments are being explored or when various taxon or character sets are being explored. GapCoder is currently available for Windows from http://www.home.duq.edu/~youngnd/GapCoder.
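
    The coding rule described above (one character per distinct pair of gap start and end positions, scored as present or absent in each taxon) can be sketched as follows. This is not GapCoder's code and does not read PIR or write NEXUS. It is a simplified illustration that scores every character strictly as 1 or 0, whereas published descriptions of simple indel coding also treat some cells as inapplicable (for example, when a longer gap spans a shorter one) and may treat leading or trailing gaps as missing data. The function names and the toy alignment are assumptions.

```python
from typing import Dict, List, Tuple


def find_gaps(seq: str, gap: str = "-") -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of gap characters."""
    gaps = []
    start = None
    for i, ch in enumerate(seq):
        if ch == gap and start is None:
            start = i                    # a gap run begins here
        elif ch != gap and start is not None:
            gaps.append((start, i))      # the run ended just before position i
            start = None
    if start is not None:                # the run extends to the end of the sequence
        gaps.append((start, len(seq)))
    return gaps


def code_indels(alignment: Dict[str, str]):
    """Code each distinct (start, end) gap as one presence/absence character."""
    per_taxon = {name: set(find_gaps(seq)) for name, seq in alignment.items()}
    characters = sorted(set().union(*per_taxon.values()))
    matrix = {
        name: "".join("1" if c in gaps else "0" for c in characters)
        for name, gaps in per_taxon.items()
    }
    return characters, matrix


# Toy alignment: two taxa share one gap, a third has an overlapping gap
# with a different start position, and the fourth has no gap.
alignment = {
    "taxonA": "ACGT--ACGT",
    "taxonB": "ACGT--ACGT",
    "taxonC": "ACG---ACGT",
    "taxonD": "ACGTTTACGT",
}
characters, matrix = code_indels(alignment)
```

    Here `characters` plays the role of the table that maps each indel character back to its alignment region, which is what makes it possible to drop the matching characters when a region is excluded.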

  2. IMG 4 version of the integrated microbial genomes comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883

  3. IMG 4 version of the integrated microbial genomes comparative analysis system

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Chen, I-Min A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Palaniappan, Krishna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Chu, Ken [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Szeto, Ernest [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Pillay, Manoj [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Ratner, Anna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Huang, Jinghua [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division]; Woyke, Tanja [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Huntemann, Marcel [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Anderson, Iain [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Billis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Varghese, Neha [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Mavromatis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Pati, Amrita [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Ivanova, Natalia N. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]; Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program]

    2013-10-27

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  4. GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data.

    Science.gov (United States)

    Jung, Sook; Staton, Margaret; Lee, Taein; Blenda, Anna; Svancara, Randall; Abbott, Albert; Main, Dorrie

    2008-01-01

    The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org.

  5. Genome-wide profiling of HPV integration in cervical cancer identifies clustered genomic hot spots and a potential microhomology-mediated integration mechanism

    DEFF Research Database (Denmark)

    Hu, Zheng; Zhu, Da; Wang, Wei

    2015-01-01

    Human papillomavirus (HPV) integration is a key genetic event in cervical carcinogenesis. By conducting whole-genome sequencing and high-throughput viral integration detection, we identified 3,667 HPV integration breakpoints in 26 cervical intraepithelial neoplasias, 104 cervical carcinomas and ...

  6. Integrated Genomic Analysis of the Ubiquitin Pathway across Cancer Types

    Directory of Open Access Journals (Sweden)

    Zhongqi Ge

    2018-04-01

    Summary: Protein ubiquitination is a dynamic and reversible process of adding single ubiquitin molecules or various ubiquitin chains to target proteins. Here, using multidimensional omic data of 9,125 tumor samples across 33 cancer types from The Cancer Genome Atlas, we perform comprehensive molecular characterization of 929 ubiquitin-related genes and 95 deubiquitinase genes. Among them, we systematically identify top somatic driver candidates, including mutated FBXW7 with cancer-type-specific patterns and amplified MDM2 showing a mutually exclusive pattern with BRAF mutations. Ubiquitin pathway genes tend to be upregulated in cancer mediated by diverse mechanisms. By integrating pan-cancer multiomic data, we identify a group of tumor samples that exhibit worse prognosis. These samples are consistently associated with the upregulation of cell-cycle and DNA repair pathways, characterized by mutated TP53, MYC/TERT amplification, and APC/PTEN deletion. Our analysis highlights the importance of the ubiquitin pathway in cancer development and lays a foundation for developing relevant therapeutic strategies. Ge et al. analyze a cohort of 9,125 TCGA samples across 33 cancer types to provide a comprehensive characterization of the ubiquitin pathway. They detect somatic driver candidates in the ubiquitin pathway and identify a cluster of patients with poor survival, highlighting the importance of this pathway in cancer development. Keywords: ubiquitin pathway, pan-cancer analysis, The Cancer Genome Atlas, tumor subtype, cancer prognosis, therapeutic targets, biomarker, FBXW7

  7. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    when applied to the manually curated training set. Applying this method to the data representing around a quarter of the fraction space for water soluble proteins in D. vulgaris, we obtained 854 reliable pairwise interactions. Further, we have developed algorithms to analyze and assign significance to protein interaction data from bait pull-down experiments and integrate these data with other systems biology data through associative biclustering in a parallel computing environment. We will 'fill-in' missing information in these interaction data using a 'Transitive Closure' algorithm and subsequently use a 'Between Commonality Decomposition' algorithm to discover complexes within these large graphs of protein interactions. To characterize the metabolic activities of proteins and their complexes we are developing algorithms to deconvolute pure mass spectra, estimate chemical formula for m/z values, and fit isotopic fine structure to metabolomics data. We have discovered that, in comparison to isotopic pattern fitting methods, restricting the chemical formula by these two dimensions actually facilitates unique solutions for chemical formula generators. To understand how microbial functions are regulated we have developed complementary algorithms for reconstructing gene regulatory networks (GRNs). Whereas the network inference algorithms cMonkey and Inferelator enable de novo reconstruction of predictive models for GRNs from diverse systems biology data, the RegPrecise and RegPredict framework uses evolutionary comparisons of genomes from closely related organisms to reconstruct conserved regulons. We have integrated the two complementary algorithms to rapidly generate comprehensive models for gene regulation of understudied organisms. Our preliminary analyses of these reconstructed GRNs have revealed novel regulatory mechanisms and cis-regulatory motifs, as well as others that are conserved across species. Finally, we are

  8. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR

    DEFF Research Database (Denmark)

    Jakočiūnas, Tadas; Jensen, Emil D.; Jensen, Michael Krogh

    2018-01-01

    ... and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated in two genomic loci to construct a tyrosine production strain, and at the same time knocking ... Genome integration is a vital step for implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome ... engineering allow much higher throughput and robustness in terms of strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes in multiple genomic sites simultaneously. Cas...

  9. Systematic analysis of short internal indels and their impact on protein folding

    Directory of Open Access Journals (Sweden)

    Guo Jun-tao

    2010-08-01

    Abstract Background: Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered the major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, in particular the ones involving core secondary structures, on the folding of protein structures. The long term goal of our study is to accurately predict the protein AS isoform structures. As a first step towards this goal, we performed a systematic analysis of the structural changes caused by short internal indels through mining highly homologous proteins in the Protein Data Bank (PDB). Results: We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in a disordered state and the majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in terms of amino acid frequencies and secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve the structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2Å or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs
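
    The global RMSD figure quoted in the record above can be illustrated with a minimal sketch. It assumes the two structures are already superimposed and that residues have been paired one-to-one (the abstract does not say how the authors aligned or superimposed their protein pairs); the function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points.

    Assumption: the structures are already superimposed and point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example (hypothetical coordinates, in angstroms).
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(structure_a, structure_b))
```

    Under these assumptions, a pair would fall in the "2Å or less" group when the returned value does not exceed 2.0.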

  10. The Integrated Microbial Genomes (IMG) System: An Expanding Comparative Analysis Resource

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2009-09-13

    The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at .

  11. Stakeholder engagement: a key component of integrating genomic information into electronic health records.

    Science.gov (United States)

    Hartzler, Andrea; McCarty, Catherine A; Rasmussen, Luke V; Williams, Marc S; Brilliant, Murray; Bowton, Erica A; Clayton, Ellen Wright; Faucett, William A; Ferryman, Kadija; Field, Julie R; Fullerton, Stephanie M; Horowitz, Carol R; Koenig, Barbara A; McCormick, Jennifer B; Ralston, James D; Sanderson, Saskia C; Smith, Maureen E; Trinidad, Susan Brown

    2013-10-01

    Integrating genomic information into clinical care and the electronic health record can facilitate personalized medicine through genetically guided clinical decision support. Stakeholder involvement is critical to the success of these implementation efforts. Prior work on implementation of clinical information systems provides broad guidance to inform effective engagement strategies. We add to this evidence-based recommendations that are specific to issues at the intersection of genomics and the electronic health record. We describe stakeholder engagement strategies employed by the Electronic Medical Records and Genomics Network, a national consortium of US research institutions funded by the National Human Genome Research Institute to develop, disseminate, and apply approaches that combine genomic and electronic health record data. Through select examples drawn from sites of the Electronic Medical Records and Genomics Network, we illustrate a continuum of engagement strategies to inform genomic integration into commercial and homegrown electronic health records across a range of health-care settings. We frame engagement as activities to consult, involve, and partner with key stakeholder groups throughout specific phases of health information technology implementation. Our aim is to provide insights into engagement strategies to guide genomic integration based on our unique network experiences and lessons learned within the broader context of implementation research in biomedical informatics. On the basis of our collective experience, we describe key stakeholder practices, challenges, and considerations for successful genomic integration to support personalized medicine.

  12. Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.

    Science.gov (United States)

    Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu

    2015-01-01

    The Brassica database (BRAD) was built initially to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences, are also provided. Furthermore, genome and annotation information has been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.

  13. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR.

    Science.gov (United States)

    Jakočiūnas, Tadas; Jensen, Emil D; Jensen, Michael K; Keasling, Jay D

    2018-01-01

    Genome integration is a vital step for implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome engineering allow much higher throughput and robustness in terms of strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes in multiple genomic sites simultaneously. CasEMBLR capitalizes on the CRISPR/Cas9 technology to generate double-strand breaks in genomic loci, thus prompting native homologous recombination (HR) machinery to integrate exogenously derived homology templates. As proof-of-principle for microbial cell factory development, CasEMBLR was used for one-step assembly and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated in two genomic loci to construct a tyrosine production strain, and at the same time knocking out two genes. This new method complements and improves the field of genome engineering in S. cerevisiae by providing a more flexible platform for rapid and precise strain building.

  14. Developing market class specific InDel markers from next generation sequence data in Phaseolus vulgaris L.

    Directory of Open Access Journals (Sweden)

    Samira Mafi Moghaddam

    2014-05-01

    Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened in each gene pool, followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class, which in turn has resulted in limited within-market-class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6x to 5.1x coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is faster, more accurate, less error prone, and higher throughput than manual design. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2,687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate.

  15. Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates.

    Science.gov (United States)

    Onogi, Akio; Watanabe, Maya; Mochizuki, Toshihiro; Hayashi, Takeshi; Nakagawa, Hiroshi; Hasegawa, Toshihiro; Iwata, Hiroyoshi

    2016-04-01

    It is suggested that accuracy in predicting plant phenotypes can be improved by integrating genomic prediction with crop modelling in a single hierarchical model. Accurate prediction of phenotypes is important for plant breeding and management. Although genomic prediction/selection aims to predict phenotypes on the basis of whole-genome marker information, it is often difficult to predict phenotypes of complex traits in diverse environments, because plant phenotypes are often influenced by genotype-environment interaction. A possible remedy is to integrate genomic prediction with crop/ecophysiological modelling, which enables us to predict plant phenotypes using environmental and management information. To this end, in the present study, we developed a novel method for integrating genomic prediction with phenological modelling of Asian rice (Oryza sativa, L.), allowing the heading date of untested genotypes in untested environments to be predicted. The method simultaneously infers the phenological model parameters and whole-genome marker effects on the parameters in a Bayesian framework. By cultivating backcross inbred lines of Koshihikari × Kasalath in nine environments, we evaluated the potential of the proposed method in comparison with conventional genomic prediction, phenological modelling, and two-step methods that applied genomic prediction to phenological model parameters inferred from Nelder-Mead or Markov chain Monte Carlo algorithms. In predicting heading dates of untested lines in untested environments, the proposed and two-step methods tended to provide more accurate predictions than the conventional genomic prediction methods, particularly in environments where phenotypes from environments similar to the target environment were unavailable for training genomic prediction. The proposed method showed greater accuracy in prediction than the two-step methods in all cross-validation schemes tested, suggesting the potential of the integrated approach in

  16. Characterization of Equine Infectious Anemia Virus Integration in the Horse Genome

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2015-06-01

    Human immunodeficiency virus (HIV-1) has a unique integration profile in the human genome relative to murine and avian retroviruses. Equine infectious anemia virus (EIAV) is another well-studied lentivirus that can also be used as a promising retro-transfection vector, but its integration into its native host has not been characterized. In this study, we mapped 477 integration sites of the EIAV strain EIAVFDDV13 in fetal equine dermal (FED) cells during in vitro infection. Published integration sites of EIAV and HIV-1 in the human genome were also analyzed as references. Our results demonstrated that EIAVFDDV13 tended to integrate into genes and AT-rich regions, and it avoided integrating into transcription start sites (TSS), which is consistent with EIAV and HIV-1 integration in the human genome. Notably, the integration of EIAVFDDV13 favored long interspersed elements (LINEs) and DNA transposons in the horse genome, whereas the integration of HIV-1 favored short interspersed elements (SINEs) in the human genome. The chromosomal environment near LINEs or DNA transposons potentially influences viral transcription and may be related to the unique EIAV latency states in equids. The data on EIAV integration in its natural host will facilitate studies on lentiviral infection and lentivirus-based therapeutic vectors.

  17. Community standards for genomic resources, genetic conservation, and data integration

    Science.gov (United States)

    Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson

    2017-01-01

    Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...

  18. GIGGLE: a search engine for large-scale integrated genome analysis

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061

  19. GIGGLE: a search engine for large-scale integrated genome analysis.

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-02-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
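
    The idea of ranking interval files by how many of their intervals overlap a set of query features, as described in the two GIGGLE records above, can be sketched in a few lines. This is not GIGGLE's index or its significance scoring (neither is described in the abstract); it is a simplified stand-in that counts overlaps with binary search over sorted start and end coordinates, assuming half-open intervals on a single chromosome. All names and data below are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(intervals):
    """Return (sorted starts, sorted ends) for a list of half-open (start, end) intervals."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    return starts, ends


def count_overlaps(index, query):
    """Count indexed intervals overlapping the half-open query (q_start, q_end).

    An interval overlaps when start < q_end and end > q_start, so the count is
    (intervals starting before q_end) minus (intervals ending at or before q_start).
    """
    starts, ends = index
    q_start, q_end = query
    return bisect_left(starts, q_end) - bisect_right(ends, q_start)


def rank_files(files, queries):
    """Rank named interval sets by total overlaps with the queries, highest first."""
    totals = {}
    for name, intervals in files.items():
        index = build_index(intervals)
        totals[name] = sum(count_overlaps(index, q) for q in queries)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical interval "files" and query features.
files = {
    "track_a": [(1, 5), (3, 8), (10, 12)],
    "track_b": [(20, 30)],
}
queries = [(4, 10), (8, 11)]
print(rank_files(files, queries))
```

    Raw overlap counts favor large files, which is one reason a real search engine would rank by a significance measure rather than by counts alone.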

  20. The three-dimensional genome organization of Drosophila melanogaster through data integration.

    Science.gov (United States)

    Li, Qingjiao; Tjong, Harianto; Li, Xiao; Gong, Ke; Zhou, Xianghong Jasmine; Chiolo, Irene; Alber, Frank

    2017-07-31

    Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.

  1. Protecting genomic integrity in somatic cells and embryonic stem cells

    International Nuclear Information System (INIS)

    Hong, Y.; Cervantes, R.B.; Tichy, E.; Tischfield, J.A.; Stambrook, P.J.

    2007-01-01

    Mutation frequencies at some loci in mammalian somatic cells in vivo approach 10^-4. The majority of these events occur as a consequence of loss of heterozygosity (LOH) due to mitotic recombination. Such high levels of DNA damage in somatic cells, which can accumulate with age, will cause injury and, after a latency period, may lead to somatic disease and ultimately death. This high level of DNA damage is untenable for germ cells, and by extrapolation for embryonic stem (ES) cells, that must recreate the organism. ES cells cannot tolerate such a high frequency of damage since mutations will immediately impact the altered cell, and subsequently the entire organism. Most importantly, the mutations may be passed on to future generations. ES cells, therefore, must have robust mechanisms to protect the integrity of their genomes. We have examined two such mechanisms. Firstly, we have shown that mutation frequencies and frequencies of mitotic recombination in ES cells are about 100-fold lower than in adult somatic cells or in isogenic mouse embryonic fibroblasts (MEFs). A second complementary protective mechanism eliminates those ES cells that have acquired a mutational burden, thereby maintaining a pristine population. Consistent with this hypothesis, ES cells lack a G1 checkpoint, and the two known signaling pathways that mediate the checkpoint are compromised. The checkpoint kinase, Chk2, which participates in both pathways, is sequestered at centrosomes in ES cells and does not phosphorylate its substrates (i.e. p53 and Cdc25A) that must be modified to produce a G1 arrest. Ectopic expression of Chk2 does not rescue the p53-mediated pathway, but does restore the pathway mediated by Cdc25A. Wild type ES cells exposed to ionizing radiation do not accumulate in G1 but do so in S-phase and in G2. ES cells that ectopically express Chk2 undergo cell cycle arrest in G1 as well as G2, and appear to be protected from apoptosis

  2. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas.

    Science.gov (United States)

    Brat, Daniel J; Verhaak, Roel G W; Aldape, Kenneth D; Yung, W K Alfred; Salama, Sofie R; Cooper, Lee A D; Rheinbay, Esther; Miller, C Ryan; Vitucci, Mark; Morozova, Olena; Robertson, A Gordon; Noushmehr, Houtan; Laird, Peter W; Cherniack, Andrew D; Akbani, Rehan; Huse, Jason T; Ciriello, Giovanni; Poisson, Laila M; Barnholtz-Sloan, Jill S; Berger, Mitchel S; Brennan, Cameron; Colen, Rivka R; Colman, Howard; Flanders, Adam E; Giannini, Caterina; Grifford, Mia; Iavarone, Antonio; Jain, Rajan; Joseph, Isaac; Kim, Jaegil; Kasaian, Katayoon; Mikkelsen, Tom; Murray, Bradley A; O'Neill, Brian Patrick; Pachter, Lior; Parsons, Donald W; Sougnez, Carrie; Sulman, Erik P; Vandenberg, Scott R; Van Meir, Erwin G; von Deimling, Andreas; Zhang, Hailei; Crain, Daniel; Lau, Kevin; Mallery, David; Morris, Scott; Paulauskis, Joseph; Penny, Robert; Shelton, Troy; Sherman, Mark; Yena, Peggy; Black, Aaron; Bowen, Jay; Dicostanzo, Katie; Gastier-Foster, Julie; Leraas, Kristen M; Lichtenberg, Tara M; Pierson, Christopher R; Ramirez, Nilsa C; Taylor, Cynthia; Weaver, Stephanie; Wise, Lisa; Zmuda, Erik; Davidsen, Tanja; Demchok, John A; Eley, Greg; Ferguson, Martin L; Hutter, Carolyn M; Mills Shaw, Kenna R; Ozenberger, Bradley A; Sheth, Margi; Sofia, Heidi J; Tarnuzzer, Roy; Wang, Zhining; Yang, Liming; Zenklusen, Jean Claude; Ayala, Brenda; Baboud, Julien; Chudamani, Sudha; Jensen, Mark A; Liu, Jia; Pihl, Todd; Raman, Rohini; Wan, Yunhu; Wu, Ye; Ally, Adrian; Auman, J Todd; Balasundaram, Miruna; Balu, Saianand; Baylin, Stephen B; Beroukhim, Rameen; Bootwalla, Moiz S; Bowlby, Reanne; Bristow, Christopher A; Brooks, Denise; Butterfield, Yaron; Carlsen, Rebecca; Carter, Scott; Chin, Lynda; Chu, Andy; Chuah, Eric; Cibulskis, Kristian; Clarke, Amanda; Coetzee, Simon G; Dhalla, Noreen; Fennell, Tim; Fisher, Sheila; Gabriel, Stacey; Getz, Gad; Gibbs, Richard; Guin, Ranabir; Hadjipanayis, Angela; Hayes, D Neil; Hinoue, Toshinori; Hoadley, Katherine; Holt, Robert A; Hoyle, Alan P; Jefferys, 
Stuart R; Jones, Steven; Jones, Corbin D; Kucherlapati, Raju; Lai, Phillip H; Lander, Eric; Lee, Semin; Lichtenstein, Lee; Ma, Yussanne; Maglinte, Dennis T; Mahadeshwar, Harshad S; Marra, Marco A; Mayo, Michael; Meng, Shaowu; Meyerson, Matthew L; Mieczkowski, Piotr A; Moore, Richard A; Mose, Lisle E; Mungall, Andrew J; Pantazi, Angeliki; Parfenov, Michael; Park, Peter J; Parker, Joel S; Perou, Charles M; Protopopov, Alexei; Ren, Xiaojia; Roach, Jeffrey; Sabedot, Thaís S; Schein, Jacqueline; Schumacher, Steven E; Seidman, Jonathan G; Seth, Sahil; Shen, Hui; Simons, Janae V; Sipahimalani, Payal; Soloway, Matthew G; Song, Xingzhi; Sun, Huandong; Tabak, Barbara; Tam, Angela; Tan, Donghui; Tang, Jiabin; Thiessen, Nina; Triche, Timothy; Van Den Berg, David J; Veluvolu, Umadevi; Waring, Scot; Weisenberger, Daniel J; Wilkerson, Matthew D; Wong, Tina; Wu, Junyuan; Xi, Liu; Xu, Andrew W; Yang, Lixing; Zack, Travis I; Zhang, Jianhua; Aksoy, B Arman; Arachchi, Harindra; Benz, Chris; Bernard, Brady; Carlin, Daniel; Cho, Juok; DiCara, Daniel; Frazer, Scott; Fuller, Gregory N; Gao, JianJiong; Gehlenborg, Nils; Haussler, David; Heiman, David I; Iype, Lisa; Jacobsen, Anders; Ju, Zhenlin; Katzman, Sol; Kim, Hoon; Knijnenburg, Theo; Kreisberg, Richard Bailey; Lawrence, Michael S; Lee, William; Leinonen, Kalle; Lin, Pei; Ling, Shiyun; Liu, Wenbin; Liu, Yingchun; Liu, Yuexin; Lu, Yiling; Mills, Gordon; Ng, Sam; Noble, Michael S; Paull, Evan; Rao, Arvind; Reynolds, Sheila; Saksena, Gordon; Sanborn, Zack; Sander, Chris; Schultz, Nikolaus; Senbabaoglu, Yasin; Shen, Ronglai; Shmulevich, Ilya; Sinha, Rileen; Stuart, Josh; Sumer, S Onur; Sun, Yichao; Tasman, Natalie; Taylor, Barry S; Voet, Doug; Weinhold, Nils; Weinstein, John N; Yang, Da; Yoshihara, Kosuke; Zheng, Siyuan; Zhang, Wei; Zou, Lihua; Abel, Ty; Sadeghi, Sara; Cohen, Mark L; Eschbacher, Jenny; Hattab, Eyas M; Raghunathan, Aditya; Schniederjan, Matthew J; Aziz, Dina; Barnett, Gene; Barrett, Wendi; Bigner, Darell D; Boice, Lori; 
Brewer, Cathy; Calatozzolo, Chiara; Campos, Benito; Carlotti, Carlos Gilberto; Chan, Timothy A; Cuppini, Lucia; Curley, Erin; Cuzzubbo, Stefania; Devine, Karen; DiMeco, Francesco; Duell, Rebecca; Elder, J Bradley; Fehrenbach, Ashley; Finocchiaro, Gaetano; Friedman, William; Fulop, Jordonna; Gardner, Johanna; Hermes, Beth; Herold-Mende, Christel; Jungk, Christine; Kendler, Ady; Lehman, Norman L; Lipp, Eric; Liu, Ouida; Mandt, Randy; McGraw, Mary; Mclendon, Roger; McPherson, Christopher; Neder, Luciano; Nguyen, Phuong; Noss, Ardene; Nunziata, Raffaele; Ostrom, Quinn T; Palmer, Cheryl; Perin, Alessandro; Pollo, Bianca; Potapov, Alexander; Potapova, Olga; Rathmell, W Kimryn; Rotin, Daniil; Scarpace, Lisa; Schilero, Cathy; Senecal, Kelly; Shimmel, Kristen; Shurkhay, Vsevolod; Sifri, Suzanne; Singh, Rosy; Sloan, Andrew E; Smolenski, Kathy; Staugaitis, Susan M; Steele, Ruth; Thorne, Leigh; Tirapelli, Daniela P C; Unterberg, Andreas; Vallurupalli, Mahitha; Wang, Yun; Warnick, Ronald; Williams, Felicia; Wolinsky, Yingli; Bell, Sue; Rosenberg, Mara; Stewart, Chip; Huang, Franklin; Grimsby, Jonna L; Radenbaugh, Amie J; Zhang, Jianan

    2015-06-25

    Diffuse low-grade and intermediate-grade gliomas (which together make up the lower-grade gliomas, World Health Organization grades II and III) have highly variable clinical behavior that is not adequately predicted on the basis of histologic class. Some are indolent; others quickly progress to glioblastoma. The uncertainty is compounded by interobserver variability in histologic diagnosis. Mutations in IDH, TP53, and ATRX and codeletion of chromosome arms 1p and 19q (1p/19q codeletion) have been implicated as clinically relevant markers of lower-grade gliomas. We performed genomewide analyses of 293 lower-grade gliomas from adults, incorporating exome sequence, DNA copy number, DNA methylation, messenger RNA expression, microRNA expression, and targeted protein expression. These data were integrated and tested for correlation with clinical outcomes. Unsupervised clustering of mutations and data from RNA, DNA-copy-number, and DNA-methylation platforms uncovered concordant classification of three robust, nonoverlapping, prognostically significant subtypes of lower-grade glioma that were captured more accurately by IDH, 1p/19q, and TP53 status than by histologic class. Patients who had lower-grade gliomas with an IDH mutation and 1p/19q codeletion had the most favorable clinical outcomes. Their gliomas harbored mutations in CIC, FUBP1, NOTCH1, and the TERT promoter. Nearly all lower-grade gliomas with IDH mutations and no 1p/19q codeletion had mutations in TP53 (94%) and ATRX inactivation (86%). The large majority of lower-grade gliomas without an IDH mutation had genomic aberrations and clinical behavior strikingly similar to those found in primary glioblastoma. The integration of genomewide data from multiple platforms delineated three molecular classes of lower-grade gliomas that were more concordant with IDH, 1p/19q, and TP53 status than with histologic class. Lower-grade gliomas with an IDH mutation either had 1p/19q codeletion or carried a TP53 mutation. Most

  3. Bayesian phylogeny analysis of vertebrate serpins illustrates evolutionary conservation of the intron and indels based six groups classification system from lampreys for ∼500 MY

    Directory of Open Access Journals (Sweden)

    Abhishek Kumar

    2015-06-01

    The serpin superfamily is characterized by proteins that fold into a conserved tertiary structure and exploit a sophisticated and irreversible suicide mechanism of inhibition. Vertebrate serpins are classified into six groups (V1–V6), based on three independent biological features: genomic organization, diagnostic amino acid sites and rare indels. However, this classification system was based on the limited number of mammalian genomes available. In this study, several non-mammalian genomes are used to validate this classification system using the powerful Bayesian phylogenetic method. This method supports the intron- and indel-based vertebrate classification and proves that serpins have been maintained from lampreys to humans for about 500 MY. Lampreys have fewer than 10 serpins, which expand into 36 serpins in humans. The two expanding groups V1 and V2 have SERPINB1/SERPINB6 and SERPINA8/SERPIND1 as the ancestral serpins, respectively. Large clusters of serpins are formed by local duplications of these serpins in tetrapod genomes. Interestingly, the ancestral HCII/SERPIND1 locus (nested within PIK4CA) possesses a group V4 serpin (A2APL1, homolog of α2-AP/SERPINF2) of lampreys, which suggests that group V4 might have originated from group V2. Additionally, in this study, details of the phylogenetic history and genomic characteristics of vertebrate serpins are revisited.

  4. Quantitative and qualitative proteome characteristics extracted from in-depth integrated genomics and proteomics analysis

    NARCIS (Netherlands)

    Low, T.Y.; van Heesch, S.; van den Toorn, H.; Giansanti, P.; Cristobal, A.; Toonen, P.; Schafer, S.; Hubner, N.; van Breukelen, B.; Mohammed, S.; Cuppen, E.; Heck, A.J.R.; Guryev, V.

    2013-01-01

    Quantitative and qualitative protein characteristics are regulated at genomic, transcriptomic, and posttranscriptional levels. Here, we integrated in-depth transcriptome and proteome analyses of liver tissues from two rat strains to unravel the interactions within and between these layers. We

  5. Brassica ASTRA: an integrated database for Brassica genomic research.

    Science.gov (United States)

    Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-01-01

    Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.

  6. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    NEALSON, KENNETH H.

    2013-10-15

    products of dissimilatory iron reduction. Geochim. Cosmochim. Acta. 74:574-583. 10. Karpinets, T.V., A.Y. Obraztsova, Y. Wang, D.D. Schmoyer, G.H. Kora, B.H. Park, M.H. Serres, M.F. Romine, M.L. Land, T.B. Kothe, J.K. Fredrickson, K.H. Nealson, and E.C. Uberbacher. 2010. Conserved synteny at the protein family level reveals genes underlying Shewanella species' cold tolerance and predicts their novel phenotypes. Funct. Integr. Genomics 10:97-110. (DOI 10.1007/s10143-009-0142-y) 11. Bretschger, O., A.C.M. Cheung, F. Mansfeld, and K.H. Nealson. 2010. Comparative microbial fuel cell evaluations of Shewanella spp. Electroanalysis 22:883-894. 12. McLean, J.S., G. Wanger, Y.A. Gorby, M. Wainstein, J. McQuaid, Shun'ichi Ishii, O. Bretschger, H. Beyanal, K.H. Nealson. 2010. Quantification of electron transfer rates to a solid phase electron acceptor through the stages of biofilm formation from single cells to multicellular communities. Env. Sci. Technol. 44:2721-2717. 13. El-Naggar, M., G. Wanger, K.M. Leung, T.D. Yuzvinsky, G. Southam, J. Yang, W.M. Lau, K.H. Nealson, and Y.A. Gorby. 2010. Electrical Transport Along Bacterial Nanowires from Shewanella oneidensis MR-1. Proc. Nat. Acad. Sci. USA 107:18127-18131. 14. Biffinger, J.C., L.A. Fitzgerald, R. Ray, B.J. Little, S.E. Lizewski, E.R. Petersen, B.R. Ringeisen, W.C. Sanders, P.E. Sheehan, J.J. Pietron, J.W. Baldwin, L.J. Nadeau, G.R. Johnson, M. Ribbens, S.E. Finkel, K.H. Nealson. 2010. The utility of Shewanella japonica for microbial fuel cells. Bioresource Technol. 102:290-297. 15. Rodionov, D., C. Yang, X. Li, I. Rodionova, Y. Wang, A.Y. Obraztsova, O.P. Zagnitko, R. Overbeek, M.F. Romine, S. Reed, J.K. Fredrickson, K.H. Nealson, A.L. Osterman. 2010. Genomic encyclopedia of sugar utilization pathways in the Shewanella genus. BMC Genomics 11:494. 16. Kan, J., L. Hsu, A.C.M. Cheung, M. Pirbazari, and K.H. Nealson. 2011. Current production by bacterial communities in microbial fuel cells enriched from wastewater sludge

  7. InDel polymorphisms in quantitative posttransplant chimerism evaluation

    Directory of Open Access Journals (Sweden)

    I. M. Barkhatov

    2016-01-01

    Reduction of minimal residual disease to undetectable levels is the key criterion for efficiency of allogeneic hematopoietic stem cell transplantation (alloHSCT), along with engraftment of transplanted cells with complete replacement of recipient hematopoiesis, i.e., full posttransplant chimerism. Among different approaches, molecular genetic techniques are preferable, being based on the analysis of highly polymorphic DNA sequences (short tandem repeats, STRs). However, this approach, despite its high specificity, has limited sensitivity. In this regard, it seems appropriate to introduce more sensitive diagnostic solutions, in particular, analysis of insertion/deletion (InDel) polymorphisms, followed by real-time detection of PCR products. The data obtained upon analysis of several genetic markers have shown higher sensitivity of this method. However, the deviations in the range of 10 to 90% in evaluation of the cell ratios indicate that this approach is feasible only for evaluating the residual populations of recipient cells.

  8. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    Science.gov (United States)

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B

    2013-01-01

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  9. INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

    Directory of Open Access Journals (Sweden)

    Intikhab Alam

    The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

  10. Modeling the integration of bacterial rRNA fragments into the human cancer genome.

    Science.gov (United States)

    Sieber, Karsten B; Gajer, Pawel; Dunning Hotopp, Julie C

    2016-03-21

    Cancer is a disease driven by the accumulation of genomic alterations, including the integration of exogenous DNA into the human somatic genome. We previously identified in silico evidence of DNA fragments from a Pseudomonas-like bacterium integrating into the 5'-UTR of four proto-oncogenes in stomach cancer sequencing data. The functional and biological consequences of these bacterial DNA integrations remain unknown. Modeling of these integrations suggests that the previously identified sequences cover most of the sequence flanking the junction between the bacterial and human DNA. Further examination of these reads reveals that these integrations are rich in guanine nucleotides and that the integrated bacterial DNA may have complex transcript secondary structures. The models presented here lay the foundation for future experiments to test whether bacterial DNA integrations alter the transcription of the human genes.

  11. Integrated genomics of Mucorales reveals novel therapeutic targets

    Science.gov (United States)

    Mucormycosis is a life-threatening infection caused by Mucorales fungi. We sequenced 30 fungal genomes and performed transcriptomics with three representative Rhizopus and Mucor strains with human airway epithelial cells during fungal invasion to reveal key host and fungal determinants contributing ...

  12. An Integrated Genetic and Cytogenetic Map of the Cucumber Genome

    Science.gov (United States)

    The Cucurbitaceae includes important crops such as cucumber, melon, watermelon, squash and pumpkin. However, few genetic and genomic resources are available for plant improvement. Some cucurbit species such as cucumber have a narrow genetic base, which impedes construction of saturated molecular li...

  13. Integrated genome-based studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Tiedje, James M. [Michigan State Univ., East Lansing, MI (United States)]; Konstantinidis, Kostas [Michigan State Univ., East Lansing, MI (United States)]; Worden, Mark [Michigan State Univ., East Lansing, MI (United States)]

    2014-01-08

    The aim of the work reported is to study Shewanella population genomics and to understand the evolution, ecophysiology, and speciation of Shewanella. The tasks supporting this aim are: to study the genetic and ecophysiological bases defining the core and diversification of Shewanella species; to determine gene content patterns along redox gradients; and to investigate the evolutionary processes, patterns and mechanisms of Shewanella.

  14. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma

    DEFF Research Database (Denmark)

    Sung, Wing-Kin; Zheng, Hancheng; Li, Shuyu

    2012-01-01

    To survey hepatitis B virus (HBV) integration in liver cancer genomes, we conducted massively parallel sequencing of 81 HBV-positive and 7 HBV-negative hepatocellular carcinomas (HCCs) and adjacent normal tissues. We found that HBV integration is observed more frequently in the tumors (86.4%) than...

  15. Integrated genome-based studies of Shewanella ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Segre Daniel; Beg Qasim

    2012-02-14

    This project was a component of the Shewanella Federation and, as such, contributed to the overall goal of applying genomic tools to better understand the eco-physiology and speciation of respiratory-versatile members of the Shewanella genus. Our role at Boston University was to perform bioreactor and high throughput gene expression microarrays, and combine dynamic flux balance modeling with experimentally obtained transcriptional and gene expression datasets from different growth conditions. In the first part of the project, we designed the S. oneidensis microarray probes for Affymetrix Inc. (based in California), then we identified the pathways of carbon utilization in the metal-reducing marine bacterium Shewanella oneidensis MR-1, using our newly designed high-density oligonucleotide Affymetrix microarray on Shewanella cells grown with various carbon sources. Next, using a combination of experimental and computational approaches, we built algorithms and methods to integrate the transcriptional and metabolic regulatory networks of S. oneidensis. Specifically, we combined mRNA microarray and metabolite measurements with statistical inference and dynamic flux balance analysis (dFBA) to study the transcriptional response of S. oneidensis MR-1 as it passes through exponential, stationary, and transition phases. By measuring time-dependent mRNA expression levels during batch growth of S. oneidensis MR-1 under two radically different nutrient compositions (minimal lactate and nutritionally rich LB medium), we obtained detailed snapshots of the regulatory strategies used by this bacterium to cope with gradually changing nutrient availability. In addition to traditional clustering, which provides a first indication of major regulatory trends and transcription factor activities, we developed and implemented a new computational approach for Dynamic Detection of Transcriptional Triggers (D2T2). This new method allows us to infer a putative topology of transcriptional dependencies

  16. GMATA: An Integrated Software Package for Genome-Scale SSR Mining, Marker Development and Viewing.

    Science.gov (United States)

    Wang, Xuewen; Wang, Le

    2016-01-01

    Simple sequence repeats (SSRs), also referred to as microsatellites, are highly variable tandem DNAs that are widely used as genetic markers. The increasing availability of whole-genome and transcript sequences provides information resources for SSR marker development. However, efficient software is required to identify and display SSR information along with other gene features at a genome scale. We developed a novel software package, the Genome-wide Microsatellite Analyzing Tool Package (GMATA), integrating SSR mining, statistical analysis and plotting, marker design, polymorphism screening and marker transferability, and enabling simultaneous display of SSR markers with other genome features. GMATA applies novel strategies for SSR analysis and primer design in large genomes, which allow GMATA to perform faster calculations and provide more accurate results than existing tools. Our package is also capable of processing DNA sequences of any size on a standard computer. GMATA is user friendly, requires only mouse clicks or typed inputs on the command line, and is executable on multiple computing platforms. We demonstrated the application of GMATA in plant genomes and revealed a novel distribution pattern of SSRs in 15 grass genomes. The most abundant motifs are the dimer GA/TC, the A/T monomer and the GCG/CGC trimer, rather than the rich G/C content in DNA sequence. We also revealed that SSR count is linearly related to chromosome length in fully assembled grass genomes. GMATA represents a powerful application tool that facilitates genomic sequence analyses. GMATA is freely available at http://sourceforge.net/projects/gmata/?source=navbar.
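    As an illustration of what "SSR mining" means in the abstract above, the following is a minimal sketch that scans a DNA string for perfect tandem repeats with a regular expression. It is not GMATA's algorithm, which the abstract does not describe; the function name `find_ssrs` and its thresholds (motif length 1 to 6, at least 5 copies) are assumptions chosen only for the example.

```python
import re


def find_ssrs(seq, min_motif=1, max_motif=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper for illustration; thresholds are arbitrary defaults.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A motif of length k followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" is just the monomer "A" counted twice.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # A GA dimer repeated six times, flanked by unrelated bases.
    print(find_ssrs("TTGAGAGAGAGAGACC"))
```

    Coordinates are 0-based. Only perfect repeats are reported; imperfect or compound SSRs, which dedicated tools typically also handle, are outside the scope of this sketch.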

  17. Group sparse canonical correlation analysis for genomic data integration.

    Science.gov (United States)

    Lin, Dongdong; Zhang, Jigang; Li, Jingyao; Calhoun, Vince D; Deng, Hong-Wen; Wang, Yu-Ping

    2013-08-12

    The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influences on complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. The conventional CCA method does not work effectively if the number of data samples is significantly less than that of biomarkers, which is a typical case for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalizations with the l-1 norm (CCA-l1) or the combination of l-1 and l-2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data in the analysis, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group). We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features. The CCA-sparse group method incorporates group effects of features into the correlation analysis while performing individual feature

  18. Childhood Acute Lymphoblastic Leukemia: Integrating Genomics into Therapy

    Science.gov (United States)

    Tasian, Sarah K; Loh, Mignon L; Hunger, Stephen P

    2015-01-01

    Acute lymphoblastic leukemia (ALL), the most common malignancy of childhood, is a genetically complex entity that remains a major cause of childhood cancer-related mortality. Major advances in genomic and epigenomic profiling during the past decade have appreciably enhanced knowledge of the biology of de novo and relapsed ALL and have facilitated more precise risk stratification of patients. These achievements have also provided critical insights regarding potentially targetable lesions for development of new therapeutic approaches in the era of precision medicine. This review delineates the current genetic landscape of childhood ALL with emphasis upon patient outcomes with contemporary treatment regimens, as well as therapeutic implications of newly identified genomic alterations in specific subsets of ALL. PMID:26194091

  19. Site-Specific Integration of Exogenous Genes Using Genome Editing Technologies in Zebrafish

    Directory of Open Access Journals (Sweden)

    Atsuo Kawahara

    2016-05-01

    The zebrafish (Danio rerio) is an ideal vertebrate model to investigate the developmental molecular mechanism of organogenesis and regeneration. Recent innovations in genome editing technologies, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated protein 9 (Cas9) system, have allowed researchers to generate diverse genomic modifications in whole animals and in cultured cells. The CRISPR/Cas9 and TALEN techniques frequently induce DNA double-strand breaks (DSBs) at the targeted gene, resulting in frameshift-mediated gene disruption. As a useful application of genome editing technology, several groups have recently reported efficient site-specific integration of exogenous genes into targeted genomic loci. In this review, we provide an overview of TALEN- and CRISPR/Cas9-mediated site-specific integration of exogenous genes in zebrafish.

  20. International regulatory landscape and integration of corrective genome editing into in vitro fertilization.

    Science.gov (United States)

    Araki, Motoko; Ishii, Tetsuya

    2014-11-24

    Genome editing technology, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas, has enabled far more efficient genetic engineering even in non-human primates. This biotechnology is more likely to develop into medicine for preventing a genetic disease if corrective genome editing is integrated into assisted reproductive technology, represented by in vitro fertilization. Although rapid advances in genome editing are expected to make germline gene correction feasible in a clinical setting, there are many issues that still need to be addressed before this could occur. We herein examine the current status of genome editing in mammalian embryonic stem cells and zygotes and discuss potential issues in the international regulatory landscape regarding human germline gene modification. Moreover, we address some ethical and social issues that would be raised when each country considers whether genome editing-mediated germline gene correction for preventive medicine should be permitted.

  1. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    Science.gov (United States)

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to the Escherichia coli K12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  2. Genome and gene alterations by insertions and deletions in the evolution of human and chimpanzee chromosome 22

    Directory of Open Access Journals (Sweden)

    Volfovsky Natalia

    2009-01-01

    Background Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes. Results Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After the exclusion of those in repetitive DNA, 1,429 or 23% of indels still remained. This group was characterized according to the local or genome-wide repetitive nature, size, location relative to genes, and other genomic features. We defined three major classes of these indels, using local structure analysis: (i) those indels found uniquely without additional copies of indel sequence in the surrounding (10 Kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Many of these indels (683 of 1,429) were in proximity of known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected by chance, suggesting that selection is a factor in limiting their persistence. A subset of indels from coding regions was experimentally validated and their impacts were predicted based on direct sequencing in several human populations as well as chimpanzees, bonobos, gorillas, and two subspecies of orangutans. Conclusion Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are

  3. Integration sites of Epstein-Barr virus genome on chromosomes of human lymphoblastoid cell lines

    Energy Technology Data Exchange (ETDEWEB)

    Wuu, K.D.; Chen, Y.J.; Wang-Wuu, S. [Institute of Genetics, Taipei (Taiwan, Province of China)

    1994-09-01

    Epstein-Barr virus (EBV) is the pathogen of infectious mononucleosis. The viral genome is present in more than 95% of the African cases of Burkitt lymphoma and it is usually maintained in episomal form in the tumor cells. Viral integration has been described only for Namalwa, which is a Burkitt lymphoma cell line lacking episomes. In order to examine the role of EBV in the immortalization of human B lymphocytes, we investigated whether EBV integration into the human genome is essential. If the integration does occur, we would like to know whether the integration is randomly distributed or whether the viral DNA integrates preferentially at certain sites. Fourteen in vitro immortalized human lymphoblastoid cell lines (LCLs) were examined by fluorescence in situ hybridization (FISH) with a biotinylated EBV BamHI W DNA fragment as probe. The episomal form of EBV DNA was found in all cells of these cell lines, while only about 65% of the cells had integrated viral DNA. This might suggest that integration is not a prerequisite for cell immortalization. Although all chromosomes, except Y, have been found with integrated viral genome, chromosomes 1 and 5 are the most frequent EBV DNA carriers (p<0.05). Nine chromosome bands, namely, 1p31, 1q31, 2q32, 3q13, 3q26, 5q14, 6q24, 7q31 and 12q21, are preferential targets for EBV integration (p<0.001). Eighty percent of the total 938 EBV hybridization signals were found to be in G-band-positive areas. This suggests that the mechanism of EBV integration might be different from that of the retroviruses, which specifically integrate into G-band-negative areas. Thus, we conclude that the integration of EBV into the host genome is non-random and may be related to the structure of the chromosome and to DNA sequences.

  4. Genetic Characterization and Comparative Genome Analysis of Brucella melitensis Isolates from India

    Directory of Open Access Journals (Sweden)

    Sarwar Azam

    2016-01-01

    Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats and the zoonotic transmission occurs by ingestion of unpasteurized milk products or through direct contact with fetal tissues. Brucellosis is endemic in India but no information is available on population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats from India. For more detailed genetic characterization, we carried out whole genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis.

  5. Figure 4 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Gene-list view of genomic data. The gene-list view allows users to compare data across a set of loci. The data in this figure includes copy number, mutation, and clinical data from 202 glioblastoma samples from TCGA. Adapted from Figure 7; Thorvaldsdottir H et al. 2012

  6. Figure 2 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Grouping and sorting genomic data in IGV. The IGV user interface displaying 202 glioblastoma samples from TCGA. Samples are grouped by tumor subtype (second annotation column) and data type (first annotation column) and sorted by copy number of the EGFR locus (middle column). Adapted from Figure 1; Robinson et al. 2011

  7. Figure 5 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Split-Screen View. The split-screen view is useful for exploring relationships of genomic features that are independent of chromosomal location. Color is used here to indicate mate pairs that map to different chromosomes, chromosomes 1 and 6, suggesting a translocation event. Adapted from Figure 8; Thorvaldsdottir H et al. 2012

  8. An Integrative Bioinformatics Framework for Genome-scale Multiple Level Network Reconstruction of Rice

    Directory of Open Access Journals (Sweden)

    Liu Lili

    2013-06-01

    Understanding how metabolic reactions translate the genome of an organism into its phenotype is a grand challenge in biology. Genome-wide association studies (GWAS) statistically connect genotypes to phenotypes, without any recourse to known molecular interactions, whereas a molecular mechanistic description ties gene function to phenotype through gene regulatory networks (GRNs), protein-protein interactions (PPIs) and molecular pathways. Integration of different regulatory information levels of an organism is expected to provide a good way for mapping genotypes to phenotypes. However, the lack of a curated metabolic model of rice is blocking the exploration of genome-scale multi-level network reconstruction. Here, we have merged GRN, PPI and genome-scale metabolic network (GSMN) approaches into a single framework for rice via omics regulatory information reconstruction and integration. Firstly, we reconstructed a genome-scale metabolic model, containing 4,462 function genes, 2,986 metabolites involved in 3,316 reactions, and compartmentalized into ten subcellular locations. Furthermore, 90,358 pairs of protein-protein interactions, 662,936 pairs of gene regulations and 1,763 microRNA-target interactions were integrated into the metabolic model. Finally, a database was developed for systematically storing and retrieving the genome-scale multi-level network of rice. This provides a reference for understanding the genotype-phenotype relationship of rice, and for analysis of its molecular regulatory network.

  9. A DNMT3A2-HDAC2 Complex Is Essential for Genomic Imprinting and Genome Integrity in Mouse Oocytes

    Directory of Open Access Journals (Sweden)

    Pengpeng Ma

    2015-11-01

    Maternal genomic imprints are established during oogenesis. Histone deacetylases (HDACs) 1 and 2 are required for oocyte development in mouse, but their role in genomic imprinting is unknown. We find that Hdac1:Hdac2−/− double-mutant growing oocytes exhibit global DNA hypomethylation and fail to establish imprinting marks for Igf2r, Peg3, and Srnpn. Global hypomethylation correlates with increased retrotransposon expression and double-strand DNA breaks. Nuclear-associated DNMT3A2 is reduced in double-mutant oocytes, and injecting these oocytes with Hdac2 partially restores DNMT3A2 nuclear staining. DNMT3A2 co-immunoprecipitates with HDAC2 in mouse embryonic stem cells. Partial loss of nuclear DNMT3A2 and HDAC2 occurs in Sin3a−/− oocytes, which exhibit decreased DNA methylation of imprinting control regions for Igf2r and Srnpn, but not Peg3. These results suggest seminal roles of HDAC1/2 in establishing maternal genomic imprints and maintaining genomic integrity in oocytes mediated in part through a SIN3A complex that interacts with DNMT3A2.

  10. Porcine SOX9 Gene Expression Is Influenced by an 18 bp Indel in the 5'-Untranslated Region.

    Directory of Open Access Journals (Sweden)

    Bertram Brenig

    Sex determining region Y-box 9 (SOX9) is an important regulator of sex and skeletal development and is expressed in a variety of embryonal and adult tissues. Loss or gain of function resulting from mutations within the coding region or chromosomal aberrations of the SOX9 locus leads to a plethora of detrimental phenotypes in humans and animals. One of these phenotypes is the so-called male-to-female or female-to-male sex-reversal, which has been observed in several mammals including pig, dog, cat, goat, horse, and deer. In 38,XX sex-reversal French Large White pigs, a genome-wide association study suggested SOX9 as the causal gene, although no functional mutations were identified in affected animals. However, among other variants, an 18 bp indel had been detected in the 5'-untranslated region of the SOX9 gene by comparing affected animals and controls. We have identified the same indel (Δ18) between position +247 bp and +266 bp downstream of the transcription start site of the porcine SOX9 gene in four other pig breeds, i.e., German Large White, Laiwu Black, Bamei, and Erhualian. These animals have been genotyped in an attempt to identify candidate genes for porcine inguinal and/or scrotal hernia. Because the 18 bp segment in the wild type 5'-UTR harbours a highly conserved cAMP-response element (CRE) half-site, we analysed its role in SOX9 expression in vitro. Competition and immunodepletion electromobility shift assays demonstrate that the CRE half-site is specifically recognized by CREB. Both binding of CREB to the wild type as well as the absence of the CRE half-site in Δ18 reduced expression efficiency in HEK293T, PK-15, and ATDC5 cells significantly. Transfection experiments of wild type and Δ18 SOX9 promoter luciferase constructs show a significant reduction of RNA and protein levels depending on the presence or absence of the 18 bp segment. Hence, the data presented here demonstrate that the 18 bp indel in the porcine SOX9 5'-UTR is of functional

  11. Roles of Werner syndrome protein in protection of genome integrity

    DEFF Research Database (Denmark)

    Rossi, Marie L; Ghosh, Avik K; Bohr, Vilhelm A

    2010-01-01

    Werner syndrome protein (WRN) is one of a family of five human RecQ helicases implicated in the maintenance of genome stability. The conserved RecQ family also includes RecQ1, Bloom syndrome protein (BLM), RecQ4, and RecQ5 in humans, as well as Sgs1 in Saccharomyces cerevisiae, Rqh1...... in Schizosaccharomyces pombe, and homologs in Caenorhabditis elegans, Xenopus laevis, and Drosophila melanogaster. Defects in three of the RecQ helicases, RecQ4, BLM, and WRN, cause human pathologies linked with cancer predisposition and premature aging. Mutations in the WRN gene are the causative factor of Werner...

  12. The Plant Genome Integrative Explorer Resource: PlantGenIE.org.

    Science.gov (United States)

    Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R

    2015-12-01

    Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  13. An integrated CRISPR Bombyx mori genome editing system with improved efficiency and expanded target sites.

    Science.gov (United States)

    Ma, Sanyuan; Liu, Yue; Liu, Yuanyuan; Chang, Jiasong; Zhang, Tong; Wang, Xiaogang; Shi, Run; Lu, Wei; Xia, Xiaojuan; Zhao, Ping; Xia, Qingyou

    2017-04-01

    Genome editing has enabled unprecedented new opportunities for targeted genomic engineering of a wide variety of organisms, ranging from microbes, plants and animals to human embryos. The successive establishment and rapid application of genome editing tools have significantly accelerated Bombyx mori (B. mori) research in recent years. However, the only CRISPR system in B. mori was the commonly used SpCas9, which only recognizes target sites containing an NGG PAM sequence. In the present study, we first improved the efficiency of our previously established SpCas9 system 3.5-fold. The improved efficiency was also observed at several loci in both BmNs cells and B. mori embryos. Then, to expand the target sites, we showed that two newly discovered CRISPR systems, SaCas9 and AsCpf1, could also induce highly efficient site-specific genome editing in BmNs cells, and constructed an integrated CRISPR system. Genome-wide analysis of targetable sites was further conducted and showed that the integrated system covers 69,144,399 sites in the B. mori genome, or on average one site every 6.5 bp. The efficiency and resolution of this CRISPR platform will probably accelerate both fundamental research and applied studies in B. mori, and perhaps other insects. Copyright © 2017 Elsevier Ltd. All rights reserved.
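
    The genome-wide count of targetable sites described in this abstract can be illustrated with a small sketch. It scans both strands of a DNA string for PAM motifs and reports the mean spacing between sites. The NGG motif for SpCas9 comes from the abstract; the NNGRRT (SaCas9) and TTTN (AsCpf1) motifs, the function names and the toy sequence are assumptions made for illustration and are not taken from the study's pipeline. The sketch counts PAM occurrences only and does not check that a full protospacer fits next to each PAM.

```python
import re

# IUPAC degenerate bases mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "N": "[ACGT]"}

# PAM motifs. NGG (SpCas9) is stated in the abstract; the other two
# are assumed here for illustration only.
PAMS = {"SpCas9": "NGG", "SaCas9": "NNGRRT", "AsCpf1": "TTTN"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_pam_sites(seq, pam):
    """Count (possibly overlapping) PAM matches on both strands."""
    # A lookahead makes the regex report overlapping matches.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in pam) + ")")
    seq = seq.upper()
    forward = len(pattern.findall(seq))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse


def pam_density(seq):
    """Return per-nuclease site counts and the mean spacing in bp per site."""
    counts = {name: count_pam_sites(seq, pam) for name, pam in PAMS.items()}
    total = sum(counts.values())
    spacing = len(seq) / total if total else float("inf")
    return counts, spacing


if __name__ == "__main__":
    toy = "ATGGCCTTTAGGAGTCCGGAATTTC"  # hypothetical sequence
    counts, spacing = pam_density(toy)
    print(counts, round(spacing, 2))
```

    Summing the counts across nucleases, as done here, is the simplest way to obtain a "one site every N bp" figure; a real analysis would work from a genome FASTA file and would need to decide how to treat positions targetable by more than one nuclease.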

  14. Human papillomavirus genome integration in squamous carcinogenesis: what have next-generation sequencing studies taught us?

    Science.gov (United States)

    Groves, Ian J; Coleman, Nicholas

    2018-05-01

    Human papillomavirus (HPV) infection is associated with ∼5% of all human cancers, including a range of squamous cell carcinomas. Persistent infection by high-risk HPVs (HRHPVs) is associated with the integration of virus genomes (which are usually stably maintained as extrachromosomal episomes) into host chromosomes. Although HRHPV integration rates differ across human sites of infection, this process appears to be an important event in HPV-associated neoplastic progression, leading to deregulation of virus oncogene expression, host gene expression modulation, and further genomic instability. However, the mechanisms by which HRHPV integration occurs and by which the subsequent gene expression changes take place are incompletely understood. The advent of next-generation sequencing (NGS) of both RNA and DNA has allowed powerful interrogation of the association of HRHPVs with human disease, including precise determination of the sites of integration and the genomic rearrangements at integration loci. In turn, these data have indicated that integration occurs through two main mechanisms: looping integration and direct insertion. Improved understanding of integration sites is allowing further investigation of the factors that provide a competitive advantage to some integrants during disease progression. Furthermore, advanced approaches to the generation of genome-wide samples have given novel insights into the three-dimensional interactions within the nucleus, which could act as another layer of epigenetic control of both virus and host transcription. It is hoped that further advances in NGS techniques and analysis will not only allow the examination of further unanswered questions regarding HPV infection, but also direct new approaches to treating HPV-associated human disease. Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

  15. PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.

    Science.gov (United States)

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X

    2017-01-01

    Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.

  16. Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.

    Science.gov (United States)

    Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup

    2011-09-01

    The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.

  17. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  18. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species.

    Science.gov (United States)

    Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  19. Prolonged Integration Site Selection of a Lentiviral Vector in the Genome of Human Keratinocytes.

    Science.gov (United States)

    Qian, Wei; Wang, Yong; Li, Rui-Fu; Zhou, Xin; Liu, Jing; Peng, Dai-Zhi

    2017-03-03

    BACKGROUND Lentiviral vectors have been successfully used for human skin cell gene transfer studies. Defining the selection of integration sites for retroviral vectors in the host genome is crucial in risk assessment analysis of gene therapy. However, genome-wide analyses of lentiviral integration sites in human keratinocytes, especially after prolonged growth, are poorly understood. MATERIAL AND METHODS In this study, 874 unique lentiviral vector integration sites in human HaCaT keratinocytes after long-term culture were identified and analyzed with the online tool GTSG-QuickMap and SPSS software. RESULTS The data indicated that lentiviral vectors showed integration site preferences for genes and gene-rich regions. CONCLUSIONS This study will likely assist in determining the relative risks of the lentiviral vector system and in the design of a safe lentiviral vector system in the gene therapy of skin diseases.

  20. Improved bacteriophage genome data is necessary for integrating viral and bacterial ecology.

    Science.gov (United States)

    Bibby, Kyle

    2014-02-01

    The recent rise in "omics"-enabled approaches has led to improved understanding in many areas of microbial ecology. However, despite the importance of viruses in a broad microbial ecology context, viral ecology remains largely unintegrated into high-throughput microbial ecology studies. A fundamental hindrance to the integration of viral ecology into omics-enabled microbial ecology studies is the lack of suitable reference bacteriophage genomes in reference databases: currently, only 0.001% of bacteriophage diversity is represented in genome sequence databases. This commentary serves to highlight this issue and to promote bacteriophage genome sequencing as a valuable scientific undertaking to both better understand bacteriophage diversity and move towards a more holistic view of microbial ecology.

  1. Analysis of the indel at the ARMS2 3′UTR in age-related macular degeneration

    Science.gov (United States)

    Wang, Gaofeng; Spencer, Kylee L.; Scott, William K.; Whitehead, Patrice; Court, Brenda L.; Ayala-Haedo, Juan; Mayo, Ping; Schwartz, Stephen G.; Kovach, Jaclyn L.; Gallins, Paul; Polk, Monica; Agarwal, Anita; Postel, Eric A.; Haines, Jonathan L.; Pericak-Vance, Margaret A.

    2010-01-01

    Controversy remains as to which gene at the chromosome 10q26 locus confers risk for age-related macular degeneration (AMD), and statistical genetic analysis is confounded by the strong linkage disequilibrium (LD) across the region. Functional analysis of related genetic variations could solve this puzzle. Recently Fritsche et al. reported that AMD is associated with unstable ARMS2 transcripts possibly caused by a complex insertion/deletion (indel; consisting of a 443 bp deletion and an adjacent 54 bp insertion) in its 3′UTR (untranslated region). To validate this indel, we sequenced our samples. We found that this indel is even more complex and is composed of two side-by-side indels separated by 17 bp: (1) a 9 bp deletion with a 10 bp insertion; (2) a 417 bp deletion with a 27 bp insertion. The indel is significantly associated with the risk of AMD, but is also in strong LD with the non-synonymous single nucleotide polymorphism (SNP) rs10490924 (A69S). We also found that ARMS2 is expressed not only in placenta and retina but also in multiple human tissues. Using quantitative PCR, we found no correlation between the indel and ARMS2 mRNA level in human retina and blood samples. The lack of functional effects of the 3′UTR indel, the amino acid substitution of rs10490924 (A69S) and the strong LD between them suggest that A69S, not the indel, is the variant that confers risk of AMD. To our knowledge, this is the first time that ARMS2 has been shown to be widely expressed in human tissues. In conclusion, the indel at the 3′UTR of ARMS2 actually contains two side-by-side indels. The indels are associated with risk of AMD, but not correlated with ARMS2 mRNA level. PMID:20182747
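
    The relationship between the two descriptions of this indel can be checked with simple arithmetic. The sketch below assumes that the 17 bp separating the two indels is unchanged sequence that the earlier single-event description counted on both the deleted and the inserted side. That interpretation, and all names in the code, are assumptions made for illustration rather than statements from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """A deletion of `deleted` bp replaced by an insertion of `inserted` bp."""
    deleted: int
    inserted: int

    @property
    def net_change(self):
        # Negative values mean the allele is shorter than the reference.
        return self.inserted - self.deleted


def merge_adjacent(first, second, spacer):
    """Describe two indels separated by `spacer` unchanged bp as one event.

    The spacer is counted on both sides (assumed interpretation), so the
    net length change is the same as for the two separate indels.
    """
    return Indel(deleted=first.deleted + spacer + second.deleted,
                 inserted=first.inserted + spacer + second.inserted)


# Sizes quoted in the abstract.
indel_1 = Indel(deleted=9, inserted=10)
indel_2 = Indel(deleted=417, inserted=27)
merged = merge_adjacent(indel_1, indel_2, spacer=17)

if __name__ == "__main__":
    print(merged, merged.net_change)
```

    Under this assumption the merged event is a 443 bp deletion with a 54 bp insertion (9 + 17 + 417 and 10 + 17 + 27), which matches the sizes reported by Fritsche et al., with a net loss of 389 bp in either description.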

  2. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models

    DEFF Research Database (Denmark)

    King, Zachary A.; Lu, Justin; Dräger, Andreas

    2016-01-01

    Genome-scale metabolic models are mathematically-structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repo...

  3. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer

    NARCIS (Netherlands)

    Peifer, Martin; Fernandez-Cuesta, Lynnette; Sos, Martin L.; George, Julie; Seidel, Danila; Kasper, Lawryn H.; Plenker, Dennis; Leenders, Frauke; Sun, Ruping; Zander, Thomas; Menon, Roopika; Koker, Mirjam; Dahmen, Ilona; Mueller, Christian; Di Cerbo, Vincenzo; Schildhaus, Hans-Ulrich; Altmueller, Janine; Baessmann, Ingelore; Becker, Christian; de Wilde, Bram; Vandesompele, Jo; Boehm, Diana; Ansen, Sascha; Gabler, Franziska; Wilkening, Ines; Heynck, Stefanie; Heuckmann, Johannes M.; Lu, Xin; Carter, Scott L.; Cibulskis, Kristian; Banerji, Shantanu; Getz, Gad; Park, Kwon-Sik; Rauh, Daniel; Gruetter, Christian; Fischer, Matthias; Pasqualucci, Laura; Wright, Gavin; Wainer, Zoe; Russell, Prudence; Petersen, Iver; Chen, Yuan; Stoelben, Erich; Ludwig, Corinna; Schnabel, Philipp; Hoffmann, Hans; Muley, Thomas; Brockmann, Michael; Engel-Riedel, Walburga; Muscarella, Lucia A.; Fazio, Vito M.; Groen, Harry; Timens, Wim; Sietsma, Hannie; Thunnissen, Erik; Smit, Egbert; Heideman, Danielle A. M.; Snijders, Peter J. F.; Cappuzzo, Federico; Ligorio, Claudia; Damiani, Stefania; Field, John; Solberg, Steinar; Brustugun, Odd Terje; Lund-Iversen, Marius; Saenger, Joerg; Clement, Joachim H.; Soltermann, Alex; Moch, Holger; Weder, Walter; Solomon, Benjamin; Soria, Jean-Charles; Validire, Pierre; Besse, Benjamin; Brambilla, Elisabeth; Brambilla, Christian; Lantuejoul, Sylvie; Lorimier, Philippe; Schneider, Peter M.; Hallek, Michael; Pao, William; Meyerson, Matthew; Sage, Julien; Shendure, Jay; Schneider, Robert; Buettner, Reinhard; Wolf, Juergen; Nuernberg, Peter; Perner, Sven; Heukamp, Lukas C.; Brindle, Paul K.; Haas, Stefan; Thomas, Roman K.

    2012-01-01

    Small-cell lung cancer (SCLC) is an aggressive lung tumor subtype with poor prognosis (1-3). We sequenced 29 SCLC exomes, 2 genomes and 15 transcriptomes and found an extremely high mutation rate of 7.4 ± 1 protein-changing mutations per million base pairs. Therefore, we conducted integrated

  4. Filling the knowledge gap: Integrating quantitative genetics and genomics in graduate education and outreach

    Science.gov (United States)

    The genomics revolution provides vital tools to address global food security. Yet to be incorporated into livestock breeding, molecular techniques need to be integrated into a quantitative genetics framework. Within the U.S., with shrinking faculty numbers with the requisite skills, the capacity to ...

  5. Quantitative and Qualitative Proteome Characteristics Extracted from In-Depth Integrated Genomics and Proteomics Analysis

    NARCIS (Netherlands)

    Low, Teck Yew; van Heesch, Sebastiaan; van den Toorn, Henk; Giansanti, Piero; Cristobal, Alba; Toonen, Pim; Schafer, Sebastian; Huebner, Norbert; van Breukelen, Bas; Mohammed, Shabaz; Cuppen, Edwin; Heck, Albert J. R.; Guryev, Victor

    2013-01-01

    Quantitative and qualitative protein characteristics are regulated at genomic, transcriptomic, and post-transcriptional levels. Here, we integrated in-depth transcriptome and proteome analyses of liver tissues from two rat strains to unravel the interactions within and between these layers. We

  6. Integrative Genomic Analysis of Cholangiocarcinoma Identifies Distinct IDH-Mutant Molecular Profiles

    DEFF Research Database (Denmark)

    Farshidfar, Farshad; Zheng, Siyuan; Gingras, Marie-Claude

    2017-01-01

    Cholangiocarcinoma (CCA) is an aggressive malignancy of the bile ducts, with poor prognosis and limited treatment options. Here, we describe the integrated analysis of somatic mutations, RNA expression, copy number, and DNA methylation by The Cancer Genome Atlas of a set of predominantly intrahep...

  7. Nucleotide excision repair : a multi-step mechanism required to maintain genome integrity

    NARCIS (Netherlands)

    Moser, Jill

    2010-01-01

    DNA is continuously exposed to exogenous and genotoxic insults including ionizing and ultraviolet radiation as well as chemical agents. DNA damage can compromise the integrity of the genome and have potentially deleterious effects. Ultraviolet light (UV) can induce the formation of helix distorting

  8. Bos taurus strain:dairy beef (cattle): 1000 Bull Genomes Run 2, Bovine Whole Genome Sequence

    NARCIS (Netherlands)

    Bouwman, A.C.; Daetwyler, H.D.; Chamberlain, Amanda J.; Ponce, Carla Hurtado; Sargolzaei, Mehdi; Schenkel, Flavio S.; Sahana, Goutam; Govignon-Gion, Armelle; Boitard, Simon; Dolezal, Marlies; Pausch, Hubert; Brøndum, Rasmus F.; Bowman, Phil J.; Thomsen, Bo; Guldbrandtsen, Bernt; Lund, Mogens S.; Servin, Bertrand; Garrick, Dorian J.; Reecy, James M.; Vilkki, Johanna; Bagnato, Alessandro; Wang, Min; Hoff, Jesse L.; Schnabel, Robert D.; Taylor, Jeremy F.; Vinkhuyzen, Anna A.E.; Panitz, Frank; Bendixen, Christian; Holm, Lars-Erik; Gredler, Birgit; Hozé, Chris; Boussaha, Mekki; Sanchez, Marie Pierre; Rocha, Dominique; Capitan, Aurelien; Tribout, Thierry; Barbat, Anne; Croiseau, Pascal; Drögemüller, Cord; Jagannathan, Vidhya; Vander Jagt, Christy; Crowley, John J.; Bieber, Anna; Purfield, Deirdre C.; Berry, Donagh P.; Emmerling, Reiner; Götz, Kay Uwe; Frischknecht, Mirjam; Russ, Ingolf; Sölkner, Johann; Tassell, van Curtis P.; Fries, Ruedi; Stothard, Paul; Veerkamp, R.F.; Boichard, Didier; Goddard, Mike E.; Hayes, Ben J.

    2014-01-01

    Whole genome sequence data (BAM format) of 234 bovine individuals aligned to UMD3.1. The aim of the study was to identify genetic variants (SNPs and indels) for downstream analysis such as imputation, GWAS, and detection of lethal recessives. Additional sequences for later 1000 bull genomes runs can

  9. Dynamics of Indel Profiles Induced by Various CRISPR/Cas9 Delivery Methods

    DEFF Research Database (Denmark)

    Kosicki, Michael; Rajan, Sandeep S; Lorenzetti, Flaminia C

    2017-01-01

    The introduction of CRISPR/Cas9 gene editing in mammalian cells is a scientific breakthrough, which has greatly affected basic research and gene therapy. The simplicity and general access to CRISPR/Cas9 reagents has in an unprecedented manner "democratized" gene targeting in biomedical research...... approach. In this study we review the most commonly used indel detection methods and using a robust, sensitive, and cost efficient Indel Detection by Amplicon Analysis method, we have investigated the impact of the most commonly used CRISPR/Cas9 delivery formats, including lentivirus transduction, plasmid...

  10. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

    Science.gov (United States)

    Belyi, Vladimir A; Levine, Arnold J; Skalka, Anna Marie

    2010-07-29

    Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important biological

  11. Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes.

    Directory of Open Access Journals (Sweden)

    Vladimir A Belyi

    2010-07-01

    Vertebrate genomes contain numerous copies of retroviral sequences, acquired over the course of evolution. Until recently they were thought to be the only type of RNA viruses to be so represented, because integration of a DNA copy of their genome is required for their replication. In this study, an extensive sequence comparison was conducted in which 5,666 viral genes from all known non-retroviral families with single-stranded RNA genomes were matched against the germline genomes of 48 vertebrate species, to determine if such viruses could also contribute to the vertebrate genetic heritage. In 19 of the tested vertebrate species, we discovered as many as 80 high-confidence examples of genomic DNA sequences that appear to be derived, as long ago as 40 million years, from ancestral members of 4 currently circulating virus families with single strand RNA genomes. Surprisingly, almost all of the sequences are related to only two families in the Order Mononegavirales: the Bornaviruses and the Filoviruses, which cause lethal neurological disease and hemorrhagic fevers, respectively. Based on signature landmarks some, and perhaps all, of the endogenous virus-like DNA sequences appear to be LINE element-facilitated integrations derived from viral mRNAs. The integrations represent genes that encode viral nucleocapsid, RNA-dependent-RNA-polymerase, matrix and, possibly, glycoproteins. Integrations are generally limited to one or very few copies of a related viral gene per species, suggesting that once the initial germline integration was obtained (or selected), later integrations failed or provided little advantage to the host. The conservation of relatively long open reading frames for several of the endogenous sequences, the virus-like protein regions represented, and a potential correlation between their presence and a species' resistance to the diseases caused by these pathogens, are consistent with the notion that their products provide some important

  12. Neutral genomic microevolution of a recently emerged pathogen, Salmonella enterica serovar Agona.

    Directory of Open Access Journals (Sweden)

    Zhemin Zhou

    2013-04-01

    Salmonella enterica serovar Agona has caused multiple food-borne outbreaks of gastroenteritis since it was first isolated in 1952. We analyzed the genomes of 73 isolates from global sources, comparing five distinct outbreaks with sporadic infections as well as food contamination and the environment. Agona consists of three lineages with minimal mutational diversity: only 846 single nucleotide polymorphisms (SNPs) have accumulated in the non-repetitive, core genome since Agona evolved in 1932 and subsequently underwent a major population expansion in the 1960s. Homologous recombination with other serovars of S. enterica imported 42 recombinational tracts (360 kb) in 5/143 nodes within the genealogy, which resulted in 3,164 additional SNPs. In contrast to this paucity of genetic diversity, Agona is highly diverse according to pulsed-field gel electrophoresis (PFGE), which is used to assign isolates to outbreaks. PFGE diversity reflects a highly dynamic accessory genome associated with the gain or loss (indels) of 51 bacteriophages, 10 plasmids, and 6 integrative conjugational elements (ICE/IMEs), but did not correlate uniquely with outbreaks. Unlike the core genome, indels occurred repeatedly in independent nodes (homoplasies), resulting in inaccurate PFGE genealogies. The accessory genome contained only few cargo genes relevant to infection, other than antibiotic resistance. Thus, most of the genetic diversity within this recently emerged pathogen reflects changes in the accessory genome, or is due to recombination, but these changes seemed to reflect neutral processes rather than Darwinian selection. Each outbreak was caused by an independent clade, without universal, outbreak-associated genomic features, and none of the variable genes in the pan-genome seemed to be associated with an ability to cause outbreaks.

  13. Evolution of endogenous non-retroviral genes integrated into plant genomes

    Directory of Open Access Journals (Sweden)

    Hyosub Chu

    2014-08-01

    Numerous comparative genome analyses have revealed the wide extent of horizontal gene transfer (HGT) in living organisms, which contributes to their evolution and genetic diversity. Viruses play important roles in HGT. Endogenous viral elements (EVEs) are defined as viral DNA sequences present within the genomes of non-viral organisms. In eukaryotic cells, the majority of EVEs are derived from RNA viruses using reverse transcription. In contrast, endogenous non-retroviral elements (ENREs) are poorly studied. However, the increasing availability of genomic data and the rapid development of bioinformatics tools have enabled the identification of several ENREs in various eukaryotic organisms. To date, a small number of ENREs integrated into plant genomes have been identified. Of the known non-retroviruses, most identified ENREs are derived from double-strand (ds) RNA viruses, followed by single-strand (ss) DNA and ssRNA viruses. At least eight virus families have been identified. Of these, viruses in the family Partitiviridae are dominant, followed by viruses of the families Chrysoviridae and Geminiviridae. The identified ENREs have been primarily identified in eudicots, followed by monocots. In this review, we briefly discuss the current view on non-retroviral sequences integrated into plant genomes that are associated with plant-virus evolution and their possible roles in antiviral resistance.

  14. TIGER: Toolbox for integrating genome-scale metabolic models, expression data, and transcriptional regulatory networks

    Directory of Open Access Journals (Sweden)

    Jensen Paul A

    2011-09-01

    Background: Several methods have been developed for analyzing genome-scale models of metabolism and transcriptional regulation. Many of these methods, such as Flux Balance Analysis, use constrained optimization to predict relationships between metabolic flux and the genes that encode and regulate enzyme activity. Recently, mixed integer programming has been used to encode these gene-protein-reaction (GPR) relationships into a single optimization problem, but these techniques are often of limited generality and lack a tool for automating the conversion of rules to a coupled regulatory/metabolic model. Results: We present TIGER, a Toolbox for Integrating Genome-scale Metabolism, Expression, and Regulation. TIGER converts a series of generalized, Boolean or multilevel rules into a set of mixed integer inequalities. The package also includes implementations of existing algorithms to integrate high-throughput expression data with genome-scale models of metabolism and transcriptional regulation. We demonstrate how TIGER automates the coupling of a genome-scale metabolic model with GPR logic and models of transcriptional regulation, thereby serving as a platform for algorithm development and large-scale metabolic analysis. Additionally, we demonstrate how TIGER's algorithms can be used to identify inconsistencies and improve existing models of transcriptional regulation with examples from the reconstructed transcriptional regulatory network of Saccharomyces cerevisiae. Conclusion: The TIGER package provides a consistent platform for algorithm development and extending existing genome-scale metabolic models with regulatory networks and high-throughput data.
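
    As a rough illustration of what converting Boolean rules into mixed integer inequalities can look like, the sketch below uses the textbook linearizations of AND and OR over binary variables and checks them by enumeration. It is not TIGER's own code or API; the function names and the example rule "(geneA AND geneB) OR geneC" are assumptions made up for this example.

```python
from itertools import product


def and_constraints(x, y, z):
    """Inequalities satisfied by binary x, y, z exactly when z == (x AND y)."""
    return z <= x and z <= y and z >= x + y - 1


def or_constraints(x, y, z):
    """Inequalities satisfied by binary x, y, z exactly when z == (x OR y)."""
    return z >= x and z >= y and z <= x + y


def gpr_feasible(gene_a, gene_b, gene_c):
    """Reaction-indicator values allowed by the inequality system.

    Hypothetical rule: reaction = (geneA AND geneB) OR geneC.
    An auxiliary binary variable t stands for (geneA AND geneB).
    """
    allowed = set()
    for t, reaction in product((0, 1), repeat=2):
        if and_constraints(gene_a, gene_b, t) and or_constraints(t, gene_c, reaction):
            allowed.add(reaction)
    return sorted(allowed)


# Enumerate every gene state and record which reaction values remain feasible.
truth_table = {
    genes: gpr_feasible(*genes) for genes in product((0, 1), repeat=3)
}

for genes, allowed in truth_table.items():
    print(genes, "->", allowed)
```

    In a real model these inequalities would be handed to a mixed integer solver together with the flux constraints; the enumeration here only confirms that the inequalities admit a single reaction state for each gene state.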

  15. The nucleolus—guardian of cellular homeostasis and genome integrity.

    Science.gov (United States)

    Grummt, Ingrid

    2013-12-01

    All organisms sense and respond to conditions that stress their homeostasis by downregulating the synthesis of rRNA and ribosome biogenesis, thus designating the nucleolus as the central hub in coordinating the cellular stress response. One of the most intriguing roles of the nucleolus, long regarded as a mere ribosome-producing factory, is its participation in monitoring cellular stress signals and transmitting them to the RNA polymerase I (Pol I) transcription machinery. As rRNA synthesis is a most energy-consuming process, switching off transcription of rRNA genes is an effective way of saving the energy required to maintain cellular homeostasis during acute stress. The Pol I transcription machinery is the key convergence point that collects and integrates a vast array of information from cellular signaling cascades to regulate ribosome production which, in turn, guides cell growth and proliferation. This review focuses on the mechanisms that link cell physiology to rDNA silencing, a prerequisite for nucleolar integrity and cell survival.

  16. Typing of 30 insertion/deletions in Danes using the first commercial indel kit-Mentype(®) DIPplex

    DEFF Research Database (Denmark)

    Friis, Susanne Lunøe; Børsting, Claus; Rockenbauer, Eszter

    2012-01-01

    and all amplicon lengths were shorter than 160bp. Full indel profiles were generated from as little as 100pg of DNA. A total of 117 individuals from Danish paternity cases were successfully typed. No deviation from Hardy-Weinberg equilibrium was observed for any of the indels. The combined mean match...
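
    The Hardy-Weinberg check mentioned above can be sketched for a single biallelic indel with a chi-square goodness-of-fit test. This is a generic textbook version, not the procedure used in the study; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts for a biallelic indel:
    insertion homozygotes, heterozygotes, deletion homozygotes.
    Returns (chi2, p_value).
    """
    n = n_ins_ins + n_ins_del + n_del_del
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)  # insertion allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ins_ins, n_ins_del, n_del_del)
    # Skip cells with zero expectation (monomorphic locus) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, not data from the study.
chi2, p_value = hwe_chi_square(30, 57, 30)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```

    With many loci tested at once, a multiple-testing correction would normally be applied to the per-locus p-values.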

  17. An integrated clinical and genomic information system for cancer precision medicine.

    Science.gov (United States)

    Jang, Yeongjun; Choi, Taekjin; Kim, Jongho; Park, Jisub; Seo, Jihae; Kim, Sangok; Kwon, Yeajee; Lee, Seungjae; Lee, Sanghyuk

    2018-04-20

    Increasing affordability of next-generation sequencing (NGS) has created an opportunity for realizing genomically-informed personalized cancer therapy as a path to precision oncology. However, the complex nature of genomic information presents a huge challenge for clinicians in interpreting the patient's genomic alterations and selecting the optimum approved or investigational therapy. An elaborate and practical information system is urgently needed to support clinical decision as well as to test clinical hypotheses quickly. Here, we present an integrated clinical and genomic information system (CGIS) based on NGS data analyses. Major components include modules for handling clinical data, NGS data processing, variant annotation and prioritization, drug-target-pathway analysis, and population cohort explorer. We built a comprehensive knowledgebase of genes, variants, drugs by collecting annotated information from public and in-house resources. Structured reports for molecular pathology are generated using standardized terminology in order to help clinicians interpret genomic variants and utilize them for targeted cancer therapy. We also implemented many features useful for testing hypotheses to develop prognostic markers from mutation and gene expression data. Our CGIS software is an attempt to provide useful information for both clinicians and scientists who want to explore genomic information for precision oncology.

  18. Integrated genomics of ovarian xenograft tumor progression and chemotherapy response

    International Nuclear Information System (INIS)

    Stuckey, Ashley; Brodsky, Alexander S; Fischer, Andrew; Miller, Daniel H; Hillenmeyer, Sara; Kim, Kyu K; Ritz, Anna; Singh, Rakesh K; Raphael, Benjamin J; Brard, Laurent

    2011-01-01

    Ovarian cancer is the most deadly gynecological cancer with a very poor prognosis. Xenograft mouse models have proven to be one very useful tool in testing candidate therapeutic agents and gene function in vivo. In this study we identify genes and gene networks important for the efficacy of a pre-clinical anti-tumor therapeutic, MT19c. In order to understand how ovarian xenograft tumors may be growing and responding to anti-tumor therapeutics, we used genome-wide mRNA expression and DNA copy number measurements to identify key genes and pathways that may be critical for SKOV-3 xenograft tumor progression. We compared SKOV-3 xenografts treated with the ergocalciferol derived, MT19c, to untreated tumors collected at multiple time points. Cell viability assays were used to test the function of the PPARγ agonist, Rosiglitazone, on SKOV-3 cell growth. These data indicate that a number of known survival and growth pathways including Notch signaling and general apoptosis factors are differentially expressed in treated vs. untreated xenografts. As tumors grow, cell cycle and DNA replication genes show increased expression, consistent with faster growth. The steroid nuclear receptor, PPARγ, was significantly up-regulated in MT19c treated xenografts. Surprisingly, stimulation of PPARγ with Rosiglitazone reduced the efficacy of MT19c and cisplatin suggesting that PPARγ is regulating a survival pathway in SKOV-3 cells. To identify which genes may be important for tumor growth and treatment response, we observed that MT19c down-regulates some high copy number genes and stimulates expression of some low copy number genes suggesting that these genes are particularly important for SKOV-3 xenograft growth and survival. We have characterized the time dependent responses of ovarian xenograft tumors to the vitamin D analog, MT19c. Our results suggest that PPARγ promotes survival for some ovarian tumor cells. We propose that a combination of regulated expression and copy number

  19. Integration of expression data in genome-scale metabolic network reconstructions

    Directory of Open Access Journals (Sweden)

    Anna S. Blazier

    2012-08-01

    With the advent of high-throughput technologies, the field of systems biology has amassed an abundance of omics data, quantifying thousands of cellular components across a variety of scales, ranging from mRNA transcript levels to metabolite quantities. Methods are needed to not only integrate this omics data but to also use this data to heighten the predictive capabilities of computational models. Several recent studies have successfully demonstrated how flux balance analysis (FBA), a constraint-based modeling approach, can be used to integrate transcriptomic data into genome-scale metabolic network reconstructions to generate predictive computational models. In this review, we summarize such FBA-based methods for integrating expression data into genome-scale metabolic network reconstructions, highlighting their advantages as well as their limitations.

  20. Identifying candidate driver genes by integrative ovarian cancer genomics data

    Science.gov (United States)

    Lu, Xinguo; Lu, Jibo

    2017-08-01

    Integrative analysis of molecular mechanics underlying cancer can distinguish interactions that cannot be revealed based on one kind of data for the appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, copy number variations (CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain the candidate driver genes for resistant and sensitive tumor from the heterogeneous data. The final list of modulators identified are well known in biological processes associated with ovarian cancer, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1, MED1 and so on, which can help to facilitate the discovery of biomarkers, molecular diagnostics, and drug discovery.

  1. Genome-wide analysis reveals the extent of EAV-HP integration in domestic chicken.

    Science.gov (United States)

    Wragg, David; Mason, Andrew S; Yu, Le; Kuo, Richard; Lawal, Raman A; Desta, Takele Taye; Mwacharo, Joram M; Cho, Chang-Yeon; Kemp, Steve; Burt, David W; Hanotte, Olivier

    2015-10-14

    EAV-HP is an ancient retrovirus pre-dating Gallus speciation, which continues to circulate in modern chicken populations, and led to the emergence of avian leukosis virus subgroup J causing significant economic losses to the poultry industry. We mapped EAV-HP integration sites in Ethiopian village chickens, a Silkie, Taiwan Country chicken, red junglefowl Gallus gallus and several inbred experimental lines using whole-genome sequence data. An average of 75.22 ± 9.52 integration sites per bird were identified, which collectively group into 279 intervals of which 5 % are common to 90 % of the genomes analysed and are suggestive of pre-domestication integration events. More than a third of intervals are specific to individual genomes, supporting active circulation of EAV-HP in modern chickens. Interval density is correlated with chromosome length (P < 2.31(-6)), and 27 % of intervals are located within 5 kb of a transcript. Functional annotation clustering of genes reveals enrichment for immune-related functions (P < 0.05). Our results illustrate a non-random distribution of EAV-HP in the genome, emphasising the importance it may have played in the adaptation of the species, and provide a platform from which to extend investigations on the co-evolutionary significance of endogenous retroviral genera with their hosts.

  2. Annotating novel genes by integrating synthetic lethals and genomic information

    Directory of Open Access Journals (Sweden)

    Faty Mahamadou

    2008-01-01

    Background: Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results: We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W) as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion: We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process.

  3. Efficient genome-wide genotyping strategies and data integration in crop plants.

    Science.gov (United States)

    Torkamaneh, Davoud; Boyle, Brian; Belzile, François

    2018-03-01

    Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications from genome-wide analysis to routine screening with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time with a low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenges facing breeders and geneticists today are how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used both to fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.

  4. Drosophila Sld5 is essential for normal cell cycle progression and maintenance of genomic integrity

    Energy Technology Data Exchange (ETDEWEB)

    Gouge, Catherine A. [Department of Biology, East Carolina University, Greenville, NC 27858 (United States)]; Christensen, Tim W., E-mail: christensent@ecu.edu [Department of Biology, East Carolina University, Greenville, NC 27858 (United States)]

    2010-09-10

    Research highlights: Drosophila Sld5 interacts with Psf1, Psf2, and Mcm10. Haploinsufficiency of Sld5 leads to M-phase delay and genomic instability. Sld5 is also required for normal S phase progression. Abstract: Essential for the normal functioning of a cell is the maintenance of genomic integrity. Failure in this process is often catastrophic for the organism, leading to cell death or mis-proliferation. Central to genomic integrity is the faithful replication of DNA during S phase. The GINS complex has recently come to light as a critical player in DNA replication through stabilization of MCM2-7 and Cdc45 as a member of the CMG complex which is likely responsible for the processivity of helicase activity during S phase. The GINS complex is made up of 4 members in a 1:1:1:1 ratio: Psf1, Psf2, Psf3, and Sld5. Here we present the first analysis of the function of the Sld5 subunit in a multicellular organism. We show that Drosophila Sld5 interacts with Psf1, Psf2, and Mcm10 and that mutations in Sld5 lead to M and S phase delays with chromosomes exhibiting hallmarks of genomic instability.

  5. Genome puzzle master (GPM): an integrated pipeline for building and editing pseudomolecules from fragmented sequences.

    Science.gov (United States)

    Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A

    2016-10-15

    Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS. Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  6. RGmatch: matching genomic regions to proximal genes in omics data integration

    Directory of Open Access Journals (Sweden)

    Pedro Furió-Tarí

    2016-11-01

    Background: The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results: In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions: RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
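
    The general idea of a region-to-gene association, pairing a region with its closest gene and labeling where the region lies relative to that gene, can be sketched as follows. This is not RGmatch's implementation or rule set; the `Gene` class, the three labels, and the coordinates are simplified assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def annotate(region_start, region_end, gene):
    """Return (distance, label) for one region-gene pair.

    Labels are strand-aware: "upstream" always means the 5' side of the gene.
    """
    if region_end < gene.start:
        distance = gene.start - region_end
        label = "upstream" if gene.strand == "+" else "downstream"
    elif region_start > gene.end:
        distance = region_start - gene.end
        label = "downstream" if gene.strand == "+" else "upstream"
    else:
        distance = 0
        label = "gene_body"
    return distance, label


def closest_gene(region_start, region_end, genes):
    """Pick the gene with the smallest distance to the region."""
    best = min(genes, key=lambda g: annotate(region_start, region_end, g)[0])
    distance, label = annotate(region_start, region_end, best)
    return best.name, label, distance


# Hypothetical annotation on a single chromosome.
genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 5000, 6000, "-")]
print(closest_gene(800, 900, genes))
print(closest_gene(6100, 6200, genes))
```

    A fuller version would work at the transcript or exon level and distinguish areas such as promoter, first exon, and intron, which is where configurable rules become useful.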

  7. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server provides tools for the reconstruction and analysis of microbial regulons using a comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  8. FISH Oracle 2: a web server for integrative visualization of genomic data in cancer research.

    Science.gov (United States)

    Mader, Malte; Simon, Ronald; Kurtz, Stefan

    2014-03-31

    A comprehensive view of all relevant genomic data is instrumental for understanding the complex patterns of molecular alterations typically found in cancer cells. One of the most effective ways to rapidly obtain an overview of genomic alterations in large amounts of genomic data is the integrative visualization of genomic events. We developed FISH Oracle 2, a web server for the interactive visualization of different kinds of downstream processed genomics data typically available in cancer research. A powerful search interface and a fast visualization engine provide a highly interactive visualization for such data. High quality image export enables life scientists to easily communicate their results. Comprehensive data administration allows users to keep track of the available data sets. We applied FISH Oracle 2 to published data and found evidence that, in colorectal cancer cells, the gene TTC28 may be inactivated in two different ways, a fact that has not been published before. The interactive nature of FISH Oracle 2 and the possibility to store, select and visualize large amounts of downstream processed data support life scientists in generating hypotheses. The export of high quality images supports explanatory data visualization, simplifying the communication of new biological findings. A FISH Oracle 2 demo server and the software are available at http://www.zbh.uni-hamburg.de/fishoracle.

  9. Genome scale models of yeast: towards standardized evaluation and consistent omic integration

    DEFF Research Database (Denmark)

    Sanchez, Benjamin J.; Nielsen, Jens

    2015-01-01

    Genome scale models (GEMs) have enabled remarkable advances in systems biology, acting as functional databases of metabolism, and as scaffolds for the contextualization of high-throughput data. In the case of Saccharomyces cerevisiae (budding yeast), several GEMs have been published and are curre...... in which all levels of omics data (from gene expression to flux) have been integrated in yeast GEMs. Relevant conclusions and current challenges for both GEM evaluation and omic integration are highlighted....

  10. The RNAPII-CTD Maintains Genome Integrity through Inhibition of Retrotransposon Gene Expression and Transposition.

    Directory of Open Access Journals (Sweden)

    Maria J Aristizabal

    2015-10-01

    RNA polymerase II (RNAPII) contains a unique C-terminal domain that is composed of heptapeptide repeats and which plays important regulatory roles during gene expression. RNAPII is responsible for the transcription of most protein-coding genes, a subset of non-coding genes, and retrotransposons. Retrotransposon transcription is the first step in their multiplication cycle, given that the RNA intermediate is required for the synthesis of cDNA, the material that is ultimately incorporated into a new genomic location. Retrotransposition can have grave consequences to genome integrity, as integration events can change the gene expression landscape or lead to alteration or loss of genetic information. Given that RNAPII transcribes retrotransposons, we sought to investigate if the RNAPII-CTD played a role in the regulation of retrotransposon gene expression. Importantly, we found that the RNAPII-CTD functioned to maintain genome integrity through inhibition of retrotransposon gene expression, as reducing CTD length significantly increased expression and transposition rates of Ty1 elements. Mechanistically, the increased Ty1 mRNA levels in the rpb1-CTD11 mutant were partly due to Cdk8-dependent alterations to the RNAPII-CTD phosphorylation status. In addition, Cdk8 alone contributed to Ty1 gene expression regulation by altering the occupancy of the gene-specific transcription factor Ste12. Loss of STE12 and TEC1 suppressed growth phenotypes of the RNAPII-CTD truncation mutant. Collectively, our results implicate Ste12 and Tec1 as general and important contributors to the Cdk8, RNAPII-CTD regulatory circuitry as it relates to the maintenance of genome integrity.

  11. The openness of pluripotent epigenome - Defining the genomic integrity of stemness for regenerative medicine

    Directory of Open Access Journals (Sweden)

    Xuejun H Parsons

    2014-02-01

    This article is an editorial and does not include an abstract. Full text of this article is available in HTML and PDF. Cite this article as: Parsons XH. The openness of pluripotent epigenome - Defining the genomic integrity of stemness for regenerative medicine. Int J Cancer Ther Oncol 2014; 2(1):020114. DOI: http://dx.doi.org/10.14319/ijcto.0201.14

  12. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics

    OpenAIRE

    Verma, Mohit; Kumar, Vinay; Patel, Ravi K.; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database fea...

  13. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

    Directory of Open Access Journals (Sweden)

    Mohit Verma

    Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides a comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology) search and comparative gene expression analysis. The current release of CTDB (v2.0) hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types) and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea is available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. Utilities such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms) between different chickpea types. We anticipate that the integrated information content of this database will accelerate functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.

  14. The European Renal Genome Project: An Integrated Approach Towards Understanding the Genetics of Kidney Development and Disease

    OpenAIRE

    Willnow, TE; Antignac, C; Brändli, AW; Christensen, EI; Cox, RD; Davidson, D; Davies, JA; Devuyst, O; Eichele, G; Hastie, ND; Verroust, PJ; Schedl, A; Meij, IC

    2005-01-01

    Rapid progress in genome research creates a wealth of information on the functional annotation of mammalian genome sequences. However, as we accumulate large amounts of scientific information we are facing problems of how to integrate and relate the data produced by various genomic approaches. Here, we propose the novel concept of an organ atlas where diverse data from expression maps to histological findings to mutant phenotypes can be queried, compared and visualized in the context of a thr...

  15. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  16. Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases.

    Science.gov (United States)

    Lysenko, Artem; Hindle, Matthew Morritt; Taubert, Jan; Saqi, Mansoor; Rawlings, Christopher John

    2009-11-01

    The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.
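    The overlap quantification described in this abstract can be illustrated with plain set operations. The sketch below is a minimal illustration and not the Ondex method: the source names (`source_a`, `source_b`, `source_c`) and the entry identifiers are hypothetical placeholders, and it assumes that entries have already been mapped to shared identifiers, which is itself one of the integration challenges the abstract mentions.

```python
def overlap_summary(sources):
    """Return (entries common to all sources, entries unique to each source).

    `sources` maps a source name to an iterable of entry identifiers.
    """
    names = list(sources)
    as_sets = {name: set(sources[name]) for name in names}
    # Entries present in every source.
    common = set.intersection(*as_sets.values())
    unique = {}
    for name in names:
        # Union of all the other sources, then subtract it.
        others = set().union(*(as_sets[m] for m in names if m != name))
        unique[name] = as_sets[name] - others
    return common, unique


# Hypothetical reaction identifiers from three hypothetical pathway sources.
example_sources = {
    "source_a": ["R1", "R2", "R3", "R4"],
    "source_b": ["R2", "R3", "R5"],
    "source_c": ["R3", "R6"],
}
common, unique = overlap_summary(example_sources)
```

    In this toy input only one entry is shared by all three sources, while each source contributes at least one entry of its own, which mirrors the qualitative pattern the abstract reports (a small common core and sizeable unique contributions).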

  17. Evaluation and optimisation of indel detection workflows for ion torrent sequencing of the BRCA1 and BRCA2 genes.

    Science.gov (United States)

    Yeo, Zhen Xuan; Wong, Joshua Chee Leong; Rozen, Steven G; Lee, Ann Siew Gek

    2014-06-24

    The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing. Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling with the same data using the open source variant callers GATK and SAMtools showed that false negatives could be minimised with the use of appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer associated errors, without compromising sensitivity. In our best case scenario, which involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and a 29% False Discovery Rate (FDR) in indel calling from all 23 samples, which is a good performance for mutation screening using PGM. New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterparts. However, the variant caller of TS exhibits a lower sensitivity than GATK and SAMtools. Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology
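    The three performance figures quoted in this abstract follow from standard confusion-matrix definitions, and a short calculation shows how 99.99% specificity can coexist with a 29% FDR when true negatives vastly outnumber true indels. The counts below are hypothetical values chosen only to reproduce figures of similar magnitude; they are not taken from the study.

```python
def indel_calling_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and false discovery rate.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (all assumed non-degenerate,
    i.e. no denominator is zero).
    """
    sensitivity = tp / (tp + fn)   # fraction of real indels that were called
    specificity = tn / (tn + fp)   # fraction of non-indel positions left uncalled
    fdr = fp / (fp + tp)           # fraction of calls that are wrong
    return sensitivity, specificity, fdr


# Hypothetical counts: 5 real indels, all found; 2 spurious calls
# among 20,002 positions that carry no indel.
sens, spec, fdr = indel_calling_metrics(tp=5, fp=2, tn=20000, fn=0)
```

    With these assumed counts, sensitivity is 1.0, specificity rounds to 0.9999 and FDR rounds to 0.29: two false calls barely dent specificity, but they make up a large share of only seven calls in total.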

  18. An evolvable oestrogen receptor activity sensor: development of a modular system for integrating multiple genes into the yeast genome

    NARCIS (Netherlands)

    Fox, J.E.; Bridgham, J.T.; Bovee, T.F.H.; Thornton, J.W.

    2007-01-01

    To study a gene interaction network, we developed a gene-targeting strategy that allows efficient and stable genomic integration of multiple genetic constructs at distinct target loci in the yeast genome. This gene-targeting strategy uses a modular plasmid with a recyclable selectable marker and a

  19. VarB Plus: An Integrated Tool for Visualization of Genome Variation Datasets

    KAUST Repository

    Hidayah, Lailatul

    2012-07-01

    Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to description of embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in a beta testing with various high-volume published datasets, as well as positive
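    Of the statistics listed in this abstract, Tajima's D is the most formula-heavy, so a compact sketch may help. The code below follows the standard Tajima (1989) formulation for a set of aligned, equal-length sequences; it is an illustration only and makes no claim about how VarB or VarB Plus computes the statistic. The function name and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned sequences of equal length.

    Returns None when the statistic is undefined (fewer than four
    sequences, or no segregating sites).
    """
    n = len(sequences)
    length = len(sequences[0])
    # S: number of alignment columns with more than one allele.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if n < 4 or s == 0:
        return None
    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment in which every variant is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
d_value = tajimas_d(toy)
```

    In the toy alignment every variant occurs in a single sequence, so the pairwise diversity falls below the estimate based on segregating sites and D comes out negative, the direction usually associated with an excess of rare variants.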

  20. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    Science.gov (United States)

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.
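    The central idea of this abstract, linking genomic regions to the metadata of the sample they came from, can be sketched with two small record types. This is a minimal illustration under stated assumptions, not the GDM or GMQL implementation: the class names, field names and the `select` helper are hypothetical, and only the first six standard BED columns are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    chrom: str
    start: int          # BED coordinates: 0-based, half-open
    end: int
    strand: str = "*"
    values: dict = field(default_factory=dict)  # format-specific attributes


@dataclass
class Sample:
    sample_id: str
    metadata: dict                       # experimental/biological/clinical attributes
    regions: list = field(default_factory=list)


def parse_bed_line(line):
    """Turn one tab-separated BED line into a Region."""
    cols = line.rstrip("\n").split("\t")
    region = Region(cols[0], int(cols[1]), int(cols[2]))
    if len(cols) > 3:
        region.values["name"] = cols[3]
    if len(cols) > 5:
        region.strand = cols[5]
    return region


def select(samples, chrom, start, end, **wanted):
    """Regions overlapping [start, end) from samples whose metadata match."""
    hits = []
    for sample in samples:
        if all(sample.metadata.get(k) == v for k, v in wanted.items()):
            hits += [r for r in sample.regions
                     if r.chrom == chrom and r.start < end and r.end > start]
    return hits


# Hypothetical samples with hypothetical metadata keys.
s1 = Sample("s1", {"assay": "ChIP-seq"}, [parse_bed_line("chr1\t100\t200\tpeak1\t0\t+")])
s2 = Sample("s2", {"assay": "RNA-seq"}, [parse_bed_line("chr1\t150\t250")])
hits = select([s1, s2], "chr1", 180, 300, assay="ChIP-seq")
```

    Keeping format-specific columns in a free-form `values` dictionary is one simple way to let records that originate from different file formats share a common coordinate core, which is the kind of homogeneous description the abstract refers to.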

  1. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

    Science.gov (United States)

    Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo

    2015-01-01

    Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
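    The role of ortholog information as a hub can be illustrated with a handful of subject-predicate-object triples and a pattern-matching helper. This sketch is plain Python rather than RDF or SPARQL, and every identifier and predicate name in it (`hasMember`, `hasGoTerm`, `group:G1`, and so on) is a hypothetical placeholder, not vocabulary from OrthO or MBGD.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None pattern component."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def go_terms_via_orthologs(triples, gene):
    """GO terms attached to other members of the ortholog groups containing `gene`."""
    terms = set()
    for group, _, _ in match(triples, p="hasMember", o=gene):
        for _, _, member in match(triples, s=group, p="hasMember"):
            if member != gene:
                terms.update(t[2] for t in match(triples, s=member, p="hasGoTerm"))
    return terms


# Hypothetical data: one ortholog group linking genes from two organisms.
triples = [
    ("group:G1", "hasMember", "gene:A"),
    ("group:G1", "hasMember", "gene:B"),
    ("gene:A", "inTaxon", "taxon:X"),
    ("gene:B", "inTaxon", "taxon:Y"),
    ("gene:A", "hasGoTerm", "GO:example"),
]
linked_terms = go_terms_via_orthologs(triples, "gene:B")
```

    Here `gene:B` carries no annotation of its own, but the ortholog group connects it to the annotation on `gene:A`. A SPARQL endpoint such as the one the abstract describes would express the same join declaratively over RDF data.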

  2. Genome editing using FACS enrichment of nuclease-expressing cells and indel detection by amplicon analysis

    DEFF Research Database (Denmark)

    Lonowski, Lindsey A; Narimatsu, Yoshiki; Riaz, Anjum

    2017-01-01

    , FACS enrichment of cells expressing nucleases linked to fluorescent proteins can be used to maximize knockout or knock-in editing efficiencies or to balance editing efficiency and toxic/off-target effects. The two methods can be combined to form a pipeline for cell-line editing that facilitates...

  3. MicrobesOnline: an integrated portal for comparative and functional genomics

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Joachimiak, Marcin; Price, Morgan; Bates, John; Baumohl, Jason; Chivian, Dylan; Friedland, Greg; Huang, Kathleen; Keller, Keith; Novichkov, Pavel; Dubchak, Inna; Alm, Eric; Arkin, Adam

    2011-07-14

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  4. LocusTrack: Integrated visualization of GWAS results and genomic annotation.

    Science.gov (United States)

    Cuellar-Partida, Gabriel; Renteria, Miguel E; MacGregor, Stuart

    2015-01-01

    Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservation, chromatin state, and other genomic and regulatory elements. Currently, LocusTrack can integrate annotation tracks from the UCSC genome-browser as well as from any tracks provided by the user. LocusTrack is an easy-to-use application and can be accessed at the following URL: http://gump.qimr.edu.au/general/gabrieC/LocusTrack/. Users can upload and manage GWAS results and select from and/or provide annotation tracks using simple and intuitive menus. LocusTrack scripts and associated data can be downloaded from the website and run locally.

  5. Integrative Genomic and Proteomic Analysis of the Response of Lactobacillus casei Zhang to Glucose Restriction.

    Science.gov (United States)

    Yu, Jie; Hui, Wenyan; Cao, Chenxia; Pan, Lin; Zhang, Heping; Zhang, Wenyi

    2018-03-02

    Nutrient starvation is an important survival challenge for bacteria during industrial production of functional foods. As next-generation sequencing technology has greatly advanced, we performed proteomic and genomic analysis to investigate the response of Lactobacillus casei Zhang to a glucose-restricted environment. L. casei Zhang strains were permitted to evolve in glucose-restricted or normal medium from a common ancestor over a 3-year period, and they were sampled at 1000, 2000, 3000, 4000, 5000, 6000, 7000, and 8000 generations and subjected to proteomic and genomic analyses. Genomic resequencing data revealed different point mutations and other mutational events in each selected generation of L. casei Zhang under glucose restriction stress. The differentially expressed proteins induced by glucose restriction were mostly related to fructose and mannose metabolism, carbohydrate metabolic processes, lyase activity, and amino-acid-transporting ATPase activity. Integrative proteomic and genomic analysis revealed that the mutations protected L. casei Zhang against glucose starvation by regulating other cellular carbohydrate, fatty acid, and amino acid catabolism; phosphoenolpyruvate system pathway activation; glycogen synthesis; ATP consumption; pyruvate metabolism; and general stress-response protein expression. The results help reveal the mechanisms of adapting to glucose starvation and provide new strategies for enhancing the industrial utility of L. casei Zhang.

  6. MicrobesOnline: an integrated portal for comparative and functional genomics

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Joachimiak, Marcin P.; Price, Morgan N.; Bates, John T.; Baumohl, Jason K.; Chivian, Dylan; Friedland, Greg D.; Huang, Katherine H.; Keller, Keith; Novichkov, Pavel S.; Dubchak, Inna L.; Alm, Eric J.; Arkin, Adam P.

    2009-09-17

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  7. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

    Science.gov (United States)

    Schoof, Heiko; Ernst, Rebecca; Nazarov, Vladimir; Pfeifer, Lukas; Mewes, Hans-Werner; Mayer, Klaus F. X.

    2004-01-01

    Arabidopsis thaliana is the most widely studied model plant. Functional genomics is intensively underway in many laboratories worldwide. Beyond the basic annotation of the primary sequence data, the annotated genetic elements of Arabidopsis must be linked to diverse biological data and higher order information such as metabolic or regulatory pathways. The MIPS Arabidopsis thaliana database MAtDB aims to provide a comprehensive resource for Arabidopsis as a genome model that serves as a primary reference for research in plants and is suitable for transfer of knowledge to other plants, especially crops. The genome sequence as a common backbone serves as a scaffold for the integration of data, while, in a complementary effort, these data are enhanced through the application of state-of-the-art bioinformatics tools. This information is visualized on a genome-wide and a gene-by-gene basis with access both for web users and applications. This report updates the information given in a previous report and provides an outlook on further developments. The MAtDB web interface can be accessed at http://mips.gsf.de/proj/thal/db. PMID:14681437

  8. Integrative Genomics Reveals Mechanisms of Copy Number Alterations Responsible for Transcriptional Deregulation in Colorectal Cancer

    Science.gov (United States)

    Camps, Jordi; Nguyen, Quang Tri; Padilla-Nash, Hesed M.; Knutsen, Turid; McNeil, Nicole E.; Wangsa, Danny; Hummon, Amanda B.; Grade, Marian; Ried, Thomas; Difilippantonio, Michael J.

    2016-01-01

    To evaluate the mechanisms and consequences of chromosomal aberrations in colorectal cancer (CRC), we used a combination of spectral karyotyping, array comparative genomic hybridization (aCGH), and array-based global gene expression profiling on 31 primary carcinomas and 15 established cell lines. Importantly, aCGH showed that the genomic profiles of primary tumors are recapitulated in the cell lines. We revealed a preponderance of chromosome breakpoints at sites of copy number variants (CNVs) in the CRC cell lines, a novel mechanism of DNA breakage in cancer. The integration of gene expression and aCGH led to the identification of 157 genes localized within high-level copy number changes whose transcriptional deregulation was significantly affected across all of the samples, thereby suggesting that these genes play a functional role in CRC. Genomic amplification at 8q24 was the most recurrent event and led to the overexpression of MYC and FAM84B. Copy number dependent gene expression resulted in deregulation of known cancer genes such as APC, FGFR2, and ERBB2. The identification of only 36 genes whose localization near a breakpoint could account for their observed deregulated expression demonstrates that the major mechanism for transcriptional deregulation in CRC is genomic copy number changes resulting from chromosomal aberrations. PMID:19691111

  9. The South Asian genome.

    Directory of Open Access Journals (Sweden)

    John C Chambers

    The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.

  10. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine.

    Science.gov (United States)

    Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

    2014-03-20

    illuminate the impact of integrating genomic medicine into the clinical care of patients but also inform the design of future studies. ClinicalTrials.gov identifier NCT01736566.

  11. An 8bp indel in exon 1 of Ghrelin gene associated with chicken growth.

    Science.gov (United States)

    Fang, Meixia; Nie, Qinghua; Luo, Chenglong; Zhang, Dexiang; Zhang, Xiquan

    2007-04-01

    Ghrelin, which acts as the endogenous ligand for the growth hormone secretagogue receptor (GHS-R), is a novel growth hormone (GH)-releasing peptide with reported effects on food intake in chickens. In this study, an 8 bp indel polymorphism in exon 1 of the chicken Ghrelin (cGHRL) gene was genotyped in an F(2) designed full-sib population to analyze its associations with chicken growth and carcass traits. Later, the mRNA level in the proventriculus was determined by real-time PCR to reveal the expression features of the cGHRL gene. Results showed that this 8 bp indel was significantly associated with body weight at the age of 28 days (BW28) and 56 days (BW56), eviscerated weight (EW) and leg muscle weight (LMW) (P... Ghrelin on chicken growth were indicated by this study.

  12. Integration of HIV in the Human Genome: Which Sites Are Preferential? A Genetic and Statistical Assessment

    Science.gov (United States)

    Gonçalves, Juliana; Moreira, Elsa; Sequeira, Inês J.; Rodrigues, António S.; Rueff, José; Brás, Aldina

    2016-01-01

    Chromosomal fragile sites (FSs) are loci where gaps and breaks may occur and are preferential integration targets for some viruses, for example, Hepatitis B, Epstein-Barr virus, HPV16, HPV18, and MLV vectors. However, the integration of the human immunodeficiency virus (HIV) in Giemsa bands and in FSs is not yet completely clear. This study aimed to assess the integration preferences of HIV in FSs and in Giemsa bands using an in silico study. HIV integration positions from Jurkat cells were used and two nonparametric tests were applied to compare HIV integration in dark versus light bands and in FSs versus non-FSs (NFSs). The results show that light bands are preferential targets for integration of HIV-1 in Jurkat cells and also that it integrates with equal intensity in FSs and in NFSs. The data indicate that HIV displays different preferences for FSs compared to other viruses. The aim was to develop and apply an approach to predict the conditions and constraints of HIV insertion in the human genome, which seems to adequately complement empirical data. PMID:27294106

  13. The Conjugative Relaxase TrwC Promotes Integration of Foreign DNA in the Human Genome.

    Science.gov (United States)

    González-Prieto, Coral; Gabriel, Richard; Dehio, Christoph; Schmidt, Manfred; Llosa, Matxalen

    2017-06-15

    Bacterial conjugation is a mechanism of horizontal DNA transfer. The relaxase TrwC of the conjugative plasmid R388 cleaves one strand of the transferred DNA at the oriT gene, covalently attaches to it, and leads the single-stranded DNA (ssDNA) into the recipient cell. In addition, TrwC catalyzes site-specific integration of the transferred DNA into its target sequence present in the genome of the recipient bacterium. Here, we report the analysis of the efficiency and specificity of the integrase activity of TrwC in human cells, using the type IV secretion system of the human pathogen Bartonella henselae to introduce relaxase-DNA complexes. Compared to Mob relaxase from plasmid pBGR1, we found that TrwC mediated a 10-fold increase in the rate of plasmid DNA transfer to human cells and a 100-fold increase in the rate of chromosomal integration of the transferred DNA. We used linear amplification-mediated PCR and plasmid rescue to characterize the integration pattern in the human genome. DNA sequence analysis revealed mostly reconstituted oriT sequences, indicating that TrwC is active and recircularizes transferred DNA in human cells. One TrwC-mediated site-specific integration event was detected, proving that TrwC is capable of mediating site-specific integration in the human genome, albeit with very low efficiency compared to the rate of random integration. Our results suggest that TrwC may stabilize the plasmid DNA molecules in the nucleus of the human cell, probably by recircularization of the transferred DNA strand. This stabilization would increase the opportunities for integration of the DNA by the host machinery. IMPORTANCE Different biotechnological applications, including gene therapy strategies, require permanent modification of target cells. Long-term expression is achieved either by extrachromosomal persistence or by integration of the introduced DNA. Here, we studied the utility of conjugative relaxase TrwC, a bacterial protein with site

  14. IVAG: An Integrative Visualization Application for Various Types of Genomic Data Based on R-Shiny and the Docker Platform.

    Science.gov (United States)

    Lee, Tae-Rim; Ahn, Jin Mo; Kim, Gyuhee; Kim, Sangsoo

    2017-12-01

    Next-generation sequencing (NGS) technology has become a trend in the genomics research area. There are many software programs and automated pipelines to analyze NGS data, which can ease the pain for traditional scientists who are not familiar with computer programming. However, downstream analyses, such as finding differentially expressed genes or visualizing linkage disequilibrium maps and genome-wide association study (GWAS) data, still remain a challenge. Here, we introduce a dockerized web application written in R using the Shiny platform to visualize pre-analyzed RNA sequencing and GWAS data. In addition, we have integrated a genome browser based on the JBrowse platform and an automated intermediate parsing process required for custom track construction, so that users can easily build and navigate their personal genome tracks with in-house datasets. This application will help scientists perform series of downstream analyses and obtain a more integrative understanding about various types of genomic data by interactively visualizing them with customizable options.

  15. BiologicalNetworks 2.0 - an integrative view of genome biology data

    Directory of Open Access Journals (Sweden)

    Ponomarenko Julia

    2010-12-01

    Full Text Available Abstract Background A significant problem in the study of mechanisms of an organism's development is the elucidation of interrelated factors which have an impact on the different levels of the organism, such as genes, biological molecules, cells, and cell systems. Numerous sources of heterogeneous data which exist for these subsystems are still not sufficiently integrated to give researchers a straightforward opportunity to analyze them together in the same frame of study. Systematic application of data integration methods is also hampered by a multitude of factors such as the orthogonal nature of the integrated data and naming problems. Results Here we report on a new version of BiologicalNetworks, a research environment for the integral visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and others) and their relations (interactions, co-expression, co-citations, and others). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. Also implemented in BiologicalNetworks are the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow for the search and analysis of gene regulatory regions and their conservation in multiple species in conjunction with molecular pathways/networks, experimental data and functional annotations. Conclusions The new release of BiologicalNetworks together with its back-end database introduces extensive functionality for a more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org.

  16. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Science.gov (United States)

    Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

    2016-01-01

    MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, an individual source of genomic data tends to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated on experimentally verified miRNA-disease associations, achieving an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.
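
    The evaluation protocol named in the abstract above (an AUC obtained under 5-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the CHNmiRD method: the toy scorer (negative distance to the mean of the training positives), the one-dimensional features, and the function names auc, k_fold_indices and cross_validated_auc are all assumptions made for illustration only.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    AUC equals the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(features, labels, k=5, seed=0):
    """Score every item while it is held out, then pool the scores.

    The scorer is a hypothetical stand-in: an item scores higher the
    closer its feature lies to the mean of the training positives.
    """
    held_out = [0.0] * len(labels)
    for test in k_fold_indices(len(labels), k, seed):
        test_set = set(test)
        train_pos = [features[i] for i in range(len(labels))
                     if i not in test_set and labels[i] == 1]
        centroid = sum(train_pos) / len(train_pos)
        for i in test:
            held_out[i] = -abs(features[i] - centroid)
    return auc(labels, held_out)


if __name__ == "__main__":
    # Toy data: positives cluster near 1.x, negatives near 5.x.
    feats = [1.0 + 0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
    labs = [1] * 10 + [0] * 10
    print(cross_validated_auc(feats, labs, k=5))
```

    Pooling the held-out scores before computing a single AUC is one common convention; averaging per-fold AUCs is another, and the abstract does not say which was used.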

  17. Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

    Directory of Open Access Journals (Sweden)

    Hongbo Shi

    Full Text Available MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, an individual source of genomic data tends to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated on experimentally verified miRNA-disease associations, achieving an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.

  18. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    Science.gov (United States)

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  19. Selective Gene Delivery for Integrating Exogenous DNA into Plastid and Mitochondrial Genomes Using Peptide-DNA Complexes.

    Science.gov (United States)

    Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji

    2018-05-14

    Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.

  20. MicroScope in 2017: an expanding and evolving integrated resource for community expertise of microbial genomes.

    Science.gov (United States)

    Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine

    2017-01-04

    The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

    Science.gov (United States)

    Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

    2013-01-01

    Background The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID

  2. Genome-Wide Analysis of Transposon and Retroviral Insertions Reveals Preferential Integrations in Regions of DNA Flexibility.

    Science.gov (United States)

    Vrljicak, Pavle; Tao, Shijie; Varshney, Gaurav K; Quach, Helen Ngoc Bao; Joshi, Adita; LaFave, Matthew C; Burgess, Shawn M; Sampath, Karuna

    2016-04-07

    DNA transposons and retroviruses are important transgenic tools for genome engineering. An important consideration affecting the choice of transgenic vector is their insertion site preferences. Previous large-scale analyses of Ds transposon integration sites in plants were done on the basis of reporter gene expression or germ-line transmission, making it difficult to discern vertebrate integration preferences. Here, we compare over 1300 Ds transposon integration sites in zebrafish with Tol2 transposon and retroviral integration sites. Genome-wide analysis shows that Ds integration sites in the presence or absence of marker selection are remarkably similar and distributed throughout the genome. No strict motif was found, but a preference for structural features in the target DNA associated with DNA flexibility (Twist, Tilt, Rise, Roll, Shift, and Slide) was observed. Remarkably, this feature is also found in transposon and retroviral integrations in maize and mouse cells. Our findings show that structural features influence the integration of heterologous DNA in genomes, and have implications for targeted genome engineering. Copyright © 2016 Vrljicak et al.

  3. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

    Directory of Open Access Journals (Sweden)

    In Young Choi

    2013-12-01

    Full Text Available The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

  4. Integration of Genome Scale Metabolic Networks and Gene Regulation of Metabolic Enzymes With Physiologically Based Pharmacokinetics.

    Science.gov (United States)

    Maldonado, Elaina M; Leoncikas, Vytautas; Fisher, Ciarán P; Moore, J Bernadette; Plant, Nick J; Kierzek, Andrzej M

    2017-11-01

    The scope of physiologically based pharmacokinetic (PBPK) modeling can be expanded by assimilating mechanistic models of intracellular processes from the field of systems biology. The genome scale metabolic networks (GSMNs) represent a whole set of metabolic enzymes expressed in human tissues. Dynamic models of the gene regulation of key drug metabolism enzymes are available. Here, we introduce GSMNs and review ongoing work on integration of PBPK, GSMNs, and metabolic gene regulation. We demonstrate example models. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  5. Inferring genetic architecture of complex traits using Bayesian integrative analysis of genome and transcriptome data

    DEFF Research Database (Denmark)

    Ehsani, Alireza; Sørensen, Peter; Pomp, Daniel

    2012-01-01

    Background To understand the genetic architecture of complex traits and bridge the genotype-phenotype gap, it is useful to study intermediate -omics data, e.g. the transcriptome. The present study introduces a method for simultaneous quantification of the contributions from single nucleotide......-modal distribution of genomic values collapses, when gene expressions are added to the model Conclusions With increased availability of various -omics data, integrative approaches are promising tools for understanding the genetic architecture of complex traits. Partitioning of explained variances at the chromosome...

  6. Dual CRISPR-Cas9 Cleavage Mediated Gene Excision and Targeted Integration in Yarrowia lipolytica.

    Science.gov (United States)

    Gao, Difeng; Smith, Spencer; Spagnuolo, Michael; Rodriguez, Gabriel; Blenner, Mark

    2018-05-29

    CRISPR-Cas9 technology has been successfully applied in Yarrowia lipolytica for targeted genomic editing including gene disruption and integration; however, disruptions by existing methods typically result from small frameshift mutations caused by indels within the coding region, which usually result in unnatural proteins. In this study, a dual cleavage strategy directed by paired sgRNAs is developed for gene knockout. This method allows fast and robust gene excision, demonstrated on six genes of interest. The targeted regions for excision vary in length from 0.3 kb up to 3.5 kb and contain both non-coding and coding regions. The majority of the gene excisions are repaired by perfect nonhomologous end-joining without indels. Based on this dual cleavage system, two targeted markerless integration methods are developed by providing repair templates. While both strategies are effective, the homology-mediated end joining (HMEJ)-based method is twice as efficient as the homologous recombination (HR)-based method. In both cases, dual cleavage leads to similar or improved gene integration efficiencies compared to gene excision without integration. This dual cleavage strategy will be useful not only for generating more predictable and robust gene knockouts, but also for efficient targeted markerless integration, and for simultaneous knockout and integration in Y. lipolytica. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
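
    The outcome described above, two cuts followed by indel-free end joining or by insertion of a repair-template payload, can be pictured at the sequence level with a short Python sketch. It is purely illustrative and models no biology beyond string splicing: the function names excise and excise_and_integrate, the 0-based cut coordinates, and the toy sequences are assumptions, not anything taken from the study.

```python
def excise(sequence, cut_a, cut_b):
    """Return the sequence after removing sequence[cut_a:cut_b].

    Mimics two blunt cuts whose outer flanks are rejoined perfectly,
    i.e. without gaining or losing any bases at the junction.
    """
    if not 0 <= cut_a < cut_b <= len(sequence):
        raise ValueError("cut sites must satisfy 0 <= cut_a < cut_b <= len")
    return sequence[:cut_a] + sequence[cut_b:]


def excise_and_integrate(sequence, cut_a, cut_b, cargo):
    """Replace sequence[cut_a:cut_b] with a cargo sequence.

    Stands in for simultaneous knockout and markerless integration;
    homology arms and repair pathway choice are not modelled.
    """
    if not 0 <= cut_a < cut_b <= len(sequence):
        raise ValueError("cut sites must satisfy 0 <= cut_a < cut_b <= len")
    return sequence[:cut_a] + cargo + sequence[cut_b:]


if __name__ == "__main__":
    locus = "AAACCCGGGTTT"  # toy locus, not a real gene
    print(excise(locus, 3, 9))                        # AAATTT
    print(excise_and_integrate(locus, 3, 9, "ATGC"))  # AAAATGCTTT
```

    An indel at the junction would show up as a result whose length differs from len(sequence) - (cut_b - cut_a), which is one simple way to check simulated or sequenced junctions against the perfect-repair expectation.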

  7. In vitro analysis of integrated global high-resolution DNA methylation profiling with genomic imbalance and gene expression in osteosarcoma.

    Directory of Open Access Journals (Sweden)

    Bekim Sadikovic

    Full Text Available Genetic and epigenetic changes contribute to deregulation of gene expression and development of human cancer. Changes in DNA methylation are key epigenetic factors regulating gene expression and genomic stability. Recent progress in microarray technologies resulted in the development of high-resolution platforms for profiling of genetic, epigenetic and gene expression changes. Osteosarcoma (OS) is a pediatric bone tumor with a characteristically high level of numerical and structural chromosomal changes. Furthermore, little is known about DNA methylation changes in OS. Our objective was to develop an integrative approach for analysis of high-resolution epigenomic, genomic, and gene expression profiles in order to identify functional epi/genomic differences between OS cell lines and normal human osteoblasts. A combination of Affymetrix Promoter Tiling Arrays for DNA methylation, Agilent array-CGH platform for genomic imbalance and Affymetrix Gene 1.0 platform for gene expression analysis was used. As a result, an integrative high-resolution approach for interrogation of genome-wide tumour-specific changes in DNA methylation was developed. This approach was used to provide the first genomic DNA methylation maps, and to identify and validate genes with aberrant DNA methylation in OS cell lines. This first integrative analysis of global cancer-related changes in DNA methylation, genomic imbalance, and gene expression has provided comprehensive evidence of the cumulative roles of epigenetic and genetic mechanisms in deregulation of gene expression networks.

  8. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Full Text Available Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample was in the range of 39.29-49.36 million. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, retinol, caffeine and arachidonic acid, and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  9. Cancer prevention, the need to preserve the integrity of the genome at all cost.

    Science.gov (United States)

    Okafor, M T; Nwagha, T U; Anusiem, C; Okoli, U A; Nubila, N I; Al-Alloosh, F; Udenyia, I J

    2018-05-01

    The entire genetic information carried by an organism makes up its genome. Genes have diverse functions. They code for different proteins needed for normal proliferation of cells. However, changes in the base sequence of genes affect their protein products, which act as messengers for normal cellular functions such as proliferation and repair. Salient processes for maintaining the integrity of the genome hinge on intricate mechanisms that have evolved to tackle genomic stresses. Objective: To discuss how cells sense and repair damage to their deoxyribonucleic acid (DNA) as well as to highlight how defects in the genes involved in DNA repair contribute to cancer development. Methodology: Online searches of databases such as Google Scholar, PubMed, Biomed Central, and SciELO were done. An attempt was made to review articles with keywords such as cancer, cell cycle, tumor suppressor genes, and DNA repair. The cell cycle, tumor suppressor genes, DNA repair mechanisms, as well as their contribution to cancer development, were discussed and reviewed. Knowledge of how cells detect and repair DNA damage through an array of mechanisms should allay our anxiety as regards cancer development. More studies on DNA damage detection and repair processes are important toward a holistic approach to cancer treatment.

  10. metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research.

    Science.gov (United States)

    Lyne, Mike; Smith, Richard N; Lyne, Rachel; Aleksic, Jelena; Hu, Fengyuan; Kalderimis, Alex; Stepan, Radek; Micklem, Gos

    2013-01-01

    Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. The web interface makes it accessible to first

  11. Chromosomally Integrated Human Herpesvirus 6: Models of Viral Genome Release from the Telomere and Impacts on Human Health.

    Science.gov (United States)

    Wood, Michael L; Royle, Nicola J

    2017-07-12

    Human herpesvirus 6A and 6B, alongside some other herpesviruses, have the striking capacity to integrate into telomeres, the terminal repeated regions of chromosomes. The chromosomally integrated forms, ciHHV-6A and ciHHV-6B, are proposed to be a state of latency and it has been shown that they can both be inherited if integration occurs in the germ line. The first step in full viral reactivation must be the release of the integrated viral genome from the telomere and here we propose various models of this release involving transcription of the viral genome, replication fork collapse, and t-circle mediated release. In this review, we also discuss the relationship between ciHHV-6 and the telomere carrying the insertion, particularly how the presence and subsequent partial or complete release of the ciHHV-6 genome may affect telomere dynamics and the risk of disease.

  12. NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome.

    Science.gov (United States)

    Chandrani, P; Kulkarni, V; Iyer, P; Upadhyay, P; Chaubal, R; Das, P; Mulherkar, R; Singh, R; Dutt, A

    2015-06-09

    Human papilloma virus (HPV) is the most common cause of virus-associated human cancers. Here, we describe the first graphic user interface (GUI)-based automated tool 'HPVDetector', for non-computational biologists, exclusively for detection and annotation of the HPV genome based on next-generation sequencing data sets. We developed a custom-made reference genome that comprises human chromosomes along with annotated genomes of 143 HPV types as pseudochromosomes. The tool runs in one of two modes as defined by the user: a 'quick mode' to identify presence of HPV types and an 'integration mode' to determine the genomic location of the site of integration. The input data can be a paired-end whole-exome, whole-genome or whole-transcriptome data set. The HPVDetector is available in the public domain for download: http://www.actrec.gov.in/pi-webpages/AmitDutt/HPVdetector/HPVDetector.html. On the basis of our evaluation of 116 whole-exome, 23 whole-transcriptome and 2 whole-genome data sets, we were able to identify presence of HPV in 20 exomes and 4 transcriptomes of cervical and head and neck cancer tumour samples. Using the inbuilt annotation module of HPVDetector, we found predominant integration of viral gene E7, a known oncogene, at known 17q21, 3q27, 7q35, Xq28 and novel sites of integration in the human genome. Furthermore, co-infection with high-risk HPVs such as 16 and 31 was found to be mutually exclusive compared with low-risk HPV71. HPVDetector is a simple yet precise and robust tool for detecting HPV from tumour samples using a variety of next-generation sequencing platforms including whole genome, whole exome and transcriptome. Two different modes (quick detection and integration mode) along with a GUI widen the usability of HPVDetector for biologists and clinicians with minimal computational knowledge.
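
    The two modes described above can be pictured with a short Python sketch that works on paired-end alignments against a combined human-plus-virus reference. It is a simplified illustration under stated assumptions and not HPVDetector's implementation: the tuple layout (read id, mate 1 contig, mate 1 position, mate 2 contig, mate 2 position), the convention that viral pseudochromosome names start with "HPV", the 1 kb binning, and the function names quick_mode and integration_mode are all hypothetical.

```python
from collections import Counter


def is_viral(contig, viral_prefix="HPV"):
    """Assumed naming convention: viral pseudochromosomes start with 'HPV'."""
    return contig.startswith(viral_prefix)


def quick_mode(pairs):
    """Count mates aligned to each viral pseudochromosome (presence check)."""
    counts = Counter()
    for _, contig1, _, contig2, _ in pairs:
        for contig in (contig1, contig2):
            if is_viral(contig):
                counts[contig] += 1
    return counts


def integration_mode(pairs, bin_size=1000):
    """Count human-virus discordant pairs per (human contig, bin start, virus).

    A pair with one mate on a human chromosome and the other on a viral
    pseudochromosome is treated as evidence for a candidate integration
    site near the human mate's position.
    """
    sites = Counter()
    for _, contig1, pos1, contig2, pos2 in pairs:
        if is_viral(contig1) == is_viral(contig2):
            continue  # both human or both viral: not informative here
        if is_viral(contig1):
            human_contig, human_pos, virus = contig2, pos2, contig1
        else:
            human_contig, human_pos, virus = contig1, pos1, contig2
        bin_start = (human_pos // bin_size) * bin_size
        sites[(human_contig, bin_start, virus)] += 1
    return sites


if __name__ == "__main__":
    # Toy alignments, invented for illustration.
    pairs = [
        ("r1", "chr17", 43_000_120, "HPV16", 700),
        ("r2", "HPV16", 650, "chr17", 43_000_480),
        ("r3", "HPV16", 100, "HPV16", 400),
        ("r4", "chr3", 5_000, "chr3", 5_300),
    ]
    print(quick_mode(pairs))
    print(integration_mode(pairs))
```

    A real pipeline would also need mapping-quality filters and split-read evidence to place breakpoints precisely; the sketch only shows why a single combined reference makes human-virus read pairs easy to recognise.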

  13. Microenvironmental Heterogeneity Parallels Breast Cancer Progression: A Histology-Genomic Integration Analysis.

    Directory of Open Access Journals (Sweden)

    Rachael Natrajan

    2016-02-01

    Full Text Available The intra-tumor diversity of cancer cells is under intense investigation; however, little is known about the heterogeneity of the tumor microenvironment that is key to cancer progression and evolution. We aimed to assess the degree of microenvironmental heterogeneity in breast cancer and correlate this with genomic and clinical parameters. We developed a quantitative measure of microenvironmental heterogeneity along three spatial dimensions (3-D) in solid tumors, termed the tumor ecosystem diversity index (EDI), using fully automated histology image analysis coupled with statistical measures commonly used in ecology. This measure was compared with disease-specific survival, key mutations, genome-wide copy number, and expression profiling data in a retrospective study of 510 breast cancer patients as a test set and 516 breast cancer patients as an independent validation set. In high-grade (grade 3) breast cancers, we uncovered a striking link between high microenvironmental heterogeneity measured by EDI and a poor prognosis that cannot be explained by tumor size, genomics, or any other data types. However, this association was not observed in low-grade (grade 1 and 2) breast cancers. The prognostic value of EDI was superior to known prognostic factors and was enhanced with the addition of TP53 mutation status (multivariate analysis test set, p = 9 × 10⁻⁴, hazard ratio = 1.47, 95% CI 1.17-1.84; validation set, p = 0.0011, hazard ratio = 1.78, 95% CI 1.26-2.52). Integration with genome-wide profiling data identified losses of specific genes on 4p14 and 5q13 that were enriched in grade 3 tumors with high microenvironmental diversity that also substratified patients into poor prognostic groups. Limitations of this study include the number of cell types included in the model, that EDI has prognostic value only in grade 3 tumors, and that our spatial heterogeneity measure was dependent on spatial scale and tumor size. To our knowledge, this is the first

  14. New approaches to assessing the effects of mutagenic agents on the integrity of the human genome

    International Nuclear Information System (INIS)

    Elespuru, R.K.; Sankaranarayanan, K.

    2007-01-01

    Heritable genetic alterations, although individually rare, have a substantial collective health impact. Approximately 20% of these are new mutations of unknown cause. Assessment of the effect of exposures to DNA damaging agents, i.e. mutagenic chemicals and radiations, on the integrity of the human genome and on the occurrence of genetic disease remains a daunting challenge. Recent insights may explain why previous examination of human exposures to ionizing radiation, as in Hiroshima and Nagasaki, failed to reveal heritable genetic effects. New opportunities to assess the heritable genetic damaging effects of environmental mutagens are afforded by: (1) integration of knowledge on the molecular nature of genetic disorders and the molecular effects of mutagens; (2) the development of more practical assays for germline mutagenesis; (3) the likely use of population-based genetic screening in personalized medicine

  15. Polymorphic integrations of an endogenous gammaretrovirus in the mule deer genome.

    Science.gov (United States)

    Elleder, Daniel; Kim, Oekyung; Padhi, Abinash; Bankert, Jason G; Simeonov, Ivan; Schuster, Stephan C; Wittekindt, Nicola E; Motameny, Susanne; Poss, Mary

    2012-03-01

    Endogenous retroviruses constitute a significant genomic fraction in all mammalian species. Typically they are evolutionarily old and fixed in the host species population. Here we report on a novel endogenous gammaretrovirus (CrERVγ; for cervid endogenous gammaretrovirus) in the mule deer (Odocoileus hemionus) that is insertionally polymorphic among individuals from the same geographical location, suggesting that it has a more recent evolutionary origin. Using PCR-based methods, we identified seven CrERVγ proviruses and demonstrated that they show various levels of insertional polymorphism in mule deer individuals. One CrERVγ provirus was detected in all mule deer sampled but was absent from white-tailed deer, indicating that this virus originally integrated after the split of the two species, which occurred approximately one million years ago. There are, on average, 100 CrERVγ copies in the mule deer genome based on quantitative PCR analysis. A CrERVγ provirus was sequenced and contained intact open reading frames (ORFs) for three virus genes. Transcripts were identified covering the entire provirus. CrERVγ forms a distinct branch of the gammaretrovirus phylogeny, with the closest relatives of CrERVγ being endogenous gammaretroviruses from sheep and pig. We demonstrated that white-tailed deer (Odocoileus virginianus) and elk (Cervus canadensis) DNA contain proviruses that are closely related to mule deer CrERVγ in a conserved region of pol; more distantly related sequences can be identified in the genome of another member of the Cervidae, the muntjac (Muntiacus muntjak). The discovery of a novel transcriptionally active and insertionally polymorphic retrovirus in mammals could provide a useful model system to study the dynamic interaction between the host genome and an invading retrovirus.

  16. DNA double-strand break response in stem cells: mechanisms to maintain genomic integrity.

    Science.gov (United States)

    Nagaria, Pratik; Robert, Carine; Rassool, Feyruz V

    2013-02-01

    Embryonic stem cells (ESCs) represent the point of origin of all cells in a given organism and must protect their genomes from both endogenous and exogenous genotoxic stress. DNA double-strand breaks (DSBs) are one of the most lethal forms of damage, and failure to adequately repair DSBs would not only compromise the ability of SCs to self-renew and differentiate, but would also lead to genomic instability and disease. Herein, we describe the mechanisms by which ESCs respond to DSB-inducing agents such as reactive oxygen species (ROS) and ionizing radiation, compared to somatic cells. We will also discuss whether the DSB response is fully reprogrammed in induced pluripotent stem cells (iPSCs) and the role of the DNA damage response (DDR) in the reprogramming of these cells. ESCs have distinct mechanisms to protect themselves against DSBs and oxidative stress compared to somatic cells. The response to damage and stress is crucial for the maintenance of self-renewal and differentiation capacity in SCs. iPSCs appear to reprogram some of the responses to genotoxic stress. However, it remains to be determined if iPSCs also retain some DDR characteristics of the somatic cells of origin. The mechanisms regulating the genomic integrity in ESCs and iPSCs are critical for their safe use in regenerative medicine and may shed light on the pathways and factors that maintain genomic stability, preventing diseases such as cancer. This article is part of a Special Issue entitled Biochemistry of Stem Cells. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. An integrative genomic and transcriptomic analysis reveals potential targets associated with cell proliferation in uterine leiomyomas.

    Directory of Open Access Journals (Sweden)

    Priscila Daniele Ramos Cirilo

    Full Text Available Uterine Leiomyomas (ULs) are the most common benign tumours affecting women of reproductive age. ULs represent a major problem in public health, as they are the main indication for hysterectomy. Approximately 40-50% of ULs have non-random cytogenetic abnormalities, and half of ULs may have copy number alterations (CNAs). Gene expression microarray studies have demonstrated that cell proliferation genes act in response to growth factors and steroids. However, only a few genes mapping to CNA regions were found to be associated with ULs. We applied an integrative analysis using genomic and transcriptomic data to identify the pathways and molecular markers associated with ULs. Fifty-one fresh frozen specimens were evaluated by array CGH (JISTIC) and gene expression microarrays (SAM). The CONEXIC algorithm was applied to integrate the data. The integrated analysis identified the top 30 significant genes (P<0.01), which comprised genes associated with cancer, whereas the protein-protein interaction analysis indicated a strong association between FANCA and BRCA1. Functional in silico analysis revealed target molecules for drugs involved in cell proliferation, including FGFR1 and IGFBP5. Transcriptional and protein analyses showed that FGFR1 (P = 0.006 and P<0.01, respectively) and IGFBP5 (P = 0.0002 and P = 0.006, respectively) were up-regulated in the tumours when compared with the adjacent normal myometrium. The integrative genomic and transcriptomic approach indicated that FGFR1 and IGFBP5 amplification, as well as the consequent up-regulation of the protein products, plays an important role in the aetiology of ULs and thus provides data for potential drug therapies development to target genes associated with cellular proliferation in ULs.

  18. Evolutionary time-scale of the begomoviruses: evidence from integrated sequences in the Nicotiana genome.

    Directory of Open Access Journals (Sweden)

    Pierre Lefeuvre

    Full Text Available Despite having single stranded DNA genomes that are replicated by host DNA polymerases, viruses in the family Geminiviridae are apparently evolving as rapidly as some RNA viruses. The observed substitution rates of geminiviruses in the genera Begomovirus and Mastrevirus are so high that the entire family could conceivably have originated less than a million years ago (MYA). However, the existence of geminivirus related DNA (GRD) integrated within the genomes of various Nicotiana species suggests that the geminiviruses probably originated >10 MYA. Some have even suggested that a distinct New-World (NW) lineage of begomoviruses may have arisen following the separation by continental drift of African and American proto-begomoviruses ∼110 MYA. We evaluate these various geminivirus origin hypotheses using Bayesian coalescent-based approaches to date firstly the Nicotiana GRD integration events, and then the divergence of the NW and Old-World (OW) begomoviruses. Besides rejecting the possibility of a <2 MYA OW-NW begomovirus split, we could also discount that it may have occurred concomitantly with the breakup of Gondwanaland 110 MYA. Although we could only confidently narrow the date of the split down to between 2 and 80 MYA, the most plausible (and best supported) date for the split is between 20 and 30 MYA--a time when global cooling ended the dispersal of temperate species between Asia and North America via the Beringian land bridge.

  19. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    Science.gov (United States)

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
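
    The gene set-centric operation described in the abstract above, finding genes common to multiple gene sets, can be illustrated with plain set arithmetic. The sketch below is only an illustration and does not use or reproduce GeneWeaver's own tools; the function names, the gene set names, and the gene identifiers are all hypothetical.

```python
from collections import Counter


def genes_in_at_least(gene_sets, k):
    """Return the genes that occur in at least k of the given gene sets."""
    counts = Counter(g for genes in gene_sets.values() for g in set(genes))
    return {g for g, n in counts.items() if n >= k}


def jaccard(a, b):
    """Jaccard similarity of two gene sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gene sets; names and identifiers are placeholders.
gene_sets = {
    "qtl_candidates": {"Gene1", "Gene2", "Gene3", "Gene4"},
    "diff_expressed": {"Gene2", "Gene3", "Gene5"},
    "curated_pathway": {"Gene3", "Gene4", "Gene6"},
}

shared_by_two = genes_in_at_least(gene_sets, 2)
shared_by_all = genes_in_at_least(gene_sets, len(gene_sets))
overlap = jaccard(gene_sets["qtl_candidates"], gene_sets["diff_expressed"])
```

    Counting set membership per gene, rather than intersecting all sets at once, makes it possible to rank candidates by how many independent gene sets support them.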

  20. Inter-replicon Gene Flow Contributes to Transcriptional Integration in the Sinorhizobium meliloti Multipartite Genome

    Directory of Open Access Journals (Sweden)

    George C. diCenzo

    2018-05-01

    Full Text Available Integration of newly acquired genes into existing regulatory networks is necessary for successful horizontal gene transfer (HGT). Ten percent of bacterial species contain at least two DNA replicons over 300 kilobases in size, with the secondary replicons derived predominately through HGT. The Sinorhizobium meliloti genome is split between a 3.7 Mb chromosome, a 1.7 Mb chromid consisting largely of genes acquired through ancient HGT, and a 1.4 Mb megaplasmid consisting primarily of recently acquired genes. Here, RNA-sequencing is used to examine the transcriptional consequences of massive, synthetic genome reduction produced through the removal of the megaplasmid and/or the chromid. Removal of the pSymA megaplasmid influenced the transcription of only six genes. In contrast, removal of the chromid influenced expression of ∼8% of chromosomal genes and ∼4% of megaplasmid genes. This was mediated in part by the loss of the ETR DNA region whose presence on pSymB is due to a translocation from the chromosome. No obvious functional bias among the up-regulated genes was detected, although genes with putative homologs on the chromid were enriched. Down-regulated genes were enriched in motility and sensory transduction pathways. Four transcripts were examined further, and in each case the transcriptional change could be traced to loss of specific pSymB regions. In particular, a chromosomal transporter was induced due to deletion of bdhA likely mediated through 3-hydroxybutyrate accumulation. These data provide new insights into the evolution of the multipartite bacterial genome, and more generally into the integration of horizontally acquired genes into the transcriptome.

  1. Inter-replicon Gene Flow Contributes to Transcriptional Integration in the Sinorhizobium meliloti Multipartite Genome.

    Science.gov (United States)

    diCenzo, George C; Wellappili, Deelaka; Golding, G Brian; Finan, Turlough M

    2018-05-04

    Integration of newly acquired genes into existing regulatory networks is necessary for successful horizontal gene transfer (HGT). Ten percent of bacterial species contain at least two DNA replicons over 300 kilobases in size, with the secondary replicons derived predominately through HGT. The Sinorhizobium meliloti genome is split between a 3.7 Mb chromosome, a 1.7 Mb chromid consisting largely of genes acquired through ancient HGT, and a 1.4 Mb megaplasmid consisting primarily of recently acquired genes. Here, RNA-sequencing is used to examine the transcriptional consequences of massive, synthetic genome reduction produced through the removal of the megaplasmid and/or the chromid. Removal of the pSymA megaplasmid influenced the transcription of only six genes. In contrast, removal of the chromid influenced expression of ∼8% of chromosomal genes and ∼4% of megaplasmid genes. This was mediated in part by the loss of the ETR DNA region whose presence on pSymB is due to a translocation from the chromosome. No obvious functional bias among the up-regulated genes was detected, although genes with putative homologs on the chromid were enriched. Down-regulated genes were enriched in motility and sensory transduction pathways. Four transcripts were examined further, and in each case the transcriptional change could be traced to loss of specific pSymB regions. In particular, a chromosomal transporter was induced due to deletion of bdhA likely mediated through 3-hydroxybutyrate accumulation. These data provide new insights into the evolution of the multipartite bacterial genome, and more generally into the integration of horizontally acquired genes into the transcriptome. Copyright © 2018 diCenzo, et al.

  2. Genome-wide conserved non-coding microsatellite (CNMS) marker-based integrative genetical genomics for quantitative dissection of seed weight in chickpea.

    Science.gov (United States)

    Bajaj, Deepak; Saxena, Maneesha S; Kujur, Alice; Das, Shouvik; Badoni, Saurabh; Tripathi, Shailesh; Upadhyaya, Hari D; Gowda, C L L; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K; Parida, Swarup K

    2015-03-01

    Phylogenetic footprinting identified 666 genome-wide paralogous and orthologous CNMS (conserved non-coding microsatellite) markers from 5'-untranslated and regulatory regions (URRs) of 603 protein-coding chickpea genes. The (CT)n and (GA)n CNMS carrying CTRMCAMV35S and GAGA8BKN3 regulatory elements, respectively, are abundant in the chickpea genome. The mapped genic CNMS markers with robust amplification efficiencies (94.7%) detected higher intraspecific polymorphic potential (37.6%) among genotypes, implying their immense utility in chickpea breeding and genetic analyses. Seventeen differentially expressed CNMS marker-associated genes showing strong preferential and seed tissue/developmental stage-specific expression in contrasting genotypes were selected to narrow down the gene targets underlying seed weight quantitative trait loci (QTLs)/eQTLs (expression QTLs) through integrative genetical genomics. The integration of transcript profiling with seed weight QTL/eQTL mapping, molecular haplotyping, and association analyses identified potential molecular tags (GAGA8BKN3 and RAV1AAT regulatory elements and alleles/haplotypes) in the LOB-domain-containing protein- and KANADI protein-encoding transcription factor genes controlling the cis-regulated expression for seed weight in the chickpea. This emphasizes the potential of CNMS marker-based integrative genetical genomics for the quantitative genetic dissection of complex seed weight in chickpea. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  3. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome

    Science.gov (United States)

    Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.

    2002-01-01

    Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263

  4. The Next Generation Precision Medical Record - A Framework for Integrating Genomes and Wearable Sensors with Medical Records

    OpenAIRE

    Batra, Prag; Singh, Enakshi; Bog, Anja; Wright, Mark; Ashley, Euan; Waggott, Daryl

    2016-01-01

    Current medical records are rigid with regard to emerging big biomedical data. Examples of poorly integrated big data that already exist in clinical practice include whole genome sequencing and wearable sensors for real time monitoring. Genome sequencing enables conventional diagnostic interrogation and forms the fundamental baseline for precision health throughout a patient's lifetime. Mobile sensors enable tailored monitoring regimes for both reducing risk through precision health intervent...

  5. Network analysis of epidermal growth factor signaling using integrated genomic, proteomic and phosphorylation data.

    Directory of Open Access Journals (Sweden)

    Katrina M Waters

    Full Text Available To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF) using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response.

  6. Network Analysis of Epidermal Growth Factor Signaling using Integrated Genomic, Proteomic and Phosphorylation Data

    Energy Technology Data Exchange (ETDEWEB)

    Waters, Katrina M.; Liu, Tao; Quesenberry, Ryan D.; Willse, Alan R.; Bandyopadhyay, Somnath; Kathmann, Loel E.; Weber, Thomas J.; Smith, Richard D.; Wiley, H. S.; Thrall, Brian D.

    2012-03-29

    To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF) using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response.

  7. Minding the gap: Frequency of indels in mtDNA control region sequence data and influence on population genetic analyses

    Science.gov (United States)

    Pearce, J.M.

    2006-01-01

    Insertions and deletions (indels) result in sequences of various lengths when homologous gene regions are compared among individuals or species. Although indels are typically phylogenetically informative, occurrence and incorporation of these characters as gaps in intraspecific population genetic data sets are rarely discussed. Moreover, the impact of gaps on estimates of fixation indices, such as FST, has not been reviewed. Here, I summarize the occurrence and population genetic signal of indels among 60 published studies that involved alignments of multiple sequences from the mitochondrial DNA (mtDNA) control region of vertebrate taxa. Among 30 studies observing indels, an average of 12% of both variable and parsimony-informative sites were composed of these sites. There was no consistent trend between levels of population differentiation and the number of gap characters in a data block. Across all studies, the average influence on estimates of ΦST was small, explaining only an additional 1.8% of among population variance (range 0.0-8.0%). Studies most likely to observe an increase in ΦST with the inclusion of gap characters were those with [...] control region DNA appears small, dependent upon total number of variable sites in the data block, and related to species-specific characteristics and the spatial distribution of mtDNA lineages that contain indels. © 2006 Blackwell Publishing Ltd.
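
    The proportion of variable sites that involve gaps, the kind of quantity summarized in the abstract above, can be counted directly from an alignment. The sketch below is a minimal illustration under stated assumptions and is not the author's procedure: it treats every alignment column as one site, treats "-" as the gap character, and uses a made-up toy alignment; the function name site_summary is hypothetical.

```python
def site_summary(alignment, gap="-"):
    """Count variable sites and how many of those sites involve a gap.

    alignment: list of equal-length aligned sequences (strings).
    Returns (number_of_variable_sites, number_of_variable_sites_with_gap).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    variable = with_gap = 0
    for column in zip(*alignment):
        states = set(column)
        if len(states) > 1:       # more than one state: the site is variable
            variable += 1
            if gap in states:     # at least one sequence has an indel here
                with_gap += 1
    return variable, with_gap


# Toy alignment, invented for illustration only.
alignment = [
    "ACGT-ACGTA",
    "ACGTTACGTA",
    "ACCT-ACGTA",
    "ACGT-ACGAA",
]

variable, with_gap = site_summary(alignment)
gap_fraction = with_gap / variable if variable else 0.0
```

    Whether a multi-base indel should count as one character or as several adjacent sites is a coding decision that this sketch does not address; it simply counts columns.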

  8. GenomeCAT: a versatile tool for the analysis and integrative visualization of DNA copy number variants.

    Science.gov (United States)

    Tebel, Katrin; Boldt, Vivien; Steininger, Anne; Port, Matthias; Ebert, Grit; Ullmann, Reinhard

    2017-01-06

    The analysis of DNA copy number variants (CNV) has increasing impact in the field of genetic diagnostics and research. However, the interpretation of CNV data derived from high resolution array CGH or NGS platforms is complicated by the considerable variability of the human genome. Therefore, tools for multidimensional data analysis and comparison of patient cohorts are needed to assist in the discrimination of clinically relevant CNVs from others. We developed GenomeCAT, a standalone Java application for the analysis and integrative visualization of CNVs. GenomeCAT is composed of three modules dedicated to the inspection of single cases, comparative analysis of multidimensional data and group comparisons aiming at the identification of recurrent aberrations in patients sharing the same phenotype, respectively. Its flexible import options ease the comparative analysis of own results derived from microarray or NGS platforms with data from literature or public depositories. Multidimensional data obtained from different experiment types can be merged into a common data matrix to enable common visualization and analysis. All results are stored in the integrated MySQL database, but can also be exported as tab delimited files for further statistical calculations in external programs. GenomeCAT offers a broad spectrum of visualization and analysis tools that assist in the evaluation of CNVs in the context of other experiment data and annotations. The use of GenomeCAT does not require any specialized computer skills. The various R packages implemented for data analysis are fully integrated into GenomeCAT's graphical user interface and the installation process is supported by a wizard. The flexibility in terms of data import and export in combination with the ability to create a common data matrix makes the program also well suited as an interface between genomic data from heterogeneous sources and external software tools. Due to the modular architecture the functionality of

  9. MicroScope—an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data

    Science.gov (United States)

    Vallenet, David; Belda, Eugeni; Calteau, Alexandra; Cruveiller, Stéphane; Engelen, Stefan; Lajus, Aurélie; Le Fèvre, François; Longin, Cyrille; Mornico, Damien; Roche, David; Rouy, Zoé; Salvignol, Gregory; Scarpelli, Claude; Thil Smith, Adam Alexander; Weiman, Marion; Médigue, Claudine

    2013-01-01

    MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genomes annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest. PMID:23193269

  10. Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.

    Science.gov (United States)

    Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul

    2016-07-01

    Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.
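
    The motif-based screening mentioned in the abstract above can be sketched as a simple pattern scan over protein sequences. The abstract does not name specific motifs; the example below assumes the widely described G-X-S-X-G pentapeptide around the catalytic serine of many lipases and esterases, and the sequences and function name are invented for illustration. A motif hit alone is weak evidence and would normally be combined with homology searches.

```python
import re

# G-X-S-X-G, where X is any residue. The lookahead lets overlapping hits be found.
GXSXG = re.compile(r"(?=(G[A-Z]S[A-Z]G))")


def find_gxsxg(protein_seq):
    """Return (0-based start position, matched pentapeptide) for each motif hit."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in GXSXG.finditer(seq)]


# Hypothetical protein fragments, not taken from any real genome.
candidate = "MKTAGHSMGGLVAAGASLG"
non_candidate = "MKLVAAGTT"

hits = find_gxsxg(candidate)
no_hits = find_gxsxg(non_candidate)
```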

  11. Preserving genome integrity: the DdrA protein of Deinococcus radiodurans R1.

    Science.gov (United States)

    Harris, Dennis R; Tanaka, Masashi; Saveliev, Sergei V; Jolivet, Edmond; Earl, Ashlee M; Cox, Michael M; Battista, John R

    2004-10-01

    The bacterium Deinococcus radiodurans can withstand extraordinary levels of ionizing radiation, reflecting an equally extraordinary capacity for DNA repair. The hypothetical gene product DR0423 has been implicated in the recovery of this organism from DNA damage, indicating that this protein is a novel component of the D. radiodurans DNA repair system. DR0423 is a homologue of the eukaryotic Rad52 protein. Following exposure to ionizing radiation, DR0423 expression is induced relative to an untreated control, and strains carrying a deletion of the DR0423 gene exhibit increased sensitivity to ionizing radiation. When recovering from ionizing-radiation-induced DNA damage in the absence of nutrients, wild-type D. radiodurans reassembles its genome while the mutant lacking DR0423 function does not. In vitro, the purified DR0423 protein binds to single-stranded DNA with an apparent affinity for 3' ends, and protects those ends from nuclease degradation. We propose that DR0423 is part of a DNA end-protection system that helps to preserve genome integrity following exposure to ionizing radiation. We designate the DR0423 protein as DNA damage response A protein.

  12. Preserving genome integrity: the DdrA protein of Deinococcus radiodurans R1.

    Directory of Open Access Journals (Sweden)

    Dennis R Harris

    2004-10-01

    Full Text Available The bacterium Deinococcus radiodurans can withstand extraordinary levels of ionizing radiation, reflecting an equally extraordinary capacity for DNA repair. The hypothetical gene product DR0423 has been implicated in the recovery of this organism from DNA damage, indicating that this protein is a novel component of the D. radiodurans DNA repair system. DR0423 is a homologue of the eukaryotic Rad52 protein. Following exposure to ionizing radiation, DR0423 expression is induced relative to an untreated control, and strains carrying a deletion of the DR0423 gene exhibit increased sensitivity to ionizing radiation. When recovering from ionizing-radiation-induced DNA damage in the absence of nutrients, wild-type D. radiodurans reassembles its genome while the mutant lacking DR0423 function does not. In vitro, the purified DR0423 protein binds to single-stranded DNA with an apparent affinity for 3' ends, and protects those ends from nuclease degradation. We propose that DR0423 is part of a DNA end-protection system that helps to preserve genome integrity following exposure to ionizing radiation. We designate the DR0423 protein as DNA damage response A protein.

  13. Integrating the genomic architecture of human nucleolar organizer regions with the biophysical properties of nucleoli.

    Science.gov (United States)

    Mangan, Hazel; Gailín, Michael Ó; McStay, Brian

    2017-12-01

    Nucleoli are the sites of ribosome biogenesis and the largest membraneless subnuclear structures. They are intimately linked with growth and proliferation control and function as sensors of cellular stress. Nucleoli form around arrays of ribosomal gene (rDNA) repeats also called nucleolar organizer regions (NORs). In humans, NORs are located on the short arms of all five human acrocentric chromosomes. Multiple NORs contribute to the formation of large heterochromatin-surrounded nucleoli observed in most human cells. Here we will review recent findings about their genomic architecture. The dynamic nature of nucleoli began to be appreciated with the advent of photodynamic experiments using fluorescent protein fusions. We review more recent data on nucleoli in Xenopus germinal vesicles (GVs) which has revealed a liquid droplet-like behavior that facilitates nucleolar fusion. Further analysis in both Xenopus GVs and Drosophila embryos indicates that the internal organization of nucleoli is generated by a combination of liquid-liquid phase separation and active processes involving rDNA. We will attempt to integrate these recent findings with the genomic architecture of human NORs to advance our understanding of how nucleoli form and respond to stress in human cells. © 2017 Federation of European Biochemical Societies.

  14. Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

    Science.gov (United States)

    Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro

    2013-01-01

    The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while further continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293

  15. Reframed Genome-Scale Metabolic Model to Facilitate Genetic Design and Integration with Expression Data.

    Science.gov (United States)

    Gu, Deqing; Jian, Xingxing; Zhang, Cheng; Hua, Qiang

    2017-01-01

    Genome-scale metabolic network models (GEMs) have played important roles in the design of genetically engineered strains and helped biologists to decipher metabolism. However, due to the complex gene-reaction relationships that exist in model systems, most algorithms have limited capabilities with respect to directly predicting accurate genetic design for metabolic engineering. In particular, methods that predict reaction knockout strategies leading to overproduction are often impractical in terms of gene manipulations. Recently, we proposed a method named logical transformation of model (LTM) to simplify the gene-reaction associations by introducing intermediate pseudo reactions, which makes it possible to generate genetic design. Here, we propose an alternative method to relieve researchers from deciphering complex gene-reactions by adding pseudo gene controlling reactions. In comparison to LTM, this new method introduces fewer pseudo reactions and generates a much smaller model system named as gModel. We showed that gModel allows two seldom reported applications: identification of minimal genomes and design of minimal cell factories within a modified OptKnock framework. In addition, gModel could be used to integrate expression data directly and improve the performance of the E-Fmin method for predicting fluxes. In conclusion, the model transformation procedure will facilitate genetic research based on GEMs, extending their applications.

  16. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R.; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-01-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts’ knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  17. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi

    2016-12-24

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts’ knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  18. Inactivating UBE2M impacts the DNA damage response and genome integrity involving multiple cullin ligases.

    Directory of Open Access Journals (Sweden)

    Scott Cukras

    Protein neddylation is involved in a wide variety of cellular processes. Here we show that the DNA damage response is perturbed in cells inactivated with an E2 Nedd8 conjugating enzyme UBE2M, measured by RAD51 foci formation kinetics and cell based DNA repair assays. UBE2M knockdown increases DNA breakages and cellular sensitivity to DNA damaging agents, further suggesting heightened genomic instability and defective DNA repair activity. Investigating the downstream Cullin targets of UBE2M revealed that silencing of Cullin 1, 2, and 4 ligases incurred significant DNA damage. In particular, UBE2M knockdown, or defective neddylation of Cullin 2, leads to a blockade in the G1 to S progression and is associated with delayed S-phase dependent DNA damage response. Cullin 4 inactivation leads to an aberrantly high DNA damage response that is associated with increased DNA breakages and sensitivity of cells to DNA damaging agents, suggesting a DNA repair defect is associated. siRNA interrogation of key Cullin substrates show that CDT1, p21, and Claspin are involved in elevated DNA damage in the UBE2M knockdown cells. Therefore, UBE2M is required to maintain genome integrity by activating multiple Cullin ligases throughout the cell cycle.

  19. Inactivating UBE2M impacts the DNA damage response and genome integrity involving multiple cullin ligases.

    Science.gov (United States)

    Cukras, Scott; Morffy, Nicholas; Ohn, Takbum; Kee, Younghoon

    2014-01-01

    Protein neddylation is involved in a wide variety of cellular processes. Here we show that the DNA damage response is perturbed in cells inactivated with an E2 Nedd8 conjugating enzyme UBE2M, measured by RAD51 foci formation kinetics and cell based DNA repair assays. UBE2M knockdown increases DNA breakages and cellular sensitivity to DNA damaging agents, further suggesting heightened genomic instability and defective DNA repair activity. Investigating the downstream Cullin targets of UBE2M revealed that silencing of Cullin 1, 2, and 4 ligases incurred significant DNA damage. In particular, UBE2M knockdown, or defective neddylation of Cullin 2, leads to a blockade in the G1 to S progression and is associated with delayed S-phase dependent DNA damage response. Cullin 4 inactivation leads to an aberrantly high DNA damage response that is associated with increased DNA breakages and sensitivity of cells to DNA damaging agents, suggesting a DNA repair defect is associated. siRNA interrogation of key Cullin substrates show that CDT1, p21, and Claspin are involved in elevated DNA damage in the UBE2M knockdown cells. Therefore, UBE2M is required to maintain genome integrity by activating multiple Cullin ligases throughout the cell cycle.

  20. Integrated genomics and proteomics of the Torpedo californica electric organ: concordance with the mammalian neuromuscular junction

    Directory of Open Access Journals (Sweden)

    Mate Suzanne E

    2011-05-01

    Background: During development, the branchial mesoderm of Torpedo californica transdifferentiates into an electric organ capable of generating high voltage discharges to stun fish. The organ contains a high density of cholinergic synapses and has served as a biochemical model for the membrane specialization of myofibers, the neuromuscular junction (NMJ). We studied the genome and proteome of the electric organ to gain insight into its composition, to determine if there is concordance with skeletal muscle and the NMJ, and to identify novel synaptic proteins. Results: Of 435 proteins identified, 300 mapped to Torpedo cDNA sequences with ≥2 peptides. We identified 14 uncharacterized proteins in the electric organ that are known to play a role in acetylcholine receptor clustering or signal transduction. In addition, two human open reading frames, C1orf123 and C6orf130, showed high sequence similarity to electric organ proteins. Our profile lists several proteins that are highly expressed in skeletal muscle or are muscle specific. Synaptic proteins such as acetylcholinesterase, acetylcholine receptor subunits, and rapsyn were present in the electric organ proteome but absent in the skeletal muscle proteome. Conclusions: Our integrated genomic and proteomic analysis supports research describing a muscle-like profile of the organ. We show that it is a repository of NMJ proteins but we present limitations on its use as a comprehensive model of the NMJ. Finally, we identified several proteins that may become candidates for signaling proteins not previously characterized as components of the NMJ.

  1. Integrative Genomic Analysis of Cholangiocarcinoma Identifies Distinct IDH-Mutant Molecular Profiles

    Directory of Open Access Journals (Sweden)

    Farshad Farshidfar

    2017-03-01

    Cholangiocarcinoma (CCA) is an aggressive malignancy of the bile ducts, with poor prognosis and limited treatment options. Here, we describe the integrated analysis of somatic mutations, RNA expression, copy number, and DNA methylation by The Cancer Genome Atlas of a set of predominantly intrahepatic CCA cases and propose a molecular classification scheme. We identified an IDH mutant-enriched subtype with distinct molecular features including low expression of chromatin modifiers, elevated expression of mitochondrial genes, and increased mitochondrial DNA copy number. Leveraging the multi-platform data, we observed that ARID1A exhibited DNA hypermethylation and decreased expression in the IDH mutant subtype. More broadly, we found that IDH mutations are associated with an expanded histological spectrum of liver tumors with molecular features that stratify with CCA. Our studies reveal insights into the molecular pathogenesis and heterogeneity of cholangiocarcinoma and provide classification information of potential therapeutic significance.

  2. Integrating population genetics and conservation biology in the era of genomics.

    Science.gov (United States)

    Ouborg, N Joop

    2010-02-23

    As one of the final activities of the ESF-CONGEN Networking programme, a conference entitled 'Integrating Population Genetics and Conservation Biology' was held at Trondheim, Norway, from 23 to 26 May 2009. Conference speakers and poster presenters gave a display of the state-of-the-art developments in the field of conservation genetics. Over the five-year running period of the successful ESF-CONGEN Networking programme, much progress has been made in theoretical approaches, basic research on inbreeding depression and other genetic processes associated with habitat fragmentation and conservation issues, and with applying principles of conservation genetics in the conservation of many species. Future perspectives were also discussed in the conference, and it was concluded that conservation genetics is evolving into conservation genomics, while at the same time basic and applied research on threatened species and populations from a population genetic point of view continues to be emphasized.

  3. Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis.

    Science.gov (United States)

    Zhou, Yan; Wang, Pei; Wang, Xianlong; Zhu, Ji; Song, Peter X-K

    2017-01-01

    The multivariate regression model is a useful tool to explore complex associations between two kinds of molecular markers, which enables the understanding of the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology, the sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. This proposed method not only allows us to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented by the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared to the existing methods through extensive simulation studies. Our results show that accounting for latent factors through the proposed smFARM can improve sensitivity of signal detection and accuracy of sparse association map estimation. We illustrate smFARM by two integrative genomics analysis examples, a breast cancer dataset, and an ovarian cancer dataset, to assess the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12 whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer. © 2016 WILEY PERIODICALS, INC.

  4. DNA damage response and spindle assembly checkpoint function throughout the cell cycle to ensure genomic integrity.

    Directory of Open Access Journals (Sweden)

    Katherine S Lawrence

    2015-04-01

    Errors in replication or segregation lead to DNA damage, mutations, and aneuploidies. Consequently, cells monitor these events and delay progression through the cell cycle so repair precedes division. The DNA damage response (DDR), which monitors DNA integrity, and the spindle assembly checkpoint (SAC), which responds to defects in spindle attachment/tension during metaphase of mitosis and meiosis, are critical for preventing genome instability. Here we show that the DDR and SAC function together throughout the cell cycle to ensure genome integrity in C. elegans germ cells. Metaphase defects result in enrichment of SAC and DDR components to chromatin, and both SAC and DDR are required for metaphase delays. During persistent metaphase arrest following establishment of bi-oriented chromosomes, stability of the metaphase plate is compromised in the absence of DDR kinases ATR or CHK1 or SAC components, MAD1/MAD2, suggesting SAC functions in metaphase beyond its interactions with APC activator CDC20. In response to DNA damage, MAD2 and the histone variant CENPA become enriched at the nuclear periphery in a DDR-dependent manner. Further, depletion of either MAD1 or CENPA results in loss of peripherally associated damaged DNA. In contrast to a SAC-insensitive CDC20 mutant, germ cells deficient for SAC or CENPA cannot efficiently repair DNA damage, suggesting that SAC mediates DNA repair through CENPA interactions with the nuclear periphery. We also show that replication perturbations result in relocalization of MAD1/MAD2 in human cells, suggesting that the role of SAC in DNA repair is conserved.

  5. DNA-PKcs, ATM, and ATR Interplay Maintains Genome Integrity during Neurogenesis.

    Science.gov (United States)

    Enriquez-Rios, Vanessa; Dumitrache, Lavinia C; Downing, Susanna M; Li, Yang; Brown, Eric J; Russell, Helen R; McKinnon, Peter J

    2017-01-25

    The DNA damage response (DDR) orchestrates a network of cellular processes that integrates cell-cycle control and DNA repair or apoptosis, which serves to maintain genome stability. DNA-PKcs (the catalytic subunit of the DNA-dependent kinase, encoded by PRKDC), ATM (ataxia telangiectasia, mutated), and ATR (ATM and Rad3-related) are related PI3K-like protein kinases and central regulators of the DDR. Defects in these kinases have been linked to neurodegenerative or neurodevelopmental syndromes. In all cases, the key neuroprotective function of these kinases is uncertain. It also remains unclear how interactions between the three DNA damage-responsive kinases coordinate genome stability, particularly in a physiological context. Here, we used a genetic approach to identify the neural function of DNA-PKcs and the interplay between ATM and ATR during neurogenesis. We found that DNA-PKcs loss in the mouse sensitized neuronal progenitors to apoptosis after ionizing radiation because of excessive DNA damage. DNA-PKcs was also required to prevent endogenous DNA damage accumulation throughout the adult brain. In contrast, ATR coordinated the DDR during neurogenesis to direct apoptosis in cycling neural progenitors, whereas ATM regulated apoptosis in both proliferative and noncycling cells. We also found that ATR controls a DNA damage-induced G2/M checkpoint in cortical progenitors, independent of ATM and DNA-PKcs. These nonoverlapping roles were further confirmed via sustained murine embryonic or cortical development after all three kinases were simultaneously inactivated. Thus, our results illustrate how DNA-PKcs, ATM, and ATR have unique and essential roles during the DDR, collectively ensuring comprehensive genome maintenance in the nervous system. The DNA damage response (DDR) is essential for prevention of a broad spectrum of different human neurologic diseases. However, a detailed understanding of the DDR at a physiological level is lacking. 
In contrast to many in

  6. Integration of heterologous DNA into the genome of Paracoccus denitrificans is mediated by a family of IS1248-related elements and a second type of integrative recombination event

    NARCIS (Netherlands)

    Van Spanning, R J; Reijnders, W N; Stouthamer, A.H.

    All members of the IS1248 family residing in the genome of Paracoccus denitrificans have been isolated by using a set of insertion sequence entrapment vectors. The family consists of five closely related members that integrate the entrapment vectors at distinct sites. One of these, IS1248b, was

  7. Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1

    Energy Technology Data Exchange (ETDEWEB)

    Verhaak, Roel GW; Hoadley, Katherine A; Purdom, Elizabeth; Wang, Victoria; Qi, Yuan; Wilkerson, Matthew D; Miller, C Ryan; Ding, Li; Golub, Todd; Mesirov, Jill P; Alexe, Gabriele; Lawrence, Michael; O'Kelly, Michael; Tamayo, Pablo; Weir, Barbara A; Gabriel, Stacey; Winckler, Wendy; Gupta, Supriya; Jakkula, Lakshmi; Feiler, Heidi S; Hodgson, J Graeme; James, C David; Sarkaria, Jann N; Brennan, Cameron; Kahn, Ari; Spellman, Paul T; Wilson, Richard K; Speed, Terence P; Gray, Joe W; Meyerson, Matthew; Getz, Gad; Perou, Charles M; Hayes, D Neil; The Cancer Genome Atlas Research Network

    2009-09-03

    The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.

  8. The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

    Directory of Open Access Journals (Sweden)

    Adam Alexander Thil Smith

    2012-05-01

    Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
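
    The first step described above (locating "genomic metabolons") can be illustrated with a minimal sketch. This is not the CanOE implementation: the data layout, the function name find_metabolons, the max_gap threshold and the toy genes and reactions are all assumptions made for illustration. Genes are linked when they lie close together on the genome and their reactions share at least one metabolite, and each connected group of two or more genes is reported as a metabolon.

```python
from itertools import combinations

# Toy data (hypothetical): gene -> (position on the genome, reactions it encodes)
GENES = {
    "g1": (1, {"R1"}),
    "g2": (2, {"R2"}),
    "g3": (3, set()),    # un-annotated gene: no reaction assigned
    "g4": (40, {"R3"}),
}
# Toy data (hypothetical): reaction -> metabolites it consumes or produces
REACTIONS = {"R1": {"A", "B"}, "R2": {"B", "C"}, "R3": {"C", "D"}}


def find_metabolons(genes, reactions, max_gap=2):
    """Group co-localized genes whose reactions share a metabolite.

    max_gap is an assumed co-localization threshold in gene positions.
    """
    def metabolites(gene_id):
        mets = set()
        for rxn in genes[gene_id][1]:
            mets |= reactions[rxn]
        return mets

    # Build an undirected graph: an edge means "close together and sharing a metabolite".
    neighbours = {g: set() for g in genes}
    for a, b in combinations(genes, 2):
        close = abs(genes[a][0] - genes[b][0]) <= max_gap
        if close and metabolites(a) & metabolites(b):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components with at least two genes are reported as metabolons.
    seen, metabolons = set(), []
    for start in genes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            g = stack.pop()
            if g in component:
                continue
            component.add(g)
            stack.extend(neighbours[g] - component)
        seen |= component
        if len(component) > 1:
            metabolons.append(sorted(component))
    return metabolons


if __name__ == "__main__":
    print(find_metabolons(GENES, REACTIONS))
```

    In this toy genome, g1 and g2 form a metabolon because they are adjacent and share metabolite B, whereas g4 shares metabolite C with g2 but lies too far away under the assumed threshold. Un-annotated neighbours such as g3 are the kind of gene that the later steps would consider as candidates for gene-less reactions.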

  9. Phylogeny and molecular signatures (conserved proteins and indels) that are specific for the Bacteroidetes and Chlorobi species

    Directory of Open Access Journals (Sweden)

    Lorenzini Emily

    2007-05-01

    Background: The Bacteroidetes and Chlorobi species constitute two main groups of the Bacteria that are closely related in phylogenetic trees. The Bacteroidetes species are widely distributed and include many important periodontal pathogens. In contrast, all Chlorobi are anoxygenic obligate photoautotrophs. Very few (or no) biochemical or molecular characteristics are known that are distinctive characteristics of these bacteria, or are commonly shared by them. Results: Systematic BLAST searches were performed on each open reading frame in the genomes of Porphyromonas gingivalis W83, Bacteroides fragilis YCH46, B. thetaiotaomicron VPI-5482, Gramella forsetii KT0803, Chlorobium luteolum (formerly Pelodictyon luteolum) DSM 273 and Chlorobaculum tepidum (formerly Chlorobium tepidum) TLS to search for proteins that are uniquely present in either all or certain subgroups of Bacteroidetes and Chlorobi. These studies have identified > 600 proteins for which homologues are not found in other organisms. This includes 27 and 51 proteins that are specific for most of the sequenced Bacteroidetes and Chlorobi genomes, respectively; 52 and 38 proteins that are limited to species from the Bacteroidales and Flavobacteriales orders, respectively, and 5 proteins that are common to species from these two orders; 185 proteins that are specific for the Bacteroides genus. Additionally, 6 proteins that are uniquely shared by species from the Bacteroidetes and Chlorobi phyla (one of them also present in the Fibrobacteres) have also been identified. This work also describes two large conserved inserts in DNA polymerase III (DnaE) and alanyl-tRNA synthetase that are distinctive characteristics of the Chlorobi species and a 3 aa deletion in ClpB chaperone that is mainly found in various Bacteroidales, Flavobacteriales and Flexebacteraceae, but generally not found in the homologs from other organisms. 
Phylogenetic analyses of the Bacteroidetes and Chlorobi species is also

  10. EasyCloneMulti: A Set of Vectors for Simultaneous and Multiple Genomic Integrations in Saccharomyces cerevisiae

    DEFF Research Database (Denmark)

    Maury, Jerome; Germann, Susanne Manuela; Jacobsen, Simo Abdessamad

    2016-01-01

    Saccharomyces cerevisiae is widely used in the biotechnology industry for production of ethanol, recombinant proteins, food ingredients and other chemicals. In order to generate highly producing and stable strains, genome integration of genes encoding metabolic pathway enzymes is the preferred...... of integrative vectors, EasyCloneMulti, that enables multiple and simultaneous integration of genes in S. cerevisiae. By creating vector backbones that combine consensus sequences that aim at targeting subsets of Ty sequences and a quickly degrading selective marker, integrations at multiple genomic loci...... and a range of expression levels were obtained, as assessed with the green fluorescent protein (GFP) reporter system. The EasyCloneMulti vector set was applied to balance the expression of the rate-controlling step in the β-alanine pathway for biosynthesis of 3-hydroxypropionic acid (3HP). The best 3HP...

  11. Integration of least angle regression with empirical Bayes for multi-locus genome-wide association studies

    Science.gov (United States)

    Multi-locus genome-wide association studies have become the state-of-the-art procedure to identify quantitative trait loci (QTL) associated with traits simultaneously. However, implementation of multi-locus models is still difficult. In this study, we integrated least angle regression with empirical B...

  12. A robust network of double-strand break repair pathways governs genome integrity during C. elegans development.

    NARCIS (Netherlands)

    Pontier, D.B.; Tijsterman, M.

    2009-01-01

    To preserve genomic integrity, various mechanisms have evolved to repair DNA double-strand breaks (DSBs). Depending on cell type or cell cycle phase, DSBs can be repaired error-free, by homologous recombination, or with concomitant loss of sequence information, via nonhomologous end-joining (NHEJ)

  13. Integrating Diverse Types of Genomic Data to Identify Genes that Underlie Adverse Pregnancy Phenotypes.

    Directory of Open Access Journals (Sweden)

    Jibril Hirbo

    Progress in understanding complex genetic diseases has been bolstered by synthetic approaches that overlay diverse data types and analyses to identify functionally important genes. Pre-term birth (PTB), a major complication of pregnancy, is a leading cause of infant mortality worldwide. A major obstacle in addressing PTB is that the mechanisms controlling parturition and birth timing remain poorly understood. Integrative approaches that overlay datasets derived from comparative genomics with function-derived ones have potential to advance our understanding of the genetics of birth timing, and thus provide insights into the genes that may contribute to PTB. We intersected data from fast evolving coding and non-coding gene regions in the human and primate lineage with data from genes expressed in the placenta, from genes that show enriched expression only in the placenta, as well as from genes that are differentially expressed in four distinct PTB clinical subtypes. A large fraction of genes that are expressed in placenta, and differentially expressed in PTB clinical subtypes (23-34%) are fast evolving, and are associated with functions that include adhesion, neurodevelopmental and immune processes. Functional categories of genes that express fast evolution in coding regions differ from those linked to fast evolution in non-coding regions. Finally, there is a surprising lack of overlap between fast evolving genes that are differentially expressed in four PTB clinical subtypes. Integrative approaches, especially those that incorporate evolutionary perspectives, can be successful in identifying potential genetic contributions to complex genetic diseases, such as PTB.

  14. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration

    KAUST Repository

    Suzuki, Keiichiro

    2016-11-15

    Targeted genome editing via engineered nucleases is an exciting area of biomedical research and holds potential for clinical applications. Despite rapid advances in the field, in vivo targeted transgene integration is still infeasible because current tools are inefficient, especially for non-dividing cells, which compose most adult tissues. This poses a barrier for uncovering fundamental biological principles and developing treatments for a broad range of genetic disorders. Based on clustered regularly interspaced short palindromic repeat/Cas9 (CRISPR/Cas9) technology, here we devise a homology-independent targeted integration (HITI) strategy, which allows for robust DNA knock-in in both dividing and non-dividing cells in vitro and, more importantly, in vivo (for example, in neurons of postnatal mammals). As a proof of concept of its therapeutic potential, we demonstrate the efficacy of HITI in improving visual function using a rat model of the retinal degeneration condition retinitis pigmentosa. The HITI method presented here establishes new avenues for basic research and targeted gene therapies.

  15. Papillomavirus genomes in human cervical carcinoma: Analysis of their integration and transcriptional activity

    International Nuclear Information System (INIS)

    Matulic, M.; Soric, J.

    1994-01-01

    Eighty-four biopsies derived from cervical tissues were analyzed for the presence of human papillomavirus (HPV) DNA types 6, 16 and 18 using Southern blot hybridization. HPV 6 was found in none of the cervical biopsies, and HPV types 16 and 18 were found in 44% of them. The rate of HPV 16/18 positive samples increased proportionally to the severity of the lesion. In normal tissue there were no positive samples, in mild and moderate dysplasia HPV 16/18 was present in 20% and in severe dysplasia and invasive carcinomas in 37 and 50%, respectively. In biopsies from 13 cases with squamous cell carcinoma of the uterine cervix and CIN III lesions HPV 16 was integrated within the host genome. It was concluded that the virus could be integrated at variable, presumably randomly selected chromosomal loci and with different number of copies. Transcription of HPV 16 and 18 was detected in one cervical cancer and in HeLa cells, respectively. These results imply that HPV types 16 and 18 play an etiological role in the carcinogenesis of human cervical epithelial cells. (author)

  16. Discovery of Cellular Proteins Required for the Early Steps of HCV Infection Using Integrative Genomics

    Science.gov (United States)

    Yang, Jae-Seong; Kwon, Oh Sung; Kim, Sanguk; Jang, Sung Key

    2013-01-01

    Successful viral infection requires intimate communication between virus and host cell, a process that absolutely requires various host proteins. However, current efforts to discover novel host proteins as therapeutic targets for viral infection are difficult. Here, we developed an integrative-genomics approach to predict human genes involved in the early steps of hepatitis C virus (HCV) infection. By integrating HCV and human protein associations, co-expression data, and tight junction-tetraspanin web specific networks, we identified host proteins required for the early steps in HCV infection. Moreover, we validated the roles of newly identified proteins in HCV infection by knocking down their expression using small interfering RNAs. Specifically, a novel host factor CD63 was shown to directly interact with HCV E2 protein. We further demonstrated that an antibody against CD63 blocked HCV infection, indicating that CD63 may serve as a new therapeutic target for HCV-related diseases. The candidate gene list provides a source for identification of new therapeutic targets. PMID:23593195

  17. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics

    Science.gov (United States)

    Cooper, Laurel; Meier, Austin; Laporte, Marie-Angélique; Elser, Justin L; Mungall, Chris; Sinn, Brandon T; Cavaliere, Dario; Carbon, Seth; Dunn, Nathan A; Smith, Barry; Qu, Botong; Preece, Justin; Zhang, Eugene; Todorovic, Sinisa; Gkoutos, Georgios; Doonan, John H; Stevenson, Dennis W; Arnaud, Elizabeth

    2018-01-01

    The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository. PMID:29186578

  18. Genetic counselors' views and experiences with the clinical integration of genome sequencing.

    Science.gov (United States)

    Machini, Kalotina; Douglas, Jessica; Braxton, Alicia; Tsipis, Judith; Kramer, Kate

    2014-08-01

    In recent years, new sequencing technologies known as next generation sequencing (NGS) have provided scientists the ability to rapidly sequence all known coding as well as non-coding sequences in the human genome. As the two emerging approaches, whole exome (WES) and whole genome (WGS) sequencing, have started to be integrated in the clinical arena, we sought to survey health care professionals who are likely to be involved in the implementation process now and/or in the future (e.g., genetic counselors, geneticists and nurse practitioners). Two hundred twenty-one genetic counselors, one third of whom currently offer WES/WGS, participated in an anonymous online survey. The aims of the survey were first, to identify barriers to the implementation of WES/WGS, as perceived by survey participants; second, to provide the first systematic report of current practices regarding the integration of WES/WGS in clinic and/or research across the US and Canada and to illuminate the roles and challenges of genetic counselors participating in this process; and third, to evaluate the impact of WES/WGS on patient care. Our results showed that genetic counseling practices with respect to WES/WGS are consistent with the criteria set forth in the ACMG 2012 policy statement, which highlights indications for testing, reporting, and pre/post test considerations. Our respondents described challenges related to offering WES/WGS, which included billing issues, the duration and content of the consent process, result interpretation and disclosure of incidental findings and variants of unknown significance. In addition, respondents indicated that specialty area (i.e., prenatal and cancer), lack of clinical utility of WES/WGS and concerns about interpretation of test results were factors that prevented them from offering this technology to patients. Finally, study participants identified the aspects of their professional training which have been most beneficial in aiding with the integration of

  19. Damaging the Integrated HIV Proviral DNA with TALENs.

    Directory of Open Access Journals (Sweden)

    Christy L Strong

    HIV-1 integrates its proviral DNA genome into the host genome, presenting barriers for virus eradication. Several new gene-editing technologies have emerged that could potentially be used to damage integrated proviral DNA. In this study, we use transcription activator-like effector nucleases (TALENs) to target a highly conserved sequence in the transactivation response element (TAR) of the HIV-1 proviral DNA. We demonstrated that TALENs cleave a DNA template with the HIV-1 proviral target site in vitro. A GFP reporter, under control of HIV-1 TAR, was efficiently inactivated by mutations introduced by transfection of TALEN plasmids. When infected cells containing the full-length integrated HIV-1 proviral DNA were transfected with TALENs, the TAR region accumulated indels. When one of these mutants was tested, the mutated HIV-1 proviral DNA was incapable of producing detectable Gag expression. TALEN variants engineered for degenerate recognition of select nucleotide positions also cleaved proviral DNA in vitro and the full-length integrated proviral DNA genome in living cells. These results suggest a possible design strategy for the therapeutic considerations of incomplete target sequence conservation and acquired resistance mutations. We have established a new strategy for damaging integrated HIV proviral DNA that may have future potential for HIV-1 proviral DNA eradication.

  20. Gene disruptions using P transposable elements: an integral component of the Drosophila genome project.

    OpenAIRE

    Spradling, A C; Stern, D M; Kiss, I; Roote, J; Laverty, T; Rubin, G M

    1995-01-01

    Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and a...

  1. Selectable tolerance to herbicides by mutated acetolactate synthase genes integrated into the chloroplast genome of tobacco.

    Science.gov (United States)

    Shimizu, Masanori; Goto, Maki; Hanai, Moeko; Shimizu, Tsutomu; Izawa, Norihiko; Kanamoto, Hirosuke; Tomizawa, Ken-Ichi; Yokota, Akiho; Kobayashi, Hirokazu

    2008-08-01

    Strategies employed for the production of genetically modified (GM) crops are premised on (1) the avoidance of gene transfer in the field; (2) the use of genes derived from edible organisms such as plants; (3) preventing the appearance of herbicide-resistant weeds; and (4) maintaining transgenes without obstructing plant cell propagation. To this end, we developed a novel vector system for chloroplast transformation with acetolactate synthase (ALS). ALS catalyzes the first step in the biosynthesis of the branched-chain amino acids, and its enzymatic activity is inhibited by certain classes of herbicides. We generated a series of Arabidopsis (Arabidopsis thaliana) mutated ALS (mALS) genes and introduced constructs with mALS and the aminoglycoside 3'-adenyltransferase gene (aadA) into the tobacco (Nicotiana tabacum) chloroplast genome by particle bombardment. Transplastomic plants were selected using their resistance to spectinomycin. The effects of herbicides on transplastomic mALS activity were examined by a colorimetric assay using the leaves of transplastomic plants. We found that transplastomic G121A, A122V, and P197S plants were specifically tolerant to pyrimidinylcarboxylate, imidazolinone, and sulfonylurea/pyrimidinylcarboxylate herbicides, respectively. Transplastomic plants possessing mALSs were able to grow in the presence of various herbicides, thus affirming the relationship between mALSs and the associated resistance to herbicides. Our results show that mALS genes integrated into the chloroplast genome are useful sustainable markers that function to exclude plants other than those that are GM while maintaining transplastomic crops. This investigation suggests that the resistance management of weeds in the field amid growing GM crops is possible using (1) a series of mALSs that confer specific resistance to herbicides and (2) a strategy that employs herbicide rotation.

  2. Integration of association statistics over genomic regions using Bayesian adaptive regression splines

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohua

    2003-11-01

    In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed: testing single loci, or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, are becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (i.e. it achieves the specified size of the test) and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.

  3. Evidence for Integrity of Parental Genomes in the Diploid Hybridogenetic Water Frog Pelophylax esculentus by Genomic in situ Hybridization

    Czech Academy of Sciences Publication Activity Database

    Zalésna, A.; Choleva, Lukáš; Ogielska, M.; Rábová, Marie; Marec, František; Ráb, Petr

    2011-01-01

    Vol. 134, No. 3 (2011), pp. 206-212 ISSN 1424-8581 R&D Projects: GA MŠk LC06073; GA ČR GA523/09/2106 Institutional research plan: CEZ:AV0Z50450515; CEZ:AV0Z50070508 Keywords: Amphibia * Chromosomes * Genomic in situ hybridization (GISH) Subject RIV: EB - Genetics; Molecular Biology Impact factor: 1.533, year: 2011

  4. Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles.

    Science.gov (United States)

    Stepanauskas, Ramunas; Fergusson, Elizabeth A; Brown, Joseph; Poulton, Nicole J; Tupper, Ben; Labonté, Jessica M; Becraft, Eric D; Brown, Julia M; Pachiadaki, Maria G; Povilaitis, Tadas; Thompson, Brian P; Mascena, Corianna J; Bellows, Wendy K; Lubys, Arvydas

    2017-07-20

    Microbial single-cell genomics can be used to provide insights into the metabolic potential, interactions, and evolution of uncultured microorganisms. Here we present WGA-X, a method based on multiple displacement amplification of DNA that utilizes a thermostable mutant of the phi29 polymerase. WGA-X enhances genome recovery from individual microbial cells and viral particles while maintaining ease of use and scalability. The greatest improvements are observed when amplifying high G+C content templates, such as those belonging to the predominant bacteria in agricultural soils. By integrating WGA-X with calibrated index-cell sorting and high-throughput genomic sequencing, we are able to analyze genomic sequences and cell sizes of hundreds of individual, uncultured bacteria, archaea, protists, and viral particles, obtained directly from marine and soil samples, in a single experiment. This approach may find diverse applications in microbiology and in biomedical and forensic studies of humans and other multicellular organisms. Single-cell genomics can be used to study uncultured microorganisms. Here, Stepanauskas et al. present a method combining improved multiple displacement amplification and FACS, to obtain genomic sequences and cell size information from uncultivated microbial cells and viral particles in environmental samples.

  5. Integrate genome-based assessment of safety for probiotic strains: Bacillus coagulans GBI-30, 6086 as a case study.

    Science.gov (United States)

    Salvetti, Elisa; Orrù, Luigi; Capozzi, Vittorio; Martina, Alessia; Lamontanara, Antonella; Keller, David; Cash, Howard; Felis, Giovanna E; Cattivelli, Luigi; Torriani, Sandra; Spano, Giuseppe

    2016-05-01

    Probiotics are microorganisms that confer beneficial effects on the host; nevertheless, before being allowed for human consumption, their safety must be verified with accurate protocols. In the genomic era, such procedures should take into account the genomic-based approaches. This study aims at assessing the safety traits of Bacillus coagulans GBI-30, 6086 integrating the most updated genomics-based procedures and conventional phenotypic assays. Special attention was paid to putative virulence factors (VF), antibiotic resistance (AR) genes and genes encoding enzymes responsible for harmful metabolites (i.e. biogenic amines, BAs). This probiotic strain was phenotypically resistant to streptomycin and kanamycin, although the genome analysis suggested that the AR-related genes were not easily transferrable to other bacteria, and no other genes with potential safety risks, such as those related to VF or BA production, were retrieved. Furthermore, no unstable elements that could potentially lead to genomic rearrangements were detected. Moreover, a workflow is proposed to allow the proper taxonomic identification of a microbial strain and the accurate evaluation of risk-related gene traits, combining whole genome sequencing analysis with updated bioinformatics tools and standard phenotypic assays. The workflow presented can be generalized as a guideline for the safety investigation of novel probiotic strains to help stakeholders (from scientists to manufacturers and consumers) to meet regulatory requirements and avoid misleading information.

  6. IMGMD: A platform for the integration and standardisation of In silico Microbial Genome-scale Metabolic Models.

    Science.gov (United States)

    Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming

    2017-04-07

    Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.

  7. BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics

    DEFF Research Database (Denmark)

    Zhao, Wenming; Wang, Jing; He, Ximiao

    2004-01-01

    Rice is a major food staple for the world's population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoting itself to sequencing, information analysis and biological research of the rice and other crop genomes. In order to facilitate....... Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (http://rise.genomics.org.cn/). Publication date: 2004-Jan-1...

  8. MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data.

    Science.gov (United States)

    Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David

    2017-09-12

    The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original, recently developed method (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to maintaining accurate resources. © The Author 2017. Published by Oxford University Press.

  9. The humankind genome: from genetic diversity to the origin of human diseases.

    Science.gov (United States)

    Belizário, Jose E

    2013-12-01

    Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. The underlying reasons for this failure are explained by recent studies of resequencing and comparison of over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and full characterization of coding and noncoding RNAs (transcriptome) being transcribed. These studies have provided the most comprehensive catalogues of functional elements and genetic variants that are now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses for the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels as the cause of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on humans and model organism phenotypes, we will be able to find candidate genes and new clues for disease's etiology and treatment. This article describes essential concepts in genetics and genomic technologies as well as the emerging computational framework to comprehensively search websites and platforms available for the analysis and interpretation of genomic data.

  10. Integrated Genomics Reveals Convergent Transcriptomic Networks Underlying Chronic Obstructive Pulmonary Disease and Idiopathic Pulmonary Fibrosis.

    Science.gov (United States)

    Kusko, Rebecca L; Brothers, John F; Tedrow, John; Pandit, Kusum; Huleihel, Luai; Perdomo, Catalina; Liu, Gang; Juan-Guardela, Brenda; Kass, Daniel; Zhang, Sherry; Lenburg, Marc; Martinez, Fernando; Quackenbush, John; Sciurba, Frank; Limper, Andrew; Geraci, Mark; Yang, Ivana; Schwartz, David A; Beane, Jennifer; Spira, Avrum; Kaminski, Naftali

    2016-10-15

    Despite shared environmental exposures, idiopathic pulmonary fibrosis (IPF) and chronic obstructive pulmonary disease are usually studied in isolation, and the presence of shared molecular mechanisms is unknown. We applied an integrative genomic approach to identify convergent transcriptomic pathways in emphysema and IPF. We defined the transcriptional repertoire of chronic obstructive pulmonary disease, IPF, or normal histology lungs using RNA-seq (n = 87). Genes increased in both emphysema and IPF relative to control were enriched for the p53/hypoxia pathway, a finding confirmed in an independent cohort using both gene expression arrays and the nCounter Analysis System (n = 193). Immunohistochemistry confirmed overexpression of HIF1A, MDM2, and NFKBIB, members of this pathway, in tissues from patients with emphysema or IPF. Using reads aligned across splice junctions, we determined that alternative splicing of p53/hypoxia pathway-associated molecules NUMB and PDGFA occurred more frequently in IPF or emphysema compared with control and validated these findings by quantitative polymerase chain reaction and the nCounter Analysis System on an independent sample set (n = 193). Finally, by integrating parallel microRNA and mRNA-Seq data on the same samples, we identified MIR96 as a key novel regulatory hub in the p53/hypoxia gene-expression network and confirmed that modulation of MIR96 in vitro recapitulates the disease-associated gene-expression network. Our results suggest convergent transcriptional regulatory hubs in diseases as varied phenotypically as chronic obstructive pulmonary disease and IPF and suggest that these hubs may represent shared key responses of the lung to environmental stresses.

  11. Predicting co-complexed protein pairs using genomic and proteomic data integration

    Directory of Open Access Journals (Sweden)

    King Oliver D

    2004-04-01

    Background Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationship. Results Using a supervised machine learning approach, a probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. Conclusions We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.

  12. Automated integration of genomic physical mapping data via parallel simulated annealing

    Energy Technology Data Exchange (ETDEWEB)

    Slezak, T.

    1994-06-01

    The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, BAC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
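
    The column-ordering formulation in this abstract can be illustrated with a small, single-process sketch. It is not the LLNL software: the function names (count_gaps, anneal), the swap move, the geometric cooling schedule and all parameter values are assumptions chosen for illustration, and the FISH partial-order constraints and the parallel server/client layer mentioned in the abstract are not modeled.

```python
import math
import random


def count_gaps(matrix, order):
    """Total gaps over all rows: (runs of 1s - 1) per row, with columns read in `order`."""
    gaps = 0
    for row in matrix:
        runs = 0
        prev = 0
        for col in order:
            if row[col] and not prev:
                runs += 1  # a new run of intersecting clones starts here
            prev = row[col]
        gaps += max(runs - 1, 0)
    return gaps


def anneal(matrix, n_cols, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Search for a column order with few gaps (hypothetical parameters, n_cols >= 2)."""
    rng = random.Random(seed)
    order = list(range(n_cols))
    rng.shuffle(order)
    cost = count_gaps(matrix, order)
    best_order, best_cost = order[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n_cols), 2)
        order[i], order[j] = order[j], order[i]  # propose swapping two columns
        new_cost = count_gaps(matrix, order)
        # Metropolis rule: always accept improvements, sometimes accept worse orders.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best_order, best_cost


if __name__ == "__main__":
    # Toy data: 3 "rows" (larger objects) by 5 "columns" (clones); values are made up.
    toy = [
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
    ]
    order, gaps = anneal(toy, 5)
    print(order, gaps)
```

    In this sketch a constraint set such as the FISH partial order could be handled by rejecting any proposed swap that violates it, but that extension is left out here.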

  13. Super DNAging-New insights into DNA integrity, genome stability and telomeres in the oldest old.

    Science.gov (United States)

    Franzke, Bernhard; Neubauer, Oliver; Wagner, Karl-Heinz

    2015-01-01

    Reductions in DNA integrity, genome stability, and telomere length are strongly associated with the aging process, age-related diseases as well as the age-related loss of muscle mass. However, in people reaching an age far beyond their statistical life expectancy the prevalence of diseases, such as cancer, cardiovascular disease, diabetes or dementia, is much lower compared to "averagely" aged humans. These inverse observations in nonagenarians (90-99 years), centenarians (100-109 years) and super-centenarians (110 years and older) require a closer look into dynamics underlying DNA damage within the oldest old of our society. Available data indicate improved DNA repair and antioxidant defense mechanisms in "super old" humans, which are comparable with much younger cohorts. Partly as a result of these enhanced endogenous repair and protective mechanisms, the oldest old humans appear to cope better with risk factors for DNA damage over their lifetime compared to subjects whose lifespan coincides with the statistical life expectancy. This model is supported by study results demonstrating superior chromosomal stability, telomere dynamics and DNA integrity in "successful agers". There is also compelling evidence suggesting that life-style related factors including regular physical activity, a well-balanced diet and minimized psycho-social stress can reduce DNA damage and improve chromosomal stability. The most conclusive picture that emerges from reviewing the literature is that reaching "super old" age appears to be primarily determined by hereditary/genetic factors, while a healthy lifestyle additionally contributes to achieving the individual maximum lifespan in humans. More research is required in this rapidly growing population of super old people. In particular, there is need for more comprehensive investigations including short- and long-term lifestyle interventions as well as investigations focusing on the mechanisms causing DNA damage, mutations, and telomere

  14. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.

    Science.gov (United States)

    Pappas, Derek J; Marin, Wesley; Hollenbach, Jill A; Mack, Steven J

    2016-03-01

    Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) is an integrated data-analysis pipeline designed for the standardized analysis of highly-polymorphic genetic data, specifically for the HLA and KIR genetic systems. Most modern genetic analysis programs are designed for the analysis of single nucleotide polymorphisms, but the highly polymorphic nature of HLA and KIR data requires specialized methods of data analysis. BIGDAWG performs case-control data analyses of highly polymorphic genotype data characteristic of the HLA and KIR loci. BIGDAWG performs tests for Hardy-Weinberg equilibrium, calculates allele frequencies and bins low-frequency alleles for k×2 and 2×2 chi-squared tests, and calculates odds ratios, confidence intervals and p-values for each allele. When multi-locus genotype data are available, BIGDAWG estimates user-specified haplotypes and performs the same binning and statistical calculations for each haplotype. For the HLA loci, BIGDAWG performs the same analyses at the individual amino-acid level. Finally, BIGDAWG generates figures and tables for each of these comparisons. BIGDAWG obviates the error-prone reformatting needed to traffic data between multiple programs, and streamlines and standardizes the data-analysis process for case-control studies of highly polymorphic data. BIGDAWG has been implemented as the bigdawg R package and as a free web application at bigdawg.immunogenomics.org. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
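The per-allele 2×2 statistics named in this abstract (odds ratio, confidence interval, chi-squared test, p-value) can be sketched with textbook formulas. This is a generic illustration and not BIGDAWG's code: the function name `two_by_two`, the Woolf (log-based) 95% interval, the uncorrected Pearson statistic, and the counts in the example are all assumptions, since the abstract does not state which interval method or continuity correction the package uses. All four cells must be non-zero for these formulas.

```python
import math


def two_by_two(a, b, c, d, z=1.96):
    """Statistics for one allele in a case-control 2x2 table.

    a, b = cases carrying / not carrying the allele
    c, d = controls carrying / not carrying the allele
    """
    odds_ratio = (a * d) / (b * c)
    # Woolf method: standard error of log(OR), then a symmetric interval on the log scale.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    # Pearson chi-squared statistic for a 2x2 table, without continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (lower, upper), chi2, p_value


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the allele.
odds_ratio, ci, chi2, p_value = two_by_two(20, 80, 10, 90)
```

Binning low-frequency alleles before testing, as the abstract describes, keeps the cell counts large enough for the chi-squared approximation to be reasonable.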

  15. Integrative genomic analysis identifies isoleucine and CodY as regulators of Listeria monocytogenes virulence.

    Directory of Open Access Journals (Sweden)

    Lior Lobel

    2012-09-01

    Intracellular bacterial pathogens are metabolically adapted to grow within mammalian cells. While these adaptations are fundamental to the ability to cause disease, we know little about the relationship between the pathogen's metabolism and virulence. Here we used an integrative Metabolic Analysis Tool that combines transcriptome data with genome-scale metabolic models to define the metabolic requirements of Listeria monocytogenes during infection. Twelve metabolic pathways were identified as differentially active during L. monocytogenes growth in macrophage cells. Intracellular replication requires de novo synthesis of histidine, arginine, purine, and branched-chain amino acids (BCAAs), as well as catabolism of L-rhamnose and glycerol. The importance of each metabolic pathway during infection was confirmed by generation of gene knockout mutants in the respective pathways. Next, we investigated the association of these metabolic requirements with the regulation of L. monocytogenes virulence. Here we show that limiting BCAA concentrations, primarily isoleucine, results in robust induction of the master virulence activator gene, prfA, and the PrfA-regulated genes. This response was specific and required the nutrient-responsive regulator CodY, which is known to bind isoleucine. Further analysis demonstrated that CodY is involved in prfA regulation, playing a role in prfA activation under limiting conditions of BCAAs. This study provides evidence for an additional regulatory mechanism underlying L. monocytogenes virulence, placing CodY at the crossroads of metabolism and virulence.

  16. From Genome to Phenotype: An Integrative Approach to Evaluate the Biodiversity of Lactococcus lactis

    Science.gov (United States)

    Laroute, Valérie; Tormo, Hélène; Couderc, Christel; Mercier-Bonin, Muriel; Le Bourgeois, Pascal; Cocaign-Bousquet, Muriel; Daveran-Mingot, Marie-Line

    2017-01-01

    Lactococcus lactis is one of the most extensively used lactic acid bacteria for the manufacture of dairy products. Exploring the biodiversity of L. lactis is extremely promising both to acquire new knowledge and for food and health-driven applications. L. lactis is divided into four subspecies: lactis, cremoris, hordniae and tructae, but only subsp. lactis and subsp. cremoris are of industrial interest. Due to its various biotopes, Lactococcus subsp. lactis is considered the most diverse. The diversity of L. lactis subsp. lactis has been assessed at genetic, genomic and phenotypic levels. Multi-Locus Sequence Type (MLST) analysis of strains from different origins revealed that the subsp. lactis can be classified in two groups: “domesticated” strains with low genetic diversity, and “environmental” strains that are the main contributors of the genetic diversity of the subsp. lactis. As expected, the phenotype investigation of L. lactis strains reported here revealed highly diverse carbohydrate metabolism, especially in plant- and gut-derived carbohydrates, diacetyl production and stress survival. The integration of genotypic and phenotypic studies could improve the relevance of screening culture collections for the selection of strains dedicated to specific functions and applications. PMID:28534821

  17. Integration of Genome-Wide TF Binding and Gene Expression Data to Characterize Gene Regulatory Networks in Plant Development.

    Science.gov (United States)

    Chen, Dijun; Kaufmann, Kerstin

    2017-01-01

    Key transcription factors (TFs) controlling the morphogenesis of flowers and leaves have been identified in the model plant Arabidopsis thaliana. Recent genome-wide approaches based on chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) enable systematic identification of genome-wide TF binding sites (TFBSs) of these regulators. Here, we describe a computational pipeline for analyzing ChIP-seq data to identify TFBSs and to characterize gene regulatory networks (GRNs) with applications to the regulatory studies of flower development. In particular, we provide step-by-step instructions for bioinformatics beginners on how to download, analyze, visualize, and integrate genome-wide data in order to construct GRNs. The practical guide presented here can be readily applied to other similar ChIP-seq datasets to characterize GRNs of interest.

  18. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not been used in viral integration site analysis of gene therapy applications. Here, we combined a targeted sequence capture array and next-generation sequencing to address the whole-genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1-derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty-four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and a minimum length of 50 bp in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were distant from the CpG islands and from the transcription start sites and were preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. Evolution of plant virus movement proteins from the 30K superfamily and of their homologs integrated in plant genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mushegian, Arcady R., E-mail: mushegian2@gmail.com [Division of Molecular and Cellular Biosciences, National Science Foundation, 4201 Wilson Boulevard, Arlington, VA 22230 (United States); Elena, Santiago F., E-mail: sfelena@ibmcp.upv.es [Instituto de Biología Molecular y Celular de Plantas, CSIC-UPV, 46022 València (Spain); The Santa Fe Institute, Santa Fe, NM 87501 (United States)

    2015-02-15

    Homologs of Tobacco mosaic virus 30K cell-to-cell movement protein are encoded by diverse plant viruses. Mechanisms of action and evolutionary origins of these proteins remain obscure. We expand the picture of conservation and evolution of the 30K proteins, producing sequence alignment of the 30K superfamily with the broadest phylogenetic coverage thus far and illuminating structural features of the core all-beta fold of these proteins. Integrated copies of pararetrovirus 30K movement genes are prevalent in euphyllophytes, with at least one copy intact in nearly every examined species, and mRNAs detected for most of them. Sequence analysis suggests repeated integrations, pseudogenizations, and positive selection in those provirus genes. An unannotated 30K-superfamily gene in Arabidopsis thaliana genome is likely expressed as a fusion with the At1g37113 transcript. This molecular background of endopararetrovirus gene products in plants may change our view of virus infection and pathogenesis, and perhaps of cellular homeostasis in the hosts. - Highlights: • Sequence region shared by plant virus “30K” movement proteins has an all-beta fold. • Most euphyllophyte genomes contain integrated copies of pararetroviruses. • These integrated virus genomes often include intact movement protein genes. • Molecular evidence suggests that these “30K” genes may be selected for function.

  20. Statistical Viewer: a tool to upload and integrate linkage and association data as plots displayed within the Ensembl genome browser

    Directory of Open Access Journals (Sweden)

    Hauser Elizabeth R

    2005-04-01

    Background: To facilitate efficient selection and the prioritization of candidate complex disease susceptibility genes for association analysis, increasingly comprehensive annotation tools are essential to integrate, visualize and analyze vast quantities of disparate data generated by genomic screens, public human genome sequence annotation and ancillary biological databases. We have developed a plug-in package for Ensembl called "Statistical Viewer" that facilitates the analysis of genomic features and annotation in the regions of interest defined by linkage analysis. Results: Statistical Viewer is an add-on package to the open-source Ensembl Genome Browser and Annotation System that displays disease study-specific linkage and/or association data as two-dimensional plots in new panels in the context of Ensembl's Contig View and Cyto View pages. An enhanced upload server facilitates the upload of statistical data, as well as additional feature annotation to be displayed in DAS tracks, in the form of Excel files. The Statistical View panel, drawn directly under the ideogram, illustrates lod score values for markers from a study of interest that are plotted against their position in base pairs. A module called "Get Map" easily converts the genetic locations of markers to genomic coordinates. The graph placed under the corresponding ideogram features a synchronized vertical sliding selection box that is seamlessly integrated into Ensembl's Contig- and Cyto- View pages to choose the region to be displayed in Ensembl's "Overview" and "Detailed View" panels. To resolve Association and Fine mapping data plots, a "Detailed Statistic View" plot corresponding to the "Detailed View" may be displayed underneath. Conclusion: Features mapping to regions of linkage are accentuated when Statistic View is used in conjunction with the Distributed Annotation System (DAS) to display supplemental laboratory information such as differentially expressed disease

  1. CoryneCenter – An online resource for the integrated analysis of corynebacterial genome and transcriptome data

    Directory of Open Access Journals (Sweden)

    Hüser Andrea T

    2007-11-01

    Background: The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. Results: To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: known regulatory networks are confirmed, but moreover CoryneCenter points out additional regulatory interactions. Conclusion: CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focusses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data. The system is publicly available at http://www.CoryneCenter.de.

  2. Comprehensive profiling of retroviral integration sites using target enrichment methods from historical koala samples without an assembled reference genome

    Directory of Open Access Journals (Sweden)

    Pin Cui

    2016-03-01

    Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analyses of KoRV integration patterns in modern koalas demonstrate that they share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next generation sequencing (NGS) and a newly customized sequence-clustering based computational pipeline to determine the integration sites for 10 museum Queensland and New South Wales (NSW) koala samples collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small.

  3. Ectopic Expression of O Antigen in Bordetella pertussis by a Novel Genomic Integration System.

    Science.gov (United States)

    Ishigaki, Keisuke; Shinzawa, Naoaki; Nishikawa, Sayaka; Suzuki, Koichiro; Fukui-Miyazaki, Aya; Horiguchi, Yasuhiko

    2018-01-01

    We describe a novel genome integration system that enables the introduction of DNA fragments as large as 50 kbp into the chromosomes of recipient bacteria. This system, named BPI, comprises a bacterial artificial chromosome vector and phage-derived gene integration machinery. We introduced the wbm locus of Bordetella bronchiseptica, which is required for O antigen biosynthesis, into the chromosome of B. pertussis, which intrinsically lacks O antigen, using the BPI system. After the introduction of the wbm locus, B. pertussis presented an additional substance in the lipooligosaccharide fraction that was specifically recognized by the anti-B. bronchiseptica antibody but not the anti-B. pertussis antibody, indicating that B. pertussis expressed O antigen corresponding to that of B. bronchiseptica. O antigen-expressing B. pertussis was less sensitive to the bactericidal effects of serum and polymyxin B than the isogenic parental strain. In addition, an in vivo competitive infection assay showed that O antigen-expressing B. pertussis dominantly colonized the mouse respiratory tract over the parental strain. These results indicate that the BPI system provides a means to alter the phenotypes of bacteria by introducing large exogenous DNA fragments. IMPORTANCE Some bacterial phenotypes emerge through the cooperative functions of a number of genes residing within a large genetic locus. To transfer the phenotype of one bacterium to another, a means to introduce the large genetic locus into the recipient bacterium is needed. Therefore, we developed a novel system by combining the advantages of a bacterial artificial chromosome vector and phage-derived gene integration machinery. In this study, we succeeded for the first time in introducing a gene locus involved in O antigen biosynthesis of Bordetella bronchiseptica into the chromosome of B. pertussis, which intrinsically lacks O antigen, and using this system we analyzed phenotypic alterations in the resultant

  4. Development of an integrated genome informatics, data management and workflow infrastructure: A toolbox for the study of complex disease genetics

    Directory of Open Access Journals (Sweden)

    Burren Oliver S

    2004-01-01

    The genetic dissection of complex disease remains a significant challenge. Sample-tracking and the recording, processing and storage of high-throughput laboratory data with public domain data require integration of databases, genome informatics and genetic analyses in an easily updated and scalable format. To find genes involved in multifactorial diseases such as type 1 diabetes (T1D), chromosome regions are defined based on functional candidate gene content, linkage information from humans and animal model mapping information. For each region, genomic information is extracted from Ensembl, converted and loaded into ACeDB for manual gene annotation. Homology information is examined using ACeDB tools and the gene structure verified. Manually curated genes are extracted from ACeDB and read into the feature database, which holds relevant local genomic feature data and an audit trail of laboratory investigations. Public domain information, manually curated genes, polymorphisms, primers, linkage and association analyses, with links to our genotyping database, are shown in Gbrowse. This system scales to include genetic, statistical, quality control (QC) and biological data such as expression analyses of RNA or protein, all linked from a genomics integrative display. Our system is applicable to any genetic study of complex disease, of either large or small scale.

  5. HiView: an integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants.

    Science.gov (United States)

    Xu, Zheng; Zhang, Guosheng; Duan, Qing; Chai, Shengjie; Zhang, Baqun; Wu, Cong; Jin, Fulai; Yue, Feng; Li, Yun; Hu, Ming

    2016-03-11

    Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. However, most of them are located in the non-protein coding regions, and therefore it is challenging to hypothesize the functions of these non-coding GWAS variants. Recent large efforts such as the ENCODE and Roadmap Epigenomics projects have predicted a large number of regulatory elements. However, the target genes of these regulatory elements remain largely unknown. Chromatin conformation capture based technologies such as Hi-C can directly measure the chromatin interactions and have generated an increasingly comprehensive catalog of the interactome between the distal regulatory elements and their potential target genes. Leveraging such information revealed by Hi-C holds the promise of elucidating the functions of genetic variants in human diseases. In this work, we present HiView, the first integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants. HiView is able to display Hi-C data and statistical evidence for chromatin interactions in genomic regions surrounding any given GWAS variant, enabling straightforward visualization and interpretation. We believe that as the first GWAS variants-centered Hi-C genome browser, HiView is a useful tool guiding post-GWAS functional genomics studies. HiView is freely accessible at: http://www.unc.edu/~yunmli/HiView .

  6. Construction of reference chromosome-scale pseudomolecules for potato: integrating the potato genome with genetic and physical maps.

    Science.gov (United States)

    Sharma, Sanjeev Kumar; Bolser, Daniel; de Boer, Jan; Sønderkær, Mads; Amoros, Walter; Carboni, Martin Federico; D'Ambrosio, Juan Martín; de la Cruz, German; Di Genova, Alex; Douches, David S; Eguiluz, Maria; Guo, Xiao; Guzman, Frank; Hackett, Christine A; Hamilton, John P; Li, Guangcun; Li, Ying; Lozano, Roberto; Maass, Alejandro; Marshall, David; Martinez, Diana; McLean, Karen; Mejía, Nilo; Milne, Linda; Munive, Susan; Nagy, Istvan; Ponce, Olga; Ramirez, Manuel; Simon, Reinhard; Thomson, Susan J; Torres, Yerisf; Waugh, Robbie; Zhang, Zhonghua; Huang, Sanwen; Visser, Richard G F; Bachem, Christian W B; Sagredo, Boris; Feingold, Sergio E; Orjeda, Gisella; Veilleux, Richard E; Bonierbale, Merideth; Jacobs, Jeanne M E; Milbourne, Dan; Martin, David Michael Alan; Bryan, Glenn J

    2013-11-06

    The genome of potato, a major global food crop, was recently sequenced. The work presented here details the integration of the potato reference genome (DM) with a new sequence-tagged site marker-based linkage map and other physical and genetic maps of potato and the closely related species tomato. Primary anchoring of the DM genome assembly was accomplished by the use of a diploid segregating population, which was genotyped with several types of molecular genetic markers to construct a new ~936 cM linkage map comprising 2469 marker loci. In silico anchoring approaches used genetic and physical maps from the diploid potato genotype RH89-039-16 (RH) and tomato. This combined approach has allowed 951 superscaffolds to be ordered into pseudomolecules corresponding to the 12 potato chromosomes. These pseudomolecules represent 674 Mb (~93%) of the 723 Mb genome assembly and 37,482 (~96%) of the 39,031 predicted genes. The superscaffold order and orientation within the pseudomolecules are closely collinear with independently constructed high density linkage maps. Comparisons between marker distribution and physical location reveal regions of greater and lesser recombination, as well as regions exhibiting significant segregation distortion. The work presented here has led to a greatly improved ordering of the potato reference genome superscaffolds into chromosomal "pseudomolecules".

  7. Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.

    Science.gov (United States)

    Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang

    2015-03-01

    The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored the genome assembly (81.7%) to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.

  8. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Rottger, Richard; Hauschild, Anne-Christin

    2017-01-01

    Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free living or host-associated. Thus, these organisms must possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features ...

  9. An integrated map of genetic variation from 1,092 human genomes

    DEFF Research Database (Denmark)

    Abecasis, Goncalo R.; Auton, Adam; Brooks, Lisa D.

    2012-01-01

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination ...

  10. tigaR: integrative significance analysis of temporal differential gene expression induced by genomic abnormalities

    NARCIS (Netherlands)

    Miok, V.; Wilting, S.M.; van de Wiel, M.A.; Jaspers, A.; van Noort, P.I.; Brakenhoff, R.H.; Snijders, P.J.F.; Steenbergen, R.D.M.; van Wieringen, W.N.

    2014-01-01

    Background: To determine which changes in the host cell genome are crucial for cervical carcinogenesis, a longitudinal in vitro model system of HPV-transformed keratinocytes was profiled in a genome-wide manner. Four cell lines affected with either HPV16 or HPV18 were assayed at 8 sequential time

  11. Stable integration of recombinant adeno-associated virus vector genomes after transduction of murine hematopoietic stem cells.

    Science.gov (United States)

    Han, Zongchao; Zhong, Li; Maina, Njeri; Hu, Zhongbo; Li, Xiaomiao; Chouthai, Nitin S; Bischof, Daniela; Weigel-Van Aken, Kirsten A; Slayton, William B; Yoder, Mervin C; Srivastava, Arun

    2008-03-01

    We previously reported that among single-stranded adeno-associated virus (ssAAV) vectors, serotypes 1 through 5, ssAAV1 is the most efficient in transducing murine hematopoietic stem cells (HSCs), but viral second-strand DNA synthesis remains a rate-limiting step. Subsequently, using double-stranded, self-complementary AAV (scAAV) vectors, serotypes 7 through 10, we observed that scAAV7 vectors also transduce murine HSCs efficiently. In the present study, we used scAAV1 and scAAV7 shuttle vectors to transduce HSCs in a murine bone marrow serial transplant model in vivo, which allowed examination of the AAV proviral integration pattern in the mouse genome, as well as recovery and nucleotide sequence analyses of AAV-HSC DNA junction fragments. The proviral genomes were stably integrated, and integration sites were localized to different mouse chromosomes. None of the integration sites was found to be in a transcribed gene, or near a cellular oncogene. None of the animals, monitored for up to 1 year, exhibited pathological abnormalities. Thus, AAV proviral integration-induced risk of oncogenesis was not found in our study, which provides functional confirmation of stable transduction of self-renewing multipotential HSCs by scAAV vectors as well as promise for the use of these vectors in the potential treatment of disorders of the hematopoietic system.

  12. Photon-induced cell migration and integrin expression promoted by DNA integration of HPV16 genome

    International Nuclear Information System (INIS)

    Rieken, Stefan; Simon, Florian; Habermehl, Daniel; Dittmar, Jan Oliver; Combs, Stephanie E.; Weber, Klaus; Debus, Juergen; Lindel, Katja

    2014-01-01

    Persistent human papilloma virus 16 (HPV16) infections are a major cause of cervical cancer. The integration of the viral DNA into the host genome causes E2 gene disruption which prevents apoptosis and increases host cell motility. In cervical cancer patients, survival is limited by local infiltration and systemic dissemination. Surgical control rates are poor in cases of parametrial infiltration. In these patients, radiotherapy (RT) is administered to enhance local control. However, photon irradiation itself has been reported to increase cell motility. In cases of E2-disrupted cervical cancers, this phenomenon would impose an additional risk of enhanced tumor cell motility. Here, we analyze mechanisms underlying photon-increased migration in keratinocytes with differential E2 gene status. Isogenic W12 (intact E2 gene status) and S12 (disrupted E2 gene status) keratinocytes were analyzed in fibronectin-based and serum-stimulated migration experiments following single photon doses of 0, 2, and 10 Gy. Quantitative FACS analyses of integrin expression were performed. Migration and adhesion are increased in E2 gene-disrupted keratinocytes. E2 gene disruption promotes attractability by serum components, therefore, effectuating the risk of local infiltration and systemic dissemination. In S12 cells, migration is further increased by photon RT which leads to enhanced expression of fibronectin receptor integrins. HPV16-associated E2 gene disruption is a main predictor of treatment-refractory cancer virulence. E2 gene disruption promotes cell motility. Following photon RT, E2-disrupted tumors bear the risk of integrin-related infiltration and dissemination. (orig.)

  13. Photon-induced cell migration and integrin expression promoted by DNA integration of HPV16 genome

    Energy Technology Data Exchange (ETDEWEB)

    Rieken, Stefan; Simon, Florian; Habermehl, Daniel; Dittmar, Jan Oliver; Combs, Stephanie E.; Weber, Klaus; Debus, Juergen; Lindel, Katja [University Hospital of Heidelberg, Department of Radiation Therapy and Radiation Oncology, Heidelberg (Germany)

    2014-10-15

    Persistent human papilloma virus 16 (HPV16) infections are a major cause of cervical cancer. The integration of the viral DNA into the host genome causes E2 gene disruption which prevents apoptosis and increases host cell motility. In cervical cancer patients, survival is limited by local infiltration and systemic dissemination. Surgical control rates are poor in cases of parametrial infiltration. In these patients, radiotherapy (RT) is administered to enhance local control. However, photon irradiation itself has been reported to increase cell motility. In cases of E2-disrupted cervical cancers, this phenomenon would impose an additional risk of enhanced tumor cell motility. Here, we analyze mechanisms underlying photon-increased migration in keratinocytes with differential E2 gene status. Isogenic W12 (intact E2 gene status) and S12 (disrupted E2 gene status) keratinocytes were analyzed in fibronectin-based and serum-stimulated migration experiments following single photon doses of 0, 2, and 10 Gy. Quantitative FACS analyses of integrin expression were performed. Migration and adhesion are increased in E2 gene-disrupted keratinocytes. E2 gene disruption promotes attractability by serum components, therefore, effectuating the risk of local infiltration and systemic dissemination. In S12 cells, migration is further increased by photon RT which leads to enhanced expression of fibronectin receptor integrins. HPV16-associated E2 gene disruption is a main predictor of treatment-refractory cancer virulence. E2 gene disruption promotes cell motility. Following photon RT, E2-disrupted tumors bear the risk of integrin-related infiltration and dissemination. (orig.)

  14. DFAST and DAGA: web-based integrated genome annotation tools and resources.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori

    2016-01-01

    Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.

  15. De novo assembly of a haplotype-resolved human genome.

    Science.gov (United States)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
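The "haplotype N50 of 484 kb" quoted above is a length-weighted median: the block length at which the cumulative sum of blocks, sorted from longest to shortest, first reaches half of the total. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions, not taken from the paper's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or haplotype-block lengths.

    N50 is the length L such that blocks of length >= L together cover
    at least half of the summed length of all blocks.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical block lengths (arbitrary units): total 300, half 150.
# Cumulative sums are 100, then 180, so the N50 is the second block.
example_blocks = [100, 80, 60, 40, 20]
example_n50 = n50(example_blocks)
```

Because the statistic is weighted by length, a few long blocks can dominate it, which is why N50 is usually reported alongside the total assembled size.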

  16. Electroclinical presentation and genotype-phenotype relationships in patients with Unverricht-Lundborg disease carrying compound heterozygous CSTB point and indel mutations.

    Science.gov (United States)

    Canafoglia, Laura; Gennaro, Elena; Capovilla, Giuseppe; Gobbi, Giuseppe; Boni, Antonella; Beccaria, Francesca; Viri, Maurizio; Michelucci, Roberto; Agazzi, Pamela; Assereto, Stefania; Coviello, Domenico A; Di Stefano, Maria; Rossi Sebastiano, Davide; Franceschetti, Silvana; Zara, Federico

    2012-12-01

    Unverricht-Lundborg disease (EPM1A) is frequently due to an unstable expansion of a dodecamer repeat in the CSTB gene, whereas other types of mutations are rare. EPM1A due to homozygous expansion has a rather stereotyped presentation with prominent action myoclonus. We describe eight patients with five different compound heterozygous CSTB point or indel mutations in order to highlight their particular phenotypical presentations and evaluate their genotype-phenotype relationships. We screened CSTB mutations by means of Southern blotting and the sequencing of the genomic DNA of each proband. CSTB messenger RNA (mRNA) aberrations were characterized by sequencing the complementary DNA (cDNA) of lymphoblastoid cells, and assessing the protein concentrations in the lymphoblasts. The patient evaluations included the use of a simplified myoclonus severity rating scale, multiple neurophysiologic tests, and electroencephalography (EEG)-polygraphic recordings. To highlight the particular clinical features and disease time-course in compound heterozygous patients, we compared some of their characteristics with those observed in a series of 40 patients carrying the common homozygous expansion mutation observed at the C. Besta Foundation, Milan, Italy. The eight compound heterozygous patients belong to six EPM1A families (out of 52; 11.5%) diagnosed at the Laboratory of Genetics of the Galliera Hospitals in Genoa, Italy. They segregated five different heterozygous point or indel mutations in association with the common dodecamer expansion. 
    Four patients from three families had previously reported CSTB mutations (c.67-1G>C and c.168+1_18del); one had a novel nonsense mutation at the first exon (c.133C>T) leading to a premature stop codon predicting a short peptide; the other three patients from two families had a complex novel indel mutation involving the donor splice site of intron 2 (c.168+2_169+21delinsAA) and leading to an aberrant transcript with a partially retained intron

  17. CRISPR-Cas9-Mediated Genome Editing and Transcriptional Control in Yarrowia lipolytica.

    Science.gov (United States)

    Schwartz, Cory; Wheeldon, Ian

    2018-01-01

    The discovery and adaptation of RNA-guided nucleases has resulted in the rapid development of efficient, scalable, and easily accessible synthetic biology tools for targeted genome editing and transcriptional control. In these systems, for example CRISPR-Cas9 from Streptococcus pyogenes, a protein with nuclease activity is targeted to a specific nucleotide sequence by a short RNA molecule, whereupon binding it cleaves the targeted nucleotide strand. To extend this genome-editing ability to the industrially important oleaginous yeast Yarrowia lipolytica, we developed a set of easily usable and effective CRISPR-Cas9 episomal vectors. In this protocols chapter, we first present a method by which arbitrary protein-coding genes can be disrupted via indel formation after CRISPR-Cas9 targeting. A second method demonstrates how the same CRISPR-Cas9 system can be used to induce markerless gene cassette integration into the genome by inducing homologous recombination after DNA cleavage by Cas9. Finally, we describe how a catalytically inactive form of Cas9 fused to a transcriptional repressor can be used to control transcription of native genes in Y. lipolytica. The CRISPR-Cas9 tools and strategies described here greatly increase the types of genome editing and transcriptional control that can be achieved in Y. lipolytica, and promise to facilitate more advanced engineering of this important oleaginous host.

  18. Involvement of the Ventrolateral Prefrontal Cortex in Learning Others' Bad Reputations and Indelible Distrust.

    Science.gov (United States)

    Suzuki, Atsunobu; Ito, Yuichi; Kiyama, Sachiko; Kunimi, Mitsunobu; Ohira, Hideki; Kawaguchi, Jun; Tanabe, Hiroki C; Nakai, Toshiharu

    2016-01-01

    A bad reputation can persistently affect judgments of an individual even when it turns out to be invalid and ought to be disregarded. Such indelible distrust may reflect that the negative evaluation elicited by a bad reputation transfers to a person. Consequently, the person him/herself may come to activate this negative evaluation irrespective of the accuracy of the reputation. If this theoretical model is correct, an evaluation-related brain region will be activated when witnessing a person whose bad reputation one has learned about, regardless of whether the reputation is deemed valid or not. Here, we tested this neural hypothesis with functional magnetic resonance imaging (fMRI). Participants memorized faces paired with either a good or a bad reputation. Next, they viewed the faces alone and inferred whether each person was likely to cooperate, first while retrieving the reputations, and then while trying to disregard them as false. A region of the left ventrolateral prefrontal cortex (vlPFC), which may be involved in negative evaluation, was activated by faces previously paired with bad reputations, irrespective of whether participants attempted to retrieve or disregard these reputations. Furthermore, participants showing greater activity of the left ventrolateral prefrontal region in response to the faces with bad reputations were more likely to infer that these individuals would not cooperate. Thus, once associated with a bad reputation, a person may elicit evaluation-related brain responses on their own, thereby evoking distrust independently of their reputation.

  19. Structural analysis of polarizing indels: an emerging consensus on the root of the tree of life

    Directory of Open Access Journals (Sweden)

    Bourne Philip E

    2009-08-01

    Full Text Available Background: The root of the tree of life has been a holy grail ever since Darwin first used the tree as a metaphor for evolution. New methods seek to narrow down the location of the root by excluding it from branches of the tree of life. This is done by finding traits that must be derived, and excluding the root from the taxa those traits cover. However, the two most comprehensive attempts at this strategy, performed by Cavalier-Smith and Lake et al., have excluded each other's rootings. Results: The indel polarizations of Lake et al. rely on high-quality alignments between paralogs that diverged before the last universal common ancestor (LUCA). Therefore, sequence alignment artifacts may skew their conclusions. We have reviewed their data using protein structure information where available. Several of the conclusions are quite different when viewed in the light of structure, which is conserved over longer evolutionary time scales than sequence. We argue there is no polarization that excludes the root from all Gram-negatives, and that polarizations robustly exclude the root from the Archaea. Conclusion: We conclude that there is no contradiction between the polarization datasets. The combination of these datasets excludes the root from every possible position except near the Chloroflexi. Reviewers: This article was reviewed by Greg Fournier (nominated by J. Peter Gogarten), Purificación López-García, and Eugene Koonin.

  20. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    Science.gov (United States)

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. 
    PGen workflow has been optimized for the most

  1. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
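The space saving reported for bit-encoded genotypes can be illustrated with a simple packing scheme. The sketch below is not KGGSeq's actual bit-block format (the abstract does not specify it, and the real format also covers phased and multi-allelic genotypes); it only assumes biallelic, unphased calls coded as 0 (homozygous reference), 1 (heterozygous), 2 (homozygous alternate) and 3 (missing), stored two bits each. The function names are hypothetical.

```python
def pack_genotypes(codes):
    """Pack genotype codes (each 0..3) into bytes, four genotypes per byte.

    Genotype i is stored in byte i // 4 at bit offset 2 * (i % 4).
    """
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotype(packed, index):
    """Read back the 2-bit genotype code stored at position `index`."""
    return (packed[index // 4] >> (2 * (index % 4))) & 0b11


# Six hypothetical genotype calls fit into two bytes instead of six characters.
calls = [0, 1, 2, 3, 2, 1]
blob = pack_genotypes(calls)
restored = [unpack_genotype(blob, i) for i in range(len(calls))]
```

Packing like this also lets statistics such as genotype counts be computed with bitwise operations on whole machine words, which is one plausible source of the speed-ups the abstract describes.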

  2. The Integrated Genomic Architecture and Evolution of Dental Divergence in East African Cichlid Fishes (Haplochromis chilotes x H. nyererei)

    Directory of Open Access Journals (Sweden)

    C. Darrin Hulsey

    2017-09-01

    Full Text Available The independent evolution of the two toothed jaws of cichlid fishes is thought to have promoted their unparalleled ecological divergence and species richness. However, dental divergence in cichlids could exhibit substantial genetic covariance and this could dictate how traits like tooth numbers evolve in different African lakes and on their two jaws. To test this hypothesis, we used a hybrid mapping cross of two trophically divergent Lake Victoria species (Haplochromis chilotes × Haplochromis nyererei) to examine genomic regions associated with cichlid tooth diversity. Surprisingly, a similar genomic region was found to be associated with oral jaw tooth numbers in cichlids from both Lake Malawi and Lake Victoria. Likewise, this same genomic location was associated with variation in pharyngeal jaw tooth numbers. Similar relationships between tooth numbers on the two jaws in both our Victoria hybrid population and across the phylogenetic diversity of Malawi cichlids additionally suggest that tooth numbers on the two jaws of haplochromine cichlids might generally coevolve owing to shared genetic underpinnings. Integrated, rather than independent, genomic architectures could be key to the incomparable evolutionary divergence and convergence in cichlid tooth numbers.

  3. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal

    Science.gov (United States)

    Gao, Jianjiong; Aksoy, Bülent Arman; Dogrusoz, Ugur; Dresdner, Gideon; Gross, Benjamin; Sumer, S. Onur; Sun, Yichao; Jacobsen, Anders; Sinha, Rileen; Larsson, Erik; Cerami, Ethan; Sander, Chris; Schultz, Nikolaus

    2014-01-01

    The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics. PMID:23550210

  4. Integrative genome-wide expression profiling identifies three distinct molecular subgroups of renal cell carcinoma with different patient outcome

    Directory of Open Access Journals (Sweden)

    Beleut Manfred

    2012-07-01

    Full Text Available Background: Renal cell carcinoma (RCC) is characterized by a number of diverse molecular aberrations that differ among individuals. Recent approaches to molecularly classify RCC were based on clinical and pathological parameters as well as on single molecular parameters. As a consequence, gene expression patterns reflecting the sum of genetic aberrations in individual tumors may not have been recognized. In an attempt to uncover such molecular features in RCC, we used a novel, unbiased and integrative approach. Methods: We integrated gene expression data from 97 primary RCC of different pathologic parameters, 15 RCC metastases as well as 34 cancer cell lines for two-way nonsupervised hierarchical clustering using gene groups suggested by the PANTHER Classification System. We depicted the genomic landscape of the resulting tumor groups by means of Single Nucleotide Polymorphism (SNP) technology. Finally, the achieved results were immunohistochemically analyzed using a tissue microarray (TMA) composed of 254 RCC. Results: We found robust, genome-wide expression signatures, which split RCC into three distinct molecular subgroups. These groups remained stable even if randomly selected gene sets were clustered. Notably, the pattern obtained from RCC cell lines was clearly distinguishable from that of primary tumors. SNP array analysis demonstrated differing frequencies of chromosomal copy number alterations among RCC subgroups. TMA analysis with group-specific markers showed a prognostic significance of the different groups. Conclusion: We propose the existence of characteristic and histologically independent genome-wide expression outputs in RCC with potential biological and clinical relevance.

  5. An Integrative Genomic Island Affects the Adaptations of Piezophilic Hyperthermophilic Archaeon Pyrococcus yayanosii to High Temperature and High Hydrostatic Pressure

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2016-11-01

    Full Text Available Deep-sea hydrothermal vent environments are characterized by high hydrostatic pressure and sharp temperature and chemical gradients. Horizontal gene transfer is thought to play an important role in the microbial adaptation to such an extreme environment. In this study, a 21.4-kb DNA fragment was identified as a genomic island, designated PYG1, in the genomic sequence of the piezophilic hyperthermophile Pyrococcus yayanosii. According to the sequence alignment and functional annotation, the genes in PYG1 could tentatively be divided into five modules, with functions related to mobility, DNA repair, metabolic processes and the toxin-antitoxin system. Integrase can mediate the site-specific integration and excision of PYG1 in the chromosome of P. yayanosii A1. Gene replacement of PYG1 with a SimR cassette was successful. The growth of the mutant strain ∆PYG1 was compared with its parent strain P. yayanosii A2 under various stress conditions, including different pH, salinity, temperature and hydrostatic pressure. The ∆PYG1 mutant strain showed reduced growth when grown at 100 °C, while the biomass of ∆PYG1 increased significantly when cultured at 80 MPa. Differential expression of the genes in module Ⅲ of PYG1 was observed under different temperature and pressure conditions. This study demonstrates the first example of an archaeal integrative genomic island that could affect the adaptation of the hyperthermophilic piezophile P. yayanosii to high temperature and high hydrostatic pressure.

  6. IW-Scoring: an Integrative Weighted Scoring framework for annotating and prioritizing genetic variations in the noncoding genome.

    Science.gov (United States)

    Wang, Jun; Dayem Ullah, Abu Z; Chelala, Claude

    2018-01-30

    The vast majority of germline and somatic variations occur in the noncoding part of the genome, only a small fraction of which are believed to be functional. From the tens of thousands of noncoding variations detectable in each genome, identifying and prioritizing driver candidates with putative functional significance is challenging. To address this, we implemented IW-Scoring, a new Integrative Weighted Scoring model to annotate and prioritise functionally relevant noncoding variations. We evaluate 11 scoring methods, and apply an unsupervised spectral approach for subsequent selective integration into two linear weighted functional scoring schemas for known and novel variations. IW-Scoring produces stable high-quality performance as the best predictors for three independent data sets. We demonstrate the robustness of IW-Scoring in identifying recurrent functional mutations in the TERT promoter, as well as disease SNPs in proximity to consensus motifs and with gene regulatory effects. Using follicular lymphoma as a paradigmatic cancer model, we apply IW-Scoring to locate 11 recurrently mutated noncoding regions in 14 follicular lymphoma genomes, and validate 9 of these regions in an extension cohort, including the promoter and enhancer regions of PAX5. Overall, IW-Scoring demonstrates greater versatility in identifying trait- and disease-associated noncoding variants. Scores from IW-Scoring as well as other methods are freely available from http://www.snp-nexus.org/IW-Scoring/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
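A "linear weighted functional scoring schema" of the kind described above can be pictured as a weighted average of per-method scores. The sketch below is only an illustration of that general idea: the method names, score values and weights are hypothetical, and it does not reproduce how IW-Scoring normalises scores or derives its weights (the abstract says only that an unsupervised spectral approach is used).

```python
def weighted_score(scores, weights):
    """Combine per-method scores into one value by a weighted average.

    `scores` maps method name -> score, assumed already scaled to [0, 1].
    `weights` maps method name -> non-negative weight.
    Methods without a score for this variant are skipped.
    """
    used = [name for name in weights if name in scores]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted method has a score for this variant")
    return sum(weights[name] * scores[name] for name in used) / total_weight


# Hypothetical inputs: two annotation methods scored one variant.
variant_scores = {"method_a": 0.8, "method_b": 0.4}
method_weights = {"method_a": 3.0, "method_b": 1.0, "method_c": 2.0}
combined = weighted_score(variant_scores, method_weights)  # close to 0.7
```

Dividing by the sum of the weights actually used keeps scores comparable between variants that are covered by different subsets of methods.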

  7. Repair of oxidative DNA base damage in the host genome influences the HIV integration site sequence preference.

    Directory of Open Access Journals (Sweden)

    Geoffrey R Bennett

    Full Text Available Host base excision repair (BER) proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1) and mutY homolog (MYH) as well as DNA polymerase beta (Polβ). While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5'dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested Polβ DNA synthesis activity is not necessary while 5'dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.

  8. Genetic Diversity of Myanmar and Indonesia Native Chickens Together with Two Jungle Fowl Species by Using 102 Indels Polymorphisms

    Directory of Open Access Journals (Sweden)

    Aye Aye Maw

    2012-07-01

    Full Text Available The efficiency of insertion and/or deletion (indel) polymorphisms as genetic markers was evaluated by genotyping 102 indel loci in native chicken populations from Myanmar and Indonesia as well as Red jungle fowls and Green jungle fowls from Java Island. Out of the 102 indel markers, 97 were polymorphic. The average observed and expected heterozygosities were 0.206 to 0.268 and 0.229 to 0.284 in native chicken populations and 0.003 to 0.101 and 0.012 to 0.078 in jungle fowl populations. The coefficients of genetic differentiation (Gst) of the native chicken populations from Myanmar and Indonesia were 0.041 and 0.098, respectively. Genetic variability is higher among native chicken populations than among jungle fowl populations. A high Gst value was found between native chicken populations and jungle fowl populations. A neighbor-joining tree based on genetic distance revealed that the native chickens from the two countries were genetically close to each other and remote from the Red and Green jungle fowls of Java Island.
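The heterozygosity and Gst figures above follow standard population-genetic definitions, which can be sketched for a single locus. This is a minimal illustration assuming Nei's formulation (expected heterozygosity He = 1 - Σp², Gst = (Ht - Hs) / Ht with equally weighted populations); the abstract does not state which estimator or sample-size correction the authors used, and the genotype data below are invented ("I" for the insertion allele, "D" for the deletion allele).

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes such as ("I", "D")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), the heterozygosity expected under random mating."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(populations):
    """Nei's Gst = (Ht - Hs) / Ht for one locus, weighting populations equally."""
    per_pop = [allele_frequencies(pop) for pop in populations]
    h_s = sum(expected_heterozygosity(f) for f in per_pop) / len(per_pop)
    alleles = set().union(*per_pop)
    mean_freqs = {
        a: sum(f.get(a, 0.0) for f in per_pop) / len(per_pop) for a in alleles
    }
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Invented genotypes at one indel locus in two populations of four birds each.
pop_a = [("I", "I"), ("I", "D"), ("D", "D"), ("I", "D")]
pop_b = [("I", "I"), ("I", "I"), ("I", "I"), ("I", "D")]
ho_a = observed_heterozygosity(pop_a)                      # 2 of 4 heterozygous
he_a = expected_heterozygosity(allele_frequencies(pop_a))  # p = q = 0.5
differentiation = gst([pop_a, pop_b])
```

In a study such as this one, the per-locus values would be averaged over all 102 loci; Gst near 0 indicates that most variation lies within populations, while larger values indicate divergence between them.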

  9. Genomic variation and its impact on gene expression in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Andreas Massouras

    Full Text Available Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels), and complex variants (1 to 6,000 bp). While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs) for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified two-fold more gene associations in males as compared to females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals.

  10. Integrating Public Health and Deliberative Public Bioethics: Lessons from the Human Genome Project Ethical, Legal, and Social Implications Program.

    Science.gov (United States)

    Meagher, Karen M; Lee, Lisa M

    2016-01-01

    Public health policy works best when grounded in firm public health standards of evidence and widely shared social values. In this article, we argue for incorporating a specific method of ethical deliberation--deliberative public bioethics--into public health. We describe how deliberative public bioethics is a method of engagement that can be helpful in public health. Although medical, research, and public health ethics can be considered some of what bioethics addresses, deliberative public bioethics offers both a how and where. Using the Human Genome Project Ethical, Legal, and Social Implications program as an example of effective incorporation of deliberative processes to integrate ethics into public health policy, we examine how deliberative public bioethics can integrate both public health and bioethics perspectives into three areas of public health practice: research, education, and health policy. We then offer recommendations for future collaborations that integrate deliberative methods into public health policy and practice.

  11. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila.

    Science.gov (United States)

    Yang, Tzu-Hsien; Wang, Chung-Ching; Hung, Po-Cheng; Wu, Wei-Sheng

    2014-01-01

    Cis-regulatory modules (CRMs), or the DNA sequences required for regulating gene expression, play a central role in biological research on transcriptional regulation in metazoan species. Nowadays, the systematic understanding of CRMs still mainly relies on computational methods due to the time-consuming and small-scale nature of experimental methods. But the accuracy and reliability of different CRM prediction tools are still unclear. Without comparative cross-analysis of the results and combinatorial consideration with extra experimental information, there is no easy way to assess the confidence of the predicted CRMs. This limits the genome-wide understanding of CRMs. It is known that transcription factor binding and epigenetic profiles tend to determine functions of CRMs in gene transcriptional regulation. Thus integration of the genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate and decipher the prediction confidence and possible transcriptional regulatory functions of these potential CRMs. However, these data are still fragmentary in the literature. Here we performed the computational genome-wide screening for potential CRMs using different prediction tools and constructed the pioneer database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects the literature-curated TFBS location data and nine genres of epigenetic data for assessing the confidence of these potential CRMs and deciphering the possible CRM functionality. cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally-identified epigenetic profiles. The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable hypotheses for the molecular mechanisms

  12. Construction of an integrated genetic linkage map for the A genome of Brassica napus using SSR markers derived from sequenced BACs in B. rapa

    Directory of Open Access Journals (Sweden)

    King Graham J

    2010-10-01

    Full Text Available Abstract Background The Multinational Brassica rapa Genome Sequencing Project (BrGSP) has developed valuable genomic resources, including BAC libraries, BAC-end sequences, genetic and physical maps, and seed BAC sequences for Brassica rapa. An integrated linkage map between the amphidiploid B. napus and diploid B. rapa will facilitate the rapid transfer of these valuable resources from B. rapa to B. napus (Oilseed rape, Canola). Results In this study, we identified over 23,000 simple sequence repeats (SSRs) from 536 sequenced BACs. 890 SSR markers (designated as BrGMS) were developed and used for the construction of an integrated linkage map for the A genome in B. rapa and B. napus. Two hundred and nineteen BrGMS markers were integrated into an existing B. napus linkage map (BnaNZDH). Among these mapped BrGMS markers, 168 were only distributed on the A genome linkage groups (LGs), 18 distributed both on the A and C genome LGs, and 33 only distributed on the C genome LGs. Most of the A genome LGs in B. napus were collinear with the homoeologous LGs in B. rapa, although minor inversions or rearrangements occurred on A2 and A9. The mapping of these BAC-specific SSR markers enabled assignment of 161 sequenced B. rapa BACs, as well as the associated BAC contigs to the A genome LGs of B. napus. Conclusion The genetic mapping of SSR markers derived from sequenced BACs in B. rapa enabled direct links to be established between the B. napus linkage map and a B. rapa physical map, and thus the assignment of B. rapa BACs and the associated BAC contigs to the B. napus linkage map. This integrated genetic linkage map will facilitate exploitation of the B. rapa annotated genomic resources for gene tagging and map-based cloning in B. napus, and for comparative analysis of the A genome within Brassica species.

  13. Genetic and epigenetic alterations induced by different levels of rye genome integration in wheat recipient.

    Science.gov (United States)

    Zheng, X L; Zhou, J P; Zang, L L; Tang, A T; Liu, D Q; Deng, K J; Zhang, Y

    2016-06-17

    The narrow genetic variation present in common wheat (Triticum aestivum) varieties has greatly restricted the improvement of crop yield in modern breeding systems. Alien addition lines have proven to be an effective means to broaden the genetic diversity of common wheat. Wheat-rye addition lines, which are the direct bridge materials for wheat improvement, have been widely used to produce new wheat cultivars carrying alien rye germplasm. In this study, we investigated the genetic and epigenetic alterations in two sets of wheat-rye disomic addition lines (1R-7R) and the corresponding triticales. We used expressed sequence tag-simple sequence repeat, amplified fragment length polymorphism, and methylation-sensitive amplification polymorphism analyses to analyze the effects of the introduction of alien chromosomes (either the entire genome or sub-genome) to wheat genetic background. We found obvious and diversiform variations in the genomic primary structure, as well as alterations in the extent and pattern of the genomic DNA methylation of the recipient. Meanwhile, these results also showed that introduction of different rye chromosomes could induce different genetic and epigenetic alterations in its recipient, and the genetic background of the parents is an important factor for genomic and epigenetic variation induced by alien chromosome addition.

  14. Neurogenomics: An opportunity to integrate neuroscience, genomics and bioinformatics research in Africa

    Directory of Open Access Journals (Sweden)

    Thomas K. Karikari

    2015-06-01

    Full Text Available Modern genomic approaches have made enormous contributions to improving our understanding of the function, development and evolution of the nervous system, and the diversity within and between species. However, most of these research advances have been recorded in countries with advanced scientific resources and funding support systems. On the contrary, little is known about, for example, the possible interplay between different genes, non-coding elements and environmental factors in modulating neurological diseases among populations in low-income countries, including many African countries. The unique ancestry of African populations suggests that improved inclusion of these populations in neuroscience-related genomic studies would significantly help to identify novel factors that might shape the future of neuroscience research and neurological healthcare. This perspective is strongly supported by the recent identification that diseased individuals and their kindred from specific sub-Saharan African populations lack common neurological disease-associated genetic mutations. This indicates that there may be population-specific causes of neurological diseases, necessitating further investigations into the contribution of additional, presently-unknown genomic factors. Here, we discuss how the development of neurogenomics research in Africa would help to elucidate disease-related genomic variants, and also provide a good basis to develop more effective therapies. Furthermore, neurogenomics would harness African scientists' expertise in neuroscience, genomics and bioinformatics to extend our understanding of the neural basis of behaviour, development and evolution.

  15. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification.

    Science.gov (United States)

    Doostparast Torshizi, Abolfazl; Petzold, Linda R

    2018-01-01

    Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  16. Integrating genome-based informatics to modernize global disease monitoring, information sharing, and response

    DEFF Research Database (Denmark)

    Aarestrup, Frank Møller; Brown, Eric W; Detter, Chris

    2012-01-01

    The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease-causing microbes, and in the near future these technologies will be suitable for routine use in national, regional, and global public health laboratories. With additional improvements in instrumentation, these next- or third-generation sequencers are likely to replace conventional culture-based and molecular typing methods to provide point-of-care clinical diagnosis and other essential information for quicker and better treatment of patients. Provided there is free-sharing of information by all clinical and public health laboratories, these genomic tools could spawn a global system of linked databases...

  17. New traits in crops produced by genome editing techniques based on deletions

    NARCIS (Netherlands)

    Wiel, van de C.C.M.; Schaart, J.G.; Lotz, L.A.P.; Smulders, M.J.M.

    2017-01-01

    One of the most promising New Plant Breeding Techniques is genome editing (also called gene editing) with the help of a programmable site-directed nuclease (SDN). In this review, we focus on SDN-1, which is the generation of small deletions or insertions (indels) at a precisely defined location in

  18. Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx)

    DEFF Research Database (Denmark)

    Xia, Qingyou; Guo, Yiran; Zhang, Ze

    2009-01-01

    A single-base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ~16 million single-nucleotide polymorphisms, many indels, and structural varia...

  19. FIGENIX: Intelligent automation of genomic annotation: expertise integration in a new software platform

    Directory of Open Access Journals (Sweden)

    Pontarotti Pierre

    2005-08-01

    Full Text Available Abstract Background Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes, which consists of detecting genes' position and structure, and inferring their function (as well as of other features of genomes). Structural and functional annotation both require the complex chaining of numerous different software, algorithms and methods under the supervision of a biologist. The automation of these pipelines is necessary to manage huge amounts of data released by sequencing projects. Several pipelines already automate some of this complex chaining but still require an important contribution of biologists for supervising and controlling the results at various steps. Results Here we propose an innovative automated platform, FIGENIX, which includes an expert system capable of substituting for human expertise at several key steps. FIGENIX currently automates complex pipelines of structural and functional annotation under the supervision of the expert system (which makes it possible, for example, to make key decisions, check intermediate results or refine the dataset). The quality of the results produced by FIGENIX is comparable to those obtained by expert biologists with a drastic gain in terms of time costs and avoidance of errors due to the human manipulation of data. Conclusion The core engine and expert system of the FIGENIX platform currently handle complex annotation processes of broad interest for the genomic community. They could be easily adapted to new, or more specialized pipelines, such as for example the annotation of miRNAs, the classification of complex multigenic families, annotation of regulatory elements and other genomic features of interest.

  20. Integration and comparison of different genomic data for outcome prediction in cancer

    OpenAIRE

    Gomez Rueda, Hugo; Martínez Ledesma, Emmanuel; Martínez Torteya, Antonio; Palacios Corona, Rebeca; Treviño, Victor

    2005-01-01

    Background In cancer, large-scale technologies such as next-generation sequencing and microarrays have produced a wide number of genomic features such as DNA copy number alterations (CNA), mRNA expression (EXPR), microRNA expression (MIRNA), and DNA somatic mutations (MUT), among others. Several analyses of a specific type of these genomic data have generated many prognostic biomarkers in cancer. However, it is uncertain which of these data is more powerful and whether the best data-type is c...

  1. Analysis of indel variations in the human disease-associated genes ...

    Indian Academy of Sciences (India)

    Keywords. insertion–deletion variations; haematological disease; tumours; human genetics. Journal of Genetics ... domly selected healthy Korean individuals using a blood genomic DNA ... Bioinformatics annotation and 3-D protein structure analysis. In this study ..... 2009 A genome-wide meta-analysis identifies. Journal of ...

  2. Genomic and functional integrity of the hematopoietic system requires tolerance of oxidative DNA lesions

    DEFF Research Database (Denmark)

    Martín-Pardillos, Ana; Tsaalbi-Shtylik, Anastasia; Chen, Si

    2017-01-01

    -distorting nucleotide lesions, resulted in the perinatal loss of hematopoietic stem cells, progressive loss of bone marrow, and fatal aplastic anemia between 3 and 4 months of age. This was associated with replication stress, genomic breaks, DNA damage signaling, senescence, and apoptosis in bone marrow. Surprisingly...

  3. Functional food ingredients against colorectal cancer. An example project integrating functional genomics, nutrition and health

    NARCIS (Netherlands)

    Stierum, R.; Burgemeister, R.; Helvoort, van A.; Peijnenburg, A.; Schütze, K.; Seidelin, M.; Vang, O.; Ommen, van B.

    2001-01-01

    Functional Food Ingredients Against Colorectal Cancer is one of the first European Union funded Research Projects at the cross-road of functional genomics [comprising transcriptomics, the measurement of the expression of all messenger RNAs (mRNAs), and proteomics, the measurement of expression/state

  4. Lack of evidence for integration of Trypanosoma cruzi minicircle DNA in South American human genomes

    Czech Academy of Sciences Publication Activity Database

    Flegontova, Olga; Lukeš, Julius; Flegontov, Pavel

    2012-01-01

    Vol. 42, No. 5 (2012), pp. 437-441 ISSN 0020-7519 Grant - others: GA MŠk(CZ) LM2010005 Institutional support: RVO:60077344 Keywords: Trypanosoma cruzi * Kinetoplast minicircle * Chagas disease * Horizontal gene transfer * Human genome Subject RIV: EB - Genetics; Molecular Biology Impact factor: 3.637, year: 2012 http://www.sciencedirect.com/science/article/pii/S0020751912000781

  5. Integrating Nonadditive Genomic Relationship Matrices into the Study of Genetic Architecture of Complex Traits.

    Science.gov (United States)

    Nazarian, Alireza; Gezan, Salvador A

    2016-03-01

    The study of genetic architecture of complex traits has been dramatically influenced by implementing genome-wide analytical approaches during recent years. Of particular interest are genomic prediction strategies which make use of genomic information for predicting phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of estimation of genetic variance components of complex traits, and of additive, dominance, and genetic effects through best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine if including a dominance term into the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of accuracy of correlations between true and estimated additive, dominance, and genetic effects. © The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
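
    As an illustration of the marker-based additive relationship matrix mentioned in the abstract above, the following is a minimal sketch in Python. The abstract does not say which formula the authors used; the scaling below follows the commonly cited VanRaden-style construction G = ZZ' / (2 Σ p(1 − p)), and the function name `additive_grm` and the 0/1/2 genotype coding are assumptions made for this example only.

```python
from typing import List


def additive_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Marker-based additive relationship matrix (illustrative sketch).

    genotypes[i][j] is the number of copies (0, 1 or 2) of one chosen
    allele carried by individual i at marker j (assumed coding).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Observed frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Scaling factor: twice the sum of p * (1 - p) over markers.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; matrix is undefined")
    # Centre each genotype by its expected value 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    # G = Z Z' / scale
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 2], [2, 0], [1, 1]]  # 3 individuals, 2 markers
    for row in additive_grm(toy):
        print(row)
```

    A dominance matrix would be built along similar lines from a heterozygosity-based coding; it is omitted here because the abstract gives no detail on the coding used.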

  6. Grand challenges in evolutionary and population genetics: The importance of integrating epigenetics, genomics, modeling, and experimentation

    Science.gov (United States)

    Samuel A. Cushman

    2014-01-01

    This is a time of explosive growth in the fields of evolutionary and population genetics, with whole genome sequencing and bioinformatics driving a transformative paradigm shift (Morozova and Marra, 2008). At the same time, advances in epigenetics are thoroughly transforming our understanding of evolutionary processes and their implications for populations, species and...

  7. Integrating proteomic and functional genomic technologies in discovery-driven translational breast cancer research

    DEFF Research Database (Denmark)

    Celis, Julio E; Gromov, Pavel; Gromova, Irina

    2003-01-01

    The application of state-of-the-art proteomics and functional genomics technologies to the study of cancer is rapidly shifting toward the analysis of clinically relevant samples derived from patients, as the ultimate aim of translational research is to bring basic discoveries closer to the bedside...

  8. Elg1 forms an alternative RFC complex important for DNA replication and genome integrity

    NARCIS (Netherlands)

    Bellaoui, Mohammed; Chang, Michael; Ou, Jiongwen; Xu, Hong; Boone, Charles; Brown, Grant W

    2003-01-01

    Genome-wide synthetic genetic interaction screens with mutants in the mus81 and mms4 replication fork-processing genes identified a novel replication factor C (RFC) homolog, Elg1, which forms an alternative RFC complex with Rfc2-5. This complex is distinct from the DNA replication RFC, the DNA

  9. Improving biological understanding and complex trait prediction by integrating prior information in genomic feature models

    DEFF Research Database (Denmark)

    Edwards, Stefan McKinnon

    externally founded information, such as KEGG pathways, Gene Ontology gene sets, or genomic features, and estimate the joint contribution of the genetic variants within these sets to complex trait phenotypes. The analysis of complex trait phenotypes is hampered by the myriad of genes that control the trait...

  10. Ricebase: a breeding and genetics platform for rice, integrating individual molecular markers, pedigrees and whole-genome-based data.

    Science.gov (United States)

    Edwards, J D; Baldo, A M; Mueller, L A

    2016-01-01

    Ricebase (http://ricebase.org) is an integrative genomic database for rice (Oryza sativa) with an emphasis on combining datasets in a way that maintains the key links between past and current genetic studies. Ricebase includes DNA sequence data, gene annotations, nucleotide variation data and molecular marker fragment size data. Rice research has benefited from early adoption and extensive use of simple sequence repeat (SSR) markers; however, the majority of rice SSR markers were developed prior to the latest rice pseudomolecule assembly. Interpretation of new research using SNPs in the context of literature citing SSRs requires a common coordinate system. A new pipeline, using a stepwise relaxation of stringency, was used to map SSR primers onto the latest rice pseudomolecule assembly. The SSR markers and experimentally assayed amplicon sizes are presented in a relational database with a web-based front end, and are available as a track loaded in a genome browser with links connecting the browser and database. The combined capabilities of Ricebase link genetic markers, genome context, allele states across rice germplasm and potentially user curated phenotypic interpretations as a community resource for genetic discovery and breeding in rice. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
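
    The "stepwise relaxation of stringency" idea described above can be sketched as follows. This is not the Ricebase pipeline, whose alignment tool and thresholds are not given in the abstract; it is a toy Python version that assumes stringency is measured as the number of allowed mismatches and that the search stops at the first level yielding any hit. The names `hits_within` and `map_with_relaxation` are hypothetical.

```python
from typing import List, Optional, Tuple


def hits_within(primer: str, reference: str, max_mismatches: int) -> List[int]:
    """Return start positions where primer matches with <= max_mismatches."""
    k = len(primer)
    positions = []
    for start in range(len(reference) - k + 1):
        window = reference[start:start + k]
        mismatches = sum(a != b for a, b in zip(primer, window))
        if mismatches <= max_mismatches:
            positions.append(start)
    return positions


def map_with_relaxation(
    primer: str, reference: str, max_allowed: int = 2
) -> Optional[Tuple[int, List[int]]]:
    """Try exact matching first, then relax one mismatch at a time.

    Returns (mismatch level used, hit positions), or None if nothing
    matches even at the most relaxed level.
    """
    for level in range(max_allowed + 1):
        positions = hits_within(primer, reference, level)
        if positions:
            return level, positions
    return None


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"
    print(map_with_relaxation("TTGACC", ref))  # exact match
    print(map_with_relaxation("TTGTCC", ref))  # needs one mismatch
```

    Stopping at the strictest level that produces a hit keeps the most reliable placements and only falls back to looser matching for primers that would otherwise remain unmapped.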

  11. Germline transgenic pigs by Sleeping Beauty transposition in porcine zygotes and targeted integration in the pig genome.

    Directory of Open Access Journals (Sweden)

    Wiebke Garrels

    Full Text Available Genetic engineering can expand the utility of pigs for modeling human diseases, and for developing advanced therapeutic approaches. However, the inefficient production of transgenic pigs represents a technological bottleneck. Here, we assessed the hyperactive Sleeping Beauty (SB100X) transposon system for enzyme-catalyzed transgene integration into the embryonic porcine genome. The components of the transposon vector system were microinjected as circular plasmids into the cytoplasm of porcine zygotes, resulting in high frequencies of transgenic fetuses and piglets. The transgenic animals showed normal development and persistent reporter gene expression for >12 months. Molecular hallmarks of transposition were confirmed by analysis of 25 genomic insertion sites. We demonstrate germ-line transmission, segregation of individual transposons, and continued, copy number-dependent transgene expression in F1-offspring. In addition, we demonstrate target-selected gene insertion into transposon-tagged genomic loci by Cre-loxP-based cassette exchange in somatic cells followed by nuclear transfer. Transposase-catalyzed transgenesis in a large mammalian species expands the arsenal of transgenic technologies for use in domestic animals and will facilitate the development of large animal models for human diseases.

  12. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

    Science.gov (United States)

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

    2018-01-04

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank.

    Science.gov (United States)

    Prlic, Andreas; Kalro, Tara; Bhattacharya, Roshni; Christie, Cole; Burley, Stephen K; Rose, Peter W

    2016-12-15

    The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. Contact: andreas.prlic@rcsb.org. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  14. Integrated, multi-scale, spatial-temporal cell biology--A next step in the post genomic era.

    Science.gov (United States)

    Horwitz, Rick

    2016-03-01

    New microscopic approaches, high-throughput imaging, and gene editing promise major new insights into cellular behaviors. When coupled with genomic and other 'omic information and "mined" for correlations and associations, a new breed of powerful and useful cellular models should emerge. These top down, coarse-grained, and statistical models, in turn, can be used to form hypotheses merging with fine-grained, bottom up mechanistic studies and models that are the back bone of cell biology. The goal of the Allen Institute for Cell Science is to develop the top down approach by developing a high throughput microscopy pipeline that is integrated with modeling, using gene edited hiPS cell lines in various physiological and pathological contexts. The output of these experiments and models will be an "animated" cell, capable of integrating and analyzing image data generated from experiments and models. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Navigating the evidentiary turn in public health: Sensemaking strategies to integrate genomics into state-level chronic disease prevention programs.

    Science.gov (United States)

    Senier, Laura; Smollin, Leandra; Lee, Rachael; Nicoll, Lauren; Shields, Michael; Tan, Catherine

    2018-06-23

    In the past decade, healthcare delivery has faced two major disruptions: the mapping of the human genome and the rise of evidence-based practice. Sociologists have documented the paradigmatic shift towards evidence-based practice in medicine, but have yet to examine its effect on other health professions or the broader healthcare arena. This article shows how evidence-based practice is transforming public health in the United States. We present an in-depth qualitative analysis of interview, ethnographic, and archival data to show how Michigan's state public health agency has navigated the turn to evidence-based practice, as they have integrated scientific advances in genomics into their chronic disease prevention programming. Drawing on organizational theory, we demonstrate how they managed ambiguity through a combination of sensegiving and sensemaking activities. Specifically, they linked novel developments in genomics to a long-accepted public health planning model, the Core Public Health Functions. This made cutting edge advances in genomics more familiar to their peers in the state health agency. They also marshaled state-specific surveillance data to illustrate the public health burden of hereditary cancers in Michigan, and to make expert panel recommendations for genetic screening more locally relevant. Finally, they mobilized expertise to help their internal colleagues and external partners modernize conventional public health activities in chronic disease prevention. Our findings show that tools and concepts from organizational sociology can help medical sociologists understand how evidence-based practice is shaping institutions and interprofessional relations in the healthcare arena. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Integrative Genomics: Quantifying significance of phenotype-genotype relationships from multiple sources of high-throughput data

    Directory of Open Access Journals (Sweden)

    Eric Gamazon

    2013-05-01

    Full Text Available Given recent advances in the generation of high-throughput data such as whole genome genetic variation and transcriptome expression, it is critical to come up with novel methods to integrate these heterogeneous datasets and to assess the significance of identified phenotype-genotype relationships. Recent studies show that genome-wide association findings are likely to fall in loci with gene regulatory effects such as expression quantitative trait loci (eQTLs), demonstrating the utility of such integrative approaches. When genotype and gene expression data are available on the same individuals, we developed methods wherein top phenotype-associated genetic variants are prioritized if they are associated, as eQTLs, with gene expression traits that are themselves associated with the phenotype. Yet there has been no method to determine an overall p-value for the findings that arise specifically from the integrative nature of the approach. We propose a computationally feasible permutation method that accounts for the assimilative nature of the method and the correlation structure among gene expression traits and among genotypes. We apply the method to data from a study of cellular sensitivity to etoposide, one of the most widely used chemotherapeutic drugs. To our knowledge, this study is the first statistically sound quantification of the significance of the genotype-phenotype relationships resulting from applying an integrative approach. This method can be easily extended to cases in which gene expression data are replaced by other molecular phenotypes of interest, e.g., microRNA or proteomic data. This study has important implications for studies seeking to expand on genetic association studies by the use of omics data. Finally, we provide R code to compute the empirical FDR when p-values for the observed and simulated phenotypes are available.
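
    The closing sentence of the abstract above refers to computing an empirical FDR from observed and simulated p-values. The authors' R code is not reproduced in this record; the Python sketch below shows one common way to form such an estimate, under the assumption that the FDR at a threshold is the average number of simulated (null) p-values passing the threshold divided by the number of observed p-values passing it. The function name `empirical_fdr` and the toy inputs are illustrative only.

```python
from typing import Sequence


def empirical_fdr(
    observed: Sequence[float],
    simulated: Sequence[Sequence[float]],
    threshold: float,
) -> float:
    """Estimate the FDR at a p-value threshold from permutation results.

    observed  : p-values for the real phenotype
    simulated : one sequence of p-values per simulated/permuted phenotype
    """
    if not simulated:
        raise ValueError("at least one simulated phenotype is required")
    n_observed = sum(p <= threshold for p in observed)
    if n_observed == 0:
        return 0.0  # nothing called significant, so no false discoveries
    # Average number of null "discoveries" per simulated phenotype.
    null_counts = [sum(p <= threshold for p in sim) for sim in simulated]
    mean_null = sum(null_counts) / len(simulated)
    # An FDR estimate above 1 is not meaningful, so cap it.
    return min(1.0, mean_null / n_observed)


if __name__ == "__main__":
    obs = [0.001, 0.01, 0.2, 0.5]
    sims = [[0.005, 0.3, 0.6, 0.9], [0.2, 0.4, 0.7, 0.8]]
    print(empirical_fdr(obs, sims, threshold=0.05))
```

    Because the simulated phenotypes are analysed with the same genotypes and expression traits as the real one, an estimate of this kind retains the correlation structure that the abstract says the permutation scheme is meant to account for.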

  17. An integrative variant analysis suite for whole exome next-generation sequencing data

    Directory of Open Access Journals (Sweden)

    Challis Danny

    2012-01-01

    Background: Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results: Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). Conclusion: We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.

  18. Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling

    Science.gov (United States)

    Medina, Ignacio; Carbonell, José; Pulido, Luis; Madeira, Sara C.; Goetz, Stefan; Conesa, Ana; Tárraga, Joaquín; Pascual-Montano, Alberto; Nogales-Cadenas, Ruben; Santoyo, Javier; García, Francisco; Marbà, Martina; Montaner, David; Dopazo, Joaquín

    2010-01-01

    Babelomics is a response to the growing necessity of integrating and analyzing different types of genomic data in an environment that allows an easy functional interpretation of the results. Babelomics includes a complete suite of methods for the analysis of gene expression data that include normalization (covering most commercial platforms), pre-processing, differential gene expression (case-controls, multiclass, survival or continuous values), predictors, clustering; large-scale genotyping assays (case controls and TDTs, and allows population stratification analysis and correction). All these genomic data analysis facilities are integrated and connected to multiple options for the functional interpretation of the experiments. Different methods of functional enrichment or gene set enrichment can be used to understand the functional basis of the experiment analyzed. Many sources of biological information, which include functional (GO, KEGG, Biocarta, Reactome, etc.), regulatory (Transfac, Jaspar, ORegAnno, miRNAs, etc.), text-mining or protein–protein interaction modules can be used for this purpose. Finally a tool for the de novo functional annotation of sequences has been included in the system. This provides support for the functional analysis of non-model species. Mirrors of Babelomics or command line execution of their individual components are now possible. Babelomics is available at http://www.babelomics.org. PMID:20478823

  19. Control of Genome Integrity by RFC Complexes; Conductors of PCNA Loading onto and Unloading from Chromatin during DNA Replication

    Directory of Open Access Journals (Sweden)

    Yasushi Shiomi

    2017-01-01

    During cell division, genome integrity is maintained by faithful DNA replication during S phase, followed by accurate segregation in mitosis. Many DNA metabolic events linked with DNA replication are also regulated throughout the cell cycle. In eukaryotes, the DNA sliding clamp, proliferating cell nuclear antigen (PCNA), acts on chromatin as a processivity factor for DNA polymerases. Since its discovery, many other PCNA binding partners have been identified that function during DNA replication, repair, recombination, chromatin remodeling, cohesion, and proteolysis in cell-cycle progression. PCNA not only recruits the proteins involved in such events, but it also actively controls their function as chromatin assembles. Therefore, control of PCNA-loading onto chromatin is fundamental for various replication-coupled reactions. PCNA is loaded onto chromatin by PCNA-loading replication factor C (RFC) complexes. Both RFC1-RFC and Ctf18-RFC fundamentally function as PCNA loaders. On the other hand, after DNA synthesis, PCNA must be removed from chromatin by Elg1-RFC. Functional defects in RFC complexes lead to chromosomal abnormalities. In this review, we summarize the structural and functional relationships among RFC complexes, and describe how the regulation of PCNA loading/unloading by RFC complexes contributes to maintaining genome integrity.

  20. Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function.

    Science.gov (United States)

    Chasman, Daniel I; Fuchsberger, Christian; Pattaro, Cristian; Teumer, Alexander; Böger, Carsten A; Endlich, Karlhans; Olden, Matthias; Chen, Ming-Huei; Tin, Adrienne; Taliun, Daniel; Li, Man; Gao, Xiaoyi; Gorski, Mathias; Yang, Qiong; Hundertmark, Claudia; Foster, Meredith C; O'Seaghdha, Conall M; Glazer, Nicole; Isaacs, Aaron; Liu, Ching-Ti; Smith, Albert V; O'Connell, Jeffrey R; Struchalin, Maksim; Tanaka, Toshiko; Li, Guo; Johnson, Andrew D; Gierman, Hinco J; Feitosa, Mary F; Hwang, Shih-Jen; Atkinson, Elizabeth J; Lohman, Kurt; Cornelis, Marilyn C; Johansson, Asa; Tönjes, Anke; Dehghan, Abbas; Lambert, Jean-Charles; Holliday, Elizabeth G; Sorice, Rossella; Kutalik, Zoltan; Lehtimäki, Terho; Esko, Tõnu; Deshmukh, Harshal; Ulivi, Sheila; Chu, Audrey Y; Murgia, Federico; Trompet, Stella; Imboden, Medea; Coassin, Stefan; Pistis, Giorgio; Harris, Tamara B; Launer, Lenore J; Aspelund, Thor; Eiriksdottir, Gudny; Mitchell, Braxton D; Boerwinkle, Eric; Schmidt, Helena; Cavalieri, Margherita; Rao, Madhumathi; Hu, Frank; Demirkan, Ayse; Oostra, Ben A; de Andrade, Mariza; Turner, Stephen T; Ding, Jingzhong; Andrews, Jeanette S; Freedman, Barry I; Giulianini, Franco; Koenig, Wolfgang; Illig, Thomas; Meisinger, Christa; Gieger, Christian; Zgaga, Lina; Zemunik, Tatijana; Boban, Mladen; Minelli, Cosetta; Wheeler, Heather E; Igl, Wilmar; Zaboli, Ghazal; Wild, Sarah H; Wright, Alan F; Campbell, Harry; Ellinghaus, David; Nöthlings, Ute; Jacobs, Gunnar; Biffar, Reiner; Ernst, Florian; Homuth, Georg; Kroemer, Heyo K; Nauck, Matthias; Stracke, Sylvia; Völker, Uwe; Völzke, Henry; Kovacs, Peter; Stumvoll, Michael; Mägi, Reedik; Hofman, Albert; Uitterlinden, Andre G; Rivadeneira, Fernando; Aulchenko, Yurii S; Polasek, Ozren; Hastie, Nick; Vitart, Veronique; Helmer, Catherine; Wang, Jie Jin; Stengel, Bénédicte; Ruggiero, Daniela; Bergmann, Sven; Kähönen, Mika; Viikari, Jorma; Nikopensius, Tiit; Province, Michael; Ketkar, Shamika; Colhoun, Helen; Doney, Alex; Robino, Antonietta; 
Krämer, Bernhard K; Portas, Laura; Ford, Ian; Buckley, Brendan M; Adam, Martin; Thun, Gian-Andri; Paulweber, Bernhard; Haun, Margot; Sala, Cinzia; Mitchell, Paul; Ciullo, Marina; Kim, Stuart K; Vollenweider, Peter; Raitakari, Olli; Metspalu, Andres; Palmer, Colin; Gasparini, Paolo; Pirastu, Mario; Jukema, J Wouter; Probst-Hensch, Nicole M; Kronenberg, Florian; Toniolo, Daniela; Gudnason, Vilmundur; Shuldiner, Alan R; Coresh, Josef; Schmidt, Reinhold; Ferrucci, Luigi; Siscovick, David S; van Duijn, Cornelia M; Borecki, Ingrid B; Kardia, Sharon L R; Liu, Yongmei; Curhan, Gary C; Rudan, Igor; Gyllensten, Ulf; Wilson, James F; Franke, Andre; Pramstaller, Peter P; Rettig, Rainer; Prokopenko, Inga; Witteman, Jacqueline; Hayward, Caroline; Ridker, Paul M; Parsa, Afshin; Bochud, Murielle; Heid, Iris M; Kao, W H Linda; Fox, Caroline S; Köttgen, Anna

    2012-12-15

    In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10^-9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10^-4 to 2.2 × 10^-7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.

  1. Whole-genome analysis of herbicide-tolerant mutant rice generated by Agrobacterium-mediated gene targeting.

    Science.gov (United States)

    Endo, Masaki; Kumagai, Masahiko; Motoyama, Ritsuko; Sasaki-Yamagata, Harumi; Mori-Hosokawa, Satomi; Hamada, Masao; Kanamori, Hiroyuki; Nagamura, Yoshiaki; Katayose, Yuichi; Itoh, Takeshi; Toki, Seiichi

    2015-01-01

    Gene targeting (GT) is a technique used to modify endogenous genes in target genomes precisely via homologous recombination (HR). Although GT plants are produced using genetic transformation techniques, if the difference between the endogenous and the modified gene is limited to point mutations, GT crops can be considered equivalent to non-genetically modified mutant crops generated by conventional mutagenesis techniques. However, it is difficult to guarantee the non-incorporation of DNA fragments from Agrobacterium in GT plants created by Agrobacterium-mediated GT despite screening with conventional Southern blot and/or PCR techniques. Here, we report a comprehensive analysis of herbicide-tolerant rice plants generated by inducing point mutations in the rice ALS gene via Agrobacterium-mediated GT. We performed genome comparative genomic hybridization (CGH) array analysis and whole-genome sequencing to evaluate the molecular composition of GT rice plants. Thus far, no integration of Agrobacterium-derived DNA fragments has been detected in GT rice plants. However, >1,000 single nucleotide polymorphisms (SNPs) and insertion/deletion (InDels) were found in GT plants. Among these mutations, 20-100 variants might have some effect on expression levels and/or protein function. Information about additive mutations should be useful in clearing out unwanted mutations by backcrossing. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  2. EchoBASE: an integrated post-genomic database for Escherichia coli.

    Science.gov (United States)

    Misra, Raju V; Horler, Richard S P; Reindl, Wolfgang; Goryanin, Igor I; Thomas, Gavin H

    2005-01-01

    EchoBASE (http://www.ecoli-york.org) is a relational database designed to contain and manipulate information from post-genomic experiments using the model bacterium Escherichia coli K-12. Its aim is to collate information from a wide range of sources to provide clues to the functions of the approximately 1500 gene products that have no confirmed cellular function. The database is built on an enhanced annotation of the updated genome sequence of strain MG1655 and the association of experimental data with the E.coli genes and their products. Experiments that can be held within EchoBASE include proteomics studies, microarray data, protein-protein interaction data, structural data and bioinformatics studies. EchoBASE also contains annotated information on 'orphan' enzyme activities from this microbe to aid characterization of the proteins that catalyse these elusive biochemical reactions.

  3. Evolution and genome specialization of Brucella suis biovar 2 Iberian lineages.

    Science.gov (United States)

    Ferreira, Ana Cristina; Tenreiro, Rogério; de Sá, Maria Inácia Corrêa; Dias, Ricardo

    2017-09-12

    Swine brucellosis caused by B. suis biovar 2 is an emergent disease in domestic pigs in Europe. The emergence of this pathogen has been linked to the increase of extensive pig farms and the high density of infected wild boars (Sus scrofa). In Portugal and Spain, the majority of strains share specific molecular characteristics, which allowed establishing an Iberian clonal lineage. However, several strains isolated from wild boars in the North-East region of Spain are similar to strains isolated in different Central European countries. Comparative analysis of five newly fully sequenced B. suis biovar 2 strains belonging to the main circulating clones in the Iberian Peninsula, with publicly available Brucella spp. genomes, revealed that strains from the Iberian clonal lineage share 74% similarity with those reference genomes. Besides the 210 kb translocation event present in all biovar 2 strains, a 944 kb inversion was present in chromosome I of strains from the Iberian clone. At the left and right crossover points, the inversion disrupted a TRAP dicarboxylate transporter, DctM subunit, and an integral membrane protein, TerC. The gene dctM is well conserved in Brucella spp. except in strains from the Iberian clonal lineage. Intraspecies comparative analysis also exposed a number of biovar-, haplotype- and strain-specific insertion-deletion (INDEL) events and single nucleotide polymorphisms (SNPs) that could explain differences in virulence and host specificities. Most discriminative mutations were associated with membrane-related molecules (29%) and enzymes involved in catabolic processes (20%). Molecular identification of both B. suis biovar 2 clonal lineages could be easily achieved using the target-PCR procedures established in this work for the evaluated INDELs. Whole-genome analyses support that the B. suis biovar 2 Iberian clonal lineage evolved from the Central-European lineage and suggest that the genomic specialization of this pathogen in the Iberian Peninsula

  4. Integration of genomic information into sport horse breeding programs for optimization of accuracy of selection.

    Science.gov (United States)

    Haberland, A M; König von Borstel, U; Simianer, H; König, S

    2012-09-01

    Reliable selection criteria are required for young riding horses to increase genetic gain by increasing accuracy of selection and decreasing generation intervals. In this study, selection strategies incorporating genomic breeding values (GEBVs) were evaluated. Relevant stages of selection in sport horse breeding programs were analyzed by applying selection index theory. Results in terms of accuracies of indices (r_TI) and relative selection response indicated that information on single nucleotide polymorphism (SNP) genotypes considerably increases the accuracy of breeding values estimated for young horses without own or progeny performance. In a first scenario, the correlation between the breeding value estimated from the SNP genotype and the true breeding value (= accuracy of GEBV) was fixed to a relatively low value of r_mg = 0.5. For a low heritability trait (h^2 = 0.15), and an index for a young horse based only on information from both parents, additional genomic information doubles r_TI from 0.27 to 0.54. Including the conventional information source 'own performance' in the aforementioned index, additional SNP information increases r_TI by 40%. Thus, particularly with regard to traits of low heritability, genomic information can provide a tool for well-founded selection decisions early in life. In a further approach, different sources of breeding values (e.g. GEBV and estimated breeding values (EBVs) from different countries) were combined into an overall index when altering accuracies of EBVs and correlations between traits. In summary, we showed that genomic selection strategies have the potential to contribute to a substantial reduction in generation intervals in horse breeding programs.
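
    The doubling of index accuracy from 0.27 to 0.54 can be reproduced with a small selection index calculation. The Python sketch below rests on assumptions the abstract does not spell out: "information from both parents" is taken to be one own-performance record on each of two unrelated parents, and the GEBV is modeled as a fully heritable correlated trait with unit variance whose correlation with the true breeding value is 0.5. The helper names `solve` and `index_accuracy` are hypothetical.

```python
import math


def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def index_accuracy(P, g):
    """Accuracy sqrt(b'g) with b = P^-1 g, for a true breeding value of unit variance."""
    b = solve(P, g)
    return math.sqrt(sum(bi * gi for bi, gi in zip(b, g)))


h2, r_mg = 0.15, 0.5
vp = 1.0 / h2  # phenotypic variance when the additive variance is scaled to 1

# Index 1: one own-performance record on each of two unrelated parents.
P1 = [[vp, 0.0], [0.0, vp]]
g1 = [0.5, 0.5]  # covariance of each parental record with the offspring breeding value
acc_parents = index_accuracy(P1, g1)

# Index 2: the same parental records plus the candidate's own GEBV.
P2 = [[vp, 0.0, 0.5 * r_mg],
      [0.0, vp, 0.5 * r_mg],
      [0.5 * r_mg, 0.5 * r_mg, 1.0]]
g2 = [0.5, 0.5, r_mg]
acc_genomic = index_accuracy(P2, g2)

print(round(acc_parents, 2), round(acc_genomic, 2))  # 0.27 0.54
```

    Under these assumptions the parent-only index reduces to sqrt(h^2 / 2), which is about 0.27 for h^2 = 0.15, and adding the GEBV raises the accuracy to about 0.54, matching the figures quoted above.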

  5. The pathological consequences of impaired genome integrity in humans; disorders of the DNA replication machinery.

    Science.gov (United States)

    O'Driscoll, Mark

    2017-01-01

    Accurate and efficient replication of the human genome occurs in the context of an array of constitutional barriers, including regional topological constraints imposed by chromatin architecture and processes such as transcription, catenation of the helical polymer and spontaneously generated DNA lesions, including base modifications and strand breaks. DNA replication is fundamentally important for tissue development and homeostasis; differentiation programmes are intimately linked with stem cell division. Unsurprisingly, impairments of the DNA replication machinery can have catastrophic consequences for genome stability and cell division. Functional impacts on DNA replication and genome stability have long been known to play roles in malignant transformation through a variety of complex mechanisms, and significant further insights have been gained from studying model organisms in this context. Congenital hypomorphic defects in components of the DNA replication machinery have been and continue to be identified in humans. These disorders present with a wide range of clinical features. Indeed, in some instances, different mutations in the same gene underlie different clinical presentations. Understanding the origin and molecular basis of these features opens a window onto the range of developmental impacts of suboptimal DNA replication and genome instability in humans. Here, I will briefly overview the basic steps involved in DNA replication and the key concepts that have emerged from this area of research, before switching emphasis to the pathological consequences of defects within the DNA replication network; the human disorders. Copyright © 2016 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

  6. Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics

    DEFF Research Database (Denmark)

    Khurana, Ekta; Fu, Yao; Colonna, Vincenza

    2013-01-01

    Identifying Important Identifiers Each of us has millions of sequence variations in our genomes. Signatures of purifying or negative selection should help identify which of those variations is functionally important. Khurana et al. (1235587) used sequence polymorphisms from 1092 humans across 14...... sites tended to occur in network hub promoters. Many recurrent somatic cancer variants occurred in noncoding regulatory regions and thus might indicate mutations that drive cancer....

  7. Mutant woodchuck hepatitis virus genomes from virions resemble rearranged hepadnaviral integrants in hepatocellular carcinoma.

    OpenAIRE

    Kew, M C; Miller, R H; Chen, H S; Tennant, B C; Purcell, R H

    1993-01-01

    Although hepadnaviruses are implicated in the etiology of hepatocellular carcinoma, the pathogenic mechanisms involved remain uncertain. Clonally propagated integrations of hepadnaviral DNA into cellular DNA can be demonstrated in most virally induced hepatocellular carcinomas. Integration occurs at random sites in cellular DNA, but the highly preferred sites in viral DNA are adjacent to the directly repeated sequence DR1, less often DR2, or in the cohesive overlap region. Integrants invariab...

  8. The Development of PIPA: An Integrated and Automated Pipeline for Genome-Wide Protein Function Annotation

    National Research Council Canada - National Science Library

    Yu, Chenggang; Zavaljevski, Nela; Desai, Valmik; Johnson, Seth; Stevens, Fred J; Reifman, Jaques

    2008-01-01

    .... With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method...

  9. Functional Analysis of In-frame Indel ARID1A Mutations Reveals New Regulatory Mechanisms of Its Tumor Suppressor Functions

    Directory of Open Access Journals (Sweden)

    Bin Guan

    2012-10-01

    AT-rich interactive domain 1A (ARID1A) has emerged as a new tumor suppressor in which frequent somatic mutations have been identified in several types of human cancers. Although most ARID1A somatic mutations are frame-shift or nonsense mutations that contribute to mRNA decay and loss of protein expression, 5% of ARID1A mutations are in-frame insertions or deletions (indels) that involve only a small stretch of peptides. Naturally occurring in-frame indel mutations provide unique and useful models to explore the biology and regulatory role of ARID1A. In this study, we analyzed indel mutations identified in gynecological cancers to determine how these mutations affect the tumor suppressor function of ARID1A. Our results demonstrate that all in-frame mutants analyzed lost their ability to inhibit cellular proliferation or activate transcription of CDKN1A, which encodes p21, a downstream effector of ARID1A. We also showed that ARID1A is a nucleocytoplasmic protein whose stability depends on its subcellular localization. Nuclear ARID1A is less stable than cytoplasmic ARID1A because ARID1A is rapidly degraded by the ubiquitin-proteasome system in the nucleus. In-frame deletions affecting the consensus nuclear export signal reduce steady-state protein levels of ARID1A. This defect in nuclear exportation leads to nuclear retention and subsequent degradation. Our findings delineate a mechanism underlying the regulation of ARID1A subcellular distribution and protein stability and suggest that targeting the nuclear ubiquitin-proteasome system can increase the amount of the ARID1A protein in the nucleus and restore its tumor suppressor functions.
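
    The distinction between in-frame and frame-shift indels drawn above comes down to whether the net length change is a multiple of three. A minimal Python sketch follows, assuming VCF-style reference and alternate alleles and an indel that lies entirely within coding sequence; the function name `classify_indel` is hypothetical and the check ignores splice-site effects.

```python
def classify_indel(ref, alt):
    """Classify a coding indel by its net length change in nucleotides."""
    delta = len(alt) - len(ref)
    if delta == 0:
        return "not an indel"
    kind = "insertion" if delta > 0 else "deletion"
    # A change that is a multiple of 3 keeps the downstream reading frame intact.
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


three_base_gain = classify_indel("A", "ACTG")    # +3 nt
four_base_loss = classify_indel("ACTGA", "A")    # -4 nt
```

    Here the three-base gain is reported as an in-frame insertion, while the four-base loss is a frameshift deletion.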

  10. Reconstruction of putative DNA virus from endogenous rice tungro bacilliform virus-like sequences in the rice genome: implications for integration and evolution

    Directory of Open Access Journals (Sweden)

    Kishima Yuji

    2004-10-01

    Background: Plant genomes contain various kinds of repetitive sequences such as transposable elements, microsatellites, tandem repeats and virus-like sequences. Most of them, with the exception of virus-like sequences, do not allow us to trace their origins nor to follow the process of their integration into the host genome. Recent discoveries of virus-like sequences in plant genomes led us to set the objective of elucidating the origin of the repetitive sequences. Endogenous rice tungro bacilliform virus (RTBV)-like sequences (ERTBVs) have been found throughout the rice genome. Here, we reconstructed putative virus structures from RTBV-like sequences in the rice genome and characterized them to understand their evolutionary implications, their manner of integration, and the involvement of endogenous virus segments in the corresponding disease response. Results: We collected ERTBVs from the rice genomes. They contain rearranged structures and no intact ORFs. The identified ERTBV segments were shown to be phylogenetically divided into three clusters. For each phylogenetic cluster, we were able to make a consensus alignment for a circular virus-like structure carrying two complete ORFs. Comparisons of DNA and amino acid sequences suggested a close relationship between ERTBV and RTBV. The Oryza AA-genome species vary in ERTBV copy number. The species carrying a low copy number of ERTBV segments have been reported to be extremely susceptible to RTBV. The DNA methylation state of the ERTBV sequences was correlated with their copy number in the genome. Conclusions: These ERTBV segments are unlikely to have functional potential as a virus. However, these sequences made it possible to reconstruct a putative virus, which provided information on virus integration and on the evolutionary relationship with the existing virus. Comparison of ERTBVs among the Oryza AA-genome species allowed us to speculate on a possible role of endogenous virus segments against the related disease.

  11. SCREEN FOR DOMINANT BEHAVIORAL MUTATIONS CAUSED BY GENOMIC INSERTION OF P-ELEMENT TRANSPOSONS IN DROSOPHILA: AN EXAMINATION OF THE INTEGRATION OF VIRAL VECTOR SEQUENCES

    OpenAIRE

    FOX, LYLE E.; GREEN, DAVID; YAN, ZIYING; ENGELHARDT, JOHN F.; WU, CHUN-FANG

    2007-01-01

    Here we report the development of a high-throughput screen to assess dominant mutation rates caused by P-element transposition within the Drosophila genome that is suitable for assessing the undesirable effects of integrating foreign regulatory sequences (viral cargo) into a host genome. Three different behavioral paradigms were used: sensitivity to mechanical stress, response to heat stress, and ability to fly. The results, from our screen of 35,000 flies, indicate that mutations caused by t...

  12. An empirical test of the treatment of indels during optimization alignment based on the phylogeny of the genus Secale (Poaceae)

    DEFF Research Database (Denmark)

    Petersen, Gitte; Seberg, Ole; Aagesen, Lone

    2004-01-01

    The ability of the program POY, implementing optimization alignment, to deal with major indels is explored and discussed in connection with a phylogenetic analysis of the genus Secale based on partial Adhl sequences. The Adhl sequences used span exon 2-4. Nearly all variation is found in intron 2...... recovers both genera as monophyletic when knowledge of the duplication is incorporated in the analysis. The phylogenetic relationships within Secale are not clearly resolved. Subspecific taxa of Secale strictum have identical sequences and they are confined to a monophyletic group. However, the two...

  13. A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data.

    NARCIS (Netherlands)

    Post, L.J.G.; Roos, M.; Marshall, M.S.; van Driel, R.; Breit, T.M.

    2007-01-01

    The numerous public data resources make integrative bioinformatics experimentation increasingly important in life sciences research. However, it is severely hampered by the way the data and information are made available. The semantic web approach enhances data exchange and integration by providing

  14. Drosophila Model for the Analysis of Genesis of LIM-kinase 1-Dependent Williams-Beuren Syndrome Cognitive Phenotypes: INDELs, Transposable Elements of the Tc1/Mariner Superfamily and MicroRNAs

    Directory of Open Access Journals (Sweden)

    Elena V. Savvateeva-Popova

    2017-09-01

    Genomic disorders, the syndromes with multiple manifestations, may occur sporadically due to unequal recombination in chromosomal regions with specific architecture. Therefore, each patient may carry an individual structural variant of DNA sequence (SV) with small insertions and deletions (INDELs), sometimes less than 10 bp. The transposable elements of the Tc1/mariner superfamily are often associated with hotspots for homologous recombination involved in human genetic disorders, such as Williams-Beuren syndrome (WBS), with LIM-kinase 1-dependent cognitive defects. The Drosophila melanogaster mutant agnts3 has an unusual architecture of the agnostic locus harboring LIMK1: it is a hotspot of chromosome breaks, ectopic contacts, underreplication, and recombination. Here, we present the analysis of LIMK1-containing locus sequencing data in agnts3 and three D. melanogaster wild-type strains: Canton-S, Berlin, and Oregon-R. We found multiple strain-specific SVs, namely, single base changes and small INDELs. The specific features of agnts3 are a 28 bp A/T-rich insertion in intron 1 of LIMK1 and the insertion of a mobile S-element from the Tc1/mariner superfamily residing ~460 bp downstream of the LIMK1 3′UTR. None of the SVs leads to amino acid substitutions in agnts3 LIMK1. However, they apparently affect the nucleosome distribution, non-canonical DNA structure formation and transcription factor binding. Interestingly, the overall expression of miRNAs, including the biomarkers for human neurological diseases, is drastically reduced in agnts3 relative to the wild-type strains. Thus, LIMK1 DNA structure per se, as well as the pronounced changes in the total miRNA profile, probably lead to LIMK1 dysregulation and the complex behavioral dysfunctions observed in agnts3, making this mutant a simple plausible Drosophila model for WBS.

  15. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    Energy Technology Data Exchange (ETDEWEB)

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-01-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 32P- or 33P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.

  16. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    Energy Technology Data Exchange (ETDEWEB)

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-12-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 32P- or 33P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.

  17. Common developmental genome deprogramming in schizophrenia - Role of Integrative Nuclear FGFR1 Signaling (INFS).

    Science.gov (United States)

    Narla, S T; Lee, Y-W; Benson, C A; Sarder, P; Brennand, K J; Stachowiak, E K; Stachowiak, M K

    2017-07-01

    The watershed hypothesis of schizophrenia asserts that over 200 different mutations dysregulate distinct pathways that converge on an unspecified common mechanism(s) that controls disease ontogeny. Consistent with this hypothesis, our RNA-sequencing of neuron-committed cells (NCCs) differentiated from established iPSCs of 4 schizophrenia patients and 4 control subjects uncovered a dysregulated transcriptome of 1349 mRNAs common to all patients. The data reveal a global dysregulation of the developmental genome, deconstruction of coordinated mRNA networks, and the formation of aberrant, new coordinated mRNA networks, indicating a concerted action of the responsible factor(s). Sequencing of miRNA transcriptomes demonstrated an overexpression of 16 miRNAs and deconstruction of interactive miRNA-mRNA networks in schizophrenia NCCs. ChIP-seq revealed that the nuclear (n) form of FGFR1, a pan-ontogenic regulator, is overexpressed in schizophrenia NCCs and overtargets dysregulated mRNA and miRNA genes. The nFGFR1 targeted 54% of all human gene promoters and 84.4% of schizophrenia dysregulated genes. The upregulated genes reside within major developmental pathways that control neurogenesis and neuron formation, whereas downregulated genes are involved in oligodendrogenesis. Our results indicate that (i) schizophrenia has an early (preneuronal) genomic etiology, (ii) dysregulated genes and new coordinated gene networks are common to unrelated cases of schizophrenia, (iii) gene dysregulations are accompanied by increased nFGFR1-genome interactions, and (iv) modeling of increased nFGFR1 by overexpression of nFGFR1 leads to up- or downregulation of selected genes as observed in schizophrenia NCCs. Together, our results designate nFGFR1 signaling as a potential common dysregulated mechanism in the investigated patients and a potential therapeutic target in schizophrenia. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis; FINAL

    International Nuclear Information System (INIS)

    Simon, M. I.; Kim, U.-J.

    2002-01-01

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects, and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated, highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally, the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year.

  19. Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation.

    Directory of Open Access Journals (Sweden)

    Frank Technow

    Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E) continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics.
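
    As a rough illustration of how ABC can put a simulation model inside parameter estimation, the sketch below uses plain rejection sampling: draw a parameter from a prior, run the model, and keep the draw only if the simulated output is close to the observations. This is not the study's algorithm or its crop growth model. The saturating toy model, the uniform prior, the tolerance, and every name here are assumptions made for illustration, and the simulator is deterministic for simplicity.

```python
import random


def toy_growth_model(marker_effect, environment):
    """Hypothetical non-linear 'crop model': yield saturates with the effect."""
    return environment * marker_effect / (1.0 + marker_effect)


def abc_rejection(observed, environments, n_draws=20000, tolerance=0.05, seed=0):
    """Rejection ABC: keep prior draws whose simulated yields match the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 5.0)  # assumed uniform prior on the effect
        simulated = [toy_growth_model(theta, env) for env in environments]
        # Distance: largest absolute deviation across environments.
        distance = max(abs(s - o) for s, o in zip(simulated, observed))
        if distance <= tolerance:
            accepted.append(theta)
    return accepted


# Synthetic data generated from a known effect of 2.0 in three environments.
environments = [1.0, 2.0, 3.0]
observed = [toy_growth_model(2.0, env) for env in environments]

posterior_sample = abc_rejection(observed, environments)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

    The accepted draws approximate the posterior of the effect; no likelihood is ever written down, which is what lets a mechanistic model stand in for a purely statistical one.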

  20. SBH and the integration of complementary approaches in the mapping, sequencing, and understanding of complex genomes

    International Nuclear Information System (INIS)

    Drmanac, R.; Drmanac, S.; Labat, I.; Vicentic, A.; Gemmell, A.; Stavropoulos, N.; Jarvis, J.

    1992-01-01

    A variant of sequencing by hybridization (SBH) is being developed with a potential to inexpensively determine up to 100 million base pairs per year. The method comprises (1) arraying short clones in 864-well plates; (2) growth of the M13 clones or PCR of the inserts; (3) automated spotting of DNAs by corresponding pin-arrays; (4) hybridization of dotted samples with 200-3000 32P- or 33P-labeled 6- to 8-mer probes; and (5) scoring hybridization signals using storage phosphor plates. Some 200 7- to 8-mers can provide an inventory of the genes if cDNA clones are hybridized, or can define the order of 2-kb genomic clones, creating physical and structural maps with 100-bp resolution; the distribution of G+C, LINEs, SINEs, and gene families would be revealed. cDNAs that represent new genes and genomic clones in regions of interest selected by SBH can be sequenced by a gel method. Uniformly distributed clones from the previous step will be hybridized with 2000-3000 6- to 8-mers. As a result, approximately 50-60% of the genomic regions containing members of large repetitive and gene families and those families represented in GenBank would be completely sequenced. In the less redundant regions, every base pair is expected to be read with 3-4 probes, but the complete sequence cannot be reconstructed. Such partial sequences allow the inference of similarity and the recognition of coding, regulatory, and repetitive sequences, as well as study of the evolutionary processes all the way up to the species delineation.

  1. Reverse gyrase functions in genome integrity maintenance by protecting DNA breaks in vivo

    DEFF Research Database (Denmark)

    Han, Wenyuan; Feng, Xu; She, Qunxin

    2017-01-01

    Reverse gyrase introduces positive supercoils to circular DNA and is implicated in genome stability maintenance in thermophiles. The extremely thermophilic crenarchaeon Sulfolobus encodes two reverse gyrase proteins, TopR1 (topoisomerase reverse gyrase 1) and TopR2, whose functions in thermophilic...... and subsequent DNA degradation. The former occurred immediately after drug treatment, leading to chromosomal DNA degradation that concurred with TopR1 degradation, followed by chromatin protein degradation and DNA-less cell formation. To gain a further insight into TopR1 function, the expression of the enzyme...

  2. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    Energy Technology Data Exchange (ETDEWEB)

    Peng, Jamy C. [Univ. of California, Berkeley, CA (United States)

    2007-01-01

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes: approximately 30% in Drosophila and human. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated-lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation, most likely by restricting the access of repair machineries to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  3. Fluorescent In Situ Hybridization to Detect Transgene Integration into Plant Genomes

    Science.gov (United States)

    Schwarzacher, Trude

    Fluorescent chromosome analysis technologies have advanced our understanding of genome organization during the last 30 years and have enabled the investigation of DNA organization and structure as well as the evolution of chromosomes. Fluorescent chromosome staining allows even small chromosomes to be visualized, characterized by their composition and morphology, and counted. Aneuploidies and polyploidies can be established for species, breeding lines, and individuals, including changes occurring during hybridization or tissue culture and transformation protocols. Fluorescent in situ hybridization correlates molecular information of a DNA sequence with its physical location on chromosomes and genomes. It thus allows determination of the physical position of sequences and often is the only means to determine the abundance and distribution of DNA sequences that are difficult to map with any other molecular method or would require segregation analysis, in particular multicopy or repetitive DNA. Equally, it is often the best way to establish the incorporation of transgenes, their numbers, and physical organization along chromosomes. This chapter presents protocols for probe and chromosome preparation, fluorescent in situ hybridization, chromosome staining, and the analysis of results.

  4. Genomic integration and germline transmission of plasmid injected into crustacean Daphnia magna eggs.

    Directory of Open Access Journals (Sweden)

    Yasuhiko Kato

    The water flea, Daphnia, has been the subject of study in ecology, evolution, and environmental sciences for decades. Over the last few years, expressed sequence tags and a genome sequence have been determined. In addition, functional approaches of overexpression and gene silencing based on microinjection of RNAs into eggs have been established. However, the transient nature of these approaches prevents us from analyzing gene functions in later stages of development. To overcome this limitation, transgenesis would become a key tool. Here we report establishment of a transgenic line using microinjection of plasmid into Daphnia magna eggs. The green fluorescent protein (GFP) gene fused with the D. magna histone H2B gene under the control of a promoter/enhancer region of the elongation factor 1α-1 (EF1α-1) gene, EF1α-1::H2B-GFP, was used as a reporter providing high resolution visualization of active chromatin. Transgenic lines were obtained from 0.67% of the total fertile adults that survived the injections. One of the transgenic animals, which exhibited fluorescence in the nuclei of cells during embryogenesis and oogenesis, had two copies of EF1α-1::H2B-GFP in a head-to-tail array. This is the first report of a transgenesis technique in Daphnia and, together with emerging genome sequences, will be useful for advancing knowledge of the molecular biology of Daphnia.

  5. Genomic resources for water yam (Dioscorea alata L.): analyses of EST-Sequences, De Novo sequencing and GBS libraries

    Science.gov (United States)

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources such as SSRs, SNPs and InDels in several model and non-model plant species. Yam (Dioscorea spp.) i...

  6. Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum).

    Directory of Open Access Journals (Sweden)

    Kwang-Soo Cho

    We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compare it with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. The copy number of the 21-bp tandem repeat differed between F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Coding sequences are highly conserved at the nucleotide and amino acid levels, with about 98% homology, and four genes (rpoC2, ycf3, accD, and clpP) have high synonymous (Ks) values. PCR-based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum.

  7. Impact of delay to cryopreservation on RNA integrity and genome-wide expression profiles in resected tumor samples.

    Directory of Open Access Journals (Sweden)

    Elodie Caboux

    The quality of tissue samples and extracted mRNA is a major source of variability in tumor transcriptome analysis using genome-wide expression microarrays. During and immediately after surgical tumor resection, tissues are exposed to metabolic, biochemical and physical stresses characterized as "warm ischemia". Current practice advocates cryopreservation of biosamples within 30 minutes of resection, but this recommendation has not been systematically validated by measurements of mRNA decay over time. Using Illumina HumanHT-12 v3 Expression BeadChips, providing a genome-wide coverage of over 24,000 genes, we have analyzed gene expression variation in samples of 3 hepatocellular carcinomas (HCC) and 3 lung carcinomas (LC) cryopreserved at times up to 2 hours after resection. RNA Integrity Numbers (RIN) revealed no significant deterioration of mRNA up to 2 hours after resection. Genome-wide transcriptome analysis detected non-significant gene expression variations of -3.5%/hr (95% CI: -7.0%/hr to 0.1%/hr; p = 0.054). In LC, no consistent gene expression pattern was detected in relation with warm ischemia. In HCC, a signature of 6 up-regulated genes (CYP2E1, IGLL1, CABYR, CLDN2, NQO1, SCL13A5) and 6 down-regulated genes (MT1G, MT1H, MT1E, MT1F, HABP2, SPINK1) was identified (FDR <0.05). Overall, our observations support the current recommendation of time to cryopreservation of up to 30 minutes and emphasize the need for identifying tissue-specific genes deregulated following resection to avoid misinterpreting expression changes induced by warm ischemia as pathologically significant changes.

  8. Ancestry informative markers: inference of ancestry in aged bone samples using an autosomal AIM-Indel multiplex.

    Science.gov (United States)

    Romanini, Carola; Romero, Magdalena; Salado Puerto, Mercedes; Catelli, Laura; Phillips, Christopher; Pereira, Rui; Gusmão, Leonor; Vullo, Carlos

    2015-05-01

    Ancestry informative markers (AIMs) can be useful to infer ancestry proportions of the donors of forensic evidence. The probability of success typing degraded samples, such as human skeletal remains, is strongly influenced by the DNA fragment lengths that can be amplified and the presence of PCR inhibitors. Several AIM panels are available amongst the many forensic marker sets developed for genotyping degraded DNA. Using a 46 AIM Insertion Deletion (Indel) multiplex, we analyzed human skeletal remains of post mortem time ranging from 35 to 60 years from four different continents (Sub-Saharan Africa, South and Central America, East Asia and Europe) to ascertain the genetic ancestry components. Samples belonging to non-admixed individuals could be assigned to their corresponding continental group. For the remaining samples with admixed ancestry, it was possible to estimate the proportion of co-ancestry components from the four reference population groups. The 46 AIM Indel set was informative enough to efficiently estimate the proportion of ancestry even in samples yielding partial profiles, a frequent occurrence when analyzing inhibited and/or degraded DNA extracts. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  9. Genetic Diversity and Population Structure in Native Chicken Populations from Myanmar, Thailand and Laos by Using 102 Indels Markers

    Directory of Open Access Journals (Sweden)

    A. A. Maw

    2015-01-01

    The genetic diversity of native chicken populations from Myanmar, Thailand, and Laos was examined by using 102 insertion and/or deletion (indel) markers. Most of the indel loci were polymorphic (71% to 96%), and the genetic variability was similar in all populations. The average observed heterozygosities (HO) and expected heterozygosities (HE) ranged from 0.205 to 0.263 and 0.239 to 0.381, respectively. The coefficient of genetic differentiation (Gst) for all cumulated populations was 0.125, and the Thai native chickens showed higher Gst (0.088) than Myanmar (0.041) and Laotian (0.024) populations. The pairwise Fst distances ranged from 0.144 to 0.308 among populations. A neighbor-joining (NJ) tree, using Nei’s genetic distance, revealed that Thai and Laotian native chicken populations were genetically close, while Myanmar native chickens were distant from the others. The native chickens from these three countries were thought to be descended from three different origins (K = 3) from STRUCTURE analysis. Genetic admixture was observed in Thai and Laotian native chickens, while admixture was absent in Myanmar native chickens.
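
    For readers unfamiliar with the HO and HE statistics quoted above, the sketch below computes both for a single biallelic indel locus using the standard definitions: HO is the fraction of heterozygous individuals, and HE is one minus the sum of squared allele frequencies. The genotype data and the function name are hypothetical, and the study's own software and any sample-size corrections it applied are not known from the abstract.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele, allele) pairs, one pair per individual.
    Expected heterozygosity is 1 - sum(p_i^2), without small-sample correction.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


# Hypothetical locus: "I" = insertion allele, "D" = deletion allele.
genotypes = [("I", "I"), ("I", "D"), ("D", "D"), ("I", "I")]
ho, he = heterozygosity(genotypes)
```

    Averaging these per-locus values over all markers gives population-level figures of the kind reported in the abstract.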

  10. Association and Genetic Identification of Loci for Four Fruit Traits in Tomato Using InDel Markers

    Directory of Open Access Journals (Sweden)

    Xiaoxi Liu

    2017-07-01

    Tomato (Solanum lycopersicum) fruit weight (FW), soluble solid content (SSC), fruit shape and fruit color are crucial for yield, quality and consumer acceptability. In this study, a 192-accession tomato association panel comprising a mixture of wild species, cherry tomato, landraces, and modern varieties collected worldwide was genotyped with 547 InDel markers evenly distributed on 12 chromosomes and scored for FW, SSC, fruit shape index (FSI), and color parameters over 2 years with three replications each year. The association panel was sorted into two subpopulations. Linkage disequilibrium ranged from 3.0 to 47.2 Mb across 12 chromosomes. A set of 102 markers significantly (p < 1.19–1.30 × 10−4) associated with SSC, FW, fruit shape, and fruit color was identified on 11 of the 12 chromosomes using a mixed linear model. The associations were compared with the known gene/QTLs for the same traits. Genetic analysis using F2 populations detected 14 and 4 markers significantly (p < 0.05) associated with SSC and FW, respectively. Some loci were commonly detected by both association and linkage analysis. Particularly, one novel locus for FW on chromosome 4 detected by association analysis was also identified in F2 populations. The results demonstrated that association mapping using a limited number of InDel markers and a relatively small population could not only complement and enhance previous QTL information, but also identify novel loci for marker-assisted selection of fruit traits in tomato.

  11. Integrative genomic approaches to dissect clinically-significant relationships between the VDR cistrome and gene expression in primary colon cancer.

    Science.gov (United States)

    Long, Mark D; Campbell, Moray J

    2017-10-01

    Recently, we undertook a pan-cancer analysis of the nuclear hormone receptor (NR) superfamily in The Cancer Genome Atlas (TCGA), and revealed that the vitamin D receptor (NR1I1/VDR) was commonly and significantly down-regulated specifically in the colon adenocarcinoma cohort (COAD). To examine the consequence of down-regulated VDR expression, we re-analyzed VDR chromatin immunoprecipitation sequencing (ChIP-Seq) data from LS180 colon cancer cells (GSE31939). This analysis identified 1809 loci that displayed significant (p.adjcolon tumor suppressor, Galectin 4) had significantly shorter disease-free survival. These analyses suggest that reduced expression of VDR in colon cancer (but neither loss nor mutation) changes the actions of the VDR by both dampening the expression of tumor suppressors (e.g. LGALS4) whilst either stabilizing or not down-regulating expression of oncogenes (e.g. Carbonic Anhydrase 9 (CA9)). These integrative genomic approaches are relatively generic and applicable to the study of any transcription factor. Copyright © 2016. Published by Elsevier Ltd.

  12. Deoxyribonucleic Acid Damage and Repair: Capitalizing on Our Understanding of the Mechanisms of Maintaining Genomic Integrity for Therapeutic Purposes

    Directory of Open Access Journals (Sweden)

    Jolene Michelle Helena

    2018-04-01

    Deoxyribonucleic acid (DNA) is the self-replicating hereditary material that provides a blueprint which, in collaboration with environmental influences, produces a structural and functional phenotype. As DNA coordinates and directs differentiation, growth, survival, and reproduction, it is responsible for life and the continuation of our species. Genome integrity requires the maintenance of DNA stability for the correct preservation of genetic information. This is facilitated by accurate DNA replication and precise DNA repair. DNA damage may arise from a wide range of both endogenous and exogenous sources but may be repaired through highly specific mechanisms. The most common mechanisms include mismatch, base excision, nucleotide excision, and double-strand DNA (dsDNA) break repair. Concurrent with regulation of the cell cycle, these mechanisms are precisely executed to ensure full restoration of damaged DNA. Failure or inaccuracy in DNA repair contributes to genome instability and loss of genetic information which may lead to mutations resulting in disease or loss of life. A detailed understanding of the mechanisms of DNA damage and its repair provides insight into disease pathogeneses and may facilitate diagnosis and the development of targeted therapies.

  13. Integrated analysis of epigenomic and genomic changes by DNA methylation dependent mechanisms provides potential novel biomarkers for prostate cancer.

    Science.gov (United States)

    White-Al Habeeb, Nicole M A; Ho, Linh T; Olkhov-Mitsel, Ekaterina; Kron, Ken; Pethe, Vaijayanti; Lehman, Melanie; Jovanovic, Lidija; Fleshner, Neil; van der Kwast, Theodorus; Nelson, Colleen C; Bapat, Bharati

    2014-09-15

    Epigenetic silencing mediated by CpG methylation is a common feature of many cancers. Characterizing aberrant DNA methylation changes associated with tumor progression may identify potential prognostic markers for prostate cancer (PCa). We treated two PCa cell lines, 22Rv1 and DU-145, with the demethylating agent 5-aza-2'-deoxycytidine (DAC), and global methylation status was analyzed using a methylation-sensitive restriction enzyme-based differential methylation hybridization strategy followed by genome-wide CpG methylation array profiling. In addition, we examined gene expression changes using a custom microarray. Gene Set Enrichment Analysis (GSEA) identified the most significantly dysregulated pathways. We also assessed the methylation status of candidate genes that showed reduced CpG methylation and increased gene expression after DAC treatment in Gleason score (GS) 8 vs. GS6 patients, using three independent cohorts of patients: the publicly available The Cancer Genome Atlas (TCGA) dataset and two separate patient cohorts. Our analysis, by integrating methylation and gene expression in PCa cell lines, combined with patient tumor data, identified novel potential biomarkers for PCa patients. These markers may help elucidate the pathogenesis of PCa and represent potential prognostic markers for PCa patients.

  14. Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B as revealed by whole genome re-sequencing.

    Science.gov (United States)

    Fu, Chong-Yun; Liu, Wu-Ge; Liu, Di-Lin; Li, Ji-Hua; Zhu, Man-Shan; Liao, Yi-Long; Liu, Zhen-Rong; Zeng, Xue-Qin; Wang, Feng

    2016-03-01

    Next-generation sequencing technologies provide opportunities to further understand genetic variation, even within closely related cultivars. We performed whole genome resequencing of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid weakness and hybrid vigor when grown in the early- and late-cropping seasons, respectively. Approximately 150 million 100-bp paired-end reads were generated, which covered ∼86% of the rice (Oryza sativa L. japonica 'Nipponbare') reference genome. A total of 2,758,740 polymorphic sites, including 2,408,845 SNPs and 349,895 InDels, were detected in RGD-7S and Taifeng B. Applying stringent parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S, 1989 of 2727 CNVs overlapped 218 genes, and in Taifeng B, 1231 of 2010 CNVs were annotated in 175 genes. In addition, we verified a subset of InDels in the interval of the hybrid weakness genes Hw3 and Hw4 and obtained some polymorphic InDel markers, which will provide a sound foundation for cloning hybrid weakness genes. Analysis of genomic variations will also contribute to understanding the genetic basis of hybrid weakness and heterosis.
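
    The counts and densities in this abstract can be cross-checked with simple arithmetic: dividing a variant count by its density per 100 kb gives the genomic span over which the density was presumably computed. The sketch below is only a consistency check on the quoted figures. The function name is hypothetical, and the abstract does not say which span was actually used.

```python
def implied_span_mb(variant_count, density_per_100kb):
    """Genomic span (in Mb) implied by a count and a per-100-kb density."""
    bases = variant_count / density_per_100kb * 100_000
    return bases / 1_000_000


# Figures quoted in the abstract for RGD-7S/Taifeng B.
snp_span = implied_span_mb(961_791, 256.8)   # roughly 374.5 Mb
indel_span = implied_span_mb(46_640, 12.5)   # roughly 373.1 Mb

# The two variant classes also add up to the reported total of sites.
total_sites = 2_408_845 + 349_895
```

    Both implied spans fall close to 373-375 Mb, so the SNP and InDel densities are mutually consistent with a single rice-genome-sized denominator.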

  15. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Science.gov (United States)

    2012-01-01

    Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for repeated update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  16. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Abstract Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for repeated update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  17. DNA Delivery and Genomic Integration into Mammalian Target Cells through Type IV A and B Secretion Systems of Human Pathogens

    Directory of Open Access Journals (Sweden)

    Dolores L. Guzmán-Herrador

    2017-08-01

    Full Text Available We explore the potential of bacterial secretion systems as tools for genomic modification of human cells. We previously showed that foreign DNA can be introduced into human cells through the Type IV A secretion system of the human pathogen Bartonella henselae. Moreover, the DNA is delivered covalently attached to the conjugative relaxase TrwC, which promotes its integration into the recipient genome. In this work, we report that this tool can be adapted to other target cells by using different relaxases and secretion systems. The promiscuous relaxase MobA from plasmid RSF1010 can be used to deliver DNA into human cells with higher efficiency than TrwC. MobA also promotes DNA integration, albeit at lower rates than TrwC. Notably, we report that DNA transfer to human cells can also take place through the Type IV secretion system of two intracellular human pathogens, Legionella pneumophila and Coxiella burnetii, which code for a distantly related Dot/Icm Type IV B secretion system. This suggests that DNA transfer could be an intrinsic ability of this family of secretion systems, expanding the range of target human cells. Further analysis of the DNA transfer process showed that recruitment of MobA by Dot/Icm was dependent on the IcmSW chaperone, which may explain the higher DNA transfer rates obtained. Finally, we observed that the presence of MobA negatively affected the intracellular replication of C. burnetii, suggesting an interference with Dot/Icm translocation of virulence factors.

  18. Assessment of adaptability of zebu cattle (Bos indicus) breeds in two different climatic conditions: using cytogenetic techniques on genome integrity.

    Science.gov (United States)

    Kumar, Anil; Waiz, Syma Ashraf; Sridhar Goud, T; Tonk, R K; Grewal, Anita; Singh, S V; Yadav, B R; Upadhyay, R C

    2016-06-01

    The aim of this study was to evaluate genome integrity so as to assess the adaptability of three breeds of indigenous cattle reared in arid and semi-arid regions of Rajasthan (Bikaner) and Haryana (Karnal), India. The cattle formed a homogeneous group (same age and sex) of the indigenous breeds Sahiwal, Tharparkar and Kankrej. A total of 100 animals were selected for this study from both climatic conditions. Sister chromatid exchanges (SCEs), chromosomal gaps and chromatid breaks were observed in metaphase plates of chromosome preparations obtained from in vitro culture of peripheral blood lymphocytes. The mean numbers of breaks and gaps in Sahiwal and Tharparkar of the semi-arid zone were 8.56 ± 3.16, 6.4 ± 3.39 and 8.72 ± 2.04, 3.52 ± 6.29, respectively. Similarly, the mean numbers of breaks and gaps in Tharparkar and Kankrej cattle of the arid zone were 5.26 ± 1.76, 2.74 ± 1.76 and 5.24 ± 1.84, 2.5 ± 1.26, respectively. The frequency of SCEs in chromosomes was found significantly higher (P  0.05) was observed in the same zone. The analysis of frequency of CAs and SCEs revealed significant effects of environmental conditions on the genome integrity of animals, thereby indicating an association with their adaptability.

  19. Post-genome integrative biology: so that's what they call clinical science.

    Science.gov (United States)

    Rees, J

    2001-01-01

    Medical science is increasingly dominated by slogans, a characteristic reflecting its growing bureaucratic and corporate structure. Chief amongst these slogans is the idea that genomics will transform the public health. I believe this view is mistaken. Using studies of the genetics of skin cancer and the genetics of skin pigmentation, I describe how recent discoveries have contributed to our understanding of these topics and of human evolution. I contrast these discoveries with insights gained from other approaches, particularly those based on clinical studies. The 'IKEA model of medical advance'--you just do the basic science in the laboratory and self-assemble in the clinic--is not only damaging to clinical advance, but reflects a widespread ignorance about the nature of disease and how clinical discovery arises. We need to think more about disease and less about genes; more in the clinic and less in the laboratory.

  20. An integrated genomic and transcriptomic survey of mucormycosis-causing fungi

    Science.gov (United States)

    Chibucos, Marcus C.; Soliman, Sameh; Gebremariam, Teclegiorgis; Lee, Hongkyu; Daugherty, Sean; Orvis, Joshua; Shetty, Amol C.; Crabtree, Jonathan; Hazen, Tracy H.; Etienne, Kizee A.; Kumari, Priti; O'Connor, Timothy D.; Rasko, David A.; Filler, Scott G.; Fraser, Claire M.; Lockhart, Shawn R.; Skory, Christopher D.; Ibrahim, Ashraf S.; Bruno, Vincent M.

    2016-01-01

    Mucormycosis is a life-threatening infection caused by Mucorales fungi. Here we sequence 30 fungal genomes, and perform transcriptomics with three representative Rhizopus and Mucor strains and with human airway epithelial cells during fungal invasion, to reveal key host and fungal determinants contributing to pathogenesis. Analysis of the host transcriptional response to Mucorales reveals platelet-derived growth factor receptor B (PDGFRB) signaling as part of a core response to divergent pathogenic fungi; inhibition of PDGFRB reduces Mucorales-induced damage to host cells. The unique presence of CotH invasins in all invasive Mucorales, and the correlation between CotH gene copy number and clinical prevalence, are consistent with an important role for these proteins in mucormycosis pathogenesis. Our work provides insight into the evolution of this medically and economically important group of fungi, and identifies several molecular pathways that might be exploited as potential therapeutic targets. PMID:27447865

  1. Integrity of nuclear genomic deoxyribonucleic acid in cooked meat: Implications for food traceability.

    Science.gov (United States)

    Aslan, O; Hamill, R M; Sweeney, T; Reardon, W; Mullen, A M

    2009-01-01

    It is essential to isolate high-quality DNA from muscle tissue for PCR-based applications in traceability of animal origin. We wished to examine the impact of cooking meat to a range of core temperatures on the quality and quantity of subsequently isolated genomic (specifically, nuclear) DNA. Triplicate steak samples were cooked in a water bath (100 degrees C) until their final internal temperature was 75, 80, 85, 90, 95, or 100 degrees C, and DNA was extracted. Deoxyribonucleic acid quantity was significantly reduced in cooked meat samples compared with raw (6.5 vs. 56.6 ng/microL; P 800 bp) were observed only when using DNA from raw meat and steak cooked to lower core temperatures. Small amplicons (food authentication, it is less abundant, and results suggest that analyses should be designed to use small amplicon sizes for meat cooked to high core temperatures.

  2. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    Science.gov (United States)

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  3. The future of genome-scale modeling of yeast through integration of a transcriptional regulatory network

    DEFF Research Database (Denmark)

    Liu, Guodong; Marras, Antonio; Nielsen, Jens

    2014-01-01

    Metabolism is regulated at multiple levels in response to the changes of internal or external conditions. Transcriptional regulation plays an important role in regulating many metabolic reactions by altering the concentrations of metabolic enzymes. Thus, integration of the transcriptional regulatory information is necessary to improve the accuracy and predictive ability of metabolic models. Here we review the strategies for the reconstruction of a transcriptional regulatory network (TRN) for yeast and the integration of such a reconstruction into a flux balance analysis-based metabolic model. While many large-scale TRN reconstructions have been reported for yeast, these reconstructions still need to be improved regarding the functionality and dynamic property of the regulatory interactions. In addition, mathematical modeling approaches need to be further developed to efficiently integrate...
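
    One common way to couple a regulatory network to a flux balance model is to evaluate Boolean gene rules and force the flux of repressed reactions to zero before optimizing. The Python sketch below illustrates that idea on a hypothetical four-reaction network (uptake to A, A to B through two parallel enzymes, B to biomass). The reaction names, bounds and the gene rule are assumptions made for illustration and do not come from the reviewed yeast models. For this tiny topology the linear-programming optimum has a closed form, so no solver is used; a genome-scale model would need a real LP solver.

```python
# Toy regulatory-FBA-style sketch; every name and number here is hypothetical.

def apply_regulation(bounds, gene_rules, tf_state):
    """Return flux upper bounds with repressed reactions forced to zero."""
    constrained = dict(bounds)
    for reaction, rule in gene_rules.items():
        if not rule(tf_state):
            constrained[reaction] = 0.0
    return constrained


def max_biomass(bounds):
    """Closed-form steady-state optimum for the toy network.

    uptake -> A, A -> B via enzyme1 or enzyme2 in parallel, B -> biomass.
    At steady state the biomass flux is limited by the tightest stage.
    """
    return min(bounds["uptake"],
               bounds["enzyme1"] + bounds["enzyme2"],
               bounds["biomass"])


# Upper bounds on each flux (arbitrary units); lower bounds are zero.
bounds = {"uptake": 10.0, "enzyme1": 4.0, "enzyme2": 1000.0, "biomass": 1000.0}

# Boolean rule: enzyme2 is expressed only with the activator and no repressor.
gene_rules = {"enzyme2": lambda tf: tf["activator"] and not tf["repressor"]}

flux_on = max_biomass(apply_regulation(
    bounds, gene_rules, {"activator": True, "repressor": False}))
flux_off = max_biomass(apply_regulation(
    bounds, gene_rules, {"activator": True, "repressor": True}))
print(flux_on, flux_off)  # 10.0 4.0
```

    With the regulated enzyme available the network is uptake-limited; once the rule switches it off, the remaining low-capacity enzyme becomes the bottleneck, which is the kind of prediction an unregulated metabolic model would miss.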

  4. Integration of genomic and medical data into a 3D atlas of human anatomy.

    Science.gov (United States)

    Turinsky, Andrei L; Fanea, Elena; Trinh, Quang; Dong, Xiaoli; Stromer, Julie N; Shu, Xueling; Wat, Stephen; Hallgrímsson, Benedikt; Hill, Jonathan W; Edwards, Carol; Grosenick, Brenda; Yajima, Masumi; Sensen, Christoph W

    2008-01-01

    We have developed a framework for the visual integration and exploration of multi-scale biomedical data, which includes anatomical and molecular components. We have also created a Java-based software system that integrates molecular information, such as gene expression data, into a three-dimensional digital atlas of the male adult human anatomy. Our atlas is structured according to the Terminologia Anatomica. The underlying data-indexing mechanism uses open standards and semantic ontology-processing tools to establish the associations between heterogeneous data types. The software system makes an extensive use of virtual reality visualization.

  5. Genome analysis of a clinical isolate of Shewanella sp. uncovered an active hybrid integrative and conjugative element carrying an integron platform inserted in a novel genomic locus.

    Science.gov (United States)

    Parmeciano Di Noto, Gisela; Jara, Eugenio; Iriarte, Andrés; Centrón, Daniela; Quiroga, Cecilia

    2016-08-01

    Shewanella spp. are currently considered to be emerging pathogens that can code for a blaOXA carbapenemase in their chromosome. Complete genome analysis of the clinical isolate Shewanella sp. Sh95 revealed that this strain is a novel species, which shares a lineage with marine isolates. Characterization of its resistome showed that it codes for genes drfA15, qacH and blaOXA-48. We propose that Shewanella sp. Sh95 acts as reservoir of blaOXA-48. Moreover, analysis of mobilome showed that it contains a novel integrative and conjugative element (ICE), named ICESh95. Comparative analysis between the close relatives ICESpuPO1 from Shewanella sp. W3-18-1 and ICE SXTMO10 from Vibrio cholerae showed that ICESh95 encompassed two new regions, a type III restriction modification system and a multidrug resistance integron. The integron platform contained a novel arrangement formed by gene cassettes drfA15 and qacH, and a class C-attC group II intron. Furthermore, insertion of ICESh95 occurred at a unique target site, which correlated with the presence of a different xis/int module. Mobility of ICESh95 was assessed and demonstrated its ability to self-transfer with high efficiency to different species of bacteria. Our results show that ICESh95 is a self-transmissible, mobile element, which can contribute to the dissemination of antimicrobial resistance; this is clearly a threat when natural bacteria from water ecosystems, such as Shewanella, act as vectors in its propagation.

  6. Application of integrative genomics and systems biology to conventional and in vitro reproductive traits in cattle

    DEFF Research Database (Denmark)

    Mazzoni, Gianluca; Pedersen, Hanne S.; de Oliveira Junior, Gerson A.

    2017-01-01

    by both conventional and ARTs such as OPU-IVP. The integration of systems biology information across different biological layers generates a complete view of the different molecular networks that control complex traits and can provide a strong contribution to the understanding of traits related to ARTs....

  7. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration

    KAUST Repository

    Suzuki, Keiichiro; Tsunekawa, Yuji; Hernández-Benítez, Reyna; Wu, Jun; Zhu, Jie; Kim, Euiseok J.; Hatanaka, Fumiyuki; Yamamoto, Mako; Araoka, Toshikazu; Li, Zhe; Kurita, Masakazu; Hishida, Tomoaki; Li, Mo; Aizawa, Emi; Guo, Shicheng; Chen, Song; Goebl, April; Soligalla, Rupa Devi; Qu, Jing; Jiang, Tingshuai; Fu, Xin; Jafari, Maryam; Esteban, Concepcion Rodriguez; Berggren, W. Travis; Lajara, Jeronimo; Nuñez-Delicado, Estrella; Guillen, Pedro; Campistol, Josep M.; Matsuzaki, Fumio; Liu, Guang-Hui; Magistretti, Pierre J.; Zhang, Kun; Callaway, Edward M.; Zhang, Kang; Belmonte, Juan Carlos Izpisua

    2016-01-01

    regularly interspaced short palindromic repeat/Cas9 (CRISPR/Cas9)3, 4 technology, here we devise a homology-independent targeted integration (HITI) strategy, which allows for robust DNA knock-in in both dividing and non-dividing cells in vitro and, more

  8. Data-driven integration of genome-scale regulatory and metabolic network models

    Science.gov (United States)

    Imam, Saheed; Schäuble, Sascha; Brooks, Aaron N.; Baliga, Nitin S.; Price, Nathan D.

    2015-01-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription, and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert—a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system. PMID:25999934

  9. Data-driven integration of genome-scale regulatory and metabolic network models

    Directory of Open Access Journals (Sweden)

    Saheed Imam

    2015-05-01

    Full Text Available Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert – a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.

  10. Comparison of 432 Pseudomonas strains through integration of genomic, functional, metabolic and expression data

    NARCIS (Netherlands)

    Koehorst, Jasper J.; Dam, van Jesse C.J.; Heck, van Ruben G.A.; Saccenti, Edoardo; Martins dos Santos, Vitor; Suarez-Diez, Maria; Schaap, Peter J.

    2016-01-01

    Pseudomonas is a highly versatile genus containing species that can be harmful to humans and plants while others are widely used for bioengineering and bioremediation. We analysed 432 sequenced Pseudomonas strains by integrating results from a large scale functional comparison using protein

  11. An Integrated Metabolomic and Genomic Mining Workflow to Uncover the Biosynthetic Potential of Bacteria

    DEFF Research Database (Denmark)

    Månsson, Maria; Vynne, Nikolaj Grønnegaard; Klitgaard, Andreas

    2016-01-01

    Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in ba...

  12. Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

    Directory of Open Access Journals (Sweden)

    Walker M Andrew

    2006-09-01

    Full Text Available Abstract Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10^-2 per base pair of DNA and the average INDEL frequency was 2.06 × 10^-2 per base pair of DNA. On average, 60.33% of the SNPs were of the synonymous type while 39.67% were of the non-synonymous type. The mutation frequency, primarily in the form of external INDELs, was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The numbers of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions. Conclusion INDELs and strain specific genes

  13. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

    Science.gov (United States)

    Huo, Zhiguang; Tseng, George

    2017-06-01

    Cancer subtypes discovery is the first step to deliver personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, omics data integration with incorporation of rich existing biological knowledge is essential for deciphering a biological mechanism behind the complex diseases. In this manuscript, we propose an integrative sparse K-means (is-Kmeans) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using an alternating direction method of multiplier (ADMM) will be applied for fast optimization. Simulation and three real applications in breast cancer and leukemia will be used to compare is-Kmeans with existing methods and demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.

  14. atBioNet– an integrated network analysis tool for genomics and biomarker discovery

    Directory of Open Access Journals (Sweden)

    Ding Yijun

    2012-07-01

    Full Text Available Abstract Background Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, Proteins/genes interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. Results atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user supplied proteins/genes list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis. Conclusion atBioNet is a free web-based network analysis tool that provides a systematic insight into proteins/genes interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http

  15. atBioNet--an integrated network analysis tool for genomics and biomarker discovery.

    Science.gov (United States)

    Ding, Yijun; Chen, Minjun; Liu, Zhichao; Ding, Don; Ye, Yanbin; Zhang, Min; Kelly, Reagan; Guo, Li; Su, Zhenqiang; Harris, Stephen C; Qian, Feng; Ge, Weigong; Fang, Hong; Xu, Xiaowei; Tong, Weida

    2012-07-20

    Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, Proteins/genes interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity. atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user supplied proteins/genes list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but this information can also be used to hypothesize novel biomarkers for future analysis. atBioNet is a free web-based network analysis tool that provides a systematic insight into proteins/genes interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
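
    The abstract names SCAN as the clustering step. As a rough illustration of the quantity SCAN is built on, the Python sketch below computes the structural similarity of two nodes (shared closed neighbourhood divided by the geometric mean of the two neighbourhood sizes) and uses it to test whether a node is a "core" for given thresholds. The four-protein graph and the threshold values `eps` and `mu` are hypothetical, and this is not atBioNet's implementation; the full algorithm additionally grows clusters from cores and labels hubs and outliers.

```python
import math


def closed_neighborhood(graph, node):
    """Neighbours of a node plus the node itself."""
    return graph[node] | {node}


def structural_similarity(graph, u, v):
    """Shared closed neighbourhood, normalized by the geometric mean of sizes."""
    nu = closed_neighborhood(graph, u)
    nv = closed_neighborhood(graph, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def eps_neighbors(graph, node, eps):
    """Members of the closed neighbourhood with similarity of at least eps."""
    return {w for w in closed_neighborhood(graph, node)
            if structural_similarity(graph, node, w) >= eps}


def is_core(graph, node, eps, mu):
    """A node is a core if it has at least mu eps-neighbours."""
    return len(eps_neighbors(graph, node, eps)) >= mu


# Hypothetical undirected interaction graph: a triangle plus one pendant node.
graph = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}

sim_12 = structural_similarity(graph, "P1", "P2")
sim_34 = structural_similarity(graph, "P3", "P4")
print(sim_12, sim_34)
print(is_core(graph, "P1", eps=0.8, mu=3), is_core(graph, "P4", eps=0.8, mu=3))
```

    In this toy graph the two triangle members P1 and P2 share their whole neighbourhood, so their similarity is maximal, while the pendant node P4 is only weakly tied to P3 and does not qualify as a core at these thresholds.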

  16. The oncogenic potential of BK-polyomavirus is linked to viral integration into the human genome.

    Science.gov (United States)

    Kenan, Daniel J; Mieczkowski, Piotr A; Burger-Calderon, Raquel; Singh, Harsharan K; Nickeleit, Volker

    2015-11-01

    It has been suggested that BK-polyomavirus is linked to oncogenesis via high expression levels of large T-antigen in some urothelial neoplasms arising following kidney transplantation. However, a causal association between BK-polyomavirus, large T-antigen expression and oncogenesis has never been demonstrated in humans. Here we describe an investigation using high-throughput sequencing of tumour DNA obtained from an urothelial carcinoma arising in a renal allograft. We show that a novel BK-polyomavirus strain, named CH-1, is integrated into exon 26 of the myosin-binding protein C1 gene (MYBPC1) on chromosome 12 in tumour cells but not in normal renal cells. Integration of the BK-polyomavirus results in a number of discrete alterations in viral gene expression, including: (a) disruption of VP1 protein expression and robust expression of large T-antigen; (b) preclusion of viral replication; and (c) deletions in the non-coding control region (NCCR), with presumed alterations in promoter feedback loops. Viral integration disrupts one MYBPC1 gene copy and likely alters its expression. Circular episomal BK-polyomavirus gene sequences are not found, and the renal allograft shows no productive polyomavirus infection or polyomavirus nephropathy. These findings support the hypothesis that integration of polyomaviruses is essential to tumourigenesis. It is likely that dysregulation of large T-antigen, with persistent over-expression in non-lytic cells, promotes cell growth, genetic instability and neoplastic transformation. © 2015 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.

  17. Integration of genomics, proteomics, and imaging for cardiac stem cell therapy

    International Nuclear Information System (INIS)

    Chun, Hyung J.; Wilson, Kitch O.; Huang, Mei; Wu, Joseph C.

    2007-01-01

    Cardiac stem cell therapy is beginning to mature as a valid treatment for heart disease. As more clinical trials utilizing stem cells emerge, it is imperative to establish the mechanisms by which stem cells confer benefit in cardiac diseases. In this paper, we review three methods - molecular cellular imaging, gene expression profiling, and proteomic analysis - that can be integrated to provide further insights into the role of this emerging therapy. (orig.)

  18. Integrated metabolism in sponge-microbe symbiosis revealed by genome-centered metatranscriptomics.

    Science.gov (United States)

    Moitinho-Silva, Lucas; Díez-Vives, Cristina; Batani, Giampiero; Esteves, Ana Is; Jahn, Martin T; Thomas, Torsten

    2017-07-01

    Despite an increased understanding of functions in sponge microbiomes, the interactions among the symbionts and between symbionts and host are not well characterized. Here we reconstructed the metabolic interactions within the sponge Cymbastela concentrica microbiome in the context of functional features of symbiotic diatoms and the host. Three genome bins (CcPhy, CcNi and CcThau) were recovered from metagenomic data of C. concentrica, belonging to the proteobacterial family Phyllobacteriaceae, the Nitrospira genus and the thaumarchaeal order Nitrosopumilales. Gene expression was estimated by mapping C. concentrica metatranscriptomic reads. Our analyses indicated that CcPhy is heterotrophic, while CcNi and CcThau are chemolithoautotrophs. CcPhy expressed many transporters for the acquisition of dissolved organic compounds, likely available through the sponge's filtration activity and symbiotic carbon fixation. Coupled nitrification by CcThau and CcNi was reconstructed, supported by the observed close proximity of the cells in fluorescence in situ hybridization. CcPhy facultative anaerobic respiration and assimilation by diatoms may consume the resulting nitrate. Transcriptional analysis of diatom and sponge functions indicated that these organisms are likely sources of organic compounds, for example, creatine/creatinine and dissolved organic carbon, for other members of the symbiosis. Our results suggest that organic nitrogen compounds, for example, creatine, creatinine, urea and cyanate, fuel the nitrogen cycle within the sponge. This study provides an unprecedented view of the metabolic interactions within sponge-microbe symbiosis, bridging the gap between cell- and community-level knowledge.

  19. Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cells from the Progenitor Cell Biology Consortium.

    Science.gov (United States)

    Salomonis, Nathan; Dexheimer, Phillip J; Omberg, Larsson; Schroll, Robin; Bush, Stacy; Huo, Jeffrey; Schriml, Lynn; Ho Sui, Shannan; Keddache, Mehdi; Mayhew, Christopher; Shanmukhappa, Shiva Kumar; Wells, James; Daily, Kenneth; Hubler, Shane; Wang, Yuliang; Zambidis, Elias; Margolin, Adam; Hide, Winston; Hatzopoulos, Antonis K; Malik, Punam; Cancelas, Jose A; Aronow, Bruce J; Lutzko, Carolyn

    2016-07-12

    The rigorous characterization of distinct induced pluripotent stem cells (iPSC) derived from multiple reprogramming technologies, somatic sources, and donors is required to understand potential sources of variability and downstream potential. To achieve this goal, the Progenitor Cell Biology Consortium performed comprehensive experimental and genomic analyses of 58 iPSC from ten laboratories generated using a variety of reprogramming genes, vectors, and cells. Associated global molecular characterization studies identified functionally informative correlations in gene expression, DNA methylation, and/or copy-number variation among key developmental and oncogenic regulators as a result of donor, sex, line stability, reprogramming technology, and cell of origin. Furthermore, X-chromosome inactivation in PSC produced highly correlated differences in teratoma-lineage staining and regulator expression upon differentiation. All experimental results, and raw, processed, and metadata from these analyses, including powerful tools, are interactively accessible from a new online portal at https://www.synapse.org to serve as a reusable resource for the stem cell community. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  20. Genetics and crime: Integrating new genomic discoveries into psychological research about antisocial behavior

    Science.gov (United States)

    Wertz, J.; Caspi, A.; Belsky, D. W.; Beckley, A. L.; Arseneault, L.; Barnes, J. C.; Corcoran, D. L.; Hogan, S.; Houts, R. M.; Morgan, N.; Odgers, C. L.; Prinz, J. A.; Sugden, K.; Williams, B. S.; Poulton, R.; Moffitt, T. E.

    2018-01-01

    Drawing on psychological and sociological theories of crime causation, we tested the hypothesis that genetic risk for low educational attainment (assessed via a genome-wide polygenic score) is associated with offending. We further tested hypotheses of how polygenic risk relates to the development of antisocial behavior from childhood through adulthood. Across the Dunedin and E-Risk birth cohorts of individuals growing up 20 years and 20,000 kilometres apart, education polygenic scores predicted risk of a criminal record, with modest effects. Polygenic risk manifested during primary schooling, in lower cognitive abilities, lower self-control, academic difficulties, and truancy, and predicted a life-course persistent pattern of antisocial behavior that onsets in childhood and persists into adulthood. Crime is central in the nature/nurture debate, and findings reported here demonstrate how molecular-genetic discoveries can be incorporated into established theories of antisocial behavior. They also suggest the hypothesis that improving school experiences might prevent genetic influences on crime from unfolding. PMID:29513605

  1. Genetics and Crime: Integrating New Genomic Discoveries Into Psychological Research About Antisocial Behavior.

    Science.gov (United States)

    Wertz, J; Caspi, A; Belsky, D W; Beckley, A L; Arseneault, L; Barnes, J C; Corcoran, D L; Hogan, S; Houts, R M; Morgan, N; Odgers, C L; Prinz, J A; Sugden, K; Williams, B S; Poulton, R; Moffitt, T E

    2018-05-01

    Drawing on psychological and sociological theories of crime causation, we tested the hypothesis that genetic risk for low educational attainment (assessed via a genome-wide polygenic score) is associated with criminal offending. We further tested hypotheses of how polygenic risk relates to the development of antisocial behavior from childhood through adulthood. Across the Dunedin and Environmental Risk (E-Risk) birth cohorts of individuals growing up 20 years and 20,000 kilometers apart, education polygenic scores predicted risk of a criminal record with modest effects. Polygenic risk manifested during primary schooling in lower cognitive abilities, lower self-control, academic difficulties, and truancy, and it was associated with a life-course-persistent pattern of antisocial behavior that onsets in childhood and persists into adulthood. Crime is central in the nature-nurture debate, and findings reported here demonstrate how molecular-genetic discoveries can be incorporated into established theories of antisocial behavior. They also suggest that improving school experiences might prevent genetic influences on crime from unfolding.

  2. Accelerating Genome Editing in CHO Cells Using CRISPR Cas9 and CRISPy, a Web-Based Target Finding Tool

    DEFF Research Database (Denmark)

    Ronda, Carlotta; Pedersen, Lasse Ebdrup; Hansen, Henning Gram

    2014-01-01

    of the CRISPR Cas9 technology in CHO cells by generating site-specific gene disruptions in COSMC and FUT8, both of which encode proteins involved in glycosylation. The tested single guide RNAs (sgRNAs) created an indel frequency up to 47.3% in COSMC, while an indel frequency up to 99.7% in FUT8 was achieved...... mutations at the target sites, with a strong preference for single base indels. Finally, we have developed a user-friendly bioinformatics tool, named “CRISPy” for rapid identification of sgRNA target sequences in the CHO-K1 genome. The CRISPy tool identified 1,970,449 CRISPR targets divided into 27...

  3. Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization.

    Directory of Open Access Journals (Sweden)

    Xiaoquan Wen

    2017-03-01

    We propose a novel statistical framework for integrating the result from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objectives of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals. We introduce a natural Bayesian hierarchical model that treats the latent association status of molecular QTLs as SNP-level annotations for candidate SNPs of complex traits. We detail a computational procedure to seamlessly perform enrichment, fine-mapping and colocalization analyses, which is a distinct feature compared to the existing colocalization analysis procedures in the literature. The proposed approach is computationally efficient and requires only summary-level statistics. We evaluate and demonstrate the proposed computational approach through extensive simulation studies and analyses of blood lipid data and the whole blood eQTL data from the GTEx project. In addition, a useful utility from our proposed method enables the computation of expected colocalization signals using simple characteristics of the association data. Using this utility, we further illustrate the importance of enrichment analysis on the ability to discover colocalized signals and the potential limitations of currently available molecular QTL data. The software pipeline that implements the proposed computation procedures, enloc, is freely available at https://github.com/xqwen/integrative.

  4. Genome-wide search for miRNA-target interactions in Arabidopsis thaliana with an integrated approach

    Directory of Open Access Journals (Sweden)

    Ding Jiandong

    2012-06-01

    Background MiRNAs are small noncoding RNAs, about 22 nt long, that post-transcriptionally regulate gene expression in animals, plants and protozoa. Confident identification of miRNA-target interactions (MTIs) is vital to understanding their function. Currently, several integrated computational programs and databases are available for animal miRNAs, the mechanisms of which are significantly different from those of plant miRNAs. Methods Here we present an integrated MTI prediction and analysis toolkit (imiRTP) for Arabidopsis thaliana. It features two important functions: (i) a combination of several effective plant miRNA target prediction methods provides a sufficiently large MTI candidate set, and (ii) different filters allow for an efficient selection of potential targets. The modularity of imiRTP enables the prediction of high-quality targets on a genome-wide scale. Moreover, predicted MTIs can be presented in various ways, which allows for browsing through the putative target sites as well as conducting simple and advanced analyses. Results Results show that imiRTP could always find high-quality candidates compared with a single method when an appropriate filter and parameters were chosen. We also show that a portion of plant miRNAs can bind target genes outside the coding region. Based on our results, imiRTP could facilitate further study of Arabidopsis miRNAs in real use. All materials of imiRTP are freely available under a GNU license at http://admis.fudan.edu.cn/projects/imiRTP.htm.

  5. Integrative genomic analysis of interleukin-36RN and its prognostic value in cancer.

    Science.gov (United States)

    Lv, Zhilei; Fan, Jinshuo; Zhang, Xiuxiu; Huang, Qi; Han, Jieli; Wu, Feng; Hu, Guorong; Guo, Mengfei; Jin, Yang

    2016-02-01

    Interleukin (IL)-36RN, previously known as IL1-F5 and IL-1δ, shares a 360-kb region of chromosome 2q13 with members of IL-1 systems. IL-36RN encodes an anti-inflammatory cytokine, IL-36 receptor antagonist (IL-36Ra). Although IL-36Ra shows the highest homology to IL-1 receptor (IL-1R) antagonist, it differs from the latter in aspects including its binding to IL-1Rrp2 but not to IL-1R1. IL-36RN is mainly expressed in epithelial cells and has important roles in inflammatory diseases. In the present study, IL-36RN was identified in the genomes of 27 species, including human, chimpanzee, mouse, horse and dolphin. Human IL-36RN was mainly expressed in the eye, head and neck, fetal heart, lung, testis, cervix and placenta, and it was highly expressed in bladder and parathyroid tumors. Furthermore, a total of 30 single nucleotide polymorphisms causing missense mutations were determined, which are considered to be the causes of various diseases, such as generalized pustular psoriasis. In addition, the link between IL-36RN and the prognosis of certain cancer types was revealed through meta-analysis. Tumor-associated transcriptional factors c-Fos, activator protein-1, c-Jun and nuclear factor κB were found to bind to the upstream region in the IL-36RN gene. This may indicate that IL-36RN is involved in tumorigenesis and tumor progression through the regulation of tumor-associated transcriptional factors. The present study identified IL-36RN in various species and investigated the associations between IL-36RN and cancer prognosis, which would determine whether IL-36RN drove the evolution of the various species with regard to tumorigenesis.

  6. Integrated genomic and BMI analysis for type 2 diabetes risk assessment.

    Directory of Open Access Journals (Sweden)

    Dayanara Lebrón-Aldea

    2015-03-01

    Type 2 Diabetes (T2D) is a chronic disease arising from the development of insulin absence or resistance within the body, and a complex interplay of environmental and genetic factors. The incidence of T2D has increased throughout the last few decades, together with the occurrence of the obesity epidemic. The consideration of variants identified by Genome Wide Association Studies (GWAS) in risk assessment models for T2D could aid in the identification of at-risk patients who could benefit from preventive medicine. In this study, we built several risk assessment models and evaluated them with two different classification approaches (Logistic Regression and Neural Networks) to measure the effect of including genetic information in the prediction of T2D. We used data from the Original and the Offspring cohorts of the Framingham Heart Study, which provides phenotypic and genetic information for 5,245 subjects (4,306 controls and 939 cases). Models were built by using several covariates: gender, exposure time, cohort, body mass index (BMI), and 65 established T2D-associated SNPs. We fitted Logistic Regressions and Bayesian Regularized Neural Networks and then assessed their predictive ability by using ten-fold cross-validation. We found that the inclusion of genetic information in the risk assessment models increased the predictive ability by 2% when compared to the baseline model. Furthermore, the models that included BMI at the onset of diabetes as a possible effector gave an improvement of 6% in the area under the curve (AUC) derived from the ROC analysis. The highest AUC achieved (0.75) belonged to the model that included BMI and a genetic score based on the 65 established T2D-associated SNPs. Finally, the inclusion of SNPs and BMI raised predictive ability in all models as expected; however, the AUC results from Neural Networks and Logistic Regression did not differ significantly in their prediction accuracy.
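
    The two quantities this abstract leans on, a SNP-based genetic score and the area under the ROC curve, can be illustrated with a minimal Python sketch. The unweighted risk-allele count, the function names and the toy data are all assumptions made for illustration; the abstract does not say how the authors' score was weighted or computed.

```python
def genetic_score(risk_allele_counts):
    """Unweighted genetic score (an assumed form): total risk alleles,
    where each SNP contributes 0, 1 or 2."""
    return sum(risk_allele_counts)


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen case scores higher than a
    randomly chosen control, with ties counted as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data, invented for illustration: risk-allele counts at three SNPs.
genotypes = [[2, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]
labels = [1, 1, 0, 0]  # 1 = case, 0 = control
scores = [genetic_score(g) for g in genotypes]
print(auc(scores, labels))  # 0.625
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which is the scale on which the reported 0.75 should be read.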

  7. Using genomic data to unravel the root of the placental mammal phylogeny.

    Science.gov (United States)

    Murphy, William J; Pringle, Thomas H; Crider, Tess A; Springer, Mark S; Miller, Webb

    2007-04-01

    The phylogeny of placental mammals is a critical framework for choosing future genome sequencing targets and for resolving the ancestral mammalian genome at the nucleotide level. Despite considerable recent progress defining superordinal relationships, several branches remain poorly resolved, including the root of the placental tree. Here we analyzed the genome sequence assemblies of human, armadillo, elephant, and opossum to identify informative coding indels that would serve as rare genomic changes to infer early events in placental mammal phylogeny. We also expanded our species sampling by including sequence data from >30 ongoing genome projects, followed by PCR and sequencing validation of each indel in additional taxa. Our data provide support for a sister-group relationship between Afrotheria and Xenarthra (the Atlantogenata hypothesis), which is in turn the sister-taxon to Boreoeutheria. We failed to recover any indels in support of a basal position for Xenarthra (Epitheria), which is suggested by morphology and a recent retroposon analysis, or a hypothesis with Afrotheria basal (Exafricoplacentalia), which is favored by phylogenetic analysis of large nuclear gene data sets. In addition, we identified two retroposon insertions that also support Atlantogenata and none for the alternative hypotheses. A revised molecular timescale based on these phylogenetic inferences suggests Afrotheria and Xenarthra diverged from other placental mammals approximately 103 (95-114) million years ago. We discuss the impacts of this topology on earlier phylogenetic reconstructions and repeat-based inferences of phylogeny.

  8. Targeted Porcine Genome Engineering with TALENs

    DEFF Research Database (Denmark)

    Luo, Yonglun; Lin, Lin; Golas, Mariola Monika

    2015-01-01

    confers precisely editing (e.g., mutations or indels) or insertion of a functional transgenic cassette to user-designed loci. Techniques for targeted genome engineering are growing dramatically and include, e.g., zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs......, including construction of sequence-specific TALENs, delivery of TALENs into primary porcine fibroblasts, and detection of TALEN-mediated cleavage, is described. This chapter is useful for scientists who are inexperienced with TALEN engineering of porcine cells as well as of other large animals....

  9. Integrative and comparative genomics analysis of early hepatocellular carcinoma differentiated from liver regeneration in young and old

    Directory of Open Access Journals (Sweden)

    Ozand Pinar T

    2010-06-01

    Background Hepatocellular carcinoma (HCC) is the third-leading cause of cancer-related deaths worldwide. It is often diagnosed at an advanced stage, and hence typically has a poor prognosis. To identify distinct molecular mechanisms for early HCC we developed a rat model of liver regeneration post-hepatectomy, as well as liver cells undergoing malignant transformation and compared them to normal liver using a microarray approach. Subsequently, we performed cross-species comparative analysis coupled with copy number alterations (CNA) of independent early human HCC microarray studies to facilitate the identification of critical regulatory modules conserved across species. Results We identified 35 signature genes conserved across species, and shared among different types of early human HCCs. Over 70% of signature genes were cancer-related, and more than 50% of the conserved genes were mapped to human genomic CNA regions. Functional annotation revealed genes already implicated in HCC, as well as novel genes which were not previously reported in liver tumors. A subset of differentially expressed genes was validated using quantitative RT-PCR. Concordance was also confirmed for a significant number of genes and pathways in five independent validation microarray datasets. Our results indicated alterations in a number of cancer related pathways, including p53, p38 MAPK, ERK/MAPK, PI3K/AKT, and TGF-β signaling pathways, and potential critical regulatory role of MYC, ERBB2, HNF4A, and SMAD3 for early HCC transformation. Conclusions The integrative analysis of transcriptional deregulation, genomic CNA and comparative cross species analysis brings new insights into the molecular profile of early hepatoma formation. This approach may lead to robust biomarkers for the detection of early human HCC.

  10. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    Science.gov (United States)

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS. Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  11. A FRAMEWORK FOR ATTRIBUTE-BASED COMMUNITY DETECTION WITH APPLICATIONS TO INTEGRATED FUNCTIONAL GENOMICS.

    Science.gov (United States)

    Yu, Han; Hageman Blair, Rachael

    2016-01-01

    Understanding community structure in networks has received considerable attention in recent years. Detecting and leveraging community structure holds promise for understanding and potentially intervening with the spread of influence. Network features of this type have important implications in a number of research areas, including marketing, social networks, and biology. However, an overwhelming majority of traditional approaches to community detection cannot readily incorporate information of node attributes. Integrating structural and attribute information is a major challenge. We propose a flexible iterative method, inverse regularized Markov Clustering (irMCL), for network clustering via the manipulation of the transition probability matrix (aka stochastic flow) corresponding to a graph. Similar to traditional Markov Clustering, irMCL iterates between "expand" and "inflate" operations, which aim to strengthen the intra-cluster flow, while weakening the inter-cluster flow. Attribute information is directly incorporated into the iterative method through a sigmoid (logistic function) that naturally dampens attribute influence that is contradictory to the stochastic flow through the network. We demonstrate advantages and the flexibility of our approach using simulations and real data. We highlight an application that integrates a breast cancer gene expression data set and a functional network defined via KEGG pathways to reveal significant modules for survival.
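
    The "expand" and "inflate" steps named above can be made concrete with a minimal Python sketch of traditional Markov Clustering. This is not irMCL: the attribute-driven sigmoid regularization is not specified in the abstract and is left out. The function names, the added self-loops, the inflation power of 2 and the fixed iteration count are illustrative assumptions.

```python
def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power=2.0):
    """Inflation: raise every entry to a power, then renormalize columns.

    Strong flows are boosted and weak flows are suppressed.
    """
    return normalize_columns([[x ** power for x in row] for row in m])


def markov_cluster(adjacency, power=2.0, iterations=30):
    """Alternate expansion and inflation for a fixed number of rounds."""
    n = len(adjacency)
    # Self-loops are a common convention in MCL (an assumption here).
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), power)
    return m


def read_clusters(m, tol=1e-6):
    """Each row with nonzero flow lists the members of one cluster."""
    found = []
    for row in m:
        members = [j for j, x in enumerate(row) if x > tol]
        if members and members not in found:
            found.append(members)
    return sorted(found)


# Toy graph, invented for illustration: two triangles joined by edge 2-3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(read_clusters(markov_cluster(graph)))
```

    A larger inflation power generally yields more, smaller clusters; an attribute-aware variant would need an additional step that modifies the transition matrix between rounds.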

  12. Multilevel functional genomics data integration as a tool for understanding physiology: a network biology perspective.

    Science.gov (United States)

    Davidsen, Peter K; Turan, Nil; Egginton, Stuart; Falciani, Francesco

    2016-02-01

    The overall aim of physiological research is to understand how living systems function in an integrative manner. Consequently, the discipline of physiology has since its infancy attempted to link multiple levels of biological organization. Increasingly this has involved mathematical and computational approaches, typically to model a small number of components spanning several levels of biological organization. With the advent of "omics" technologies, which can characterize the molecular state of a cell or tissue (intended as the level of expression and/or activity of its molecular components), the number of molecular components we can quantify has increased exponentially. Paradoxically, the unprecedented amount of experimental data has made it more difficult to derive conceptual models underlying essential mechanisms regulating mammalian physiology. We present an overview of state-of-the-art methods currently used to identify biological networks underlying genome-wide responses. These are based on a data-driven approach that relies on advanced computational methods designed to "learn" biology from observational data. In this review, we illustrate an application of these computational methodologies using a case study integrating an in vivo model representing the transcriptional state of hypoxic skeletal muscle with a clinical study representing muscle wasting in chronic obstructive pulmonary disease patients. The broader application of these approaches to modeling multiple levels of biological data in the context of modern physiology is discussed. Copyright © 2016 the American Physiological Society.

  13. The human vascular endothelial cell line HUV-EC-C harbors the integrated HHV-6B genome which remains stable in long term culture.

    Science.gov (United States)

    Shioda, Setsuko; Kasai, Fumio; Ozawa, Midori; Hirayama, Noriko; Satoh, Motonobu; Kameoka, Yousuke; Watanabe, Ken; Shimizu, Norio; Tang, Huamin; Mori, Yasuko; Kohara, Arihiro

    2018-02-01

    Human herpes virus 6 (HHV-6) is a common human pathogen that is most often detected in hematopoietic cells. Although human cells harboring chromosomally integrated HHV-6 can be generated in vitro, the availability of such cell lines originating from in vivo tissues is limited. In this study, chromosomally integrated HHV-6B has been identified in a human vascular endothelial cell line, HUV-EC-C (IFO50271), derived from normal umbilical cord tissue. Sequence analysis revealed that the viral genome was similar to the HHV-6B HST strain. FISH analysis using a HHV-6 DNA probe showed one signal in each cell, detected at the distal end of the long arm of chromosome 9. This was consistent with a digital PCR assay, validating one copy of the viral DNA. Because exposure of HUV-EC-C to chemicals did not cause viral reactivation, long term cell culture of HUV-EC-C was carried out to assess the stability of viral integration. The growth rate was altered depending on passage numbers, and morphology also changed during culture. SNP microarray profiles showed some differences between low and high passages, implying that the HUV-EC-C genome had changed during culture. However, no detectable change was observed in chromosome 9, where HHV-6B integration and the viral copy number remained unchanged. Our results suggest that integrated HHV-6B is stable in HUV-EC-C despite genome instability.

  14. Predictors of Chemosensitivity in Triple Negative Breast Cancer: An Integrated Genomic Analysis.

    Directory of Open Access Journals (Sweden)

    Tingting Jiang

    2016-12-01

    Triple negative breast cancer (TNBC) is a highly heterogeneous and aggressive disease, and although no effective targeted therapies are available to date, about one-third of patients with TNBC achieve pathologic complete response (pCR) from standard-of-care anthracycline/taxane (ACT) chemotherapy. The heterogeneity of these tumors, however, has hindered the discovery of effective biomarkers to identify such patients. We performed whole exome sequencing on 29 TNBC cases from the MD Anderson Cancer Center (MDACC) selected because they had either pCR (n = 18) or extensive residual disease (n = 11) after neoadjuvant chemotherapy, with cases from The Cancer Genome Atlas (TCGA; n = 144) and METABRIC (n = 278) cohorts serving as validation cohorts. Our analysis revealed that mutations in the AR- and FOXA1-regulated networks, in which BRCA1 plays a key role, are associated with significantly higher sensitivity to ACT chemotherapy in the MDACC cohort (pCR rate of 94.1% compared to 16.6% in tumors without mutations in AR/FOXA1 pathway, adjusted p = 0.02) and significantly better survival outcome in the TCGA TNBC cohort (log-rank test, p = 0.05). Combined analysis of DNA sequencing, DNA methylation, and RNA sequencing identified tumors of a distinct BRCA-deficient (BRCA-D) TNBC subtype characterized by low levels of wild-type BRCA1/2 expression. Patients with functionally BRCA-D tumors had significantly better survival with standard-of-care chemotherapy than patients whose tumors were not BRCA-D (log-rank test, p = 0.021), and they had significantly higher mutation burden (p < 0.001) and presented clonal neoantigens that were associated with increased immune cell activity. A transcriptional signature of BRCA-D TNBC tumors was independently validated to be significantly associated with improved survival in the METABRIC dataset (log-rank test, p = 0.009). As a retrospective study, limitations include the small size and potential selection bias in the discovery cohort

  15. Integrated genomics identifies five medulloblastoma subtypes with distinct genetic profiles, pathway signatures and clinicopathological features.

    Directory of Open Access Journals (Sweden)

    Marcel Kool

    BACKGROUND: Medulloblastoma is the most common malignant brain tumor in children. Despite recent improvements in cure rates, prediction of disease outcome remains a major challenge and survivors suffer from serious therapy-related side-effects. Recent data showed that patients with WNT-activated tumors have a favorable prognosis, suggesting that these patients could be treated less intensively, thereby reducing the side-effects. This illustrates the potential benefits of a robust classification of medulloblastoma patients and a detailed knowledge of associated biological mechanisms. METHODS AND FINDINGS: To get a better insight into the molecular biology of medulloblastoma we established mRNA expression profiles of 62 medulloblastomas and analyzed 52 of them also by comparative genomic hybridization (CGH) arrays. Five molecular subtypes were identified, characterized by WNT signaling (A; 9 cases), SHH signaling (B; 15 cases), expression of neuronal differentiation genes (C and D; 16 and 11 cases, respectively) or photoreceptor genes (D and E; both 11 cases). Mutations in beta-catenin were identified in all 9 type A tumors, but not in any other tumor. PTCH1 mutations were exclusively identified in type B tumors. CGH analysis identified several fully or partly subtype-specific chromosomal aberrations. Monosomy of chromosome 6 occurred only in type A tumors, loss of 9q mostly occurred in type B tumors, whereas chromosome 17 aberrations, most common in medulloblastoma, were strongly associated with type C or D tumors. Loss of the inactivated X-chromosome was highly specific for female cases of type C, D and E tumors. Gene expression levels faithfully reflected the chromosomal copy number changes. Clinicopathological features significantly different between the 5 subtypes included metastatic disease and age at diagnosis and histology. Metastatic disease at diagnosis was significantly associated with subtypes C and D and most strongly with subtype E

  16. Autism spectrum disorders: Integration of the genome, transcriptome and the environment.

    Science.gov (United States)

    Vijayakumar, N Thushara; Judy, M V

    2016-05-15

    Autism spectrum disorders denote a series of lifelong neurodevelopmental conditions characterized by an impaired social communication profile and often repetitive, stereotyped behavior. Recent years have seen the complex genetic architecture of the disease being progressively unraveled with advancements in gene finding technology and next generation sequencing methods. However, a complete elucidation of the molecular mechanisms behind autism is necessary for potential diagnostic and therapeutic applications. A multidisciplinary approach should be adopted where the focus is not only on the 'genetics' of autism but also on the combinational roles of epigenetics, transcriptomics, immune system disruption and environmental factors that could all influence the etiopathogenesis of the disease. ASD is a clinically heterogeneous disorder with great genetic complexity; only through an integrated multidimensional effort can modern autism research progress further. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Deciphering the genomes of 16 Acanthamoeba species does not provide evidence of integration of known giant virus-associated mobile genetic elements.

    Science.gov (United States)

    Chelkha, Nisrine; Colson, Philippe; Levasseur, Anthony; La Scola, Bernard

    2018-06-02

    Giant viruses infect protozoa, especially amoebae of the genus Acanthamoeba. These viruses possess genetic elements named Mobilome. So far, this mobilome comprises provirophages which are integrated into the genome of their hosts, transpovirons, and Maverick/Polintons. Virophages replicate inside virus factories within Acanthamoeba and can decrease the infectivity of giant viruses. The virophage infecting CroV was found to be integrated in the host of CroV, Cafeteria roenbergensis, thus protecting C. roenbergensis by reduction of CroV multiplication. Because of this unique property, assessment of the mechanisms of replication of virophages and their relationship with giant viruses is a key element of this investigation. This work aimed at evaluating the presence and the dynamic of these mobile elements in sixteen Acanthamoeba genomes. No significant traces of the integration of genomes or sequences from known virophages were identified in all the available Acanthamoeba genomes. These results brought us to hypothesize that the interactions between mimiviruses and their virophages might occur through different mechanisms, or at low frequency. An additional explanation could be that our knowledge of the diversity of virophages is still very limited. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. Kernel machine methods for integrative analysis of genome-wide methylation and genotyping studies.

    Science.gov (United States)

    Zhao, Ni; Zhan, Xiang; Huang, Yen-Tsung; Almli, Lynn M; Smith, Alicia; Epstein, Michael P; Conneely, Karen; Wu, Michael C

    2018-03-01

    Many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation in addition to genotype in the same subjects. However, integrating information from both data types is challenging. In this paper, we propose a composite kernel machine regression model to test the joint epigenetic and genetic effect. Our approach works at the gene level, which allows for a common unit of analysis across different data types. The model compares the pairwise similarities in the phenotype to the pairwise similarities in the genotype and methylation values; and high correspondence is suggestive of association. A composite kernel is constructed to measure the similarities in the genotype and methylation values between pairs of samples. We demonstrate through simulations and real data applications that the proposed approach can correctly control type I error, and is more robust and powerful than using only the genotype or methylation data in detecting trait-associated genes. We applied our method to investigate the genetic and epigenetic regulation of gene expression in response to stressful life events using data that are collected from the Grady Trauma Project. Within the kernel machine testing framework, our methods allow for heterogeneity in effect sizes, nonlinear, and interactive effects, as well as rapid P-value computation. © 2017 WILEY PERIODICALS, INC.
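
    The idea of comparing pairwise similarity in the phenotype with pairwise similarity in genotype and methylation can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the linear kernels, the weighted-sum form of the composite kernel, the mean-centred quadratic-form statistic and the toy data. The abstract does not state the authors' kernel choices or how P-values are obtained, so none of that is reproduced.

```python
def linear_kernel(x):
    """Pairwise similarity K[i][j] = <x_i, x_j> between samples (rows of x)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in x] for xi in x]


def composite_kernel(k_geno, k_meth, weight=0.5):
    """Weighted sum of two kernels (one assumed way to combine data types)."""
    n = len(k_geno)
    return [[weight * k_geno[i][j] + (1.0 - weight) * k_meth[i][j]
             for j in range(n)] for i in range(n)]


def score_statistic(y, k):
    """Quadratic form r' K r on mean-centred phenotypes r.

    Large values arise when samples that are similar under K also have
    similar phenotype residuals.
    """
    n = len(y)
    mean = sum(y) / n
    r = [v - mean for v in y]
    return sum(r[i] * k[i][j] * r[j] for i in range(n) for j in range(n))


# Toy data, invented for illustration: 3 samples, 2 SNPs, 1 CpG site.
genotype = [[0, 1], [1, 1], [2, 0]]
methylation = [[0.2], [0.5], [0.9]]
phenotype = [1.0, 2.0, 3.0]

k = composite_kernel(linear_kernel(genotype), linear_kernel(methylation))
print(score_statistic(phenotype, k))
```

    Working at the gene level means each gene gets its own pair of kernels built from the SNPs and CpG sites assigned to it, so the same statistic can be evaluated gene by gene.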

  19. A genome-scale integration and analysis of Lactococcus lactis translation data.

    Directory of Open Access Journals (Sweden)

    Julien Racle

    Protein synthesis is a template polymerization process composed of three main steps: initiation, elongation, and termination. During translation, ribosomes are engaged into polysomes whose size is used for the quantitative characterization of the translatome. However, simultaneous transcription and translation in the bacterial cytosol complicates the analysis of translatome data. We established a procedure for robust estimation of the ribosomal density in hundreds of genes from Lactococcus lactis polysome size measurements. We used a mechanistic model of translation to integrate the information about the ribosomal density, and for the first time we estimated the protein synthesis rate for each gene and identified the rate-limiting steps. Contrary to conventional considerations, we find a significant number of genes to be elongation limited. This number increases during stress conditions compared to optimal growth, and proteins synthesized at maximum rate are predominantly elongation limited. Consistent with bacterial physiology, we found proteins with similar rate and control characteristics belonging to the same functional categories. Under stress conditions, we found that the synthesis rate of regulatory proteins becomes comparable to that of proteins favored under optimal growth. These findings suggest that the coupling of metabolic states and protein synthesis is more important than previously thought.

  20. An Integrated Cell Purification and Genomics Strategy Reveals Multiple Regulators of Pancreas Development

    Science.gov (United States)

    Benitez, Cecil M.; Qu, Kun; Sugiyama, Takuya; Pauerstein, Philip T.; Liu, Yinghua; Tsai, Jennifer; Gu, Xueying; Ghodasara, Amar; Arda, H. Efsun; Zhang, Jiajing; Dekker, Joseph D.; Tucker, Haley O.; Chang, Howard Y.; Kim, Seung K.

    2014-01-01

    The regulatory logic underlying global transcriptional programs controlling development of visceral organs like the pancreas remains undiscovered. Here, we profiled gene expression in 12 purified populations of fetal and adult pancreatic epithelial cells representing crucial progenitor cell subsets, and their endocrine or exocrine progeny. Using probabilistic models to decode the general programs organizing gene expression, we identified co-expressed gene sets in cell subsets that revealed patterns and processes governing progenitor cell development, lineage specification, and endocrine cell maturation. Purification of Neurog3 mutant cells and module network analysis linked established regulators such as Neurog3 to unrecognized gene targets and roles in pancreas development. Iterative module network analysis nominated and prioritized transcriptional regulators, including diabetes risk genes. Functional validation of a subset of candidate regulators with corresponding mutant mice revealed that the transcription factors Etv1, Prdm16, Runx1t1 and Bcl11a are essential for pancreas development. Our integrated approach provides a unique framework for identifying regulatory genes and functional gene sets underlying pancreas development and associated diseases such as diabetes mellitus. PMID:25330008

  1. An integrated cell purification and genomics strategy reveals multiple regulators of pancreas development.

    Directory of Open Access Journals (Sweden)

    Cecil M Benitez

    2014-10-01

    The regulatory logic underlying global transcriptional programs controlling development of visceral organs like the pancreas remains undiscovered. Here, we profiled gene expression in 12 purified populations of fetal and adult pancreatic epithelial cells representing crucial progenitor cell subsets, and their endocrine or exocrine progeny. Using probabilistic models to decode the general programs organizing gene expression, we identified co-expressed gene sets in cell subsets that revealed patterns and processes governing progenitor cell development, lineage specification, and endocrine cell maturation. Purification of Neurog3 mutant cells and module network analysis linked established regulators such as Neurog3 to unrecognized gene targets and roles in pancreas development. Iterative module network analysis nominated and prioritized transcriptional regulators, including diabetes risk genes. Functional validation of a subset of candidate regulators with corresponding mutant mice revealed that the transcription factors Etv1, Prdm16, Runx1t1 and Bcl11a are essential for pancreas development. Our integrated approach provides a unique framework for identifying regulatory genes and functional gene sets underlying pancreas development and associated diseases such as diabetes mellitus.

  2. Targeted Porcine Genome Engineering with TALENs

    DEFF Research Database (Denmark)

    Luo, Yonglun; Lin, Lin; Golas, Mariola Monika

    2015-01-01

    Genetically modified pigs are becoming an invaluable animal model for agricultural, pharmaceutical, and biomedical applications. Unlike traditional transgenesis, which is accomplished by randomly inserting an exogenous transgene cassette into the natural chromosomal context, targeted genome editing...... confers precise editing (e.g., mutations or indels) or insertion of a functional transgenic cassette to user-designed loci. Techniques for targeted genome engineering are growing dramatically and include, e.g., zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs......), and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems. These systems provide enormous potential applications. In this chapter, we review the use of TALENs for targeted genome editing with focus on their application in pigs. In addition, a brief protocol...

  3. Integrated Genomics of Crohn’s Disease Risk Variant Identifies a Role for CLEC12A in Antibacterial Autophagy

    Directory of Open Access Journals (Sweden)

    Jakob Begun

    2015-06-01

    The polymorphism ATG16L1 T300A, associated with increased risk of Crohn’s disease, impairs pathogen defense mechanisms including selective autophagy, but specific pathway interactions altered by the risk allele remain unknown. Here, we use perturbational profiling of human peripheral blood cells to reveal that CLEC12A is regulated in an ATG16L1-T300A-dependent manner. Antibacterial autophagy is impaired in CLEC12A-deficient cells, and this effect is exacerbated in the presence of the ATG16L1∗300A risk allele. Clec12a−/− mice are more susceptible to Salmonella infection, supporting a role for CLEC12A in antibacterial defense pathways in vivo. CLEC12A is recruited to sites of bacterial entry, bacteria-autophagosome complexes, and sites of sterile membrane damage. Integrated genomics identified a functional interaction between CLEC12A and an E3-ubiquitin ligase complex that functions in antibacterial autophagy. These data identify CLEC12A as an early adaptor molecule for antibacterial autophagy and highlight perturbational profiling as a method to elucidate defense pathways in complex genetic disease.

  4. Integration of genomic, transcriptomic and proteomic data identifies two biologically distinct subtypes of invasive lobular breast cancer.

    Science.gov (United States)

    Michaut, Magali; Chin, Suet-Feung; Majewski, Ian; Severson, Tesa M; Bismeijer, Tycho; de Koning, Leanne; Peeters, Justine K; Schouten, Philip C; Rueda, Oscar M; Bosma, Astrid J; Tarrant, Finbarr; Fan, Yue; He, Beilei; Xue, Zheng; Mittempergher, Lorenza; Kluin, Roelof J C; Heijmans, Jeroen; Snel, Mireille; Pereira, Bernard; Schlicker, Andreas; Provenzano, Elena; Ali, Hamid Raza; Gaber, Alexander; O'Hurley, Gillian; Lehn, Sophie; Muris, Jettie J F; Wesseling, Jelle; Kay, Elaine; Sammut, Stephen John; Bardwell, Helen A; Barbet, Aurélie S; Bard, Floriane; Lecerf, Caroline; O'Connor, Darran P; Vis, Daniël J; Benes, Cyril H; McDermott, Ultan; Garnett, Mathew J; Simon, Iris M; Jirström, Karin; Dubois, Thierry; Linn, Sabine C; Gallagher, William M; Wessels, Lodewyk F A; Caldas, Carlos; Bernards, Rene

    2016-01-05

    Invasive lobular carcinoma (ILC) is the second most frequently occurring histological breast cancer subtype after invasive ductal carcinoma (IDC), accounting for around 10% of all breast cancers. The molecular processes that drive the development of ILC are still largely unknown. We have performed a comprehensive genomic, transcriptomic and proteomic analysis of a large ILC patient cohort and present here an integrated molecular portrait of ILC. Mutations in CDH1 and in the PI3K pathway are the most frequent molecular alterations in ILC. We identified two main subtypes of ILCs: (i) an immune related subtype with mRNA up-regulation of PD-L1, PD-1 and CTLA-4 and greater sensitivity to DNA-damaging agents in representative cell line models; (ii) a hormone related subtype, associated with Epithelial to Mesenchymal Transition (EMT), and gain of chromosomes 1q and 8q and loss of chromosome 11q. Using the somatic mutation rate and eIF4B protein level, we identified three groups with different clinical outcomes, including a group with extremely good prognosis. We provide a comprehensive overview of the molecular alterations driving ILC and have explored links with therapy response. This molecular characterization may help to tailor treatment of ILC through the application of specific targeted, chemo- and/or immune-therapies.

  5. Accelerating Genetic Gains in Legumes for the Development of Prosperous Smallholder Agriculture: Integrating Genomics, Phenotyping, Systems Modelling and Agronomy.

    Science.gov (United States)

    Varshney, Rajeev K; Thudi, Mahendar; Pandey, Manish K; Tardieu, Francois; Ojiewo, Chris; Vadez, Vincent; Whitbread, Anthony M; Siddique, Kadambot H M; Nguyen, Henry T; Carberry, Peter S; Bergvinson, David

    2018-03-05

    Grain legumes form an important component of the human diet, feed for livestock and replenish soil fertility through biological nitrogen fixation. Globally, the demand for food legumes is increasing as they complement cereals in protein requirements and possess a high percentage of digestible protein. Climate change has enhanced the frequency and intensity of drought stress that is posing serious production constraints, especially in rainfed regions where most legumes are produced. Genetic improvement of legumes, like other crops, is mostly based on pedigree and performance-based selection over the last half century. For achieving faster genetic gains in legumes in rainfed conditions, this review article proposes the integration of modern genomics approaches, high throughput phenomics and simulation modelling as support for crop improvement that leads to improved varieties that perform with appropriate agronomy. Selection intensity, generation interval and improved operational efficiencies in breeding are expected to further enhance the genetic gain in experiment plots. Improved seed access to farmers, combined with appropriate agronomic packages in farmers' fields, will deliver higher genetic gains. Enhanced genetic gains including not only productivity but also nutritional and market traits will increase the profitability of farmers and the availability of affordable nutritious food especially in developing countries.

  6. Integrating genomics and proteomics data to predict drug effects using binary linear programming.

    Science.gov (United States)

    Ji, Zhiwei; Su, Jing; Liu, Chenglin; Wang, Hongyan; Huang, Deshuang; Zhou, Xiaobo

    2014-01-01

    The Library of Integrated Network-Based Cellular Signatures (LINCS) project aims to create a network-based understanding of biology by cataloging changes in gene expression and signal transduction that occur when cells are exposed to a variety of perturbations. It is helpful for understanding cell pathways and facilitating drug discovery. Here, we developed a novel approach to infer cell-specific pathways and identify a compound's effects using gene expression and phosphoproteomics data under treatments with different compounds. Gene expression data were employed to infer potential targets of compounds and create a generic pathway map. Binary linear programming (BLP) was then developed to optimize the generic pathway topology based on the mid-stage signaling response of phosphorylation. To demonstrate effectiveness of this approach, we built a generic pathway map for the MCF7 breast cancer cell line and inferred the cell-specific pathways by BLP. The first group of 11 compounds was utilized to optimize the generic pathways, and then 4 compounds were used to identify effects based on the inferred cell-specific pathways. Cross-validation indicated that the cell-specific pathways reliably predicted a compound's effects. Finally, we applied BLP to re-optimize the cell-specific pathways to predict the effects of 4 compounds (trichostatin A, MS-275, staurosporine, and digoxigenin) according to compound-induced topological alterations. Trichostatin A and MS-275 (both HDAC inhibitors) inhibited the downstream pathway of HDAC1 and caused cell growth arrest via activation of p53 and p21; the effects of digoxigenin were totally opposite. Staurosporine blocked the cell cycle via p53 and p21, but also promoted cell growth via activated HDAC1 and its downstream pathway. Our approach was also applied to the PC3 prostate cancer cell line, and the cross-validation analysis showed very good accuracy in predicting effects of 4 compounds. In summary, our computational model can be

  7. Integrated genomic and interfacility patient-transfer data reveal the transmission pathways of multidrug-resistant Klebsiella pneumoniae in a regional outbreak.

    Science.gov (United States)

    Snitkin, Evan S; Won, Sarah; Pirani, Ali; Lapp, Zena; Weinstein, Robert A; Lolans, Karen; Hayden, Mary K

    2017-11-22

    Development of effective strategies to limit the proliferation of multidrug-resistant organisms requires a thorough understanding of how such organisms spread among health care facilities. We sought to uncover the chains of transmission underlying a 2008 U.S. regional outbreak of carbapenem-resistant Klebsiella pneumoniae by performing an integrated analysis of genomic and interfacility patient-transfer data. Genomic analysis yielded a high-resolution transmission network that assigned directionality to regional transmission events and discriminated between intra- and interfacility transmission when epidemiologic data were ambiguous or misleading. Examining the genomic transmission network in the context of interfacility patient transfers (patient-sharing networks) supported the role of patient transfers in driving the outbreak, with genomic analysis revealing that a small subset of patient-transfer events was sufficient to explain regional spread. Further integration of the genomic and patient-sharing networks identified one nursing home as an important bridge facility early in the outbreak, a role that was not apparent from analysis of genomic or patient-transfer data alone. Last, we found that when simulating a real-time regional outbreak, our methodology was able to accurately infer the facility at which patients acquired their infections. This approach has the potential to identify facilities with high rates of intra- or interfacility transmission, data that will be useful for triggering targeted interventions to prevent further spread of multidrug-resistant organisms. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  8. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the inappropriateness of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models that correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent with the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
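
    As a rough illustration of alignment-free comparison by tetranucleotide frequencies, the following Python sketch builds a 256-dimensional frequency profile per sequence and a pairwise distance matrix. It is a sketch under stated assumptions, not the SWPhylo algorithm: raw relative frequencies and Euclidean distance stand in for whatever OUP statistic and distance the tool actually uses, and the taxon names and sequences are hypothetical toy data.

```python
from collections import Counter
from itertools import product
from math import sqrt

WORD_LENGTH = 4
ALL_WORDS = ["".join(p) for p in product("ACGT", repeat=WORD_LENGTH)]

def tetranucleotide_frequencies(sequence):
    """Relative frequency of each of the 256 tetranucleotides (overlapping)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + WORD_LENGTH]
                     for i in range(len(seq) - WORD_LENGTH + 1))
    # Words containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[w] for w in ALL_WORDS)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotides")
    return [counts[w] / total for w in ALL_WORDS]

def euclidean_distance(p, q):
    """Assumed distance between two frequency profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(genomes):
    """Return sorted taxon names and their pairwise profile distances."""
    names = sorted(genomes)
    profiles = {n: tetranucleotide_frequencies(genomes[n]) for n in names}
    matrix = [[euclidean_distance(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix

# Hypothetical toy sequences; real input would be complete genomes.
genomes = {
    "taxon_a": "ACGTACGTACGTACGT",
    "taxon_b": "ACGTACGTACGAACGT",
    "taxon_c": "GGGGCCCCGGGGCCCC",
}
names, dist = distance_matrix(genomes)
```

    The resulting matrix could be passed to any distance-based tree-building method (for example neighbor joining) to obtain a tree.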

  9. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    Science.gov (United States)

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the inappropriateness of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models that correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent with the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354

  10. Integration of mouse and human genome-wide association data identifies KCNIP4 as an asthma gene.

    Directory of Open Access Journals (Sweden)

    Blanca E Himes

    Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify asthma-related genes by integrating AHR associations in mouse with human genome-wide association study (GWAS) data. We used Efficient Mixed Model Association (EMMA) analysis to conduct a GWAS of baseline AHR measures from males and females of 31 mouse strains. Genes near or containing SNPs with EMMA p-values <0.001 were selected for further study in human GWAS. The results of the previously reported EVE consortium asthma GWAS meta-analysis consisting of 12,958 diverse North American subjects from 9 study centers were used to select a subset of homologous genes with evidence of association with asthma in humans. Following validation attempts in three human asthma GWAS (i.e., Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG) and two human AHR GWAS (i.e., SHARP, DAG), the Kv channel interacting protein 4 (KCNIP4) gene was identified as nominally associated with both asthma and AHR at a gene- and SNP-level. In EVE, the smallest KCNIP4 association was at rs6833065 (P-value 2.9e-04), while the strongest associations for Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG were 1.5e-03, 1.0e-03, 3.1e-03 at rs7664617, rs4697177, rs4696975, respectively. At a SNP level, the strongest association across all asthma GWAS was at rs4697177 (P-value 1.1e-04). The smallest P-values for association with AHR were 2.3e-03 at rs11947661 in SHARP and 2.1e-03 at rs402802 in DAG. Functional studies are required to validate the potential involvement of KCNIP4 in modulating asthma susceptibility and/or AHR. Our results suggest that a useful approach to identify genes associated with human asthma is to leverage mouse AHR association data.

  11. An integrated genetic map based on four mapping populations and quantitative trait loci associated with economically important traits in watermelon (Citrullus lanatus)

    Science.gov (United States)

    2014-01-01

    Background Modern watermelon (Citrullus lanatus L.) cultivars share a narrow genetic base due to many years of selection for desirable horticultural qualities. Wild subspecies within C. lanatus are important potential sources of novel alleles for watermelon breeding, but successful trait introgression into elite cultivars has had limited success. The application of marker assisted selection (MAS) in watermelon is yet to be realized, mainly due to the past lack of high quality genetic maps. Recently, a number of useful maps have become available, however these maps have few common markers, and were constructed using different marker sets, thus, making integration and comparative analysis among maps difficult. The objective of this research was to use single-nucleotide polymorphism (SNP) anchor markers to construct an integrated genetic map for C. lanatus. Results Under the framework of the high density genetic map, an integrated genetic map was constructed by merging data from four independent mapping experiments using a genetically diverse array of parental lines, which included three subspecies of watermelon. The 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel), 36 structure variation (SV) and 386 SNP markers from the four maps were used to construct an integrated map. This integrated map contained 1339 markers, spanning 798 cM with an average marker interval of 0.6 cM. Fifty-eight previously reported quantitative trait loci (QTL) for 12 traits in these populations were also integrated into the map. In addition, new QTL identified for brix, fructose, glucose and sucrose were added. Some QTL associated with economically important traits detected in different genetic backgrounds mapped to similar genomic regions of the integrated map, suggesting that such QTL are responsible for the phenotypic variability observed in a broad array of watermelon germplasm. Conclusions The integrated map described herein enhances the utility of genomic tools over
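
    The marker totals and average interval reported for the integrated map can be checked with a few lines of Python. The figures are taken from the abstract; dividing map length by the number of markers is an assumed definition of the average interval.

```python
# Marker counts and map length as reported in the abstract.
marker_counts = {"SSR": 698, "InDel": 219, "SV": 36, "SNP": 386}
map_length_cm = 798.0

total_markers = sum(marker_counts.values())
# Assumed definition: average interval = map length / number of markers.
mean_interval_cm = map_length_cm / total_markers

print(total_markers)               # 1339
print(round(mean_interval_cm, 1))  # 0.6
```

    Both values agree with the 1339 markers and the 0.6 cM average interval stated above.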

  12. An integrated genetic map based on four mapping populations and quantitative trait loci associated with economically important traits in watermelon (Citrullus lanatus).

    Science.gov (United States)

    Ren, Yi; McGregor, Cecilia; Zhang, Yan; Gong, Guoyi; Zhang, Haiying; Guo, Shaogui; Sun, Honghe; Cai, Wantao; Zhang, Jie; Xu, Yong

    2014-01-20

    Modern watermelon (Citrullus lanatus L.) cultivars share a narrow genetic base due to many years of selection for desirable horticultural qualities. Wild subspecies within C. lanatus are important potential sources of novel alleles for watermelon breeding, but successful trait introgression into elite cultivars has had limited success. The application of marker assisted selection (MAS) in watermelon is yet to be realized, mainly due to the past lack of high quality genetic maps. Recently, a number of useful maps have become available, however these maps have few common markers, and were constructed using different marker sets, thus, making integration and comparative analysis among maps difficult. The objective of this research was to use single-nucleotide polymorphism (SNP) anchor markers to construct an integrated genetic map for C. lanatus. Under the framework of the high density genetic map, an integrated genetic map was constructed by merging data from four independent mapping experiments using a genetically diverse array of parental lines, which included three subspecies of watermelon. The 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel), 36 structure variation (SV) and 386 SNP markers from the four maps were used to construct an integrated map. This integrated map contained 1339 markers, spanning 798 cM with an average marker interval of 0.6 cM. Fifty-eight previously reported quantitative trait loci (QTL) for 12 traits in these populations were also integrated into the map. In addition, new QTL identified for brix, fructose, glucose and sucrose were added. Some QTL associated with economically important traits detected in different genetic backgrounds mapped to similar genomic regions of the integrated map, suggesting that such QTL are responsible for the phenotypic variability observed in a broad array of watermelon germplasm. The integrated map described herein enhances the utility of genomic tools over previous watermelon genetic maps. A

  13. A novel Sulfolobus non-conjugative extrachromosomal genetic element capable of integration into the host genome and spreading in the presence of a fusellovirus

    DEFF Research Database (Denmark)

    Wang, Ying; Duan, Zhenhong; Zhu, Haojun

    2007-01-01

    An integrative non-conjugative extrachromosomal genetic element, denoted as pSSVi, has been isolated from a Sulfolobus solfataricus P2 strain and was characterized. This genetic element is a double-stranded DNA of 5740 bp in size and contains eight open reading frames (ORFs). It resembles members....... Interestingly, pSSVi encodes an SSV-type integrase which probably catalyzes the integration of its genome into a specific site (a tRNA(Arg) gene) in the S. solfataricus P2 genome. Like pSSVx, pSSVi can be packaged into a spindle-like viral particle and spread with the help of SSV1 or SSV2. In addition, both SSV......1 and SSV2 appeared to replicate more efficiently in the presence of pSSVi. Given the versatile genetic abilities, pSSVi appears to be well suited for a role in horizontal gene transfer....

  14. Genomic expression catalogue of a global collection of BCG vaccine strains show evidence for highly diverged metabolic and cell-wall adaptations

    KAUST Repository

    Abdallah, Abdallah

    2015-10-21

    Although Bacillus Calmette-Guérin (BCG) vaccines against tuberculosis have been available for more than 90 years, their effectiveness has been hindered by variable protective efficacy and a lack of lasting memory responses. One factor contributing to this variability may be the diversity of the BCG strains that are used around the world, in part from genomic changes accumulated during vaccine production and their resulting differences in gene expression. We have compared the genomes and transcriptomes of a global collection of fourteen of the most widely used BCG strains at single base-pair resolution. We have also used quantitative proteomics to identify key differences in expression of proteins across five representative BCG strains of the four tandem duplication (DU) groups. We provide a comprehensive map of single nucleotide polymorphisms (SNPs), copy number variation and insertions and deletions (indels) across fourteen BCG strains. Genome-wide SNP characterization allowed the construction of a new and robust phylogenic genealogy of BCG strains. Transcriptional and proteomic profiling revealed a metabolic remodeling in BCG strains that may be reflected by altered immunogenicity and possibly vaccine efficacy. Together, these integrated-omic data represent the most comprehensive catalogue of genetic variation across a global collection of BCG strains.

  15. Genomic expression catalogue of a global collection of BCG vaccine strains show evidence for highly diverged metabolic and cell-wall adaptations

    KAUST Repository

    Abdallah, Abdallah; Hill-Cawthorne, Grant A.; Otto, Thomas D.; Coll, Francesc; Guerra-Assunção, José Afonso; Gao, Ge; Naeem, Raeece; Ansari, Hifzur Rahman; Malas, Tareq Majed Yasin; Adroub, Sabir; Verboom, Theo; Ummels, Roy; Zhang, Huoming; Panigrahi, Aswini Kumar; McNerney, Ruth; Brosch, Roland; Clark, Taane G.; Behr, Marcel A.; Bitter, Wilbert; Pain, Arnab

    2015-01-01

    Although Bacillus Calmette-Guérin (BCG) vaccines against tuberculosis have been available for more than 90 years, their effectiveness has been hindered by variable protective efficacy and a lack of lasting memory responses. One factor contributing to this variability may be the diversity of the BCG strains that are used around the world, in part from genomic changes accumulated during vaccine production and their resulting differences in gene expression. We have compared the genomes and transcriptomes of a global collection of fourteen of the most widely used BCG strains at single base-pair resolution. We have also used quantitative proteomics to identify key differences in expression of proteins across five representative BCG strains of the four tandem duplication (DU) groups. We provide a comprehensive map of single nucleotide polymorphisms (SNPs), copy number variation and insertions and deletions (indels) across fourteen BCG strains. Genome-wide SNP characterization allowed the construction of a new and robust phylogenic genealogy of BCG strains. Transcriptional and proteomic profiling revealed a metabolic remodeling in BCG strains that may be reflected by altered immunogenicity and possibly vaccine efficacy. Together, these integrated-omic data represent the most comprehensive catalogue of genetic variation across a global collection of BCG strains.

  16. A validated pipeline for detection of SNVs and short InDels from RNA Sequencing

    Directory of Open Access Journals (Sweden)

    Nitin Mandloi

    2017-12-01

    In this study, we have developed a pipeline to detect germline variants from RNA-seq data. The pipeline steps include pre-processing, alignment, GATK best practices for RNA-seq, and variant filtering. The pre-processing step includes base and adapter trimming and removal of contaminating reads from rRNA, tRNA, mitochondrial DNA and repeat regions. Alignment of the pre-processed reads is performed using STAR/HiSAT. We then applied the GATK best practices for RNA-seq to call germline variants. We benchmarked our pipeline on NA12878 RNA-seq data downloaded from SRA (SRR1258218). After variant calling, the quality-passed variants were compared against the gold standard variants provided by the GIAB consortium. Of the total ~3.6 million high-quality variants reported as gold standard variants for this sample (considering the whole genome), our pipeline identified ~58,104 variants to be expressed in RNA-seq. Our pipeline achieved more than 99% sensitivity in detection of germline variants.
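
    The benchmarking step, comparing called variants against a gold-standard set, can be sketched as set arithmetic in Python. This is an illustration, not the authors' pipeline: the abstract does not say how the comparison was restricted (for example to expressed regions), so the sketch assumes both sets already cover the same regions, and the variant records shown are hypothetical.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalise a variant record into a hashable key."""
    return (chrom, int(pos), ref.upper(), alt.upper())

def benchmark(called, truth):
    """Compare two sets of variant keys covering the same regions."""
    true_positives = len(called & truth)
    false_negatives = len(truth - called)
    false_positives = len(called - truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    precision = true_positives / len(called) if called else float("nan")
    return {
        "tp": true_positives,
        "fn": false_negatives,
        "fp": false_positives,
        "sensitivity": sensitivity,
        "precision": precision,
    }

# Hypothetical records, not taken from NA12878 or GIAB.
truth = {
    variant_key("chr1", 1000, "A", "G"),
    variant_key("chr1", 2000, "C", "T"),
    variant_key("chr2", 1500, "GA", "G"),   # short deletion
    variant_key("chr3", 700, "T", "TC"),    # short insertion
}
called = {
    variant_key("chr1", 1000, "a", "g"),
    variant_key("chr1", 2000, "C", "T"),
    variant_key("chr2", 1500, "GA", "G"),
    variant_key("chr4", 42, "G", "A"),      # not in the truth set
}
result = benchmark(called, truth)
```

    Exact matching on position and alleles is a simplification; indels in particular can have several equivalent representations, so a real comparison would normalise them first.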

  17. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-01-01

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  18. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan

    2012-02-17

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  19. The Genomics Education Partnership: Successful Integration of Research into Laboratory Classes at a Diverse Group of Undergraduate Institutions

    Science.gov (United States)

    Shaffer, Christopher D.; Alvarez, Consuelo; Bailey, Cheryl; Barnard, Daron; Bhalla, Satish; Chandrasekaran, Chitra; Chandrasekaran, Vidya; Chung, Hui-Min; Dorer, Douglas R.; Du, Chunguang; Eckdahl, Todd T.; Poet, Jeff L.; Frohlich, Donald; Goodman, Anya L.; Gosser, Yuying; Hauser, Charles; Hoopes, Laura L.M.; Johnson, Diana; Jones, Christopher J.; Kaehler, Marian; Kokan, Nighat; Kopp, Olga R.; Kuleck, Gary A.; McNeil, Gerard; Moss, Robert; Myka, Jennifer L.; Nagengast, Alexis; Morris, Robert; Overvoorde, Paul J.; Shoop, Elizabeth; Parrish, Susan; Reed, Kelynne; Regisford, E. Gloria; Revie, Dennis; Rosenwald, Anne G.; Saville, Ken; Schroeder, Stephanie; Shaw, Mary; Skuse, Gary; Smith, Christopher; Smith, Mary; Spana, Eric P.; Spratt, Mary; Stamm, Joyce; Thompson, Jeff S.; Wawersik, Matthew; Wilson, Barbara A.; Youngblom, Jim; Leung, Wilson; Buhler, Jeremy; Mardis, Elaine R.; Lopatto, David

    2010-01-01

    Genomics is not only essential for students to understand biology but also provides unprecedented opportunities for undergraduate research. The goal of the Genomics Education Partnership (GEP), a collaboration between a growing number of colleges and universities around the country and the Department of Biology and Genome Center of Washington University in St. Louis, is to provide such research opportunities. Using a versatile curriculum that has been adapted to many different class settings, GEP undergraduates undertake projects to bring draft-quality genomic sequence up to high quality and/or participate in the annotation of these sequences. GEP undergraduates have improved more than 2 million bases of draft genomic sequence from several species of Drosophila and have produced hundreds of gene models using evidence-based manual annotation. Students appreciate their ability to make a contribution to ongoing research, and report increased independence and a more active learning approach after participation in GEP projects. They show knowledge gains on pre- and postcourse quizzes about genes and genomes and in bioinformatic analysis. Participating faculty also report professional gains, increased access to genomics-related technology, and an overall positive experience. We have found that using a genomics research project as the core of a laboratory course is rewarding for both faculty and students. PMID:20194808

  20. Ensembl Genomes 2016: more genomes, more complexity.

    Science.gov (United States)

    Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M

    2016-01-04

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Distinct high resolution genome profiles of early onset and late onset colorectal cancer integrated with gene expression data identify candidate susceptibility loci

    Directory of Open Access Journals (Sweden)

    Merok Marianne A

    2010-05-01

    Background: Estimates suggest that up to 30% of colorectal cancers (CRC) may develop due to an increased genetic risk. The mean age at diagnosis for CRC is about 70 years. Time of disease onset 20 years younger than the mean age is assumed to be indicative of genetic susceptibility. We have compared high resolution tumor genome copy number variation (CNV) (Roche NimbleGen, 385 000 oligo CGH array) in microsatellite stable (MSS) tumors from two age groups, including 23 young at onset patients without known hereditary syndromes and with a median age of 44 years (range: 28-53) and 17 elderly patients with median age 79 years (range: 69-87). Our aim was to identify differences in the tumor genomes between these groups and pinpoint potential susceptibility loci. Integration analysis of CNV and genome wide mRNA expression data, available for the same tumors, was performed to identify a restricted candidate gene list. Results: The total fraction of the genome with aberrant copy number, the overall genomic profile and the TP53 mutation spectrum were similar between the two age groups. However, both the number of chromosomal aberrations and the number of breakpoints differed significantly between the groups. Gains of 2q35, 10q21.3-22.1, 10q22.3 and 19q13.2-13.31 and losses from 1p31.3, 1q21.1, 2q21.2, 4p16.1-q28.3, 10p11.1 and 19p12, positions that in total contain more than 500 genes, were found significantly more often in the early onset group as compared to the late onset group. Integration analysis revealed a covariation of DNA copy number at these sites and mRNA expression for 107 of the genes. Seven of these genes, CLC, EIF4E, LTBP4, PLA2G12A, PPAT, RG9MTD2, and ZNF574, had significantly different mRNA expression comparing median expression levels across the transcriptome between the two groups. Conclusions: Ten genomic loci, containing more than 500 protein coding genes, are identified as more often altered in tumors from early onset versus late

  2. Roles of CcrA and CcrB in Excision and Integration of Staphylococcal Cassette Chromosome mec, a Staphylococcus aureus Genomic Island▿

    OpenAIRE

    Wang, Lei; Archer, Gordon L.

    2010-01-01

    The gene encoding resistance to methicillin and other β-lactam antibiotics in staphylococci, mecA, is carried on a genomic island, SCCmec (for staphylococcal cassette chromosome mec). The chromosomal excision and integration of types I to IV SCCmec are catalyzed by the site-specific recombinases CcrA and CcrB, the genes for which are encoded on each element. We sought to identify the relative contributions of CcrA and CcrB in the excision and integration of SCCmec. Purified CcrB but not CcrA ...

  3. MelanomaDB: a Web Tool for Integrative Analysis of Melanoma Genomic Information to Identify Disease-Associated Molecular Pathways

    Directory of Open Access Journals (Sweden)

    Alexander Joseph Trevarton

    2013-07-01

    Despite on-going research, metastatic melanoma survival rates remain low and treatment options are limited. Researchers can now access a rapidly growing amount of molecular and clinical information about melanoma. This information is becoming difficult to assemble and interpret due to its dispersed nature, yet as it grows it becomes increasingly valuable for understanding melanoma. Integration of this information into a comprehensive resource to aid rational experimental design and patient stratification is needed. As an initial step in this direction, we have assembled a web-accessible melanoma database, MelanomaDB, which incorporates clinical and molecular data from publicly available sources, which will be regularly updated as new information becomes available. This database allows complex links to be drawn between many different aspects of melanoma biology: genetic changes (e.g. mutations in individual melanomas revealed by DNA sequencing), associations between gene expression and patient survival, data concerning drug targets, biomarkers, druggability and clinical trials, as well as our own statistical analysis of relationships between molecular pathways and clinical parameters that have been produced using these data sets. The database is freely available at http://genesetdb.auckland.ac.nz/melanomadb/about.html . A subset of the information in the database can also be accessed through a freely available web application in the Illumina genomic cloud computing platform BaseSpace at http://www.biomatters.com/apps/melanoma-profiler-for-research . This illustrates dysregulation of specific signalling pathways, both across 310 exome-sequenced melanomas and in individual tumours and identifies novel features about the distribution of somatic variants in melanoma. We suggest that this database can provide a context in which to interpret the tumour molecular profiles of individual melanoma patients relative to biological information and available

  4. Complete Genome Sequence of Germline Chromosomally Integrated Human Herpesvirus 6A and Analyses Integration Sites Define a New Human Endogenous Virus with Potential to Reactivate as an Emerging Infection.

    Science.gov (United States)

    Tweedy, Joshua; Spyrou, Maria Alexandra; Pearson, Max; Lassner, Dirk; Kuhl, Uwe; Gompels, Ursula A

    2016-01-15

    Human herpesvirus-6A and B (HHV-6A, HHV-6B) have recently defined endogenous genomes, resulting from integration into the germline: chromosomally-integrated "CiHHV-6A/B". These affect approximately 1.0% of human populations, giving potential for virus gene expression in every cell. We previously showed that CiHHV-6A was more divergent than CiHHV-6B by examining four genes in 44 European CiHHV-6A/B cardiac/haematology patients. There was evidence for gene expression/reactivation, implying functional non-defective genomes. To further define the relationship between HHV-6A and CiHHV-6A we used next-generation sequencing to characterize genomes from three CiHHV-6A cardiac patients. Comparisons to known exogenous HHV-6A showed CiHHV-6A genomes formed a separate clade; including all 85 non-interrupted genes and necessary cis-acting signals for reactivation as infectious virus. Greater single nucleotide polymorphism (SNP) density was defined in 16 genes and the direct repeats (DR) terminal regions. Using these SNPs, deep sequencing analyses demonstrated superinfection with exogenous HHV-6A in two of the CiHHV-6A patients with recurrent cardiac disease. Characterisation of the integration sites in twelve patients identified the human chromosome 17p subtelomere as a prevalent site, which had specific repeat structures and phylogenetically related CiHHV-6A coding sequences indicating common ancestral origins. Overall CiHHV-6A genomes were similar, but distinct from known exogenous HHV-6A virus, and have the capacity to reactivate as emerging virus infections.

  5. Complete Genome Sequence of Germline Chromosomally Integrated Human Herpesvirus 6A and Analyses Integration Sites Define a New Human Endogenous Virus with Potential to Reactivate as an Emerging Infection

    Science.gov (United States)

    Tweedy, Joshua; Spyrou, Maria Alexandra; Pearson, Max; Lassner, Dirk; Kuhl, Uwe; Gompels, Ursula A.

    2016-01-01

    Human herpesvirus-6A and B (HHV-6A, HHV-6B) have recently defined endogenous genomes, resulting from integration into the germline: chromosomally-integrated “CiHHV-6A/B”. These affect approximately 1.0% of human populations, giving potential for virus gene expression in every cell. We previously showed that CiHHV-6A was more divergent than CiHHV-6B by examining four genes in 44 European CiHHV-6A/B cardiac/haematology patients. There was evidence for gene expression/reactivation, implying functional non-defective genomes. To further define the relationship between HHV-6A and CiHHV-6A we used next-generation sequencing to characterize genomes from three CiHHV-6A cardiac patients. Comparisons to known exogenous HHV-6A showed CiHHV-6A genomes formed a separate clade; including all 85 non-interrupted genes and necessary cis-acting signals for reactivation as infectious virus. Greater single nucleotide polymorphism (SNP) density was defined in 16 genes and the direct repeats (DR) terminal regions. Using these SNPs, deep sequencing analyses demonstrated superinfection with exogenous HHV-6A in two of the CiHHV-6A patients with recurrent cardiac disease. Characterisation of the integration sites in twelve patients identified the human chromosome 17p subtelomere as a prevalent site, which had specific repeat structures and phylogenetically related CiHHV-6A coding sequences indicating common ancestral origins. Overall CiHHV-6A genomes were similar, but distinct from known exogenous HHV-6A virus, and have the capacity to reactivate as emerging virus infections. PMID:26784220

  6. Integrative proteomics, genomics, and translational immunology approaches reveal mutated forms of Proteolipid Protein 1 (PLP1) and mutant-specific immune response in multiple sclerosis.

    Science.gov (United States)

    Qendro, Veneta; Bugos, Grace A; Lundgren, Debbie H; Glynn, John; Han, May H; Han, David K

    2017-03-01

    In order to gain mechanistic insights into multiple sclerosis (MS) pathogenesis, we utilized a multi-dimensional approach to test the hypothesis that mutations in myelin proteins lead to immune activation and central nervous system autoimmunity in MS. Mass spectrometry-based proteomic analysis of human MS brain lesions revealed seven unique mutations of PLP1, a key myelin protein that is known to be destroyed in MS. Surprisingly, in-depth analysis of two MS patients at the genomic DNA and mRNA levels confirmed mutated PLP1 in RNA, but not in the genomic DNA. Quantification of wild type and mutant PLP RNA levels by qPCR further validated the presence of mutant PLP RNA in the MS patients. To seek evidence linking mutations in abundant myelin proteins and immune-mediated destruction of myelin, specific immune response against mutant PLP1 in MS patients was examined. Thus, we have designed paired, wild type and mutant peptide microarrays, and examined antibody response to multiple mutated PLP1 in sera from MS patients. Consistent with the idea of different patients exhibiting unique mutation profiles, we found that 13 out of 20 MS patients showed antibody responses against specific but not against all the mutant-PLP1 peptides. Interestingly, we found mutant PLP-directed antibody response against specific mutant peptides in the sera of pre-MS controls. The results from integrative proteomic, genomic, and immune analyses reveal a possible mechanism of mutation-driven pathogenesis in human MS. The study also highlights the need for integrative genomic and proteomic analyses for uncovering pathogenic mechanisms of human diseases. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Forensic performance of Investigator DIPplex indels genotyping kit in native, immigrant, and admixed populations in South Africa.

    Science.gov (United States)

    Hefke, Gwynneth; Davison, Sean; D'Amato, Maria Eugenia

    2015-12-01

    The utilization of binary markers in human individual identification is gaining ground in forensic genetics. We analyzed the polymorphisms from the first commercial indel kit Investigator DIPplex (Qiagen) in 512 individuals from Afrikaner, Indian, admixed Cape Colored, and the native Bantu Xhosa and Zulu origin in South Africa and evaluated forensic and population genetics parameters for their forensic application in South Africa. The levels of genetic diversity in population and forensic parameters in South Africa are similar to other published data, with lower diversity values for the native Bantu. Departures from Hardy-Weinberg expectations were observed in HLD97 in Indians, Admixed and Bantus, along with 6.83% null homozygotes in the Bantu populations. Sequencing of the flanking regions showed a previously reported transition G>A in rs17245568. Strong population structure was detected with Fst, AMOVA, and the Bayesian unsupervised clustering method in STRUCTURE. Therefore we evaluated the efficiency of individual assignments to population groups using the ancestral membership proportions from STRUCTURE and the Bayesian classification algorithm in Snipper App Suite. Both methods showed low cross-assignment error (0-4%) between Bantus and either Afrikaners or Indians. The differentiation between populations seems to be driven by four loci under positive selection pressure. Based on these results, we draw recommendations for the application of this kit in SA. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Autosomal InDel polymorphisms for population genetic structure and differentiation analysis of Chinese Kazak ethnic group

    Science.gov (United States)

    Kong, Tingting; Chen, Yahao; Guo, Yuxin; Wei, Yuanyuan; Jin, Xiaoye; Xie, Tong; Mu, Yuling; Dong, Qian; Wen, Shaoqing; Zhou, Boyan; Zhang, Li; Shen, Chunmei; Zhu, Bofeng

    2017-01-01

    In the present study, we assessed the genetic diversities of the Chinese Kazak ethnic group on the basis of 30 well-chosen autosomal insertion and deletion loci and explored the genetic relationships between Kazak and 23 reference groups. We detected the level of the expected heterozygosity ranging from 0.3605 at HLD39 locus to 0.5000 at HLD136 locus and the observed heterozygosity ranging from 0.3548 at HLD39 locus to 0.5283 at HLD136 locus. The combined power of discrimination and the combined power of exclusion for all 30 loci in the studied Kazak group were 0.999999999999128 and 0.9945, respectively. The dataset generated in this study indicated the panel of 30 InDels was highly efficient in forensic individual identification but may not have enough power in paternity cases. The results of the interpopulation differentiations, PCA plots, phylogenetic trees and STRUCTURE analyses showed a close genetic affiliation between the Kazak and Uigur group. PMID:28915619
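    The forensic statistics named in this abstract follow standard textbook definitions, which can be sketched as below for biallelic indel loci. This is an illustrative sketch only: it assumes Hardy-Weinberg genotype proportions, the function names are hypothetical, and the allele frequencies are made-up values rather than data from the study.

```python
from math import prod


def expected_heterozygosity(p):
    """He = 1 - sum of squared allele frequencies (biallelic locus, alleles p and 1 - p)."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def power_of_discrimination(p):
    """PD = 1 - sum of squared genotype frequencies, assuming Hardy-Weinberg proportions."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return 1.0 - sum(g * g for g in genotype_freqs)


def combined_power_of_discrimination(allele_freqs):
    """CPD = 1 - product over independent loci of (1 - PD_i)."""
    return 1.0 - prod(1.0 - power_of_discrimination(p) for p in allele_freqs)


# Hypothetical insertion-allele frequencies for three loci (not from the study).
freqs = [0.5, 0.4, 0.3]
for p in freqs:
    print(p, round(expected_heterozygosity(p), 4), round(power_of_discrimination(p), 4))
print("CPD:", combined_power_of_discrimination(freqs))
```

    For a biallelic marker the expected heterozygosity cannot exceed 0.5 (reached when both alleles have frequency 0.5), which is consistent with the upper value of 0.5000 reported above; the combined power grows toward 1 as more independent loci are multiplied in.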

  9. Individual capacity for DNA repair and maintenance of genomic integrity: a fertile ground for studies in the field of assisted reproduction

    Directory of Open Access Journals (Sweden)

    Radoslava Vazharova

    2016-05-01

    Many factors may affect the chances for successful pregnancy, especially at a later age. Fertility evaluations including genetic analysis are recommended to couples that have not achieved pregnancy within 6–12 months of unprotected intercourse. This review discusses some of the common polymorphisms in genes coding for proteins functioning in DNA damage identification and repair and maintenance of genomic integrity that may affect the chances of success in natural conception as well as in assisted reproduction (AR). Common polymorphisms in genes coding for proteins functioning in DNA damage identification and repair and maintenance of genomic integrity may affect the chances of success in assisted reproduction as well as in natural conception. The effects of carriership of different alleles of key genes of DNA repair may have differential effects in men and women and at different ages, suggesting complex interactions with the mechanisms controlling cell and tissue aging and programmed cell death. Future studies in the field are needed in order to elucidate the genotype–phenotype relationships and to translate the knowledge about individual repair capacity and maintenance of genomic integrity to potential clinical applications. Abbreviations: aCGH: microarray-based comparative genomic hybridization; AR: assisted reproduction; ATM: ataxia-telangiectasia mutated; ATP: adenosine triphosphate; BER: base excision repair; BFE: basic fertility evaluation; DMSO: dimethyl sulfoxide; FSH: follicle-stimulating hormone; GNRHR: gonadotropin-releasing hormone receptor; HMG: high-mobility group; ICSI: intracytoplasmic sperm injection; IUI: intrauterine insemination; IVF: in vitro fertilization; LH: luteinizing hormone; LIF: leukaemia inhibitory factor; MTR: methionine synthase; MTRR: methionine synthase reductase; NGS: next-generation sequencing; NER: nucleotide excision repair; NHEJ: non-homologous end joining; PAH: polycyclic aromatic hydrocarbons; PCOS

  10. [Association analysis of SNP-63 and indel-19 variant in the calpain-10 gene with polycystic ovary syndrome in women of reproductive age].

    Science.gov (United States)

    Flores-Martínez, Silvia Esperanza; Castro-Martínez, Anna Gabriela; López-Quintero, Andrés; García-Zapién, Alejandra Guadalupe; Torres-Rodríguez, Ruth Noemí; Sánchez-Corona, José

    2015-01-01

    Polycystic ovary syndrome is a complex and heterogeneous disease involving both reproductive and metabolic problems. A genetic predisposition has been suggested in the etiology of this syndrome. The identification of the calpain-10 gene (CAPN10) as the first candidate gene for type 2 diabetes mellitus has focused interest on investigating its possible relation with polycystic ovary syndrome, because this syndrome is associated with hyperinsulinemia and insulin resistance, two metabolic abnormalities associated with type 2 diabetes mellitus. The aim was to investigate whether there is an association between SNP-63 and the indel-19 variant of the CAPN10 gene and polycystic ovary syndrome in women of reproductive age. This study included 101 women (55 with polycystic ovary syndrome and 46 without polycystic ovary syndrome). The indel-19 variant was identified by electrophoresis of the PCR-amplified fragments, and SNP-63 by PCR-RFLP. The allele and genotype frequencies of the two variants did not differ significantly between women with polycystic ovary syndrome and the control group. Haplotype 21 (defined by the insertion allele of the indel-19 variant and the C allele of SNP-63) was found with higher frequency in both study groups and was more frequent in the polycystic ovary syndrome patient group; however, this difference was not statistically significant (p = 0.8353). The results suggest that SNP-63 and the indel-19 variant of the CAPN10 gene do not represent a risk factor for polycystic ovary syndrome in our patient group. Copyright © 2015. Published by Masson Doyma México S.A.

  11. Complete Genome Sequence of Germline Chromosomally Integrated Human Herpesvirus 6A and Analyses Integration Sites Define a New Human Endogenous Virus with Potential to Reactivate as an Emerging Infection.

    OpenAIRE

    Tweedy, J; Spyrou, MA; Pearson, M; Lassner, D; Kuhl, U; Gompels, UA

    2016-01-01

    Human herpesvirus-6A and B (HHV-6A, HHV-6B) have recently defined endogenous genomes, resulting from integration into the germline: chromosomally-integrated "CiHHV-6A/B". These affect approximately 1.0% of human populations, giving potential for virus gene expression in every cell. We previously showed that CiHHV-6A was more divergent than CiHHV-6B by examining four genes in 44 European CiHHV-6A/B cardiac/haematology patients. There was evidence for gene expression/reactivation, imp...

  12. Complete genome sequence and integrated protein localization and interaction map for alfalfa dwarf virus, which combines properties of both cytoplasmic and nuclear plant rhabdoviruses

    Energy Technology Data Exchange (ETDEWEB)

    Bejerman, Nicolás, E-mail: n.bejerman@uq.edu.au [Instituto de Patología Vegetal (IPAVE), Centro de Investigaciones Agropecuarias (CIAP), Instituto Nacional de Tecnología Agropecuaria INTA, Camino a 60 Cuadras k 5,5, Córdoba X5020ICA (Argentina); Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072 (Australia); Giolitti, Fabián; Breuil, Soledad de; Trucco, Verónica; Nome, Claudia; Lenardon, Sergio [Instituto de Patología Vegetal (IPAVE), Centro de Investigaciones Agropecuarias (CIAP), Instituto Nacional de Tecnología Agropecuaria INTA, Camino a 60 Cuadras k 5,5, Córdoba X5020ICA (Argentina); Dietzgen, Ralf G. [Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072 (Australia)

    2015-09-15

    Summary: We have determined the full-length 14,491-nucleotide genome sequence of a new plant rhabdovirus, alfalfa dwarf virus (ADV). Seven open reading frames (ORFs) were identified in the antigenomic orientation of the negative-sense, single-stranded viral RNA, in the order 3′-N-P-P3-M-G-P6-L-5′. The ORFs are separated by conserved intergenic regions and the genome coding region is flanked by complementary 3′ leader and 5′ trailer sequences. Phylogenetic analysis of the nucleoprotein amino acid sequence indicated that this alfalfa-infecting rhabdovirus is related to viruses in the genus Cytorhabdovirus. When transiently expressed as GFP fusions in Nicotiana benthamiana leaves, most ADV proteins accumulated in the cell periphery, but unexpectedly P protein was localized exclusively in the nucleus. ADV P protein was shown to have homotypic and heterotypic nuclear interactions with N, P3 and M proteins by bimolecular fluorescence complementation. ADV appears unique in that it combines properties of both cytoplasmic and nuclear plant rhabdoviruses. - Highlights: • The complete genome of alfalfa dwarf virus is obtained. • An integrated localization and interaction map for ADV is determined. • ADV has genome sequence similarity and evolutionary links with cytorhabdoviruses. • ADV protein localization and interaction data show an association with the nucleus. • ADV combines properties of both cytoplasmic and nuclear plant rhabdoviruses.

  13. Complete genome sequence and integrated protein localization and interaction map for alfalfa dwarf virus, which combines properties of both cytoplasmic and nuclear plant rhabdoviruses

    International Nuclear Information System (INIS)

    Bejerman, Nicolás; Giolitti, Fabián; Breuil, Soledad de; Trucco, Verónica; Nome, Claudia; Lenardon, Sergio; Dietzgen, Ralf G.

    2015-01-01

    Summary: We have determined the full-length 14,491-nucleotide genome sequence of a new plant rhabdovirus, alfalfa dwarf virus (ADV). Seven open reading frames (ORFs) were identified in the antigenomic orientation of the negative-sense, single-stranded viral RNA, in the order 3′-N-P-P3-M-G-P6-L-5′. The ORFs are separated by conserved intergenic regions and the genome coding region is flanked by complementary 3′ leader and 5′ trailer sequences. Phylogenetic analysis of the nucleoprotein amino acid sequence indicated that this alfalfa-infecting rhabdovirus is related to viruses in the genus Cytorhabdovirus. When transiently expressed as GFP fusions in Nicotiana benthamiana leaves, most ADV proteins accumulated in the cell periphery, but unexpectedly P protein was localized exclusively in the nucleus. ADV P protein was shown to have homotypic and heterotypic nuclear interactions with N, P3 and M proteins by bimolecular fluorescence complementation. ADV appears unique in that it combines properties of both cytoplasmic and nuclear plant rhabdoviruses. - Highlights: • The complete genome of alfalfa dwarf virus is obtained. • An integrated localization and interaction map for ADV is determined. • ADV has genome sequence similarity and evolutionary links with cytorhabdoviruses. • ADV protein localization and interaction data show an association with the nucleus. • ADV combines properties of both cytoplasmic and nuclear plant rhabdoviruses.

  14. Germs, genomics and global public health: How can advances in genomic sciences be integrated into public health in the developing world to deal with infectious diseases?

    Science.gov (United States)

    Pang, T

    2009-12-01

    Scientific and technological advances derived from the genomics revolution have a central role to play in dealing with continuing infectious disease threats in the developing world caused by emerging and re-emerging pathogens. These techniques, coupled with increasing knowledge of host-pathogen interactions, can assist in the early identification and containment of outbreaks as well as in the development of preventive vaccination and therapeutic interventions, including the urgently needed new antibiotics. However, the effective application of genomics technologies faces key barriers and challenges which occur at three stages: from the research to the products, from the products to individual patients, and, finally, from patients to entire populations. There needs to be an emphasis on research in areas of greatest need, on facilitating the translation of research into interventions and, finally, on the effective delivery of such interventions to those in greatest need. Ultimate success will depend on bringing together science, society and policy to develop effective public health implementation strategies to provide health security and health equity for all peoples.

  15. Multiple-integrations of HPV16 genome and altered transcription of viral oncogenes and cellular genes are associated with the development of cervical cancer.

    Directory of Open Access Journals (Sweden)

    Xulian Lu

    The constitutive expression of the high-risk HPV E6 and E7 viral oncogenes is the major cause of cervical cancer. To comprehensively explore the composition of HPV16 early transcripts and their genomic annotation, cervical squamous epithelial tissues from 40 HPV16-infected patients were collected for analysis of papillomavirus oncogene transcripts (APOT). We observed different transcription patterns of HPV16 oncogenes in the progression of cervical lesions to cervical cancer and identified one novel transcript. Multiple-integration events are significantly more frequent in tissues of cervical carcinoma (CxCa) than in those of low-grade squamous intraepithelial lesions (LSIL) and high-grade squamous intraepithelial lesions (HSIL). Moreover, most cellular genes within or near these integration sites are cancer-associated genes. Taken together, this study suggests that multiple integrations of the HPV genome during persistent viral infection, which alter the expression patterns of viral oncogenes and integration-related cellular genes, play a crucial role in the progression of cervical lesions to cervical cancer.

  16. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi Hsu

    2011-12-01

    Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne’s disease (JD), infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium) strains isolated from various hosts and environments. Using next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98%) to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77) showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolat