WorldWideScience

Sample records for genome variation analysis

  1. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variation: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), and 3) 1000 Genomes Data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If all three data sets can be integrated into a single volume of data, a more detailed analysis of human genome variation becomes possible for a total of 4,861 individuals (= 1,417 + 940 + 2,504). In fact, we successfully integrated these three data sets using information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be identified by analyzing each of the three data sets separately. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variation. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.
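
The pooling step in the abstract above can be illustrated with a minimal Python sketch. The abstract does not describe the actual integration procedure. Keying variants by reference-genome coordinates and intersecting them across data sets is an assumption made here, and `shared_sites` and the toy site sets are hypothetical. Only the cohort sizes come from the abstract.

```python
# Cohort sizes quoted in the abstract.
cohort_sizes = {"HapMap": 1417, "HGDP": 940, "1000 Genomes": 2504}
total_individuals = sum(cohort_sizes.values())


def shared_sites(*site_sets):
    """Return the variant sites (chromosome, position) present in every data set.

    Assumption: all positions refer to the same reference genome build, so
    identical (chromosome, position) pairs denote the same site.
    """
    return set.intersection(*(set(s) for s in site_sets))


# Toy, made-up sites for illustration only (not real data).
hapmap_sites = {("1", 1000), ("1", 2000), ("2", 500)}
hgdp_sites = {("1", 1000), ("2", 500), ("3", 42)}
kg_sites = {("1", 1000), ("2", 500), ("2", 900)}

common_sites = shared_sites(hapmap_sites, hgdp_sites, kg_sites)
print(total_individuals, sorted(common_sites))
```

Restricting the analysis to shared sites is one simple way to make genotypes from different platforms comparable. A real pipeline would also need to reconcile strand orientation and allele coding.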

  2. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    Science.gov (United States)

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to ensure that sufficient material is available for copy number variation analysis of a genome derived from an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We focus on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either array comparative genome hybridization (aCGH) or SNP array analysis, respectively.

  3. Transcriptome, methylome and genomic variations analysis of ectopic thyroid glands.

    Directory of Open Access Journals (Sweden)

    Rasha Abu-Khudir

    BACKGROUND: Congenital hypothyroidism from thyroid dysgenesis (CHTD) is predominantly a sporadic disease characterized by defects in the differentiation, migration or growth of thyroid tissue. Of these defects, incomplete migration resulting in ectopic thyroid tissue is the most common (up to 80%). Germinal mutations in the thyroid-related transcription factors NKX2.1, FOXE1, PAX-8, and NKX2.5 have been identified in only 3% of patients with sporadic CHTD. Moreover, a survey of monozygotic twins yielded a discordance rate of 92%, suggesting that somatic events, genetic or epigenetic, probably play an important role in the etiology of CHTD. METHODOLOGY/PRINCIPAL FINDINGS: To assess the role of somatic genetic or epigenetic processes in CHTD, we analyzed gene expression, genome-wide methylation, and structural genome variations in normal versus ectopic thyroid tissue. In total, 1011 genes were more than two-fold induced or repressed. Expression array results were validated by quantitative real-time RT-PCR for 100 genes. After correction for differences in thyroid activation state, 19 genes were exclusively associated with thyroid ectopy, among which genes involved in embryonic development (e.g. TXNIP) and in the Wnt pathway (e.g. SFRP2 and FRZB) were observed. None of the thyroid-related transcription factors (FOXE1, HHEX, NKX2.1, NKX2.5) showed decreased expression, whereas PAX8 expression was associated with thyroid activation state. Finally, the expression profile was independent of promoter and CpG island methylation and of structural genome variations. CONCLUSIONS/SIGNIFICANCE: This is the first integrative molecular analysis of ectopic thyroid tissue. Ectopic thyroids show differential gene expression compared to normal thyroids, although the molecular basis could not be defined. Replication of this pilot study on a larger cohort could lead to unraveling the elusive cause of defective thyroid migration during embryogenesis.

  4. [Phylogenetic relationships and intraspecific variation of D-genome Aegilops L. as revealed by RAPD analysis].

    Science.gov (United States)

    Goriunova, S V; Kochieva, E Z; Chikida, N N; Pukhal'skiĭ, V A

    2004-05-01

    RAPD analysis was carried out to study the genetic variation and phylogenetic relationships of polyploid Aegilops species, which contain the D genome as a component of the alloploid genome, and diploid Aegilops tauschii, which is a putative donor of the D genome for common wheat. In total, 74 accessions of six D-genome Aegilops species were examined. The highest intraspecific variation (0.03-0.21) was observed for Ae. tauschii. Intraspecific distances between accessions ranged from 0.007 to 0.067 in Ae. cylindrica, from 0.017 to 0.047 in Ae. vavilovii, and from 0.00 to 0.053 in Ae. juvenalis. Likewise, Ae. ventricosa and Ae. crassa showed low intraspecific polymorphism. The among-accession difference in alloploid Ae. ventricosa (genome DvNv) was similar to that of one parental species, Ae. uniaristata (N), and substantially lower than in the other parent, Ae. tauschii (D). The among-accession difference in Ae. cylindrica (CcDc) was considerably lower than in either parent, Ae. tauschii (D) or Ae. caudata (C). With the exception of Ae. cylindrica, all D-genome species, namely Ae. tauschii (D), Ae. ventricosa (DvNv), Ae. crassa (XcrDcr1 and XcrDcr1Dcr2), Ae. juvenalis (XjDjUj), and Ae. vavilovii (XvaDvaSva), formed a single polymorphic cluster, which was distinct from clusters of other species. The only exception, Ae. cylindrica, did not group with the other D-genome species, but clustered with Ae. caudata (C), a donor of the C genome. The cluster of these two species was clearly distinct from the cluster of the other D-genome species and close to a cluster of Ae. umbellulata (genome U) and Ae. ovata (genome UgMg). Thus, RAPD analysis was used for the first time to estimate and compare the interpopulation polymorphism and to establish the phylogenetic relationships of all diploid and alloploid D-genome Aegilops species.

  5. Genomic analysis of QTLs and genes altering natural variation in stochastic noise.

    Science.gov (United States)

    Jimenez-Gomez, Jose M; Corwin, Jason A; Joseph, Bindu; Maloof, Julin N; Kliebenstein, Daniel J

    2011-09-01

    Quantitative genetic analysis has long been used to study how natural variation of genotype can influence an organism's phenotype. While most studies have focused on genetic determinants of phenotypic average, it is rapidly becoming understood that stochastic noise is genetically determined. However, it is not known how many traits display genetic control of stochastic noise nor how broadly these stochastic loci are distributed within the genome. Understanding these questions is critical to our understanding of quantitative traits and how they relate to the underlying causal loci, especially since stochastic noise may be directly influenced by underlying changes in the wiring of regulatory networks. We identified QTLs controlling natural variation in stochastic noise of glucosinolates, plant defense metabolites, as well as QTLs for stochastic noise of related transcripts. These loci included stochastic noise QTLs unique for either transcript or metabolite variation. Validation of these loci showed that genetic polymorphism within the regulatory network alters stochastic noise independent of effects on corresponding average levels. We examined this phenomenon more globally, using transcriptomic datasets, and found that the Arabidopsis transcriptome exhibits significant, heritable differences in stochastic noise. Further analysis allowed us to identify QTLs that control genomic stochastic noise. Some genomic QTL were in common with those altering average transcript abundance, while others were unique to stochastic noise. Using a single isogenic population, we confirmed that natural variation at ELF3 alters stochastic noise in the circadian clock and metabolism. Since polymorphisms controlling stochastic noise in genomic phenotypes exist within wild germplasm for naturally selected phenotypes, this suggests that analysis of Arabidopsis evolution should account for genetic control of stochastic variance and average phenotypes. It remains to be determined if natural

  6. Genomic analysis of natural selection and phenotypic variation in high-altitude Mongolians.

    Directory of Open Access Journals (Sweden)

    Jinchuan Xing

    Deedu (DU) Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to those faced by native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related to adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR), neighboring Tibetan populations (including high-altitude candidates EPAS1, PKLR, and CYP2E1), as well as genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG). Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. Whole-genome sequence from a DU Mongolian (Tianjiao1) shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments.

  7. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    Science.gov (United States)

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most

  8. ViVar: a comprehensive platform for the analysis and visualization of structural genomic variation.

    Science.gov (United States)

    Sante, Tom; Vergult, Sarah; Volders, Pieter-Jan; Kloosterman, Wigard P; Trooskens, Geert; De Preter, Katleen; Dheedene, Annelies; Speleman, Frank; De Meyer, Tim; Menten, Björn

    2014-01-01

    Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end/single-read sequencing has become an important technique for the detection and exploration of structural variation. Several analysis tools exist to handle different parts and aspects of such sequencing-based structural variation analysis pipelines. A comprehensive analysis platform to handle all steps, from processing the sequencing data to the discovery and visualization of structural variants, is missing. The ViVar platform is built to handle the discovery of structural variants, from Depth Of Coverage analysis and aberrant read pair clustering to split read analysis. ViVar provides you with powerful visualization options, enables easy reporting of results and offers better usability and data management. The platform facilitates the processing, analysis and visualization of structural variation based on massively parallel sequencing data, enabling the rapid identification of disease loci or genes. ViVar allows you to scale your analysis with your work load over multiple (cloud) servers, has user access control to keep your data safe and is easily expandable as analysis techniques advance. URL: https://www.cmgg.be/vivar/

  9. ViVar: a comprehensive platform for the analysis and visualization of structural genomic variation.

    Directory of Open Access Journals (Sweden)

    Tom Sante

    Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end/single-read sequencing has become an important technique for the detection and exploration of structural variation. Several analysis tools exist to handle different parts and aspects of such sequencing-based structural variation analysis pipelines. A comprehensive analysis platform to handle all steps, from processing the sequencing data to the discovery and visualization of structural variants, is missing. The ViVar platform is built to handle the discovery of structural variants, from Depth Of Coverage analysis and aberrant read pair clustering to split read analysis. ViVar provides you with powerful visualization options, enables easy reporting of results and offers better usability and data management. The platform facilitates the processing, analysis and visualization of structural variation based on massively parallel sequencing data, enabling the rapid identification of disease loci or genes. ViVar allows you to scale your analysis with your work load over multiple (cloud) servers, has user access control to keep your data safe and is easily expandable as analysis techniques advance. URL: https://www.cmgg.be/vivar/

  10. Analysis of genetic variation and potential applications in genome-scale metabolic modeling

    Directory of Open Access Journals (Sweden)

    João Gonçalo Rocha Cardoso

    2015-02-01

    Genetic variation is the motor of evolution and allows organisms to overcome the environmental challenges they encounter. It can be both beneficial and harmful in the process of engineering cell factories for the production of proteins and chemicals. Throughout the history of biotechnology, there have been efforts to exploit genetic variation in our favor to create strains with favorable phenotypes. Genetic variation can either be present in natural populations or it can be artificially created by mutagenesis and selection or adaptive laboratory evolution. On the other hand, unintended genetic variation during a long term production process may lead to significant economic losses and it is important to understand how to control this type of variation. With the emergence of next-generation sequencing technologies, genetic variation in microbial strains can now be determined on an unprecedented scale and resolution by re-sequencing thousands of strains systematically. In this article, we review challenges in the integration and analysis of large-scale re-sequencing data, present an extensive overview of bioinformatics methods for predicting the effects of genetic variants on protein function, and discuss approaches for interfacing existing bioinformatics approaches with genome-scale models of cellular processes in order to predict effects of sequence variation on cellular phenotypes.

  11. Genome-wide analysis of copy number variation in type 1 diabetes.

    Directory of Open Access Journals (Sweden)

    Britney L Grayson

    Type 1 diabetes (T1D) tends to cluster in families, suggesting there may be a genetic component predisposing to disease. However, a recent large-scale genome-wide association study concluded that identified genetic factors, single nucleotide polymorphisms, do not account for overall familiality. Another class of genetic variation is the amplification or deletion of >1 kilobase segments of the genome, also termed copy number variations (CNVs). We performed genome-wide CNV analysis on a cohort of 20 unrelated adults with T1D and a control (Ctrl) cohort of 20 subjects using the Affymetrix SNP Array 6.0 in combination with the Birdsuite copy number calling software. We identified 39 CNVs as enriched or depleted in T1D versus Ctrl. Additionally, we performed CNV analysis in a group of 10 monozygotic twin pairs discordant for T1D. Eleven of these 39 CNVs were also respectively enriched or depleted in the Twin cohort, suggesting that these variants may be involved in the development of islet autoimmunity, as the presently unaffected twin is at high risk for developing islet autoimmunity and T1D in his or her lifetime. These CNVs include a deletion on chromosome 6p21, near an HLA-DQ allele. CNVs were found that were either enriched or depleted in patients with or at high risk for developing T1D. These regions may represent genetic variants contributing to development of islet autoimmunity in T1D.

  12. Structural variations in pig genomes

    NARCIS (Netherlands)

    Paudel, Y.

    2015-01-01

    Abstract Paudel, Y. (2015). Structural variations in pig genomes. PhD thesis, Wageningen University, the Netherlands Structural variations are chromosomal rearrangements such as insertions-deletions (INDELs), duplications, inversions, translocations, and copy number variations (CNVs

  13. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control.

    Science.gov (United States)

    Bartha, István; Carlson, Jonathan M; Brumme, Chanson J; McLaren, Paul J; Brumme, Zabrina L; John, Mina; Haas, David W; Martinez-Picado, Javier; Dalmau, Judith; López-Galíndez, Cecilio; Casado, Concepción; Rauch, Andri; Günthard, Huldrych F; Bernasconi, Enos; Vernazza, Pietro; Klimkait, Thomas; Yerly, Sabine; O'Brien, Stephen J; Listgarten, Jennifer; Pfeifer, Nico; Lippert, Christoph; Fusi, Nicolo; Kutalik, Zoltán; Allen, Todd M; Müller, Viktor; Harrigan, P Richard; Heckerman, David; Telenti, Amalio; Fellay, Jacques

    2013-10-29

    HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (pgenome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI:http://dx.doi.org/10.7554/eLife.01123.001.

  14. A genome-wide analysis of putative functional and exonic variation associated with extremely high intelligence.

    Science.gov (United States)

    Spain, S L; Pedroso, I; Kadeva, N; Miller, M B; Iacono, W G; McGue, M; Stergiakouli, E; Smith, G D; Putallaz, M; Lubinski, D; Meaburn, E L; Plomin, R; Simpson, M A

    2016-08-01

    Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case-control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence.

  15. [RAPD analysis of the intraspecific and interspecific variation and phylogenetic relationships of Aegilops L. species with the U genome].

    Science.gov (United States)

    Goriunova, S V; Chikida, N N; Kochieva, E Z

    2010-07-01

    RAPD analysis was used to study the genetic variation and phylogenetic relationships of polyploid Aegilops species with the U genome. In total, 115 DNA samples of eight polyploid species containing the U genome and the diploid species Ae. umbellulata (U) were examined. Substantial interspecific polymorphism was observed for the majority of the polyploid species with the U genome (interspecific differences, 0.01-0.2; proportion of polymorphic loci, 56.6-88.2%). Aegilops triuncialis was identified as the only alloploid species with low interspecific polymorphism (interspecific differences, 0-0.01, P = 50%) in the U-genome group. The U-genome Aegilops species proved to be separated from other species of the genus. The phylogenetic relationships were established for the U-genome species. The greatest separation within the U-genome group was observed for the US-genome species Ae. kotschyi and Ae. variabilis. The tetraploid species Ae. triaristata and Ae. columnaris, which had the UX genome, and the hexaploid species Ae. recta (UXN) were found to be related to each other and separate from the UM-genome species. A similarity was observed between the UM-genome species Ae. ovata and Ae. biuncialis and the ancestral diploid U-genome species Ae. umbellulata. The UC-genome species Ae. triuncialis was rather separate and slightly similar to the UX-genome species.

  16. FROG - Fingerprinting Genomic Variation Ontology.

    Directory of Open Access Journals (Sweden)

    E Abinaya

    Genetic variations play a crucial role in differential phenotypic outcomes. Given the complexity of establishing this correlation and the enormous amount of data available today, it is imperative to design machine-readable, efficient methods to store, label, search and analyze these data. A semantic approach, FROG: "FingeRprinting Ontology of Genomic variations", is implemented to label variation data based on location, function and interactions. FROG has six levels to describe the variation annotation, namely, chromosome, DNA, RNA, protein, variations and interactions. Each level is a conceptual aggregation of logically connected attributes, each of which comprises various properties for the variant. For example, at the chromosome level, one of the attributes is the location of the variation, which has two properties, allosomes or autosomes. Another attribute is variation kind, which has four properties, namely, indel, deletion, insertion, substitution. Likewise, there are 48 attributes and 278 properties to capture the variation annotation across the six levels. Each property is then assigned a bit score, which in turn leads to generation of a binary fingerprint based on the combination of these properties (mostly taken from existing variation ontologies). FROG is a novel and unique method designed for the purpose of labeling the entire variation data generated to date for efficient storage, search and analysis. A web-based platform is designed as a test case for users to navigate sample datasets and generate fingerprints. The platform is available at http://ab-openlab.csir.res.in/frog.
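
The fingerprinting idea in the abstract above, one bit per property, can be sketched as follows. This is an illustrative sketch only. The abstract does not specify FROG's bit order or encoding, so the property ordering, the `attribute:property` naming, and the `fingerprint` helper are hypothetical. Only the six chromosome-level properties named in the abstract are used, not the full set of 278.

```python
# Chromosome-level properties named in the abstract. The order (and hence the
# bit position of each property) is an assumption made for this sketch.
PROPERTIES = [
    "location:allosome",
    "location:autosome",
    "kind:indel",
    "kind:deletion",
    "kind:insertion",
    "kind:substitution",
]


def fingerprint(properties, active):
    """Return a bit string with one character per property.

    A position is '1' when the property applies to the variant, else '0'.
    """
    unknown = set(active) - set(properties)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return "".join("1" if p in active else "0" for p in properties)


# Hypothetical variant: a substitution on an autosome.
fp = fingerprint(PROPERTIES, {"location:autosome", "kind:substitution"})
print(fp)
```

Because every variant maps to a fixed-length bit string, two annotations can be compared or searched with cheap bitwise operations, which fits the efficient storage and search goal the abstract describes.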

  17. Genomic analysis reveals major determinants of cis-regulatory variation in Capsella grandiflora.

    Science.gov (United States)

    Steige, Kim A; Laenen, Benjamin; Reimegård, Johan; Scofield, Douglas G; Slotte, Tanja

    2017-01-31

    Understanding the causes of cis-regulatory variation is a long-standing aim in evolutionary biology. Although cis-regulatory variation has long been considered important for adaptation, we still have a limited understanding of the selective importance and genomic determinants of standing cis-regulatory variation. To address these questions, we studied the prevalence, genomic determinants, and selective forces shaping cis-regulatory variation in the outcrossing plant Capsella grandiflora. We first identified a set of 1,010 genes with common cis-regulatory variation using analyses of allele-specific expression (ASE). Population genomic analyses of whole-genome sequences from 32 individuals showed that genes with common cis-regulatory variation (i) are under weaker purifying selection and (ii) undergo less frequent positive selection than other genes. We further identified genomic determinants of cis-regulatory variation. Gene body methylation (gbM) was a major factor constraining cis-regulatory variation, whereas presence of nearby transposable elements (TEs) and tissue specificity of expression increased the odds of ASE. Our results suggest that most common cis-regulatory variation in C. grandiflora is under weak purifying selection, and that gene-specific functional constraints are more important for the maintenance of cis-regulatory variation than genome-scale variation in the intensity of selection. Our results agree with previous findings that suggest TE silencing affects nearby gene expression, and provide evidence for a link between gbM and cis-regulatory constraint, possibly reflecting greater dosage sensitivity of body-methylated genes. Given the extensive conservation of gbM in flowering plants, this suggests that gbM could be an important predictor of cis-regulatory variation in a wide range of plant species.

  18. Integrated analysis of copy number variation and genome-wide expression profiling in colorectal cancer tissues.

    Science.gov (United States)

    Ali Hassan, Nur Zarina; Mokhtar, Norfilza Mohd; Kok Sin, Teow; Mohamed Rose, Isa; Sagap, Ismail; Harun, Roslan; Jamal, Rahman

    2014-01-01

    Integrative analyses of multiple genomic datasets for selected samples can provide better insight into the overall data and can enhance our knowledge of cancer. The objective of this study was to elucidate the association between copy number variation (CNV) and gene expression in colorectal cancer (CRC) samples and their corresponding non-cancerous tissues. Sixty-four paired CRC samples from the same patients were subjected to CNV profiling using the Illumina HumanOmni1-Quad assay, and validation was performed using the multiplex ligation-dependent probe amplification (MLPA) method. Genome-wide expression profiling was performed on 15 paired samples from the same group of patients using the Affymetrix Human Gene 1.0 ST array. Significant genes obtained from both array results were then overlapped. To identify molecular pathways, the data were mapped to the KEGG database. Whole genome CNV analysis that compared primary tumor and non-cancerous epithelium revealed gains in 1638 genes and losses in 36 genes. Significant gains were mostly found in chromosome 20 at position 20q12 with a frequency of 45.31% in tumor samples. Examples of genes associated with this cytoband were PTPRT, EMILIN3 and CHD6. The highest number of losses was detected at chromosome 8, position 8p23.2 with 17.19% occurrence in all tumor samples. Among the genes found at this cytoband were CSMD1 and DLC1. Genome-wide expression profiling showed 709 genes to be up-regulated and 699 genes to be down-regulated in CRC compared to non-cancerous samples. Integration of these two datasets identified 56 overlapping genes, which were located in chromosomes 8, 20 and 22. MLPA confirmed that the CRC samples had the highest gains in chromosome 20 compared to the reference samples. Interpretation of the CNV data in the context of the transcriptome via integrative analyses may provide more in-depth knowledge of the genomic landscape of CRC.
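
As a small worked check of the frequencies quoted above, the following Python sketch back-calculates how many tumor samples each percentage would correspond to. It assumes that both percentages are taken over the 64 paired CRC samples, which the abstract does not state explicitly. The resulting sample counts are therefore inferred, not reported, and the helper names are hypothetical.

```python
N_TUMORS = 64  # paired CRC samples reported in the abstract (assumed denominator)


def implied_count(percent, n=N_TUMORS):
    """Nearest whole number of samples that a reported percentage implies."""
    return round(percent / 100 * n)


def as_percent(count, n=N_TUMORS):
    """Percentage of n samples, rounded to two decimals as in the abstract."""
    return round(100 * count / n, 2)


gain_20q12 = implied_count(45.31)   # samples with a gain at 20q12 (inferred)
loss_8p23_2 = implied_count(17.19)  # samples with a loss at 8p23.2 (inferred)

print(gain_20q12, as_percent(gain_20q12))
print(loss_8p23_2, as_percent(loss_8p23_2))
```

Under that assumption, both quoted percentages are reproduced exactly by whole sample counts, which is consistent with a denominator of 64.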

  19. Integrated analysis of copy number variation and genome-wide expression profiling in colorectal cancer tissues.

    Directory of Open Access Journals (Sweden)

    Nur Zarina Ali Hassan

    Integrative analyses of multiple genomic datasets for selected samples can provide better insight into the overall data and can enhance our knowledge of cancer. The objective of this study was to elucidate the association between copy number variation (CNV) and gene expression in colorectal cancer (CRC) samples and their corresponding non-cancerous tissues. Sixty-four paired CRC samples from the same patients were subjected to CNV profiling using the Illumina HumanOmni1-Quad assay, and validation was performed using the multiplex ligation-dependent probe amplification (MLPA) method. Genome-wide expression profiling was performed on 15 paired samples from the same group of patients using the Affymetrix Human Gene 1.0 ST array. Significant genes obtained from both array results were then overlapped. To identify molecular pathways, the data were mapped to the KEGG database. Whole genome CNV analysis that compared primary tumor and non-cancerous epithelium revealed gains in 1638 genes and losses in 36 genes. Significant gains were mostly found in chromosome 20 at position 20q12 with a frequency of 45.31% in tumor samples. Examples of genes associated with this cytoband were PTPRT, EMILIN3 and CHD6. The highest number of losses was detected at chromosome 8, position 8p23.2 with 17.19% occurrence in all tumor samples. Among the genes found at this cytoband were CSMD1 and DLC1. Genome-wide expression profiling showed 709 genes to be up-regulated and 699 genes to be down-regulated in CRC compared to non-cancerous samples. Integration of these two datasets identified 56 overlapping genes, which were located in chromosomes 8, 20 and 22. MLPA confirmed that the CRC samples had the highest gains in chromosome 20 compared to the reference samples. Interpretation of the CNV data in the context of the transcriptome via integrative analyses may provide more in-depth knowledge of the genomic landscape of CRC.

  20. Strainer: software for analysis of population variation in community genomic datasets

    Directory of Open Access Journals (Sweden)

    Tyson Gene W

    2007-10-01

    Background: Metagenomic analyses of microbial communities that are comprehensive enough to provide multiple samples of most loci in the genomes of the dominant organism types will also reveal patterns of genetic variation within natural populations. New bioinformatic tools will enable visualization and comprehensive analysis of this sequence variation and inference of recent evolutionary and ecological processes. Results: We have developed a software package for analysis and visualization of genetic variation in populations and reconstruction of strain variants from otherwise co-assembled sequences. Sequencing reads can be clustered by matching patterns of single nucleotide polymorphisms to generate predicted gene and protein variant sequences, identify conserved intergenic regulatory sequences, and determine the quantity and distribution of recombination events. Conclusion: The Strainer software, a first generation metagenomic bioinformatics tool, facilitates comprehension and analysis of heterogeneity intrinsic in natural communities. The program reveals the degree of clustering among closely related sequence variants and provides a rapid means to generate gene and protein sequences for functional, ecological, and evolutionary analyses.

  1. Connecting Anxiety and Genomic Copy Number Variation: A Genome-Wide Analysis in CD-1 Mice.

    Directory of Open Access Journals (Sweden)

    Julia Brenndörfer

    Genomic copy number variants (CNVs) have been implicated in multiple psychiatric disorders, but not much is known about their influence on anxiety disorders specifically. Using next-generation sequencing (NGS) and two additional array-based genotyping approaches, we detected CNVs in a mouse model consisting of two inbred mouse lines showing high (HAB) and low (LAB) anxiety-related behavior, respectively. An influence of CNVs on gene expression in the central (CeA) and basolateral (BLA) amygdala, paraventricular nucleus (PVN), and cingulate cortex (Cg) was shown by a two-proportion Z-test (p = 1.6 × 10^-31), with a positive correlation in the CeA (p = 0.0062), PVN (p = 0.0046) and Cg (p = 0.0114), indicating a contribution of CNVs to the genetic predisposition to trait anxiety in the specific context of HAB/LAB mice. In order to confirm anxiety-relevant CNVs and corresponding genes in a second mouse model, we further examined CD-1 outbred mice. We revealed the distribution of CNVs by genotyping 64 CD-1 individuals using a high-density genotyping array (Jackson Laboratory). 78 genes within those CNVs were identified to show nominally significant association (48 genes), or a statistical trend in their association (30 genes), with the time animals spent on the open arms of the elevated plus-maze (EPM). Fifteen of them were considered promising candidate genes of anxiety-related behavior as we could show a significant overlap (permutation test, p = 0.0051) with genes within HAB/LAB CNVs. Thus, here we provide what is to our knowledge the first extensive catalogue of CNVs in CD-1 mice and potential corresponding candidate genes linked to anxiety-related behavior in mice.
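    For readers unfamiliar with the two-proportion Z-test mentioned in this abstract, a minimal sketch follows. It uses the standard pooled-variance form of the test; the counts in the usage line are invented for illustration, and the abstract does not say which proportions the authors actually compared.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion Z-test.

    x1/n1 and x2/n2 are the observed proportions in the two groups.
    Returns (z statistic, two-sided p-value under the normal approximation).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, NOT taken from the study:
# 60 of 100 genes inside CNVs vs. 40 of 100 genes outside CNVs differentially expressed.
z_stat, p_val = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z_stat:.3f}, p = {p_val:.4g}")
```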

  2. CONAN: copy number variation analysis software for genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Wichmann Heinz-Erich

    2010-06-01

    Background: Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes is constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations is still subject to complicated and inefficient in-house solutions, thus strongly limiting the performance of GWAS based on CNVs. Results: CONAN is a freely available client-server software solution which provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations via Manhattan plots in order to enable a rapid identification of genome-wide significant CNV regions. Various file formats including the information on CNVs in population samples are supported as input data. Conclusions: CONAN facilitates the performance of GWAS based on CNVs and the visual analysis of calculated results. CONAN provides a rapid, valid and straightforward software solution to identify genetic variation underlying the 'missing' heritability for complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at.

  3. Genome-wide copy number variation analysis in a Chinese autism spectrum disorder cohort

    Science.gov (United States)

    Guo, Hui; Peng, Yu; Hu, Zhengmao; Li, Ying; Xun, Guanglei; Ou, Jianjun; Sun, Liangdan; Xiong, Zhimin; Liu, Yanling; Wang, Tianyun; Chen, Jingjing; Xia, Lu; Bai, Ting; Shen, Yidong; Tian, Qi; Hu, Yiqiao; Shen, Lu; Zhao, Rongjuan; Zhang, Xuejun; Zhang, Fengyu; Zhao, Jingping; Zou, Xiaobing; Xia, Kun

    2017-01-01

    Autism spectrum disorder (ASD) describes a group of neurodevelopmental disorders with high heritability, although the underlying genetic determinants of ASDs remain largely unknown. Large-scale whole-genome studies of copy number variation in Han Chinese samples are still lacking. We performed a genome-wide copy number variation analysis of 343 ASD trios, 203 patients with sporadic cases and 988 controls in a Chinese population using Illumina genotyping platforms to identify CNVs and related genes that may contribute to ASD risk. We identified 32 rare CNVs larger than 1 Mb in 31 patients. ASD patients were found to carry a higher global burden of rare, large CNVs than controls. Recurrent de novo or case-private CNVs were found at 15q11-13, Xp22.3, 15q13.1–13.2, 3p26.3 and 2p12. The de novo 15q11–13 duplication was more prevalent in this Chinese population than in those with European ancestry. Several genes, including GRAMD2 and STAM, were implicated as novel ASD risk genes when integrating whole-genome CNVs and whole-exome sequencing data. We also identified several CNVs that include known ASD genes (SHANK3, CDH10, CSMD1) or genes involved in nervous system development (NYAP2, ST6GAL2, GRM6). Besides, our study also implicated Contactins-NYAPs-WAVE1 pathway in ASD pathogenesis. Our findings identify ASD-related CNVs in a Chinese population and implicate novel ASD risk genes and related pathway for further study. PMID:28281572

  4. Genome-wide transcriptome analysis revealed organelle specific responses to temperature variations in algae

    Science.gov (United States)

    Shin, HyeonSeok; Hong, Seong-Joo; Yoo, Chan; Han, Mi-Ae; Lee, Hookeun; Choi, Hyung-Kyoon; Cho, Suhyung; Lee, Choul-Gyun; Cho, Byung-Kwan

    2016-01-01

    Temperature is a critical environmental factor that affects microalgal growth. However, microalgal coping mechanisms for temperature variations are unclear. Here, we determined changes in transcriptome, total carbohydrate, total fatty acid methyl ester, and fatty acid composition of Tetraselmis sp. KCTC12432BP, a strain with a broad temperature tolerance range, to elucidate the tolerance mechanisms in response to large temperature variations. Owing to unavailability of genome sequence information, de novo transcriptome assembly coupled with BLAST analysis was performed using strand specific RNA-seq data. This resulted in 26,245 protein-coding transcripts, of which 83.7% could be annotated to putative functions. We identified more than 681 genes differentially expressed, suggesting an organelle-specific response to temperature variation. Among these, the genes related to the photosynthetic electron transfer chain, which are localized in the plastid thylakoid membrane, were upregulated at low temperature. However, the transcripts related to the electron transport chain and biosynthesis of phosphatidylethanolamine localized in mitochondria were upregulated at high temperature. These results show that the low energy uptake by repressed photosynthesis under low and high temperature conditions is compensated by different mechanisms, including photosystem I and mitochondrial oxidative phosphorylation, respectively. This study illustrates that microalgae tolerate different temperature conditions through organelle specific mechanisms. PMID:27883062

  5. A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis

    Directory of Open Access Journals (Sweden)

    Mezey Jason G

    2010-01-01

    Background: The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability. Results: V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours, when using a desktop. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate, and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the Phase II individuals of HapMap. Conclusions: V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.

  6. Genomic analysis of local variation and recent evolution in Plasmodium vivax.

    Science.gov (United States)

    Pearson, Richard D; Amato, Roberto; Auburn, Sarah; Miotto, Olivo; Almagro-Garcia, Jacob; Amaratunga, Chanaki; Suon, Seila; Mao, Sivanna; Noviyanti, Rintis; Trimarsanto, Hidayat; Marfurt, Jutta; Anstey, Nicholas M; William, Timothy; Boni, Maciej F; Dolecek, Christiane; Tran, Hien Tinh; White, Nicholas J; Michon, Pascal; Siba, Peter; Tavul, Livingstone; Harrison, Gabrielle; Barry, Alyssa; Mueller, Ivo; Ferreira, Marcelo U; Karunaweera, Nadira; Randrianarivelojosia, Milijaona; Gao, Qi; Hubbart, Christina; Hart, Lee; Jeffery, Ben; Drury, Eleanor; Mead, Daniel; Kekre, Mihir; Campino, Susana; Manske, Magnus; Cornelius, Victoria J; MacInnis, Bronwyn; Rockett, Kirk A; Miles, Alistair; Rayner, Julian C; Fairhurst, Rick M; Nosten, Francois; Price, Ric N; Kwiatkowski, Dominic P

    2016-08-01

    The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for the elimination of malaria. To characterize the genetic diversity of this parasite in individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region and analyzed data on >300,000 SNPs and nine regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at new loci, and these varied markedly between geographical locations. These findings demonstrate a dynamic landscape of local evolutionary adaptation in the parasite population and provide a foundation for genomic surveillance to guide effective strategies for control and elimination of P. vivax.

  7. Whole-genome copy number variation analysis in anophthalmia and microphthalmia.

    Science.gov (United States)

    Schilter, K F; Reis, L M; Schneider, A; Bardakjian, T M; Abdul-Rahman, O; Kozel, B A; Zimmerman, H H; Broeckel, U; Semina, E V

    2013-11-01

    Anophthalmia/microphthalmia (A/M) represent severe developmental ocular malformations. Currently, mutations in known genes explain less than 40% of A/M cases. We performed whole-genome copy number variation analysis in 60 patients affected with isolated or syndromic A/M. Pathogenic deletions of 3q26 (SOX2) were identified in four independent patients with syndromic microphthalmia. Other variants of interest included regions with a known role in human disease (likely pathogenic) as well as novel rearrangements (uncertain significance). A 2.2-Mb duplication of 3q29 in a patient with non-syndromic anophthalmia and an 877-kb duplication of 11p13 (PAX6) and a 1.4-Mb deletion of 17q11.2 (NF1) in two independent probands with syndromic microphthalmia and other ocular defects were identified; while ocular anomalies have been previously associated with 3q29 duplications, PAX6 duplications, and NF1 mutations in some cases, the ocular phenotypes observed here are more severe than previously reported. Three novel regions of possible interest included a 2q14.2 duplication which cosegregated with microphthalmia/microcornea and congenital cataracts in one family, and 2q21 and 15q26 duplications in two additional cases; each of these regions contains genes that are active during vertebrate ocular development. Overall, this study identified causative copy number mutations and regions with a possible role in ocular disease in 17% of A/M cases.

  8. Whole Genome Analysis of 132 Clinical Saccharomyces cerevisiae Strains Reveals Extensive Ploidy Variation

    Science.gov (United States)

    Zhu, Yuan O.; Sherlock, Gavin; Petrov, Dmitri A.

    2016-01-01

    Budding yeast has undergone several independent transitions from commercial to clinical lifestyles. The frequency of such transitions suggests that clinical yeast strains are derived from environmentally available yeast populations, including commercial sources. However, despite their important role in adaptive evolution, the prevalence of polyploidy and aneuploidy has not been extensively analyzed in clinical strains. In this study, we have looked for patterns governing the transition to clinical invasion in the largest screen of clinical yeast isolates to date. In particular, we have focused on the hypothesis that ploidy changes have influenced adaptive processes. We sequenced 144 yeast strains, 132 of which are clinical isolates. We found pervasive large-scale genomic variation in both overall ploidy (34% of strains identified as 3n/4n) and individual chromosomal copy numbers (36% of strains identified as aneuploid). We also found evidence for the highly dynamic nature of yeast genomes, with 35 strains showing partial chromosomal copy number changes and eight strains showing multiple independent chromosomal events. Intriguingly, a lineage identified to be baker’s/commercial derived with a unique damaging mutation in NDC80 was particularly prone to polyploidy, with 83% of its members being triploid or tetraploid. Polyploidy was in turn associated with a >2× increase in aneuploidy rates as compared to other lineages. This dataset provides a rich source of information on the genomics of clinical yeast strains and highlights the potential importance of large-scale genomic copy variation in yeast adaptation. PMID:27317778

  9. Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

    Directory of Open Access Journals (Sweden)

    Walker M Andrew

    2006-09-01

    Background: The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results: There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10^-2 per base pair of DNA and the average INDEL frequency was 2.06 × 10^-2 per base pair of DNA. On an average, 60.33% of the SNPs were synonymous type while 39.67% were non-synonymous type. The mutation frequency, primarily in the form of external INDELs was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions. Conclusion: INDELs and strain specific genes

  10. Epigenetic Variation in Monozygotic Twins: A Genome-Wide Analysis of DNA Methylation in Buccal Cells

    Directory of Open Access Journals (Sweden)

    Jenny van Dongen

    2014-05-01

    DNA methylation is one of the most extensively studied epigenetic marks in humans. Yet, it is largely unknown what causes variation in DNA methylation between individuals. The comparison of DNA methylation profiles of monozygotic (MZ) twins offers a unique experimental design to examine the extent to which such variation is related to individual-specific environmental influences and stochastic events or to familial factors (DNA sequence and shared environment). We measured genome-wide DNA methylation in buccal samples from ten MZ pairs (age 8–19) using the Illumina 450k array and examined twin correlations for methylation level at 420,921 CpGs after QC. After selecting CpGs showing the most variation in the methylation level between subjects, the mean genome-wide correlation (rho) was 0.54. The correlation was higher, on average, for CpGs within CpG islands (CGIs), compared to CGI shores, shelves and non-CGI regions, particularly at hypomethylated CpGs. This finding suggests that individual-specific environmental and stochastic influences account for more variation in DNA methylation in CpG-poor regions. Our findings also indicate that it is worthwhile to examine heritable and shared environmental influences on buccal DNA methylation in larger studies that also include dizygotic twins.

  11. Complete genome sequence analysis of goatpox virus isolated from China shows high variation.

    Science.gov (United States)

    Zeng, Xiancheng; Chi, Xuelin; Li, Wei; Hao, Wenbo; Li, Ming; Huang, Xiaohong; Huang, Yifan; Rock, Daniel L; Luo, Shuhong; Wang, Shihua

    2014-09-17

    Goatpox virus (GTPV), a member of the Capripoxvirus genus of the Poxviridae family, is the causative agent of variolo caprina (goatpox). GTPV can cause significant economic losses of domestic ruminants in endemic regions and can threaten breeding stocks. In this study, we report on the compilation of the complete genomic sequence of an isolated GTPV field strain FZ (GTPV_FZ). The 150,194bp GTPV genome consists of a central coding region bounded by two identical 2301bp inverted terminal repeats and contains 151 putative genes. Comparative genomic analysis reveals the apparent genetic relationships among Capripoxviruses are close, but sufficient genomic variants in the field isolate strain FZ have been identified to distinguish it from other GTPV strains and other Capripoxvirus species. Phylogenetic analysis based on the p32 and complete GTPV genome can be used to differentiate SPPVs, GTPVs and LSDVs. These data may contribute to the epidemiological study of the Chinese capripoxvirus and help to develop more specific detection methods to distinguish GTPVs, SPPVs and LSDVs.

  12. Genomics technologies to study structural variations in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Cardone Maria Francesca

    2016-01-01

    Grapevine is one of the most important crop plants in the world. Genomic resources for the grapevine genome have expanded greatly in recent years, supporting growing efforts in molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find the responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs) in plant genomes, few data are available on copy number variation (CNV). Furthermore, association between structural variations and phenotypes has been described in only a few cases. We combined high-throughput biotechnologies and bioinformatics tools to reveal the first inter-varietal atlas of structural variation (SV) for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We found that roughly 8% of the grapevine genome is affected by genomic variations. Taking into account the phenotypic differences among the studied varieties, we compared SVs among them and against the reference, and then performed an in-depth analysis of the gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.

  13. Genome sequence variation analysis of two SARS coronavirus isolates after passage in Vero cell culture

    Institute of Scientific and Technical Information of China (English)

    JIN Weiwu; LI Ning; HU Liangxiang; DU Zhenglin; GAO Qiang; GAO Hong; NING Ye; FENG Jidong; ZHANG Jiansan; YIN Weidong

    2004-01-01

    SARS coronavirus is an RNA virus whose replication is error-prone, which makes it possible for the virus to escape host defenses and can even lead to the evolution of new viral strains during passage or transmission. Many variations have been detected among different SARS-CoV strains, and studying these variations is helpful for the development of an efficient vaccine. Moreover, testing the nucleic acid characteristics and genetic stability of SARS-CoV is important in inactivated vaccine research. The whole genome sequences of two SARS coronavirus strains after passage in Vero cell culture were determined and compared with those of early passages. Both SARS coronavirus strains showed high genetic stability, even after nearly 10 passages. Four nucleotide variations were observed between the second passage and the 11th passage of the Sino1 strain for identification of SARS inactivated vaccine. Moreover, only one nucleotide was different between the third passage and the 10th passage of the Sino3 strain for SARS inactivated vaccine. Therefore, this study suggests that it is possible to develop an inactivated vaccine against SARS-CoV in the future.

  14. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as

  15. Genome-wide copy number variation analysis in adult attention-deficit and hyperactivity disorder.

    Science.gov (United States)

    Ramos-Quiroga, Josep-Antoni; Sánchez-Mora, Cristina; Casas, Miguel; Garcia-Martínez, Iris; Bosch, Rosa; Nogueira, Mariana; Corrales, Montse; Palomar, Gloria; Vidal, Raquel; Coll-Tané, Mireia; Bayés, Mònica; Cormand, Bru; Ribasés, Marta

    2014-02-01

    Attention-deficit and hyperactivity disorder (ADHD) is a common psychiatric disorder with a worldwide prevalence of 5-6% in children and 4.4% in adults. Recently, copy number variations (CNVs) have been implicated in different neurodevelopmental disorders such as ADHD. Based on these previous reports that focused on pediatric cohorts, we hypothesize that structural variants may also contribute to adult ADHD and that such genomic variation may be enriched for CNVs previously identified in children with ADHD. To address this issue, we performed for the first time a whole-genome CNV study on 400 adults with ADHD and 526 screened controls. In agreement with recent reports in children with ADHD or in other psychiatric disorders, we identified a significant excess of insertions in ADHD patients compared to controls. The overall rate of CNVs >100 kb was 1.33 times higher in ADHD subjects than in controls (p = 2.4e-03), an observation mainly driven by a higher proportion of small events (from 100 kb to 500 kb; 1.35-fold; p = 1.3e-03). These differences remained significant when we considered CNVs that overlap genes or when structural variants spanning candidate genes for psychiatric disorders were evaluated, with duplications showing the greatest difference (1.41-fold, p = 0.024 and 2.85-fold, p = 8.5e-03, respectively). However, no significant enrichment was detected in our ADHD cohort for childhood ADHD-associated CNVs, CNVs previously identified in at least one ADHD patient or CNVs previously implicated in autism or schizophrenia. In conclusion, our study provides tentative evidence for a higher rate of CNVs in adults with ADHD compared to controls and contributes to the growing list of structural variants potentially involved in the etiology of the disease.

  16. Clinical Interpretation of Genomic Variations.

    Science.gov (United States)

    Sayitoğlu, Müge

    2016-09-05

    Novel high-throughput sequencing technologies generate large-scale genomic data and are used extensively for disease mapping of monogenic and/or complex disorders, personalized treatment, and pharmacogenomics. Next-generation sequencing is rapidly becoming a routine tool for diagnosis and molecular monitoring of patients to evaluate therapeutic efficiency. The next-generation sequencing platforms generate huge amounts of genetic variation data and it remains a challenge to interpret the variations that are identified. Such data interpretation needs close collaboration among bioinformaticians, clinicians, and geneticists. There are several problems that must be addressed, such as the generation of new algorithms for mapping and annotation, harmonization of the terminology, correct use of nomenclature, reference genomes for different populations, rare disease variant databases, and clinical reports.

  17. Array comparative genomic hybridization profiling analysis reveals deoxyribonucleic acid copy number variations associated with premature ovarian failure.

    Science.gov (United States)

    Aboura, Azzedine; Dupas, Claire; Tachdjian, Gérard; Portnoï, Marie-France; Bourcigaux, Nathalie; Dewailly, Didier; Frydman, René; Fauser, Bart; Ronci-Chaix, Nathalie; Donadille, Bruno; Bouchard, Philippe; Christin-Maitre, Sophie

    2009-11-01

    Premature ovarian failure (POF) is defined by amenorrhea of at least 4- to 6-month duration, occurring before 40 yr of age, with two FSH levels in the postmenopausal range. Its etiology remains unknown in more than 80% of cases. Standard karyotypes, having a resolution of 5-10 Mb, have identified critical chromosomal regions, mainly located on the long arm of the X chromosome. Array comparative genomic hybridization (a-CGH) analysis is able to detect submicroscopic chromosomal rearrangements with a higher genomic resolution. We searched for copy number variations (CNVs), using a-CGH analysis with a resolution of approximately 0.7 Mb, in a cohort of patients with POF. We prospectively included 99 women. Our study included a conventional karyotype and DNA microarrays comprising 4500 bacterial artificial chromosome clones spread on the entire genome. Thirty-one CNVs have been observed, three on the X chromosome and 28 on autosomal chromosomes. Data have been compared to control populations obtained from the Database of Genomic Variants (http://projects.tcag.ca/variation). Eight statistically significantly different CNVs have been identified in chromosomal regions 1p21.1, 5p14.3, 5q13.2, 6p25.3, 14q32.33, 16p11.2, 17q12, and Xq28. We report the first study of CNV analysis in a large cohort of Caucasian POF patients. In the eight statistically significant CNVs we report, we found five genes involved in reproduction, thus representing potential candidate genes in POF. The current study along with emerging information regarding CNVs, as well as data on their potential association with human diseases, emphasizes the importance of assessing CNVs in cohorts of POF women.

  18. GFVO: the Genomic Feature and Variation Ontology.

    Science.gov (United States)

    Baran, Joachim; Durgahee, Bibi Sehnaaz Begum; Eilbeck, Karen; Antezana, Erick; Hoehndorf, Robert; Dumontier, Michel

    2015-01-01

    Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology's GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation is provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.

  19. GFVO: the Genomic Feature and Variation Ontology

    KAUST Repository

    Baran, Joachim

    2015-05-05

    Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology’s GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation is provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.

  20. GFVO: the Genomic Feature and Variation Ontology

    Directory of Open Access Journals (Sweden)

    Joachim Baran

    2015-05-01

    Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology’s GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation is provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.

  1. Genome size variation in Begonia.

    Science.gov (United States)

    Dewitte, Angelo; Leus, Leen; Eeckhaut, Tom; Vanstechelman, Ives; Van Huylenbroeck, Johan; Van Bockstaele, Erik

    2009-10-01

    The genome sizes of a Begonia collection comprising 37 species and 23 hybrids of African, Asiatic, Middle American, and South American origin were screened using flow cytometry. Within the collection, 1C values varied between 0.23 and 1.46 pg DNA. Genome sizes were, in most cases, not positively correlated with chromosome number, but with pollen size. A 12-fold difference in mean chromosome size was found between the genotypes with the largest and smallest chromosomes. In general, chromosomes from South American genotypes were smaller than chromosomes of African, Asian, or Middle American genotypes, except for B. boliviensis and B. pearcei. Cytological chromosome studies in different genotypes showed variable chromosome numbers, length, width, and total chromosome volume, which confirmed the diversity in genome size. Large secondary constrictions were present in several investigated genotypes. These data show that chromosome number and structure exhibit a great deal of variation within the genus Begonia, and likely help to explain the large number of taxa found within the genus.

  2. From genomic variation to personalized medicine

    DEFF Research Database (Denmark)

    Wesolowska, Agata; Schmiegelow, Kjeld

    Genomic variation is the basis of interindividual differences in observable traits and disease susceptibility. Genetic studies are the driving force of personalized medicine, as many of the differences in treatment efficacy can be attributed to our genomic background. The rapid development of nex...... the thesis and includes some final remarks on the perspectives of genomic variation research and personalized medicine. In summary, this thesis demonstrates the feasibility of integrative analyses of genomic variations and introduces large-scale hypothesis-driven SNP exploration studies as an emerging alternative to data-driven genome-wide association studies. Finally, the findings of the presented studies set new directions for future pharmacogenetic investigations and provide a framework for future implementation of personalized medicine.

  3. Analysis of genetic variation and potential applications in genome-scale metabolic modeling

    DEFF Research Database (Denmark)

    Cardoso, Joao; Andersen, Mikael Rørdam; Herrgard, Markus;

    2015-01-01

    Genetic variation is the motor of evolution and allows organisms to overcome the environmental challenges they encounter. It can be both beneficial and harmful in the process of engineering cell factories for the production of proteins and chemicals. Throughout the history of biotechnology, there...

  4. Genome-Wide Analysis Shows Increased Frequency of Copy Number Variation Deletions in Dutch Schizophrenia Patients

    NARCIS (Netherlands)

    Buizer-Voskamp, Jacobine E.; Muntjewerff, Jan-Willem; Strengman, Eric; Sabatti, Chiara; Stefansson, Hreinn; Vorstman, Jacob A. S.; Ophoff, Roel A.; GROUP investigators

    2011-01-01

    Background: Since 2008, multiple studies have reported on copy number variations (CNVs) in schizophrenia. However, many regions are unique events with minimal overlap between studies. This makes it difficult to gain a comprehensive overview of all CNVs involved in the etiology of schizophrenia. We p

  5. Genome-wide mapping of copy number variation in humans: comparative analysis of high resolution array platforms.

    Directory of Open Access Journals (Sweden)

    Rajini R Haraksingh

    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.

  6. Copy number variation identification and analysis of the chicken genome using a 60K SNP BeadChip.

    Science.gov (United States)

    Rao, Y S; Li, J; Zhang, R; Lin, X R; Xu, J G; Xie, L; Xu, Z Q; Wang, L; Gan, J K; Xie, X J; He, J; Zhang, X Q

    2016-08-01

    Copy number variation (CNV) is an important source of genetic variation in organisms and a main factor that affects phenotypic variation. A comprehensive study of chicken CNV can provide valuable information on genetic diversity and facilitate future analyses of associations between CNV and economically important traits in chickens. In the present study, an F2 full-sib chicken population (554 individuals), established from a cross between Xinghua and White Recessive Rock chickens, was used to explore CNV in the chicken genome. Genotyping was performed using a chicken 60K SNP BeadChip. A total of 1,875 CNV were detected with the PennCNV algorithm, and the average number of CNV was 3.42 per individual. The CNV were distributed across 383 independent CNV regions (CNVR) and covered 41 megabases (3.97%) of the chicken genome. Seven CNVR in 108 individuals were validated by quantitative real-time PCR, and 81 of these individuals (75%) also were detected with the PennCNV algorithm. In total, 274 CNVR (71.54%) identified in the current study were previously reported. Of these, 147 (38.38%) were reported in at least 2 studies. Additionally, 109 of the CNVR (28.46%) discovered here are novel. A total of 709 genes within or overlapping with the CNVR were retrieved. Out of the 2,742 quantitative trait loci (QTL) collected in the chicken QTL database, 43 QTL had confidence intervals overlapping with the CNVR, and 32 CNVR encompassed one or more functional genes. The functional genes located in the CNVR are likely to be the quantitative trait genes (QTG) that are associated with underlying economic traits. This study considerably expands our insight into the structural variation in the genome of chickens and provides an important resource for genomic variation, especially for genomic structural variation related to economic traits in chickens.
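
    The step from individual CNV calls to independent CNV regions (CNVR) in the record above can be illustrated with a minimal sketch. It assumes that overlapping calls on the same chromosome are merged into one region, which is a common convention but is not stated in the abstract; the tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end), half-open coordinates.
CNV = Tuple[str, int, int]


def merge_cnvs_into_regions(cnvs: List[CNV]) -> List[CNV]:
    """Merge overlapping CNV calls on the same chromosome into CNV regions."""
    regions: List[CNV] = []
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # The call overlaps the last region: extend that region if needed.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # No overlap: the call opens a new region.
            regions.append((chrom, start, end))
    return regions


def covered_fraction(regions: List[CNV], genome_length: int) -> float:
    """Fraction of the genome covered by the merged regions."""
    return sum(end - start for _, start, end in regions) / genome_length
```

    Calls pooled from all individuals would be passed to `merge_cnvs_into_regions`, and the result to `covered_fraction` together with an assumed genome length.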

  7. Genomic and proteomic analysis of soybean heritable variations induced by space flight

    Institute of Scientific and Technical Information of China (English)

    HE Jie; GAO Yong; SUN Ye-qing

    2009-01-01

    To analyze the biological effects of the space environment, genomic DNA diversity between the space-flight soybean 194(4126), which showed a space-flight-induced phenotype of good yield and good fruit quality, and the ground-control soybean was studied by the amplified fragment length polymorphism (AFLP) method; the polymorphism of space-flight soybean 194(4126) was 3.56%. Differences in protein expression in seeds and leaves between the two kinds of soybeans were analysed by two-dimensional electrophoresis, PDQuest software and MALDI-TOF mass spectrometry. The results show that the loss and decrease of protein expression in 194(4126) soybean are attributable to the space flight of the seeds, and three special proteins, including Dehydrin, MAT1 and ceQORH, were identified. It is concluded that the space environment changes the phenotype and genotype of soybeans through the space flight of seeds.

  8. Metabolic and genomic analysis elucidates strain-level variation in Microbacterium spp. isolated from chromate contaminated sediment

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data is in the form of genomic sequences deposited in a public database, growth curves, and bioinformatic analysis of sequences. This dataset is associated with...

  9. Burkholderia pseudomallei genome plasticity associated with genomic island variation

    Directory of Open Access Journals (Sweden)

    Currie Bart J

    2008-04-01

    Background: Burkholderia pseudomallei is a soil-dwelling saprophyte and the cause of melioidosis. Horizontal gene transfer contributes to the genetic diversity of this pathogen and may be an important determinant of virulence potential. The genome contains genomic island (GI) regions that encode a broad array of functions. Although there is some evidence for the variable distribution of genomic islands in B. pseudomallei isolates, little is known about the extent of variation between related strains or their association with disease or environmental survival. Results: Five islands from B. pseudomallei strain K96243 were chosen as representatives of different types of genomic islands present in this strain, and their presence was investigated in other B. pseudomallei. In silico analysis of 10 B. pseudomallei genome sequences provided evidence for the variable presence of these regions, together with micro-evolutionary changes that generate GI diversity. The diversity of GIs in 186 isolates from NE Thailand (83 environmental and 103 clinical isolates) was investigated using multiplex PCR screening. The proportion of all isolates positive by PCR ranged from 12% for a prophage-like island (GI 9) to 76% for a metabolic island (GI 16). The presence of each of the five GIs did not differ between environmental and disease-associated isolates (p > 0.05 for all five islands). The cumulative number of GIs per isolate for the 186 isolates ranged from 0 to 5 (median 2, IQR 1 to 3). The distribution of cumulative GI number did not differ between environmental and disease-associated isolates (p = 0.27). The presence of GIs was defined for the three largest clones in this collection (each defined as a single sequence type, ST, by multilocus sequence typing); these were ST 70 (n = 15 isolates), ST 54 (n = 11), and ST 167 (n = 9). The rapid loss and/or acquisition of gene islands was observed within individual clones. Comparisons were drawn between isolates obtained

  10. Genomic variations of Mycoplasma capricolum subsp capripneumoniae detected by amplified fragment length polymorphism (AFLP) analysis

    DEFF Research Database (Denmark)

    Kokotovic, Branko; Bolske, G.; Ahrens, Peter;

    2000-01-01

    The genetic diversity of Mycoplasma capricolum subsp. capripneumoniae strains based on determination of amplified fragment length polymorphisms (AFLP) is described. AFLP fingerprints of 38 strains derived from different countries in Africa and the Middle East consisted of over 100 bands in the size...... found by 16S rDNA analysis. The present data support previous observations regarding genetic homogeneity of M. capricolum subsp. capripneumoniae, and confirm the two evolutionary lines of descent found by analysis of 16S rRNA genes....

  11. Functional Genomic Analysis of Variation on Beef Tenderness Induced by Acute Stress in Angus Cattle

    Directory of Open Access Journals (Sweden)

    Chunping Zhao

    2012-01-01

    Beef is one of the leading sources of protein, B vitamins, iron, and zinc in human food. Beef palatability is based on three general criteria: tenderness, juiciness, and flavor, of which tenderness is thought to be the most important factor. In this study, we found that beef tenderness, measured by the Warner-Bratzler shear force (WBSF), was dramatically increased by acute stress. Microarray analysis and qPCR identified a variety of genes that were differentially expressed. Pathway analysis showed that these genes were involved in immune response and regulation of metabolic processes as activators or repressors. Further analysis indicated that these changes may be related to CpG methylation of several genes. Therefore, the results from this study provide an enhanced understanding of the mechanisms by which genetic and epigenetic regulation controls meat quality and beef tenderness.

  12. Genomic variation in Salmonella enterica core genes for epidemiological typing

    Directory of Open Access Journals (Sweden)

    Leekitcharoenphon Pimlapas

    2012-03-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over time. The core genes--the genes that are conserved in all (or most) members of a genus or species--are potentially good candidates for investigating genomic variation in phylogeny and epidemiology. Results: We identify a set of 2,882 core gene clusters based on 73 publicly available Salmonella enterica genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher confidence. The core genes can be divided into two categories: a few highly variable genes and a larger set of conserved core genes, with low variance. For the most variable core genes, the variance in amino acid sequences is higher than for the corresponding nucleotide sequences, suggesting that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important especially in trend analysis.

  13. Variation, evolution, and correlation analysis of C+G content and genome or chromosome size in different kingdoms and phyla.

    Science.gov (United States)

    Li, Xiu-Qing; Du, Donglei

    2014-01-01

    C+G content (GC content or G+C content) is known to be correlated with genome/chromosome size in bacteria but the relationship for other kingdoms remains unclear. This study analyzed genome size, chromosome size, and base composition in most of the available sequenced genomes in various kingdoms. Genome size tends to increase during evolution in plants and animals, and the same is likely true for bacteria. The genomic C+G contents were found to vary greatly in microorganisms but were quite similar within each animal or plant subkingdom. In animals and plants, the C+G contents are ranked as follows: monocot plants>mammals>non-mammalian animals>dicot plants. The variation in C+G content between chromosomes within species is greater in animals than in plants. The correlation between average chromosome C+G content and chromosome length was found to be positive in Proteobacteria, Actinobacteria (but not in other analyzed bacterial phyla), Ascomycota fungi, and likely also in some plants; negative in some animals, insignificant in two protist phyla, and likely very weak in Archaea. Clearly, correlations between C+G content and chromosome size can be positive, negative, or not significant depending on the kingdoms/groups or species. Different phyla or species exhibit different patterns of correlation between chromosome-size and C+G content. Most chromosomes within a species have a similar pattern of variation in C+G content but outliers are common. The data presented in this study suggest that the C+G content is under genetic control by both trans- and cis- factors and that the correlation between C+G content and chromosome length can be positive, negative, or not significant in different phyla.
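
    The kind of analysis described in the record above, relating chromosome C+G content to chromosome length, can be sketched as follows. The abstract does not name the correlation statistic, so the Pearson coefficient used here is an assumption, and both function names are hypothetical.

```python
import math
from typing import Sequence


def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

    For one species, `pearson_r` would be called with the list of chromosome lengths and the list of per-chromosome `gc_content` values; the sign of the result corresponds to the positive or negative correlations the abstract reports for different groups.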

  14. Analysis of copy number variation in the rhesus macaque genome identifies candidate loci for evolutionary and human disease studies.

    Science.gov (United States)

    Lee, Arthur S; Gutiérrez-Arcelus, María; Perry, George H; Vallender, Eric J; Johnson, Welkin E; Miller, Gregory M; Korbel, Jan O; Lee, Charles

    2008-04-15

    Copy number variants (CNVs) are heritable gains and losses of genomic DNA in normal individuals. While copy number variation is widely studied in humans, our knowledge of CNVs in other mammalian species is more limited. We have designed a custom array-based comparative genomic hybridization (aCGH) platform with 385 000 oligonucleotide probes based on the reference genome sequence of the rhesus macaque (Macaca mulatta), the most widely studied non-human primate in biomedical research. We used this platform to identify 123 CNVs among 10 unrelated macaque individuals, with 24% of the CNVs observed in multiple individuals. We found that segmental duplications were significantly enriched at macaque CNV loci. We also observed significant overlap between rhesus macaque and human CNVs, suggesting that certain genomic regions are prone to recurrent CNV formation and instability, even across a total of approximately 50 million years of primate evolution (approximately 25 million years in each lineage). Furthermore, for eight of the CNVs that were observed in both humans and macaques, previous human studies have reported a relationship between copy number and gene expression or disease susceptibility. Therefore, the rhesus macaque offers an intriguing, non-human primate outbred model organism with which hypotheses concerning the specific functions of phenotypically relevant human CNVs can be tested.

  15. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    Science.gov (United States)

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing.

  16. Copy number variation in the bovine genome

    DEFF Research Database (Denmark)

    Fadista, João; Thomsen, Bo; Holm, Lars-Erik;

    2010-01-01

    to genetic variation in cattle. Results We designed and used a set of NimbleGen CGH arrays that tile across the assayable portion of the cattle genome with approximately 6.3 million probes, at a median probe spacing of 301 bp. This study reports the highest resolution map of copy number variation...... in the cattle genome, with 304 CNV regions (CNVRs) being identified among the genomes of 20 bovine samples from 4 dairy and beef breeds. The CNVRs identified covered 0.68% (22 Mb) of the genome, and ranged in size from 1.7 to 2,031 kb (median size 16.7 kb). About 20% of the CNVs co-localized with segmental...

  17. A multivariate analysis of variation in genome size and endoreduplication in angiosperms reveals strong phylogenetic signal and association with phenotypic traits.

    Science.gov (United States)

    Bainard, Jillian D; Bainard, Luke D; Henry, Thomas A; Fazekas, Aron J; Newmaster, Steven G

    2012-12-01

    Genome size (C-value) and endopolyploidy (endoreduplication index, EI) are known to correlate with various morphological and ecological traits, in addition to phylogenetic placement. A phylogenetically controlled multivariate analysis was used to explore the relationships between DNA content and phenotype in angiosperms. Seeds from 41 angiosperm species (17 families) were grown in a common glasshouse experiment. Genome size (2C-value and 1Cx-value) and EI (in four tissues: leaf, stem, root, petal) were determined using flow cytometry. The phylogenetic signal was calculated for each measure of DNA content, and phylogenetic canonical correlation analysis (PCCA) explored how the variation in genome size and EI was correlated with 18 morphological and ecological traits. Phylogenetic signal (λ) was strongest for EI in all tissues, and λ was stronger for the 2C-value than the 1Cx-value. PCCA revealed that EI was correlated with pollen length, stem height, seed mass, dispersal mechanism, arbuscular mycorrhizal association, life history and flowering time, and EI and genome size were both correlated with stem height and life history. PCCA provided an effective way to explore multiple factors of DNA content variation and phenotypic traits in a phylogenetic context. Traits that were correlated significantly with DNA content were linked to plant competitive ability. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.

  18. Progress in the detection of human genome structural variations

    Institute of Scientific and Technical Information of China (English)

    WU XueMei; XIAO HuaSheng

    2009-01-01

    The emergence of high-throughput and high-resolution genomic technologies led to the detection of submicroscopic variants ranging from 1 kb to 3 Mb in the human genome. These variants include copy number variations (CNVs), inversions, insertions, deletions and other complex rearrangements of DNA sequences. This paper briefly reviews the commonly used technologies to discover both genomic structural variants and their potential influences. Particularly, we highlight the array-based, PCR-based and sequencing-based assays, including array-based comparative genomic hybridization (aCGH), representational oligonucleotide microarray analysis (ROMA), multiplex amplifiable probe hybridization (MAPH), multiplex ligation-dependent probe amplification (MLPA), paired-end mapping (PEM), and next-generation DNA sequencing technologies. Furthermore, we discuss the limitations and challenges of current assays and give advice on how to make the database of genomic variations more reliable.

  19. Progress in the detection of human genome structural variations

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    The emergence of high-throughput and high-resolution genomic technologies led to the detection of submicroscopic variants ranging from 1 kb to 3 Mb in the human genome. These variants include copy number variations (CNVs), inversions, insertions, deletions and other complex rearrangements of DNA sequences. This paper briefly reviews the commonly used technologies to discover both genomic structural variants and their potential influences. Particularly, we highlight the array-based, PCR-based and sequencing-based assays, including array-based comparative genomic hybridization (aCGH), representational oligonucleotide microarray analysis (ROMA), multiplex amplifiable probe hybridization (MAPH), multiplex ligation-dependent probe amplification (MLPA), paired-end mapping (PEM), and next-generation DNA sequencing technologies. Furthermore, we discuss the limitations and challenges of current assays and give advice on how to make the database of genomic variations more reliable.

  20. Detection of genomic variations and DNA polymorphisms and impact on analysis of meiotic recombination and genetic mapping.

    Science.gov (United States)

    Qi, Ji; Chen, Yamao; Copenhaver, Gregory P; Ma, Hong

    2014-07-08

    DNA polymorphisms are important markers in genetic analyses and are increasingly detected by using genome resequencing. However, the presence of repetitive sequences and structural variants can lead to false positives in the identification of polymorphic alleles. Here, we describe an analysis strategy that minimizes false positives in allelic detection and present analyses of recently published resequencing data from Arabidopsis meiotic products and individual humans. Our analysis enables the accurate detection of sequencing errors, small insertions and deletions (indels), and structural variants, including large reciprocal indels and copy number variants, from comparisons between the resequenced and reference genomes. We offer an alternative interpretation of the sequencing data of meiotic products, including the number and type of recombination events, to illustrate the potential for mistakes in single-nucleotide polymorphism calling. Using these examples, we propose that the detection of DNA polymorphisms using resequencing data needs to account for nonallelic homologous sequences.

  1. Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing

    Directory of Open Access Journals (Sweden)

    Chen Zuozhou

    2010-11-01

    Background: Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing. Results: Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms. Conclusions: We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
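
    The general idea of hashing gapped k-mers for read placement can be illustrated with a toy sketch. This is not Pash's implementation: the sampling mask, the single hash table, and the diagonal-voting step are all assumptions chosen for brevity, and every name below is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical sampling pattern: "1" keeps a base, "0" skips it.
MASK = "110111"


def masked_kmer(seq: str, pos: int, mask: str = MASK) -> str:
    """Return the bases of seq[pos:pos+len(mask)] at the kept mask positions."""
    return "".join(seq[pos + i] for i, keep in enumerate(mask) if keep == "1")


def build_index(reference: str, mask: str = MASK) -> Dict[str, List[int]]:
    """Hash every gapped k-mer of the reference to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[masked_kmer(reference, pos, mask)].append(pos)
    return index


def best_offset(read: str, index: Dict[str, List[int]],
                mask: str = MASK) -> Optional[int]:
    """Vote for the reference offset supported by the most gapped k-mer hits."""
    votes: Counter = Counter()
    for pos in range(len(read) - len(mask) + 1):
        for ref_pos in index.get(masked_kmer(read, pos, mask), []):
            votes[ref_pos - pos] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

    Because skipped positions are ignored when hashing, a mismatch that falls on a skipped position still yields a hit, which is one reason gapped patterns tolerate polymorphic sites better than contiguous k-mers of the same span.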

  2. ENGINES: exploring single nucleotide variation in entire human genomes

    Directory of Open Access Journals (Sweden)

    Salas Antonio

    2011-04-01

    Full Text Available Abstract. Background: Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data. Description: We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variation (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs), as well as deriving summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows the combination and comparison of each available population sample, while searching by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, while results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen. Conclusions: ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or FST values for genetic differentiation.
Access to the data mart
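
    The per-site statistics named in this abstract (allele frequency, heterozygosity, FST) can be illustrated with a minimal Python sketch. The abstract does not say which estimators ENGINES uses, so the formulas below are assumptions: expected heterozygosity at a biallelic site, 2p(1-p), and Wright's FST = (HT - HS) / HT for two equally weighted populations. All function names are hypothetical.

```python
def allele_frequency(alt_count, total_alleles):
    """Frequency of the alternate allele among all sampled chromosomes."""
    return alt_count / total_alleles


def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic site: 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Wright's FST = (HT - HS) / HT for two equally weighted populations.

    Assumed estimator for illustration; not taken from the ENGINES paper.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)  # pooled (total) heterozygosity
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # site is monomorphic in the pooled sample
    return (h_t - h_s) / h_t
```

    Under these assumptions, two populations with identical allele frequencies give FST = 0, and populations fixed for different alleles give FST = 1.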

  3. Genome-wide analysis of DNA methylation, copy number variation, and gene expression in monozygotic twins discordant for primary biliary cirrhosis

    Directory of Open Access Journals (Sweden)

    Carlo Selmi

    2014-03-01

    Full Text Available Primary biliary cirrhosis (PBC) is an uncommon autoimmune disease with a homogeneous clinical phenotype that reflects incomplete disease concordance in monozygotic (MZ) twins. We have taken advantage of a unique collection consisting of genomic DNA and mRNA from peripheral blood cells of female MZ twins (n=3 sets) and sisters of similar age (n=8 pairs) discordant for disease. We performed a genome-wide study to investigate differences in (i) DNA methylation (using a custom tiled 4-plex array containing tiled 50-mers 19,084 randomly chosen methylation sites), (ii) copy number variation (CNV) (with a chip including markers derived from the 1000 Genomes Project, all three HapMap phases, and recently published studies), and/or (iii) gene expression (by whole-genome expression arrays). Based on the results obtained from these three approaches we utilized quantitative PCR to compare the expression of candidate genes. Importantly, our data support consistent differences in discordant twins and siblings for the (i) methylation profiles of 60 gene regions, (ii) CNV of 10 genes, and (iii) the expression of 2 interferon-dependent genes. Quantitative PCR analysis showed that 17 of these genes are differentially expressed in discordant sibling pairs. In conclusion, we report that MZ twins and sisters discordant for PBC manifest particular epigenetic differences and highlight the value of the epigenetic study of twins.

  4. Online resources for genomic structural variation.

    Science.gov (United States)

    Sneddon, Tam P; Church, Deanna M

    2012-01-01

    Genomic structural variation (SV) can be thought of on a continuum from a single base pair insertion/deletion (INDEL) to large megabase-scale rearrangements involving insertions, deletions, duplications, inversions, or translocations of whole chromosomes or chromosome arms. These variants can occur in coding or noncoding DNA, and they can be inherited or arise sporadically in the germline or somatic cells. Many of these events are segregating in the population and can be considered common alleles while others are new alleles and thus rare events. All species studied to date harbor structural variants and these may be benign, contributing to phenotypes such as sensory perception and immunity, or pathogenic resulting in genomic disorders including DiGeorge/velocardiofacial, Smith-Magenis, Williams-Beuren, and Prader-Willi syndromes. As structural variants are identified, validated, and their significance, origin, and prevalence are elucidated, it is of critical importance that these data be collected and collated in a way that can be easily accessed and analyzed. This chapter describes current structural variation online resources (see Fig. 1 and Table 1), highlights the challenges in capturing, storing, and displaying SV data, and discusses how dbVar and DGVa, the genomic structural variation databases developed at NCBI and EBI, respectively, were designed to address these issues.

  5. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    Science.gov (United States)

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-05

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.
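
    The 1.75-fold range quoted above follows directly from the two heterozygosity extremes given in the abstract. A short Python check, using only those two quoted values (the variable names are illustrative):

```python
# Values quoted in the abstract, in heterozygous sites per kilobase.
highest = 1.0   # west African genomes
lowest = 0.57   # segments inferred to have diploid Native American ancestry

# Ratio of the highest to the lowest heterozygosity.
fold_range = highest / lowest
print(f"{fold_range:.2f}-fold")  # prints "1.75-fold"
```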

  6. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi Hsu

    2011-12-01

    Full Text Available Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne’s disease (JD), infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium) strains isolated from various hosts and environments. Using next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98%) to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77) showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed among the 6 M. ap strains. While most of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6000 SNPs were detected among M. avium genomes, most of them synonymous, suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNPs-based phylo-genomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10) strain while the human isolate (M. ap 4B) is closely related to the environmental strains, indicating an environmental source of human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.

  7. Genome-wide analysis of ZmDREB genes and their association with natural variation in drought tolerance at seedling stage of Zea mays L.

    Directory of Open Access Journals (Sweden)

    Shengxue Liu

    Full Text Available The worldwide production of maize (Zea mays L.) is frequently impacted by water scarcity and as a result, increased drought tolerance is a priority target in maize breeding programs. While DREB transcription factors have been demonstrated to play a central role in desiccation tolerance, whether or not natural sequence variations in these genes are associated with the phenotypic variability of this trait is largely unknown. In the present study, eighteen ZmDREB genes present in the maize B73 genome were cloned and systematically analyzed to determine their phylogenetic relationship, synteny with rice, maize and sorghum genomes; pattern of drought-responsive gene expression, and protein transactivation activity. Importantly, the association between the nucleic acid variation of each ZmDREB gene with drought tolerance was evaluated using a diverse population of maize consisting of 368 varieties from tropical and temperate regions. A significant association between the genetic variation of ZmDREB2.7 and drought tolerance at seedling stage was identified. Further analysis found that the DNA polymorphisms in the promoter region of ZmDREB2.7, but not the protein coding region itself, were associated with different levels of drought tolerance among maize varieties, likely due to distinct patterns of gene expression in response to drought stress. An in vitro protein-DNA binding assay demonstrated that ZmDREB2.7 protein could specifically interact with the target DNA sequences. The transgenic Arabidopsis overexpressing ZmDREB2.7 displayed enhanced tolerance to drought stress. Moreover, a favorable allele of ZmDREB2.7, identified in the drought-tolerant maize varieties, was effective in imparting plant tolerance to drought stress. Based upon these findings, we conclude that natural variation in the promoter of ZmDREB2.7 contributes to maize drought tolerance, and that the gene and its favorable allele may be an important genetic resource for the genetic

  8. Patterns of genome size variation in snapping shrimp.

    Science.gov (United States)

    Jeffery, Nicholas W; Hultgren, Kristin; Chak, Solomon Tin Chi; Gregory, T Ryan; Rubenstein, Dustin R

    2016-06-01

    Although crustaceans vary extensively in genome size, little is known about how genome size may affect the ecology and evolution of species in this diverse group, in part due to the lack of large genome size datasets. Here we investigate interspecific, intraspecific, and intracolony variation in genome size in 39 species of Synalpheus shrimps, representing one of the largest genome size datasets for a single genus within crustaceans. We find that genome size ranges approximately 4-fold across Synalpheus with little phylogenetic signal, and is not related to body size. In a subset of these species, genome size is related to chromosome size, but not to chromosome number, suggesting that despite large genomes, these species are not polyploid. Interestingly, there appears to be 35% intraspecific genome size variation in Synalpheus idios among geographic regions, and up to 30% variation in Synalpheus duffyi genome size within the same colony.

  9. An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes.

    Science.gov (United States)

    Chen, Chih-Hao; Lee, Hsing-Chung; Ling, Qingdong; Chen, Hsiao-Rong; Ko, Yi-An; Tsou, Tsong-Shan; Wang, Sun-Chong; Wu, Li-Ching; Lee, H C

    2011-07-01

    Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ∼1 s and a 1.8 million-probe array in ∼8 s.

  10. Genome-wide analysis of CNV (copy number variation) and their associations with narcolepsy in a Japanese population.

    Science.gov (United States)

    Yamasaki, Maria; Miyagawa, Taku; Toyoda, Hiromi; Khor, Seik-Soon; Koike, Asako; Nitta, Aino; Akiyama, Kumi; Sasaki, Tsukasa; Honda, Yutaka; Honda, Makoto; Tokunaga, Katsushi

    2014-05-01

    In humans, narcolepsy with cataplexy (narcolepsy) is a sleep disorder that is characterized by sleepiness, cataplexy and rapid eye movement (REM) sleep abnormalities. Narcolepsy is caused by a reduction in the number of neurons that produce hypocretin (orexin) neuropeptide. Both genetic and environmental factors contribute to the development of narcolepsy. Rare and large copy number variations (CNVs) reportedly play a role in the etiology of a number of neuropsychiatric disorders. Narcolepsy is considered a neurological disorder; therefore, we sought to investigate any possible association between rare and large CNVs and human narcolepsy. We used DNA microarray data and a CNV detection software application, PennCNV-Affy, to detect CNVs in 426 Japanese narcoleptic patients and 562 healthy individuals. Overall, we found a significant enrichment of rare and large CNVs (frequency ≤1%, size ≥100 kb) in the patients (case-control ratio of CNV count = 1.54, P = 5.00 × 10^-4). Next, we extended a region-based association analysis by including CNVs of size ≥30 kb. Rare and large CNVs in the PARK2 region showed a significant association with narcolepsy. Four patients were assessed to carry duplications of the gene region, whereas no controls carried the duplication, which was further confirmed by quantitative PCR assay. This duplication was also found in 2 essential hypersomnia (EHS) patients out of 171 patients. Furthermore, a pathway analysis revealed enrichments of gene disruptions by rare and large CNVs in immune response, acetyltransferase activity, cell cycle regulation and regulation of cell development. This study constitutes the first report on the risk association between multiple rare and large CNVs and the pathogenesis of narcolepsy. In the future, replication studies are needed to confirm the associations.

  11. Phenotypic impact of genomic structural variation

    DEFF Research Database (Denmark)

    Weischenfeldt, Joachim; Symmons, Orsolya; Spitz, François;

    2013-01-01

    Genomic structural variants have long been implicated in phenotypic diversity and human disease, but dissecting the mechanisms by which they exert their functional impact has proven elusive. Recently however, developments in high-throughput DNA sequencing and chromosomal engineering technology have...... facilitated the analysis of structural variants in human populations and model systems in unprecedented detail. In this Review, we describe how structural variants can affect molecular and cellular processes, leading to complex organismal phenotypes, including human disease. We further present advances...

  12. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis

    2012-01-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...

  13. Genomic variation landscape of the human gut microbiome

    DEFF Research Database (Denmark)

    Schloissnig, Siegfried; Arumugam, Manimozhiyan; Sunagawa, Shinichi

    2013-01-01

    Whereas large-scale efforts have rapidly advanced the understanding and practical impact of human genomic variation, the practical impact of variation is largely unexplored in the human microbiome. We therefore developed a framework for metagenomic variation analysis and applied it to 252 faecal...... metagenomes of 207 individuals from Europe and North America. Using 7.4 billion reads aligned to 101 reference species, we detected 10.3 million single nucleotide polymorphisms (SNPs), 107,991 short insertions/deletions, and 1,051 structural variants. The average ratio of non-synonymous to synonymous...... polymorphism rates of 0.11 was more variable between gut microbial species than across human hosts. Subjects sampled at varying time intervals exhibited individuality and temporal stability of SNP variation patterns, despite considerable composition changes of their gut microbiota. This indicates...

  14. Genome-wide analysis of copy number variations reveals that aging processes influence body fat distribution in Korea Associated Resource (KARE) cohorts.

    Science.gov (United States)

    Lee, Bo-Young; Shin, Dong Hyun; Cho, Seoae; Seo, Kang-Seok; Kim, Heebal

    2012-11-01

    Many anthropometric measures, including body mass index (BMI), waist-to-hip ratio (WHR), and subcutaneous fat thickness, are used as indicators of nutritional status, fertility and predictors of future health outcomes. While BMI is currently the best available estimate of body adiposity, WHR and skinfold thickness at various sites (biceps, triceps, suprailiac, and subscapular) are used as indices of body fat distribution. Copy number variation (CNV) is an attractive emerging approach to the study of associations with various diseases. In this study, we investigated the dosage effect of genes in the CNV genome widely associated with fat distribution phenotypes in large cohorts. We used the Affymetrix genome-wide human SNP Array 5.0 data of 8,842 healthy unrelated adults in KARE cohorts and identified CNVs associated with BMI and fat distribution-related traits including WHR and subcutaneous skinfold thickness at suprailiac (SUP) and subscapular (SUB) sites. CNV segmentation of each chromosome was performed using Golden Helix SVS 7.0, and single regression analysis was used to identify CNVs associated with each phenotype. We found one CNV for BMI, 287 for WHR, 2,157 for SUP, and 2,102 for SUB at the 5% significance level after Holm-Bonferroni correction. Genes included in the CNV were used for the analysis of functional annotations using the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.7b) tool. Functional gene classification analysis identified five significant gene clusters (metallothionein, ATP-binding proteins, ribosomal proteins, kinesin family members, and zinc finger proteins) for SUP, three (keratin-associated proteins, zinc finger proteins, keratins) for SUB, and one (protamines) for WHR. BMI was excluded from this analysis because the entire structure of no gene was identified in the CNV. Based on the analysis of genes enriched in the clusters, the fat distribution traits of KARE cohorts were related to the fat redistribution
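
    The Holm-Bonferroni correction mentioned in this abstract is a standard step-down procedure: the k-th smallest of m p-values is compared with alpha / (m - k + 1), and testing stops at the first failure. The Python sketch below shows the general procedure only; it is not the authors' pipeline, and the function name and example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # rank = k - 1, so the threshold alpha / (m - k + 1) is alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return rejected


# Hypothetical p-values for four tests.
example = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(example)  # [True, False, False, True]
```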

  15. Discrepancy variation of dinucleotide microsatellite repeats in eukaryotic genomes.

    Science.gov (United States)

    Gao, Huan; Cai, Shengli; Yan, Binlun; Chen, Baiyao; Yu, Fei

    2009-01-01

    To address whether there are differences of variation among repeat motif types and among taxonomic groups, we present here an analysis of variation and correlation of dinucleotide microsatellite repeats in eukaryotic genomes. Ten taxonomic groups were compared, those being primates, mammalia (excluding primates and rodentia), rodentia, birds, fish, amphibians and reptiles, insects, molluscs, plants and fungi, respectively. The data used in the analysis is from the literature published in the Journal of Molecular Ecology Notes. Analysis of variation reveals that there are no significant differences between AC and AG repeat motif types. Moreover, the number of alleles correlates positively with the copy number in both AG and AC repeats. Similar conclusions can be obtained from each taxonomic group. These results strongly suggest that the increase of SSR variation is almost linear with the increase of the copy number of each repeat motif. As well, the results suggest that the variability of SSR in the genomes of low-ranking species seem to be more than that of high-ranking species, excluding primates and fungi.

  16. Genome size variation in the genus Avena.

    Science.gov (United States)

    Yan, Honghai; Martin, Sara L; Bekele, Wubishet A; Latta, Robert G; Diederichsen, Axel; Peng, Yuanying; Tinker, Nicholas A

    2016-03-01

    Genome size is an indicator of evolutionary distance and a metric for genome characterization. Here, we report accurate estimates of genome size in 99 accessions from 26 species of Avena. We demonstrate that the average genome size of C genome diploid species (2C = 10.26 pg) is 15% larger than that of A genome species (2C = 8.95 pg), and that this difference likely accounts for a progression of size among tetraploid species, where AB genome configuration had similar genome sizes (average 2C = 25.74 pg). Genome size was mostly consistent within species and in general agreement with current information about evolutionary distance among species. Results also suggest that most of the polyploid species in Avena have experienced genome downsizing in relation to their diploid progenitors. Genome size measurements could provide additional quality control for species identification in germplasm collections, especially in cases where diploid and polyploid species have similar morphology.

  17. Genetic variation and the de novo assembly of human genomes.

    Science.gov (United States)

    Chaisson, Mark J P; Wilson, Richard K; Eichler, Evan E

    2015-11-01

    The discovery of genetic variation and the assembly of genome sequences are both inextricably linked to advances in DNA-sequencing technology. Short-read massively parallel sequencing has revolutionized our ability to discover genetic variation but is insufficient to generate high-quality genome assemblies or resolve most structural variation. Full resolution of variation is only guaranteed by complete de novo assembly of a genome. Here, we review approaches to genome assembly, the nature of gaps or missing sequences, and biases in the assembly process. We describe the challenges of generating a complete de novo genome assembly using current technologies and the impact that being able to perfectly sequence the genome would have on understanding human disease and evolution. Finally, we summarize recent technological advances that improve both contiguity and accuracy and emphasize the importance of complete de novo assembly as opposed to read mapping as the primary means to understanding the full range of human genetic variation.

  19. Genome-wide associations of gene expression variation in humans.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    2005-12-01

    Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
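
    Two of the three multiple-test corrections named in this abstract can be sketched briefly in Python. The abstract does not state which false discovery rate procedure was used, so the Benjamini-Hochberg step-up procedure below is an assumption; the permutation approach is not shown. Function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject the hypotheses with the `cutoff` smallest p-values.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four SNP-expression tests.
p = [0.001, 0.02, 0.03, 0.2]
print(bonferroni(p))          # [True, False, False, False]
print(benjamini_hochberg(p))  # [True, True, True, False]
```

    On the same input, the FDR procedure rejects more hypotheses than Bonferroni, which is the usual trade-off between the two: Bonferroni controls the chance of any false positive, while FDR control tolerates a bounded proportion of them.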

  20. Intrapopulation Genome Size Variation in D. melanogaster Reflects Life History Variation and Plasticity

    Science.gov (United States)

    Ellis, Lisa L.; Huang, Wen; Quinn, Andrew M.; Ahuja, Astha; Alfrejd, Ben; Gomez, Francisco E.; Hjelmen, Carl E.; Moore, Kristi L.; Mackay, Trudy F. C.; Johnston, J. Spencer; Tarone, Aaron M.

    2014-01-01

    We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions. PMID:25057905

  1. Intrapopulation genome size variation in D. melanogaster reflects life history variation and plasticity.

    Directory of Open Access Journals (Sweden)

    Lisa L Ellis

    2014-07-01

    Full Text Available We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions.

  2. Copy number variation in the horse genome.

    Directory of Open Access Journals (Sweden)

    Sharmila Ghosh

    2014-10-01

    We constructed a 400K WG tiling oligoarray for the horse and applied it for the discovery of copy number variations (CNVs) in 38 normal horses of 16 diverse breeds, and the Przewalski horse. Probes on the array represented 18,763 autosomal and X-linked genes, and intergenic, sub-telomeric and chrY sequences. We identified 258 CNV regions (CNVRs) across all autosomes, chrX and chrUn, but not in chrY. CNVs comprised 1.3% of the horse genome with chr12 being most enriched. American Miniature horses had the highest and American Quarter Horses the lowest number of CNVs in relation to Thoroughbred reference. The Przewalski horse was similar to native ponies and draft breeds. The majority of CNVRs involved genes, while 20% were located in intergenic regions. Similar to previous studies in horses and other mammals, molecular functions of CNV-associated genes were predominantly in sensory perception, immunity and reproduction. The findings were integrated with previous studies to generate a composite genome-wide dataset of 1476 CNVRs. Of these, 301 CNVRs were shared between studies, while 1174 were novel and require further validation. Integrated data revealed that to date, 41 out of over 400 breeds of the domestic horse have been analyzed for CNVs, of which 11 new breeds were added in this study. Finally, the composite CNV dataset was applied in a pilot study for the discovery of CNVs in 6 horses with XY disorders of sexual development. A homozygous deletion involving AKR1C gene cluster in chr29 in two affected horses was considered possibly causative because of the known role of AKR1C genes in testicular androgen synthesis and sexual development. While the findings improve and integrate the knowledge of CNVs in horses, they also show that for effective discovery of variants of biomedical importance, more breeds and individuals need to be analyzed using comparable methodological approaches.

  3. Copy number variation in the horse genome.

    Science.gov (United States)

    Ghosh, Sharmila; Qu, Zhipeng; Das, Pranab J; Fang, Erica; Juras, Rytis; Cothran, E Gus; McDonell, Sue; Kenney, Daniel G; Lear, Teri L; Adelson, David L; Chowdhary, Bhanu P; Raudsepp, Terje

    2014-10-01

    We constructed a 400K WG tiling oligoarray for the horse and applied it for the discovery of copy number variations (CNVs) in 38 normal horses of 16 diverse breeds, and the Przewalski horse. Probes on the array represented 18,763 autosomal and X-linked genes, and intergenic, sub-telomeric and chrY sequences. We identified 258 CNV regions (CNVRs) across all autosomes, chrX and chrUn, but not in chrY. CNVs comprised 1.3% of the horse genome with chr12 being most enriched. American Miniature horses had the highest and American Quarter Horses the lowest number of CNVs in relation to Thoroughbred reference. The Przewalski horse was similar to native ponies and draft breeds. The majority of CNVRs involved genes, while 20% were located in intergenic regions. Similar to previous studies in horses and other mammals, molecular functions of CNV-associated genes were predominantly in sensory perception, immunity and reproduction. The findings were integrated with previous studies to generate a composite genome-wide dataset of 1476 CNVRs. Of these, 301 CNVRs were shared between studies, while 1174 were novel and require further validation. Integrated data revealed that to date, 41 out of over 400 breeds of the domestic horse have been analyzed for CNVs, of which 11 new breeds were added in this study. Finally, the composite CNV dataset was applied in a pilot study for the discovery of CNVs in 6 horses with XY disorders of sexual development. A homozygous deletion involving AKR1C gene cluster in chr29 in two affected horses was considered possibly causative because of the known role of AKR1C genes in testicular androgen synthesis and sexual development. While the findings improve and integrate the knowledge of CNVs in horses, they also show that for effective discovery of variants of biomedical importance, more breeds and individuals need to be analyzed using comparable methodological approaches.

  4. Child Development and Structural Variation in the Human Genome

    Science.gov (United States)

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  6. Exploring functional elements and genomic variation in the noncoding genome

    NARCIS (Netherlands)

    van Heesch, S.A.A.C.

    2014-01-01

    Gene expression regulation is a delicate process that depends on multiple aspects including genome structure and transcription factor binding to DNA elements. The majority of our genome consists of noncoding DNA, which was shown to be crucial in providing the correct context for genome function. Alt

  7. Exploring functional elements and genomic variation in the noncoding genome

    NARCIS (Netherlands)

    van Heesch, S.A.A.C.

    2014-01-01

    Gene expression regulation is a delicate process that depends on multiple aspects including genome structure and transcription factor binding to DNA elements. The majority of our genome consists of noncoding DNA, which was shown to be crucial in providing the correct context for genome function. Alt

  8. Genome-wide profiling of structural genomic variations in Korean HapMap individuals.

    Directory of Open Access Journals (Sweden)

    Joon Seol Bae

    BACKGROUND: Studies of structural genomic variation, along with advances in microarray technology, have provided many genomic resources related to the architecture of the human genome, and have shown that human genome structure is far more complicated than previously thought. METHODOLOGY/PRINCIPAL FINDINGS: In the International HapMap Project, Epstein-Barr virus immortalized cell lines were used in preference to blood in order to obtain a larger amount of genomic DNA. However, genomic aberrations stemming from the immortalization process, biased representation of the donor tissue, and the culture process may influence the accuracy of SNP genotypes. In order to identify chromosome aberrations including loss of heterozygosity (LOH) and large-scale and small-scale copy number variations, we used the Illumina HumanHap500 BeadChip (555,352 markers) on Korean HapMap individuals (n = 90) to obtain Log R ratio and B allele frequency information, and then analyzed the data with various programs including Illumina ChromoZone, cnvPartition and PennCNV. As a result, we identified 28 LOHs (>3 Mb) and 35 large-scale CNVs (>1 Mb), with 4 samples having a completely duplicated chromosome. In addition, after checking the sample quality (standard deviation of log R ratio <0.30), we selected 79 samples and used both signal intensity and B allele frequency simultaneously to identify small-scale CNVs (<1 Mb), discovering 4,989 small-scale CNVs. The CNVs identified in this study were successfully validated using visual examination of the genoplot images, overlap analysis with previously reported CNVs in DGV, and quantitative PCR. CONCLUSION/SIGNIFICANCE: In this study, we describe the chromosome aberrations identified in Korean HapMap individuals, and expect that these findings will provide more meaningful information on the human genome.
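
    The sample quality check mentioned in this abstract (standard deviation of log R ratio below 0.30) can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function name `passes_lrr_qc`, the choice of the sample standard deviation, and the toy values are all assumptions, since the abstract does not say how the statistic was computed.

```python
from statistics import stdev


def passes_lrr_qc(lrr_values, max_sd=0.30):
    """Return True if the spread of a sample's log R ratio (LRR) is below max_sd.

    Assumption: the sample standard deviation is used; the abstract does not
    state which estimator the original software applies.
    """
    return stdev(lrr_values) < max_sd


# Toy LRR values (hypothetical), one list of probe-level values per sample.
samples = {
    "clean_sample": [0.02, -0.05, 0.10, -0.08, 0.01],
    "noisy_sample": [0.90, -0.70, 0.60, -0.80, 0.75],
}

# Keep only the samples whose LRR noise is under the threshold.
kept = [name for name, lrr in samples.items() if passes_lrr_qc(lrr)]
print(kept)
```

    Filtering on LRR noise before calling small CNVs is a common design choice because noisy intensity data tends to inflate the number of short, spurious calls.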

  9. Identification of Sesame Genomic Variations from Genome Comparison of Landrace and Variety.

    Science.gov (United States)

    Wei, Xin; Zhu, Xiaodong; Yu, Jingyin; Wang, Linhai; Zhang, Yanxin; Li, Donghua; Zhou, Rong; Zhang, Xiurong

    2016-01-01

    Sesame (Sesamum indicum L.) is one of the main oilseed crops, providing vegetable oil and protein to humans. Landraces are the gene source of varieties, carrying many desirable alleles for genetic improvement. Despite the importance of sesame landraces, their genomes remain unexplored and the genomic variations between landrace and variety are still unclear. To identify the genomic variations between sesame landrace and variety, two representative sesame landrace accessions, "Baizhima" and "Mishuozhima," were selected and re-sequenced. The genome sequencing and de novo assembly of the two sesame landraces resulted in draft genomes of 267 Mb and 254 Mb, respectively, with contig N50 values of more than 47 kb. In total, 1,332,025 SNPs and 506,245 InDels were identified in the genomes of "Baizhima" and "Mishuozhima" by comparison with the genome of the variety "Zhongzhi13." Among the genomic variations, 70,018 SNPs and 8311 InDels were located in the coding regions of genes. Genomic variations may contribute to variation of sesame agronomic traits such as flowering time, plant height, and oil content. The identified genomic variations were successfully used in QTL mapping, and the black pigment synthesis gene, PPO, was found to be the candidate gene for sesame seed coat color. The comprehensive comparison of the genomes of sesame landrace and modern variety produced a wealth of useful genomic information, constituting a powerful tool to support genetic research and molecular breeding of sesame.

  10. Identification of Sesame Genomic Variations from Genome Comparison of Landrace and Variety

    Science.gov (United States)

    Wei, Xin; Zhu, Xiaodong; Yu, Jingyin; Wang, Linhai; Zhang, Yanxin; Li, Donghua; Zhou, Rong; Zhang, Xiurong

    2016-01-01

    Sesame (Sesamum indicum L.) is one of the main oilseed crops, providing vegetable oil and protein to humans. Landraces are the gene source of varieties, carrying many desirable alleles for genetic improvement. Despite the importance of sesame landraces, their genomes remain unexplored and the genomic variations between landrace and variety are still unclear. To identify the genomic variations between sesame landrace and variety, two representative sesame landrace accessions, “Baizhima” and “Mishuozhima,” were selected and re-sequenced. The genome sequencing and de novo assembly of the two sesame landraces resulted in draft genomes of 267 Mb and 254 Mb, respectively, with contig N50 values of more than 47 kb. In total, 1,332,025 SNPs and 506,245 InDels were identified in the genomes of “Baizhima” and “Mishuozhima” by comparison with the genome of the variety “Zhongzhi13.” Among the genomic variations, 70,018 SNPs and 8311 InDels were located in the coding regions of genes. Genomic variations may contribute to variation of sesame agronomic traits such as flowering time, plant height, and oil content. The identified genomic variations were successfully used in QTL mapping, and the black pigment synthesis gene, PPO, was found to be the candidate gene for sesame seed coat color. The comprehensive comparison of the genomes of sesame landrace and modern variety produced a wealth of useful genomic information, constituting a powerful tool to support genetic research and molecular breeding of sesame. PMID:27536315

  11. Characterization of copy number variation in genomic regions containing STR loci using array comparative genomic hybridization.

    Science.gov (United States)

    Repnikova, Elena A; Rosenfeld, Jill A; Bailes, Andrea; Weber, Cecilia; Erdman, Linda; McKinney, Aimee; Ramsey, Sarah; Hashimoto, Sayaka; Lamb Thrush, Devon; Astbury, Caroline; Reshmi, Shalini C; Shaffer, Lisa G; Gastier-Foster, Julie M; Pyatt, Robert E

    2013-09-01

    Short tandem repeat (STR) loci are commonly used in forensic casework, familial analysis for human identification, and for monitoring hematopoietic cell engraftment after bone marrow transplant. Unexpected genetic variation leading to sequence and length differences in STR loci can complicate STR typing, and presents challenges in casework interpretation. Copy number variation (CNV) is a relatively recently identified form of genetic variation consisting of genomic regions present at variable copy numbers within an individual compared to a reference genome. Large scale population studies have demonstrated that likely all individuals carry multiple regions with CNV of 1kb in size or greater in their genome. To date, no study correlating genomic regions containing STR loci with CNV has been conducted. In this study, we analyzed results from 32,850 samples sent for clinical array comparative genomic hybridization (CGH) analysis for the presence of CNV at regions containing the 13 CODIS (Combined DNA Index System) STR, and the Amelogenin X (AMELX) and Amelogenin Y (AMELY) loci. Thirty-two individuals with CNV involving STR loci on chromosomes 2, 4, 7, 11, 12, 13, 16, and 21, and twelve with CNV involving the AMELX/AMELY loci were identified. These results were correlated with data from publicly available databases housing information on CNV identified in normal populations and additional clinical cases. These collective results demonstrate the presence of CNV in regions containing 9 of the 13 CODIS STR and AMELX/Y loci. Further characterization of STR profiles within regions of CNV, additional cataloging of these variants in multiple populations, and contributing such examples to the public domain will provide valuable information for reliable use of these loci.

  12. Genome-Wide Copy Number Variation Analysis in Extended Families and Unrelated Individuals Characterized for Musical Aptitude and Creativity in Music

    OpenAIRE

    Ukkola-Vuoti, Liisa; Kanduri, Chakravarthi; Oikkonen, Jaana; Buck, Gemma; Blancher, Christine; Raijas, Pirre; Karma, Kai; Lähdesmäki, Harri; Järvelä, Irma

    2013-01-01

    Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome wide copy number variations (CNVs) in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the...

  13. Genomic variability in Mexican chicken population using Copy Number Variation

    Directory of Open Access Journals (Sweden)

    Erica Gorla

    2017-05-01

    Copy number variants (CNVs) are polymorphisms which influence phenotypic variation and are an important source of genetic variability [1]. In Mexico the backyard poultry population is a unique widespread Creole chicken (Gallus gallus domesticus) population, an undefined cross among different breeds brought to Mexico from Europe and under natural selection for almost 500 years [2-3]. The aim of this study was to investigate genomic variation in the Mexican chicken population using CNVs. A total of 256 DNA samples genotyped with the Axiom® Genome-Wide Chicken Genotyping Array were used in the analyses. The individual CNV calling, based on log-R ratio and B-allele frequency values, was performed on the autosomes using the Hidden Markov Model (HMM) of the PennCNV software [4-5]. CNVs were summarized into CNV regions (CNVRs) at the population level (i.e. overlapping CNVs), using BEDTools. The HMM detected a total of 1924 CNVs in the genomes of the 256 samples, resulting, at the population level, in 1216 CNV regions, of which 959 were gains, 226 losses and 31 complex CNVRs (i.e. containing both losses and gains), covering a total of 47 Mb of sequence, corresponding to 5.12% of the autosomes of the chicken galGal4 assembly. A comparison between this study and 7 previous reports on CNVs in chicken found that the 1,216 CNVRs detected in this study overlap with 617 regions (51%) mapped by other studies. This study allowed a deep insight into the structural variation in the genome of the unselected Mexican chicken population, which until now had never been genetically characterized with SNP markers. Based on a cluster analysis (pvclust, R package) of CNV markers, the population, despite presenting extreme morphological variation, was not divided into differentiated genetic subpopulations. Finally, this study provides a CNV map based on the 600K SNP chip array, together with genome-wide gene copy number estimates in the Mexican chicken population.
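
    The step of summarizing overlapping CNV calls into CNV regions, and labelling each region as a gain, a loss or complex, can be illustrated with a small sketch. It is written in plain Python rather than with BEDTools, and it is not the authors' code: the function name `merge_cnvs`, the tuple layout, the half-open coordinate convention and the toy calls are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    cnvs: iterable of (chrom, start, end, state), state being "gain" or "loss".
    Assumption: half-open coordinates, so calls that touch end-to-start are
    merged as well. Returns (chrom, start, end, kind) tuples, where kind is
    "gain", "loss", or "complex" when a region contains both states.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, state in cnvs:
        by_chrom[chrom].append((start, end, state))

    regions = []
    for chrom in sorted(by_chrom):
        current = None  # [start, end, set_of_states] of the open region
        for start, end, state in sorted(by_chrom[chrom]):
            if current is not None and start <= current[1]:
                # Overlaps the open region: extend it and record the state.
                current[1] = max(current[1], end)
                current[2].add(state)
            else:
                if current is not None:
                    regions.append(_close(chrom, current))
                current = [start, end, {state}]
        if current is not None:
            regions.append(_close(chrom, current))
    return regions


def _close(chrom, region):
    """Turn an open region into a final tuple with its gain/loss/complex label."""
    start, end, states = region
    kind = "complex" if len(states) > 1 else next(iter(states))
    return (chrom, start, end, kind)


# Toy CNV calls pooled across samples (hypothetical coordinates).
calls = [
    ("chr1", 100, 200, "gain"),
    ("chr1", 150, 300, "gain"),
    ("chr1", 500, 600, "loss"),
    ("chr1", 550, 700, "gain"),
    ("chr2", 10, 20, "loss"),
]
for region in merge_cnvs(calls):
    print(region)
```

    Sorting the calls by start position first means each call only has to be compared with the region currently being built, so a single pass per chromosome is enough.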

  14. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations

    Science.gov (United States)

    McNally, Alan; Oren, Yaara; Kelly, Darren; Sreecharan, Tristan; Vehkala, Minna; Välimäki, Niko; Prentice, Michael B.; Ashour, Amgad; Avram, Oren; Pupko, Tal; Literak, Ivan; Guenther, Sebastian; Schaufler, Katharina; Wieler, Lothar H.; Zhiyong, Zong; Sheppard, Samuel K.; Corander, Jukka

    2016-01-01

    The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens due to the high resolution view it provides. However, the majority of such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug–resistant clade of ST131 and provides the first ever population level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements. PMID:27618184

  15. Large-scale genetic variation of the symbiosis-required megaplasmid pSymA revealed by comparative genomic analysis of Sinorhizobium meliloti natural strains

    Directory of Open Access Journals (Sweden)

    Landry Christian R

    2005-11-01

    Background: Sinorhizobium meliloti is a soil bacterium that forms nitrogen-fixing nodules on the roots of leguminous plants such as alfalfa (Medicago sativa). This species occupies different ecological niches, being present as a free-living soil bacterium and as a symbiont of plant root nodules. The genome of the type strain Rm 1021 contains one chromosome and two megaplasmids for a total genome size of 6 Mb. We applied comparative genomic hybridisation (CGH) on oligonucleotide microarrays to estimate genetic variation at the genomic level in four natural strains, two isolated from Italian agricultural soil and two from desert soil in the Aral Sea region. Results: From 4.6 to 5.7 percent of the genes showed a pattern of hybridisation concordant with deletion, nucleotide divergence or ORF duplication when compared to the type strain Rm 1021. A large number of these polymorphisms were confirmed by sequencing and Southern blot. A statistically significant fraction of these variable genes was found on the pSymA megaplasmid and grouped in clusters. These variable genes were found to be mainly transposases or genes with unknown function. Conclusion: The results allow us to conclude that the symbiosis-required megaplasmid pSymA can be considered the major hot-spot for intra-specific differentiation in S. meliloti.

  16. Common genetic variation near the phospholamban gene is associated with cardiac repolarisation : Meta-analysis of three genome-wide association studies

    NARCIS (Netherlands)

    I.M. Nolte (Ilja); C. Wallace (Chris); S.J. Newhouse (Stephen); D. Waggott (Daryl); J. Fu (Jingyuan); N. Soranzo (Nicole); R. Gwilliam (Rhian); S. Demissie (Serkalem); I. Savelieva (Irina); D. Zheng (Dongling); C. Dalageorgou (Chrysoula); M. Farrall (Martin); N.J. Samani (Nilesh); J. Connell (John); M.J. Brown (Morris); A. Dominiczak (Anna); M. Lathrop (Mark); E. Zeggini (Eleftheria); L.V. Wain (Louise); C. Newton-Cheh (Christopher); M. Eijgelsheim (Mark); K. Rice (Kenneth); P.I.W. de Bakker (Paul); A. Pfeufer (Arne); S. Sanna (Serena); D.E. Arking (Dan); F.W. Asselbergs (Folkert); T.D. Spector (Tim); N.D. Carter (Nicholas); S. Jeffery (Steve); M. Tobin (Martin); M. Caulfield (Mark); H. Snieder (Harold); A.D. Paterson (Andrew); P. Munroe (Patricia); Y. Jamshidi (Yalda)

    2009-01-01

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the Tw

  17. Common Genetic Variation Near the Phospholamban Gene Is Associated with Cardiac Repolarisation : Meta-Analysis of Three Genome-Wide Association Studies

    NARCIS (Netherlands)

    Nolte, Ilja M.; Wallace, Chris; Newhouse, Stephen J.; Waggott, Daryl; Fu, Jingyuan; Soranzo, Nicole; Gwilliam, Rhian; Deloukas, Panos; Savelieva, Irina; Zheng, Dongling; Dalageorgou, Chrysoula; Farrall, Martin; Samani, Nilesh J.; Connell, John; Brown, Morris; Dominiczak, Anna; Lathrop, Mark; Zeggini, Eleftheria; Wain, Louise V.; Newton-Cheh, Christopher; Eijgelsheim, Mark; Rice, Kenneth; de Bakker, Paul I. W.; Pfeufer, Arne; Sanna, Serena; Arking, Dan E.; Asselbergs, Folkert W.; Spector, Tim D.; Carter, Nicholas D.; Jeffery, Steve; Tobin, Martin; Caulfield, Mark; Snieder, Harold; Paterson, Andrew D.; Munroe, Patricia B.; Jamshidi, Yalda

    2009-01-01

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and BR

  18. Common genetic variation near the phospholamban gene is associated with cardiac repolarisation : Meta-analysis of three genome-wide association studies

    NARCIS (Netherlands)

    I.M. Nolte (Ilja); C. Wallace (Chris); S.J. Newhouse (Stephen); D. Waggott (Daryl); J. Fu (Jingyuan); N. Soranzo (Nicole); R. Gwilliam (Rhian); S. Demissie (Serkalem); I. Savelieva (Irina); D. Zheng (Dongling); C. Dalageorgou (Chrysoula); M. Farrall (Martin); N.J. Samani (Nilesh); J. Connell (John); M.J. Brown (Morris); A. Dominiczak (Anna); M. Lathrop (Mark); E. Zeggini (Eleftheria); L.V. Wain (Louise); C. Newton-Cheh (Christopher); M. Eijgelsheim (Mark); K. Rice (Kenneth); P.I.W. de Bakker (Paul); A. Pfeufer (Arne); S. Sanna (Serena); D.E. Arking (Dan); F.W. Asselbergs (Folkert); T.D. Spector (Tim); N.D. Carter (Nicholas); S. Jeffery (Steve); M. Tobin (Martin); M. Caulfield (Mark); H. Snieder (Harold); A.D. Paterson (Andrew); P. Munroe (Patricia); Y. Jamshidi (Yalda)

    2009-01-01

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the

  19. Common Genetic Variation Near the Phospholamban Gene Is Associated with Cardiac Repolarisation : Meta-Analysis of Three Genome-Wide Association Studies

    NARCIS (Netherlands)

    Nolte, Ilja M.; Wallace, Chris; Newhouse, Stephen J.; Waggott, Daryl; Fu, Jingyuan; Soranzo, Nicole; Gwilliam, Rhian; Deloukas, Panos; Savelieva, Irina; Zheng, Dongling; Dalageorgou, Chrysoula; Farrall, Martin; Samani, Nilesh J.; Connell, John; Brown, Morris; Dominiczak, Anna; Lathrop, Mark; Zeggini, Eleftheria; Wain, Louise V.; Newton-Cheh, Christopher; Eijgelsheim, Mark; Rice, Kenneth; de Bakker, Paul I. W.; Pfeufer, Arne; Sanna, Serena; Arking, Dan E.; Asselbergs, Folkert W.; Spector, Tim D.; Carter, Nicholas D.; Jeffery, Steve; Tobin, Martin; Caulfield, Mark; Snieder, Harold; Paterson, Andrew D.; Munroe, Patricia B.; Jamshidi, Yalda

    2009-01-01

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and

  20. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Giltae Song

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  1. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    Science.gov (United States)

    Song, Giltae; Dickins, Benjamin J A; Demeter, Janos; Engel, Stacia; Gallagher, Jennifer; Choe, Kisurb; Dunn, Barbara; Snyder, Michael; Cherry, J Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  2. Genome Architecture and Its Roles in Human Copy Number Variation

    Directory of Open Access Journals (Sweden)

    Lu Chen

    2014-12-01

    Besides single-nucleotide variants in the human genome, large-scale genomic variants, such as copy number variations (CNVs), are being increasingly discovered as a genetic source of human diversity and the pathogenic factors of diseases. Recent experimental findings have shed light on the links between different genome architectures and CNV mutagenesis. In this review, we summarize various genomic features and discuss their contributions to CNV formation. Genomic repeats, including both low-copy and high-copy repeats, play important roles in CNV instability, which was initially known as DNA recombination events. Furthermore, it has been found that human genomic repeats can also induce DNA replication errors and consequently result in CNV mutations. Some recent studies showed that DNA replication timing, which reflects the high-order information of genomic organization, is involved in human CNV mutations. Our review highlights that genome architecture, from DNA sequence to high-order genomic organization, is an important molecular factor in CNV mutagenesis and human genomic instability.

  3. Salmon and steelhead genetics and genomics - Epigenetic and genomic variation in salmon and steelhead

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Conduct analyses of epigenetic and genomic variation in Chinook salmon and steelhead to determine influence on phenotypic expression of life history traits. Genetic,...

  4. Characterizing genomic variation of Arabidopsis thaliana: the roles of geography and climate.

    Science.gov (United States)

    Lasky, Jesse R; Des Marais, David L; McKay, John K; Richards, James H; Juenger, Thomas E; Keitt, Timothy H

    2012-11-01

    Arabidopsis thaliana inhabits diverse climates and exhibits varied phenology across its range. Although A. thaliana is an extremely well-studied model species, the relationship between geography, growing season climate and its genetic variation is poorly characterized. We used redundancy analysis (RDA) to quantify the association of genomic variation [214 051 single nucleotide polymorphisms (SNPs)] with geography and climate among 1003 accessions collected from 447 locations in Eurasia. We identified climate variables most correlated with genomic variation, which may be important selective gradients related to local adaptation across the species range. Climate variation among sites of origin explained slightly more genomic variation than geographical distance. Large-scale spatial gradients and early spring temperatures explained the most genomic variation, while growing season and summer conditions explained the most after controlling for spatial structure. SNP variation in Scandinavia showed the greatest climate structure among regions, possibly because of relatively consistent phenology and life history of populations in this region. Climate variation explained more variation among nonsynonymous SNPs than expected by chance, suggesting that much of the climatic structure of SNP correlations is due to changes in coding sequence that may underlie local adaptation.

  5. Genetic contributions to variation in general cognitive function: A meta-analysis of genome-wide association studies in the CHARGE consortium (N=53 949)

    NARCIS (Netherlands)

    G. Davies (Gail); N.J. Armstrong (Nicola J.); J.C. Bis (Joshua); J. Bressler (Jan); V. Chouraki (Vincent); S. Giddaluru (Sudheer); E. Hofer; C.A. Ibrahim-Verbaas (Carla); M. Kirin (Mirna); J. Lahti; S. van der Lee (Sven); S. Le Hellard (Stephanie); T. Liu; R.E. Marioni (Riccardo); C. Oldmeadow (Christopher); D. Postmus (Douwe); G.D. Smith; J.A. Smith (Jennifer A); A. Thalamuthu (Anbupalam); R. Thomson (Russell); V. Vitart (Veronique); J. Wang; L. Yu; L. Zgaga (Lina); W. Zhao (Wei); R. Boxall (Ruth); S.E. Harris (Sarah); W.D. Hill (W. David); D.C. Liewald (David C.); M. Luciano (Michelle); H.H.H. Adams (Hieab); D. Ames; N. Amin (Najaf); P. Amouyel (Philippe); A.A. Assareh; R. Au; J.T. Becker; A. Beiser; C. Berr (Claudine); L. Bertram (Lars); E.A. Boerwinkle (Eric); B.M. Buckley (Brendan M.); H. Campbell (Harry); J. Corley; P.L. De Jager; C. Dufouil (Carole); J.G. Eriksson (Johan G.); T. Espeseth (Thomas); J.D. Faul; I. Ford; G. Scotland (Generation); R.F. Gottesman (Rebecca); M.D. Griswold (Michael); V. Gudnason (Vilmundur); T.B. Harris; G. Heiss (Gerardo); A. Hofman (Albert); E.G. Holliday (Elizabeth); J.E. Huffman (Jennifer); S.L.R. Kardia (Sharon); N.A. Kochan (Nicole A.); D.S. Knopman (David); J.B. Kwok; J.-C. Lambert; T. Lee; G. Li; S.-C. Li; M. Loitfelder (Marisa); O.L. Lopez (Oscar); A.J. Lundervold; A. Lundqvist; R. Mather; S.S. Mirza (Saira Saeed); L. Nyberg; B.A. Oostra (Ben); A. Palotie (Aarno); G. Papenberg; A. Pattie (Alison); K. Petrovic (Katja); O. Polasek (Ozren); B.M. Psaty (Bruce); P. Redmond (Paul); S. Reppermund; J.I. Rotter; R. Schmidt (Reinhold); M. Schuur (Maaike); P.W. Schofield; R.J. Scott; V.M. Steen (Vidar); D.J. Stott (David J.); J.C. van Swieten (John); K.D. Taylor (Kent); J. Trollor; S. Trompet (Stella); A.G. Uitterlinden (André); G. Weinstein; E. Widen (Elisabeth); B.G. Windham (B Gwen); J.W. Jukema (Jan Wouter); A. Wright (Alan); M.J. Wright (Margaret); Q. Yang (Qiong Fang); H. Amieva (Hélène); J. Attia (John); D.A. Bennett (David); H. Brodaty (Henry); A.J. de Craen (Anton); C. Hayward; M.A. Ikram (Arfan); U. Lindenberger; L.-G. Nilsson; D.J. Porteous (David J.); K. Räikkönen (Katri); I. Reinvang (Ivar); I. Rudan (Igor); P.S. Sachdev (Perminder); R. Schmidt; P. Schofield (Peter); V. Srikanth; J.M. Starr (John); S.T. Turner (Stephen); D.R. Weir (David R.); J.F. Wilson (James F); C.M. van Duijn (Cornelia M.); L.J. Launer (Lenore); A.L. Fitzpatrick (Annette); S. Seshadri (Sudha); T.H. Mosley (Thomas H.); I.J. Deary (Ian J.)

    2015-01-01

    General cognitive function is substantially heritable across the human life course from adolescence to old age. We investigated the genetic contribution to variation in this important, health- and well-being-related trait in middle-aged and older adults. We conducted a meta-analysis of g

  6. Using multilocus sequence typing to study bacterial variation: prospects in the genomic era.

    Science.gov (United States)

    Jolley, Keith A; Maiden, Martin C J

    2014-01-01

    Multilocus sequence typing (MLST) indexes the sequence variation present in a small number (usually seven) of housekeeping gene fragments located around the bacterial genome. Unique alleles at these loci are assigned arbitrary integer identifiers, which effectively summarizes the variation present in several thousand base pairs of genome sequence information as a series of numbers. Comparing bacterial isolates using allele-based methods efficiently corrects for the effects of lateral gene transfer present in many bacterial populations and is computationally efficient. This 'gene-by-gene' approach can be applied to larger collections of loci, such as the ribosomal protein genes used in ribosomal MLST (rMLST), up to and including the complete set of coding sequences present in a genome, whole-genome MLST (wgMLST), providing scalable, efficient and readily interpreted genome analysis.
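
    The allele-based comparison described in this abstract can be illustrated with a minimal Python sketch. It assigns arbitrary integer identifiers to the unique sequences seen at each locus and counts the loci at which two isolates differ. The locus names, toy sequences, and function names are hypothetical and chosen only for illustration; they do not come from any MLST database or from the authors' software.

```python
from collections import defaultdict


def assign_allele_ids(isolates):
    """Map each unique sequence at each locus to an arbitrary integer ID.

    isolates: dict of isolate name -> dict of locus name -> sequence fragment.
    Returns a dict of isolate name -> dict of locus name -> integer allele ID.
    """
    next_id = defaultdict(lambda: 1)  # next unused ID for each locus
    seen = defaultdict(dict)          # locus -> {sequence: allele ID}
    profiles = {}
    for name, loci in isolates.items():
        profile = {}
        for locus, seq in loci.items():
            if seq not in seen[locus]:
                seen[locus][seq] = next_id[locus]
                next_id[locus] += 1
            profile[locus] = seen[locus][seq]
        profiles[name] = profile
    return profiles


def allelic_distance(p, q):
    """Count the shared loci at which two allelic profiles differ."""
    return sum(1 for locus in p.keys() & q.keys() if p[locus] != q[locus])


# Hypothetical toy data: three isolates typed at three loci.
isolates = {
    "iso1": {"locA": "ACGT", "locB": "GGCC", "locC": "TTAA"},
    "iso2": {"locA": "ACGT", "locB": "GGCA", "locC": "TTAA"},
    "iso3": {"locA": "ACCT", "locB": "GGCA", "locC": "TAAA"},
}
profiles = assign_allele_ids(isolates)
d12 = allelic_distance(profiles["iso1"], profiles["iso2"])
d13 = allelic_distance(profiles["iso1"], profiles["iso3"])
```

    In this sketch a locus counts as one difference regardless of how many bases changed, so a single recombination event that replaces many bases in one fragment is not over-weighted. That is one plausible reading of why allele-based comparison is described as correcting for lateral gene transfer.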

  7. Genome-wide linkage and copy number variation analysis reveals 710 kb duplication on chromosome 1p31.3 responsible for autosomal dominant omphalocele

    Science.gov (United States)

    Radhakrishna, Uppala; Nath, Swapan K; McElreavey, Ken; Ratnamala, Uppala; Sun, Celi; Maiti, Amit K; Gagnebin, Maryline; Béna, Frédérique; Newkirk, Heather L; Sharp, Andrew J; Everman, David B; Murray, Jeffrey C; Schwartz, Charles E; Antonarakis, Stylianos E; Butler, Merlin G

    2017-01-01

    Background: Omphalocele is a congenital birth defect characterised by the presence of internal organs located outside of the ventral abdominal wall. The purpose of this study was to identify the underlying genetic mechanisms of a large autosomal dominant Caucasian family with omphalocele. Methods and findings: A genetic linkage study was conducted in a large family with an autosomal dominant transmission of an omphalocele using a genome-wide single nucleotide polymorphism (SNP) array. The analysis revealed significant evidence of linkage (non-parametric NPL = 6.93, p=0.0001; parametric logarithm of odds (LOD) = 2.70 under a fully penetrant dominant model) at chromosome band 1p31.3. Haplotype analysis narrowed the locus to a 2.74 Mb region between markers rs2886770 (63014807 bp) and rs1343981 (65757349 bp). Molecular characterisation of this interval using array comparative genomic hybridisation followed by quantitative microsphere hybridisation analysis revealed a 710 kb duplication located at 63.5–64.2 Mb. All affected individuals who had an omphalocele and shared the haplotype were positive for this duplicated region, while the duplication was absent from all normal individuals of this family. Multipoint linkage analysis using the duplication as a marker yielded a maximum LOD score of 3.2 at 1p31.3 under a dominant model. The 710 kb duplication at 1p31.3 band contains seven known genes including FOXD3, ALG6, ITGB3BP, KIAA1799, DLEU2L, PGM1, and the proximal portion of ROR1. Importantly, this duplication is absent from the database of genomic variants. Conclusions: The present study suggests that development of an omphalocele in this family is controlled by overexpression of one or more genes in the duplicated region. To the authors’ knowledge, this is the first reported association of an inherited omphalocele condition with a chromosomal rearrangement. PMID:22499347
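
    The interval sizes quoted in this abstract can be checked with a short Python calculation. The marker positions are taken from the abstract; the duplication boundaries are given there only to 0.1 Mb, so the computed duplication length is approximate. Variable names are illustrative only.

```python
# Marker positions (bp) as quoted in the abstract.
start_bp = 63_014_807  # rs2886770
end_bp = 65_757_349    # rs1343981

# Size of the linked region defined by the two markers, in Mb.
interval_mb = (end_bp - start_bp) / 1e6

# Duplication boundaries in Mb; the abstract rounds these to one decimal.
dup_start_mb, dup_end_mb = 63.5, 64.2
dup_kb = (dup_end_mb - dup_start_mb) * 1000

# The duplication should lie entirely inside the linked region.
dup_within_interval = (
    start_bp <= dup_start_mb * 1e6 and dup_end_mb * 1e6 <= end_bp
)

print(f"linked region: {interval_mb:.2f} Mb")
print(f"duplication (from rounded boundaries): about {dup_kb:.0f} kb")
print(f"duplication inside linked region: {dup_within_interval}")
```

    The marker positions give a region of 2,742,542 bp, matching the reported 2.74 Mb. The rounded boundaries give about 700 kb, which is consistent with the reported 710 kb given the 0.1 Mb precision.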

  8. Spectrogram Analysis of Genomes

    Directory of Open Access Journals (Sweden)

    David Sussillo

    2004-01-01

    We performed frequency-domain analysis in the genomes of various organisms using tricolor spectrograms, identifying several types of distinct visual patterns characterizing specific DNA regions. We relate patterns and their frequency characteristics to the sequence characteristics of the DNA. At times, the spectrogram patterns could be related to the structure of the corresponding protein region by using various public databases such as GenBank. Some patterns are explained from the biological nature of the corresponding regions, which relate to chromosome structure and protein coding, and some patterns have as yet unknown biological significance. We found biologically meaningful patterns on scales ranging from millions of base pairs down to a few hundred base pairs. Chromosome-wide patterns include periodicities ranging from 2 to 300. The color of the spectrogram depends on the nucleotide content at specific frequencies, and therefore can be used as a local indicator of CG content and other measures of relative base content. Several smaller-scale patterns were found to represent different types of domains made up of various tandem repeats.
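
    As a rough illustration of frequency-domain analysis of a DNA sequence, the Python sketch below computes a windowed discrete Fourier transform of a single-base indicator sequence. It is an assumption-laden toy, not the authors' method: the abstract does not specify the transform, the window sizes, or how bases are mapped to the three colors, so the color mapping is omitted and all function names and parameters are hypothetical.

```python
import cmath


def indicator(seq, base):
    """Binary indicator sequence: 1.0 where seq has the given base, else 0.0."""
    return [1.0 if c == base else 0.0 for c in seq]


def dft_power(x, k):
    """Squared magnitude of the k-th DFT coefficient of the sequence x."""
    n = len(x)
    coeff = sum(
        v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x)
    )
    return abs(coeff) ** 2


def window_spectrum(seq, base, start, width):
    """Power at frequencies k = 1 .. width // 2 for one window and one base."""
    x = indicator(seq[start:start + width], base)
    return [dft_power(x, k) for k in range(1, width // 2 + 1)]


def spectrogram(seq, base, width, step):
    """One spectrum per sliding window; rows are windows, columns frequencies."""
    return [
        window_spectrum(seq, base, s, width)
        for s in range(0, len(seq) - width + 1, step)
    ]


# Toy sequence with an exact period of 3 bases.
seq = "ATG" * 20
spec = spectrogram(seq, "A", width=30, step=15)

# Column index i corresponds to frequency k = i + 1, i.e. period width / k.
peak_k = [row.index(max(row)) + 1 for row in spec]
peak_period = [30 / k for k in peak_k]
```

    For this toy input every window peaks at k = 10, which corresponds to a period of 30 / 10 = 3 bases. Repeating the calculation for each base and assigning each result to a color channel would be one way to build a colored image, but that step is a guess at the general idea rather than a description of the published procedure.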

  9. Are we Genomic Mosaics? Variations of the Genome of Somatic Cells can Contribute to Diversify our Phenotypes.

    Science.gov (United States)

    Astolfi, P A; Salamini, F; Sgaramella, V

    2010-09-01

    Theoretical and experimental evidence supports the hypothesis that the genomes and the epigenomes may be different in the somatic cells of complex organisms. In the genome, the differences range from single base substitutions to chromosome number; in the epigenome, they entail multiple postsynthetic modifications of the chromatin. Somatic genome variations (SGV) may accumulate during development in response both to genetic programs, which may differ from tissue to tissue, and to environmental stimuli, which are often undetected and generally irreproducible. SGV may jeopardize physiological cellular functions, but also create novel coding and regulatory sequences, to be exposed to intraorganismal Darwinian selection. Genomes acknowledged as comparatively poor in genes, such as humans', could thus increase their pristine informational endowment. A better understanding of SGV will contribute to basic issues such as the "nature vs nurture" dualism and the inheritance of acquired characters. On the applied side, they may explain the low yield of cloning via somatic cell nuclear transfer, provide clues to some of the problems associated with transdifferentiation, and interfere with individual DNA analysis. SGV may be unique in the different cell types and in the different developmental stages, and thus explain the several hundred gaps persisting in the human genomes "completed" so far. They may compound the variations associated with our epigenomes and make of each of us an "(epi)genomic" mosaic. An ensuing paradigm is the possibility that a single genome (the ephemeral one assembled at fertilization) has the capacity to generate several different brains in response to different environments.

  10. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

    DEFF Research Database (Denmark)

    Zhan, Bujie; Fadista, João; Thomsen, Bo

    2011-01-01

    sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were...... of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation...

  11. Comparative genome analysis of VSP-II and SNPs reveals heterogenic variation in contemporary strains of Vibrio cholerae O1 isolated from cholera patients in Kolkata, India.

    Science.gov (United States)

    Imamura, Daisuke; Morita, Masatomo; Sekizuka, Tsuyoshi; Mizuno, Tamaki; Takemura, Taichiro; Yamashiro, Tetsu; Chowdhury, Goutam; Pazhani, Gururaja P; Mukhopadhyay, Asish K; Ramamurthy, Thandavarayan; Miyoshi, Shin-Ichi; Kuroda, Makoto; Shinoda, Sumio; Ohnishi, Makoto

    2017-02-13

    Cholera is an acute diarrheal disease and a major public health problem in many developing countries in Asia, Africa, and Latin America. Since the Bay of Bengal is considered the epicenter for the seventh cholera pandemic, it is important to understand the genetic dynamism of Vibrio cholerae from Kolkata, as a representative of the Bengal region. We analyzed whole genome sequence data of V. cholerae O1 isolated from cholera patients in Kolkata, India, from 2007 to 2014 and identified the heterogeneous genomic region in these strains. In addition, we carried out a phylogenetic analysis based on the whole genome single nucleotide polymorphisms to determine the genetic lineage of strains in Kolkata. This analysis revealed the heterogeneity of the Vibrio seventh pandemic island (VSP)-II in Kolkata strains. The ctxB genotype was also heterogeneous and was highly related to VSP-II types. In addition, phylogenetic analysis revealed the shifts in predominant strains in Kolkata. Two distinct lineages, 1 and 2, were found between 2007 and 2010. However, the proportion changed markedly in 2010 and lineage 2 strains were predominant thereafter. Lineage 2 can be divided into four sublineages, I, II, III and IV. The results of this study indicate that lineages 1 and 2-I were concurrently prevalent between 2007 and 2009, that lineage 2-III was observed in 2010, and that lineage 2-IV became predominant in 2011 and remained so until 2014. Our findings demonstrate that the epidemic of cholera in Kolkata was caused by several distinct strains that have been constantly changing within the genetic lineages of V. cholerae O1 in recent years.

  12. The African Genome Variation Project shapes medical genetics in Africa.

    Science.gov (United States)

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

    2015-01-15

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

  13. Genomic Copy Number Variation in Disorders of Cognitive Development

    Science.gov (United States)

    Morrow, Eric M.

    2010-01-01

    Objective: To highlight recent discoveries in the area of genomic copy number variation in neuropsychiatric disorders including intellectual disability, autism, and schizophrenia. To emphasize new principles emerging from this area, involving the genetic architecture of disease, pathophysiology, and diagnosis. Method: Review of studies published…

  14. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip;

    2011-01-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, ...

  15. Repetitive elements, architects of genomic variation in Verticillium

    Science.gov (United States)

    Vascular wilt pathogens in the genus Verticillium show considerable variation with respect to their host ranges, genomic organization, and the variety and number of transposable elements (TEs) that they carry. These families of TE sequences were first documented in the wide host range, plant pathog...

  16. Nuclear DNA content in Sinningia (Gesneriaceae); intraspecific genome size variation and genome characterization in S. speciosa.

    Science.gov (United States)

    Zaitlin, David; Pierce, Andrew J

    2010-12-01

    The Gesneriaceae (Lamiales) is a family of flowering plants comprising >3000 species of mainly tropical origin, the most familiar of which is the cultivated African violet (Saintpaulia spp.). Species of Gesneriaceae are poorly represented in the lists of taxa sampled for genome size estimation; measurements are available for three species of Ramonda and one each of Haberlea, Saintpaulia, and Streptocarpus, all species of Old World origin. We report here nuclear genome size estimates for 10 species of Sinningia, a neotropical genus largely restricted to Brazil. Flow cytometry of leaf cell nuclei showed that holoploid genome size in Sinningia is very small (approximately two times the size of the Arabidopsis genome), and is small compared to the other six species of Gesneriaceae with genome size estimates. We also documented intraspecific genome size variation of 21%-26% within a group of wild Sinningia speciosa (Lodd.) Hiern collections. In addition, we analyzed 1210 genome survey sequences from S. speciosa to characterize basic features of the nuclear genome such as guanine-cytosine content, types of repetitive elements, numbers of protein-coding sequences, and sequences unique to S. speciosa. We included several other angiosperm species as genome size standards, one of which was the snapdragon (Antirrhinum majus L.; Veronicaceae, Lamiales). Multiple measurements on three accessions indicated that the genome size of A. majus is ~633 × 10⁶ base pairs, which is approximately 40% of the previously published estimate.

  17. Genomic and functional characteristics of copy number variations in Angus cattle selected for resistance or susceptibility to gastrointestinal nematodes

    Science.gov (United States)

    Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to intestinal nematodes. In this study, we performed a large sca...

  18. Personal and population genomics of human regulatory variation.

    Science.gov (United States)

    Vernot, Benjamin; Stergachis, Andrew B; Maurano, Matthew T; Vierstra, Jeff; Neph, Shane; Thurman, Robert E; Stamatoyannopoulos, John A; Akey, Joshua M

    2012-09-01

    The characteristics and evolutionary forces acting on regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.

  19. Genomic variation at the tips of the adaptive radiation of Darwin's finches.

    Science.gov (United States)

    Chaves, Jaime A; Cooper, Elizabeth A; Hendry, Andrew P; Podos, Jeffrey; De León, Luis F; Raeymaekers, Joost A M; MacMillan, W Owen; Uy, J Albert C

    2016-11-01

    Adaptive radiation unfolds as selection acts on the genetic variation underlying functional traits. The nature of this variation can be revealed by studying the tips of an ongoing adaptive radiation. We studied genomic variation at the tips of the Darwin's finch radiation; specifically focusing on polymorphism within, and variation among, three sympatric species of the genus Geospiza. Using restriction site-associated DNA (RAD-seq), we characterized 32 569 single-nucleotide polymorphisms (SNPs), from which 11 outlier SNPs for beak and body size were uncovered by a genomewide association study (GWAS). Principal component analysis revealed that these 11 SNPs formed four statistically linked groups. Stepwise regression then revealed that the first PC score, which included 6 of the 11 top SNPs, explained over 80% of the variation in beak size, suggesting that selection on these traits influences multiple correlated loci. The two SNPs most strongly associated with beak size were near genes associated with beak morphology across deeper branches of the radiation: delta-like 1 homologue (DLK1) and high-mobility group AT-hook 2 (HMGA2). Our results suggest that (i) key adaptive traits are associated with a small fraction of the genome (11 of 32 569 SNPs), (ii) SNPs linked to the candidate genes are dispersed throughout the genome (on several chromosomes), and (iii) micro- and macro-evolutionary variation (roots and tips of the radiation) involve some shared and some unique genomic regions. © 2016 John Wiley & Sons Ltd.

  20. Whole Genome Pathway Analysis Identifies an Association of Cadmium Response Gene Loss with Copy Number Variation in Mutant p53 Bearing Uterine Endometrial Carcinomas.

    Directory of Open Access Journals (Sweden)

    Joe Ryan Delaney

    Massive chromosomal aberrations are a signature of advanced cancer, although the factors promoting the pervasive incidence of these copy number alterations (CNAs) are poorly understood. Gatekeeper mutations, such as p53, contribute to aneuploidy, yet p53 mutant tumors do not always display CNAs. Uterine Corpus Endometrial Carcinoma (UCEC) offers a unique system to begin to evaluate why some cancers acquire high CNAs while others evolve another route to oncogenesis, since about half of p53 mutant UCEC tumors have a relatively flat CNA landscape and half have 20-90% of their genome altered in copy number. We extracted copy number information from 68 UCEC genomes mutant in p53 by the GISTIC2 algorithm. GO term pathway analysis, via GOrilla, was used to identify suppressed pathways. Genes within these pathways were mapped for focal or wide distribution. Deletion hotspots were evaluated for temporal incidence. Multiple pathways contributed to the development of pervasive CNAs, including developmental, metabolic, immunological, cell adhesion and cadmium response pathways. Surprisingly, cadmium response pathway genes are predicted as the earliest loss events within these tumors: in particular, the metallothionein genes involved in heavy metal sequestration. Loss of cadmium response genes was associated with copy number changes and poorer prognosis, contrasting with 'copy number flat' tumors which instead exhibited substantive mutation. Metallothioneins are lost early in the development of high CNA endometrial cancer, providing a potential mechanism and biological rationale for increased incidence of endometrial cancer with cadmium exposure. Developmental and metabolic pathways are altered later in tumor progression.

  1. Genome-Wide Copy Number Variation Analysis in Extended Families and Unrelated Individuals Characterized for Musical Aptitude and Creativity in Music

    Science.gov (United States)

    Oikkonen, Jaana; Buck, Gemma; Blancher, Christine; Raijas, Pirre; Karma, Kai; Lähdesmäki, Harri; Järvelä, Irma

    2013-01-01

    Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music-related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome-wide copy number variations (CNVs) in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the scores of three music tests (COMB scores): auditory structuring ability, and Seashore's tests for pitch and for time. Data on creativity in music (herein composing, improvising and/or arranging music) was surveyed using a web-based questionnaire. Several CNVRs containing genes that affect neurodevelopment, learning and memory were detected. A deletion at 5q31.1 covering the protocadherin-α gene cluster (Pcdha 1-9) was found co-segregating with low music test scores (COMB) in both sample sets. Pcdha is involved in neural migration, differentiation and synaptogenesis. Creativity in music was found to co-segregate with a duplication covering the glucose mutarotase gene (GALM) at 2p22. GALM has influence on serotonin release and membrane trafficking of the human serotonin transporter. Interestingly, genes related to serotonergic systems have been shown to associate not only with psychiatric disorders but also with creativity and music perception. Both Pcdha and GALM are related to the serotonergic systems influencing cognitive and motor functions, important for music perception and practice. Finally, a 1.3 Mb duplication was identified in a subject with low COMB scores in the region previously linked with absolute pitch (AP) at 8q24. No differences in the CNV burden were detected among the high/low music test scores or creative/non-creative groups. In summary, CNVs and genes found in this study are related to cognitive functions. Our result suggests new candidate genes for music

  2. Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music.

    Science.gov (United States)

    Ukkola-Vuoti, Liisa; Kanduri, Chakravarthi; Oikkonen, Jaana; Buck, Gemma; Blancher, Christine; Raijas, Pirre; Karma, Kai; Lähdesmäki, Harri; Järvelä, Irma

    2013-01-01

    Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music-related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome-wide copy number variations (CNVs) in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the scores of three music tests (COMB scores): auditory structuring ability, and Seashore's tests for pitch and for time. Data on creativity in music (herein composing, improvising and/or arranging music) was surveyed using a web-based questionnaire. Several CNVRs containing genes that affect neurodevelopment, learning and memory were detected. A deletion at 5q31.1 covering the protocadherin-α gene cluster (Pcdha 1-9) was found co-segregating with low music test scores (COMB) in both sample sets. Pcdha is involved in neural migration, differentiation and synaptogenesis. Creativity in music was found to co-segregate with a duplication covering the glucose mutarotase gene (GALM) at 2p22. GALM has influence on serotonin release and membrane trafficking of the human serotonin transporter. Interestingly, genes related to serotonergic systems have been shown to associate not only with psychiatric disorders but also with creativity and music perception. Both Pcdha and GALM are related to the serotonergic systems influencing cognitive and motor functions, important for music perception and practice. Finally, a 1.3 Mb duplication was identified in a subject with low COMB scores in the region previously linked with absolute pitch (AP) at 8q24. No differences in the CNV burden were detected among the high/low music test scores or creative/non-creative groups. In summary, CNVs and genes found in this study are related to cognitive functions. Our result suggests new candidate genes for music perception

  3. Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music.

    Directory of Open Access Journals (Sweden)

    Liisa Ukkola-Vuoti

    Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music-related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome-wide copy number variations (CNVs) in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the scores of three music tests (COMB scores): auditory structuring ability, and Seashore's tests for pitch and for time. Data on creativity in music (herein composing, improvising and/or arranging music) was surveyed using a web-based questionnaire. Several CNVRs containing genes that affect neurodevelopment, learning and memory were detected. A deletion at 5q31.1 covering the protocadherin-α gene cluster (Pcdha 1-9) was found co-segregating with low music test scores (COMB) in both sample sets. Pcdha is involved in neural migration, differentiation and synaptogenesis. Creativity in music was found to co-segregate with a duplication covering the glucose mutarotase gene (GALM) at 2p22. GALM has influence on serotonin release and membrane trafficking of the human serotonin transporter. Interestingly, genes related to serotonergic systems have been shown to associate not only with psychiatric disorders but also with creativity and music perception. Both Pcdha and GALM are related to the serotonergic systems influencing cognitive and motor functions, important for music perception and practice. Finally, a 1.3 Mb duplication was identified in a subject with low COMB scores in the region previously linked with absolute pitch (AP) at 8q24. No differences in the CNV burden were detected among the high/low music test scores or creative/non-creative groups. In summary, CNVs and genes found in this study are related to cognitive functions. Our result suggests new candidate genes for

  4. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2002-01-01

    This thesis describes a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene transfer, even to the extent that we can use the gene content to make genome phylogenies. Usi

  5. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2003-01-01

    This thesis describes a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene transfer, even to the extent that we can use the gene content to make genome phylogenies. Usi

  6. Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Ilja M Nolte

    To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and BRIGHT cohorts in the UK and the DCCT/EDIC cohort from North America. Five loci were significantly associated with QT interval at P < 1 × 10^-6. To validate these findings we performed an in silico comparison with data from two QT consortia: QTSCD (n = 15,842) and QTGEN (n = 13,685). Analysis confirmed the association between common variants near NOS1AP (P = 1.4 × 10^-83) and the phospholamban (PLN) gene (P = 1.9 × 10^-29). The most associated SNP near NOS1AP (rs12143842) explains 0.82% of the variance; the SNP near PLN (rs11153730) explains 0.74% of the variance of QT interval duration. We found no evidence for interaction between these two SNPs (P = 0.99). PLN is a key regulator of cardiac diastolic function and is involved in regulating intracellular calcium cycling; it has only recently been identified as a susceptibility locus for QT interval. These data offer further mechanistic insights into genetic influence on the QT interval which may predispose to life-threatening arrhythmias and sudden cardiac death.

  7. Genomic and karyotypic variation in Drosophila parasitoids (Hymenoptera, Cynipoidea, Figitidae)

    Directory of Open Access Journals (Sweden)

    Vladimir Gokhman

    2011-08-01

    Drosophila melanogaster Meigen, 1830 has served as a model insect for over a century. Sequencing of the 11 additional Drosophila Fallen, 1823 species marks substantial progress in comparative genomics of this genus. By comparison, practically nothing is known about the genome size or genome sequences of parasitic wasps of Drosophila. Here, we present the first comparative analysis of genome size and karyotype structures of Drosophila parasitoids of the Leptopilina Förster, 1869 and Ganaspis Förster, 1869 species. The gametic genome size of Ganaspis xanthopoda (Ashmead, 1896) is larger than those of the three Leptopilina species studied. The genome sizes of all parasitic wasps studied here are also larger than those known for all Drosophila species. Surprisingly, genome sizes of these Drosophila parasitoids exceed the average value known for all previously studied Hymenoptera. The haploid chromosome number of both Leptopilina heterotoma (Thomson, 1862) and L. victoriae Nordlander, 1980 is ten. A chromosomal fusion appears to have produced a distinct karyotype for L. boulardi (Barbotin, Carton et Keiner-Pillault, 1979) (n = 9), whose genome size is smaller than that of wasps of the L. heterotoma clade. Like L. boulardi, the haploid chromosome number for G. xanthopoda is also nine. Our studies reveal a positive, but nonlinear, correlation between the genome size and total chromosome length in Drosophila parasitoids. These Drosophila parasitoids differ widely in their host range, and utilize different infection strategies to overcome host defense. Their comparative genomics, in relation to their exceptionally well-characterized hosts, will prove to be valuable for understanding the molecular basis of the host-parasite arms race and how such mechanisms shape the genetic structures of insect communities.

  8. Evolution and Variation of the SARS-CoV Genome

    Institute of Scientific and Technical Information of China (English)

    Jianfei Hu; Zizhang Zhang; Wei Wei; Songgang Li; Jun Wang; Jian Wang; Jun Yu; Huanming Yang; Jing Wang; Jing Xu; Wei Li; Yujun Han; Yan Li; Jia Ji; Jia Ye; Zhao Xu

    2003-01-01

    Knowledge of the evolution of pathogens is of great medical and biological significance to the prevention, diagnosis, and therapy of infectious diseases. In order to understand the origin and evolution of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected the complete genome sequences of all viruses available in GenBank and compared them with the SARS-CoV. Genomic signature analysis demonstrates that all the coronaviruses except the SARS-CoV have TGTT as their most abundant tetranucleotide. A detailed analysis of the forty-two complete SARS-CoV genome sequences revealed the existence of two distinct genotypes, and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in the SARS-CoV, although many mutations have made it difficult to recognize.

  9. Genomic Variation in Natural Populations of Drosophila melanogaster

    Science.gov (United States)

    Langley, Charles H.; Stevens, Kristian; Cardeno, Charis; Lee, Yuh Chwen G.; Schrider, Daniel R.; Pool, John E.; Langley, Sasha A.; Suarez, Charlyn; Corbett-Detig, Russell B.; Kolaczkowski, Bryan; Fang, Shu; Nista, Phillip M.; Holloway, Alisha K.; Kern, Andrew D.; Dewey, Colin N.; Song, Yun S.; Hahn, Matthew W.; Begun, David J.

    2012-01-01

    This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5′- and 3′-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species. PMID:22673804

  10. Potential Value of Genomic Copy Number Variations in Schizophrenia

    Directory of Open Access Journals (Sweden)

    Chuanjun Zhuo

    2017-06-01

    Schizophrenia is a devastating neuropsychiatric disorder affecting approximately 1% of the global population, and the disease has imposed a considerable burden on families and society. Although the exact cause of schizophrenia remains unknown, several lines of scientific evidence have revealed that genetic variants are strongly correlated with the development and early onset of the disease. In fact, the heritability among patients suffering from schizophrenia is as high as 80%. Genomic copy number variations (CNVs) are one of the main forms of genomic variation, ubiquitously occurring in the human genome. An increasing number of studies have shown that CNVs account for population diversity and genetically related diseases, including schizophrenia. The last decade has witnessed rapid advances in the development of novel genomic technologies, which have led to the identification of schizophrenia-associated CNVs, insight into the roles of the affected genes in their intervals in schizophrenia, and successful manipulation of the target CNVs. In this review, we focus on the recent discoveries of important CNVs that are associated with schizophrenia and outline the potential value that the study of CNVs will bring to the areas of schizophrenia research, diagnosis, and therapy. Furthermore, with the help of the novel genetic tool known as the Clustered Regularly Interspaced Short Palindromic Repeats-associated nuclease 9 (CRISPR/Cas9) system, the pathogenic CNVs as genomic defects could be corrected. In conclusion, the recent novel findings of schizophrenia-associated CNVs offer an exciting opportunity for schizophrenia research to decipher the pathological mechanisms underlying the onset and development of schizophrenia as well as to provide potential clinical applications in genetic counseling, diagnosis, and therapy for this complex mental disease.

  11. Intra-genomic variation in the ribosomal repeats of nematodes.

    Directory of Open Access Journals (Sweden)

    Holly M Bik

    Ribosomal loci represent a major tool for investigating environmental diversity and community structure via high-throughput marker gene studies of eukaryotes (e.g. 18S rRNA). Since the estimation of species' abundance is a major goal of environmental studies (by counting numbers of sequences), understanding the patterns of rRNA copy number across species will be critical for informing such high-throughput approaches. Such knowledge is critical, given that ribosomal RNA genes exist within multi-copy repeated arrays in a genome. Here we measured the repeat copy number for six nematode species by mapping the sequences from whole genome shotgun libraries against reference sequences for their rRNA repeat. This revealed a 6-fold variation in repeat copy number amongst taxa investigated, with levels of intragenomic variation ranging from 56 to 323 copies of the rRNA array. By applying the same approach to four C. elegans mutation accumulation lines propagated by repeated bottlenecking for an average of ~400 generations, we find on average a 2-fold increase in repeat copy number (rate of increase in rRNA estimated at 0.0285-0.3414 copies per generation), suggesting that rRNA repeat copy number is subject to selection. Within each Caenorhabditis species, the majority of intragenomic variation found across the rRNA repeat was observed within gene regions (18S, 28S, 5.8S), suggesting that such intragenomic variation is not a product of selection for rRNA coding function. We find that the dramatic variation in repeat copy number among these six nematode genomes would limit the use of rRNA in estimates of organismal abundance. In addition, the unique pattern of variation within a single genome was uncorrelated with patterns of divergence between species, reflecting a strong signature of natural selection for rRNA function. A better understanding of the factors that control or affect copy number in these arrays, as well as their rates and patterns of evolution
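
    The read-mapping approach in this abstract can be illustrated with a short sketch. It assumes the simplest possible estimator, in which copy number is the mean read depth over the rRNA repeat divided by the mean depth over single-copy sequence; the study's actual pipeline may differ. The function name and the depth values are hypothetical and were chosen only so that the results land on the 56 and 323 copies reported above.

```python
def estimate_copy_number(repeat_depth: float, single_copy_depth: float) -> float:
    """Estimate repeat copy number as a ratio of mean read depths.

    Assumption: reads map uniformly, so depth scales linearly with copy number.
    """
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return repeat_depth / single_copy_depth


# Hypothetical mean depths (reads per base), not taken from the study.
low = estimate_copy_number(1680.0, 30.0)    # 56 copies
high = estimate_copy_number(9690.0, 30.0)   # 323 copies

# Ratio between the extremes reported in the abstract (56 and 323 copies).
fold = high / low
print(f"{low:.0f} to {high:.0f} copies, {fold:.2f}-fold range")
```

    The ratio of the two reported extremes is about 5.8, which matches the "6-fold variation" stated in the abstract.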

  12. Genome downsizing and karyotype constancy in diploid and polyploid congeners: a model of genome size variation.

    Science.gov (United States)

    Poggio, Lidia; Realini, María Florencia; Fourastié, María Florencia; García, Ana María; González, Graciela Esther

    2014-06-26

    Evolutionary chromosome change involves significant variation in DNA amount in diploids and genome downsizing in polyploids. Genome size and karyotype parameters of Hippeastrum species with different ploidy level were analysed. In Hippeastrum, polyploid species show less DNA content per basic genome than diploid species. The rate of variation is lower at higher ploidy levels. All the species have a basic number x = 11 and bimodal karyotypes. The basic karyotypes consist of four short metacentric chromosomes and seven large chromosomes (submetacentric and subtelocentric). The bimodal karyotype is preserved maintaining the relative proportions of members of the haploid chromosome set, even in the presence of genome downsizing. The constancy of the karyotype is maintained because changes in DNA amount are proportional to the length of the whole-chromosome complement and vary independently in the long and short sets of chromosomes. This karyotype constancy in taxa of Hippeastrum with different genome size and ploidy level indicates that the distribution of extra DNA within the complement is not at random and suggests the presence of mechanisms selecting for constancy, or against changes, in karyotype morphology.

  13. Bioinformatics for Genome Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gary J. Olsen

    2005-06-30

    Nesbo, Boucher and Doolittle (2001) used phylogenetic trees of four taxa to assess whether euryarchaeal genes share a common history. They suggested that, of the 521 genes examined, each of the three possible tree topologies relating the four taxa was supported an essentially equal number of times. They suggest that this might be the result of numerous horizontal gene transfer events, essentially randomizing the relationships between gene histories (as inferred in the 521 gene trees) and organismal relationships (which would be a single underlying tree). Motivated by the fact that the order in which sequences are added to a multiple sequence alignment influences the alignment, and ultimately the inferred tree, the investigators were interested in the extent to which the variations among inferred trees might be due to variations in the alignment order. This bears directly on their efforts to evaluate and improve upon methods of multiple sequence alignment. They set out to analyze the influence of alignment order on the tree inferred for 43 genes shared among these same 4 taxa. Because alignments produced by CLUSTALW are directed by a rooted guide tree (the dendrogram), there are 15 possible alignment orders of 4 taxa. For each gene they tested all 15 alignment orders and, as a 16th option, allowed CLUSTALW to generate its own guide tree. With all 15 possible rooted guide trees supplied, they expected that at least one of them should be as good as CLUSTAL's own guide tree, but most of the time the results differed (sometimes being better than CLUSTAL's default tree and sometimes being worse). The difference seems to be that the user-supplied tree is not given meaningful branch lengths, which affect the assumed probability of amino acid changes. They examined the practicality of modifying CLUSTALW to improve its treatment of user-supplied guide trees. This work became increasingly bogged down in finding and repairing minor bugs in the CLUSTALW code. This effort was put on hold
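
    The count of 15 rooted guide trees for 4 taxa follows from the standard result that n labelled taxa admit (2n-3)!! rooted binary topologies, here 5 × 3 × 1 = 15. The sketch below enumerates them directly; it is an illustration only, not code from the report, and the taxon labels A to D and the function name are hypothetical.

```python
from itertools import combinations


def rooted_trees(taxa):
    """Return every rooted binary tree topology on `taxa` as nested tuples."""
    taxa = tuple(taxa)
    if len(taxa) == 1:
        return [taxa[0]]
    first, rest = taxa[0], taxa[1:]
    trees = []
    # Split the taxa at the root into two non-empty groups. Keeping `first`
    # in the left group means each unordered split is generated exactly once.
    for k in range(len(rest)):
        for others in combinations(rest, k):
            left = (first,) + others
            right = tuple(t for t in rest if t not in others)
            for left_tree in rooted_trees(left):
                for right_tree in rooted_trees(right):
                    trees.append((left_tree, right_tree))
    return trees


guide_trees = rooted_trees("ABCD")
print(len(guide_trees))  # 15
for tree in guide_trees[:3]:
    print(tree)
```

    Each nested tuple corresponds to one order in which a progressive aligner would join the sequences, which is why 4 taxa give 15 alignment orders.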

  14. Detecting microsatellites within genomes: significant variation among algorithms

    Directory of Open Access Journals (Sweden)

    Rivals Eric

    2007-04-01

    Background: Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results: Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion: Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
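
    The sensitivity to user-defined parameters described in this abstract can be seen even in a toy detector. The sketch below is not one of the five tools compared (TRF, Mreps, Sputnik, STAR, RepeatMasker); it is a hypothetical regular-expression scanner for perfect repeats only, with an assumed minimum-length threshold in base pairs, run on a made-up sequence.

```python
import re


def find_microsatellites(seq, max_unit=6, min_length=12):
    """Find perfect tandem repeats with unit size 1..max_unit.

    Returns sorted (start, unit, total_length) tuples for repeats spanning
    at least `min_length` bp. Assumes an uppercase ACGT sequence.
    """
    hits = []
    for k in range(1, max_unit + 1):
        # Smallest number of unit copies that reaches min_length (ceiling).
        min_repeats = max(2, -(-min_length // k))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter motif,
            # e.g. "ACAC", which is already reported as "AC".
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0))))
    return sorted(hits)


# Made-up sequence: a 16 bp (AC)8 repeat and a 12 bp (CAG)4 repeat.
seq = "TTG" + "AC" * 8 + "GGT" + "CAG" * 4 + "TTA"

for threshold in (12, 16, 20):
    print(threshold, find_microsatellites(seq, min_length=threshold))
```

    On this sequence the same scanner reports two, one, or zero microsatellites as the threshold moves from 12 to 16 to 20 bp, a small-scale version of the effect the authors report for short microsatellites.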

  15. Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations.

    Science.gov (United States)

    Teo, Yik-Ying; Sim, Xueling; Ong, Rick T H; Tan, Adrian K S; Chen, Jieming; Tantoso, Erwin; Small, Kerrin S; Ku, Chee-Seng; Lee, Edmund J D; Seielstad, Mark; Chia, Kee-Seng

    2009-11-01

    The Singapore Genome Variation Project (SGVP) provides a publicly available resource of 1.6 million single nucleotide polymorphisms (SNPs) genotyped in 268 individuals from the Chinese, Malay, and Indian population groups in Southeast Asia. This online database catalogs information and summaries on genotype and phased haplotype data, including allele frequencies, assessment of linkage disequilibrium (LD), and recombination rates in a format similar to the International HapMap Project. Here, we introduce this resource and describe the analysis of human genomic variation upon agglomerating data from the HapMap and the Human Genome Diversity Project, providing useful insights into the population structure of the three major population groups in Asia. In addition, this resource also surveyed across the genome for variation in regional patterns of LD between the HapMap and SGVP populations, and for signatures of positive natural selection using two well-established metrics: iHS and XP-EHH. The raw and processed genetic data, together with all population genetic summaries, are publicly available for download and browsing through a web browser modeled with the Generic Genome Browser.

  16. VarB Plus: An Integrated Tool for Visualization of Genome Variation Datasets

    KAUST Repository

    Hidayah, Lailatul

    2012-07-01

    Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to description of embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in a beta testing with various high-volume published datasets, as well as positive
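
    Of the statistics named in this abstract, Tajima's D is the one most easily shown in a few lines. The sketch below follows the standard textbook formula, which compares the mean number of pairwise differences with Watterson's estimator based on the number of segregating sites. It is written in Python for illustration and is not VarB Plus code, which the abstract describes as C++ on the Qt framework; the function name and the toy haplotypes are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length aligned haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance")
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    seg = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if seg == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from the standard variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


# Toy data: every variant is a singleton (excess of rare alleles).
d_rare = tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"])
# Toy data: every variant is at 50% frequency (excess of common alleles).
d_common = tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"])
print(d_rare < 0, d_common > 0)
```

    A negative value reflects an excess of rare variants and a positive value an excess of intermediate-frequency variants, which is the contrast the two toy samples are built to show.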

  17. Regulatory hotspots in the malaria parasite genome dictate transcriptional variation.

    Directory of Open Access Journals (Sweden)

    Joseph M Gonzales

    2008-09-01

    The determinants of transcriptional regulation in malaria parasites remain elusive. The presence of a well-characterized gene expression cascade shared by different Plasmodium falciparum strains could imply that transcriptional regulation and its natural variation do not contribute significantly to the evolution of parasite drug resistance. To clarify the role of transcriptional variation as a source of strain-specific diversity in the most deadly malaria species and to find genetic loci that dictate variations in gene expression, we examined genome-wide expression level polymorphisms (ELPs) in a genetic cross between phenotypically distinct parasite clones. Significant variation in gene expression is observed through direct co-hybridizations of RNA from different P. falciparum clones. Nearly 18% of genes were regulated by a significant expression quantitative trait locus. The genetic determinants of most of these ELPs resided in hotspots that are physically distant from their targets. The most prominent regulatory locus, influencing 269 transcripts, coincided with a Chromosome 5 amplification event carrying the drug resistance gene, pfmdr1, and 13 other genes. Drug selection pressure in the Dd2 parental clone lineage led not only to a copy number change in the pfmdr1 gene but also to an increased copy number of putative neighboring regulatory factors that, in turn, broadly influence the transcriptional network. Previously unrecognized transcriptional variation, controlled by polymorphic regulatory genes and possibly master regulators within large copy number variants, contributes to sweeping phenotypic evolution in drug-resistant malaria parasites.

  18. Structural genomic variation as risk factor for idiopathic recurrent miscarriage

    DEFF Research Database (Denmark)

    Nagirnaja, Liina; Palta, Priit; Kasak, Laura;

    2014-01-01

    within RM study group revealed significant enrichment of loci related to innate immunity and immunoregulatory pathways essential for immune tolerance at fetomaternal interface. As a major finding, we report a multicopy duplication (61.6 kb) at 5p13.3 conferring increased maternal risk to RM in Estonia...... and identify common rearrangements modulating risk to RM. Genome-wide screening of Estonian RM patients and fertile controls identified excessive cumulative burden of CNVs (5.4 and 6.1 Mb per genome) in two RM cases possibly increasing their individual disease risk. Functional profiling of all rearranged genes...... and Denmark (meta-analysis, n = 309/205, odds ratio = 4.82, P = 0.012). Comparison to Estonian population-based cohort (total, n = 1000) confirmed the risk for Estonian female cases (P = 7.9 × 10(-4) ). Datasets of four cohorts from the Database of Genomic Variants (total, n = 5,846 subjects) exhibited...

  19. Variation in genomic methylation in natural populations of Chinese white poplar.

    Science.gov (United States)

    Ma, Kaifeng; Song, Yuepeng; Yang, Xiaohui; Zhang, Zhiyi; Zhang, Deqiang

    2013-01-01

    It is thought that methylcytosine can be inherited through meiosis and mitosis, and that epigenetic variation may be under genetic control or correlation may be caused by neutral drift. However, DNA methylation also varies with tissue, developmental stage, and environmental factors. Eliminating these factors, we analyzed the levels and patterns, diversity and structure of genomic methylcytosine in the xylem of nine natural populations of Chinese white poplar. On average, the relative total methylation and non-methylation levels were approximately 26.567% and 42.708% (Pdifferentiation (GST = 0.159) were assessed by Shannon's diversity index. Co-inertia analysis indicated that methylation-sensitive polymorphism (MSP) and genomic methylation pattern (CG-CNG) profiles gave similar distributions. Using a between-group eigen analysis, we found that the Hebei and Shanxi populations were independent of each other, but the Henan population intersected with the other populations, to some degree. Genome methylation in Populus tomentosa presented tissue-specific characteristics and the relative 5'-CCGG methylation level was higher in xylem than in leaves. Meanwhile, the genome methylation in the xylem shows great epigenetic variation and could be fixed and inherited through mitosis. Compared to genetic structure, data suggest that epigenetic and genetic variation do not completely match.

  20. Marked variation in predicted and observed variability of tandem repeat loci across the human genome

    Directory of Open Access Journals (Sweden)

    Shields Denis C

    2008-04-01

    Background: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome. Results: We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation ρ = 0.29, p Conclusion: Variability among 2–12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y (likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter) and excesses in chromosomes 6, 13, 20 and 21.
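
    Heterozygosity, the quantity this abstract predicts and compares against the CEPH database, has a simple standard definition for a single locus: one minus the sum of squared allele frequencies. The sketch below shows only that definition; it does not reproduce the authors' multivariate model, and the function name and the allele repeat counts are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Expected heterozygosity, 1 - sum(p_i^2), from a list of observed alleles.

    Each allele is identified here by its repeat count at a tandem repeat locus.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical loci: alleles recorded as number of repeat units.
invariant = expected_heterozygosity([12, 12, 12, 12])        # one allele only
two_alleles = expected_heterozygosity([10, 10, 11, 11])      # two equally common
four_alleles = expected_heterozygosity([9, 10, 11, 12])      # four equally common
print(invariant, two_alleles, four_alleles)
```

    A locus with a single allele scores 0, and the value rises toward 1 as more alleles segregate at similar frequencies.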

  1. The integrated microbial genome resource of analysis.

    Science.gov (United States)

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for the annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE) Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes sequenced by the Joint Genome Institute, as well as other publicly available genomes. Genomes of strains belonging to the Archaea, Bacteria, and Eukarya domains are present, as are those of viruses and plasmids. Here, we describe some essential features of the IMG system and provide a case study for pangenome analysis.

  2. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point...... mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome....

  3. Genomic copy number variation associated with clinical outcome in canine cutaneous mast cell tumors

    DEFF Research Database (Denmark)

    Jark, Paulo C; Mundin, Deborah B P; de Carvalho, Marcio

    2017-01-01

    from Group ST>12 and six from Group STGenomic DNA was extracted, and aCGH was performed using Agilent Canine Genome CGH Microarray 4×180 (ID-252 552 - Agilent, USA). Data analysis was carried out using Nexus program version 5.0 (Biodiscovery, USA). The group ST>12 presented 11±3.3 CNVs, while...... in DNA isolated from tumor cells by array comparative genomic hybridization (aCGH). The aim of this study was to compare copy number variations (CNVs) in cutaneous mast cell tumors of dogs that survived less than six (ST12months (ST>12) from the date of diagnosis. Ten animals were used: four...

  4. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  5. Genomic Variation of Inbreeding and Ancestry in the Remaining Two Isle Royale Wolves.

    Science.gov (United States)

    Hedrick, Philip W; Kardos, Marty; Peterson, Rolf O; Vucetich, John A

    2017-03-01

    Inbreeding, relatedness, and ancestry have traditionally been estimated with pedigree information; however, molecular genomic data can provide a more detailed examination of these properties. For example, pedigree information provides an estimate of the expected value of these measures, but molecular genomic data can estimate the realized values of these measures in individuals. Here, we generate the theoretical distribution of inbreeding, relatedness, and ancestry for the individuals in the pedigree of the Isle Royale wolves, the first examination of such variation in a wild population with a known pedigree. We use the 38 autosomes of the dog genome and their estimated map lengths in our genomic analysis. Although it is known that the remaining wolves are highly inbred, closely related, and descend from only 3 ancestors, our analyses suggest that there is significant variation in the realized inbreeding and relatedness around pedigree expectations. For example, the expected inbreeding in a hypothetical offspring from the 2 remaining wolves is 0.438 but the realized 95% genomic confidence interval is from 0.311 to 0.565. For individual chromosomes, a substantial proportion of the whole chromosomes are completely identical by descent. This examination provides a background to use when analyzing molecular genomic data for individual levels of inbreeding, relatedness, and ancestry. The level of variation in these measures is a function of the time to the common ancestor(s), the number of chromosomes, and the rate of recombination. In the Isle Royale wolf population, the few generations to a common ancestor result in the high variance in genomic inbreeding.

  6. PolyTB: A genomic variation map for Mycobacterium tuberculosis

    KAUST Repository

    Coll, Francesc

    2014-02-15

    Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.

  7. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines.

    Science.gov (United States)

    Huang, Wen; Massouras, Andreas; Inoue, Yutaka; Peiffer, Jason; Ràmia, Miquel; Tarone, Aaron M; Turlapati, Lavanya; Zichner, Thomas; Zhu, Dianhui; Lyman, Richard F; Magwire, Michael M; Blankenburg, Kerstin; Carbone, Mary Anna; Chang, Kyle; Ellis, Lisa L; Fernandez, Sonia; Han, Yi; Highnam, Gareth; Hjelmen, Carl E; Jack, John R; Javaid, Mehwish; Jayaseelan, Joy; Kalra, Divya; Lee, Sandy; Lewis, Lora; Munidasa, Mala; Ongeri, Fiona; Patel, Shohba; Perales, Lora; Perez, Agapito; Pu, LingLing; Rollmann, Stephanie M; Ruth, Robert; Saada, Nehad; Warner, Crystal; Williams, Aneisa; Wu, Yuan-Qing; Yamamoto, Akihiko; Zhang, Yiqing; Zhu, Yiming; Anholt, Robert R H; Korbel, Jan O; Mittelman, David; Muzny, Donna M; Gibbs, Richard A; Barbadilla, Antonio; Johnston, J Spencer; Stone, Eric A; Richards, Stephen; Deplancke, Bart; Mackay, Trudy F C

    2014-07-01

    The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.

  8. Localising loci underlying complex trait variation using Regional Genomic Relationship Mapping.

    Directory of Open Access Journals (Sweden)

    Yoshitaka Nagamine

    The limited proportion of complex trait variance identified in genome-wide association studies may reflect the limited power of single SNP analyses to detect either rare causative alleles or those of small effect. Motivated by studies that demonstrate that loci contributing to trait variation may contain a number of different alleles, we have developed an analytical approach termed Regional Genomic Relationship Mapping that, like linkage-based family methods, integrates variance contributed by founder gametes within a pedigree. This approach takes advantage of very distant (and unrecorded) relationships, and this greatly increases the power of the method compared with traditional pedigree-based linkage analyses. By integrating variance contributed by founder gametes in the population, our approach provides an estimate of the Regional Heritability attributable to a small genomic region (e.g. a 100 SNP window covering ca. 1 Mb of DNA in a 300,000 SNP GWAS) and has the power to detect regions containing multiple alleles that individually contribute too little variance to be detectable by GWAS, as well as regions with single common GWAS-detectable SNPs. We use genome-wide SNP array data to obtain both a genome-wide relationship matrix and regional relationship ("identity by state" or IBS) matrices for sequential regions across the genome. We then estimate a heritability for each region sequentially in our genome-wide scan. We demonstrate by simulation and with real data that, when compared to traditional ("individual SNP") GWAS, our method uncovers new loci that explain additional trait variation. We analysed data from three Southern European populations and from Orkney for exemplar traits: serum uric acid concentration and height. We show that regional heritability estimates are correlated with results from genome-wide association analysis but can capture more of the genetic variance segregating in the population and identify additional trait loci.
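
    The construction of genome-wide and regional IBS matrices described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 0/1/2 genotype coding, the per-SNP sharing score, and the names `ibs_matrix` and `regional_ibs` are assumptions made for the example, and the heritability estimation step for each region is not shown.

```python
from typing import List, Tuple

# Assumed coding: rows are individuals, columns are SNPs, values are 0/1/2 allele counts.
Genotypes = List[List[int]]
Matrix = List[List[float]]


def ibs_matrix(genotypes: Genotypes, start: int, stop: int) -> Matrix:
    """Mean identity-by-state sharing for every pair of individuals over SNPs [start, stop)."""
    n = len(genotypes)
    n_snps = stop - start
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each SNP scores 1 - |g_i - g_j| / 2:
            # 1.0 for identical genotypes, 0.0 for opposite homozygotes.
            shared = sum(
                1.0 - abs(genotypes[i][k] - genotypes[j][k]) / 2.0
                for k in range(start, stop)
            )
            matrix[i][j] = matrix[j][i] = shared / n_snps
    return matrix


def regional_ibs(genotypes: Genotypes, window: int = 100,
                 step: int = 100) -> List[Tuple[int, int, Matrix]]:
    """One IBS matrix per sequential SNP window, returned as (start, stop, matrix)."""
    n_snps = len(genotypes[0])
    regions = []
    for start in range(0, n_snps - window + 1, step):
        stop = start + window
        regions.append((start, stop, ibs_matrix(genotypes, start, stop)))
    return regions
```

    Calling `ibs_matrix(genotypes, 0, len(genotypes[0]))` gives the genome-wide matrix, and `regional_ibs(genotypes)` gives one matrix per 100-SNP window; in a regional scan each window's matrix would be fitted alongside the genome-wide one.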

  9. Viral small RNAs reveal the genomic variations of three grapevine vein clearing virus quasispecies populations.

    Science.gov (United States)

    Howard, Susanne; Qiu, Wenping

    2017-02-02

    Viral small RNAs (vsRNAs) include viral small interfering RNAs (vsiRNAs) that are initiators and products of RNA silencing, and small RNAs that are derived from viral RNAs with function still unknown. Sequencing of vsRNAs allows assembling of viral genomes and revelation of viral population variations at genomic levels. Grapevine vein clearing virus (GVCV) is a new member of the family Caulimoviridae whose DNA genome is replicated by reverse transcription of pre-genomic RNA molecules. In this short report, three genomic sequences of GVCV were assembled from vsRNAs that were isolated and sequenced from three individual grapevines in commercial vineyards and compared to the GVCV-CHA reference genome. Profiles of single nucleotide polymorphism among three viral populations indicated a closer relatedness between two populations in different grape cultivars at the same location than those in the same grape cultivar at different locations, suggesting the spread of GVCV populations among vineyards of close proximity. Classic types of vsiRNAs (21-nt, 22-nt, and 24-nt) were found in the three GVCV vsiRNA populations, but these did not produce alignment hotspots on the GVCV-CHA reference genome. The number of 36-nt reads is the highest among vsRNAs, but the role of these vsRNAs remains unclear. The analysis of vsRNAs provides a first holistic picture of genomic variations among GVCV viral quasispecies populations that helps monitor epidemics and evolution of GVCV, an emerging virus that is becoming a threat to grape production in the Midwest region of the USA.

  10. Advances in biotechnology and informatics to link variation in the genome to phenotypes in plants and animals.

    Science.gov (United States)

    Appels, R; Barrero, R; Bellgard, M

    2013-03-01

    Advances in our understanding of genome structure provide consistent evidence for the existence of a core genome representing species classically defined by phenotype, as well as conditionally dispensable components of the genome that show extensive variation between individuals of a given species. Generally, conservation of phenotypic features between species reflects conserved features of the genome; however, this is evidently not necessarily always the case as demonstrated by the analysis of the tunicate chordate Oikopleura dioica. In both plants and animals, the methylation activity of DNA and histones continues to present new variables for modifying (eventually) the phenotype of an organism and provides for structural variation that builds on the point mutations, rearrangements, indels, and amplification of retrotransposable elements traditionally considered. The translation of the advances in the structure/function analysis of the genome to industry is facilitated through the capture of research outputs in "toolboxes" that remain accessible in the public domain.

  11. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background: Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools while validated also include a number of parame

  12. Detection of breed specific copy number variations in domestic chicken genome.

    Science.gov (United States)

    Sohrabi, Saeed S; Mohammadabadi, Mohammadreza; Wu, Dong-Dong; Esmailizadeh, Ali

    2017-09-29

    Copy number variations (CNVs) are important large scale variants that are widespread in the genome and may contribute to phenotypic variation. Detection and characterization of CNVs can provide new insights into the genetic basis of important traits. Here, we performed whole genome short read sequence analysis to identify CNVs in two indigenous and commercial chicken breeds and evaluate the impact of the identified CNVs on breed specific traits. After filtration, a total of 12955 CNVs spanning (on average) about 9.42% of the chicken genome were found that made up 5467 CNV regions (CNVRs). Chicken quantitative trait loci (QTL) datasets and Ensembl gene annotations were used as resources for the estimation of potential phenotypic effects of our CNVRs on breed specific traits. In total, 34% of our detected CNVRs were also detected in earlier CNV studies. These CNVRs partly overlap with several previously reported QTL and gene ontology terms associated with some important traits, including shank length QTL in Creeper specific CNVRs and body weight and egg production characteristics as well as growth of muscles and body organs gene terms in the Arian commercial breed. Our findings provide new insights into the genomic structure of the chicken genome for an improved understanding of the potential roles of CNVRs in differentiating between breeds or lines.

  13. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Andrew H Chan

    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between

  14. Analysis and validation of genome-specific DNA variations in 5' flanking conserved sequences of wheat low-molecular-weight glutenin subunit genes

    Institute of Scientific and Technical Information of China (English)

    LONG; Hai; WEI; Yuming

    2006-01-01

    The thirty-three 5' flanking conserved sequences of the known low-molecular-weight glutenin subunit (LMW-GS) genes have been divided into eight clusters, which was in agreement with the classification based on the deduced N-terminal protein sequences. The DNA polymorphism between the eight clusters was obtained by sequence alignment, and a total of 34 polymorphic positions were observed in the approximately 200 bp regions, among which 18 polymorphic positions were candidate SNPs. Seven cluster-specific primer sets were designed for seven out of eight clusters containing cluster-specific bases, with which the genomic DNA of the ditelosomic lines of group 1 chromosomes of a wheat variety 'Chinese Spring' was employed to carry out chromosome assignment. The subsequent cloning and DNA sequencing of PCR fragments validated the sequence specificity of the 5' flanking conserved sequences between LMW-GS gene groups in different genomes. These results suggested that the coding and 5' flanking regions of LMW-GS genes are likely to have evolved in a concerted fashion. The seven primer sets developed in this study could be used to isolate the complete ORFs of seven groups of LMW-GS genes, respectively, and therefore possess great value for further research into the contributions of a single LMW-GS gene to wheat quality in the complex genetic background and for the efficient selection of quality-related components in breeding programs.

  15. AluScan: a method for genome-wide scanning of sequence and structure variations in the human genome

    Directory of Open Access Journals (Sweden)

    Mei Lingling

    2011-11-01

    Background: To complement next-generation sequencing technologies, there is a pressing need for efficient pre-sequencing capture methods with reduced costs and DNA requirement. The Alu family of short interspersed nucleotide elements is the most abundant type of transposable elements in the human genome and a recognized source of genome instability. With over one million Alu elements distributed throughout the genome, they are well positioned to facilitate genome-wide sequence amplification and capture of regions likely to harbor genetic variation hotspots of biological relevance. Results: Here we report on the use of inter-Alu PCR with an enhanced range of amplicons in conjunction with next-generation sequencing to generate an Alu-anchored scan, or 'AluScan', of DNA sequences between Alu transposons, where Alu consensus sequence-based 'H-type' PCR primers that elongate outward from the head of an Alu element are combined with 'T-type' primers elongating from the poly-A containing tail to achieve huge amplicon range. To illustrate the method, glioma DNA was compared with white blood cell control DNA of the same patient by means of AluScan. The over 10 Mb sequences obtained, derived from more than 8,000 genes spread over all the chromosomes, revealed a highly reproducible capture of genomic sequences enriched in genic sequences and cancer candidate gene regions. Requiring only sub-micrograms of sample DNA, the power of AluScan as a discovery tool for genetic variations was demonstrated by the identification of 357 instances of loss of heterozygosity, 341 somatic indels, 274 somatic SNVs, and seven potential somatic SNV hotspots between control and glioma DNA. Conclusions: AluScan, implemented with just a small number of H-type and T-type inter-Alu PCR primers, provides an effective capture of a diversity of genome-wide sequences for analysis. The method, by enabling an examination of gene-enriched regions containing exons, introns, and

  16. A genome-wide study of recombination rate variation in Bartonella henselae

    Directory of Open Access Journals (Sweden)

    Guy Lionel

    2012-05-01

    Background: Rates of recombination vary by three orders of magnitude in bacteria but the reasons for this variation are unclear. We performed a genome-wide study of recombination rate variation among genes in the intracellular bacterium Bartonella henselae, which has among the lowest estimated ratio of recombination relative to mutation in prokaryotes. Results: The 1.9 Mb genomes of B. henselae strains IC11, UGA10 and Houston-1 showed only minor gene content variation. Nucleotide sequence divergence levels were less than 1% and the relative rate of recombination to mutation was estimated at 1.1 for the genome overall. Four to eight segments per genome presented significantly enhanced divergences, the most pronounced of which were the virB and trw gene clusters for type IV secretion systems that play essential roles in the infection process. Consistently, multiple recombination events were identified inside these gene clusters. High recombination frequencies were also observed for a gene putatively involved in iron metabolism. A phylogenetic study of this gene in 80 strains of Bartonella quintana, B. henselae and B. grahamii indicated different population structures for each species and revealed horizontal gene transfers across Bartonella species with different host preferences. Conclusions: Our analysis has shown little novel gene acquisition in B. henselae, indicative of a closed pan-genome, but higher recombination frequencies within the population than previously estimated. We propose that the dramatically increased fixation rate for recombination events at gene clusters for type IV secretion systems is driven by selection for sequence variability.

  17. Genomic and gene variation in Mycoplasma hominis strains

    DEFF Research Database (Denmark)

    Christiansen, Gunna; Andersen, H; Birkelund, Svend

    1987-01-01

    DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed...... no identity or cluster formation between strains. Variation within M. hominis rRNA genes was analyzed by Southern hybridization of EcoRI-cleaved DNA hybridized with a cloned fragment of the rRNA gene from the mycoplasma strain PG50. Five of the M. hominis strains showed identical hybridization patterns....... These hybridization patterns were compared with those of 12 other mycoplasma species, which showed a much more complex band pattern. Cloned nonribosomal RNA gene fragments of M. hominis PG21 DNA were analyzed, and the fragments were used to demonstrate heterogeneity among the strains. A monoclonal antibody against...

  18. Genome-wide detection of copy number variations among diverse horse breeds by array CGH.

    Science.gov (United States)

    Wang, Wei; Wang, Shenyuan; Hou, Chenglin; Xing, Yanping; Cao, Junwei; Wu, Kaifeng; Liu, Chunxia; Zhang, Dong; Zhang, Li; Zhang, Yanru; Zhou, Huanmin

    2014-01-01

    Recent studies have found that copy number variations (CNVs) are widespread in human and animal genomes. CNVs are a significant source of genetic variation, and have been shown to be associated with phenotypic diversity. However, the effect of CNVs on genetic variation in horses is not well understood. In the present study, CNVs in 6 different breeds of mare horses, Mongolia horse, Abaga horse, Hequ horse and Kazakh horse (all plateau breeds) and Debao pony and Thoroughbred, were determined using aCGH. In total, seven hundred CNVs were identified ranging in size from 6.1 Kb to 0.57 Mb across all autosomes, with an average size of 43.08 Kb and a median size of 15.11 Kb. By merging overlapping CNVs, we found a total of three hundred and fifty-three CNV regions (CNVRs). The length of the CNVRs ranged from 6.1 Kb to 1.45 Mb with average and median sizes of 38.49 Kb and 13.1 Kb. Collectively, 13.59 Mb of copy number variation was identified among the horses investigated and accounted for approximately 0.61% of the horse genome sequence. Five hundred and eighteen annotated genes were affected by CNVs, which corresponded to about 2.26% of all horse genes. Through gene ontology (GO) analysis, genetic pathway analysis and comparison of CNV genes among different breeds, we found evidence that CNVs involving 7 genes may be related to the adaptation of these plateau horses to a severe environment. This study is the first report of copy number variations in Chinese horses, which indicates that CNVs are ubiquitous in the horse genome and influence many biological processes of the horse. These results will be helpful not only in mapping whole-genome CNVs in the horse, but also in further research on the adaptation of plateau horses to the severe high-altitude environment.
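
    The step of merging overlapping CNVs into CNV regions (CNVRs) can be sketched as a standard interval merge. This is an illustrative sketch only: the abstract does not state the overlap criterion used, so the code assumes that two CNVs on the same chromosome are merged whenever they share at least one base (closed coordinates), and the function name `merge_cnvs` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), closed coordinates


def merge_cnvs(cnvs: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping CNV calls into CNV regions (CNVRs), chromosome by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions: List[Interval] = []
    for chrom in sorted(by_chrom):
        merged: List[List[int]] = []
        # Sorting by start lets each call be compared with the last open region only.
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the current region: extend its end if needed.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions
```

    Pooling the CNV calls from all animals and passing them through such a merge yields fewer, generally longer regions, in line with the reduction from 700 CNVs to 353 CNVRs reported above.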

  19. Variation in salamanders: an essay on genomes, development, and evolution.

    Science.gov (United States)

    Brockes, Jeremy P

    2015-01-01

    Regeneration is studied in a few model species of salamanders, but the ten families of salamanders show considerable variation, and this has implications for our understanding of salamander biology. The most recent classification of the families identifies the cryptobranchoidea as the basal group which diverged in the early Jurassic. Variation in the sizes of genomes is particularly obvious, and reflects a major contribution from transposable elements which is already present in the basal group. Limb development has been a focus for evodevo studies, in part because of the variable property of pre-axial dominance which distinguishes salamanders from other tetrapods. This is thought to reflect the selective pressures that operate on a free-living aquatic larva, and might also be relevant for the evolution of limb regeneration. Recent fossil evidence suggests that both pre-axial dominance and limb regeneration were present 300 million years ago in larval temnospondyl amphibians that lived in mountain lakes. A satisfying account of regeneration in salamanders may need to address all these different aspects in the future.

  20. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    Directory of Open Access Journals (Sweden)

    Andersson Jan O

    2010-10-01

    Background: Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results: We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions: Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.

  1. Transposon Insertions, Structural Variations, and SNPs Contribute to the Evolution of the Melon Genome.

    Science.gov (United States)

    Sanseverino, Walter; Hénaff, Elizabeth; Vives, Cristina; Pinosio, Sara; Burgos-Paz, William; Morgante, Michele; Ramos-Onsins, Sebastián E; Garcia-Mas, Jordi; Casacuberta, Josep Maria

    2015-10-01

    The availability of extensive databases of crop genome sequences should allow analysis of crop variability at an unprecedented scale, which should have an important impact in plant breeding. However, up to now the analysis of genetic variability at the whole-genome scale has been mainly restricted to single nucleotide polymorphisms (SNPs). This is a strong limitation as structural variation (SV) and transposon insertion polymorphisms are frequent in plant species and have had an important mutational role in crop domestication and breeding. Here, we present the first comprehensive analysis of melon genetic diversity, which includes a detailed analysis of SNPs, SV, and transposon insertion polymorphisms. The variability found among seven melon varieties, representing the species diversity and including wild accessions and highly bred lines, is relatively high due in part to the marked divergence of some lineages. The diversity is distributed nonuniformly across the genome, being lower at the extremes of the chromosomes and higher in the pericentromeric regions, which is compatible with the effect of purifying selection and recombination forces over functional regions. Additionally, this variability is greatly reduced among elite varieties, probably due to selection during breeding. We have found some chromosomal regions showing a high differentiation of the elite varieties versus the rest, which could be considered as strongly selected candidate regions. Our data also suggest that transposons and SV may be at the origin of an important fraction of the variability in melon, which highlights the importance of analyzing all types of genetic variability to understand crop genome evolution.

  2. Coronavirus Genomics and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Kwok-Yung Yuen

    2010-08-01

    The drastic increase in the number of coronaviruses discovered and coronavirus genomes being sequenced has given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb) among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid) and downstream of the nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera, Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D, exist. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci placed the time of the most recent common ancestor of human/civet SARS-related coronavirus at 1999-2002, with an estimated substitution rate of 4 × 10^-4 to 2 × 10^-2 substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV), between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV) type I and canine coronavirus generating FCoV type II, and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1). Codon usage bias in coronaviruses was observed, with HCoV-HKU1 showing the most extreme bias, and cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape such codon usage bias in coronaviruses.

  3. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.
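
    As an illustration of one of the metrics listed above, Tajima's D can be computed from per-site allele counts using the standard formula. The sketch below is not SVAMP's code (SVAMP is written in Qt/C++); the function name `tajimas_d` and its input format are assumptions, and the alternate-allele counts would in practice be tallied from the genotype columns of a VCF file for one genomic window.

```python
import math
from typing import Optional, Sequence


def tajimas_d(alt_counts: Sequence[int], n: int) -> Optional[float]:
    """Tajima's D from per-site alternate-allele counts among n sampled chromosomes.

    Returns None when D is undefined (fewer than 4 chromosomes or no segregating sites).
    """
    # Keep only segregating sites: both alleles must be present in the sample.
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None

    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from the standard definition of the statistic.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # D contrasts pi with Watterson's estimate S / a1, scaled by its standard deviation.
    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)
```

    A window dominated by rare variants gives a negative value and a window dominated by intermediate-frequency variants gives a positive one, which is what makes the statistic useful to plot along a genome.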

  4. Genome wide copy number analysis of single cells

    Science.gov (United States)

    Baslan, Timour; Kendall, Jude; Rodgers, Linda; Cox, Hilary; Riggs, Mike; Stepansky, Asya; Troge, Jennifer; Ravi, Kandasamy; Esposito, Diane; Lakshmi, B.; Wigler, Michael; Navin, Nicholas; Hicks, James

    2016-01-01

    Summary: Copy number variation (CNV) is increasingly recognized as an important contributor to phenotypic variation in health and disease. Most methods for determining CNV rely on admixtures of cells, where information regarding genetic heterogeneity is lost. Here, we present a protocol that allows for the genome wide copy number analysis of single nuclei isolated from mixed populations of cells. Single nucleus sequencing (SNS) combines flow sorting of single nuclei based on DNA content and whole genome amplification (WGA), followed by next generation sequencing, to quantize genomic intervals in a genome wide manner. Multiplexing of single cells is discussed. Additionally, we outline informatic approaches that correct for biases inherent in the WGA procedure and allow for accurate determination of copy number profiles. Altogether, the protocol takes ~3 days from flow cytometry to sequence-ready DNA libraries. PMID:22555242
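
    The idea of quantizing genomic intervals into integer copy numbers can be illustrated with a deliberately simplified sketch. It is not the published informatic pipeline, which also corrects for WGA biases: the code assumes that read counts have already been tallied per genomic bin, that the median bin sits at the base ploidy (2 by default), and the name `quantize_copy_number` is hypothetical.

```python
from statistics import median
from typing import List, Sequence


def quantize_copy_number(bin_counts: Sequence[float], ploidy: int = 2) -> List[int]:
    """Scale per-bin read counts so the median bin equals `ploidy`, then round to integers."""
    center = median(bin_counts)
    if center <= 0:
        raise ValueError("median bin count must be positive to normalize")
    # A bin with 1.5x the median read count is called as 3 copies when ploidy is 2.
    return [round(ploidy * count / center) for count in bin_counts]
```

    Real single-cell data are noisy enough that neighbouring bins are normally segmented into intervals before rounding; the rounding step alone is shown here only to make the notion of an integer copy number state concrete.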

  5. A genomic overview of short genetic variations in a basal chordate, Ciona intestinalis

    Directory of Open Access Journals (Sweden)

    Satou Yutaka

    2012-05-01

    Background: Although the Ciona intestinalis genome contains many allelic polymorphisms, only limited data have been analyzed systematically. Establishing a dense map of genetic variations in C. intestinalis is necessary not only for linkage analysis, but also for other experimental biology including molecular developmental and evolutionary studies, because animals from natural populations are typically used for experiments. Results: Here, we identified over three million candidate short genomic variations within a 110 Mb euchromatin region among five C. intestinalis individuals. The average nucleotide diversity was approximately 1.1%. Genetic variations were found at a similar density in intergenic and gene regions. Non-synonymous and nonsense nucleotide substitutions were found in 12,493 and 1,214 genes accounting for 81.9% and 8.0% of the entire gene set, respectively, and over 60% of genes in a single animal encode non-identical proteins between maternal and paternal alleles. Conclusions: Our results provide a framework for studying evolution of the animal genome, as well as a useful resource for a wide range of C. intestinalis researchers.

  6. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping.

    Directory of Open Access Journals (Sweden)

    Amaury Vaysse

    2011-10-01

    Full Text Available The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
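The scan for megabase-scale blocks of homozygosity described above can be illustrated with a simple sketch. It assumes sorted SNP positions on one chromosome and unphased genotypes given as allele pairs; the function name, the input layout, and the strict "any heterozygous call ends the block" rule are illustrative assumptions, not the method used in the study.

```python
def homozygous_blocks(positions, genotypes, min_length=1_000_000):
    """Return (start, end) spans of consecutive homozygous SNPs
    covering at least min_length base pairs."""
    blocks = []
    start = None  # position of the first SNP in the current run
    prev = None   # position of the most recent homozygous SNP
    for pos, (a, b) in zip(positions, genotypes):
        if a == b:
            if start is None:
                start = pos
            prev = pos
        else:
            # A heterozygous call closes the current run.
            if start is not None and prev - start >= min_length:
                blocks.append((start, prev))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and prev - start >= min_length:
        blocks.append((start, prev))
    return blocks


# Hypothetical toy genotypes on one chromosome.
positions = [0, 500_000, 1_200_000, 1_300_000, 1_400_000]
genotypes = [("A", "A"), ("G", "G"), ("C", "C"), ("A", "G"), ("T", "T")]
blocks = homozygous_blocks(positions, genotypes)
```

Practical tools usually tolerate a small number of heterozygous or missing calls per window to allow for genotyping error.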

  7. Natural variation in SAR11 marine bacterioplankton genomes inferred from metagenomic data

    Directory of Open Access Journals (Sweden)

    Wilhelm Larry J

    2007-11-01

    Full Text Available Abstract Background One objective of metagenomics is to reconstruct information about specific uncultured organisms from fragmentary environmental DNA sequences. We used the genome of an isolate of the marine alphaproteobacterium SAR11 ('Candidatus Pelagibacter ubique'; strain HTCC1062), obtained from the cold, productive Oregon coast, as a query sequence to study variation in SAR11 metagenome sequence data from the Sargasso Sea, a warm, oligotrophic ocean gyre. Results The average amino acid identity of SAR11 genes encoded by the metagenomic data to the query genome was only 71%, indicating significant evolutionary divergence between the coastal isolates and Sargasso Sea populations. However, an analysis of gene neighbors indicated that SAR11 genes in the Sargasso Sea metagenomic data match the gene order of the HTCC1062 genome in 96% of cases (> 85,000 observations), and that rearrangements are most frequent at predicted operon boundaries. There were no conserved examples of genes with known functions being found in the coastal isolates, but not the Sargasso Sea metagenomic data, or vice versa, suggesting that core regions of these diverse SAR11 genomes are relatively conserved in gene content. However, four hypervariable regions were observed, which may encode properties associated with variation in SAR11 ecotypes. The largest of these, HVR2, is a 48 kb region flanked by the sole 5S and 23S genes in the HTCC1062 genome, and mainly encodes genes that determine cell surface properties. A comparison of two closely related 'Candidatus Pelagibacter' genomes (HTCC1062 and HTCC1002) revealed a number of "gene indels" in core regions. Most of these were found to be polymorphic in the metagenomic data and showed evidence of purifying selection, suggesting that the same "polymorphic gene indels" are maintained in physically isolated SAR11 populations. Conclusion These findings suggest that natural selection has conserved many core features of SAR11

  8. Identification of probable genomic packaging signal sequence from SARS-CoV genome by bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    QIN Lei; XIONG Bin; LUO Cheng; GUO Zong-Ming; HAO Pei; SU Jiong; NAN Peng; FENG Ying; SHI Yi-Xiang; YU Xiao-Jing; LUO Xiao-Min; CHEN Kai-Xian; SHEN Xu; SHEN Jian-Hua; ZOU Jian-Ping; ZHAO Guo-Ping; SHI Tie-Liu; HE Wei-Zhong; ZHONG Yang; JIANG Hua-Liang; LI Yi-Xue

    2003-01-01

    AIM: To predict the probable genomic packaging signal of SARS-CoV by bioinformatics analysis. The derived packaging signal may be used to design antisense RNA and RNA interference (RNAi) drugs for treating SARS. METHODS: Based on studies of the genomic packaging signals of MHV and BCoV, especially the information about primary and secondary structures, the putative genomic packaging signal of SARS-CoV was analyzed using bioinformatic tools. Multiple alignment of the genomic sequences was performed among SARS-CoV, MHV, BCoV, PEDV and HCoV 229E. Secondary structures of RNA sequences were also predicted to identify the possible genomic packaging signals. Meanwhile, the N and M proteins of all five viruses were analyzed to study their evolutionary relationship with the genomic packaging signals. RESULTS: The putative genomic packaging signal of SARS-CoV is located at the 3′ end of ORF1b, near that of MHV and BCoV, which is the most variable region of this gene. The RNA secondary structure of the SARS-CoV genomic packaging signal is very similar to that of MHV and BCoV. The same result was also obtained for the genomic packaging signals of PEDV and HCoV 229E. Furthermore, the genomic sequence multiple alignment indicated that the locations of the packaging signals of SARS-CoV, PEDV, and HCoV overlapped each other. The mutation rate of the packaging signal sequences appears to be much higher than that of the N protein, while the M protein shows only subtle variations. CONCLUSIONS: The probable genomic packaging signal of SARS-CoV is analogous to that of MHV and BCoV, with the corresponding secondary RNA structure located in a similar region of ORF1b. The positions where genomic packaging signals exist have undergone rounds of mutation, which may consequently influence the primary structures of the N and M proteins.

  9. Integrated translational genomics for analysis of complex traits in sorghum

    Science.gov (United States)

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  10. Genetic analysis of environmental variation

    NARCIS (Netherlands)

    Hill, W.G.; Mulder, H.A.

    2010-01-01

    Environmental variation (VE) in a quantitative trait – variation in phenotype that cannot be explained by genetic variation or identifiable genetic differences – can be regarded as being under some degree of genetic control. Such variation may be either between repeated expressions of the same trait

  12. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Setten, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marschall, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Matthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; McCarroll, Steven A.; Beekman, Marian; De Craen, Anton J. M.; Suchiman, H. Eka D.; Hofman, Albert; Oostra, Ben; Uitterlinden, Andre G.; Willemsen, Gonneke; Platteel, Mathieu; Veldink, Jan H.; Van den Berg, Leonard H.; Pitts, Steven J.; Potluri, Shobha; Sundar, Purnima; Cox, David R.; Sunyaev, Shamil R.; Den Dunnen, Johan T.; Stoneking, Mark; De Knijff, Peter; Kayser, Manfred; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Peer, Itsik; Slagboom, P. Eline; Van Duijn, Cornelia M.; Boomsma, Dorret I.; Van Ommen, Gert-Jan B.; De Bakker, Paul I. W.; Swertz, Morris A.; Wijmenga, Cisca

    2014-01-01

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  13. Variation in genomic methylation in natural populations of Chinese white poplar.

    Directory of Open Access Journals (Sweden)

    Kaifeng Ma

    Full Text Available BACKGROUND: It is thought that methylcytosine can be inherited through meiosis and mitosis, and that epigenetic variation may be under genetic control or correlation may be caused by neutral drift. However, DNA methylation also varies with tissue, developmental stage, and environmental factors. Eliminating these factors, we analyzed the levels and patterns, diversity and structure of genomic methylcytosine in the xylem of nine natural populations of Chinese white poplar. PRINCIPAL FINDINGS: On average, the relative total methylation and non-methylation levels were approximately 26.567% and 42.708% (P<0.001), respectively. Also, the relative CNG methylation level was higher than the relative CG methylation level. The relative methylation/non-methylation levels were significantly different among the nine natural populations. Epigenetic diversity ranged from 0.811 (Gansu) to 1.211 (Shaanxi), and the coefficient of epigenetic differentiation (GST = 0.159) was assessed by Shannon's diversity index. Co-inertia analysis indicated that methylation-sensitive polymorphism (MSP) and genomic methylation pattern (CG-CNG) profiles gave similar distributions. Using a between-group eigen analysis, we found that the Hebei and Shanxi populations were independent of each other, but the Henan population intersected with the other populations, to some degree. CONCLUSIONS: Genome methylation in Populus tomentosa presented tissue-specific characteristics and the relative 5'-CCGG methylation level was higher in xylem than in leaves. Meanwhile, the genome methylation in the xylem shows great epigenetic variation and could be fixed and inherited through mitosis. Compared to genetic structure, data suggest that epigenetic and genetic variation do not completely match.
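Shannon's diversity index, mentioned in the abstract above, has the standard form H = -Σ p_i ln p_i over the relative frequencies p_i of the observed categories. The sketch below computes it from raw counts; the function name and the toy counts are hypothetical, and how the study scored methylation patterns into categories is not stated in the abstract.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero categories."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Zero-count categories contribute nothing and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical counts of three methylation pattern classes in one population.
h = shannon_index([30, 50, 20])
```

H is zero when a single category is present and reaches ln(k) when k categories are equally frequent.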

  14. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome

    DEFF Research Database (Denmark)

    Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui

    2011-01-01

    A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries ... affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations. ... these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination ...

  15. Genome-wide analysis correlates Ayurveda Prakriti.

    Science.gov (United States)

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K; Prasanna, B V; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S; Dedge, Amrish P; Bharadwaj, Ramachandra; Gangadharan, G G; Nair, Sreekumaran; Gopinath, Puthiya M; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-10-29

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as "Prakriti". To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10⁻⁵) were significantly different between Prakritis, without any confounding effect of stratification, after 10⁶ permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates their power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India's traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine.

  16. Somatic genomic variations in extra-embryonic tissues

    Energy Technology Data Exchange (ETDEWEB)

    Weier, Jingly F.; Ferlatte, Christy; Weier, Heinz-Ulli G.

    2010-05-21

    In the mature chorion, one of the membranes that exist during pregnancy between the developing fetus and mother, human placental cells form highly specialized tissues composed of mesenchyme and floating or anchoring villi. Using fluorescence in situ hybridization, we found that human invasive cytotrophoblasts isolated from anchoring villi or the uterine wall had gained individual chromosomes; however, chromosome losses were detected infrequently. With chromosomes gained in what appeared to be a chromosome-specific manner, more than half of the invasive cytotrophoblasts in normal pregnancies were found to be hyperdiploid. Interestingly, the rates of hyperdiploid cells depended not only on gestational age, but were strongly associated with the extraembryonic compartment at the fetal-maternal interface from which they were isolated. Since hyperdiploid cells showed drastically reduced DNA replication as measured by bromodeoxyuridine incorporation, we conclude that aneuploidy is a part of the normal process of placentation potentially limiting the proliferative capabilities of invasive cytotrophoblasts. Thus, under the special circumstances of human reproduction, somatic genomic variations may exert a beneficial, anti-neoplastic effect on the organism.

  17. From genomes to pangenomes: understanding variation among individuals and species

    OpenAIRE

    Contreras-Moreira, Bruno; Vinuesa, Pablo

    2017-01-01

    This tutorial illustrates how to analyze pan-genomes using GET_HOMOLOGUES and GET_HOMOLOGUES-EST. After a short introduction, where the main concepts are illustrated, the remaining sections cover the installation and typical operations required to analyze and annotate genomes and transcriptomes from a pan-genome perspective, in which individuals or species contribute genetic material to a pool.

  18. Modelling human regulatory variation in mouse: finding the function in genome-wide association studies and whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Jean-François Schmouth

    Full Text Available An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant-harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation.

  19. HGD-Chn: The Database of Genome Diversity and Variation for Chinese Populations.

    Science.gov (United States)

    Hong-Sheng, Gui; Peng, Zhou; Cheng-Bo, Yang; Sheng-Bin, Li

    2009-04-01

    The Database of Genome Diversity and Variation for Chinese Populations aims at more efficient utilization and sharing of the valuable yet diminishing genetic resources in China (including sample information of healthy populations, healthy pedigrees, disease populations and disease pedigrees; genomic diversity data; disease-related allelic and haplotype data). The database is organized in two parts: (1) Genetic resources of healthy people. A variety of genetic markers (VNTR, STR, SNP, HLA, enzyme markers, etc.) are chosen for their diversity among populations, and their distribution among different ethnic groups in China is stored in the form of allelic frequencies. This also makes possible further analysis and an overall description of the genetic structure of the Chinese population. (2) Disease genetic resources. Four categories are covered: chromosomal diseases, monogenic diseases, polygenic diseases, and birth defects. For each disease, a basic introduction and description, sample information, and allelic data of related genes are provided. Aside from research-oriented information, the website also offers introductory courses for the general public covering genomic diversity and variation, the related experimental techniques, and standards and specifications. Furthermore, a flexible query and submission system with user-friendly interfaces is integrated into the website to simplify user queries and administrators' database maintenance. Online data analysis and management tools were developed using bioinformatics algorithms and programming languages for better interpretation of the biological data.

  20. Genomic instability is associated with natural life span variation in Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Hong Qin

    Full Text Available Increasing genomic instability is associated with aging in eukaryotes, but the connection between genomic instability and natural variation in life span is unknown. We have quantified chronological life span and loss-of-heterozygosity (LOH) in 11 natural isolates of Saccharomyces cerevisiae. We show that genomic instability increases and mitotic asymmetry breaks down during chronological aging. The age-dependent increase of genomic instability generally lags behind the drop of viability and this delay accounts for approximately 50% of the observed natural variation of replicative life span in these yeast isolates. We conclude that the abilities of yeast strains to tolerate genomic instability co-vary with their replicative life spans. To the best of our knowledge, this is the first quantitative evidence that demonstrates a link between genomic instability and natural variation in life span.

  1. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
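One of the alignment-free features named in the abstract above, dinucleotide relative abundance, is commonly defined as the observed dinucleotide frequency divided by the product of its two mononucleotide frequencies. The sketch below computes this ratio on a single strand; the function name is hypothetical, and the strand-symmetrized variant often used in the literature is deliberately omitted for brevity.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq):
    """Map each observed dinucleotide XY to f_XY / (f_X * f_Y)."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    # Overlapping dinucleotides: positions 0-1, 1-2, ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for pair, count in di.items():
        x, y = pair
        f_xy = count / (n - 1)
        rho[pair] = f_xy / ((mono[x] / n) * (mono[y] / n))
    return rho


# Hypothetical toy sequence.
rho = dinucleotide_relative_abundance("ACAC")
```

Values near 1 indicate that a dinucleotide occurs about as often as its base composition predicts; vectors of such ratios can then be fed to principal component or discriminant analysis.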

  2. Genomic variation across the Yellow-rumped Warbler species complex

    OpenAIRE

    Toews, David P.L.; Brelsford, Alan; Grossen, Christine; Milá, Borja; Irwin, Darren E.

    2016-01-01

    Populations that have experienced long periods of geographic isolation will diverge over time. The application of high-throughput sequencing technologies to study the genomes of related taxa now allows us to quantify, at a fine scale, the consequences of this divergence across the genome. Throughout a number of studies, a notable pattern has emerged. In many cases, estimates of differentiation across the genome are strongly heterogeneous; however, the evolutionary processes driving this striki...

  3. Genomic and genic sequence variation in synthetic hexaploid wheat (AABBDD) as compared to their parental species

    Institute of Scientific and Technical Information of China (English)

    Lihong Nie; Zongfu Han; Lahu Lu; Yingyin Yao; Qixin Sun; Zhongfu Ni

    2008-01-01

    In order to understand the genomic changes during the evolution of hexaploid wheat, two sets of synthetic hexaploid wheat from hybridization between maternal tetraploid wheat (AABB) and paternal diploid goat grass (DD) were used for DNA-AFLP and single strand conformation polymorphism (SSCP) analysis to determine the genomic and genic variation in the synthetic hexaploid wheat. Results indicated that more DNA sequences from the paternal diploid species were eliminated in the synthetic hexaploid wheat than from the maternal tetraploid wheat, suggesting that the genome from the parental species of lower ploidy tends to be eliminated preferentially. However, sequence variation detected by the SSCP procedure was much lower than that detected by DNA-AFLP, which indicated that much less variation occurred in the genic regions of the synthetic hexaploid wheat, and that sequence variations detected by DNA-AFLP could be derived mostly from non-coding regions and repetitive sequences. Our results also indicated that sequence variation in 4 genes can be detected in the hybrid F1, which suggested that this type of sequence variation could result from distant hybridization. It was interesting to note that 3 out of the 4 genes were mapped and clustered on the long arm of chromosome 2D, which indicated that variation in genic sequences in synthetic hexaploid wheat might not be a random process.

  4. Hawaiian Drosophila genomes: size variation and evolutionary expansions.

    Science.gov (United States)

    Craddock, Elysse M; Gall, Joseph G; Jonas, Mark

    2016-02-01

    This paper reports genome sizes of one Hawaiian Scaptomyza and 16 endemic Hawaiian Drosophila species that include five members of the antopocerus species group, one member of the modified mouthpart group, and ten members of the picture wing clade. Genome size expansions have occurred independently multiple times among Hawaiian Drosophila lineages, and have resulted in an over 2.3-fold range of genome sizes among species, with the largest observed in Drosophila cyrtoloma (1C = 0.41 pg). We find evidence that these repeated genome size expansions were likely driven by the addition of significant amounts of heterochromatin and satellite DNA. For example, our data reveal that the addition of seven heterochromatic chromosome arms to the ancestral haploid karyotype, and a remarkable proportion of ~70 % satellite DNA, account for the greatly expanded size of the D. cyrtoloma genome. Moreover, the genomes of 13/17 Hawaiian picture wing species are composed of substantial proportions (22-70 %) of detectable satellites (all but one of which are AT-rich). Our results suggest that in this tightly knit group of recently evolved species, genomes have expanded, in large part, via evolutionary amplifications of satellite DNA sequences in centric and pericentric domains (especially of the X and dot chromosomes), which have resulted in longer acrocentric chromosomes or metacentrics with an added heterochromatic chromosome arm. We discuss possible evolutionary mechanisms that may have shaped these patterns, including rapid fixation of novel expanded genomes during founder-effect speciation.

  5. Comparative genomic analysis of esophageal cancers.

    Science.gov (United States)

    Caygill, Christine P J; Gatenby, Piers A C; Herceg, Zdenko; Lima, Sheila C S; Pinto, Luis F R; Watson, Anthony; Wu, Ming-Shiang

    2014-09-01

    The following, from the 12th OESO World Conference: Cancers of the Esophagus, includes commentaries on comparative genomic analysis of esophageal cancers: genomic polymorphisms, the genetic and epigenetic drivers in esophageal cancers, and the collection of data in the UK Barrett's Oesophagus Registry.

  6. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  7. Structural and functional analysis of rice genome

    Indian Academy of Sciences (India)

    Akhilesh K. Tyagi; Jitendra P. Khurana; Paramjit Khurana; Saurabh Raghuvanshi; Anupama Gaur; Anita Kapur; Vikrant Gupta; Dibyendu Kumar; V. Ravi; Shubha Vij; Parul Khurana; Sulabha Sharma

    2004-04-01

    Rice is an excellent system for plant genomics as it represents a modest size genome of 430 Mb. It feeds more than half the population of the world. Draft sequences of the rice genome, derived by whole-genome shotgun approach at relatively low coverage (4–6 X), were published and the International Rice Genome Sequencing Project (IRGSP) declared high quality (>10 X), genetically anchored, phase 2 level sequence in 2002. In addition, phase 3 level finished sequence of chromosomes 1, 4 and 10 (out of 12 chromosomes of rice) has already been reported by scientists from IRGSP consortium. Various estimates of genes in rice place the number at > 50,000. Already, over 28,000 full-length cDNAs have been sequenced, most of which map to genetically anchored genome sequence. Such information is very useful in revealing novel features of macro- and micro-level synteny of rice genome with other cereals. Microarray analysis is unraveling the identity of rice genes expressing in temporal and spatial manner and should help target candidate genes useful for improving traits of agronomic importance. Simultaneously, functional analysis of rice genome has been initiated by marker-based characterization of useful genes and employing functional knock-outs created by mutation or gene tagging. Integration of this enormous information is expected to catalyze tremendous activity on basic and applied aspects of rice genomics.

  8. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola Cremonesi

    2013-01-01

    Full Text Available The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  9. TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data.

    Science.gov (United States)

    Bouaoun, Liacine; Sonkin, Dmitriy; Ardin, Maude; Hollstein, Monica; Byrnes, Graham; Zavadil, Jiri; Olivier, Magali

    2016-09-01

    TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that IARC has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA-binding domain in specific cancer types. We also provide an update on TP53 inherited variants including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer as well as inform users on the efficient use of the IARC TP53 Database.

  10. VIGoR: Variational Bayesian Inference for Genome-Wide Regression

    Directory of Open Access Journals (Sweden)

    Akio Onogi

    2016-04-01

    Genome-wide regression using a number of genome-wide markers as predictors is now widely used for genome-wide association mapping and genomic prediction. We developed novel software for genome-wide regression, which we named VIGoR (variational Bayesian inference for genome-wide regression). Variational Bayesian inference is computationally much faster than widely used Markov chain Monte Carlo algorithms. VIGoR implements seven regression methods, and is provided as a command line program package for Linux/Mac, and as a cross-platform R package. In addition to model fitting, cross-validation and hyperparameter tuning using cross-validation can be automatically performed by modifying a single argument. VIGoR is available at https://github.com/Onogi/VIGoR. The R package is also available at https://cran.r-project.org/web/packages/VIGoR/index.html.

  11. Genome Size in North American Fireflies: Substantial Variation Likely Driven by Neutral Processes

    Science.gov (United States)

    Johnston, J. Spencer; Stanger-Hall, Kathrin F.; Hjelmen, Carl E.; Hanrahan, Shawn J.; Korunes, Katharine; Hall, David

    2017-01-01

    Eukaryotic genomes show tremendous size variation across taxa. Proximate explanations for genome size variation include differences in ploidy and amounts of noncoding DNA, especially repetitive DNA. Ultimate explanations include selection on physiological correlates of genome size such as cell size, which in turn influence body size, resulting in the often-observed correlation between body size and genome size. In this study, we examined body size and repetitive DNA elements in relationship to the evolution of genome size in North American representatives of a single beetle family, the Lampyridae (fireflies). The 23 species considered represent an excellent study system because of the greater than 5-fold range of genome sizes, documented here using flow cytometry, and the 3-fold range in body size, measured using pronotum width. We also identified common genomic repetitive elements using low-coverage sequencing. We found a positive relationship between genome size and repetitive DNA, particularly retrotransposons. Both genome size and these elements were evolving as expected given phylogenetic relatedness. We also tested whether genome size varied with body size and found no relationship. Together, our results suggest that genome size is evolving neutrally in fireflies. PMID:28541478

  12. G-protein genomic association with normal variation in gray matter density

    NARCIS (Netherlands)

    Chen, J.; Calhoun, V.D.; Arias-Vasquez, A.; Zwiers, M.P.; Hulzen, K. van; Fernandez, G.S.E.; Fisher, S.E.; Franke, B.; Turner, J.A.; Liu, J.

    2015-01-01

    While detecting genetic variations underlying brain structures helps reveal mechanisms of neural disorders, high data dimensionality poses a major challenge for imaging genomic association studies. In this work, we present the application of a recently proposed approach, parallel independent

  13. Transposable element distribution, abundance and role in genome size variation in the genus Oryza.

    Science.gov (United States)

    Zuccolo, Andrea; Sebastian, Aswathy; Talag, Jayson; Yu, Yeisoo; Kim, HyeRan; Collura, Kristi; Kudrna, Dave; Wing, Rod A

    2007-08-29

    The genus Oryza is composed of 10 distinct genome types, 6 diploid and 4 polyploid, and includes the world's most important food crop - rice (Oryza sativa [AA]). Genome size variation in Oryza is more than 3-fold and ranges from 357 Mbp in Oryza glaberrima [AA] to 1283 Mbp in the polyploid Oryza ridleyi [HHJJ]. Because repetitive elements are known to play a significant role in genome size variation, we constructed random sheared small insert genomic libraries from 12 representative Oryza species and conducted a comprehensive study of the repetitive element composition, distribution and phylogeny in this genus. Particular attention was paid to the role played by the most important classes of transposable elements (Long Terminal Repeat Retrotransposons, Long Interspersed Nuclear Elements, helitrons, DNA transposable elements) in shaping these genomes and in contributing to genome size variation. We identified the elements primarily responsible for the most striking genome size variation in Oryza. We demonstrated how Long Terminal Repeat retrotransposons belonging to the same families have proliferated to very different extents in various species. We also showed that the pool of Long Terminal Repeat Retrotransposons is substantially conserved and ubiquitous throughout Oryza, so its origin is ancient and its existence predates the speciation events that originated the genus. Finally, we described the peculiar behavior of repeats in the species Oryza coarctata [HHKK], whose placement in the Oryza genus is controversial. Long Terminal Repeat retrotransposons are the major component of the Oryza genomes analyzed and, along with polyploidization, are the most important contributors to the genome size variation across the Oryza genus. Two families of Ty3-gypsy elements (RIRE2 and Atlantys) account for a significant portion of the genome size variations present in the Oryza genus.

  14. Transposable element distribution, abundance and role in genome size variation in the genus Oryza

    Directory of Open Access Journals (Sweden)

    Collura Kristi

    2007-08-01

    Background: The genus Oryza is composed of 10 distinct genome types, 6 diploid and 4 polyploid, and includes the world's most important food crop – rice (Oryza sativa [AA]). Genome size variation in Oryza is more than 3-fold and ranges from 357 Mbp in Oryza glaberrima [AA] to 1283 Mbp in the polyploid Oryza ridleyi [HHJJ]. Because repetitive elements are known to play a significant role in genome size variation, we constructed random sheared small insert genomic libraries from 12 representative Oryza species and conducted a comprehensive study of the repetitive element composition, distribution and phylogeny in this genus. Particular attention was paid to the role played by the most important classes of transposable elements (Long Terminal Repeat Retrotransposons, Long Interspersed Nuclear Elements, helitrons, DNA transposable elements) in shaping these genomes and in contributing to genome size variation. Results: We identified the elements primarily responsible for the most striking genome size variation in Oryza. We demonstrated how Long Terminal Repeat retrotransposons belonging to the same families have proliferated to very different extents in various species. We also showed that the pool of Long Terminal Repeat Retrotransposons is substantially conserved and ubiquitous throughout Oryza, so its origin is ancient and its existence predates the speciation events that originated the genus. Finally, we described the peculiar behavior of repeats in the species Oryza coarctata [HHKK], whose placement in the Oryza genus is controversial. Conclusion: Long Terminal Repeat retrotransposons are the major component of the Oryza genomes analyzed and, along with polyploidization, are the most important contributors to the genome size variation across the Oryza genus. Two families of Ty3-gypsy elements (RIRE2 and Atlantys) account for a significant portion of the genome size variations present in the Oryza genus.

  15. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.

  16. A high-definition view of functional genetic variation from natural yeast genomes.

    Science.gov (United States)

    Bergström, Anders; Simpson, Jared T; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N; Moses, Alan M; Louis, Edward J; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni

    2014-04-01

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.

  17. A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

    OpenAIRE

    2012-01-01

    Background: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common var...

  18. Microarray comparative genomic hybridisation analysis incorporating genomic organisation, and application to enterobacterial plant pathogens.

    Directory of Open Access Journals (Sweden)

    Leighton Pritchard

    2009-08-01

    Microarray comparative genomic hybridisation (aCGH) provides an estimate of the relative abundance of genomic DNA (gDNA) taken from comparator and reference organisms by hybridisation to a microarray containing probes that represent sequences from the reference organism. The experimental method is used in a number of biological applications, including the detection of human chromosomal aberrations, and in comparative genomic analysis of bacterial strains, but optimisation of the analysis is desirable in each problem domain. We present a method for analysis of bacterial aCGH data that encodes spatial information from the reference genome in a hidden Markov model. This technique is the first such method to be validated in comparisons of sequenced bacteria that diverge at the strain and at the genus level: Pectobacterium atrosepticum SCRI1043 (Pba1043) and Dickeya dadantii 3937 (Dda3937); and Lactococcus lactis subsp. lactis IL1403 and L. lactis subsp. cremoris MG1363. In all cases our method is found to outperform common and widely used aCGH analysis methods that do not incorporate spatial information. This analysis is applied to comparisons between commercially important plant pathogenic soft-rotting enterobacteria (SRE): Pba1043, P. atrosepticum SCRI1039, P. carotovorum 193, and Dda3937. Our analysis indicates that it should not be assumed that hybridisation strength is a reliable proxy for sequence identity in aCGH experiments, and robustly extends the applicability of aCGH to bacterial comparisons at the genus level. Our results in the SRE further provide evidence for a dynamic, plastic 'accessory' genome, revealing major genomic islands encoding gene products that provide insight into, and may play a direct role in determining, variation amongst the SRE in terms of their environmental survival, host range and aetiology, such as phytotoxin synthesis, multidrug resistance, and nitrogen fixation.

  19. CREST maps somatic structural variation in cancer genomes with base-pair resolution.

    Science.gov (United States)

    Wang, Jianmin; Mullighan, Charles G; Easton, John; Roberts, Stefan; Heatley, Sue L; Ma, Jing; Rusch, Michael C; Chen, Ken; Harris, Christopher C; Ding, Li; Holmfeldt, Linda; Payne-Turner, Debbie; Fan, Xian; Wei, Lei; Zhao, David; Obenauer, John C; Naeve, Clayton; Mardis, Elaine R; Wilson, Richard K; Downing, James R; Zhang, Jinghui

    2011-06-12

    We developed 'clipping reveals structure' (CREST), an algorithm that uses next-generation sequencing reads with partial alignments to a reference genome to directly map structural variations at the nucleotide level of resolution. Application of CREST to whole-genome sequencing data from five pediatric T-lineage acute lymphoblastic leukemias (T-ALLs) and a human melanoma cell line, COLO-829, identified 160 somatic structural variations. Experimental validation exceeded 80%, demonstrating that CREST had a high predictive accuracy.
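    The idea of using partially aligned (soft-clipped) reads to locate breakpoints can be illustrated with a minimal Python sketch. This is not the CREST implementation: the function names, the `min_support` threshold, and the toy reads are all hypothetical, and the sketch assumes only that each read is described by its 1-based leftmost aligned position and a SAM-style CIGAR string. It reports reference positions where the aligned part of a read abuts a soft-clipped part and keeps those supported by several reads.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_breakpoints(pos, cigar):
    """Return (side, reference_position) pairs where soft clipping meets aligned bases.

    pos is the 1-based leftmost aligned reference position (the SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) carry no sequence and may flank soft clips, so drop them.
    core = [(n, op) for n, op in ops if op != "H"]
    breakpoints = []
    if core and core[0][1] == "S":
        # Clipped bases precede the first aligned base.
        breakpoints.append(("left", pos))
    # Operations that consume reference bases determine the aligned span.
    ref_len = sum(n for n, op in core if op in "MDN=X")
    if len(core) > 1 and core[-1][1] == "S":
        # Clipped bases follow the last aligned base.
        breakpoints.append(("right", pos + ref_len - 1))
    return breakpoints


def candidate_breakpoints(reads, min_support=2):
    """Keep breakpoints seen in at least min_support reads (hypothetical threshold)."""
    counts = Counter(
        bp for pos, cigar in reads for bp in soft_clip_breakpoints(pos, cigar)
    )
    return sorted(bp for bp, n in counts.items() if n >= min_support)


# Toy reads (invented for illustration): two reads are clipped at the same position.
reads = [(100, "60M40S"), (120, "40M60S"), (300, "100M"), (160, "30S70M")]
print(candidate_breakpoints(reads))
```

    A real caller would additionally assemble the clipped sequences and realign them to find the partner breakpoint; the sketch stops at collecting candidate positions.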

  20. CREST maps somatic structural variation in cancer genomes with base-pair resolution

    OpenAIRE

    2011-01-01

    We developed CREST (Clipping REveals STructure), an algorithm that uses next-generation sequencing reads with partial alignments to a reference genome to directly map structural variations at the nucleotide level of resolution. Application of CREST to whole-genome sequencing data from five pediatric T-lineage acute lymphoblastic leukemias (T-ALLs) and a human melanoma cell line, COLO-829, identified 160 somatic structural variations. Experimental validation exceeded 80% demonstrating that CRE...

  1. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Dang Thanh Hai; Nguyen Dai Thanh; Pham Thi Minh Trang; Le Si Quang; Phan Thi Thu Hang; Dang Cao Cuong; Hoang Kim Phuc; Nguyen Huu Duc; Do Duc Dong; Bui Quang Minh; Pham Bao Son; Le Sy Vinh

    2015-03-01

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length ≥ 300 bp. There were 235 contigs from the child genome of which 199 (84.7%) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.
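    Filtering trio variants by Mendelian inheritance can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes an autosomal diploid site, represents each genotype as a pair of allele strings, and ignores de novo mutations, sex chromosomes, and genotyping error. The function name is hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed by taking one allele
    from the father and one from the mother.

    Each genotype is a pair of alleles, e.g. ("A", "G"); allele order is ignored.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((paternal, maternal))) == child_sorted
        for paternal, maternal in product(father, mother)
    )


# A/A father and G/G mother can only produce an A/G child.
print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))  # True
# A G/G child needs a G from each parent, but the father carries only A.
print(is_mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G")))  # False
```

    Sites failing such a check are usually dominated by sequencing or calling errors, which is why restricting to consistent sites is a common quality filter in trio studies.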

  2. Comparative genomic analysis of eutherian kallikrein genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2017-03-01

    The present study attempted to update and revise the eutherian kallikrein genes implicated in major physiological and pathological processes and in medical molecular diagnostics. Using the eutherian comparative genomic analysis protocol and freely available genomic sequence assemblies, the tests of reliability of eutherian public genomic sequences annotated the most comprehensive curated third-party gene data set of eutherian kallikrein genes, including 121 complete coding sequences among 335 potential coding sequences. The present analysis first described 13 major gene clusters of eutherian kallikrein genes and explained their differential gene expansion patterns. An updated classification and nomenclature of eutherian kallikrein genes was proposed as a new framework for future experiments.

  3. Genome-wide association analyses using electronic health records identify new loci influencing blood pressure variation.

    Science.gov (United States)

    Hoffmann, Thomas J; Ehret, Georg B; Nandakumar, Priyanka; Ranatunga, Dilrini; Schaefer, Catherine; Kwok, Pui-Yan; Iribarren, Carlos; Chakravarti, Aravinda; Risch, Neil

    2017-01-01

    Longitudinal electronic health records on 99,785 Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort individuals provided 1,342,814 systolic and diastolic blood pressure measurements for a genome-wide association study on long-term average systolic, diastolic, and pulse pressure. We identified 39 new loci among 75 genome-wide significant loci (P ≤ 5 × 10⁻⁸), with most replicating in the combined International Consortium for Blood Pressure (ICBP; n = 69,396) and UK Biobank (UKB; n = 152,081) studies. Combining GERA with ICBP yielded 36 additional new loci, with most replicating in UKB. Combining all three studies (n = 321,262) yielded 241 additional genome-wide significant loci, although no replication sample was available for these. All associated loci explained 2.9%, 2.5%, and 3.1% of variation in systolic, diastolic, and pulse pressure, respectively, in GERA non-Hispanic whites. Using multiple blood pressure measurements in GERA doubled the variance explained. A normalized risk score was associated with time to onset of hypertension (hazard ratio = 1.18, P = 8.2 × 10⁻⁴⁵). Expression quantitative trait locus analysis of blood pressure loci showed enrichment in aorta and tibial artery.

  4. A genome wide survey of SNP variation reveals the genetic structure of sheep breeds.

    Directory of Open Access Journals (Sweden)

    James W Kijas

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNPs for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate that sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation, which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNPs able to assign individuals into broad groupings demonstrates that even a small panel of markers may be suitable for applications such as traceability.

  5. Structural and Expressional Variations of the Mitochondrial Genome Conferring the Wild Abortive Type of Cytoplasmic Male Sterility in Rice

    Institute of Scientific and Technical Information of China (English)

    Zhen-Lan Liu; Hong Xu; Jing-Xin Guo; Yao-Guang Liu

    2007-01-01

    The so-called "wild abortive" (WA) type of cytoplasmic male sterility (CMS) derived from a wild rice species Oryza rufipogon has been extensively used for hybrid rice breeding. However, extensive analysis of the structure of the related mitochondrial genome has not been reported, and the CMS-associated gene(s) remain unknown. In this study, we exploited a mitochondrial genome-wide strategy to examine the structural and expressional variations in the mitochondrial genome conferring the CMS. The entire mitochondrial genomes of a CMS-WA line and two normal fertile rice lines were amplified by long polymerase chain reaction into tiling fragments of up to 15.2 kb. Restriction and DNA blotting analyses of these fragments revealed that structural variations occurred in several regions in the WA mitochondrial genome, as compared to those of the fertile lines. All of the amplified fragments covering the entire mitochondrial genome were used as RNA blot probes to examine the mitochondrial expression profile among the CMS-WA and fertile lines. As a result, only two mRNAs were found to be differentially expressed between the CMS-WA and the fertile lines; they were detected by one probe containing the nad5 and orf153 genes and another containing the ribosomal protein gene rpl5, respectively. These mRNAs are proposed to be the candidates for further identification and functional studies of the CMS gene.

  6. Complete chloroplast genomes from apomictic Taraxacum (Asteraceae): Identity and variation between three microspecies

    Science.gov (United States)

    Majeský, Ľuboš; Schwarzacher, Trude; Gornall, Richard; Heslop-Harrison, Pat

    2017-01-01

    Chloroplast DNA sequences show substantial variation between higher plant species, and less variation within species, so are typically excellent markers to investigate evolutionary, population and genetic relationships and phylogenies. We sequenced the plastomes of Taraxacum obtusifrons Markl. (O978); T. stridulum Trávniček ined. (S3); and T. amplum Markl. (A978), three apomictic triploid (2n = 3x = 24) dandelions from the T. officinale agg. We aimed to characterize the variation in plastomes, define relationships and correlations with the apomictic microspecies status, and refine placement of the microspecies in the evolutionary or phylogenetic context of the Asteraceae. The chloroplast genomes of accessions O978 and S3 were identical and 151,322 bp long (where the nuclear genes are known to show variation), while A978 was 151,349 bp long. All three genomes contained 135 unique genes, with an additional copy of the trnF-GGA gene in the LSC region and 20 duplicated genes in the IR region, along with short repeats, the typical major Inverted Repeats (IR1 and IR2, 24,431 bp long), and Large and Small Single Copy regions (LSC 83,889 bp and SSC 18,571 bp in O978). Between the two Taraxacum plastome types, we identified 28 SNPs. The distribution of polymorphisms suggests some parts of the Taraxacum plastome are evolving at a slower rate. There was a hemi-nested inversion in the LSC region that is common to Asteraceae, and an SSC inversion from ndhF to rps15 found only in some Asteraceae lineages. A comparative repeat analysis showed variation between Taraxacum and the phylogenetically close genus Lactuca, with many more direct repeats of 40 bp or more in Lactuca (1% larger plastome than Taraxacum). When individual genes and non-coding regions were used for Asteraceae phylogeny reconstruction, not all showed the same evolutionary scenario, suggesting care is needed for interpretation of relationships if a limited number of markers are used. Studying genotypic diversity in

  7. Complete chloroplast genomes from apomictic Taraxacum (Asteraceae): Identity and variation between three microspecies.

    Science.gov (United States)

    M Salih, Rubar Hussein; Majeský, Ľuboš; Schwarzacher, Trude; Gornall, Richard; Heslop-Harrison, Pat

    2017-01-01

    Chloroplast DNA sequences show substantial variation between higher plant species, and less variation within species, so are typically excellent markers to investigate evolutionary, population and genetic relationships and phylogenies. We sequenced the plastomes of Taraxacum obtusifrons Markl. (O978); T. stridulum Trávniček ined. (S3); and T. amplum Markl. (A978), three apomictic triploid (2n = 3x = 24) dandelions from the T. officinale agg. We aimed to characterize the variation in plastomes, define relationships and correlations with the apomictic microspecies status, and refine placement of the microspecies in the evolutionary or phylogenetic context of the Asteraceae. The chloroplast genomes of accessions O978 and S3 were identical and 151,322 bp long (where the nuclear genes are known to show variation), while A978 was 151,349 bp long. All three genomes contained 135 unique genes, with an additional copy of the trnF-GGA gene in the LSC region and 20 duplicated genes in the IR region, along with short repeats, the typical major Inverted Repeats (IR1 and IR2, 24,431 bp long), and Large and Small Single Copy regions (LSC 83,889 bp and SSC 18,571 bp in O978). Between the two Taraxacum plastome types, we identified 28 SNPs. The distribution of polymorphisms suggests some parts of the Taraxacum plastome are evolving at a slower rate. There was a hemi-nested inversion in the LSC region that is common to Asteraceae, and an SSC inversion from ndhF to rps15 found only in some Asteraceae lineages. A comparative repeat analysis showed variation between Taraxacum and the phylogenetically close genus Lactuca, with many more direct repeats of 40 bp or more in Lactuca (1% larger plastome than Taraxacum). When individual genes and non-coding regions were used for Asteraceae phylogeny reconstruction, not all showed the same evolutionary scenario, suggesting care is needed for interpretation of relationships if a limited number of markers are used. Studying genotypic diversity in

  8. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with or without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of the large number of genomic alterations detected in cancer genomes.

  9. Analysis of copy number variations among diverse cattle breeds

    Science.gov (United States)

    Liu, George E.; Hou, Yali; Zhu, Bin; Cardone, Maria Francesca; Jiang, Lu; Cellamare, Angelo; Mitra, Apratim; Alexander, Leeson J.; Coutinho, Luiz L.; Dell'Aquila, Maria Elena; Gasbarre, Lou C.; Lacalandra, Gianni; Li, Robert W.; Matukumalli, Lakshmi K.; Nonneman, Dan; de A. Regitano, Luciana C.; Smith, Tim P.L.; Song, Jiuzhou; Sonstegard, Tad S.; Van Tassell, Curt P.; Ventura, Mario; Eichler, Evan E.; McDaneld, Tara G.; Keele, John W.

    2010-01-01

    Genomic structural variation is an important and abundant source of genetic and phenotypic variation. Here, we describe the first systematic and genome-wide analysis of copy number variations (CNVs) in modern domesticated cattle using array comparative genomic hybridization (array CGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH). The array CGH panel included 90 animals from 11 Bos taurus, three Bos indicus, and three composite breeds for beef, dairy, or dual purpose. We identified over 200 candidate CNV regions (CNVRs) in total and 177 within known chromosomes, which harbor or are adjacent to gains or losses. These 177 high-confidence CNVRs cover 28.1 megabases or ∼1.07% of the genome. Over 50% of the CNVRs (89/177) were found in multiple animals or breeds and analysis revealed breed-specific frequency differences and reflected aspects of the known ancestry of these cattle breeds. Selected CNVs were further validated by independent methods using qPCR and FISH. Approximately 67% of the CNVRs (119/177) completely or partially span cattle genes and 61% of the CNVRs (108/177) directly overlap with segmental duplications. The CNVRs span about 400 annotated cattle genes that are significantly enriched for specific biological functions, such as immunity, lactation, reproduction, and rumination. Multiple gene families, including ULBP, have gone through ruminant lineage-specific gene amplification. We detected and confirmed marked differences in their CNV frequencies across diverse breeds, indicating that some cattle CNVs are likely to arise independently in breeds and contribute to breed differences. Our results provide a valuable resource beyond microsatellites and single nucleotide polymorphisms to explore the full dimension of genetic variability for future cattle genomic research. PMID:20212021

  10. Genome Size and Variation Analysis of Mango (Mangifera indica L.) Germplasms in Yunnan by Flow Cytometry

    Institute of Scientific and Technical Information of China (English)

    柳觐; 李开雄; 孔广红; 倪书邦

    2015-01-01

    To characterize genome variation in mango (Mangifera indica L.) germplasm from Yunnan, the genome sizes of 35 accessions were measured by flow cytometry and their variation was analyzed. Genome size differed among the accessions: the mean C-value was 0.445110 pg (0.4353177×10^9 bp), the smallest was the semi-cultivated accession YSM-44 from Jinghong (0.434567 pg, 0.4250060×10^9 bp), and the largest was the wild accession YSM-25 from Honghe (0.458679 pg, 0.4485881×10^9 bp). Variation in C-value was greatest among wild germplasm (CV = 1.65%), followed by semi-wild (CV = 1.26%), semi-cultivated (CV = 1.21%) and cultivated germplasm (CV = 0.11%). Plants with genome sizes similar to that of mango are mostly bryophytes, which is consistent with the "C-value paradox". Flow cytometry can therefore measure mango genome size accurately and quickly, and the wild, semi-wild and semi-cultivated mango germplasm of Yunnan is rich in genetic variation, with considerable potential for use in mango breeding.
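The paired pg and bp figures in this record can be checked with a short calculation. The sketch below assumes the widely used conversion 1 pg = 0.978×10^9 bp (the record does not state which factor was used, but this one reproduces its numbers to within rounding). The helper names `pg_to_bp` and `coefficient_of_variation` are illustrative only.

```python
from statistics import mean, stdev

# Assumed conversion factor: 1 pg of DNA = 0.978e9 base pairs.
BP_PER_PG = 0.978e9


def pg_to_bp(c_value_pg: float) -> float:
    """Convert a C-value in picograms to a genome size in base pairs."""
    return c_value_pg * BP_PER_PG


def coefficient_of_variation(values: list[float]) -> float:
    """Return the sample coefficient of variation as a percentage."""
    return stdev(values) / mean(values) * 100.0


# C-values (pg) quoted in the abstract and the bp sizes reported alongside them.
reported = {
    "mean": (0.445110, 0.4353177e9),
    "YSM-44": (0.434567, 0.4250060e9),
    "YSM-25": (0.458679, 0.4485881e9),
}

for name, (pg, bp_reported) in reported.items():
    bp_computed = pg_to_bp(pg)
    print(f"{name}: computed {bp_computed:.0f} bp, reported {bp_reported:.0f} bp")
```

The per-group CV values in the abstract cannot be recomputed here because the individual accession measurements are not given; `coefficient_of_variation` only shows how such a figure would be obtained from a list of C-values.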

  11. Genetic Basis for Spontaneous Hybrid Genome Doubling during Allopolyploid Speciation of Common Wheat Shown by Natural Variation Analyses of the Paternal Species

    Science.gov (United States)

    Matsuoka, Yoshihiro; Nasuda, Shuhei; Ashida, Yasuyo; Nitta, Miyuki; Tsujimoto, Hisashi; Takumi, Shigeo; Kawahara, Taihachi

    2013-01-01

    The complex process of allopolyploid speciation includes various mechanisms ranging from species crosses and hybrid genome doubling to genome alterations and the establishment of new allopolyploids as persisting natural entities. Currently, little is known about the genetic mechanisms that underlie hybrid genome doubling, despite the fact that natural allopolyploid formation is highly dependent on this phenomenon. We examined the genetic basis for the spontaneous genome doubling of triploid F1 hybrids between the direct ancestors of allohexaploid common wheat (Triticum aestivum L., AABBDD genome), namely Triticum turgidum L. (AABB genome) and Aegilops tauschii Coss. (DD genome). An Ae. tauschii intraspecific lineage that is closely related to the D genome of common wheat was identified by population-based analysis. Two representative accessions, one that produces a high-genome-doubling-frequency hybrid when crossed with a T. turgidum cultivar and the other that produces a low-genome-doubling-frequency hybrid with the same cultivar, were chosen from that lineage for further analyses. A series of investigations including fertility analysis, immunostaining, and quantitative trait locus (QTL) analysis showed that (1) production of functional unreduced gametes through nonreductional meiosis is an early step key to successful hybrid genome doubling, (2) first division restitution is one of the cytological mechanisms that cause meiotic nonreduction during the production of functional male unreduced gametes, and (3) six QTLs in the Ae. tauschii genome, most of which likely regulate nonreductional meiosis and its subsequent gamete production processes, are involved in hybrid genome doubling. Interlineage comparisons of Ae. tauschii’s ability to cause hybrid genome doubling suggested an evolutionary model for the natural variation pattern of the trait in which non-deleterious mutations in six QTLs may have important roles. The findings of this study demonstrated that the

  12. Genetic basis for spontaneous hybrid genome doubling during allopolyploid speciation of common wheat shown by natural variation analyses of the paternal species.

    Directory of Open Access Journals (Sweden)

    Yoshihiro Matsuoka

    The complex process of allopolyploid speciation includes various mechanisms ranging from species crosses and hybrid genome doubling to genome alterations and the establishment of new allopolyploids as persisting natural entities. Currently, little is known about the genetic mechanisms that underlie hybrid genome doubling, despite the fact that natural allopolyploid formation is highly dependent on this phenomenon. We examined the genetic basis for the spontaneous genome doubling of triploid F1 hybrids between the direct ancestors of allohexaploid common wheat (Triticum aestivum L., AABBDD genome), namely Triticum turgidum L. (AABB genome) and Aegilops tauschii Coss. (DD genome). An Ae. tauschii intraspecific lineage that is closely related to the D genome of common wheat was identified by population-based analysis. Two representative accessions, one that produces a high-genome-doubling-frequency hybrid when crossed with a T. turgidum cultivar and the other that produces a low-genome-doubling-frequency hybrid with the same cultivar, were chosen from that lineage for further analyses. A series of investigations including fertility analysis, immunostaining, and quantitative trait locus (QTL) analysis showed that (1) production of functional unreduced gametes through nonreductional meiosis is an early step key to successful hybrid genome doubling, (2) first division restitution is one of the cytological mechanisms that cause meiotic nonreduction during the production of functional male unreduced gametes, and (3) six QTLs in the Ae. tauschii genome, most of which likely regulate nonreductional meiosis and its subsequent gamete production processes, are involved in hybrid genome doubling. Interlineage comparisons of Ae. tauschii's ability to cause hybrid genome doubling suggested an evolutionary model for the natural variation pattern of the trait in which non-deleterious mutations in six QTLs may have important roles. The findings of this study demonstrated

  13. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    Science.gov (United States)

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2015-10-30

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  14. Global spectrum of copy number variations reveals genome organizational plasticity and proposes new migration routes.

    Science.gov (United States)

    Veerappa, Avinash M; Vishweswaraiah, Sangeetha; Lingaiah, Kusuma; Murthy, Megha; Suresh, Raviraj V; Manjegowda, Dinesh S; Ramachandra, Nallur B

    2015-01-01

    A global spectrum of CNVs is required to catalog variations and to provide a high-resolution view of the dynamics of genome organization and human migration. In this study, we performed genome-wide genotyping using high-resolution arrays and identified 44,109 CNVs from 1,715 genomes across 12 populations. The study unraveled the force of independent evolutionary dynamics on genome-organizational plasticity across populations. We demonstrated the use of CNVs as a tool to study human migration and identified a second major settlement, establishing new migration routes in addition to existing ones.

  15. Genome Variation Within Triticale in Comparison to its Wheat and Rye Progenitors

    Science.gov (United States)

    Genome variation in the intergeneric wheat-rye hybrid triticale (X Triticosecale Wittmack) has been a puzzle to scientists and plant breeders since the first triticale was synthesized. The existence of unexplained genetic variation in triticale as compared to the parents has been a hindrance to bre...

  16. Copy number variation in Fayoumi and Leghorn chickens analyzed using array comparative genomic hybridization

    NARCIS (Netherlands)

    Abernathy, J.; Li, X.; Jia, X.; Chou, W.; Lamont, S.J.; Crooijmans, R.P.M.A.; Zhou, H.

    2014-01-01

    Copy number variation refers to regions along chromosomes that harbor a type of structural variation, such as duplications or deletions. Copy number variants (CNVs) play a role in many important traits as well as in genetic diversity. Previous analyses of chickens using array comparative genomic hyb

  17. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA...... from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the 'Chicken...... Variation Database' (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information...

  18. Theories of Population Variation in Genes and Genomes

    DEFF Research Database (Denmark)

    Christiansen, Freddy

    as biologists, molecular biologists, breeders, biomathematicians, and biostatisticians. •    Up-to-date treatment of key areas in classical and modern theoretical population genetics •    In-depth coverage of coalescent theory •    Timely discussion of genomic effects of selection •    Inspired by...

  19. Genomic and gene variation in Mycoplasma hominis strains

    DEFF Research Database (Denmark)

    Christiansen, Gunna; Andersen, H; Birkelund, Svend;

    1987-01-01

    DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed no ide...

  20. Structural genomic variation as risk factor for idiopathic recurrent miscarriage

    DEFF Research Database (Denmark)

    Nagirnaja, Liina; Palta, Priit; Kasak, Laura

    2014-01-01

    Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs...

  1. Shifting patterns of natural variation in the nuclear genome of Caenorhabditis elegans

    Directory of Open Access Journals (Sweden)

    Okamoto Kazufusa

    2011-06-01

    Background: Genome-wide analysis of variation within a species can reveal the evolution of fundamental biological processes such as mutation, recombination, and natural selection. We compare genome-wide sequence differences between two independent isolates of the nematode Caenorhabditis elegans (CB4856 and CB4858) and the reference genome (N2). Results: The base substitution pattern when comparing N2 against CB4858 reveals a transition over transversion bias (1.32:1) that is not present in CB4856. In CB4856, there is a significant bias in the direction of base substitution. The frequency of A or T bases in N2 that are G or C bases in CB4856 outnumbers the opposite frequencies for transitions as well as transversions. These differences were not observed in the N2/CB4858 comparison. Similarly, we observed a strong bias for deletions over insertions in CB4856 (1.44:1) that is not present in CB4858. In both CB4856 and CB4858, there is a significant correlation between SNP rate and recombination rate on the autosomes but not on the X chromosome. Furthermore, we identified numerous significant hotspots of variation in the CB4856-N2 comparison. In both CB4856 and CB4858, based on a measure of the strength of selection (ka/ks), all the chromosomes are under negative selection, and in CB4856 there is no difference in the strength of natural selection either between the autosomes and the X or between any of the chromosomes. By contrast, in CB4858, ka/ks values are smaller in the autosomes than in the X chromosome. In addition, in CB4858, ka/ks values differ between chromosomes. Conclusions: The clear bias of deletions over insertions in CB4856 suggests that either the CB4856 genome is becoming smaller or the N2 genome is getting larger. We hypothesize the hotspots found represent alleles that are shared between CB4856 and CB4858 but not N2. Because the ka/ks ratio in the X chromosome is higher than the autosomes on average in CB4858, purifying selection is

  2. Shifting patterns of natural variation in the nuclear genome of Caenorhabditis elegans.

    Science.gov (United States)

    Solorzano, Eleanne; Okamoto, Kazufusa; Datla, Pushpa; Sung, Way; Bergeron, R D; Thomas, W K

    2011-06-16

    Genome wide analysis of variation within a species can reveal the evolution of fundamental biological processes such as mutation, recombination, and natural selection. We compare genome wide sequence differences between two independent isolates of the nematode Caenorhabditis elegans (CB4856 and CB4858) and the reference genome (N2). The base substitution pattern when comparing N2 against CB4858 reveals a transition over transversion bias (1.32:1) that is not present in CB4856. In CB4856, there is a significant bias in the direction of base substitution. The frequency of A or T bases in N2 that are G or C bases in CB4856 outnumber the opposite frequencies for transitions as well as transversions. These differences were not observed in the N2/CB4858 comparison. Similarly, we observed a strong bias for deletions over insertions in CB4856 (1.44:1) that is not present in CB4858. In both CB4856 and CB4858, there is a significant correlation between SNP rate and recombination rate on the autosomes but not on the X chromosome. Furthermore, we identified numerous significant hotspots of variation in the CB4856-N2 comparison. In both CB4856 and CB4858, based on a measure of the strength of selection (ka/ks), all the chromosomes are under negative selection and in CB4856, there is no difference in the strength of natural selection in either the autosomes versus X or between any of the chromosomes. By contrast, in CB4858, ka/ks values are smaller in the autosomes than in the X chromosome. In addition, in CB4858, ka/ks values differ between chromosomes. The clear bias of deletions over insertions in CB4856 suggests that either the CB4856 genome is becoming smaller or the N2 genome is getting larger. We hypothesize the hotspots found represent alleles that are shared between CB4856 and CB4858 but not N2. Because the ka/ks ratio in the X chromosome is higher than the autosomes on average in CB4858, purifying selection is reduced on the X chromosome.
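A transition:transversion ratio such as the 1.32:1 figure quoted above is obtained by classifying each base substitution and dividing the two counts. The following is a minimal sketch of that bookkeeping; the substitution list is made-up toy data, not data from the study, and the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    pair = {ref, alt}
    # Transitions stay within a chemical class (A<->G or C<->T).
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_ratio(substitutions: list[tuple[str, str]]) -> float:
    """Return transitions divided by transversions for (ref, alt) pairs."""
    ts = sum(1 for ref, alt in substitutions
             if classify_substitution(ref, alt) == "transition")
    tv = len(substitutions) - ts
    return ts / tv if tv else float("inf")


# Hypothetical (reference base, isolate base) pairs for illustration only.
toy_substitutions = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ts_tv_ratio(toy_substitutions))  # 3 transitions / 2 transversions = 1.5
```

Because there are twice as many possible transversions as transitions, a ratio near 0.5 would be expected if all substitutions were equally likely, which is why a ratio above 1 is read as a transition bias.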

  3. Mathematical Analysis of Genomic Evolution

    Directory of Open Access Journals (Sweden)

    Cedric Green

    2011-01-01

    Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. The mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the number of advantageous mutations it takes to get human (Homo sapiens) DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms, and estimating the number of mutations it takes to transform yeast (Saccharomyces cerevisiae) DNA into the DNA of a human. We calculate the typical number of mutations occurring annually from the organism's average life span and the average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be determined by accounting for the differences in chromosomal structure and phenomena like horizontal gene transfer.

  4. Theories of Population Variation in Genes and Genomes

    DEFF Research Database (Denmark)

    Christiansen, Freddy

    genetics, while emphasizing the close interplay between theory and empiricism. Traditional topics such as genetic and phenotypic variation, mutation, migration, and linkage are covered and advanced by contemporary coalescent theory, which describes the genealogy of genes in a population, ultimately...

  5. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping

    DEFF Research Database (Denmark)

    Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas;

    2011-01-01

    across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease....... breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary...... to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed...

  6. Mutations in MAPT gene cause chromosome instability and introduce copy number variations widely in the genome.

    Science.gov (United States)

    Rossi, Giacomina; Conconi, Donatella; Panzeri, Elena; Redaelli, Serena; Piccoli, Elena; Paoletta, Laura; Dalprà, Leda; Tagliavini, Fabrizio

    2013-01-01

    In addition to the main function of promoting polymerization and stabilization of microtubules, other roles are being attributed to tau, now considered a multifunctional protein. In particular, previous studies suggest that tau is involved in chromosome stability and genome protection. We performed cytogenetic analysis, including molecular karyotyping, on lymphocytes and fibroblasts from patients affected by frontotemporal lobar degeneration carrying different mutations in the microtubule-associated protein tau gene, to investigate the effects of these mutations on genome stability. Furthermore, we analyzed the response of mutated lymphoblastoid cell lines to genotoxic agents to evaluate the participation of tau in DNA repair systems. We found a significantly higher level of chromosome aberrations in mutated than in control cells. Mutated lymphocytes showed higher percentages of stable lesions, clonal and total aneuploidy (medians: 2 versus 0, p ≪ 0.01; 1.5 versus 0, p ≪ 0.01; 16.5 versus 0, p ≪ 0.01, respectively). Fibroblasts of patients showed higher percentages of stable lesions, structural aberrations and total aneuploidy (medians: 0 versus 0, p = 0.03; 5.8 versus 0, p = 0.02; 26.5 versus 12.6, p ≪ 0.01, respectively). In addition, the in-depth analysis of DNA copy number variations showed a higher tendency to non-allelic homologous recombination in mutated cells. Finally, while our analysis did not support an involvement of tau in DNA repair systems, it revealed its role in stabilization of chromatin. In summary, our findings indicate a role of tau in genome and chromosome stability that can be ascribed to its function as a microtubule-associated protein as well as a protein protecting chromatin integrity through interaction with DNA.

  7. Cyanobacteria Maintain Constant Protein Concentration despite Genome Copy-Number Variation.

    Science.gov (United States)

    Zheng, Xiao-Yu; O'Shea, Erin K

    2017-04-18

    The cyanobacterium Synechococcus elongatus PCC 7942 has multiple copies of its single chromosome, and the copy number varies in individual cells, providing an ideal system to study the effect of genome copy-number variation on cell size and gene expression. Using single-cell fluorescence imaging, we found that protein concentration remained constant across individual cells regardless of genome copy number. Cell volume and the total protein amount from a single gene were both positively, linearly correlated with genome copy number, suggesting that changes in cell volume play an important role in buffering genome copy-number variance. This study provides a quantitative examination of gene expression regulation in cells with variable genome copies and sheds light on the compensation mechanisms for variance in genome copy number. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  8. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    Science.gov (United States)

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest to understand somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer.

  9. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    Science.gov (United States)

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in Cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1,523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  10. Genomic variation in the porcine immunoglobulin lambda variable region.

    Science.gov (United States)

    Guo, Xi; Schwartz, John C; Murtaugh, Michael P

    2016-04-01

    Production of a vast antibody repertoire is essential for the protection against pathogens. Variable region germline complexity contributes to repertoire diversity and is a standard feature of mammalian immunoglobulin loci, but functional V region genes are limited in swine. For example, the porcine lambda light chain locus is composed of 23 variable (V) genes and 4 joining (J) genes, but only 10 or 11 V and 2 J genes are functional. Allelic variation in V and J may increase overall diversity within a population, yet lead to repertoire holes in individuals lacking key alleles. Previous studies focused on heavy chain genetic variation, thus light chain allelic diversity is not known. We characterized allelic variation of the porcine immunoglobulin lambda variable (IGLV) region genes. All intact IGLV genes in 81 pigs were amplified, sequenced, and analyzed to determine their allelic variation and functionality. We observed mutational variation across the entire length of the IGLV genes, in both framework and complementarity determining regions (CDRs). Three recombination hotspot motifs were also identified suggesting that non-allelic homologous recombination is an evolutionarily alternative mechanism for generating germline antibody diversity. Functional alleles were greatest in the most highly expressed families, IGLV3 and IGLV8. At the population level, allelic variation appears to help maintain the potential for broad antibody repertoire diversity in spite of reduced gene segment choices and limited germline sequence modification. The trade-off may be a reduction in repertoire diversity within individuals that could result in an increased variation in immunity to infectious disease and response to vaccination.

  11. Inheritance and Variation of Genomic DNA Methylation in Diploid and Triploid Pacific Oyster (Crassostrea gigas).

    Science.gov (United States)

    Jiang, Qun; Li, Qi; Yu, Hong; Kong, Lingfeng

    2016-02-01

    DNA methylation is an important epigenetic mechanism that can respond to environmental changes, indicating a potential role in natural selection and adaptation. In order to evaluate an evolutionary role of DNA methylation, it is essential to first gain better insight into its heritability. To address this question, this study investigated DNA methylation variation from parents to offspring in the Pacific oyster Crassostrea gigas using fluorescent-labeled methylation-sensitive amplified polymorphism (F-MSAP) analysis. Most parental methylated loci were stably transmitted to offspring, segregating according to Mendelian expectation. However, methylated loci deviated more often than non-methylated loci, and offspring showed a few de novo methylated loci, indicating DNA methylation changes from parents to offspring. Interestingly, some male-specific methylated loci were found in this study, which might help to explore sex determination in oysters. Besides environmental stimuli, genomic stresses such as polyploidization can also induce methylation changes. This study also compared global DNA methylation level and individual methylated loci between diploid and triploid oysters. Results showed no difference in global methylation state, but a few ploidy-specific loci were detected. DNA methylation variation during polyploidization was less than autonomous methylation variation from parents to offspring.

  12. DNA variation of the mammalian major histocompatibility complex reflects genomic diversity and population history

    Energy Technology Data Exchange (ETDEWEB)

    Yuhki, Naoya; O'Brien, S.J. (National Cancer Institute, Frederick, MD (USA))

    1990-01-01

    The major histocompatibility complex (MHC) is a multigene complex of tightly linked homologous genes that encode cell surface antigens that play a key role in immune regulation and response to foreign antigens. In most species, MHC gene products display extreme antigenic polymorphism, and their variability has been interpreted to reflect an adaptive strategy for accommodating rapidly evolving infectious agents that periodically afflict natural populations. Determination of the extent of MHC variation has been limited to populations in which skin grafting is feasible or for which serological reagents have been developed. The authors present here a quantitative analysis of restriction fragment length polymorphism of MHC class I genes in several mammalian species (cats, rodents, humans) known to have very different levels of genetic diversity based on functional MHC assays and on allozyme surveys. When homologous class I probes were employed, a notable concordance was observed between the extent of MHC restriction fragment variation and functional MHC variation detected by skin grafts or genome-wide diversity estimated by allozyme screens. These results confirm the genetically depauperate character of the African cheetah, Acinonyx jubatus, and the Asiatic lion, Panthera leo persica; further, they support the use of class I MHC molecular reagents in estimating the extent and character of genetic diversity in natural populations.

  13. A Distance Measure for Genome Phylogenetic Analysis

    Science.gov (United States)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirement. Another class of tree construction methods, namely distance-based methods, requires a measure of distances between any two genomes. Some measures such as evolutionary edit distance of gene order and gene content are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
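
    The idea of a compression-based genome distance can be illustrated with a much simpler stand-in for the expert model named in this abstract. The sketch below computes the normalized compression distance with the general-purpose zlib compressor; this is a swapped-in technique chosen for illustration, not the authors' measure, and the function names and toy sequences are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share content."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data (assumption): a random reference, a lightly mutated copy,
# and an unrelated random sequence of the same length.
rng = random.Random(1)
ref = bytes(rng.choice(b"ACGT") for _ in range(2000))
near = bytearray(ref)
for i in range(0, len(near), 200):
    # Substitute one base every 200 positions.
    near[i] = ord("A") if near[i] != ord("A") else ord("C")
near = bytes(near)
far = bytes(rng.choice(b"ACGT") for _ in range(2000))

d_near = ncd(ref, near)
d_far = ncd(ref, far)
```

    In this toy setting the mutated copy should score closer to the reference than the unrelated sequence does. A pairwise matrix of such distances is the kind of input that distance-based tree construction methods take.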

  14. Identification of genomic indels and structural variations using split reads

    Directory of Open Access Journals (Sweden)

    Urban Alexander E

    2011-07-01

    Background: Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results: We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions: Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole
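
    The calibration quantities mentioned in this abstract, sensitivity and positive predictive value, can be sketched as follows for a simulated truth set. The event representation, the function names, and the way the two rates are combined into a corrected count are assumptions made for illustration; the abstract does not give the authors' exact procedure.

```python
def calibrate(truth: set, calls: set) -> tuple[float, float]:
    """Return (sensitivity, PPV) of a call set against a simulated truth set.

    Events are hashable items, e.g. (chromosome, position, length) tuples.
    """
    true_pos = len(truth & calls)
    sensitivity = true_pos / len(truth) if truth else 0.0
    ppv = true_pos / len(calls) if calls else 0.0
    return sensitivity, ppv


def corrected_count(observed: int, sensitivity: float, ppv: float) -> float:
    """Assumed correction: keep the fraction of calls expected to be real (PPV),
    then scale up for the real events expected to be missed (1 / sensitivity)."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return observed * ppv / sensitivity


# Hypothetical simulated deletions: four true events, four calls, three correct.
truth = {("chr1", 100, 5), ("chr1", 250, 12), ("chr1", 900, 3), ("chr1", 1500, 40)}
calls = {("chr1", 100, 5), ("chr1", 250, 12), ("chr1", 900, 3), ("chr1", 2000, 7)}

sens, ppv = calibrate(truth, calls)
estimate = corrected_count(1000, sens, ppv)
```

    Running the calibration separately per event class (for instance, per length bin) would give class-specific rates, in the spirit of the different scenarios the abstract describes.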

  15. Fixed point theory, variational analysis, and optimization

    CERN Document Server

    Al-Mezel, Saleh Abdullah R; Ansari, Qamrul Hasan

    2015-01-01

    "There is a real need for this book. It is useful for people who work in areas of nonlinear analysis, optimization theory, variational inequalities, and mathematical economics." -Nan-Jing Huang, Sichuan University, Chengdu, People's Republic of China

  16. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

    Directory of Open Access Journals (Sweden)

    Osval A. Montesinos-López

    2017-06-01

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments.

  17. Patterns of genomic variation in the poplar rust fungus Melampsora larici-populina identify pathogenesis-related factors

    Directory of Open Access Journals (Sweden)

    Antoine Persoons

    2014-09-01

    Melampsora larici-populina is a fungal pathogen responsible for foliar rust disease on poplar trees, which causes damage to forest plantations worldwide, particularly in Northern Europe. The reference genome of the isolate 98AG31 was previously sequenced using a whole genome shotgun strategy, revealing a large genome of 101 megabases containing 16,399 predicted genes, which included secreted protein genes representing poplar rust candidate effectors. In the present study, the genomes of 15 isolates collected over the past 20 years throughout the French territory, representing distinct virulence profiles, were characterized by massively parallel sequencing to assess genetic variation in the poplar rust fungus. Comparison to the reference genome revealed striking structural variations. Analysis of coverage and sequencing depth identified large missing regions between isolates related to the mating type loci. More than 611,824 single-nucleotide polymorphism (SNP) positions were uncovered overall, indicating a remarkable level of polymorphism. Based on the accumulation of non-synonymous substitutions in coding sequences and the relative frequencies of synonymous and non-synonymous polymorphisms (i.e. PN/PS), we identify candidate genes that may be involved in fungal pathogenesis. Correlation between non-synonymous SNPs in genes encoding secreted proteins and pathotypes of the studied isolates revealed candidate genes potentially related to virulences 1, 6 and 8 of the poplar rust fungus.

  18. Comparative Genome Analysis in the Integrated Microbial Genomes (IMG) System

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries and occurrence profile tools for comparing genes, pathways and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  19. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan;

    2014-01-01

    than 1 kb. Excluding the 59 SVs (54 insertions/deletions, 5 inversions) that overlap with N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, and 396 of them (60%) were verified by paired-end data from whole-genome sequencing-based re-sequencing or de novo assembly sequence from...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  20. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  1. Nuclear genomic control of naturally occurring variation in mitochondrial function in Drosophila melanogaster.

    Science.gov (United States)

    Jumbo-Lucioni, Patricia; Bu, Su; Harbison, Susan T; Slaughter, Juanita C; Mackay, Trudy F C; Moellering, Douglas R; De Luca, Maria

    2012-11-22

    Mitochondria are organelles found in nearly all eukaryotic cells that play a crucial role in cellular survival and function. Mitochondrial function is under the control of nuclear and mitochondrial genomes. While the latter has been the focus of most genetic research, we remain largely ignorant about the nuclear-encoded genomic control of inter-individual variability in mitochondrial function. Here, we used Drosophila melanogaster as our model organism to address this question. We quantified mitochondrial state 3 and state 4 respiration rates and P:O ratio in mitochondria isolated from the thoraces of 40 sequenced inbred lines of the Drosophila Genetic Reference Panel. We found significant within-population genetic variability for all mitochondrial traits. Hence, we performed genome-wide association mapping and identified 141 single nucleotide polymorphisms (SNPs) associated with differences in mitochondrial respiration and efficiency (P ≤ 1 × 10^-5). Gene-centered regression models showed that 2-3 SNPs can explain 31, 13, and 18% of the phenotypic variation in state 3, state 4, and P:O ratio, respectively. Most of the genes tagged by the SNPs are involved in organ development, second messenger-mediated signaling pathways, and cytoskeleton remodeling. One of these genes, sallimus (sls), encodes a component of the muscle sarcomere. We confirmed the direct effect of sls on mitochondrial respiration using two viable mutants and their coisogenic wild-type strain. Furthermore, correlation network analysis revealed that sls functions as a transcriptional hub in a co-regulated module associated with mitochondrial respiration and is connected to CG7834, which is predicted to encode a protein with mitochondrial electron transfer flavoprotein activity. This latter finding was also verified in the sls mutants. Our results provide novel insights into the genetic factors regulating natural variation in mitochondrial function in D. melanogaster. The integrative genomic

  2. Worldwide patterns of genomic variation and admixture in gray wolves.

    Science.gov (United States)

    Fan, Zhenxin; Silva, Pedro; Gronau, Ilan; Wang, Shuoguo; Armero, Aitor Serres; Schweizer, Rena M; Ramirez, Oscar; Pollinger, John; Galaverni, Marco; Ortega Del-Vecchyo, Diego; Du, Lianming; Zhang, Wenping; Zhang, Zhihe; Xing, Jinchuan; Vilà, Carles; Marques-Bonet, Tomas; Godinho, Raquel; Yue, Bisong; Wayne, Robert K

    2016-02-01

    The gray wolf (Canis lupus) is a widely distributed top predator and ancestor of the domestic dog. To address questions about wolf relationships to each other and dogs, we assembled and analyzed a data set of 34 canine genomes. The divergence between New and Old World wolves is the earliest branching event and is followed by the divergence of Old World wolves and dogs, confirming that the dog was domesticated in the Old World. However, no single wolf population is more closely related to dogs, supporting the hypothesis that dogs were derived from an extinct wolf population. All extant wolves have a surprisingly recent common ancestry and experienced a dramatic population decline beginning at least ∼30 thousand years ago (kya). We suggest this crisis was related to the colonization of Eurasia by modern human hunter-gatherers, who competed with wolves for limited prey but also domesticated them, leading to a compensatory population expansion of dogs. We found extensive admixture between dogs and wolves, with up to 25% of Eurasian wolf genomes showing signs of dog ancestry. Dogs have influenced the recent history of wolves through admixture and vice versa, potentially enhancing adaptation. Simple scenarios of dog domestication are confounded by admixture, and studies that do not take admixture into account with specific demographic models are problematic. © 2016 Fan et al.; Published by Cold Spring Harbor Laboratory Press.

  3. Analysis of dinucleotide signatures in HIV-1 subtype B genomes

    Indian Academy of Sciences (India)

    Aridaman Pandit; Jyothirmayi Vadlamudi; Somdatta Sinha

    2013-12-01

    Dinucleotide usage is known to vary in the genomes of organisms. The dinucleotide usage profiles or genome signatures are similar for sequence samples taken from the same genome, but are different for taxonomically distant species. This concept of genome signatures has been used to study several organisms including viruses, to elucidate the signatures of evolutionary processes at the genome level. Genome signatures assume greater importance in the case of host–pathogen interactions, where molecular interactions between the two species take place continuously, and can influence their genomic composition. In this study, analyses of whole genome sequences of the HIV-1 subtype B, a retrovirus that caused the global pandemic of AIDS, have been carried out to analyse the variation in genome signatures of the virus from 1983 to 2007. We show statistically significant temporal variations in some dinucleotide patterns, highlighting the selective evolution of the dinucleotide profiles of HIV-1 subtype B, possibly a consequence of host-specific selection.
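
    A dinucleotide genome signature is commonly expressed as an odds ratio, the observed dinucleotide frequency divided by the product of its two mononucleotide frequencies. The sketch below assumes that single-strand formulation; whether the study used exactly this form is not stated in the abstract, and the function names and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq: str) -> dict[str, float]:
    """Odds ratio f(XY) / (f(X) * f(Y)) for each of the 16 dinucleotides."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only unambiguous bases and dinucleotides enter the totals.
    n_mono = sum(mono[b] for b in BASES)
    n_di = sum(di[a + b] for a, b in product(BASES, repeat=2))
    signature = {}
    for a, b in product(BASES, repeat=2):
        if n_di == 0 or mono[a] == 0 or mono[b] == 0:
            # A missing base means the dinucleotide cannot occur; report 0.0.
            signature[a + b] = 0.0
            continue
        f_ab = di[a + b] / n_di
        signature[a + b] = f_ab / ((mono[a] / n_mono) * (mono[b] / n_mono))
    return signature


def signature_distance(s1: dict[str, float], s2: dict[str, float]) -> float:
    """Mean absolute difference between two signatures over all 16 dinucleotides."""
    return sum(abs(s1[k] - s2[k]) for k in s1) / len(s1)


# Toy sequence (assumption): a perfect ACGT repeat, so only AC, CG, GT, TA occur.
toy = dinucleotide_signature("ACGT" * 100)
```

    Values above 1 indicate over-represented dinucleotides and values below 1 under-represented ones. Comparing signatures of genomes sampled in different years with `signature_distance` is one simple way to look at temporal drift.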

  4. A New Biophysical Metric for Interrogating the Information Content in Human Genome Sequence Variation: Proof of Concept

    CERN Document Server

    Lindesay, James; Ricks-Santi, Luisel; Hercules, William; Kurian, Philip; Dunston, Georgia M

    2011-01-01

    Various studies have shown an association between single nucleotide polymorphisms (SNPs) and common disease. We hypothesize that information encoded in the structure of SNP haploblock variation illumines molecular pathways and cellular mechanisms involved in the regulation of host adaptation to the environment. We developed and utilized the normalized information content (NIC), a novel metric based on SNP haploblock variation. We found that all SNP haploblocks with statistically low information content contained putative transcription factor binding sites and microRNA motifs. We were able to translate a biophysical, mathematical measure of common variants into a deeper understanding of the life sciences through analysis of biochemical patterns associated with SNP haploblock variation. We submit that this new metric, NIC, may be useful in decoding the functional significance of common variation in the human genome and in analyzing the regulation of molecular pathways involved in host adaptation to environmenta...

  5. Genomic variation in recently collected maize landraces from Mexico

    Science.gov (United States)

    Arteaga, María Clara; Moreno-Letelier, Alejandra; Mastretta-Yanes, Alicia; Vázquez-Lobo, Alejandra; Breña-Ochoa, Alejandra; Moreno-Estrada, Andrés; Eguiarte, Luis E.; Piñero, Daniel

    2015-01-01

    The present dataset comprises 36,931 SNPs genotyped in 46 maize landraces native to Mexico as well as the teosinte subspecies Zea mays ssp. parviglumis and ssp. mexicana. These landraces were collected directly from farmers mostly between 2006 and 2010. We accompany these data with a short description of the variation within each landrace, as well as maps, principal component analyses and neighbor joining trees showing the distribution of the genetic diversity relative to landrace, geographical features and maize biogeography. High levels of genetic variation were detected for the maize landraces (HE = 0.234 to 0.318; mean 0.311), while slightly lower levels were detected in Zea m. mexicana and Zea m. parviglumis (HE = 0.262 and 0.234, respectively). The distribution of genetic variation was better explained by environmental variables given by the interaction of altitude and latitude than by landrace identity. This dataset is a follow up product of the Global Native Maize Project, an initiative to update the data on Mexican maize landraces and their wild relatives, and to generate information that is necessary for implementing the Mexican Biosafety Law. PMID:26981357
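
    The HE values quoted in this abstract are expected heterozygosities. A minimal sketch of the standard per-locus definition, one minus the sum of squared allele frequencies, averaged across loci, is shown below. The function names and the toy allele counts are hypothetical, and any sample-size correction the authors may have applied is not included.

```python
def expected_heterozygosity(allele_counts: list[int]) -> float:
    """Per-locus HE = 1 - sum(p_i ** 2), where p_i are allele frequencies."""
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("locus has no genotyped alleles")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


def mean_expected_heterozygosity(loci: list[list[int]]) -> float:
    """Average HE over loci, each given as a list of allele counts."""
    return sum(expected_heterozygosity(counts) for counts in loci) / len(loci)


# Toy biallelic SNPs (assumption): one at 50/50 frequency, one fixed.
balanced = expected_heterozygosity([5, 5])
fixed = expected_heterozygosity([10, 0])
average = mean_expected_heterozygosity([[5, 5], [10, 0]])
```

    For a biallelic SNP the per-locus value reduces to 2p(1 - p), so it cannot exceed 0.5, which gives a scale for reading the reported landrace values.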

  6. Genomic variation in recently collected maize landraces from Mexico

    Directory of Open Access Journals (Sweden)

    María Clara Arteaga

    2016-03-01

    The present dataset comprises 36,931 SNPs genotyped in 46 maize landraces native to Mexico as well as the teosinte subspecies Zea mays ssp. parviglumis and ssp. mexicana. These landraces were collected directly from farmers mostly between 2006 and 2010. We accompany these data with a short description of the variation within each landrace, as well as maps, principal component analyses and neighbor joining trees showing the distribution of the genetic diversity relative to landrace, geographical features and maize biogeography. High levels of genetic variation were detected for the maize landraces (HE = 0.234 to 0.318; mean 0.311), while slightly lower levels were detected in Zea m. mexicana and Zea m. parviglumis (HE = 0.262 and 0.234, respectively). The distribution of genetic variation was better explained by environmental variables given by the interaction of altitude and latitude than by landrace identity. This dataset is a follow up product of the Global Native Maize Project, an initiative to update the data on Mexican maize landraces and their wild relatives, and to generate information that is necessary for implementing the Mexican Biosafety Law.

  7. Genomic variation in recently collected maize landraces from Mexico.

    Science.gov (United States)

    Arteaga, María Clara; Moreno-Letelier, Alejandra; Mastretta-Yanes, Alicia; Vázquez-Lobo, Alejandra; Breña-Ochoa, Alejandra; Moreno-Estrada, Andrés; Eguiarte, Luis E; Piñero, Daniel

    2016-03-01

    The present dataset comprises 36,931 SNPs genotyped in 46 maize landraces native to Mexico as well as the teosinte subspecies Zea mays ssp. parviglumis and ssp. mexicana. These landraces were collected directly from farmers mostly between 2006 and 2010. We accompany these data with a short description of the variation within each landrace, as well as maps, principal component analyses and neighbor joining trees showing the distribution of the genetic diversity relative to landrace, geographical features and maize biogeography. High levels of genetic variation were detected for the maize landraces (HE = 0.234 to 0.318; mean 0.311), while slightly lower levels were detected in Zea m. mexicana and Zea m. parviglumis (HE = 0.262 and 0.234, respectively). The distribution of genetic variation was better explained by environmental variables given by the interaction of altitude and latitude than by landrace identity. This dataset is a follow up product of the Global Native Maize Project, an initiative to update the data on Mexican maize landraces and their wild relatives, and to generate information that is necessary for implementing the Mexican Biosafety Law.

  8. Genomic Structure and Variation of Nuclear Factor (Erythroid-Derived 2)-Like 2

    Directory of Open Access Journals (Sweden)

    Hye-Youn Cho

    2013-01-01

    High-density mapping of mammalian genomes has enabled a wide range of genetic investigations including the mapping of polygenic traits, determination of quantitative trait loci, and phylogenetic comparison. Genome sequencing analysis of inbred mouse strains has identified high-density single nucleotide polymorphisms (SNPs) for investigation of complex traits, which has become a useful tool for biomedical research of human disease to alleviate ethical and practical problems of experimentation in humans. Nuclear factor (erythroid-derived 2)-like 2 (NRF2) encodes a key host defense transcription factor. This review describes genetic characteristics of human NRF2 and its homologs in other vertebrate species. NRF2 is evolutionarily conserved and shares sequence homology among species. Compilation of publicly available SNPs and other genetic mutations shows that human NRF2 is highly polymorphic with a mutagenic frequency of 1 per every 72 bp. Functional at-risk alleles and haplotypes have been demonstrated in various human disorders. In addition, other pathogenic alterations including somatic mutations and misregulated epigenetic processes in NRF2 have led to oncogenic cell survival. Comprehensive information from the current review addresses association of NRF2 variation and disease phenotypes and supports the new insights into therapeutic strategies.

  9. Distinct Contributions of Replication and Transcription to Mutation Rate Variation of Human Genomes

    KAUST Repository

    Cui, Peng

    2012-03-23

    Here, we evaluate the contribution of two major biological processes—DNA replication and transcription—to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, a high-resolution replication map of HeLa cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to be located in late-replicating genomic regions, and genes in such regions have a higher SNP density compared to those in early-replicating regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.

  10. The New Genomics: What Molecular Databases Can Tell Us About Human Population Variation and Endocrine Disease.

    Science.gov (United States)

    Rotwein, Peter

    2017-07-01

    Major recent advances in genetics and genomics present unique opportunities for enhancing our understanding of human physiology and disease predisposition. Here I demonstrate how analysis of genomic information can provide new insights into endocrine systems, using the human growth hormone (GH) signaling pathway as an illustrative example. GH is essential for normal postnatal growth in children, and plays important roles in other biological processes throughout life. GH actions are mediated by the GH receptor, primarily via the JAK2 protein tyrosine kinase and the STAT5B transcription factor, and inactivating mutations in this pathway all lead to impaired somatic growth. Variation in GH signaling genes has been evaluated using DNA sequence data from the Exome Aggregation Consortium, a compendium of information from >60,000 individuals. Results reveal many potential missense and other alterations in the coding regions of GH1, GHR, JAK2, and STAT5B, with most changes being uncommon. The total number of different alleles per gene varied by ~threefold, from 101 for GH1 to 338 for JAK2. Several known disease-linked mutations in GH1, GHR, and JAK2 were present but infrequent in the population; however, three amino acid changes in GHR were sufficiently prevalent (~4% to 44% of chromosomes) to suggest that they are not disease causing. Collectively, these data provide new opportunities to understand how genetically driven variability in GH signaling and action may modify human physiology and disease. Copyright © 2017 Endocrine Society.

  11. Intra-specific variation in genome size in maize: cytological and phenotypic correlates

    Science.gov (United States)

    Realini, María Florencia; Poggio, Lidia; Cámara-Hernández, Julián; González, Graciela Esther

    2016-01-01

    Genome size variation accompanies the diversification and evolution of many plant species. Relationships between DNA amount and phenotypic and cytological characteristics form the basis of most hypotheses that ascribe a biological role to genome size. The goal of the present research was to investigate the intra-specific variation in the DNA content in maize populations from Northeastern Argentina and further explore the relationship between genome size and the phenotypic traits seed weight and length of the vegetative cycle. Moreover, cytological parameters such as the percentage of heterochromatin as well as the number, position and sequence composition of knobs were analysed and their relationships with 2C DNA values were explored. The populations analysed presented significant differences in 2C DNA amount, from 4.62 to 6.29 pg, representing 36.15 % of the inter-populational variation. Moreover, intra-populational genome size variation was found, varying from 1.08 to 1.63-fold. Variation in the percentage of knob heterochromatin as well as in the number, chromosome position and sequence composition of the knobs was detected among and within the populations. Although a positive relationship between genome size and the percentage of heterochromatin was observed, a significant correlation was not found. This confirms that other non-coding repetitive DNA sequences are contributing to the genome size variation. Although a positive relationship between DNA amount and seed weight has been reported in a large number of species, this relationship was not found in the populations studied here. The length of the vegetative cycle showed a positive correlation with the percentage of heterochromatin. This result allowed attributing an adaptive effect to heterochromatin since the length of this cycle would be optimized via selection for an appropriate percentage of heterochromatin. PMID:26644343

  12. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion.

    Science.gov (United States)

    Xi, Ruibin; Hadjipanayis, Angela G; Luquette, Lovelace J; Kim, Tae-Min; Lee, Eunjung; Zhang, Jianhua; Johnson, Mark D; Muzny, Donna M; Wheeler, David A; Gibbs, Richard A; Kucherlapati, Raju; Park, Peter J

    2011-11-15

    DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer and confer susceptibility to a variety of human disorders. Array comparative genomic hybridization has been used widely to identify CNVs genome wide, but the next-generation sequencing technology provides an opportunity to characterize CNVs genome wide with unprecedented resolution. In this study, we developed an algorithm to detect CNVs from whole-genome sequencing data and applied it to a newly sequenced glioblastoma genome with a matched control. This read-depth algorithm, called BIC-seq, can accurately and efficiently identify CNVs via minimizing the Bayesian information criterion. Using BIC-seq, we identified hundreds of CNVs as small as 40 bp in the cancer genome sequenced at 10× coverage, whereas we could only detect large CNVs (> 15 kb) in the array comparative genomic hybridization profiles for the same genome. Eighty percent (14/16) of the small variants tested (110 bp to 14 kb) were experimentally validated by quantitative PCR, demonstrating high sensitivity and true positive rate of the algorithm. We also extended the algorithm to detect recurrent CNVs in multiple samples as well as deriving error bars for breakpoints using a Gibbs sampling approach. We propose this statistical approach as a principled yet practical and efficient method to estimate CNVs in whole-genome sequencing data.
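
    The role of the Bayesian information criterion in read-depth segmentation can be sketched with a toy merge decision between two neighboring bins. The sketch assumes a Bernoulli model in which each read in a bin comes from the tumor sample with a bin-specific probability, and it uses only the reads in the two bins for the penalty term; both choices, along with the function names and the tuning parameter `lam`, are simplifying assumptions and not a reproduction of BIC-seq.

```python
import math

Bin = tuple[int, int]  # (tumor read count, control read count)


def bernoulli_loglik(tumor: int, control: int) -> float:
    """Maximized log-likelihood of one bin under a Bernoulli read-origin model."""
    total = tumor + control
    if tumor == 0 or control == 0:
        # The fitted probability is 0 or 1, so the likelihood is 1.
        return 0.0
    p = tumor / total
    return tumor * math.log(p) + control * math.log(1.0 - p)


def bic(bins: list[Bin], lam: float = 1.0) -> float:
    """BIC = -2 * log-likelihood + lam * (number of bins) * log(total reads)."""
    total_reads = sum(t + c for t, c in bins)
    loglik = sum(bernoulli_loglik(t, c) for t, c in bins)
    return -2.0 * loglik + lam * len(bins) * math.log(total_reads)


def prefer_merge(left: Bin, right: Bin, lam: float = 1.0) -> bool:
    """True when merging the two bins gives a lower BIC than keeping them apart."""
    merged = (left[0] + right[0], left[1] + right[1])
    return bic([merged], lam) < bic([left, right], lam)


# Toy bins (assumption): equal tumor/control ratios versus opposite ratios.
same_ratio = prefer_merge((50, 50), (50, 50))
different_ratio = prefer_merge((90, 10), (10, 90))
```

    Bins with the same tumor-to-control ratio are merged because the extra parameter is not worth its penalty, while bins with clearly different ratios stay separate, which would mark a candidate breakpoint between them.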

  13. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  14. Understanding the Causes and Implications of Endothelial Metabolic Variation in Cardiovascular Disease through Genome-Scale Metabolic Modeling

    DEFF Research Database (Denmark)

    McGarrity, Sarah; Halldórsson, Haraldur; Palsson, Sirus

    2016-01-01

    of endothelial cell (EC) metabolism and its connections to cardiovascular disease (CVD) and explore the use of genome-scale metabolic models (GEMs) for integrating metabolic and genomic data. GEMs combine gene expression and metabolic data acting as frameworks for their analysis and, ultimately, afford...... mechanistic understanding of how genetic variation impacts metabolism. We demonstrate how GEMs can be used to investigate CVD-related genetic variation, drug resistance mechanisms, and novel metabolic pathways in ECs. The application of GEMs in personalized medicine is also highlighted. Particularly, we focus...... on the potential of GEMs to identify metabolic biomarkers of endothelial dysfunction and to discover methods of stratifying treatments for CVDs based on individual genetic markers. Recent advances in systems biology methodology, and how these methodologies can be applied to understand EC metabolism in both health...

  15. Genetic variation and population substructure in outbred CD-1 mice: implications for genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Kimberly A Aldinger

    Outbred laboratory mouse populations are widely used in biomedical research. Since little is known about the degree of genetic variation present in these populations, they are not widely used for genetic studies. Commercially available outbred CD-1 mice are drawn from an extremely large breeding population that has accumulated many recombination events, which is desirable for genome-wide association studies. We therefore examined the degree of genome-wide variation within CD-1 mice to investigate their suitability for genetic studies. The CD-1 mouse genome displays patterns of linkage disequilibrium and heterogeneity similar to wild-caught mice. Population substructure and phenotypic differences were observed among CD-1 mice obtained from different breeding facilities. Differences in genetic variation among CD-1 mice from distinct facilities were similar to genetic differences detected between closely related human populations, consistent with a founder effect. This first large-scale genetic analysis of the outbred CD-1 mouse strain provides important considerations for the design and analysis of genetic studies in CD-1 mice.

  16. AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks.

    Science.gov (United States)

    Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M

    2017-01-15

    AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare accessory genomes of a large number of genomic units, both at qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows merging phylogenetic and functional information about the concerned genomes, thus improving the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective to explore the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population.

  17. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    Energy Technology Data Exchange (ETDEWEB)

    Muchero, Wellington [ORNL; Labbe, Jessy L [ORNL; Priya, Ranjan [University of Tennessee, Knoxville (UTK); DiFazio, Steven P [West Virginia University, Morgantown; Tuskan, Gerald A [ORNL

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies on Populus genomic and genetic studies of complex and specialized traits.

  18. A refined model of the genomic basis for phenotypic variation in vertebrate hemostasis.

    Science.gov (United States)

    Ribeiro, Ângela M; Zepeda-Mendoza, M Lisandra; Bertelsen, Mads F; Kristensen, Annemarie T; Jarvis, Erich D; Gilbert, M Thomas P; da Fonseca, Rute R

    2015-06-30

    Hemostasis is a defense mechanism that enhances an organism's survival by minimizing blood loss upon vascular injury. In vertebrates, hemostasis has been evolving with the cardio-vascular and hemodynamic systems over the last 450 million years. Birds and mammals have very similar vascular and hemodynamic systems, thus the mechanism that blocks ruptures in the vasculature is expected to be the same. However, the speed of the process varies across vertebrates, and is particularly slow for birds. Understanding the differences in the hemostasis pathway between birds and mammals, and placing them in perspective to other vertebrates may provide clues to the genetic contribution to variation in blood clotting phenotype in vertebrates. We compiled genomic data corresponding to key elements involved in hemostasis across vertebrates to investigate its genetic basis and understand how it affects fitness. We found that: i) fewer genes are involved in hemostasis in birds compared to mammals; and ii) the largest differences concern platelet membrane receptors and components from the kallikrein-kinin system. We propose that lack of the cytoplasmic domain of the GPIb receptor subunit alpha could be a strong contributor to the prolonged bleeding phenotype in birds. Combined analysis of laboratory assessments of avian hemostasis with the first avian phylogeny based on genomic-scale data revealed that differences in hemostasis within birds are not explained by phylogenetic relationships, but more so by genetic variation underlying components of the hemostatic process, suggestive of natural selection. This work adds to our understanding of the evolution of hemostasis in vertebrates. The overlap with the inflammation, complement and renin-angiotensin (blood pressure regulation) pathways is a potential driver of rapid molecular evolution in the hemostasis network. Comparisons between avian species and mammals allowed us to hypothesize that the observed mammalian innovations might have

  19. Stochastic Power Grid Analysis Considering Process Variations

    CERN Document Server

    Ghanta, Praveen; Panda, Rajendran; Wang, Janet

    2011-01-01

    In this paper, we investigate the impact of interconnect and device process variations on voltage fluctuations in power grids. We consider random variations in the power grid's electrical parameters as spatial stochastic processes and propose a new and efficient method to compute the stochastic voltage response of the power grid. Our approach provides an explicit analytical representation of the stochastic voltage response using orthogonal polynomials in a Hilbert space. The approach has been implemented in a prototype software called OPERA (Orthogonal Polynomial Expansions for Response Analysis). Use of OPERA on industrial power grids demonstrated speed-ups of up to two orders of magnitude. The results also show a significant variation of about ±35% in the nominal voltage drops at various nodes of the power grids and demonstrate the need for variation-aware power grid analysis.

  20. Genome-wide profiling of genetic variation in Agrobacterium-transformed rice plants

    Science.gov (United States)

    Li, Wen-xu; Wu, San-ling; Liu, Yan-hua; Jin, Gu-lei; Zhao, Hai-jun; Fan, Long-jiang; Shu, Qing-yao

    2016-01-01

    Agrobacterium-mediated transformation has been widely used in producing transgenic plants, and was recently used to generate “transgene-clean” targeted genomic modifications coupled with the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas9) system. Although tremendous variation in morphological and agronomic traits, such as plant height, seed fertility, and grain size, was observed in transgenic plants, the underlying mechanisms are not yet well understood, and the types and frequency of genetic variation in transformed plants have not been fully disclosed. To reveal the genome-wide variation in transformed plants, we sequenced the genomes of five independent T0 rice plants using next-generation sequencing (NGS) techniques. Bioinformatics analyses followed by experimental validation revealed the following: (1) in addition to transfer-DNA (T-DNA) insertions, three transformed plants carried heritable plasmid backbone DNA of variable sizes (855–5216 bp) and in different configurations with the T-DNA insertions (linked or apart); (2) each transgenic plant contained an estimated 338–1774 independent genetic variations (single nucleotide variations (SNVs) or small insertion/deletions); and (3) 2–6 new Tos17 insertions were detected in each transformed plant, but no other transposable elements or bacterial genomic DNA. PMID:27921404

  1. Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden

    Science.gov (United States)

    Platzer, Alexander; Zhang, Qingrun; Vilhjálmsson, Bjarni J; Korte, Arthur; Nizhynska, Viktoria; Voronin, Viktor; Korte, Pamela; Sedman, Laura; Mandáková, Terezie; Lysak, Martin A; Seren, Ümit; Hellmann, Ines; Nordborg, Magnus

    2013-01-01

    Despite advances in sequencing, the goal of obtaining a comprehensive view of genetic variation in populations is still far from reached. We sequenced 180 lines of A. thaliana from Sweden to obtain as complete a picture as possible of variation in a single region. Whereas simple polymorphisms in the unique portion of the genome are readily identified, other polymorphisms are not. The massive variation in genome size identified by flow cytometry seems largely to be due to 45S rDNA copy number variation, with lines from northern Sweden having particularly large numbers of copies. Strong selection is evident in the form of long-range linkage disequilibrium (LD), as well as in LD between nearby compensatory mutations. Many footprints of selective sweeps were found in lines from northern Sweden, and a massive global sweep was shown to have involved a 700-kb transposition. PMID:23793030

  2. Analysis of recent segmental duplications in the bovine genome

    Directory of Open Access Journals (Sweden)

    Li Congjun

    2009-12-01

    Background: Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We performed the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus). Using two distinct computational analyses, we estimated that 3.1% (94.4 Mb) of the bovine genome consists of recently duplicated sequences (≥ 1 kb in length, ≥ 90% sequence identity). Similar to other mammalian draft assemblies, almost half (47% of 94.4 Mb) of these sequences have not been assigned to cattle chromosomes. Results: In this study, we provide the first experimental validation of large duplications and briefly compare their distribution on two independent bovine genome assemblies using fluorescent in situ hybridization (FISH). Our analyses suggest that the majority (75-90%) of segmental duplications are organized into local tandem duplication clusters. Along with rodents and carnivores, these results now confidently establish tandem duplications as the most likely mammalian archetypical organization, in contrast to humans and great ape species, which show a preponderance of interspersed duplications. A cross-species survey of duplicated genes and gene families indicated that duplication, positive selection and gene conversion have shaped primates, rodents, carnivores and ruminants to different degrees for their speciation and adaptation. We identified that bovine segmental duplications corresponding to genes are significantly enriched for specific biological functions such as immunity, digestion, lactation and reproduction. Conclusion: Our results suggest that in most mammalian lineages segmental duplications are organized in a tandem configuration. Segmental duplications remain problematic for genome assembly, and we highlight genic regions that require higher quality sequence characterization. This study provides insights into mammalian genome evolution and generates a valuable

  3. Genome size determination in Peronosporales (Oomycota) by Feulgen image analysis.

    Science.gov (United States)

    Voglmayr, H; Greilhuber, J

    1998-12-01

    Genome size was determined, by nuclear Feulgen staining and image analysis, in 46 accessions of 31 species of Peronosporales (Oomycota), including important plant pathogens such as Bremia lactucae, Plasmopara viticola, Pseudoperonospora cubensis, and Pseudoperonospora humuli. The 1C DNA contents ranged from 0.046 pg (45.6 Mb) to 0.163 pg (159.9 Mb). This is 0.041- to 0.144-fold that of Glycine max (soybean, 1C = 1.134 pg), which was used as an internal standard for genome size determination. The linearity of the Feulgen absorbance photometry method over this range was demonstrated by calibration of Aspergillus species (1C = 31-38 Mb) against Glycine, which revealed differences of less than 6% compared to the published CHEF data. The low coefficients of variation (usually between 5 and 10%), repeatability of the results, and compatibility with CHEF data prove the resolution power of Feulgen image analysis. The applicability and limitations of Feulgen photometry are discussed in relation to other methods of genome size determination (CHEF gel electrophoresis, reassociation kinetics, genomic reconstruction) that have been previously applied to Oomycota. Copyright 1998 Academic Press.

  4. Structural genomic variation in childhood epilepsies with complex phenotypes

    DEFF Research Database (Denmark)

    Helbig, Ingo; Swinkels, Marielle E M; Aten, Emmelien

    2014-01-01

    A genetic contribution to a broad range of epilepsies has been postulated, and particularly copy number variations (CNVs) have emerged as significant genetic risk factors. However, the role of CNVs in patients with epilepsies with complex phenotypes is not known. Therefore, we investigated the role...... of CNVs in patients with unclassified epilepsies and complex phenotypes. A total of 222 patients from three European countries, including patients with structural lesions on magnetic resonance imaging (MRI), dysmorphic features, and multiple congenital anomalies, were clinically evaluated and screened...... for CNVs. MRI findings including acquired or developmental lesions and patient characteristics were subdivided and analyzed in subgroups. MRI data were available for 88.3% of patients, of whom 41.6% had abnormal MRI findings. Eighty-eight rare CNVs were discovered in 71 out of 222 patients (31...

  5. Integrative Bayesian network analysis of genomic data.

    Science.gov (United States)

    Ni, Yang; Stingo, Francesco C; Baladandayuthapani, Veerabhadran

    2014-01-01

    Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.
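
    The decomposition mentioned above can be written out explicitly. The notation below is an assumption introduced for illustration (the abstract defines no symbols): x_1, ..., x_p are the measured variables, pa(j) is the parent set of node j in the network, and the Gaussian error term is one common choice for the local linear regressions rather than a detail confirmed by the source.

```latex
p(x_1,\dots,x_p) \;=\; \prod_{j=1}^{p} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr),
\qquad
x_j \;=\; \beta_{j0} + \sum_{k \in \mathrm{pa}(j)} \beta_{jk}\, x_k + \varepsilon_j,
\quad \varepsilon_j \sim \mathcal{N}\bigl(0, \sigma_j^2\bigr).
```

    Because each factor involves only a node and its parents, the p regressions can be fitted separately, which is what makes the approach computationally manageable for genome-wide data.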

  6. Comparative genome analysis of Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  7. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    Directory of Open Access Journals (Sweden)

    Kaas Rolf S

    2012-10-01

    Background: Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results: We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment, and a pan-genome tree, based on gene presence/absence, map the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence and divides the E. coli strains into the observed MLST type clades and also separates defined phylotypes. Conclusion: The results of comparing a large and diverse E. coli dataset support the theory that reliable and good resolution phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.
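
    The distinction between the pan-genome, the 'soft core' (clusters in at least 95% of genomes) and the strict core (clusters in 100% of genomes) can be made concrete with a small sketch. The function name, the input layout (a mapping from genome name to a set of gene-cluster identifiers) and the toy data are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter


def summarize_pangenome(genomes, soft_core_fraction=0.95):
    """Return (pan, soft_core, strict_core) sets of gene-cluster identifiers.

    `genomes` maps a genome name to the set of gene clusters found in it.
    """
    counts = Counter()
    for clusters in genomes.values():
        counts.update(set(clusters))  # count each cluster once per genome
    n = len(genomes)
    pan = set(counts)
    soft_core = {c for c, k in counts.items() if k / n >= soft_core_fraction}
    strict_core = {c for c, k in counts.items() if k == n}
    return pan, soft_core, strict_core


# Toy data: 20 genomes; cluster A is universal, B is missing from one
# genome (19/20 = 95%), and C is present in only half of them.
genomes = {f"g{i}": {"A"} for i in range(20)}
for i in range(19):
    genomes[f"g{i}"].add("B")
for i in range(10):
    genomes[f"g{i}"].add("C")

pan, soft_core, strict_core = summarize_pangenome(genomes)
print(len(pan), sorted(soft_core), sorted(strict_core))
```

    The toy cluster B shows why a soft core tolerates draft-quality assemblies: a gene missed in a single incomplete genome drops out of the strict core but stays in the soft core.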

  8. Chromosomal Copy Number Variation in Saccharomyces pastorianus Is Evidence for Extensive Genome Dynamics in Industrial Lager Brewing Strains.

    Science.gov (United States)

    van den Broek, M; Bolat, I; Nijkamp, J F; Ramos, E; Luttik, M A H; Koopman, F; Geertman, J M; de Ridder, D; Pronk, J T; Daran, J-M

    2015-09-01

    Lager brewing strains of Saccharomyces pastorianus are natural interspecific hybrids originating from the spontaneous hybridization of Saccharomyces cerevisiae and Saccharomyces eubayanus. Over the past 500 years, S. pastorianus has been domesticated to become one of the most important industrial microorganisms. Production of lager-type beers requires a set of essential phenotypes, including the ability to ferment maltose and maltotriose at low temperature, the production of flavors and aromas, and the ability to flocculate. Understanding of the molecular basis of complex brewing-related phenotypic traits is a prerequisite for rational strain improvement. While genome sequences have been reported, the variability and dynamics of S. pastorianus genomes have not been investigated in detail. Here, using deep sequencing and chromosome copy number analysis, we showed that S. pastorianus strain CBS1483 exhibited extensive aneuploidy. This was confirmed by quantitative PCR and by flow cytometry. As a direct consequence of this aneuploidy, a massive number of sequence variants was identified, leading to at least 1,800 additional protein variants in S. pastorianus CBS1483. Analysis of eight additional S. pastorianus strains revealed that the previously defined group I strains showed comparable karyotypes, while group II strains showed large interstrain karyotypic variability. Comparison of three strains with nearly identical genome sequences revealed substantial chromosome copy number variation, which may contribute to strain-specific phenotypic traits. The observed variability of lager yeast genomes demonstrates that systematic linking of genotype to phenotype requires a three-dimensional genome analysis encompassing physical chromosomal structures, the copy number of individual chromosomes or chromosomal regions, and the allelic variation of copies of individual genes.

  9. Phylogeny, rate variation, and genome size evolution of Pelargonium (Geraniaceae).

    Science.gov (United States)

    Weng, Mao-Lun; Ruhlman, Tracey A; Gibby, Mary; Jansen, Robert K

    2012-09-01

    The phylogeny of 58 Pelargonium species was estimated using five plastid markers (rbcL, matK, ndhF, rpoC1, trnL-F) and one mitochondrial gene (nad5). The results confirmed the monophyly of three major clades and four subclades within Pelargonium but also indicate the need to revise some sectional classifications. This phylogeny was used to examine karyotype evolution in the genus: plotting chromosome sizes, numbers and 2C-values indicates that genome size is significantly correlated with chromosome size but not number. Accelerated rates of nucleotide substitution have been previously detected in both plastid and mitochondrial genes in Pelargonium, but sparse taxon sampling did not enable identification of the phylogenetic distribution of these elevated rates. Using the multigene phylogeny as a constraint, we investigated lineage- and locus-specific heterogeneity of substitution rates in Pelargonium for an expanded number of taxa and demonstrated that both plastid and mitochondrial genes have had accelerated substitution rates but with markedly disparate patterns. In the plastid, the exons of rpoC1 have significantly accelerated substitution rates compared to its intron and the acceleration was mainly due to nonsynonymous substitutions. In contrast, the mitochondrial gene, nad5, experienced substantial acceleration of synonymous substitution rates in three internal branches of Pelargonium, but this acceleration ceased in all terminal branches. Several lineages also have dN/dS ratios significantly greater than one for rpoC1, indicating that positive selection is acting on this gene, whereas the accelerated synonymous substitutions in the mitochondrial gene are the result of elevated mutation rates.

  10. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.

    2015-01-01

    Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins.  Results: Here we present the genomic sequen...

  11. European sea bass genome and its variation provide insights into adaptation to euryhalinity and speciation

    Science.gov (United States)

    Tine, Mbaye; Kuhl, Heiner; Gagnaire, Pierre-Alexandre; Louro, Bruno; Desmarais, Erick; Martins, Rute S.T.; Hecht, Jochen; Knaust, Florian; Belkhir, Khalid; Klages, Sven; Dieterich, Roland; Stueber, Kurt; Piferrer, Francesc; Guinand, Bruno; Bierne, Nicolas; Volckaert, Filip A. M.; Bargelloni, Luca; Power, Deborah M.; Bonhomme, François; Canario, Adelino V. M.; Reinhardt, Richard

    2014-01-01

    The European sea bass (Dicentrarchus labrax) is a temperate-zone euryhaline teleost of prime importance for aquaculture and fisheries. This species is subdivided into two naturally hybridizing lineages, one inhabiting the north-eastern Atlantic Ocean and the other the Mediterranean and Black seas. Here we provide a high-quality chromosome-scale assembly of its genome that shows a high degree of synteny with the more highly derived teleosts. We find expansions of gene families specifically associated with ion and water regulation, highlighting adaptation to variation in salinity. We further generate a genome-wide variation map through RAD-sequencing of Atlantic and Mediterranean populations. We show that variation in local recombination rates strongly influences the genomic landscape of diversity within and differentiation between lineages. Comparing predictions of alternative demographic models to the joint allele-frequency spectrum indicates that genomic islands of differentiation between sea bass lineages were generated by varying rates of introgression across the genome following a period of geographical isolation. PMID:25534655

  12. Genomic variation in rice: genesis of highly polymorphic linkage blocks during domestication.

    Directory of Open Access Journals (Sweden)

    Tian Tang

    2006-11-01

    Genomic regions that are unusually divergent between closely related species or racial groups can be particularly informative about the process of speciation or the operation of natural selection. The two sequenced genomes of cultivated Asian rice, Oryza sativa, reveal that at least 6% of the genomes are unusually divergent. Sequencing of ten unlinked loci from the highly divergent regions consistently identified two highly divergent haplotypes with each locus in nearly complete linkage disequilibrium among 25 O. sativa cultivars and 35 lines from six wild species. The existence of two highly divergent haplotypes in high divergence regions in species from all geographical areas (Africa, Asia, and Oceania) was in contrast to the low polymorphism and low linkage disequilibrium that were observed in other parts of the genome, represented by ten reference loci. While several natural processes are likely to contribute to this pattern of genomic variation, domestication may have greatly exaggerated the trend. In this hypothesis, divergent haplotypes that were adapted to different geographical and ecological environments migrated along with humans during the development of domesticated varieties. If true, these high divergence regions of the genome would be enriched for loci that contribute to the enormous range of phenotypic variation observed among domesticated breeds.

  13. Experimental evidence for ecological selection on genome variation in the wild.

    Science.gov (United States)

    Gompert, Zachariah; Comeault, Aaron A; Farkas, Timothy E; Feder, Jeffrey L; Parchman, Thomas L; Buerkle, C Alex; Nosil, Patrik

    2014-03-01

    Understanding natural selection's effect on genetic variation is a major goal in biology, but the genome-scale consequences of contemporary selection are not well known. In a release and recapture field experiment we transplanted stick insects to native and novel host plants and directly measured allele frequency changes within a generation at 186,576 genetic loci. We observed substantial, genome-wide allele frequency changes during the experiment, most of which could be attributed to random mortality (genetic drift). However, we also documented that selection affected multiple genetic loci distributed across the genome, particularly in transplants to the novel host. Host-associated selection affecting the genome acted on both a known colour-pattern trait as well as other (unmeasured) phenotypes. We also found evidence that selection associated with elevation affected genome variation, although our experiment was not designed to test this. Our results illustrate how genomic data can identify previously underappreciated ecological sources and phenotypic targets of selection. © 2013 The Authors. Ecology Letters published by John Wiley & Sons Ltd and CNRS.

  14. Whole-genome sequencing reveals the diversity of cattle copy number variations and multicopy genes

    Science.gov (United States)

    Structural and functional impacts of copy number variations (CNVs) on livestock genomes are not yet well understood. We identified 1853 CNV regions using population-scale sequencing data generated from 75 cattle representing 8 breeds (Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, Romagnol...

  15. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian;

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleoti...

  16. Identification of Nucleotide Variation in Genomes Using Next-Generation Sequencing

    NARCIS (Netherlands)

    Megens, H.J.W.C.; Groenen, M.A.M.

    2012-01-01

    Discovery of genome-wide variation has taken a huge leap forward with the introduction of next-generation sequencing (NGS) technology. Variant discovery requires sampling of a number of haplotypes. This can be either the two haplotypes of a diploid organism or multiple haplotypes in a population. Va

  17. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    Directory of Open Access Journals (Sweden)

    Aurélien Chateigner

    2015-07-01

    Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%). K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential.
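
    The frequency-based classification mentioned in this abstract can be illustrated with a one-dimensional k-means. This is a minimal sketch, not the authors' code: the function name `kmeans_1d`, the deterministic initialisation, and the example frequencies are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centres across the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group.
        updated = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
    return centres, labels


# Hypothetical mutation frequencies (fractions of the population), not data from the study.
frequencies = [0.0003, 0.0005, 0.0004, 0.02, 0.03, 0.025,
               0.3, 0.35, 0.32, 0.9, 0.95, 0.92]
centres, labels = kmeans_1d(frequencies, k=4)
```

    With four well-separated frequency bands, each band ends up in its own category. Real frequency data spanning several orders of magnitude might be clustered on a log scale instead; the abstract does not say which scale was used.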

  18. Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.

    Science.gov (United States)

    Faber-Hammond, Joshua J; Brown, Kim H

    2016-07-01

    The completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2-5% of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. 30,879 contigs are represented in multiple individuals with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10-20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.

  19. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  20. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Science.gov (United States)

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  1. [Homologous simple sequence repeats (SSRs) analysis in tetraploid (AD1) and diploid (A₂, D₅) genomes of Gossypium].

    Science.gov (United States)

    Gaofei, Sun; Shoupu, He; Zhaoe, Pan; Xiongming, Du

    2015-02-01

    Simple sequence repeats (SSRs) are a class of repetitive DNA sequences, which are commonly used for genome analysis. Comparison of the homologous SSRs among different genomes is helpful to understand the evolutionary process in related species. In this study, SSR scanning was performed to investigate their distribution and length variation among the genomes of G. raimondii (D₅), G. arboreum (A₂) and G. hirsutum (AD₁). The results demonstrated that the distribution of SSRs in the A genome was very similar to that in the D genome, while the length variation of homologous SSRs between the A and AD genomes was more conserved than that between the D and AD genomes. Compared with SSRs in the AD genome, the number of SSRs with longer motif length in the A genome was about five times that of those with shorter motif length, while it was about three times in the D genome. This implied that the length variation rates of homologous SSRs between diploid cotton and tetraploid cotton were different during the parallel evolution due to the subgenome fusion, and the motif length of most SSRs in the tetraploid genome tended to become shorter than homologous SSRs in the diploid genome during the process of evolution. This study comprehensively compared the SSRs in three cotton genomes and revealed the significant difference among them, providing a foundation for further evolutionary study of the Gossypium genome.

  2. Copy number variation in the genomes of twelve natural isolates of Caenorhabditis elegans

    Directory of Open Access Journals (Sweden)

    Flibotte Stephane

    2010-01-01

    Background: Copy number variation is an important component of genetic variation in higher eukaryotes. The extent of natural copy number variation in C. elegans is unknown outside of 2 highly divergent wild isolates and the canonical N2 Bristol strain. Results: We have used array comparative genomic hybridization (aCGH) to detect copy number variation in the genomes of 12 natural isolates of Caenorhabditis elegans. Deletions relative to the canonical N2 strain are more common in these isolates than duplications, and indels are enriched in multigene families on the autosome arms. Among the strains in our study, the Hawaiian and Madeiran strains (CB4856 and JU258) carry the largest number of deletions, followed by the Vancouver strain (KR314). Overall we detected 510 different deletions affecting 1136 genes, or over 5% of the genes in the canonical N2 genome. The indels we identified had a median length of 2.7 kb. Since many deletions are found in multiple isolates, deletion loci were used as markers to derive an unrooted tree to estimate genetic relatedness among the strains. Conclusion: Copy number variation is extensive in C. elegans, affecting over 5% of the genes in the genome. The deletions we have detected in natural isolates of C. elegans contribute significantly to the number of deletion alleles available to researchers. The relationships between strains are complex and different regions of the genome possess different genealogies due to recombination throughout the natural history of the species, which may not be apparent in studies utilizing smaller numbers of genetic markers.
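
    Using shared deletions as markers for relatedness amounts to comparing presence/absence profiles between strains. The sketch below computes a simple pairwise mismatch distance of the kind that could feed a tree-building method. It is an illustration only: the strain names come from the abstract, but the deletion calls are invented, and the abstract does not state which distance or tree algorithm the authors used.

```python
from itertools import combinations


def deletion_distances(calls):
    """Pairwise fraction of deletion loci at which two strains differ."""
    distances = {}
    for a, b in combinations(sorted(calls), 2):
        differing = sum(x != y for x, y in zip(calls[a], calls[b]))
        distances[(a, b)] = differing / len(calls[a])
    return distances


# Hypothetical calls: 1 = deletion present relative to N2, 0 = absent.
calls = {
    "CB4856": [1, 1, 0, 1, 0, 1],
    "JU258":  [1, 1, 0, 0, 0, 1],
    "KR314":  [0, 0, 1, 0, 1, 0],
}
distances = deletion_distances(calls)
```

    In this toy input, CB4856 and JU258 differ at one locus out of six, so they would sit close together in any tree built from the resulting matrix.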

  3. The next evolutionary synthesis: from Lamarck and Darwin to genomic variation and systems biology

    Directory of Open Access Journals (Sweden)

    Bard Jonathan BL

    2011-11-01

    The evolutionary synthesis, the standard 20th century view of how evolutionary change occurs, is based on selection, heritable phenotypic variation and a very simple view of genes. It is therefore unable to incorporate two key aspects of modern molecular knowledge: first is the richness of genomic variation, so much more complicated than simple mutation, and second is the opaque relationship between the genotype and its resulting phenotype. Two new and important books shed some light on how we should view evolutionary change now. Evolution: a view from the 21st century by J.A. Shapiro (2011, FT Press Science, New Jersey, USA. pp. 246. $34.99) examines the richness of genomic variation and its implications. Transformations of Lamarckism: from Subtle Fluids to Molecular Biology edited by S.B. Gissis & E. Jablonka (2011, MIT Press, Cambridge, USA. pp. 457) includes some 40 papers that anyone with an interest in the history of evolutionary thought and the relationship between the environment and the genome will want to read. This review discusses both books within the context of contemporary evolutionary thinking and points out that neither really comes to terms with today's key systems-biology question: how does mutation-induced variation in a molecular network generate variation in the resulting phenotype?

  4. Genic intolerance to functional variation and the interpretation of personal genomes.

    Directory of Open Access Journals (Sweden)

    Slavé Petrovski

    A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
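
    The abstract does not give the scoring formula. One plausible reading, assumed here purely for illustration, is to regress each gene's count of functional variants on its count of apparently neutral variants and take the residual as the score: a negative residual means fewer functional variants than expected, that is, a more intolerant gene. The gene names and counts below are hypothetical.

```python
def intolerance_scores(neutral, functional):
    """Residuals of an ordinary least-squares fit of functional on neutral counts."""
    n = len(neutral)
    mean_x = sum(neutral) / n
    mean_y = sum(functional) / n
    sxx = sum((x - mean_x) ** 2 for x in neutral)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(neutral, functional))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed minus expected: negative values flag genes with a deficit
    # of functional variation relative to their neutral variation.
    return [y - (intercept + slope * x) for x, y in zip(neutral, functional)]


# Hypothetical per-gene variant counts, not data from the study.
genes = ["geneA", "geneB", "geneC", "geneD"]
neutral = [10, 20, 30, 40]
functional = [5, 10, 9, 20]
scores = dict(zip(genes, intolerance_scores(neutral, functional)))
most_intolerant = min(scores, key=scores.get)
```

    Ranking genes by such a residual gives an ordering from least to most tolerant, which is the kind of ranking the abstract describes using to prioritise candidate pathogenic mutations.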

  5. Adaptive potential of genomic structural variation in human and mammalian evolution.

    Science.gov (United States)

    Radke, David W; Lee, Charles

    2015-09-01

    Because phenotypic innovations must be genetically heritable for biological evolution to proceed, it is natural to consider new mutation events as well as standing genetic variation as sources for their birth. Previous research has identified a number of single-nucleotide polymorphisms that underlie a subset of adaptive traits in organisms. However, another well-known class of variation, genomic structural variation, could have even greater potential to produce adaptive phenotypes, due to the variety of possible types of alterations (deletions, insertions, duplications, among others) at different genomic positions and with variable lengths. It is from these dramatic genomic alterations, and selection on their phenotypic consequences, that adaptations leading to biological diversification could be derived. In this review, using studies in humans and other mammals, we highlight examples of how phenotypic variation from structural variants might become adaptive in populations and potentially enable biological diversification. Phenotypic change arising from structural variants will be described according to their immediate effect on organismal metabolic processes, immunological response and physical features. Study of population dynamics of segregating structural variation can therefore provide a window into understanding current and historical biological diversification.

  6. Extreme recombination frequencies shape genome variation and evolution in the honeybee, Apis mellifera.

    Directory of Open Access Journals (Sweden)

    Andreas Wallberg

    2015-04-01

    Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5-50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution.

  7. SENSITIVITY ANALYSIS FOR PARAMETERIZED VARIATIONAL INEQUALITY PROBLEMS

    Institute of Scientific and Technical Information of China (English)

    Li Fei

    2004-01-01

    This paper presents sensitivity analysis for parameterized variational inequality problems (VIP). Under appropriate assumptions, it is shown that the perturbed solution to the parameterized VIP exists, is unique, and is continuous and differentiable with respect to the perturbation parameter. In the case of differentiability, we derive the equations for calculating the derivative of the solution variables with respect to the perturbation parameters.

  8. Genomic profiling of plastid DNA variation in the Mediterranean olive tree

    Directory of Open Access Journals (Sweden)

    Dorado Gabriel

    2011-05-01

    Background: Characterisation of plastid genome (or cpDNA) polymorphisms is commonly used for phylogeographic, population genetic and forensic analyses in plants, but detecting cpDNA variation is sometimes challenging, limiting the applications of such an approach. In the present study, we screened cpDNA polymorphism in the olive tree (Olea europaea L.) by sequencing the complete plastid genome of trees with a distinct cpDNA lineage. Our objective was to develop new markers for a rapid genomic profiling (by multiplex PCRs) of cpDNA haplotypes in the Mediterranean olive tree. Results: Eight complete cpDNA genomes of Olea were sequenced de novo. The nucleotide divergence between olive cpDNA lineages was low, not exceeding 0.07%. Based on these sequences, markers were developed for studying two single nucleotide substitutions and length polymorphism of 62 regions (with variable microsatellite motifs or other indels). They were then used to genotype the cpDNA variation in cultivated and wild Mediterranean olive trees (315 individuals). Forty polymorphic loci were detected in this sample, allowing the distinction of 22 haplotypes belonging to the three Mediterranean cpDNA lineages known as E1, E2 and E3. The discriminating power of cpDNA variation was particularly low for the cultivated olive tree with one predominating haplotype, but more diversity was detected in wild populations. Conclusions: We propose a method for a rapid characterisation of the Mediterranean olive germplasm. The low variation in the cultivated olive tree indicated that the utility of cpDNA variation for forensic analyses is limited to rare haplotypes. In contrast, the high cpDNA variation in wild populations demonstrated that our markers may be useful for phylogeographic and population genetic studies in O. europaea.

  9. Copy Number Variation of UGT 2B Genes in Indian Families Using Whole Genome Scans

    Directory of Open Access Journals (Sweden)

    Avinash M. Veerappa

    2016-01-01

    Background and Objectives. Uridine diphospho-glucuronosyltransferase 2B (UGT2B) is a family of genes involved in metabolizing steroid hormones and several other xenobiotics. These UGT2B genes are highly polymorphic in nature and have distinct polymorphisms associated with specific regions around the globe. The copy number variation (CNV) status of UGT2B17 in the Indian population is not known, and its disease associations have been inconclusive. It was therefore of interest to investigate the CNV profile of UGT2B genes. Methods. We investigated the presence of CNVs in UGT2B genes in 31 members from eight Indian families using the Affymetrix Genome-Wide Human SNP Array 6.0 chip. Results. Our data revealed that >50% of the study members carried CNVs in UGT2B genes, of which 76% showed deletion polymorphism. CNVs were observed more in UGT2B17 (76.4%) than in UGT2B15 (17.6%). Molecular network and pathway analysis found enrichment related to steroid metabolic process, carboxylesterase activity, and sequence specific DNA binding. Interpretation and Conclusion. We report the presence of UGT2B gene deletion and duplication polymorphisms in Indian families. Network analysis indicates the substitutive role of other possible genes in the UGT activity. The CNVs of UGT2B genes are very common in individuals, indicating that the effect is neutral in causing any suspected diseases.

  10. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts; genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  11. 1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

    Science.gov (United States)

    Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

    2017-01-01

    The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. 
    A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network

  12. Identification of genome-wide copy number variations among diverse pig breeds using SNP genotyping arrays.

    Directory of Open Access Journals (Sweden)

    Jiying Wang

    Copy number variations (CNVs) are important forms of genetic variation complementary to SNPs, and can be considered as promising markers for some phenotypic and economically important traits or disease susceptibility in domestic animals. In the present study, we performed a genome-wide CNV identification in 14 individuals selected from diverse populations, including six types of Chinese indigenous breeds, one Asian wild boar population, as well as three modern commercial foreign breeds. We identified 63 CNVRs in total, which covered 9.98 Mb of polymorphic sequence and corresponded to 0.36% of the genome sequence. The length of these CNVRs ranged from 3.20 to 827.21 kb, with an average of 158.37 kb and a median of 97.85 kb. Functional annotation revealed that these identified CNVRs have important molecular functions, and may play an important role in exploring the genetic basis of phenotypic variability and disease susceptibility among pigs. Additionally, to confirm these potential CNVRs, we performed qPCR for 12 randomly selected CNVRs and 8 of them (66.67%) were confirmed successfully. The CNVs detected in diverse populations herein are an essential complement to the CNV map of the pig genome, and provide an important resource for studies of genomic variation and the association between various economically important traits and CNVs.

  13. Genetic variation architecture of mitochondrial genome reveals the differentiation in Korean landrace and weedy rice.

    Science.gov (United States)

    Tong, Wei; He, Qiang; Park, Yong-Jin

    2017-03-03

    Mitochondrial genome variations have been detected despite the overall conservation of this gene content, which has been valuable for plant population genetics and evolutionary studies. Here, we describe mitochondrial variation architecture and our performance of a phylogenetic dissection of Korean landrace and weedy rice. A total of 4,717 variations across the mitochondrial genome were identified adjunct with 10 wild rice. Genetic diversity assessment revealed that wild rice has higher nucleotide diversity than landrace and/or weedy, and landrace rice has higher diversity than weedy rice. Genetic distance was suggestive of a high level of breeding between landrace and weedy rice, and the landrace showing a closer association with wild rice than weedy rice. Population structure and principal component analyses showed no obvious difference in the genetic backgrounds of landrace and weedy rice in mitochondrial genome level. Phylogenetic, population split, and haplotype network evaluations were suggestive of independent origins of the indica and japonica varieties. The origin of weedy rice is supposed to be more likely from cultivated rice rather than from wild rice in mitochondrial genome level.

  14. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Héloïse Bastide

    2013-06-01

    Various approaches can be applied to uncover the genetic basis of natural phenotypic variation, each with their specific strengths and limitations. Here, we use a replicated genome-wide association approach (Pool-GWAS) to fine-scale map genomic regions contributing to natural variation in female abdominal pigmentation in Drosophila melanogaster, a trait that is highly variable in natural populations and highly heritable in the laboratory. We examined abdominal pigmentation phenotypes in approximately 8000 female European D. melanogaster, isolating 1000 individuals with extreme phenotypes. We then used whole-genome Illumina sequencing to identify single nucleotide polymorphisms (SNPs) segregating in our sample, and tested these for associations with pigmentation by contrasting allele frequencies between replicate pools of light and dark individuals. We identify two small regions near the pigmentation genes tan and bric-à-brac 1, both corresponding to known cis-regulatory regions, which contain SNPs showing significant associations with pigmentation variation. While the Pool-GWAS approach suffers some limitations, its cost advantage facilitates replication and it can be applied to any non-model system with an available reference genome.
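
    Contrasting allele frequencies between a light pool and a dark pool can be pictured as a test on a 2x2 table of allele counts per SNP. The sketch below computes the frequency difference and a Pearson chi-square statistic for one SNP in one pair of pools. It is an assumption-laden illustration: the abstract does not name the test used, the counts are invented, and a real analysis would also combine evidence across replicate pools.

```python
def allele_contrast(light_alt, light_total, dark_alt, dark_total):
    """Return (frequency difference, Pearson chi-square) for one SNP.

    Assumes both alleles are observed at least once across the two pools,
    otherwise an expected count is zero and the statistic is undefined.
    """
    table = [
        [light_alt, light_total - light_alt],
        [dark_alt, dark_total - dark_alt],
    ]
    row = [sum(r) for r in table]
    col = [table[0][c] + table[1][c] for c in range(2)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    diff = dark_alt / dark_total - light_alt / light_total
    return diff, chi2


# Hypothetical read counts supporting the alternate allele in each pool.
diff, chi2 = allele_contrast(light_alt=20, light_total=100,
                             dark_alt=60, dark_total=100)
```

    Reads from pooled sequencing are not independent draws from individuals, so a statistic like this is best read as a ranking score rather than a calibrated p-value.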

  15. Genetic variation architecture of mitochondrial genome reveals the differentiation in Korean landrace and weedy rice

    Science.gov (United States)

    Tong, Wei; He, Qiang; Park, Yong-Jin

    2017-01-01

    Mitochondrial genome variations have been detected despite the overall conservation of this gene content, which has been valuable for plant population genetics and evolutionary studies. Here, we describe mitochondrial variation architecture and our performance of a phylogenetic dissection of Korean landrace and weedy rice. A total of 4,717 variations across the mitochondrial genome were identified adjunct with 10 wild rice. Genetic diversity assessment revealed that wild rice has higher nucleotide diversity than landrace and/or weedy, and landrace rice has higher diversity than weedy rice. Genetic distance was suggestive of a high level of breeding between landrace and weedy rice, and the landrace showing a closer association with wild rice than weedy rice. Population structure and principal component analyses showed no obvious difference in the genetic backgrounds of landrace and weedy rice in mitochondrial genome level. Phylogenetic, population split, and haplotype network evaluations were suggestive of independent origins of the indica and japonica varieties. The origin of weedy rice is supposed to be more likely from cultivated rice rather than from wild rice in mitochondrial genome level. PMID:28256554

  16. Read clouds uncover variation in complex regions of the human genome.

    Science.gov (United States)

    Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

    2015-10-01

    Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.

  17. Analysis of Random Variation in Subthreshold FGMOSFET

    Directory of Open Access Journals (Sweden)

    Rawid Banchuin

    2016-01-01

    The analysis of random variation in the performance of the Floating Gate Metal Oxide Semiconductor Field Effect Transistor (FGMOSFET), which is an often cited semiconductor-based electronic device, operated in the subthreshold region and defined in terms of its drain current (ID), has been proposed in this research. ID is of interest because it is directly measurable and can be the basis for determining the others. All related manufacturing-process-induced device-level random variations, their statistical correlations, and the low voltage/low power operating condition have been taken into account. The analysis result has been found to be very accurate since it can fit the nanometer-level SPICE BSIM4 based reference with very high accuracy. By using this result, strategies for minimizing variation in ID can be found and the analysis of variation in the circuit-level parameters of any subthreshold FGMOSFET based circuit can be performed. The result of this research has therefore been found to be beneficial to the variability-aware design of subthreshold FGMOSFET based circuits.

  18. Background selection as baseline for nucleotide variation across the Drosophila genome.

    Science.gov (United States)

    Comeron, Josep M

    2014-06-01

    The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. 
    Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future

  19. Background selection as baseline for nucleotide variation across the Drosophila genome.

    Directory of Open Access Journals (Sweden)

    Josep M Comeron

    2014-06-01

    The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and

  20. Resolution of Disease Phenotypes Resulting from Multilocus Genomic Variation.

    Science.gov (United States)

    Posey, Jennifer E; Harel, Tamar; Liu, Pengfei; Rosenfeld, Jill A; James, Regis A; Coban Akdemir, Zeynep H; Walkiewicz, Magdalena; Bi, Weimin; Xiao, Rui; Ding, Yan; Xia, Fan; Beaudet, Arthur L; Muzny, Donna M; Gibbs, Richard A; Boerwinkle, Eric; Eng, Christine M; Sutton, V Reid; Shaw, Chad A; Plon, Sharon E; Yang, Yaping; Lupski, James R

    2017-01-05

    Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes. We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing; our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated with the use of terms from the Human Phenotype Ontology. A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available, and found that de novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes; both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients in whom the phenotype resulted from two distinct mendelian disorders that affected different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P=1.77×10^-7). In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes encoding proteins that interact within
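
    The proportions quoted in this abstract can be recomputed from its own counts. The short sketch below does only that arithmetic; the dictionary labels and the helper name `percent` are illustrative choices, not terms from the study.

```python
# Recompute the percentages quoted in the abstract from the counts it gives.
# Each entry is (numerator, denominator, percentage as quoted in the abstract).
reported = {
    "molecular diagnosis rendered": (2076, 7374, 28.2),
    "two or more disease loci": (101, 2076, 4.9),
    "de novo, autosomal dominant": (61, 90, 67.8),
    "de novo, X-linked": (15, 29, 51.7),
    "both variants de novo": (17, 38, 44.7),
    "causal copy-number variants": (12, 101, 11.9),
}


def percent(numerator, denominator):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


for label, (num, den, quoted) in reported.items():
    print(f"{label}: {num}/{den} = {percent(num, den)}% (abstract: {quoted}%)")
```

    Note that the 4.9% figure uses the 2076 diagnosed patients as its denominator, not the full series of 7374.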

  1. Comparative genomic analysis of soybean flowering genes.

    Directory of Open Access Journals (Sweden)

    Chol-Hee Jung

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant

  2. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    Directory of Open Access Journals (Sweden)

    Huajing Teng

    2016-07-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  3. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    Science.gov (United States)

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  4. Rare and common regulatory variation in population-scale sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Stephen B Montgomery

    2011-07-01

    Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

  5. Nucleotide diversity maps reveal variation in diversity among wheat genomes and chromosomes

    Directory of Open Access Journals (Sweden)

    McGuire Patrick E

    2010-12-01

    chromosomal regions. The net effect of these factors in T. aestivum is large variation in diversity among genomes and chromosomes, which impacts the development of SNP markers and their practical utility. Accumulation of new mutations in older polyploid species, such as wild emmer, results in increased diversity and its more uniform distribution across the genome.

  6. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  7. Cluster-based exposure variation analysis.

    Science.gov (United States)

    Samani, Afshin; Mathiassen, Svend Erik; Madeleine, Pascal

    2013-04-04

    Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. For this purpose, we simulated a repeated cyclic exposure varying within each cycle between "low" and "high" exposure levels in a "near" or "far" range, and with "low" or "high" velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a "small" or "large" standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p analysis are the advantages
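
    The component-selection rule in this abstract (the least number of principal components describing more than 90% of variability) can be sketched as follows. This covers only that one step, assuming the per-component variances are already available; the function name and the eigenvalues are hypothetical, and nothing here reproduces the EVA or C-EVA computations themselves.

```python
# Hypothetical sketch of the ">90% of variability" selection rule.

def components_needed(variances, threshold=0.90):
    """Smallest number of leading components whose cumulative share of
    the total variance exceeds `threshold`."""
    total = sum(variances)
    ordered = sorted(variances, reverse=True)  # largest component first
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return count
    return len(ordered)


# Made-up per-component variances (eigenvalues) for illustration only.
eigenvalues = [5.0, 3.0, 1.5, 0.3, 0.2]
print(components_needed(eigenvalues))
```

    With these made-up values the first two components cover 80% of the variance and the first three cover 95%, so three components would be retained.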

  8. Variations and classification of toxic epitopes related to celiac disease among α-gliadin genes from four Aegilops genomes.

    Science.gov (United States)

    Li, Jie; Wang, Shunli; Li, Shanshan; Ge, Pei; Li, Xiaohui; Ma, Wujun; Zeller, F J; Hsam, Sai L K; Yan, Yueming

    2012-07-01

    The α-gliadins are associated with human celiac disease. A total of 23 noninterrupted full open reading frame α-gliadin genes and 19 pseudogenes were cloned and sequenced from C, M, N, and U genomes of four diploid Aegilops species. Sequence comparison of α-gliadin genes from Aegilops and Triticum species demonstrated an existence of extensive allelic variations in Gli-2 loci of the four Aegilops genomes. Specific structural features were found including the compositions and variations of two polyglutamine domains (QI and QII) and four T cell stimulatory toxic epitopes. The mean numbers of glutamine residues in the QI domain in C and N genomes and the QII domain in C, N, and U genomes were much higher than those in Triticum genomes, and the QI domain in C and N genomes and the QII domain in C, M, N, and U genomes displayed greater length variations. Interestingly, the types and numbers of four T cell stimulatory toxic epitopes in α-gliadins from the four Aegilops genomes were significantly less than those from Triticum A, B, D, and their progenitor genomes. Relationships between the structural variations of the two polyglutamine domains and the distributions of four T cell stimulatory toxic epitopes were found, resulting in the α-gliadin genes from the Aegilops and Triticum genomes to be classified into three groups.

  9. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome

    DEFF Research Database (Denmark)

    Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui;

    2011-01-01

    throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has......A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries...... and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations...

  10. A biometrical genome search in rats reveals the multigenic basis of blood pressure variation.

    Science.gov (United States)

    Schork, N J; Krieger, J E; Trolliet, M R; Franchini, K G; Koike, G; Krieger, E M; Lander, E S; Dzau, V J; Jacob, H J

    1995-09-01

    A genome-wide search for multiple loci influencing salt-loaded systolic blood pressure (NaSBP) variation among 188 F2 progeny from a cross between the Brown-Norway and spontaneously hypertensive rat strains was pursued in an effort to gain insight into the polygenic basis of blood pressure regulation. The results suggest that loci within five to six genomic regions collectively explain approximately 43% of the total NaSBP variation exhibited among the 188 F2 progeny. Many of these loci are in regions that previous studies have not implicated in blood pressure regulation. Ultimately, however, this study not only sheds light on the multigenic basis of blood pressure but provides further evidence that the identification of the genetic determinants of polygenic traits in mammals is possible with modern biometrical and molecular genetic tools in controlled settings (i.e., breeding paradigm and model organism).

  11. Comparative population genomics of latitudinal variation in Drosophila simulans and Drosophila melanogaster.

    Science.gov (United States)

    Machado, Heather E; Bergland, Alan O; O'Brien, Katherine R; Behrman, Emily L; Schmidt, Paul S; Petrov, Dmitri A

    2016-02-01

    Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster. © 2015 John Wiley & Sons Ltd.

  12. Chromosome Numbers and Genome Size Variation in Indian Species of Curcuma (Zingiberaceae)

    Science.gov (United States)

    Leong-Škorničková, Jana; Šída, Otakar; Jarolímová, Vlasta; Sabu, Mamyil; Fér, Tomáš; Trávníček, Pavel; Suda, Jan

    2007-01-01

    Background and Aims Genome size and chromosome numbers are important cytological characters that significantly influence various organismal traits. However, geographical representation of these data is seriously unbalanced, with tropical and subtropical regions being largely neglected. In the present study, an investigation was made of chromosomal and genome size variation in the majority of Curcuma species from the Indian subcontinent, and an assessment was made of the value of these data for taxonomic purposes. Methods Genome size of 161 homogeneously cultivated plant samples classified into 51 taxonomic entities was determined by propidium iodide flow cytometry. Chromosome numbers were counted in actively growing root tips using conventional rapid squash techniques. Key Results Six different chromosome counts (2n = 22, 42, 63, >70, 77 and 105) were found, the last two representing new generic records. The 2C-values varied from 1·66 pg in C. vamana to 4·76 pg in C. oligantha, representing a 2·87-fold range. Three groups of taxa with significantly different homoploid genome sizes (Cx-values) and distinct geographical distribution were identified. Five species exhibited intraspecific variation in nuclear DNA content, reaching up to 15·1 % in cultivated C. longa. Chromosome counts and genome sizes of three Curcuma-like species (Hitchenia caulina, Kaempferia scaposa and Paracautleya bhatii) corresponded well with typical hexaploid (2n = 6x = 42) Curcuma spp. Conclusions The basic chromosome number in the majority of Indian taxa (belonging to subgenus Curcuma) is x = 7; published counts correspond to 6x, 9x, 11x, 12x and 15x ploidy levels. Only a few species-specific C-values were found, but karyological and/or flow cytometric data may support taxonomic decisions in some species alliances with morphological similarities. Close evolutionary relationships among some cytotypes are suggested based on the similarity in homoploid genome sizes and geographical grouping
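
    Two figures in this abstract are simple arithmetic: the fold range of the 2C-values and the ploidy levels implied by a basic chromosome number of x = 7. The sketch below recomputes them from the numbers quoted; the function names are illustrative, and a count that is not a multiple of 7 (such as 2n = 22) is reported as unresolved, since the abstract states x = 7 only for the majority of taxa.

```python
# Recompute the 2C-value fold range and the ploidy levels implied by x = 7.

BASIC_NUMBER = 7  # x = 7, as stated in the abstract for most Indian taxa


def fold_range(smallest, largest):
    """Ratio of the largest to the smallest value."""
    return largest / smallest


def ploidy_level(somatic_count, basic_number=BASIC_NUMBER):
    """Ploidy multiple for a somatic count, or None if the count is not
    an exact multiple of the basic chromosome number."""
    if somatic_count % basic_number != 0:
        return None
    return somatic_count // basic_number


# 2C-values from the abstract: 1.66 pg (C. vamana) and 4.76 pg (C. oligantha)
print(round(fold_range(1.66, 4.76), 2))

# Exact chromosome counts reported in the abstract (">70" is omitted)
for count in (22, 42, 63, 77, 105):
    print(count, ploidy_level(count))
```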

  13. Variation in Genomic Methylation in Natural Populations of Chinese White Poplar

    OpenAIRE

    Kaifeng Ma; Yuepeng Song; Xiaohui Yang; Zhiyi Zhang; Deqiang Zhang

    2013-01-01

    BACKGROUND: It is thought that methylcytosine can be inherited through meiosis and mitosis, and that epigenetic variation may be under genetic control or correlation may be caused by neutral drift. However, DNA methylation also varies with tissue, developmental stage, and environmental factors. Eliminating these factors, we analyzed the levels and patterns, diversity and structure of genomic methylcytosine in the xylem of nine natural populations of Chinese white poplar. PRINCIPAL FINDINGS: O...

  14. Genome-wide association study identified CNP12587 region underlying height variation in Chinese females.

    Directory of Open Access Journals (Sweden)

    Yin-Ping Zhang

    INTRODUCTION: Human height is a highly heritable trait considered as an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aim to identify sequence variants associated with adult height by a genome-wide association study of copy number variants (CNVs) in Chinese. METHODS: Genome-wide CNV association analyses were conducted in 1,625 unrelated Chinese adults and sex specific subgroup for height variation, respectively. Height was measured with a stadiometer. Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs). We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height. RESULTS: We detected 10 significant association signals for height (p<0.05) in the whole population, 9 and 11 association signals for Chinese female and male population, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41 × 10^-4) was found to be significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048). Confirmatory real time PCR experiments lent further support for CNV validation. Compared to female subjects with two copies of the CNP, carriers of three copies had an average of 8.1% decrease in height. An important candidate gene, ubiquitin-protein ligase NEDD4-like (NEDD4L), was detected at this region, which plays important roles in bone metabolism by binding to bone formation regulators. CONCLUSIONS: Our findings suggest the important genetic variants underlying height variation in Chinese.

  15. Exploration of presence/absence variation and corresponding polymorphic markers in soybean genome

    Institute of Scientific and Technical Information of China (English)

    Yufeng Wang; Tuanjie Zhao; Junyi Gai; Jiangjie Lu; Shouyi Chen; Liping Shu; Reid G Palmer; Guangnan Xing; Yan Li; Shouping Yang; Deyue Yu

    2014-01-01

    This study was designed to reveal the genome-wide distribution of presence/absence variation (PAV) and to establish a database of polymorphic PAV markers in soybean. The 33 soybean whole-genome sequences were compared to each other with that of Williams 82 as a reference genome. A total of 33,127 PAVs were detected and 28,912 PAV markers with their primer sequences were designed as the database NJAUSoyPAV_1.0. The PAVs were scattered across the whole genome, while only 518 (1.8%) overlapped with simple sequence repeats (SSRs) in the BARCSOYSSR_1.0 database. In a random sample of 800 PAVs, 713 (89.13%) showed polymorphism among the 12 differential genotypes. Using 126 PAVs and 108 SSRs to test a Chinese soybean germplasm collection composed of 828 Glycine soja Sieb. et Zucc. and Glycine max (L.) Merr. accessions, the per locus allele number and its variation appeared less in PAVs than in SSRs. The distinctness among alleles/bands of PCR (polymerase chain reaction) products was better in PAVs than in SSRs, showing potential for accurate marker-assisted allele selection. The association mapping results showed SSR + PAV was more powerful than any single marker system. The NJAUSoyPAV_1.0 database has enriched the source of PCR markers, and may fit the materials with a range of per locus allele numbers, if jointly used with SSR markers.

  16. Single-Nucleotide Variations in Cardiac Arrhythmias: Prospects for Genomics and Proteomics Based Biomarker Discovery and Diagnostics

    Directory of Open Access Journals (Sweden)

    Ayman Abunimer

    2014-03-01

    Cardiovascular diseases are a large contributor to causes of early death in developed countries. Some of these conditions, such as sudden cardiac death and atrial fibrillation, stem from arrhythmias, a spectrum of conditions with abnormal electrical activity in the heart. Genome-wide association studies can identify single nucleotide variations (SNVs) that may predispose individuals to developing acquired forms of arrhythmias. Through manual curation of published genome-wide association studies, we have collected a comprehensive list of 75 SNVs associated with cardiac arrhythmias. Ten of the SNVs result in amino acid changes and can be used in proteomic-based detection methods. In an effort to identify additional non-synonymous mutations that affect the proteome, we analyzed the post-translational modification S-nitrosylation, which is known to affect cardiac arrhythmias. We identified loss of seven known S-nitrosylation sites due to non-synonymous single nucleotide variations (nsSNVs). For predicted nitrosylation sites we found 1429 proteins where the sites are modified due to nsSNV. Analysis of the predicted S-nitrosylation dataset for over- or under-representation (compared to the complete human proteome) of pathways and functional elements shows significant statistical over-representation of the blood coagulation pathway. Gene Ontology (GO) analysis displays statistically over-represented terms related to muscle contraction, receptor activity, motor activity, cytoskeleton components, and microtubule activity. Through the genomic and proteomic context of SNVs and S-nitrosylation sites presented in this study, researchers can look for variation that can predispose individuals to cardiac arrhythmias. Such attempts to elucidate mechanisms of arrhythmia thereby add yet another useful parameter in predicting susceptibility for cardiac diseases.

  17. Expression, tandem repeat copy number variation and stability of four macrosatellite arrays in the human genome

    Directory of Open Access Journals (Sweden)

    Chadwick Brian P

    2010-11-01

    Background: Macrosatellites are some of the largest variable number tandem repeats in the human genome, but what role these unusual sequences perform is unknown. Their importance to human health is clearly demonstrated by the 4q35 macrosatellite D4Z4 that is associated with the onset of the muscle degenerative disease facioscapulohumeral muscular dystrophy. Nevertheless, many other macrosatellite arrays in the human genome remain poorly characterized. Results: Here we describe the organization, tandem repeat copy number variation, transmission stability and expression of four macrosatellite arrays in the human genome: the TAF11-Like array located on chromosome 5p15.1, the SST1 arrays on 4q28.3 and 19q13.12, the PRR20 array located on chromosome 13q21.1, and the ZAV array at 9q32. All are polymorphic macrosatellite arrays that, at least for TAF11-Like and SST1, show evidence of meiotic instability. With the exception of the SST1 array, which is ubiquitously expressed, all are expressed at high levels in the testis and to a lesser extent in the brain. Conclusions: Our results extend the number of characterized macrosatellite arrays in the human genome and provide the foundation for formulating hypotheses to begin assessing their functional role in the human genome.

  18. The Genome of the Trinidadian Guppy, Poecilia reticulata, and Variation in the Guanapo Population

    Science.gov (United States)

    Künstner, Axel; Hoffmann, Margarete; Fraser, Bonnie A.; Kottler, Verena A.; Sharma, Eshita; Weigel, Detlef; Dreyer, Christine

    2016-01-01

    For over a century, the live-bearing guppy, Poecilia reticulata, has been used to study sexual selection as well as local adaptation. Natural guppy populations differ in many traits that are of intuitively adaptive significance such as ornamentation, age at maturity, brood size and body shape. Water depth, light supply, food resources and predation regime shape these traits, and barrier waterfalls often separate contrasting environments in the same river. We have assembled and annotated the genome of an inbred single female from a high-predation site in the Guanapo drainage. The final assembly comprises 731.6 Mb with a scaffold N50 of 5.3 Mb. Scaffolds were mapped to linkage groups, placing 95% of the genome assembly on the 22 autosomes and the X-chromosome. To investigate genetic variation in the population used for the genome assembly, we sequenced 10 wild-caught male individuals. The identified 5 million SNPs correspond to an average nucleotide diversity (π) of 0.0025. The genome assembly and SNP map provide a rich resource for investigating adaptation to different predation regimes. In addition, comparisons with the genomes of other Poeciliid species, which differ greatly in mechanisms of sex determination and maternal resource allocation, as well as comparisons to other teleost genera can begin to reveal how live bearing evolved in teleost fish. PMID:28033408
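    Two summary statistics quoted in the abstract above, scaffold N50 and nucleotide diversity (π), can be sketched as follows. This is an illustration under the usual textbook definitions, not the authors' pipeline: N50 is taken as the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half the assembly size, and π as the mean number of pairwise differences per site among aligned sequences. The function names and toy inputs are hypothetical.

```python
from itertools import combinations


def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * n_sites)


# Toy inputs for illustration only.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
toy_pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

    On real data, π is usually computed from called genotypes over all callable sites rather than from a small alignment, so this sketch only conveys the definition.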

  19. Variation in genome organization of the plant pathogenic fungus Colletotrichum lindemuthianum.

    Science.gov (United States)

    O'Sullivan, D; Tosi, P; Creusot, F; Cooke, B M; Phan, T H; Dron, M; Langin, T

    1998-04-01

    The genome structure of Colletotrichum lindemuthianum in a set of diverse isolates was investigated using a combination of physical and molecular approaches. Flow cytometric measurement of genome size revealed significant variation between strains, with the smallest genome representing 59% of the largest. Southern-blot profiles of a cloned fungal telomere revealed a total chromosome number varying from 9 to 12. Chromosome separations using pulsed-field gel electrophoresis (PFGE) showed that these chromosomes belong to two distinct size classes: a variable number of small (< 2.5 Mb) polymorphic chromosomes and a set of unresolved chromosomes larger than 7 Mb. Two dispersed repeat elements were shown to cluster on distinct polymorphic minichromosomes. Single-copy flanking sequences from these repeat-containing clones specifically marked distinct small chromosomes. These markers were absent in some strains, indicating that part of the observed variability in genome organization may be explained by the presence or absence, in a given strain, of dispensable genomic regions and/or chromosomes.

  20. Genome variations associated with viral susceptibility and calcification in Emiliania huxleyi.

    Directory of Open Access Journals (Sweden)

    Jessica U Kegel

    Emiliania huxleyi, a key player in the global carbon cycle, is one of the best studied coccolithophores with respect to biogeochemical cycles, climatology, and host-virus interactions. Strains of E. huxleyi show phenotypic plasticity regarding growth behaviour, light-response, calcification, acidification, and virus susceptibility. This phenomenon is likely a consequence of genomic differences, or transcriptomic responses, to environmental conditions or threats such as viral infections. We used an E. huxleyi genome microarray based on the sequenced strain CCMP1516 (reference strain) to perform comparative genomic hybridizations (CGH) of 16 E. huxleyi strains of different geographic origin. We investigated the genomic diversity and plasticity and focused on the identification of genes related to virus susceptibility and coccolith production (calcification). Among the tested 31940 gene models, a core genome of 14628 genes was identified by hybridization among 16 E. huxleyi strains. 224 probes were characterized as specific for the reference strain CCMP1516. Compared to the sequenced E. huxleyi strain CCMP1516, variation in gene content of up to 30 percent among strains was observed. Comparison of core and non-core transcript sets in terms of annotated functions reveals a broad, almost equal functional coverage over all KOG-categories of both transcript sets within the whole annotated genome. Within the variable (non-core) genome we identified genes associated with virus susceptibility and calcification. Genes associated with virus susceptibility include a Bax inhibitor-1 protein, three LRR receptor-like protein kinases, and mitogen-activated protein kinase. Our list of transcripts associated with coccolith production will stimulate further research, e.g. by genetic manipulation. In particular, the V-type proton ATPase 16 kDa proteolipid subunit is proposed to be a plausible target gene for further calcification studies.

  1. Whole genome sequence and analysis of the Marwari horse breed and its genetic origin.

    Science.gov (United States)

    Jun, JeHoon; Cho, Yun Sung; Hu, Haejin; Kim, Hak-Min; Jho, Sungwoong; Gadhvi, Priyvrat; Park, Kyung Mi; Lim, Jeongheui; Paek, Woon Kee; Han, Kyudong; Manica, Andrea; Edwards, Jeremy S; Bhak, Jong

    2014-01-01

    The horse (Equus ferus caballus) is one of the earliest domesticated species and has played an important role in the development of human societies over the past 5,000 years. In this study, we characterized the genome of the Marwari horse, a rare breed with unique phenotypic characteristics, including inwardly turned ear tips. It is thought to have originated from the crossbreeding of local Indian ponies with Arabian horses beginning in the 12th century. We generated 101 Gb (~30 × coverage) of whole genome sequences from a Marwari horse using the Illumina HiSeq2000 sequencer. The sequences were mapped to the horse reference genome at a mapping rate of ~98% and with ~95% of the genome having at least 10 × coverage. A total of 5.9 million single nucleotide variations, 0.6 million small insertions or deletions, and 2,569 copy number variation blocks were identified. We confirmed a strong Arabian and Mongolian component in the Marwari genome. Novel variants from the Marwari sequences were annotated, and were found to be enriched in olfactory functions. Additionally, we suggest a potential functional genetic variant in the TSHZ1 gene (p.Ala344>Val) associated with the inward-turning ear tip shape of the Marwari horses. Here, we present an analysis of the Marwari horse genome. This is the first genomic data for an Asian breed, and is an invaluable resource for future studies of genetic variation associated with phenotypes and diseases in horses.

  2. A novel technique for measuring variations in DNA copy-number: competitive genomic polymerase chain reaction

    Directory of Open Access Journals (Sweden)

    Nakagawara Akira

    2007-07-01

    Background: Changes in genomic copy number occur in many human diseases including cancer. Characterization of these changes is important for both basic understanding and diagnosis of these diseases. Microarrays have recently become the standard technique and are commercially available. However, it is useful to have an affordable technique to complement them. Results: We describe a novel polymerase chain reaction (PCR)-based technique, termed competitive genomic PCR (CGP). The main characteristic of CGP is that different adaptors are added to the sample and control genomic DNAs after appropriate restriction enzyme digestion. These adaptor-supplemented DNAs are subjected to competitive PCR using an adaptor-primer and a locus-specific primer. The amplified products are then separated according to size differences between the adaptors. CGP eliminates the tedious steps inherent in quantitative PCR and achieves moderate throughput. Assays with different X chromosome numbers showed that it can provide accurate quantification. High-resolution analysis of neuroblastoma cell lines around the MYCN locus revealed novel junctions for amplification, which were not detected by a commercial array. Conclusion: CGP is a moderate-throughput technique for analyzing changes in genomic copy numbers. Because CGP can measure any genomic locus using PCR primers, it is especially useful for detailed analysis of a genomic region of interest.

  3. A genome-wide investigation of copy number variation in patients with sporadic brain arteriovenous malformation.

    Directory of Open Access Journals (Sweden)

    Nasrine Bendjilali

    BACKGROUND: Brain arteriovenous malformations (BAVM) are clusters of abnormal blood vessels, with shunting of blood from the arterial to venous circulation and a high risk of rupture and intracranial hemorrhage. Most BAVMs are sporadic, but they also occur in patients with Hereditary Hemorrhagic Telangiectasia, a Mendelian disorder caused by mutations in genes in the transforming growth factor beta (TGFβ) signaling pathway. METHODS: To investigate whether copy number variations (CNVs) contribute to risk of sporadic BAVM, we performed a genome-wide association study in 371 sporadic BAVM cases and 563 healthy controls, all Caucasian. Cases and controls were genotyped using the Affymetrix 6.0 array. CNVs were called using the PennCNV and Birdsuite algorithms and analyzed via segment-based and gene-based approaches. Common and rare CNVs were evaluated for association with BAVM. RESULTS: A CNV region on 1p36.13, containing the neuroblastoma breakpoint family, member 1 gene (NBPF1), was significantly enriched with duplications in BAVM cases compared to controls (P = 2.2 × 10^-9); NBPF1 was also significantly associated with BAVM in gene-based analysis using both PennCNV and Birdsuite. We experimentally validated the 1p36.13 duplication; however, the association did not replicate in an independent cohort of 184 sporadic BAVM cases and 182 controls (OR = 0.81, P = 0.8). Rare CNV analysis did not identify genes significantly associated with BAVM. CONCLUSION: We did not identify common CNVs associated with sporadic BAVM that replicated in an independent cohort. Replication in larger cohorts is required to elucidate the possible role of common or rare CNVs in BAVM pathogenesis.
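    The odds ratio (OR) reported for the replication cohort above is, in its simplest unadjusted form, the cross-product ratio of a 2×2 carrier-by-status table. The sketch below shows that calculation only; the counts are hypothetical (the abstract does not give the underlying table), and the study's own estimate may have come from a different or adjusted model.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Unadjusted odds ratio from a 2x2 table of duplication carriers by case status."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Hypothetical counts for illustration only (not taken from the study).
example_or = odds_ratio(case_carriers=10, case_noncarriers=90,
                        control_carriers=5, control_noncarriers=95)
```

    An OR above 1 indicates the duplication is more common among cases, and an OR below 1 (such as the reported 0.81) indicates it is less common among cases than controls.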

  4. Human and mouse genome analysis using array comparative genomic hybridization

    NARCIS (Netherlands)

    Snijders, Antoine Maria

    2004-01-01

    Almost all human cancers as well as developmental abnormalities are characterized by the presence of genetic alterations, most of which target a gene or a particular genomic locus resulting in altered gene expression and ultimately an altered phenotype. Different types of genetic alterations include

  5. Genome organization and variation in the 3′-partial sequence of garlic latent virus in China

    Institute of Scientific and Technical Information of China (English)

    陈炯; 郑红英; 陈剑平; 杨崇良

    2002-01-01

    Ten different isolates of a carlavirus were detected by degenerate PCR from 12 garlic samples collected from 6 provinces in China, and the complete genome sequence of the Zhejiang isolate ZJ1 and the 3′-terminal sequences of 9 other isolates were determined. The RNA genome of isolate ZJ1 consisted of 8,363 nt excluding the 3′-poly(A) tail, and the genome organization was similar to that of other carlaviruses, with 6 open reading frames encoding a replicase, TGB1, TGB2, TGB3, CP and NABP, respectively. Sequence comparisons showed that all 10 isolates were Garlic latent virus (GarLV). The variations in TGB2, TGB3 and NABP were more significant than those in the CP. High homology was also detected between those isolates and Shallot latent virus (ShLV). Phylogenetic analysis suggested that GarLV isolates from garlic can be divided into 4 main groups and that Chinese isolates were present in each group. This is the first reported molecular analysis of members of the genus Carlavirus in China.

  6. Antigen-presenting genes and genomic copy number variations in the Tasmanian devil MHC

    Directory of Open Access Journals (Sweden)

    Cheng Yuanyuan

    2012-03-01

    Background: The Tasmanian devil (Sarcophilus harrisii) is currently under threat of extinction due to an unusual fatal contagious cancer called Devil Facial Tumour Disease (DFTD). DFTD is caused by a clonal tumour cell line that is transmitted between unrelated individuals as an allograft without triggering immune rejection due to low levels of Major Histocompatibility Complex (MHC) diversity in Tasmanian devils. Results: Here we report the characterization of the genomic regions encompassing MHC Class I and Class II genes in the Tasmanian devil. Four genomic regions approximately 960 kb in length were assembled and annotated using BAC contigs and physically mapped to devil Chromosome 4q. 34 genes and pseudogenes were identified, including five Class I and four Class II loci. Interestingly, when two haplotypes from two individuals were compared, three genomic copy number variants with sizes ranging from 1.6 to 17 kb were observed within the classical Class I gene region. One deletion is particularly important as it turns a Class Ia gene into a pseudogene in one of the haplotypes. This deletion explains the previously observed variation in the Class I allelic number between individuals. The frequency of this deletion is highest in the northwestern devil population and lowest in southeastern areas. Conclusions: The third sequenced marsupial MHC provides insights into the evolution of this dynamic genomic region among the diverse marsupial species. The two sequenced devil MHC haplotypes revealed three copy number variations that are likely to significantly affect immune response and suggest that future work should focus on the role of copy number variations in disease susceptibility in this species.

  7. Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis

    Directory of Open Access Journals (Sweden)

    Fowler Katie E

    2009-08-01

    Background: The availability of the complete chicken (Gallus gallus) genome sequence, as well as a large number of chicken probes for fluorescent in-situ hybridization (FISH) and microarray resources, facilitates comparative genomic studies between chicken and other bird species. In a previous study, we provided a comprehensive cytogenetic map for the turkey (Meleagris gallopavo) and the first analysis of copy number variants (CNVs) in birds. Here, we extend this approach to the Pekin duck (Anas platyrhynchos), an obvious target for comparative genomic studies due to its agricultural importance and resistance to avian flu. Results: We provide a detailed molecular cytogenetic map of the duck genome through FISH assignment of 155 chicken clones. We identified one inter- and six intrachromosomal rearrangements between chicken and duck macrochromosomes and demonstrated conserved synteny among all microchromosomes analysed. Array comparative genomic hybridisation revealed 32 CNVs, of which 5 overlap previously designated "hotspot" regions between chicken and turkey. Conclusion: Our results suggest extensive conservation of avian genomes across 90 million years of evolution in both macro- and microchromosomes. The data on CNVs between chicken and duck extend previous analyses in chicken and turkey and support the hypotheses that avian genomes contain fewer CNVs than mammalian genomes and that genomes of evolutionarily distant species share regions of copy number variation ("CNV hotspots"). Our results will expedite duck genomics, assist marker development and highlight areas of interest for future evolutionary and functional studies.

  8. MicroRNAs and genomic variations: from Proteus tricks to Prometheus gift.

    Science.gov (United States)

    Fabbri, Muller; Valeri, Nicola; Calin, George A

    2009-06-01

    MicroRNAs (miRNAs) are small non-coding RNAs with regulatory functions. MiRNAs are aberrantly expressed in almost all human cancers, leading to abnormal levels of target genes. Recently, an increasing number of studies have addressed whether genomic variations, including germ line or somatic mutations and single-nucleotide polymorphisms, can account for abnormal miRNA expression by altering miRNA biogenesis and/or affecting the ability of miRNAs to bind to target messenger RNAs. Here, we provide an extensive review of the studies that have investigated variations occurring both in miRNA genes and in target genes, and we discuss the possible clinical implications of these findings. Furthermore, we propose that sequence variations in miRNAs or interactor sites located in mRNAs can be involved in cancer predisposition.

  9. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Directory of Open Access Journals (Sweden)

    Gurusamy Raman

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  10. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Science.gov (United States)

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  11. Insights into the Dekkera bruxellensis genomic landscape: comparative genomics reveals variations in ploidy and nutrient utilisation potential amongst wine isolates.

    Directory of Open Access Journals (Sweden)

    Anthony R Borneman

    2014-02-01

    The yeast Dekkera bruxellensis is a major contaminant of industrial fermentations, such as those used for the production of biofuel and wine, where it outlasts and, under some conditions, outcompetes the major industrial yeast Saccharomyces cerevisiae. In order to investigate the level of inter-strain variation that is present within this economically important species, the genomes of four diverse D. bruxellensis isolates were compared. While each of the four strains was shown to contain a core diploid genome, which is clearly sufficient for survival, two of the four isolates have a third haploid complement of chromosomes. The sequences of these additional haploid genomes were both highly divergent from those comprising the diploid core and divergent between the two triploid strains. Similar to examples in the Saccharomyces spp. clade, where some allotriploids have arisen on the basis of enhanced ability to survive a range of environmental conditions, it is likely these strains are products of two independent hybridisation events that may have involved multiple species or distinct sub-species of Dekkera. Interestingly these triploid strains represent the vast majority (92%) of isolates from across the Australian wine industry, suggesting that the additional set of chromosomes may confer a selective advantage in winery environments that has resulted in these hybrid strains all-but replacing their diploid counterparts in Australian winery settings. In addition to the apparent inter-specific hybridisation events, chromosomal aberrations such as strain-specific insertions and deletions and loss-of-heterozygosity by gene conversion were also commonplace. While these events are likely to have affected many phenotypes across these strains, we have been able to link a specific deletion to the inability to utilise nitrate by some strains of D. bruxellensis, a phenotype that may have direct impacts in the ability for these strains to compete with S

  12. Insights into the Dekkera bruxellensis genomic landscape: comparative genomics reveals variations in ploidy and nutrient utilisation potential amongst wine isolates.

    Science.gov (United States)

    Borneman, Anthony R; Zeppel, Ryan; Chambers, Paul J; Curtin, Chris D

    2014-02-01

    The yeast Dekkera bruxellensis is a major contaminant of industrial fermentations, such as those used for the production of biofuel and wine, where it outlasts and, under some conditions, outcompetes the major industrial yeast Saccharomyces cerevisiae. In order to investigate the level of inter-strain variation that is present within this economically important species, the genomes of four diverse D. bruxellensis isolates were compared. While each of the four strains was shown to contain a core diploid genome, which is clearly sufficient for survival, two of the four isolates have a third haploid complement of chromosomes. The sequences of these additional haploid genomes were both highly divergent from those comprising the diploid core and divergent between the two triploid strains. Similar to examples in the Saccharomyces spp. clade, where some allotriploids have arisen on the basis of enhanced ability to survive a range of environmental conditions, it is likely these strains are products of two independent hybridisation events that may have involved multiple species or distinct sub-species of Dekkera. Interestingly these triploid strains represent the vast majority (92%) of isolates from across the Australian wine industry, suggesting that the additional set of chromosomes may confer a selective advantage in winery environments that has resulted in these hybrid strains all-but replacing their diploid counterparts in Australian winery settings. In addition to the apparent inter-specific hybridisation events, chromosomal aberrations such as strain-specific insertions and deletions and loss-of-heterozygosity by gene conversion were also commonplace. While these events are likely to have affected many phenotypes across these strains, we have been able to link a specific deletion to the inability to utilise nitrate by some strains of D. bruxellensis, a phenotype that may have direct impacts in the ability for these strains to compete with S. cerevisiae.

  13. Poly(T) variation in heteroderid nematode mitochondrial genomes is predominantly an artefact of amplification.

    Science.gov (United States)

    Riepsamen, Angelique H; Gibson, Tracey; Rowe, Janet; Chitwood, David J; Subbotin, Sergei A; Dowton, Mark

    2011-02-01

    We assessed the rate of in vitro polymerase errors at polythymidine [poly(T)] tracts in the mitochondrial DNA (mtDNA) of a heteroderid nematode (Heterodera cajani). The mtDNA of these nematodes contains unusually high numbers of poly(T) tracts, and has previously been suggested to contain biological poly(T) length variation. However, using a cloned molecule, we observed that poly(T) variation was generated in vitro at regions containing more than six consecutive Ts. This artefactual error rate was estimated at 7.3 × 10^-5 indels/poly(T) tract >6 Ts/cycle. This rate was then compared to the rate of poly(T) variation detected after the amplification of a biological sample, in order to estimate the 'biological + artefactual' rate of poly(T) variation. There was no significant difference between the artefactual and the artefactual + biological rates, suggesting that the majority of poly(T) variation in the biological sample was artefactual. We then examined the generation of poly(T) variation in a range of templates with tracts up to 16 Ts long, utilizing a range of Heteroderidae species. We observed that T deletions occurred five times more frequently than insertions, and a trend towards increasing error rates with increasing poly(T) tract length. These findings have significant implications for studies involving genomes with many homopolymer tracts.
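    To make the quantities in the abstract above concrete, the following sketch locates poly(T) tracts longer than six Ts in a sequence and scales the reported per-tract, per-cycle rate to a rough expected indel count. It is illustrative only: the function names and the toy sequence are hypothetical, and the expected-count formula assumes that errors accumulate independently and linearly across tracts and cycles, which the abstract does not claim.

```python
import re


def poly_t_tracts(seq, min_len=7):
    """Return (start, length) for each run of at least min_len consecutive Ts."""
    pattern = re.compile(r"T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def expected_indels(n_tracts, cycles, rate=7.3e-5):
    """Rough expected indel count: rate per tract per cycle x tracts x cycles."""
    return rate * n_tracts * cycles


# Toy sequence: one tract of 6 Ts (ignored) and one of 9 Ts (reported).
toy_seq = "ACG" + "T" * 6 + "GA" + "T" * 9 + "C"
tracts = poly_t_tracts(toy_seq)
rough_count = expected_indels(n_tracts=100, cycles=30)
```

    Because the abstract reports that error rates tend to rise with tract length, a single constant rate is a simplification; a length-dependent rate would be a natural refinement.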

  14. Genomic landscape of copy number variation and copy neutral loss of heterozygosity events in equine sarcoids reveals increased instability of the sarcoid genome.

    Science.gov (United States)

    Pawlina-Tyszko, Klaudia; Gurgul, Artur; Szmatoła, Tomasz; Ropka-Molik, Katarzyna; Semik-Gurgul, Ewelina; Klukowska-Rötzler, Jolanta; Koch, Christoph; Mählmann, Kathrin; Bugno-Poniewierska, Monika

    2017-09-01

    Although they are the most common neoplasms in equids, sarcoids are not fully characterized at the molecular level. Therefore, the objective of this study was to characterize the landscape of structural rearrangements, such as copy number variation (CNV) and copy neutral loss of heterozygosity (cnLOH), in the genomes of sarcoid tumor cells. This information will not only broaden our understanding of the characteristics of this genome but will also improve the general knowledge of this tumor and the mechanisms involved in its generation. To this end, Equine SNP64K Illumina microarrays were applied along with bioinformatics tools dedicated for signal intensity analysis. The analysis revealed increased instability of the genome of sarcoid cells compared with unaltered skin tissue samples, which was manifested by the prevalence of CNV and cnLOH events. Many of the identified CNVs overlapped with the other research results, but the simultaneously observed variability in the number and sizes of detected aberrations indicated a need for further studies and the development of more reliable bioinformatics algorithms. The functional analysis of genes co-localized with the identified aberrations revealed that these genes are engaged in vital cellular processes. In addition, a number of these genes directly contribute to neoplastic transformation. Furthermore, large numbers of cnLOH events identified in the sarcoids suggested that they may play no less significant roles than CNVs in the carcinogenesis of this tumor. Thus, our results indicate the importance of cnLOH and CNV in equine sarcoid oncogenesis and present a direction of future research. Copyright © 2017 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
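    The comparison of detected CNVs with previously reported regions, mentioned in the abstract above, comes down to testing genomic intervals for overlap. The sketch below shows one simple way to count such overlaps. The coordinates are invented for illustration, intervals are assumed to be half-open (chromosome, start, end) tuples, and the abstract does not describe the overlap criterion the authors actually used.

```python
def overlaps(a, b):
    """True if half-open intervals a and b lie on the same chromosome and intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, reference):
    """Number of query intervals that overlap at least one reference interval."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


# Hypothetical CNV calls and previously reported regions (illustration only).
detected = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
reported = [("chr1", 150, 250), ("chr2", 500, 600)]
n_shared = count_overlapping(detected, reported)
```

    Published comparisons often require a minimum reciprocal overlap (for example, a fixed fraction of each interval) rather than any intersection, which would make the count more conservative.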

  15. Enhancing genomics information retrieval through dimensional analysis.

    Science.gov (United States)

    Hu, Qinmin; Huang, Jimmy Xiangji

    2013-06-01

    We propose a novel dimensional analysis approach to employing meta information in order to find the relationships within the unstructured or semi-structured document/passages for improving genomics information retrieval performance. First, we make use of the auxiliary information as three basic dimensions, namely "temporal", "journal", and "author". The reference section is treated as a commensurable quantity of the three basic dimensions. Then, the sample space and subspaces are built up and a set of events are defined to meet the basic requirement of dimensional homogeneity to be commensurable quantities. After that, the classic graph analysis algorithm in the Web environments is applied on each dimension respectively to calculate the importance of each dimension. Finally, we integrate all the dimension networks and re-rank the outputs for evaluation. Our experimental results show the proposed approach is superior and promising.

  16. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    cells are capable of regulating their gene expression, so that each cell can only express a particular set of genes yielding limited numbers of proteins with specialized functions. Therefore a rigid control of differential gene expression is necessary for cellular diversity. On the other hand, aberrant...... gene regulation will disrupt the cell’s fundamental processes, which in turn can cause disease. Hence, understanding gene regulation is essential for deciphering the code of life. Along with the development of high throughput sequencing (HTS) technology and the subsequent large-scale data analysis......, genome-wide assays have increased our understanding of gene regulation significantly. This thesis describes the integration and analysis of HTS data across different important aspects of gene regulation. Gene expression can be regulated at different stages when the genetic information is passed from gene...

  17. Infection and inflammation in schizophrenia and bipolar disorder: a genome wide study for interactions with genetic variation.

    Directory of Open Access Journals (Sweden)

    Dimitrios Avramopoulos

    Inflammation and maternal or fetal infections have been suggested as risk factors for schizophrenia (SZ) and bipolar disorder (BP). It is likely that such environmental effects are contingent on genetic background. Here, in a genome-wide approach, we test the hypothesis that such exposures increase the risk for SZ and BP and that the increase is dependent on genetic variants. We use genome-wide genotype data, plasma IgG antibody measurements against Toxoplasma gondii, Herpes simplex virus type 1, Cytomegalovirus, Human Herpes Virus 6 and the food antigen gliadin as well as measurements of C-reactive protein (CRP), a peripheral marker of inflammation. The subjects are SZ cases, BP cases, parents of cases and screened controls. We look for higher levels of our immunity/infection variables and interactions between them and common genetic variation genome-wide. We find many of the antibody measurements higher in both disorders. While individual tests do not withstand correction for multiple comparisons, the number of nominally significant tests and the comparisons showing the expected direction are in significant excess (permutation p=0.019 and 0.004 respectively). We also find CRP levels highly elevated in SZ, BP and the mothers of BP cases, in agreement with existing literature, but possibly confounded by our inability to correct for smoking or body mass index. In our genome-wide interaction analysis no signal reached genome-wide significance, yet many plausible candidate genes emerged. In a hypothesis driven test, we found multiple interactions among SZ-associated SNPs in the HLA region on chromosome 6 and replicated an interaction between CMV infection and genotypes near the CTNNA3 gene reported by a recent GWAS. Our results support that inflammatory processes and infection may modify the risk for psychosis and suggest that the genotype at SZ-associated HLA loci modifies the effect of these variables on the risk to develop SZ.

  18. Infection and inflammation in schizophrenia and bipolar disorder: a genome wide study for interactions with genetic variation.

    Science.gov (United States)

    Avramopoulos, Dimitrios; Pearce, Brad D; McGrath, John; Wolyniec, Paula; Wang, Ruihua; Eckart, Nicole; Hatzimanolis, Alexandros; Goes, Fernando S; Nestadt, Gerald; Mulle, Jennifer; Coneely, Karen; Hopkins, Myfanwy; Ruczinski, Ingo; Yolken, Robert; Pulver, Ann E

    2015-01-01

    Inflammation and maternal or fetal infections have been suggested as risk factors for schizophrenia (SZ) and bipolar disorder (BP). It is likely that such environmental effects are contingent on genetic background. Here, in a genome-wide approach, we test the hypothesis that such exposures increase the risk for SZ and BP and that the increase is dependent on genetic variants. We use genome-wide genotype data, plasma IgG antibody measurements against Toxoplasma gondii, Herpes simplex virus type 1, Cytomegalovirus, Human Herpes Virus 6 and the food antigen gliadin as well as measurements of C-reactive protein (CRP), a peripheral marker of inflammation. The subjects are SZ cases, BP cases, parents of cases and screened controls. We look for higher levels of our immunity/infection variables and interactions between them and common genetic variation genome-wide. We find many of the antibody measurements higher in both disorders. While individual tests do not withstand correction for multiple comparisons, the number of nominally significant tests and the comparisons showing the expected direction are in significant excess (permutation p=0.019 and 0.004 respectively). We also find CRP levels highly elevated in SZ, BP and the mothers of BP cases, in agreement with existing literature, but possibly confounded by our inability to correct for smoking or body mass index. In our genome-wide interaction analysis no signal reached genome-wide significance, yet many plausible candidate genes emerged. In a hypothesis driven test, we found multiple interactions among SZ-associated SNPs in the HLA region on chromosome 6 and replicated an interaction between CMV infection and genotypes near the CTNNA3 gene reported by a recent GWAS. Our results support that inflammatory processes and infection may modify the risk for psychosis and suggest that the genotype at SZ-associated HLA loci modifies the effect of these variables on the risk to develop SZ.

  19. Population-Genomic Insights into Variation in Prevotella intermedia and Prevotella nigrescens Isolates and Its Association with Periodontal Disease

    Directory of Open Access Journals (Sweden)

    Yifei Zhang

    2017-09-01

    High-throughput sequencing has helped to reveal the close relationship between Prevotella and periodontal disease, but the roles of subspecies diversity and genomic variation within this genus in periodontal diseases still need to be investigated. We performed a comparative genome analysis of 48 Prevotella intermedia and Prevotella nigrescens isolates from the same cohort of subjects to identify the main drivers of their pathogenicity and adaptation to different environments. The comparisons were done between the two species and between disease and health based on pooled sequences. The results showed that both P. intermedia and P. nigrescens have highly dynamic genomes and can take up various exogenous factors through horizontal gene transfer. The major differences between disease-derived and health-derived samples of P. intermedia and P. nigrescens were factors related to genome modification and recombination, indicating that the Prevotella isolates from disease sites may be more capable of genomic reconstruction. We also identified genetic elements specific to each sample, and found that disease groups had more unique virulence factors related to capsule and lipopolysaccharide synthesis, secretion systems, proteinases, and toxins, suggesting that strains from disease sites may have more specific virulence, particularly for P. intermedia. The differentially represented pathways between samples from disease and health were related to energy metabolism, carbohydrate and lipid metabolism, and amino acid metabolism, consistent with data from the whole subgingival microbiome in periodontal disease and health. Disease-derived samples had gained or lost several metabolic genes compared to health-derived samples, which could be linked with the difference in virulence performance between diseased and healthy sample groups. Our findings suggest that P. intermedia and P. nigrescens may serve as “crucial substances” in subgingival plaque, which may

  20. An initial comparative map of copy number variations in the goat (Capra hircus) genome

    Directory of Open Access Journals (Sweden)

    Casadio Rita

    2010-11-01

    Background: The goat (Capra hircus) represents one of the most important farm animal species. It is reared in all continents with an estimated world population of about 800 million animals. Despite its importance, studies on the goat genome are still in their infancy compared to those in other farm animal species. Comparative mapping between cattle and goat showed only a few rearrangements in agreement with the similarity of chromosome banding. We carried out a cross species cattle-goat array comparative genome hybridization (aCGH) experiment in order to identify copy number variations (CNVs) in the goat genome, analysing animals of different breeds (Saanen, Camosciata delle Alpi, Girgentana, and Murciano-Granadina) using a tiling oligonucleotide array with ~385,000 probes designed on the bovine genome. Results: We identified a total of 161 CNVs (an average of 17.9 CNVs per goat), with the largest number in the Saanen breed and the lowest in the Camosciata delle Alpi goat. By aggregating overlapping CNVs identified in different animals we determined CNV regions (CNVRs): on the whole, we identified 127 CNVRs covering about 11.47 Mb of the virtual goat genome referred to the bovine genome (0.435% of the latter genome). These 127 CNVRs included 86 losses and 41 gains and ranged from about 24 kb to about 1.07 Mb, with a mean and median equal to 90,292 bp and 49,530 bp, respectively. To evaluate whether the identified goat CNVRs overlap with those reported in the cattle genome, we compared our results with those obtained in four independent cattle experiments. Overlapping between goat and cattle CNVRs was highly significant (P ...). Conclusions: We describe a first map of goat CNVRs. This provides information on a comparative basis with the cattle genome by identifying putative recurrent interspecies CNVs between these two ruminant species. Several goat CNVs affect genes with important biological functions. Further studies are needed to evaluate the
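
    The aggregation step mentioned in the record above (pooling overlapping CNV calls from different animals into CNV regions) can be illustrated with a simple interval merge. This is only a minimal sketch, not the authors' pipeline: the function name merge_cnvs, the tuple layout, and the example coordinates are all hypothetical, and it assumes that any overlap between two calls on the same chromosome is enough to join them into one region.

```python
from collections import defaultdict

def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    cnvs: iterable of (chromosome, start, end) tuples pooled across animals.
    Returns a list of (chromosome, start, end) regions sorted by position.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Call overlaps the current region: extend its end if needed.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # No overlap: open a new region.
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions

# Hypothetical calls (made-up coordinates, not data from the study).
calls = [("chr1", 100, 500), ("chr1", 400, 900),
         ("chr1", 2000, 2500), ("chr2", 50, 80)]
cnvrs = merge_cnvs(calls)
total_bp = sum(end - start for _, start, end in cnvrs)
```

    Summing the merged region lengths, as total_bp does here, is the kind of calculation behind a genome-coverage figure; real analyses may apply stricter criteria, such as a minimum reciprocal overlap, before merging.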

  1. Large scale copy number variation (CNV) at 14q12 is associated with the presence of genomic abnormalities in neoplasia

    Directory of Open Access Journals (Sweden)

    Turley Stefanie

    2006-06-01

    Background: Advances made in the area of microarray comparative genomic hybridization (aCGH) have enabled the interrogation of the entire genome at a previously unattainable resolution. This has led to the discovery of a novel class of alternative entities called large-scale copy number variations (CNVs). These CNVs are often found in regions of closely linked sequence homology called duplicons that are thought to facilitate genomic rearrangements in some classes of neoplasia. Recently, it was proposed that duplicons located near the recurrent translocation break points on chromosomes 9 and 22 in chronic myeloid leukemia (CML) may facilitate this tumor-specific translocation. Furthermore, ~15–20% of CML patients also carry a microdeletion on the derivative 9 chromosome (der(9)) and these patients have a poor prognosis. It has been hypothesised that der(9) deletion patients have increased levels of chromosomal instability. Results: In this study aCGH was performed and identified a CNV (RP11-125A5, hereafter called CNV14q12) that was present as a genomic gain or loss in 10% of control DNA samples derived from cytogenetically normal individuals. CNV14q12 was the same clone identified by Iafrate et al. as a CNV. Real-time polymerase chain reaction (Q-PCR) was used to determine the relative frequency of this CNV in DNA from a series of 16 CML patients (both with and without a der(9) deletion) together with DNA derived from 36 paediatric solid tumors in comparison to the incidence of CNV in control DNA. CNV14q12 was present in ~50% of both tumor and CML DNA, but was found in 72% of CML bearing a der(9) microdeletion. Chi square analysis found a statistically significant difference (p ≤ 0.001) between the incidence of this CNV in cancer and normal DNA and a slightly increased incidence in CML with deletions in comparison to those CML without a detectable deletion. Conclusion: The increased incidence of CNV14q12 in tumor samples suggests that either

  2. Genome size and phenotypic variation of Nymphaea (Nymphaeaceae) species from Eastern Europe and temperate Asia

    Directory of Open Access Journals (Sweden)

    Magdalena Anna Dąbrowska

    2015-07-01

    Despite long-term research, the aquatic genus Nymphaea still poses major taxonomic challenges. High phenotypic plasticity and possible interspecific hybridization often make it impossible to identify individual specimens. The main aim of this study was to assess phenotypic variation in Nymphaea taxa sampled over a wide area of Eastern Europe and temperate Asia. Samples were identified based on species-specific genome sizes, and diagnostic morphological characters for each taxon were then selected. A total of 353 specimens from 32 populations in Poland, Russia and Ukraine were studied, with nine biometric traits being examined. Although some specimens morphologically matched N. ×borealis (a hybrid between N. alba and N. candida) according to published determination keys, only one hybrid individual was revealed based on genome size data. Other specimens with intermediate morphology possessed genome sizes corresponding to N. alba, N. candida or N. tetragona. This indicates that natural hybridization between N. alba and N. candida is not as frequent as previously suggested. Our results also revealed a considerably higher variation in the studied morphological traits (especially the quantitative ones) in N. alba and N. candida than reported in the literature. A determination key for the investigated Nymphaea species is provided, based on taxonomically-informative morphological characters identified in our study.

  3. Achilles' heel of pluripotent stem cells: genetic, genomic and epigenetic variations during prolonged culture.

    Science.gov (United States)

    Rebuzzini, Paola; Zuccotti, Maurizio; Redi, Carlo Alberto; Garagna, Silvia

    2016-07-01

    Pluripotent stem cells (PSCs) differentiate into almost any specialized adult cell type of an organism. PSCs can be derived either from the inner cell mass of a blastocyst (giving rise to embryonic stem cells) or after reprogramming of somatic terminally differentiated cells to obtain ES-like cells, named induced pluripotent stem cells. The potential use of these cells in the clinic, for investigating in vitro early embryonic development or for screening the effects of new drugs or xenobiotics, depends on their capability to maintain genome integrity during prolonged culture and differentiation. Both human and mouse PSCs are prone to genomic and (epi)genetic instability during in vitro culture, a feature that seriously limits their real potential use. Culture-induced variations of specific chromosomes or genes are almost all unpredictable and, as a whole, differ among independent cell lines. They may arise at different culture passages, suggesting the absence of a safe passage number maintaining genome integrity and rendering the control of genomic stability mandatory from the very early culture passages. The present review highlights the urgency for further studies on the mechanisms involved in determining (epi)genetic and chromosome instability, exploiting the knowledge acquired earlier on other cell types.

  4. Analysis of copy number variations in Holstein cows identify potential mechanisms contributing to differences in residual feed intake

    Science.gov (United States)

    Genomic structural variation is an important and abundant source of genetic and phenotypic variation. In this study, we performed an initial analysis of CNVs using BovineHD SNP genotyping data from 147 Holstein cows identified as having high or low feed efficiency as estimated by residual feed intak...

  5. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a common limitation to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results impossible to derive from the conventional unidimensional functional profiling methods. We report sound results of gene set associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.

  6. Genome Data Exploration Using Correspondence Analysis.

    Science.gov (United States)

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results

  7. A genome-wide survey of genetic variation in gorillas using reduced representation sequencing.

    Directory of Open Access Journals (Sweden)

    Aylwyn Scally

    All non-human great apes are endangered in the wild, and it is therefore important to gain an understanding of their demography and genetic diversity. Whole genome assembly projects have provided an invaluable foundation for understanding genetics in all four genera, but to date genetic studies of multiple individuals within great ape species have largely been confined to mitochondrial DNA and a small number of other loci. Here, we present a genome-wide survey of genetic variation in gorillas using a reduced representation sequencing approach, focusing on the two lowland subspecies. We identify 3,006,670 polymorphic sites in 14 individuals: 12 western lowland gorillas (Gorilla gorilla gorilla) and 2 eastern lowland gorillas (Gorilla beringei graueri). We find that the two species are genetically distinct, based on levels of heterozygosity and patterns of allele sharing. Focusing on the western lowland population, we observe evidence for population substructure, and a deficit of rare genetic variants suggesting a recent episode of population contraction. In western lowland gorillas, there is an elevation of variation towards telomeres and centromeres on the chromosomal scale. On a finer scale, we find substantial variation in genetic diversity, including a marked reduction close to the major histocompatibility locus, perhaps indicative of recent strong selection there. These findings suggest that despite their maintaining an overall level of genetic diversity equal to or greater than that of humans, population decline, perhaps associated with disease, has been a significant factor in recent and long-term pressures on wild gorilla populations.

  8. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species.

    Science.gov (United States)

    Hess, Jon E; Campbell, Nathan R; Close, David A; Docker, Margaret F; Narum, Shawn R

    2013-06-01

    Unlike most anadromous fishes that have evolved strict homing behaviour, Pacific lamprey (Entosphenus tridentatus) seem to lack philopatry as evidenced by minimal population structure across the species range. Yet unexplained findings of within-region population genetic heterogeneity coupled with the morphological and behavioural diversity described for the species suggest that adaptive genetic variation underlying fitness traits may be responsible. We employed restriction site-associated DNA sequencing to genotype 4439 quality filtered single nucleotide polymorphism (SNP) loci for 518 individuals collected across a broad geographical area including British Columbia, Washington, Oregon and California. A subset of putatively neutral markers (N = 4068) identified a significant amount of variation among three broad populations: northern British Columbia, Columbia River/southern coast and 'dwarf' adults (F(CT) = 0.02, P ≪ 0.001). Additionally, 162 SNPs were identified as adaptive through outlier tests, and inclusion of these markers revealed a signal of adaptive variation related to geography and life history. The majority of the 162 adaptive SNPs were not independent and formed four groups of linked loci. Analyses with matsam software found that 42 of these outlier SNPs were significantly associated with geography, run timing and dwarf life history, and 27 of these 42 SNPs aligned with known genes or highly conserved genomic regions using the genome browser available for sea lamprey. This study provides both neutral and adaptive context for observed genetic divergence among collections and thus reconciles previous findings of population genetic heterogeneity within a species that displays extensive gene flow.

  9. Copy number variation is a fundamental aspect of the placental genome.

    Science.gov (United States)

    Hannibal, Roberta L; Chuong, Edward B; Rivera-Mulia, Juan Carlos; Gilbert, David M; Valouev, Anton; Baker, Julie C

    2014-05-01

    Discovery of lineage-specific somatic copy number variation (CNV) in mammals has led to debate over whether CNVs are mutations that propagate disease or whether they are a normal, and even essential, aspect of cell biology. We show that 1,000 N polyploid trophoblast giant cells (TGCs) of the mouse placenta contain 47 regions, totaling 138 Megabases, where genomic copies are underrepresented (UR). UR domains originate from a subset of late-replicating heterochromatic regions containing gene deserts and genes involved in cell adhesion and neurogenesis. While lineage-specific CNVs have been identified in mammalian cells, classically in the immune system where V(D)J recombination occurs, we demonstrate that CNVs form during gestation in the placenta by an underreplication mechanism, not by recombination nor deletion. Our results reveal that large scale CNVs are a normal feature of the mammalian placental genome, which are regulated systematically during embryogenesis and are propagated by a mechanism of underreplication.

  10. Copy number variation is a fundamental aspect of the placental genome.

    Directory of Open Access Journals (Sweden)

    Roberta L Hannibal

    2014-05-01

    Discovery of lineage-specific somatic copy number variation (CNV) in mammals has led to debate over whether CNVs are mutations that propagate disease or whether they are a normal, and even essential, aspect of cell biology. We show that 1,000 N polyploid trophoblast giant cells (TGCs) of the mouse placenta contain 47 regions, totaling 138 Megabases, where genomic copies are underrepresented (UR). UR domains originate from a subset of late-replicating heterochromatic regions containing gene deserts and genes involved in cell adhesion and neurogenesis. While lineage-specific CNVs have been identified in mammalian cells, classically in the immune system where V(D)J recombination occurs, we demonstrate that CNVs form during gestation in the placenta by an underreplication mechanism, not by recombination nor deletion. Our results reveal that large scale CNVs are a normal feature of the mammalian placental genome, which are regulated systematically during embryogenesis and are propagated by a mechanism of underreplication.

  11. Illumina based whole mitochondrial genome of Junonia iphita reveals minor intraspecific variation

    Directory of Open Access Journals (Sweden)

    Catherine Vanlalruati

    2015-12-01

    In the present study, the near complete mitochondrial genome (mitogenome) of Junonia iphita (Lepidoptera: Nymphalidae: Nymphalinae) was determined to be 14,892 bp. The gene order and orientation are identical to those in other butterfly species. The phylogenetic tree constructed from the whole mitogenomes using the 13 protein coding genes (PCGs) defines the genetic relatedness of the two J. iphita species collected from two different regions. All the Junonia species clustered together, and were further subdivided into clade one, consisting of J. almana and J. orithya, and clade two, comprising the two J. iphita which were collected from the Indo and Indochinese subregions separated by a river barrier. Comparison between the two J. iphita sequences revealed minor variations, and Single Nucleotide Polymorphisms were identified at 51 sites amounting to 0.4% of the entire mitochondrial genome.

  12. An integrated map of genetic variation from 1,092 human genomes

    DEFF Research Database (Denmark)

    Abecasis, Goncalo R.; Auton, Adam; Brooks, Lisa D.

    2012-01-01

    By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination...... deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding...... consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites...

  13. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  14. Genetic architecture of bone quality variation in layer chickens revealed by a genome-wide association study

    Science.gov (United States)

    Guo, Jun; Sun, Congjiao; Qu, Liang; Shen, Manman; Dou, Taocun; Ma, Meng; Wang, Kehua; Yang, Ning

    2017-01-01

    Skeletal problems in layer chickens are gaining attention due to animal welfare and economic losses in the egg industry. The genetic improvement of bone traits has been proposed as a potential solution to these issues; however, genetic architecture is not well understood. We conducted a genome-wide association study (GWAS) on bone quality using a sample of 1534 hens genotyped with a 600 K Chicken Genotyping Array. Using a linear mixed model approach, a novel locus close to GSG1L, associated with femur bone mineral density (BMD), was uncovered in this study. In addition, nine SNPs in genes were associated with bone quality. Three of these genes, RANKL, ADAMTS and SOST, were known to be associated with osteoporosis in humans, which makes them good candidate genes for osteoporosis in chickens. Genomic partitioning analysis supports the fact that common variants contribute to the variations of bone quality. We have identified several strong candidate genes and genomic regions associated with bone traits measured in end-of-lay cage layers, which accounted for 1.3–7.7% of the phenotypic variance. These SNPs could provide the relevant information to help elucidate which genes affect bone quality in chicken. PMID:28383518

  15. Whole genome re-sequencing reveals genome-wide variations among parental lines of 16 mapping populations in chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Thudi, Mahendar; Khan, Aamir W; Kumar, Vinay; Gaur, Pooran M; Katta, Krishnamohan; Garg, Vanika; Roorkiwal, Manish; Samineni, Srinivasan; Varshney, Rajeev K

    2016-01-27

    Chickpea (Cicer arietinum L.) is the second most important grain legume cultivated by resource poor farmers in South Asia and Sub-Saharan Africa. In order to harness the untapped genetic potential available for chickpea improvement, we re-sequenced 35 chickpea genotypes representing parental lines of 16 mapping populations segregating for abiotic stresses (drought, heat, salinity), biotic stresses (Fusarium wilt, Ascochyta blight, Botrytis grey mould, Helicoverpa armigera) and nutritionally important traits (protein content) using a whole genome re-sequencing approach. A total of 192.19 Gb of data, comprising 973.13 million reads, was generated on the 35 genotypes of chickpea, with an average sequencing depth of ~10 X for each line. On average, 92.18 % of reads from each genotype were aligned to the chickpea reference genome with 82.17 % coverage. A total of 2,058,566 unique single nucleotide polymorphisms (SNPs) and 292,588 Indels were detected in comparison with the reference chickpea genome. The highest number of SNPs was identified on the Ca4 pseudomolecule. In addition, copy number variations (CNVs) such as gene deletions and duplications were identified across the chickpea parental genotypes, which were minimum in PI 489777 (1 gene deletion) and maximum in JG 74 (1,497). A total of 164,856 line specific variations (144,888 SNPs and 19,968 Indels) with the highest percentage were identified in coding regions in ICC 1496 (21 %) followed by ICCV 97105 (12 %). Of 539 miscellaneous variations, 339, 138 and 62 were inter-chromosomal variations (CTX), intra-chromosomal variations (ITX) and inversions (INV) respectively. Genome-wide SNPs, Indels, CNVs, PAVs, and miscellaneous variations identified in different mapping populations are a valuable resource in genetic research and helpful in locating genes/genomic segments responsible for economically important traits. Further, the genome-wide variations identified in the present study can be used for developing high density SNP arrays for

  16. Analysis of high-identity segmental duplications in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Carelli Francesco N

    2011-08-01

    Background: Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. SDs show at the sequence level the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data is plentiful for mammals, not much was known about the representation of SDs in plant genomes. In this regard, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024). Results: We demonstrate that recent SDs (> 94% identity and >= 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions: These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.

  17. Predictive Models of Recombination Rate Variation across the Drosophila melanogaster Genome

    Science.gov (United States)

    Adrian, Andrew B.; Corchado, Johnny Cruz; Comeron, Josep M.

    2016-01-01

    In all eukaryotic species examined, meiotic recombination, and crossovers in particular, occur non‐randomly along chromosomes. The cause for this non-random distribution remains poorly understood but some specific DNA sequence motifs have been shown to be enriched near crossover hotspots in a number of species. We present analyses using machine learning algorithms to investigate whether DNA motif distribution across the genome can be used to predict crossover variation in Drosophila melanogaster, a species without hotspots. Our study exposes a combinatorial non-linear influence of motif presence able to account for a significant fraction of the genome-wide variation in crossover rates at all genomic scales investigated, from 20% at 5-kb to almost 70% at 2,500-kb scale. The models are particularly predictive for regions with the highest and lowest crossover rates and remain highly informative after removing sub-telomeric and -centromeric regions known to have strongly reduced crossover rates. Transcriptional activity during early meiosis and differences in motif use between autosomes and the X chromosome add to the predictive power of the models. Moreover, we show that population-specific differences in crossover rates can be partly explained by differences in motif presence. Our results suggest that crossover distribution in Drosophila is influenced by both meiosis-specific chromatin dynamics and very local constitutive open chromatin associated with DNA motifs that prevent nucleosome stabilization. These findings provide new information on the genetic factors influencing variation in recombination rates and a baseline to study epigenetic mechanisms responsible for plastic recombination as response to different biotic and abiotic conditions and stresses. PMID:27492232

  18. Tandem gene arrays in Trypanosoma brucei: Comparative phylogenomic analysis of duplicate sequence variation

    Directory of Open Access Journals (Sweden)

    Jackson Andrew P

    2007-04-01

    Background: The genome sequence of the protistan parasite Trypanosoma brucei contains many tandem gene arrays. Gene duplicates are created through tandem duplication and are expressed through polycistronic transcription, suggesting that the primary purpose of long, tandem arrays is to increase gene dosage in an environment where individual gene promoters are absent. This report presents the first account of the tandem gene arrays in the T. brucei genome, employing several related genome sequences to establish how variation is created and removed. Results: A systematic survey of tandem gene arrays showed that substantial sequence variation existed across the genome; variation from different regions of an array often produced inconsistent phylogenetic affinities. Phylogenetic relationships of gene duplicates were consistent with concerted evolution being a widespread homogenising force. However, tandem duplicates were not usually identical; therefore, any homogenising effect was coincident with divergence among duplicates. Allelic gene conversion was detected using various criteria and was apparently able to both remove and introduce sequence variation. Tandem arrays containing structural heterogeneity demonstrated how sequence homogenisation and differentiation can occur within a single locus. Conclusion: The use of multiple genome sequences in a comparative analysis of tandem gene arrays identified substantial sequence variation among gene duplicates. The distribution of sequence variation is determined by a dynamic balance of conservative and innovative evolutionary forces. Gene trees from various species showed that intraspecific duplicates evolve in concert, perhaps through frequent gene conversion, although this does not prevent sequence divergence, especially where structural heterogeneity physically separates a duplicate from its neighbours. In describing dynamics of sequence variation that have consequences beyond gene dosage, this

  19. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  20. Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus cattle.

    Science.gov (United States)

    Hou, Yali; Liu, George E; Bickhart, Derek M; Matukumalli, Lakshmi K; Li, Congjun; Song, Jiuzhou; Gasbarre, Louis C; Van Tassell, Curtis P; Sonstegard, Tad S

    2012-03-01

    Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to gastrointestinal nematodes. In this study, we performed a large-scale analysis of CNVs using SNP genotyping data from 472 animals of the same population. We detected 811 candidate CNV regions, which represent 141.8 Mb (~4.7%) of the genome. To investigate the functional impacts of CNVs, we created 2 groups of 100 individual animals with extremely low or high estimated breeding values of eggs per gram of feces and referred to these groups as parasite resistant (PR) or parasite susceptible (PS), respectively. We identified 297 (~51 Mb) and 282 (~48 Mb) CNV regions from PR and PS groups, respectively. Approximately 60% of the CNV regions were specific to the PS group or PR group of animals. Selected PR- or PS-specific CNVs were further experimentally validated by quantitative PCR. The 297 PR CNV regions overlapped with 437 Ensembl genes enriched in immunity and defense, such as the WC1 gene, which is expressed uniquely on gamma/delta T cells in cattle. Network analyses indicated that the PR-specific genes were predominantly involved in gastrointestinal disease, immunological disease, inflammatory response, cell-to-cell signaling and interaction, lymphoid tissue development, and cell death. By contrast, the 282 PS CNV regions contained 473 Ensembl genes, which are overrepresented in environmental interactions. Network analyses indicated that the PS-specific genes were particularly enriched for inflammatory response, immune cell trafficking, metabolic disease, cell cycle, and cellular organization and movement.
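
    The step of relating CNV regions to annotated genes amounts to an interval-overlap test. The sketch below is only an illustration of that idea, not the authors' pipeline: the function name, the closed-interval convention and the toy coordinates are all assumptions introduced here.

```python
from collections import defaultdict


def genes_overlapping_cnvs(cnv_regions, genes):
    """Return the set of gene IDs that overlap at least one CNV region.

    cnv_regions: iterable of (chrom, start, end)
    genes:       iterable of (gene_id, chrom, start, end)
    Coordinates are treated as closed intervals (an assumption of this sketch).
    """
    # Group CNV intervals by chromosome so each gene is only compared
    # against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in cnv_regions:
        by_chrom[chrom].append((start, end))

    hits = set()
    for gene_id, chrom, g_start, g_end in genes:
        for c_start, c_end in by_chrom.get(chrom, []):
            # Two closed intervals overlap when each starts before the other ends.
            if g_start <= c_end and c_start <= g_end:
                hits.add(gene_id)
                break
    return hits


# Hypothetical toy data, not taken from the study.
toy_cnvs = [("chr1", 100, 500), ("chr2", 1000, 2000)]
toy_genes = [
    ("geneA", "chr1", 450, 800),    # overlaps the chr1 CNV
    ("geneB", "chr1", 600, 900),    # lies outside the chr1 CNV
    ("geneC", "chr2", 1500, 1600),  # lies inside the chr2 CNV
    ("geneD", "chr3", 100, 200),    # chromosome without any CNV
]
print(sorted(genes_overlapping_cnvs(toy_cnvs, toy_genes)))
```

    For hundreds of regions and a full gene annotation, the nested loop would normally be replaced by sorted intervals or an interval tree, but the overlap condition stays the same.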

  1. Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae

    Directory of Open Access Journals (Sweden)

    Yuan Huang

    2017-06-01

    Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide many more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in the family Salicaceae. The phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but is resolved with higher statistical support. Incongruences of phylogeny, however, are observed in the genus Populus, which most likely result from homoplasy. By comparing the three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but have experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes have been lost or have become pseudogenes, which suggests that these chloroplast genes were horizontally transferred to the nuclear genome. Based on the complete nuclear genome sequences of two Salicaceae species, we find, remarkably, that the entire chloroplast genome has been transferred to and integrated into the nuclear genome at least once in the individual used for the P. trichocarpa reference genome. This observation, along with the presence of large nuclear plastid DNA segments (NUPTs) and of NUPTs containing multiple chloroplast genes in their original chloroplast order, favors the DNA-mediated hypothesis of organelle-to-nucleus DNA transfer. Overall, the phylogenomic analysis using complete chloroplast genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in

  2. Cytogenetics and genome-wide copy number variation analysis of a patient with suspected Prader-Willi syndrome

    Institute of Scientific and Technical Information of China (English)

    曾琴英; 赵丽娟; 葛军; 朱俊真

    2011-01-01

    Objective: To determine the etiology in a patient with suspected Prader-Willi syndrome by genome-wide copy number variation analysis. Methods: Peripheral blood was collected from a child clinically suspected of having Prader-Willi syndrome and from his parents for conventional cytogenetic G-banding and high-resolution chromosome analysis. Genomic DNA was extracted from the child's blood for genome-wide copy number variation analysis. Results: High-resolution chromosome analysis showed no abnormality in the child or his parents, but genome-wide copy number variation analysis revealed a heterozygous deletion of a 5 Mb region at chromosome 15q11.2-q13.1 in the child. Regular assessment with the Bayley and Gesell developmental scales suggested that the child's IQ was 60-70, consistent with the clinical features of Prader-Willi syndrome. Conclusion: The heterozygous deletion at chromosome 15q11.2-q13.1 is the cause of Prader-Willi syndrome in this family. Further molecular genetic testing can compensate for the limitations of cytogenetic methods when no abnormality is observed at the cytogenetic level in

  3. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    Science.gov (United States)

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. Methodology/Principal Findings: We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion: The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  4. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    Directory of Open Access Journals (Sweden)

    Guozheng Liu

    BACKGROUND: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. METHODOLOGY/PRINCIPAL FINDINGS: We sequenced the Gossypium hirsutum mt genome using 454 technology combined with screening of a fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. CONCLUSION: The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  5. Patterns of Genome-Wide Variation in Glossina fuscipes fuscipes Tsetse Flies from Uganda.

    Science.gov (United States)

    Gloria-Soria, Andrea; Dunn, W Augustine; Telleria, Erich L; Evans, Benjamin R; Okedi, Loyce; Echodu, Richard; Warren, Wesley C; Montague, Michael J; Aksoy, Serap; Caccone, Adalgisa

    2016-06-01

    The tsetse fly Glossina fuscipes fuscipes (Gff) is the insect vector of the two forms of Human African Trypanosomiasis (HAT) that exist in Uganda. Understanding Gff population dynamics, and the underlying genetics of epidemiologically relevant phenotypes is key to reducing disease transmission. Using ddRAD sequence technology, complemented with whole-genome sequencing, we developed a panel of ∼73,000 single-nucleotide polymorphisms (SNPs) distributed across the Gff genome that can be used for population genomics and to perform genome-wide-association studies. We used these markers to estimate genomic patterns of linkage disequilibrium (LD) in Gff, and used the information, in combination with outlier-locus detection tests, to identify candidate regions of the genome under selection. LD in individual populations decays to half of its maximum value (r²max/2) between 1359 and 2429 bp. The overall LD estimated for the species reaches r²max/2 at 708 bp, an order of magnitude slower than in Drosophila. Using 53 infected (Trypanosoma spp.) and uninfected flies from four genetically distinct Ugandan populations adapted to different environmental conditions, we were able to identify SNPs associated with the infection status of the fly and local environmental adaptation. The extent of LD in Gff likely facilitated the detection of loci under selection, despite the small sample size. Furthermore, it is probable that LD in the regions identified is much higher than the average genomic LD due to strong selection. Our results show that even modest sample sizes can reveal significant genetic associations in this species, which has implications for future studies given the difficulties of collecting field specimens with contrasting phenotypes for association analysis.
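
    The quantity reported above, the distance at which LD falls to half of its maximum, can be estimated in several ways. The abstract does not say which procedure was used, so the following is a minimal sketch under stated assumptions: pairwise r² values are averaged in fixed-width distance bins (a hypothetical choice, as are the function name and the toy values), and the half-decay distance is taken as the midpoint of the first bin whose mean is at or below half of the largest bin mean.

```python
def half_decay_distance(pairs, bin_size):
    """Estimate the distance (bp) at which mean r^2 drops to half its maximum.

    pairs:    iterable of (distance_bp, r2) for pairs of SNPs
    bin_size: width of the distance bins in bp
    Returns the midpoint of the first bin whose mean r^2 is <= max_mean / 2,
    or None if there is no data or no bin falls that low.
    """
    sums, counts = {}, {}
    for dist, r2 in pairs:
        b = dist // bin_size
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    if not sums:
        return None

    means = {b: sums[b] / counts[b] for b in sums}
    r2_max = max(means.values())
    # Walk outwards from the shortest distances and stop at the first bin
    # that has decayed to half of the maximum bin mean.
    for b in sorted(means):
        if means[b] <= r2_max / 2:
            return b * bin_size + bin_size / 2
    return None


# Hypothetical toy values, chosen only to exercise the function.
toy_pairs = [(50, 0.8), (150, 0.6), (250, 0.5), (350, 0.38), (450, 0.2)]
print(half_decay_distance(toy_pairs, bin_size=100))
```

    Fitting a decay curve to the binned means instead of taking the first bin below the threshold would give a smoother estimate; the binning approach is shown only because it needs no external libraries.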

  6. Patterns of Genome-Wide Variation in Glossina fuscipes fuscipes Tsetse Flies from Uganda

    Directory of Open Access Journals (Sweden)

    Andrea Gloria-Soria

    2016-06-01

    The tsetse fly Glossina fuscipes fuscipes (Gff) is the insect vector of the two forms of Human African Trypanosomiasis (HAT) that exist in Uganda. Understanding Gff population dynamics, and the underlying genetics of epidemiologically relevant phenotypes is key to reducing disease transmission. Using ddRAD sequence technology, complemented with whole-genome sequencing, we developed a panel of ∼73,000 single-nucleotide polymorphisms (SNPs) distributed across the Gff genome that can be used for population genomics and to perform genome-wide-association studies. We used these markers to estimate genomic patterns of linkage disequilibrium (LD) in Gff, and used the information, in combination with outlier-locus detection tests, to identify candidate regions of the genome under selection. LD in individual populations decays to half of its maximum value (r²max/2) between 1359 and 2429 bp. The overall LD estimated for the species reaches r²max/2 at 708 bp, an order of magnitude slower than in Drosophila. Using 53 infected (Trypanosoma spp.) and uninfected flies from four genetically distinct Ugandan populations adapted to different environmental conditions, we were able to identify SNPs associated with the infection status of the fly and local environmental adaptation. The extent of LD in Gff likely facilitated the detection of loci under selection, despite the small sample size. Furthermore, it is probable that LD in the regions identified is much higher than the average genomic LD due to strong selection. Our results show that even modest sample sizes can reveal significant genetic associations in this species, which has implications for future studies given the difficulties of collecting field specimens with contrasting phenotypes for association analysis.

  7. Millstone: software for multiplex microbial genome analysis and engineering.

    Science.gov (United States)

    Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  8. Genomic plasticity enables phenotypic variation of Pseudomonas syringae pv. tomato DC3000.

    Directory of Open Access Journals (Sweden)

    Zhongmeng Bao

    Whole genome sequencing revealed the presence of a genomic anomaly in the region of 4.7 to 4.9 Mb of the Pseudomonas syringae pv. tomato (Pst) DC3000 genome. The average read depth coverage of Pst DC3000 whole genome sequencing results suggested that a 165 kb segment of the chromosome had doubled in copy number. Further analysis confirmed the 165 kb duplication and that the two copies were arranged as a direct tandem repeat. Examination of the corresponding locus in Pst NCPPB1106, the parent strain of Pst DC3000, suggested that the 165 kb duplication most likely formed after the two strains diverged via transposition of an ISPsy5 insertion sequence (IS) followed by unequal crossing over between ISPsy5 elements at each end of the duplicated region. Deletion of one copy of the 165 kb region demonstrated that the duplication facilitated enhanced growth in some culture conditions, but did not affect pathogenic growth in host tomato plants. These types of chromosomal structures are predicted to be unstable and we have observed resolution of the 165 kb duplication to single copy and its subsequent re-duplication. These data demonstrate the role of IS elements in recombination events that facilitate genomic reorganization in P. syringae.
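
    A doubling in copy number shows up in sequencing data as a stretch where read depth is roughly twice the genome-wide baseline. The sketch below illustrates that signal only; it is not the analysis performed in the study. The window size, the 1.5-fold threshold, the use of the median window as baseline and the toy depths are all assumptions made for the example.

```python
from statistics import median


def elevated_depth_windows(depths, window, min_ratio=1.5):
    """Flag fixed-size windows whose mean depth is elevated over the baseline.

    depths:    list of per-base read depths along one chromosome
    window:    window size in bases (trailing partial windows are ignored)
    min_ratio: minimum (window mean / baseline) ratio to report
    Returns a list of (start, end, ratio) with half-open coordinates.
    """
    means = []
    for start in range(0, len(depths) - window + 1, window):
        chunk = depths[start:start + window]
        means.append((start, sum(chunk) / window))
    if not means:
        return []

    # The median window mean serves as the single-copy baseline; this assumes
    # that most of the chromosome is present in one copy.
    baseline = median(m for _, m in means)
    if baseline <= 0:
        return []
    return [(s, s + window, m / baseline)
            for s, m in means if m / baseline >= min_ratio]


# Hypothetical toy depths: a 10-base stretch sequenced at twice the baseline.
toy_depths = [30] * 40 + [60] * 10 + [30] * 50
print(elevated_depth_windows(toy_depths, window=10))
```

    Real depth profiles are noisy, so adjacent flagged windows would normally be merged and corrected for GC content before a segment is called duplicated.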

  9. EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics

    Directory of Open Access Journals (Sweden)

    Coutant Sophie

    2012-09-01

    Background: Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in the history of medical genetics, impacting both fundamental research and diagnostic methods leading to personalized medicine. A plethora of efficient algorithms has been developed for variant discovery. They generally yield ~20,000 variations that have to be narrowed down to find the potential pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures involving various filtering strategies have emerged: exclusion of common variations, type of allelic variant, pathogenicity effect prediction, mode of inheritance, and comparison of exomes from multiple individuals. To deal with the expansion of WES in individual medical genomics laboratories, new user-friendly and versatile software tools have to implement these filtering steps. Non-programmer biologists need to be able to combine different filtering criteria on their own and pursue a personal strategy depending on their assumptions and study design. Results: We describe EVA (Exome Variation Analyzer), a user-friendly web-interfaced software dedicated to filtering strategies for medical WES. Through different modules, EVA (i) integrates and stores annotated exome variation data as strictly confidential to the project owner, (ii) allows the main filters dealing with common variations, molecular types, inheritance mode and multiple samples to be combined, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. We report a demonstrative case study that allowed the identification of a new candidate gene

  10. Genome-wide survey reveals predisposing diabetes type 2-related DNA methylation variations in human peripheral blood.

    Science.gov (United States)

    Toperoff, Gidon; Aran, Dvir; Kark, Jeremy D; Rosenberg, Michael; Dubnikov, Tatyana; Nissan, Batel; Wainstein, Julio; Friedlander, Yechiel; Levy-Lahad, Ephrat; Glaser, Benjamin; Hellman, Asaf

    2012-01-15

    Inter-individual DNA methylation variations were frequently hypothesized to alter individual susceptibility to Type 2 Diabetes Mellitus (T2DM). Sequence-influenced methylations were described in T2DM-associated genomic regions, but evidence for direct, sequence-independent association with disease risk is missing. Here, we explore disease-contributing DNA methylation through a stepwise study design: first, a pool-based, genome-scale screen among 1169 case and control individuals revealed an excess of differentially methylated sites in genomic regions that were previously associated with T2DM through genetic studies. Next, in-depth analyses were performed at selected top-ranking regions. A CpG site in the first intron of the FTO gene showed small (3.35%) but significant (P = 0.000021) hypomethylation of cases relative to controls. The effect was independent of the sequence polymorphism in the region and persists among individuals carrying the sequence-risk alleles. The odds of belonging to the T2DM group increased by 6.1% for every 1% decrease in methylation (OR = 1.061, 95% CI: 1.032-1.090), the odds ratio for decrease of 1 standard deviation of methylation (adjusted to gender) was 1.5856 (95% CI: 1.2824-1.9606) and the sensitivity (area under the curve = 0.638, 95% CI: 0.586-0.690; males = 0.675, females = 0.609) was better than that of the strongest known sequence variant. Furthermore, a prospective study in an independent population cohort revealed significant hypomethylation of young individuals that later progressed to T2DM, relative to the individuals who stayed healthy. Further genomic analysis revealed co-localization with gene enhancers and with binding sites for methylation-sensitive transcriptional regulators. The data showed that low methylation level at the analyzed sites is an early marker of T2DM and suggests a novel mechanism by which early-onset, inter-individual methylation variation at isolated non-promoter genomic sites predisposes to T2DM.
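
    The two odds ratios quoted above (1.061 per 1% decrease in methylation and 1.5856 per standard deviation) can be related to each other if one assumes a logistic model in which the log-odds change linearly with methylation, so that an odds ratio for k units is the per-unit odds ratio raised to the power k. The sketch below works through that arithmetic. The helper names are introduced here for illustration, and because the per-SD estimate is described as gender-adjusted, the implied SD is only a rough consistency check, not a figure reported by the study.

```python
import math


def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, given the odds ratio per one unit.

    Assumes log-odds are linear in the predictor (standard logistic model).
    """
    return math.exp(math.log(or_per_unit) * units)


def implied_units(or_per_unit, or_target):
    """Number of units whose combined odds ratio equals `or_target`."""
    return math.log(or_target) / math.log(or_per_unit)


OR_PER_1PCT = 1.061   # per 1% decrease in methylation (from the abstract)
OR_PER_SD = 1.5856    # per 1 SD decrease, gender-adjusted (from the abstract)

# Odds ratio for a hypothetical 5% decrease in methylation.
or_5pct = scale_odds_ratio(OR_PER_1PCT, 5)

# Methylation change (in %) that would correspond to the per-SD odds ratio
# if both estimates came from the same model.
sd_in_pct = implied_units(OR_PER_1PCT, OR_PER_SD)

print(f"OR for a 5% decrease: {or_5pct:.3f}")
print(f"Implied SD of methylation: about {sd_in_pct:.1f}%")
```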

  11. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Background: The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results: We have created a user-friendly Java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion: In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  12. Validating Genome-Wide Association Candidates Controlling Quantitative Variation in Nodulation

    Science.gov (United States)

    Tiffin, Peter; Guhlin, Joseph; Atkins, Paul; Baltes, Nicholas J.; Denny, Roxanne

    2017-01-01

    Genome-wide association (GWA) studies offer the opportunity to identify genes that contribute to naturally occurring variation in quantitative traits. However, GWA relies exclusively on statistical association, so functional validation is necessary to make strong claims about gene function. We used a combination of gene-disruption platforms (Tnt1 retrotransposons, hairpin RNA-interference constructs, and CRISPR/Cas9 nucleases) together with randomized, well-replicated experiments to evaluate the function of genes that an earlier GWA study in Medicago truncatula had identified as candidates contributing to variation in the symbiosis between legumes and rhizobia. We evaluated ten candidate genes found in six clusters of strongly associated single nucleotide polymorphisms, selected on the basis of their strength of statistical association, proximity to annotated gene models, and root or nodule expression. We found statistically significant effects on nodule production for three candidate genes, each validated in two independent mutants. Annotated functions of these three genes suggest their contributions to quantitative variation in nodule production occur through processes not previously connected to nodulation, including phosphorus supply and salicylic acid-related defense response. These results demonstrate the utility of GWA combined with reverse mutagenesis technologies to discover and validate genes contributing to naturally occurring variation in quantitative traits. The results highlight the potential for GWA to complement forward genetics in identifying the genetic basis of ecologically and economically important traits. PMID:28057894

  13. A genome wide association study between copy number variation (CNV) and human height in Chinese population

    Institute of Scientific and Technical Information of China (English)

    Xi Li; Liang Zhang; Han Yan; Feng Pan; Zhixin Zhang; Yumei Peng; Qi Zhou; Lina He; Xuezhen Zhu; Jing Cheng; Lishu Zhang; Lijun Tan; Yaozhong Liu; Qing Tian; Hongwen Deng; Xiaogang Liu; Shufeng Lei; Tielin Yang; Xiangding Chen; Fang Zhang; Yue Fang; Yan Guo

    2010-01-01

    Copy number variation (CNV) is a type of genetic variation which may have important roles in phenotypic variability and disease susceptibility. To hunt for genetic variants underlying human height variation, we performed a genome-wide CNV association study for human height in 618 unrelated Chinese subjects using the Affymetrix 500K array set. After adjusting for age and sex, we found that four CNVs at 6p21.3, 8p23.3-23.2, 9p23 and 16p12.1 were associated with human height (borderline-significant p values of 0.013, 0.011, 0.024 and 0.049, respectively). However, after correction for multiple testing, none of them remained significantly associated with human height. We observed that the gain of copy number (more than 2 copies) at 8p23.3-23.2 was associated with lower height (normal copy number vs. gain of copy number; 161.2 cm vs. 153.7 cm, p = 0.011), which accounted for 0.9% of height variation. Loss of copy number (less than 2 copies) at 6p21.3 was associated with 0.8% lower height (loss of copy number vs. normal copy number: 154.5 cm vs. 161.1 cm, p = 0.013). Since no important genes influencing height are located in the CNVs at 8p23.3-23.2 and 6p21.3, the two CNVs may cause structural rearrangements of important neighboring candidate genes and thereby regulate the variation of height. Our results expand our knowledge of the genetic factors underlying height variation and the biological regulation of human height.
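
    The effect of multiple-testing correction on the four reported p values can be illustrated with a small sketch. The abstract does not say which correction was used or how many CNVs were tested, so the Bonferroni rule, the 0.05 significance level, and the helper names below are assumptions made only for illustration.

```python
# Uncorrected p values reported in the abstract for the four CNV loci.
P_VALUES = {"6p21.3": 0.013, "8p23.3-23.2": 0.011, "9p23": 0.024, "16p12.1": 0.049}


def bonferroni_survivors(p_values, n_tests, alpha=0.05):
    """Return the loci whose p value is below the Bonferroni threshold alpha / n_tests.

    n_tests is hypothetical here: the abstract does not report the number of tests.
    """
    threshold = alpha / n_tests
    return [locus for locus, p in p_values.items() if p < threshold]


def smallest_n_with_no_survivors(p_values, alpha=0.05):
    """Find the smallest number of tests at which no locus passes Bonferroni."""
    n_tests = 1
    while bonferroni_survivors(p_values, n_tests, alpha):
        n_tests += 1
    return n_tests


if __name__ == "__main__":
    for n in (1, 4, 5):
        print(n, bonferroni_survivors(P_VALUES, n))
    print("no survivors from n_tests =", smallest_n_with_no_survivors(P_VALUES))
```

    Under these assumptions the smallest p value (0.011) already fails once five or more tests are corrected for, which is consistent with the abstract's statement that no CNV remained associated after correction.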

  14. Genome-size Variation in Switchgrass (Panicum virgatum): Flow Cytometry and Cytology Reveal Rampant Aneuploidy

    Directory of Open Access Journals (Sweden)

    Denise E. Costich

    2010-11-01

    Switchgrass (Panicum virgatum L.), a native perennial dominant of the prairies of North America, has been targeted as a model herbaceous species for biofeedstock development. A flow-cytometric survey of a core set of 11 primarily upland polyploid switchgrass accessions indicated that there was considerable variation in genome size within each accession, particularly at the octoploid (2n = 8x = 72 chromosome) ploidy level. Highly variable chromosome counts in mitotic cell preparations indicated that aneuploidy was more common in octoploids (86.3%) than tetraploids (23.2%). Furthermore, the incidence of hyper- versus hypoaneuploidy is equivalent in tetraploids. This is clearly not the case in octoploids, where close to 90% of the aneuploid counts are lower than the euploid number. Cytogenetic investigation using fluorescent in situ hybridization (FISH) revealed an unexpected degree of variation in chromosome structure underlying the apparent genomic instability of this species. These results indicate that rapid advances in the breeding of polyploid biofuel feedstocks, based on the molecular-genetic dissection of biomass characteristics and yield, will be predicated on the continual improvement of our understanding of the cytogenetics of these species.

  15. Evaluating variations of genotype calling: a potential source of spurious associations in genome-wide association studies

    Indian Academy of Sciences (India)

    Huixiao Hong; Zhenqiang Su; Weigong Ge; Leming Shi; Roger Perkins; Hong Fang; Donna Mendrick; Weida Tong

    2010-04-01

    Genome-wide association studies (GWAS) examine the entire human genome with the goal of identifying genetic variants (usually single nucleotide polymorphisms (SNPs)) that are associated with phenotypic traits such as disease status and drug response. The discordance of significantly associated SNPs for the same disease identified from different GWAS indicates that false associations exist in such results. In addition to the possible sources of spurious associations that have been investigated and discussed intensively, such as sample size and population stratification, an accurate and reproducible genotype calling algorithm is required for concordant GWAS results from different studies. However, variations of genotype calling of an algorithm and their effects on significantly associated SNPs identified in downstream association analyses have not been systematically investigated. In this paper, the variations of genotype calling using the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) algorithm and the resulting influence on the lists of significantly associated SNPs were evaluated using the raw data of 270 HapMap samples analysed with the Affymetrix Human Mapping 500K Array Set (Affy500K) by changing algorithmic parameters. Modified were the Dynamic Model (DM) call confidence threshold (threshold) and the number of randomly selected SNPs (size). Comparative analysis of the calling results and the corresponding lists of significantly associated SNPs identified through association analysis revealed that algorithmic parameters used in BRLMM affected the genotype calls and the significantly associated SNPs. Both the threshold and the size affected the called genotypes and the lists of significantly associated SNPs in association analysis. The effect of the threshold was much larger than the effect of the size. Moreover, the heterozygous calls had lower consistency compared to the homozygous calls.
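
    The comparison of heterozygous and homozygous call consistency can be sketched as follows. This is not the BRLMM algorithm or the authors' evaluation code; the genotype encoding ('AA', 'AB', 'BB', with 'NN' as a no-call), the function name, and the toy call lists are hypothetical, and each call is assigned to a class by its genotype in the first run.

```python
from collections import defaultdict


def concordance_by_class(calls_a, calls_b):
    """Fraction of calls in calls_b that match calls_a, split by genotype class.

    The class ('hom' or 'het') is taken from the genotype in calls_a.
    Positions where either run produced a no-call ('NN') are skipped.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for a, b in zip(calls_a, calls_b):
        if a == "NN" or b == "NN":
            continue
        genotype_class = "het" if a[0] != a[1] else "hom"
        total[genotype_class] += 1
        agree[genotype_class] += int(a == b)
    return {cls: agree[cls] / total[cls] for cls in total}


# Hypothetical calls for eight SNPs under two parameter settings.
RUN_1 = ["AA", "AB", "BB", "AB", "AA", "AB", "BB", "NN"]
RUN_2 = ["AA", "AB", "BB", "AA", "AA", "BB", "BB", "AB"]

if __name__ == "__main__":
    print(concordance_by_class(RUN_1, RUN_2))
```

    In this toy input all four homozygous calls agree while only one of three heterozygous calls does, mirroring the direction of the effect described in the abstract without reproducing its actual numbers.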

  16. Eight genetic loci associated with variation in lipoprotein-associated phospholipase A2 mass and activity and coronary heart disease: meta-analysis of genome-wide association studies from five community-based studies

    Science.gov (United States)

    Grallert, Harald; Dupuis, Josée; Bis, Joshua C.; Dehghan, Abbas; Barbalic, Maja; Baumert, Jens; Lu, Chen; Smith, Nicholas L.; Uitterlinden, André G.; Roberts, Robert; Khuseyinova, Natalie; Schnabel, Renate B.; Rice, Kenneth M.; Rivadeneira, Fernando; Hoogeveen, Ron C.; Fontes, João Daniel; Meisinger, Christa; Keaney, John F.; Lemaitre, Rozenn; Aulchenko, Yurii S.; Vasan, Ramachandran S.; Ellis, Stephen; Hazen, Stanley L.; van Duijn, Cornelia M.; Nelson, Jeanenne J.; März, Winfried; Schunkert, Heribert; McPherson, Ruth M.; Stirnadel-Farrant, Heide A.; Psaty, Bruce M.; Gieger, Christian; Siscovick, David; Hofman, Albert; Illig, Thomas; Cushman, Mary; Yamamoto, Jennifer F.; Rotter, Jerome I.; Larson, Martin G.; Stewart, Alexandre F.R.; Boerwinkle, Eric; Witteman, Jacqueline C.M.; Tracy, Russell P.; Koenig, Wolfgang; Benjamin, Emelia J.; Ballantyne, Christie M.

    2012-01-01

    Aims: Lipoprotein-associated phospholipase A2 (Lp-PLA2) generates proinflammatory and proatherogenic compounds in the arterial vascular wall and is a potential therapeutic target in coronary heart disease (CHD). We searched for genetic loci related to Lp-PLA2 mass or activity by a genome-wide association study as part of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. Methods and results: In meta-analyses of findings from five population-based studies, comprising 13 664 subjects, variants at two loci (PLA2G7, CETP) were associated with Lp-PLA2 mass. The strongest signal was at rs1805017 in PLA2G7 [P = 2.4 × 10^−23, log Lp-PLA2 difference per allele (beta): 0.043]. Variants at six loci were associated with Lp-PLA2 activity (PLA2G7, APOC1, CELSR2, LDL, ZNF259, SCARB1), among which the strongest signals were at rs4420638, near the APOE–APOC1–APOC4–APOC2 cluster [P = 4.9 × 10^−30; log Lp-PLA2 difference per allele (beta): −0.054]. There were no significant gene–environment interactions between these eight polymorphisms associated with Lp-PLA2 mass or activity and age, sex, body mass index, or smoking status. Four of the polymorphisms (in APOC1, CELSR2, SCARB1, ZNF259), but not PLA2G7, were significantly associated with CHD in a second study. Conclusion: Levels of Lp-PLA2 mass and activity were associated with PLA2G7, the gene coding for this protein. Lipoprotein-associated phospholipase A2 activity was also strongly associated with genetic variants related to low-density lipoprotein cholesterol levels. PMID:22003152

  17. Meningococcal genetic variation mechanisms viewed through comparative analysis of serogroup C strain FAM18.

    Directory of Open Access Journals (Sweden)

    Stephen D Bentley

    2007-02-01

    The bacterium Neisseria meningitidis is commonly found harmlessly colonising the mucosal surfaces of the human nasopharynx. Occasionally strains can invade host tissues causing septicaemia and meningitis, making the bacterium a major cause of morbidity and mortality in both the developed and developing world. The species is known to be diverse in many ways, as a product of its natural transformability and of a range of recombination and mutation-based systems. Previous work on pathogenic Neisseria has identified several mechanisms for the generation of diversity of surface structures, including phase variation based on slippage-like mechanisms and sequence conversion of expressed genes using information from silent loci. Comparison of the genome sequences of two N. meningitidis strains, serogroup B MC58 and serogroup A Z2491, suggested further mechanisms of variation, including C-terminal exchange in specific genes and enhanced localised recombination and variation related to repeat arrays. We have sequenced the genome of N. meningitidis strain FAM18, a representative of the ST-11/ET-37 complex, providing the first genome sequence for the disease-causing serogroup C meningococci; it has 1,976 predicted genes, of which 60 do not have orthologues in the previously sequenced serogroup A or B strains. Through genome comparison with Z2491 and MC58 we have further characterised specific mechanisms of genetic variation in N. meningitidis, describing specialised loci for generation of cell surface protein variants and measuring the association between noncoding repeat arrays and sequence variation in flanking genes. Here we provide a detailed view of novel genetic diversification mechanisms in N. meningitidis. Our analysis provides evidence for the hypothesis that the noncoding repeat arrays in neisserial genomes (neisserial intergenic mosaic elements) provide a crucial mechanism for the generation of surface antigen variants.
Such variation will have an

  19. Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus.

    Science.gov (United States)

    Wang, Qinghua; Dooner, Hugo K

    2006-11-21

    Maize is probably the most diverse of all crop species. Unexpectedly large differences among haplotypes were first revealed in a comparison of the bz genomic regions of two different inbred lines, McC and B73. Retrotransposon clusters, which comprise most of the repetitive DNA in maize, varied markedly in makeup and in location relative to the genes in the region, and genic sequences, later shown to be carried by two helitron transposons, also differed between the inbreds. Thus, the allelic bz regions of these Corn Belt inbreds shared only a minority of the total sequence. To investigate further the variation caused by retrotransposons, helitrons, and other insertions, we have analyzed the organization of the bz genomic region in five additional cultivars selected because of their geographic and genetic diversity: the inbreds A188, CML258, and I137TN, and the land races Coroico and NalTel. This vertical comparison has revealed the existence of several new helitrons, new retrotransposons, members of every superfamily of DNA transposons, numerous miniature elements, and novel insertions flanked at either end by TA repeats, which we call TAFTs (TA-flanked transposons). The extent of variation in the region is remarkable. In pairwise comparisons of eight bz haplotypes, the percentage of shared sequences ranges from 25% to 84%. Chimeric haplotypes were identified that combine retrotransposon clusters found in different haplotypes. We propose that recombination in the common gene space greatly amplifies the variability produced by the retrotransposition explosion in the maize ancestry, creating the heterogeneity in genome organization found in modern maize.

  20. Identification of genome-wide copy number variations among diverse pig breeds by array CGH

    Directory of Open Access Journals (Sweden)

    Li Yan

    2012-12-01

    Background: Recent studies have shown that copy number variation (CNV) in mammalian genomes contributes to phenotypic diversity, including health and disease status. In domestic pigs, CNV has been catalogued by several reports, but the extent of CNV and the phenotypic effects are far from clear. The goal of this study was to identify CNV regions (CNVRs) in pigs based on array comparative genome hybridization (aCGH). Results: Here a custom-made tiling oligo-nucleotide array was used with a median probe spacing of 2506 bp for screening 12 pigs including 3 Chinese native pigs (one Chinese Erhualian, one Tongcheng and one Yangxin pig), 5 European pigs (one Large White, one Pietrain, one White Duroc and two Landrace pigs), 2 synthetic pigs (Chinese new line DIV pigs) and 2 crossbred pigs (Landrace × DIV pigs) with a Duroc pig as the reference. Two hundred and fifty-nine CNVRs across chromosomes 1–18 and X were identified, with an average size of 65.07 kb and a median size of 98.74 kb, covering 16.85 Mb or 0.74% of the whole genome. Concerning copy number status, 93 (35.91%) CNVRs were called as gains, 140 (54.05%) were called as losses and the remaining 26 (10.04%) were called as both gains and losses. Of all detected CNVRs, 171 (66.02%) and 34 (13.13%) CNVRs directly overlapped with Sus scrofa duplicated sequences and pig QTLs, respectively. The CNVRs encompassed 372 full length Ensembl transcripts. Two CNVRs identified by aCGH were validated using real-time quantitative PCR (qPCR). Conclusions: Using 720 K array CGH (aCGH) we described a map of porcine CNVs which facilitated the identification of structural variations for important phenotypes and the assessment of the genetic diversity of pigs.
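
    The summary figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable and function names are illustrative, and rounding to two decimals is an assumption about how the percentages were reported.

```python
# Counts and sizes quoted in the abstract.
N_CNVR = 259                      # total CNV regions detected
MEAN_SIZE_KB = 65.07              # reported average CNVR size
STATUS_COUNTS = {"gain": 93, "loss": 140, "both": 26}
OVERLAP_COUNTS = {"duplicated_sequences": 171, "pig_qtls": 34}


def percent(part, whole):
    """Percentage of whole represented by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


status_shares = {k: percent(v, N_CNVR) for k, v in STATUS_COUNTS.items()}
overlap_shares = {k: percent(v, N_CNVR) for k, v in OVERLAP_COUNTS.items()}

# Total sequence covered, from the number of regions times their mean size.
total_covered_mb = round(N_CNVR * MEAN_SIZE_KB / 1000, 2)

if __name__ == "__main__":
    print(status_shares)
    print(overlap_shares)
    print(total_covered_mb, "Mb")
```

    The three copy-number classes sum to the 259 regions, and the number of regions multiplied by the mean size reproduces the stated total coverage.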

  1. MD-SeeGH: a platform for integrative analysis of multi-dimensional genomic data

    Directory of Open Access Journals (Sweden)

    Ng Raymond T

    2008-05-01

    Background: Recent advances in global genomic profiling methodologies have enabled multi-dimensional characterization of biological systems. Complete analysis of these genomic profiles requires an in-depth look at parallel profiles of segmental DNA copy number status, DNA methylation state, single nucleotide polymorphisms, as well as gene expression profiles. Due to the differences in data types it is difficult to conduct parallel analysis of multiple datasets from diverse platforms. Results: To address this issue, we have developed an integrative genomic analysis platform MD-SeeGH, a software tool that allows users to rapidly and directly analyze genomic datasets spanning multiple genomic experiments. With MD-SeeGH, users have the flexibility to easily update datasets in accordance with new genomic builds, make a quality assessment of data using the filtering features, and identify genetic alterations within single or across multiple experiments. Multiple sample analysis in MD-SeeGH allows users to compare profiles from many experiments alongside tracks containing detailed localized gene information, microRNA, CpG islands, and copy number variations. Conclusion: MD-SeeGH is a new platform for the integrative analysis of diverse microarray data, facilitating multiple profile analyses and group comparisons.

  2. An approach to incorporate linkage disequilibrium structure into genomic association analysis

    Institute of Scientific and Technical Information of China (English)

    Fengyu Zhang; Diane Wagener

    2008-01-01

    In this study, we propose to use principal component analysis (PCA) and a regression model to incorporate linkage disequilibrium (LD) in genomic association data analysis. To accommodate LD in genomic data and reduce multiple testing, we suggest performing PCA and extracting the principal component score to capture the variation of genomic data, after which regression analysis is used to assess the association of the disease with the principal component score. An empirical analysis result shows that both the genotype-based correlation matrix and the haplotype-based LD matrix can produce similar results for PCA. The principal component score seems to be more powerful in detecting genetic association because it is quantitatively measured and may be able to capture the effect of multiple loci.
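
    The two-step idea described above (summarize correlated SNPs by a principal component score, then regress the trait on that score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' method: it uses the covariance matrix of allele counts, power iteration for the first component only, ordinary least squares on a quantitative toy trait rather than a disease model, and entirely hypothetical genotype and phenotype values.

```python
def center(column):
    """Subtract the column mean from every value."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]


def covariance_matrix(cols):
    """Sample covariance between every pair of centered SNP columns."""
    n = len(cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: approximates the eigenvector of the largest eigenvalue."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


def first_pc(genotypes):
    """Return (SNP weights, per-individual scores) for the first principal component."""
    cols = [center(list(col)) for col in zip(*genotypes)]
    weights = leading_eigenvector(covariance_matrix(cols))
    scores = [sum(w * col[i] for w, col in zip(weights, cols)) for i in range(len(genotypes))]
    return weights, scores


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical data: 8 individuals, 3 SNPs in LD (allele counts 0/1/2), one trait.
GENOTYPES = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0],
             [2, 2, 2], [2, 2, 1], [1, 0, 1], [2, 1, 2]]
PHENOTYPE = [1.0, 1.2, 2.1, 1.9, 3.0, 2.8, 1.6, 2.7]

weights, scores = first_pc(GENOTYPES)
slope = regression_slope(scores, PHENOTYPE)

if __name__ == "__main__":
    print("PC1 weights:", weights)
    print("slope of trait on PC1 score:", slope)
```

    A single test on the component score replaces three correlated per-SNP tests here, which is the multiple-testing reduction the abstract refers to; a haplotype-based LD matrix could be substituted for the covariance matrix in the same structure.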

  3. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    Science.gov (United States)

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  4. Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

    Directory of Open Access Journals (Sweden)

    Okimoto Ron

    2011-02-01

    Background: Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. Results: We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome. Conclusion: We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.

  5. Pathway and network analysis of cancer genomes

    DEFF Research Database (Denmark)

    Creixell, Pau; Reimand, Jueri; Haider, Syed

    2015-01-01

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been...

  6. Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

    Science.gov (United States)

    MacRae, Sheila L; Zhang, Quanwei; Lemetre, Christophe; Seim, Inge; Calder, Robert B; Hoeijmakers, Jan; Suh, Yousin; Gladyshev, Vadim N; Seluanov, Andrei; Gorbunova, Vera; Vijg, Jan; Zhang, Zhengdong D

    2015-01-01

    Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM genes appeared to be strongly conserved, with copy number variation in only four genes. Interestingly, we found NMR to have a higher copy number of CEBPG, a regulator of DNA repair, and TINF2, a protector of telomere integrity. NMR, as well as human, was also found to have a lower rate of germline nucleotide substitution than the mouse. Together, the data suggest that the long-lived NMR, as well as human, has more robust GM than mouse and identifies new targets for the analysis of the exceptional longevity of the NMR. PMID:25645816

  7. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges

    Science.gov (United States)

    Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song

    2013-01-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections. PMID:24240121

  8. Genome analysis methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Database metadata record. Recoverable field names: Year of genome analysis; Sequencing method; Read counts; Covered genome region; Number of predicted genes; Genome database.

  9. Genomic and Network Patterns of Schizophrenia Genetic Variation in Human Evolutionary Accelerated Regions

    OpenAIRE

    Xu, Ke; Schadt, Eric E.; Pollard, Katherine S.; Roussos, Panos; Dudley, Joel T

    2015-01-01

    The population persistence of schizophrenia despite associated reductions in fitness and fecundity suggests that the genetic basis of schizophrenia has a complex evolutionary history. A recent meta-analysis of schizophrenia genome-wide association studies offers novel opportunities for assessment of the evolutionary trajectories of schizophrenia-associated loci. In this study, we hypothesize that components of the genetic architecture of schizophrenia are attributable to human lineage-specifi...

  10. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation

    OpenAIRE

    2016-01-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris ...

  12. Identity by descent: variation in meiosis, across genomes, and in populations.

    Science.gov (United States)

    Thompson, Elizabeth A

    2013-06-01

    Gene identity by descent (IBD) is a fundamental concept that underlies genetically mediated similarities among relatives. Gene IBD is traced through ancestral meioses and is defined relative to founders of a pedigree, or to some time point or mutational origin in the coalescent of a set of extant genes in a population. The random process underlying changes in the patterns of IBD across the genome is recombination, so the natural context for defining IBD is the ancestral recombination graph (ARG), which specifies the complete ancestry of a collection of chromosomes. The ARG determines both the sequence of coalescent ancestries across the chromosome and the extant segments of DNA descending unbroken by recombination from their most recent common ancestor (MRCA). DNA segments IBD from a recent common ancestor have high probability of being of the same allelic type. Non-IBD DNA is modeled as of independent allelic type, but the population frame of reference for defining allelic independence can vary. Whether of IBD, allelic similarity, or phenotypic covariance, comparisons may be made to other genomic regions of the same gametes, or to the same genomic regions in other sets of gametes or diploid individuals. In this review, I present IBD as the framework connecting evolutionary and coalescent theory with the analysis of genetic data observed on individuals. I focus on the high variance of the processes that determine IBD, its changes across the genome, and its impact on observable data.

  13. The phosphoprotein gene of a dolphin morbillivirus isolate exhibits genomic variation at the editing site.

    Science.gov (United States)

    Bolt, G; Alexandersen, S; Blixenkrone-Møller, M

    1995-12-01

    The nucleotide sequence of the phosphoprotein (P) gene of a dolphin morbillivirus (DMV) isolate was determined. Like those of other morbilliviruses, the DMV P gene encoded P and C proteins in overlapping open reading frames and V protein by editing the P gene transcript. Among P mRNA based clones the editing site variants GGGC, GGGG, GAGC and GGGGGGC predicting a P protein, and the variants GGGGC and GGGGGG predicting a V protein, were found. Surprisingly, the three variants GGGC, GGGG and GAGC were also found among clones generated from genomic RNA of the DMV isolate. Thus, more than one viral genome type appeared to be present in cells infected with the DMV isolate. By a similar analysis of the virus genomes in the tissue from which the DMV isolate was obtained, only the GGGC type was found, indicating that the GGGG and GAGC types arose during adaptation of the virus to growth in cell cultures. No editing site variants likely to have arisen by editing the GAGC type were encountered, and it remains to be determined whether mRNA encoding V protein can be transcribed from genomes with this editing site. Using antisera raised against the common N terminus and unique C termini of the predicted P and V proteins, the in vivo expression of these proteins was demonstrated.

  14. Integrative Genomic Analysis of Complex traits

    DEFF Research Database (Denmark)

    Ehsani, Ali Reza

    In the last decade rapid development in biotechnologies has made it possible to extract extensive information about practically all levels of biological organization. An ever-increasing number of studies are reporting multilayered datasets on the entire DNA sequence, transcription, protein...... expression, and metabolite abundance of more and more populations in a multitude of environments. However, a solid model for including all of this complex information in one analysis, to disentangle genetic variation and the underlying genetic architecture of complex traits and diseases, has not yet been...... proposed. This thesis introduced a novel way to integrate such huge data sets in an efficient and informative procedure to dissect the complexity of obesity related traits (e.g. body weight, body fat, feed intake, etc) and map the flow from DNA through RNA ending with individual phenotypes....

  15. Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza.

    Science.gov (United States)

    Kim, HyeRan; Hurwitz, Bonnie; Yu, Yeisoo; Collura, Kristi; Gill, Navdeep; SanMiguel, Phillip; Mullikin, James C; Maher, Christopher; Nelson, William; Wissotski, Marina; Braidotti, Michele; Kudrna, David; Goicoechea, José Luis; Stein, Lincoln; Ware, Doreen; Jackson, Scott A; Soderlund, Carol; Wing, Rod A

    2008-01-01

    We describe the establishment and analysis of a genus-wide comparative framework composed of 12 bacterial artificial chromosome fingerprint and end-sequenced physical maps representing the 10 genome types of Oryza aligned to the O. sativa ssp. japonica reference genome sequence. Over 932 Mb of end sequence was analyzed for repeats, simple sequence repeats, miRNA and single nucleotide variations, providing the most extensive analysis of Oryza sequence to date.

  16. Genome Analysis of Streptococcus pyogenes Associated with Pharyngitis and Skin Infections

    Science.gov (United States)

    Ibrahim, Joe; Eisen, Jonathan A.; Jospin, Guillaume; Coil, David A.; Khazen, Georges

    2016-01-01

    Streptococcus pyogenes is a very important human pathogen, commonly associated with skin or throat infections but can also cause life-threatening situations including sepsis, streptococcal toxic shock syndrome, and necrotizing fasciitis. Various studies involving typing and molecular characterization of S. pyogenes have been published to date; however next-generation sequencing (NGS) studies provide a comprehensive collection of an organism’s genetic variation. In this study, the genomes of nine S. pyogenes isolates associated with pharyngitis and skin infection were sequenced and studied for the presence of virulence genes, resistance elements, prophages, genomic recombination, and other genomic features. Additionally, a comparative phylogenetic analysis of the isolates with global clones highlighted their possible evolutionary lineage and their site of infection. The genomes were found to also house a multitude of features including gene regulation systems, virulence factors and antimicrobial resistance mechanisms. PMID:27977735

  17. Genome Analysis of Streptococcus pyogenes Associated with Pharyngitis and Skin Infections.

    Science.gov (United States)

    Ibrahim, Joe; Eisen, Jonathan A; Jospin, Guillaume; Coil, David A; Khazen, Georges; Tokajian, Sima

    2016-01-01

    Streptococcus pyogenes is a very important human pathogen, commonly associated with skin or throat infections but can also cause life-threatening situations including sepsis, streptococcal toxic shock syndrome, and necrotizing fasciitis. Various studies involving typing and molecular characterization of S. pyogenes have been published to date; however next-generation sequencing (NGS) studies provide a comprehensive collection of an organism's genetic variation. In this study, the genomes of nine S. pyogenes isolates associated with pharyngitis and skin infection were sequenced and studied for the presence of virulence genes, resistance elements, prophages, genomic recombination, and other genomic features. Additionally, a comparative phylogenetic analysis of the isolates with global clones highlighted their possible evolutionary lineage and their site of infection. The genomes were found to also house a multitude of features including gene regulation systems, virulence factors and antimicrobial resistance mechanisms.

  18. DivStat: a user-friendly tool for single nucleotide polymorphism analysis of genomic diversity.

    Directory of Open Access Journals (Sweden)

    Inês Soares

    Full Text Available Recent developments have led to an enormous increase of publicly available large genomic data, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes, and allowing for a myriad of large scale studies on human genetic variation. However, the tools currently available are insufficient when the goal concerns some analyses of data sets encompassing more than hundreds of base pairs and when considering haplotype sequences of single nucleotide polymorphisms (SNPs). Here, we present a new and potent tool to deal with large data sets allowing the computation of a variety of summary statistics of population genetic data, increasing the speed of data analysis.

  19. Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

    Science.gov (United States)

    Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

    2012-01-01

    Background: Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings: The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion: Despite a high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes showed a preference for 5-bp indels, and 1–3-bp indels are mainly attributable to SSR polymorphisms. This study supports that the common ancestor of diploid A-genome species in

  20. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    Science.gov (United States)

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-02-17

    The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
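
    The coverage figure quoted above follows from simple arithmetic: mean coverage is roughly total sequenced bases divided by the length of the genome being covered. The sketch below only illustrates that relationship. The function names are hypothetical, and the genome length it derives is merely what the two reported numbers (59.6 Gb and 24.7X) imply together, not a value stated in the abstract; the authors may have computed coverage from mapped reads only.

```python
def mean_coverage(total_bases_gb, genome_size_gb):
    """Average depth of coverage = sequenced bases / genome length."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


def implied_genome_size(total_bases_gb, coverage):
    """Genome length implied by a sequencing yield and a reported mean coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases_gb / coverage


# Figures reported in the abstract: 59.6 Gb of sequence at 24.7X average coverage.
size_gb = implied_genome_size(59.6, 24.7)
print(f"implied genome length: {size_gb:.2f} Gb")
print(f"round trip coverage: {mean_coverage(59.6, size_gb):.1f}X")
```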

  1. Whole-Genome sequencing and genetic variant analysis of a quarter Horse mare

    Directory of Open Access Journals (Sweden)

    Doan Ryan

    2012-02-01

    Full Text Available Abstract Background The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Results Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. Conclusions This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  2. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    KAUST Repository

    Doan, Ryan

    2012-02-17

    BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  3. Comparative analysis of microsatellites in chloroplast genomes of lower and higher plants.

    Science.gov (United States)

    George, Biju; Bhatt, Bhavin S; Awasthi, Mayur; George, Binu; Singh, Achuit K

    2015-11-01

    Microsatellites, or simple sequence repeats (SSRs), are repetitive DNA sequences in which a motif of one to six base pairs is tandemly repeated a number of times. Chloroplast genome sequences have been shown to possess extensive variations in the length, number and distribution of SSRs. However, a comparative analysis of chloroplast microsatellites is not available. Considering their potential importance in generating genomic diversity, we have systematically analysed the abundance and distribution of simple and compound microsatellites in 164 sequenced chloroplast genomes from a wide range of plants. The key findings of these studies are (1) a large number of mononucleotide repeats as compared to SSR(2-6) (di-, tri-, tetra-, penta-, hexanucleotide repeats) are present in all chloroplast genomes investigated, (2) lower plants such as algae show wide variation in relative abundance, density and distribution of microsatellite repeats as compared to flowering plants, (3) longer SSRs are excluded from coding regions of most chloroplast genomes, (4) GC content has a weak influence on number, relative abundance and relative density of mononucleotide as well as SSR(2-6). However, GC content strongly showed negative correlation with relative density (R² = 0.5, P plants possesses relatively more genomic diversity compared to higher plants.

  4. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    Full Text Available As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  5. Genomic analysis of plant chromosomes based on meiotic pairing

    Directory of Open Access Journals (Sweden)

    Lisete Chamma Davide

    2007-12-01

    Full Text Available This review presents the principles and applications of classical genomic analysis, with emphasis on plant breeding. The main mathematical models used to estimate the preferential chromosome pairing in diploid or polyploid, interspecific or intergenera hybrids are presented and discussed, with special reference to the applications and studies for the definition of genome relationships among species of the Poaceae family.

  6. Initial sequencing and analysis of the human genome.

    Science.gov (United States)

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  7. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  8. Analysis of Simple Sequence Repeats in Genomes of Rhizobia

    Institute of Scientific and Technical Information of China (English)

    GAO Ya-mei; HAN Yi-qiang; TANG Hui; SUN Dong-mei; WANG Yan-jie; WANG Wei-dong

    2008-01-01

    Simple sequence repeats (SSRs) or microsatellites, as genetic markers, are ubiquitous in genomes of various organisms. The analysis of SSRs in rhizobia genomes provides useful information for a variety of applications in population genetics of rhizobia. We analyzed the occurrences, relative abundance, and relative density of SSRs, the most common in Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes sequenced in the microorganisms tandem repeats database, and SSRs in the three species genomes were compared with each other. The result showed that there were 1 410, 859, and 638 SSRs in B. japonicum, M. loti, and S. meliloti genomes, respectively. In the genomes of B. japonicum, M. loti, and S. meliloti, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant and indicated higher mutation rates in these species. Mononucleotide repeats were the least abundant. The SSR types and distribution were similar among these species.
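
    The quantities named in this abstract (occurrences, relative abundance, relative density) can be illustrated with a small sketch. It is not the method used by the authors or by the tandem repeats database. It assumes perfect repeats only, hypothetical minimum repeat counts per motif length, and the commonly used definitions of relative abundance as SSRs per kb and relative density as SSR base pairs per kb.

```python
import re

# Hypothetical thresholds: minimum number of repeat units for each motif length.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect 1-6 bp tandem repeats."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-bp motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are copies of a shorter unit (e.g. "AA", "AGAG");
            # those runs are reported under the shorter motif length instead.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssr_summary(seq, hits):
    """Count, relative abundance (SSRs per kb) and relative density (SSR bp per kb)."""
    kb = len(seq) / 1000.0
    ssr_bp = sum(len(motif) * count for _, motif, count in hits)
    return {
        "count": len(hits),
        "relative_abundance": len(hits) / kb,
        "relative_density": ssr_bp / kb,
    }


example = "GGG" + "A" * 12 + "CCT" + "AG" * 7 + "TTC" + "ACG" * 5 + "T"
example_hits = find_ssrs(example)
print(example_hits)
print(ssr_summary(example, example_hits))
```

    Imperfect and compound repeats, which dedicated SSR tools usually handle, are deliberately left out to keep the sketch short.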

  9. Analysis of intra-genomic GC content homogeneity within prokaryotes

    DEFF Research Database (Denmark)

    Bohlin, J; Snipen, L; Hardy, S.P.

    2010-01-01

    both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content......Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how...... the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content...
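
    The idea of local GC content varying around the genome-wide average can be sketched as follows. The exact definition of GCVAR is not given in this excerpt, so the measure below (root-mean-square deviation of non-overlapping window GC fractions from the whole-sequence GC fraction) is an assumed stand-in, and the function names and window size are hypothetical.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C among the A, C, G, T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def windowed_gc(seq, window=100):
    """GC fraction of consecutive non-overlapping windows (trailing partial window dropped)."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def gc_variability(seq, window=100):
    """RMS deviation of window GC fractions from the sequence-wide GC fraction.

    Illustrative stand-in only; not the GCVAR statistic of the cited study.
    """
    windows = windowed_gc(seq, window)
    if not windows:
        raise ValueError("sequence is shorter than one window")
    mean_gc = gc_fraction(seq)
    return math.sqrt(sum((w - mean_gc) ** 2 for w in windows) / len(windows))


heterogeneous = "GC" * 50 + "AT" * 50   # one GC-rich window, one AT-rich window
homogeneous = "GCAT" * 50               # every window has the same GC fraction
print(gc_variability(heterogeneous), gc_variability(homogeneous))
```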

  10. Analysis of segmental duplications reveals a distinct pattern of continuation-of-synteny between human and mouse genomes.

    Science.gov (United States)

    Mehan, Michael R; Almonte, Maricel; Slaten, Erin; Freimer, Nelson B; Rao, P Nagesh; Ophoff, Roel A

    2007-03-01

    About 5% of the human genome consists of large-scale duplicated segments of almost identical sequences. Segmental duplications (SDs) have been proposed to be involved in non-allelic homologous recombination leading to recurrent genomic variation and disease. It has also been suggested that these SDs are associated with syntenic rearrangements that have shaped the human genome. We have analyzed 14 members of a single family of closely related SDs in the human genome, some of which are associated with common inversion polymorphisms at chromosomes 8p23 and 4p16. Comparative analysis with the mouse genome revealed syntenic inversions for these two human polymorphic loci. In addition, 12 of the 14 SDs, while absent in the mouse genome, occur at the breaks of synteny; suggesting a non-random involvement of these sequences in genome evolution. Furthermore, we observed a syntenic familial relationship between 8 and 12 breakpoint-loci, where broken synteny that ends at one family member resumes at another, even across different chromosomes. Subsequent genome-wide assessment revealed that this relationship, which we named continuation-of-synteny, is not limited to the 8p23 family and occurs 46 times in the human genome with high frequency at specific chromosomes. Our analysis supports a non-random breakage model of genomic evolution with an active involvement of segmental duplications for specific regions of the human genome.

  11. Analysis of the Vibrionaceae pan-genome

    OpenAIRE

    Kahlke, Tim

    2013-01-01

    Paper 2 of this thesis is not available in Munin: 2. Tim Kahlke, Alexander Goesmann and Peik Haugen: 'The Vibrionaceae pan-genome hints at gene expression as the major driving force for unequal gene distributions on Vibrionaceae chromosomes' (manuscript) In the presented work the bacterial family Vibrionaceae was used as a model to investigate bacterial diversity on a gene level and to analyze the underlying concepts of bacterial niche adaptation and evolution. For this, the genomes ...

  12. Next-Gen phylogeography of rainforest trees: exploring landscape-level cpDNA variation from whole-genome sequencing.

    Science.gov (United States)

    van der Merwe, M; McPherson, H; Siow, J; Rossetto, M

    2014-01-01

    Standardized phylogeographic studies across codistributed taxa can identify important refugia and biogeographic barriers, and potentially uncover how changes in adaptive constraints through space and time impact on the distribution of genetic diversity. The combination of next-generation sequencing and methodologies that enable uncomplicated analysis of the full chloroplast genome may provide an invaluable resource for such studies. Here, we assess the potential of a shotgun-based method across twelve nonmodel rainforest trees sampled from two evolutionary distinct regions. Whole genomic shotgun sequencing libraries consisting of pooled individuals were used to assemble species-specific chloroplast references (in silico). For each species, the pooled libraries allowed for the detection of variation within and between data sets (each representing a geographic region). The potential use of nuclear rDNA as an additional marker from the NGS libraries was investigated by mapping reads against available references. We successfully obtained phylogeographically informative sequence data from a range of previously unstudied rainforest trees. Greater levels of diversity were found in northern refugial rainforests than in southern expansion areas. The genetic signatures of varying evolutionary histories were detected, and interesting associative patterns between functional characteristics and genetic diversity were identified. This approach can suit a wide range of landscape-level studies. As the key laboratory-based steps do not require prior species-specific knowledge and can be easily outsourced, the techniques described here are even suitable for researchers without access to wet-laboratory facilities, making evolutionary ecology questions increasingly accessible to the research community.

  13. Comparative genomic analysis and phylogenetic position of Theileria equi

    Directory of Open Access Journals (Sweden)

    Kappmeyer Lowell S

    2012-11-01

    Full Text Available Abstract Background Transmission of arthropod-borne apicomplexan parasites that cause disease and result in death or persistent infection represents a major challenge to global human and animal health. First described in 1901 as Piroplasma equi, this re-emergent apicomplexan parasite was renamed Babesia equi and subsequently Theileria equi, reflecting an uncertain taxonomy. Understanding mechanisms by which apicomplexan parasites evade immune or chemotherapeutic elimination is required for development of effective vaccines or chemotherapeutics. The continued risk of transmission of T. equi from clinically silent, persistently infected equids impedes the goal of returning the U.S. to non-endemic status. Therefore comparative genomic analysis of T. equi was undertaken to: 1) identify genes contributing to immune evasion and persistence in equid hosts, 2) identify genes involved in PBMC infection biology and 3) define the phylogenetic position of T. equi relative to sequenced apicomplexan parasites. Results The known immunodominant proteins, EMA1, 2 and 3 were discovered to belong to a ten member gene family with a mean amino acid identity, in pairwise comparisons, of 39%. Importantly, the amino acid diversity of EMAs is distributed throughout the length of the proteins. Eight of the EMA genes were simultaneously transcribed. As the agents that cause bovine theileriosis infect and transform host cell PBMCs, we confirmed that T. equi infects equine PBMCs; however, there is no evidence of host cell transformation. Indeed, a number of genes identified as potential manipulators of the host cell phenotype are absent from the T. equi genome. Phylogenetic positioning of T. equi relative to seven apicomplexan parasites, using deduced amino acid sequences from 150 genes, placed it as a sister taxon to Theileria spp. Conclusions The EMA family does not fit the paradigm for classical antigenic variation, and we propose a

  14. Utilizing linkage disequilibrium information from Indian Genome Variation Database for mapping mutations: SCA12 case study

    Indian Academy of Sciences (India)

    Samira Bahl; Ikhlak Ahmed; The Indian Genome Variation Consortium; Mitali Mukerji

    2009-04-01

    Stratification in heterogeneous populations poses an enormous challenge in linkage disequilibrium (LD) based identification of causal loci using surrogate markers. In this study, we demonstrate the enormous potential of endogamous Indian populations for mapping mutations in candidate genes using minimal SNPs, mainly due to larger regions of LD. We show this by a case study of the PPP2R2B gene (∼400 kb) that harbours a CAG repeat, expansion of which has been implicated in spinocerebellar ataxia type 12 (SCA12). Using LD information derived from Indian Genome Variation database (IGVdb) on populations which share similar ethnic and linguistic backgrounds as the SCA12 study population, we could map the causal loci using a minimal set of three SNPs, without the generation of additional basal data from the ethnically matched population. We could also demonstrate transferability of tagSNPs from a related HapMap population for mapping the mutation.
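
    Linkage disequilibrium between a surrogate SNP and a second locus is commonly summarized by r², computed from the allele and haplotype frequencies. The sketch below is a generic illustration of that formula, not the pipeline used with IGVdb. It assumes phased biallelic haplotypes coded 0/1, and the function name is hypothetical.

```python
def ld_r2(snp_a, snp_b):
    """r^2 between two biallelic SNPs given phased haplotypes coded 0/1.

    D   = p_AB - p_A * p_B
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("need two non-empty haplotype vectors of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Toy haplotypes: the first pair is perfectly correlated, the second independent.
print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]))
print(ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))
```

    A SNP with high r² to many neighbours is a natural tagSNP candidate, which is why larger LD regions allow a locus to be covered with fewer markers.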

  15. DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA.

    Science.gov (United States)

    Bhaskar, Anand; Song, Yun S

    2014-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
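
    As a concrete picture of the summary statistic discussed here, the sketch below tabulates the unfolded and folded SFS from a small haplotype matrix. It assumes rows are segregating sites, columns are sampled sequences, 0 marks the ancestral allele and 1 the derived allele. The function names are hypothetical and nothing here reproduces the paper's identifiability analysis.

```python
def unfolded_sfs(haplotypes):
    """Entry i-1 counts sites where exactly i of the n samples carry the derived allele."""
    n = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for site in haplotypes:
        if len(site) != n:
            raise ValueError("all sites must have the same number of samples")
        derived = sum(site)
        if 0 < derived < n:  # sites fixed for either allele are not segregating
            sfs[derived - 1] += 1
    return sfs


def fold_sfs(sfs):
    """Folded SFS: bin sites by minor allele count, for when the ancestral state is unknown."""
    n = len(sfs) + 1
    folded = [0] * (n // 2)
    for i, count in enumerate(sfs, start=1):
        folded[min(i, n - i) - 1] += count
    return folded


sites = [
    [1, 0, 0, 0],  # singleton
    [0, 1, 1, 0],  # doubleton
    [1, 1, 1, 0],  # derived allele in three of four samples
    [0, 0, 0, 1],  # singleton
    [1, 1, 1, 1],  # fixed, ignored
]
spectrum = unfolded_sfs(sites)
print(spectrum, fold_sfs(spectrum))
```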

  16. Variation in the OC locus of Acinetobacter baumannii genomes predicts extensive structural diversity in the lipooligosaccharide.

    Directory of Open Access Journals (Sweden)

    Johanna J Kenyon

    Lipooligosaccharide (LOS) is a complex surface structure that is linked to many pathogenic properties of Acinetobacter baumannii. In A. baumannii, the genes responsible for the synthesis of the outer core (OC) component of the LOS are located between ilvE and aspS. The content of the OC locus is usually variable within a species, and examination of 6 complete and 227 draft A. baumannii genome sequences available in GenBank non-redundant and Whole Genome Shotgun databases revealed nine distinct new types, OCL4-OCL12, in addition to the three known ones. The twelve gene clusters fell into two distinct groups, designated Group A and Group B, based on similarities in the genes present. OCL6 (Group B) was unique in that it included genes for the synthesis of L-Rhamnosep. Genetic exchange of the different configurations between strains has occurred, as some OC forms were found in several different sequence types (STs). OCL1 (Group A) was the most widely distributed, being present in 18 STs, and OCL6 was found in 16 STs. Variation within clones was also observed, with more than one OC locus type found in the two globally disseminated clones, GC1 and GC2, that include the majority of multiply antibiotic resistant isolates. OCL1 was the most abundant gene cluster in both GC1 and GC2 genomes but GC1 isolates also carried OCL2, OCL3 or OCL5, and OCL3 was also present in GC2. As replacement of the OC locus in the major global clones indicates the presence of sub-lineages, a PCR typing scheme was developed to rapidly distinguish Group A and Group B types, and to distinguish the specific forms found in GC1 and GC2 isolates.

  17. Genomic Characterization for Parasitic Weeds of the Genus Striga by Sample Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Matt C. Estep

    2012-03-01

    Generation of ∼2200 Sanger sequence reads or ∼10,000 454 reads for seven Striga Lour. DNA samples (five species) allowed identification of the highly repetitive DNA content in these genomes. The 14 most abundant repeats in these species were identified and partially assembled. Annotation indicated that they represent nine long terminal repeat (LTR) retrotransposon families, three tandem satellite repeats, one long interspersed element (LINE) retroelement, and one DNA transposon. All of these repeats are most closely related to repetitive elements in other closely related plants and are not products of horizontal transfer from their host species. These repeats were differentially abundant in each species, with the LTR retrotransposons and satellite repeats most responsible for variation in genome size. Each species had some repetitive elements that were more abundant and some less abundant than the other species examined, indicating that no single element or any unilateral growth or decrease trend in genome behavior was responsible for variation in genome size and composition. Genome sizes were determined by flow sorting, and the values of 615 Mb [S. asiatica (L.) Kuntze], 1330 Mb [S. gesnerioides (Willd.) Vatke], 1425 Mb [S. hermonthica (Delile) Benth.] and 2460 Mb (S. forbesii Benth.) suggest a ploidy series, a prediction supported by repetitive DNA sequence analysis. Phylogenetic analysis using six chloroplast loci indicated the ancestral relationships of the five most agriculturally important species, with the unexpected result that the one parasite of dicotyledonous plants (S. gesnerioides) was found to be more closely related to some of the grass parasites than many of the grass parasites are to each other.

  18. Advances in biotechnology and linking outputs to variation in complex traits: Plant and Animal Genome meeting January 2012.

    Science.gov (United States)

    Appels, R; Barrero, R; Bellgard, M

    2012-03-01

    The Plant and Animal Genome (PAG, held annually) meeting in January 2012 provided insights into the advances in plant, animal, and microbe genome studies particularly as they impact on our understanding of complex biological systems. The diverse areas of biology covered included the advances in technologies, variation in complex traits, genome change in evolution, and targeting phenotypic changes, across the broad spectrum of life forms. This overview aims to summarize the major advances in research areas presented in the plenary lectures and does not attempt to summarize the diverse research activities covered throughout the PAG in workshops, posters, presentations, and displays by suppliers of cutting-edge technologies.

  19. Plasticity of the Leishmania genome leading to gene copy number variations and drug resistance [version 1; referees: 5 approved]

    Directory of Open Access Journals (Sweden)

    Marie-Claude N. Laffitte

    2016-09-01

    Leishmania has a plastic genome, and drug pressure can select for gene copy number variation (CNV). CNVs can apply either to whole chromosomes, leading to aneuploidy, or to specific genomic regions. For the latter, the amplification of chromosomal regions occurs at the level of homologous direct or inverted repeated sequences leading to extrachromosomal circular or linear amplified DNAs. This ability of Leishmania to respond to drug pressure by CNVs has led to the development of genomic screens such as Cos-Seq, which has the potential of expediting the discovery of drug targets for novel promising drug candidates.

  20. A genome-wide association study of copy number variations with umbilical hernia in swine.

    Science.gov (United States)

    Long, Yi; Su, Ying; Ai, Huashui; Zhang, Zhiyan; Yang, Bin; Ruan, Guorong; Xiao, Shijun; Liao, Xinjun; Ren, Jun; Huang, Lusheng; Ding, Nengshui

    2016-06-01

    Umbilical hernia (UH) is one of the most common congenital defects in pigs, leading to considerable economic loss and serious animal welfare problems. To test whether copy number variations (CNVs) contribute to pig UH, we performed a case-control genome-wide CNV association study on 905 pigs from the Duroc, Landrace and Yorkshire breeds using the Porcine SNP60 BeadChip and the PennCNV algorithm. We first constructed a genomic map comprising 6193 CNVs that pertain to 737 CNV regions. Then, we identified eight CNVs significantly associated with the risk for UH in the three pig breeds. Six of seven significantly associated CNVs were validated using quantitative real-time PCR. Notably, a rare CNV (CNV14:13030843-13059455) encompassing the NUGGC gene was strongly associated with UH (permutation-corrected P = 0.0015) in Duroc pigs. This CNV occurred exclusively in seven Duroc UH-affected individuals. SNPs surrounding the CNV did not show association signals, indicating that rare CNVs may play an important role in complex pig diseases such as UH. The NUGGC gene has been implicated in human omphalocele and inguinal hernia. Our findings support the view that CNVs, including the NUGGC CNV, contribute to the pathogenesis of pig UH.

  1. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    Science.gov (United States)

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide to relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  2. Genome-wide copy number variation (CNV) in patients with autoimmune Addison's disease

    Science.gov (United States)

    2011-01-01

    Background: Addison's disease (AD) is caused by an autoimmune destruction of the adrenal cortex. The pathogenesis is multi-factorial, involving genetic components and hitherto unknown environmental factors. The aim of the present study was to investigate if gene dosage in the form of copy number variation (CNV) could add to the repertoire of genetic susceptibility to autoimmune AD. Methods: A genome-wide study using the Affymetrix GeneChip® Genome-Wide Human SNP Array 6.0 was conducted in 26 patients with AD. CNVs in selected genes were further investigated in a larger material of patients with autoimmune AD (n = 352) and healthy controls (n = 353) by duplex Taqman real-time polymerase chain reaction assays. Results: We found that low copy number of UGT2B28 was significantly more frequent in AD patients compared to controls; conversely high copy number of ADAM3A was associated with AD. Conclusions: We have identified two novel CNV associations to ADAM3A and UGT2B28 in AD. The mechanism by which this susceptibility is conferred is at present unclear, but may involve steroid inactivation (UGT2B28) and T cell maturation (ADAM3A). Characterization of these proteins may unravel novel information on the pathogenesis of autoimmunity. PMID:21851588

  3. Genome-wide copy number variation (CNV) in patients with autoimmune Addison's disease

    Directory of Open Access Journals (Sweden)

    Brønstad Ingeborg

    2011-08-01

    Background: Addison's disease (AD) is caused by an autoimmune destruction of the adrenal cortex. The pathogenesis is multi-factorial, involving genetic components and hitherto unknown environmental factors. The aim of the present study was to investigate if gene dosage in the form of copy number variation (CNV) could add to the repertoire of genetic susceptibility to autoimmune AD. Methods: A genome-wide study using the Affymetrix GeneChip® Genome-Wide Human SNP Array 6.0 was conducted in 26 patients with AD. CNVs in selected genes were further investigated in a larger material of patients with autoimmune AD (n = 352) and healthy controls (n = 353) by duplex Taqman real-time polymerase chain reaction assays. Results: We found that low copy number of UGT2B28 was significantly more frequent in AD patients compared to controls; conversely high copy number of ADAM3A was associated with AD. Conclusions: We have identified two novel CNV associations to ADAM3A and UGT2B28 in AD. The mechanism by which this susceptibility is conferred is at present unclear, but may involve steroid inactivation (UGT2B28) and T cell maturation (ADAM3A). Characterization of these proteins may unravel novel information on the pathogenesis of autoimmunity.

  4. Evaluating the performance of commercial whole-genome marker sets for capturing common genetic variation

    Directory of Open Access Journals (Sweden)

    Montpetit Alexandre

    2007-06-01

    Background: New technologies have enabled genome-wide association studies to be conducted with hundreds of thousands of genotyped SNPs. Several different first-generation genome-wide panels of SNPs have been commercialized. The total amount of common genetic variation is still unknown; however, the coverage of commercial panels can be evaluated against reference population samples genotyped by the International HapMap project. Less information is available about coverage in samples from other populations. Results: In this study we compare four commercial panels: the HumanHap 300 and HumanHap 550 Array Sets from the Illumina Infinium series and the Mapping 100 K and Mapping 500 K Array Sets from the Affymetrix GeneChip series. Tagging performance is compared among HapMap CEPH (CEU), Asian (JPT, CHB) and Yoruba (YRI) population samples. It is also evaluated in an Estonian population sample with more than 1000 individuals genotyped in two 500-kbp ENCODE regions of chromosome 2: ENr112 on 2p16.3 and ENr131 on 2q37.1. Conclusion: We found that in a non-reference Caucasian population, commercial SNP panels provide levels of coverage similar to those in the HapMap CEPH population sample. We present the proportions of universal and population-specific SNPs in all the commercial platforms studied.

  5. Whole Genome Amplification in Genomic Analysis of Single Circulating Tumor Cells.

    Science.gov (United States)

    Gasch, Christin; Pantel, Klaus; Riethdorf, Sabine

    2015-01-01

    Investigation of the genome of organisms is one of the foundations of molecular biology for understanding the complex organization of cells. While genomic DNA can easily be isolated from tissues or cell cultures of plant, animal or human origin, DNA extraction from single cells is still challenging. Here, we describe three techniques for the amplification of genomic DNA of fixed single circulating tumor cells (CTC) isolated from blood of cancer patients. This amplification aims to increase DNA amounts from those of one cell to yields sufficient for different DNA analyses such as mutational analysis including next-generation sequencing, array-comparative genome hybridization (CGH), and quantitative measurement of gene amplifications. Molecular analysis of CTC as liquid biopsy can be used to identify therapeutic targets in personalized medicine directed, e.g., against human epidermal growth factor receptor 2 (HER2) or epidermal growth factor receptor (EGFR), and to stratify patients for those therapies.

  6. Molecular epidemiology of bovine rotaviruses. Characterization of rotaviruses isolated from diarrhoeic calves by genome profile analysis.

    Science.gov (United States)

    Legrottaglie, R; Rizzi, V; Agrimi, P

    1995-04-01

    Fifteen bovine rotavirus group A strains were isolated in several Italian regions over the period 1981-1989 from calves in ten neonatal diarrhoea outbreaks. Electrophoretic analysis of the genome showed genomic variations, and five different profiles were observed, including one with thirteen dsRNA segments. The finding of extra RNA fragments, with respect to the regular eleven genome segments, suggests the possibility of simultaneous or sequential infection by more than one electropherotype or a modification in the length of RNA segments during infection.

  7. Microsatellite analysis in the genome of Acanthaceae: An in silico approach

    Directory of Open Access Journals (Sweden)

    Priyadharsini Kaliswamy

    2015-01-01

    Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from the National Center for Biotechnology Information database and screened for the presence of SSRs. The SSR Locator tool was used to predict the microsatellites and its inbuilt Primer3 module was used for primer design. Results: In total, 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and dinucleotide repeats were found to be abundant in the genome sequences. The essential amino acid isoleucine was found to be abundant in all the sequences. We also designed SSR-based primers/markers for 59 sequences of this family that contain microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to the Acanthaceae family in the future.
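
    As an illustration of the in silico screening step described above, the following sketch finds perfect di- and trinucleotide repeats with a regular expression. It is not the SSR Locator algorithm; the function name `find_ssrs` and the minimum-repeat thresholds are hypothetical choices for this example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, unit, copies) for perfect microsatellite runs.

    `min_repeats` maps unit length to the minimum number of tandem copies;
    the defaults below are illustrative assumptions, not published settings.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # A unit followed by at least (min_n - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


print(find_ssrs("TTG" + "AC" * 6 + "GGT"))
```

    Real SSR tools usually also handle imperfect and compound repeats, which this sketch deliberately ignores.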

  8. Systematic pharmacogenomics analysis of a Malay whole genome: proof of concept for personalized medicine.

    Directory of Open Access Journals (Sweden)

    Mohd Zaki Salleh

    BACKGROUND: With a higher throughput and lower cost in sequencing, second generation sequencing technology has immense potential for translation into clinical practice and in the realization of pharmacogenomics based patient care. The systematic analysis of whole genome sequences to assess patient to patient variability in pharmacokinetics and pharmacodynamics responses towards drugs would be the next step in future medicine in line with the vision of personalizing medicine. METHODS: Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer and the father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm in tandem with large datasets of variant annotations and gene functions for genetic variations with pharmacogenomics impact was developed. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. PRINCIPAL FINDINGS: Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetics analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. CONCLUSIONS: The current study successfully unravels the potential of personal genome sequencing in understanding the functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.

  9. Chromosomes in the flow to simplify genome analysis.

    Science.gov (United States)

    Doležel, Jaroslav; Vrána, Jan; Safář, Jan; Bartoš, Jan; Kubaláková, Marie; Simková, Hana

    2012-08-01

    Nuclear genomes of human, animals, and plants are organized into subunits called chromosomes. When isolated into aqueous suspension, mitotic chromosomes can be classified using flow cytometry according to light scatter and fluorescence parameters. Chromosomes of interest can be purified by flow sorting if they can be resolved from other chromosomes in a karyotype. The analysis and sorting are carried out at rates of 10^2-10^4 chromosomes per second, and for complex genomes such as wheat the flow sorting technology has been ground-breaking in reducing genome complexity for genome sequencing. The high sample rate provides an attractive approach for karyotype analysis (flow karyotyping) and the purification of chromosomes in large numbers. In characterizing the chromosome complement of an organism, the high number that can be studied using flow cytometry allows for a statistically accurate analysis. Chromosome sorting plays a particularly important role in the analysis of nuclear genome structure and the analysis of particular and aberrant chromosomes. Other attractive but not well-explored features include the analysis of chromosomal proteins, chromosome ultrastructure, and high-resolution mapping using FISH. Recent results demonstrate that chromosome flow sorting can be coupled seamlessly with DNA array and next-generation sequencing technologies for high-throughput analyses. The main advantages are targeting the analysis to a genome region of interest and a significant reduction in sample complexity. As flow sorters can also sort single copies of chromosomes, shotgun sequencing DNA amplified from them enables the production of haplotype-resolved genome sequences. This review explains the principles of flow cytometric chromosome analysis and sorting (flow cytogenetics), discusses the major uses of this technology in genome analysis, and outlines future directions.

  10. Association between chromosomal aberration of COX8C and tethered spinal cord syndrome: array-based comparative genomic hybridization analysis

    Directory of Open Access Journals (Sweden)

    Qiu-jiong Zhao

    2016-01-01

    Copy number variations have been found in patients with neural tube abnormalities. In this study, we performed genome-wide screening using high-resolution array-based comparative genomic hybridization in three children with tethered spinal cord syndrome and two healthy parents. Of eight copy number variations, four were non-polymorphic. These non-polymorphic copy number variations were associated with Angelman and Prader-Willi syndromes, and microcephaly. Gene function enrichment analysis revealed that COX8C, a gene associated with metabolic disorders of the nervous system, was located in the copy number variation region of Patient 1. Our results indicate that array-based comparative genomic hybridization can be used to diagnose tethered spinal cord syndrome. Our results may help determine the pathogenesis of tethered spinal cord syndrome and prevent occurrence of this disease.

  11. Evolutionary and Taxonomic Implications of Variation in Nuclear Genome Size: Lesson from the Grass Genus Anthoxanthum (Poaceae).

    Science.gov (United States)

    Chumová, Zuzana; Krejčíková, Jana; Mandáková, Terezie; Suda, Jan; Trávníček, Pavel

    2015-01-01

    The genus Anthoxanthum (sweet vernal grass, Poaceae) represents a taxonomically intricate polyploid complex with large phenotypic variation and its evolutionary relationships still poorly resolved. In order to get insight into the geographic distribution of ploidy levels and assess the taxonomic value of genome size data, we determined C- and Cx-values in 628 plants representing all currently recognized European species collected from 197 populations in 29 European countries. The flow cytometric estimates were supplemented by conventional chromosome counts. In addition to diploids, we found two low (rare 3x and common 4x) and one high (~16x-18x) polyploid levels. Mean holoploid genome sizes ranged from 5.52 pg in diploid A. alpinum to 44.75 pg in highly polyploid A. amarum, while the size of monoploid genomes ranged from 2.75 pg in tetraploid A. alpinum to 9.19 pg in diploid A. gracile. In contrast to Central and Northern Europe, which harboured only limited cytological variation, a much more complex pattern of genome sizes was revealed in the Mediterranean, particularly in Corsica. Eight taxonomic groups that partly corresponded to traditionally recognized species were delimited based on genome size values and phenotypic variation. Whereas our data supported the merger of A. aristatum and A. ovatum, eastern Mediterranean populations traditionally referred to as diploid A. odoratum were shown to be cytologically distinct, and may represent a new taxon. Autopolyploid origin was suggested for 4x A. alpinum. In contrast, 4x A. odoratum seems to be an allopolyploid, based on the amounts of nuclear DNA. Intraspecific variation in genome size was observed in all recognized species, the most striking example being the A. aristatum/ovatum complex. Altogether, our study showed that genome size can be a useful taxonomic marker in Anthoxanthum to not only guide taxonomic decisions but also help resolve evolutionary relationships in this challenging grass genus.

  12. Analysis of intra-genomic GC content homogeneity within prokaryotes

    Directory of Open Access Journals (Sweden)

    Bohlin Jon

    2010-08-01

    Background: Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome), but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content of the total genome. A low GCVAR indicates intra-genomic GC homogeneity and high GCVAR heterogeneity. Results: The regression analyses indicated that GCVAR was significantly associated with domain (i.e. archaea or bacteria), phylum, and oxygen requirement. GCVAR was significantly higher among anaerobes than both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content was also found but appears to be non-linear and varies greatly among phyla. Conclusions: Our findings show that GCVAR is linked with oxygen requirement, while mean genomic GC content is not. We therefore suggest that GCVAR should be used as a complement to mean GC content.
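
    The abstract does not give the formula for GCVAR, so the sketch below only illustrates the general idea of measuring how far local GC content departs from the genome-wide average. The window scheme, the root-mean-square summary and the function names are all assumptions of this example and should not be read as the authors' definition.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_window_variability(seq, window):
    """RMS deviation of per-window GC from whole-sequence GC (illustrative)."""
    genome_gc = gc_fraction(seq)
    # Non-overlapping windows; a trailing partial window is dropped.
    windows = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    if not windows:
        raise ValueError("sequence is shorter than one window")
    squared = [(gc_fraction(w) - genome_gc) ** 2 for w in windows]
    return math.sqrt(sum(squared) / len(squared))


print(gc_window_variability("GGGGAAAA", 4))   # heterogeneous toy sequence
print(gc_window_variability("GAGAGAGA", 4))   # homogeneous toy sequence
```

    Both toy sequences have the same mean GC content of 0.5 but differ in their local variability, which is the distinction the abstract draws between mean GC content and GCVAR.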

  13. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum.

    Science.gov (United States)

    Miles, Alistair; Iqbal, Zamin; Vauterin, Paul; Pearson, Richard; Campino, Susana; Theron, Michel; Gould, Kelda; Mead, Daniel; Drury, Eleanor; O'Brien, John; Ruano Rubio, Valentin; MacInnis, Bronwyn; Mwangi, Jonathan; Samarakoon, Upeka; Ranford-Cartwright, Lisa; Ferdig, Michael; Hayton, Karen; Su, Xin-Zhuan; Wellems, Thomas; Rayner, Julian; McVean, Gil; Kwiatkowski, Dominic

    2016-09-01

    The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.

  14. The evolutionary imprint of domestication on genome variation and function of the filamentous fungus Aspergillus oryzae.

    Science.gov (United States)

    Gibbons, John G; Salichos, Leonidas; Slot, Jason C; Rinker, David C; McGary, Kriston L; King, Jonas G; Klich, Maren A; Tabb, David L; McDonald, W Hayes; Rokas, Antonis

    2012-08-01

    The domestication of animals, plants, and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2-6]. Here, we systematically examined genome-wide sequence and functional variation between the domesticated fungus Aspergillus oryzae, whose saccharification abilities humans have harnessed for thousands of years to produce sake, soy sauce, and miso from starch-rich grains, and its wild relative A. flavus, a potentially toxigenic plant and animal pathogen [7]. We discovered dramatic changes in the sequence variation and abundance profiles of genes and wholesale primary and secondary metabolic pathways between domesticated and wild relative isolates during growth on rice. Our data suggest that, through selection by humans, an atoxigenic lineage of A. flavus gradually evolved into a "cell factory" for enzymes and metabolites involved in the saccharification process. These results suggest that whereas animal and plant domestication was largely driven by Neolithic "genetic tinkering" of developmental pathways, microbe domestication was driven by extensive remodeling of metabolism.

  15. Genomic analysis of hyperthermophilic archaea; Chokonetsusei kosaikin no genomu kaiseki

    Energy Technology Data Exchange (ETDEWEB)

    Kato, C. [Japan Marine Science and Technology Center, Kanagawa (Japan)

    1997-05-20

    Whole genome sequences of five strains of microorganisms have been reported up to the present and many genome analysis projects are in progress in the world. Among archaea (archaebacteria), the genome analysis of Methanococcus jannaschii has been completed and the sequencing data are open to the public. While 134 regulatory genes were identified in Synechocystis sp. PCC 6803 (eubacteria, 3.6 Mb genome size), only 7 regulatory genes were identified in M. jannaschii (1.7 Mb). The difference in genome size is believed to correspond to the quantity of environmental stresses. In Japan, the genome analysis project on a new hyperthermophilic archaeon, Pyrococcus horikoshii, is in progress. P. horikoshii was isolated from a deep-sea hydrothermal vent. It shows barophilic growth at a maximum temperature of 103 °C under a pressure of 30 MPa. Thus, the genome analysis of barophilic hyperthermophilic archaea is expected to contribute to the understanding of the origin of life and evolution. 19 refs., 4 figs., 1 tab.

  16. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

    Science.gov (United States)

    Premzl, Marko

    2012-11-01

    The interferon-γ-inducible GTPases, IFGGs, are intracellular proteins involved in immune response against pathogens. A comprehensive comparative genomic review and analysis of eutherian IFGGs was carried out using public genomic sequences. The 64 eutherian IFGG genes were examined in detail and annotated. The eutherian IFGG promoter types were first catalogued followed by a phylogenetic analysis of eutherian IFGGs, which described five major IFGG clusters. The patterns of differential gene expansions and protein regions that may regulate IFGG catalytic features suggested a new classification of eutherian IFGGs. This mini-review has also provided new tests of reliability of public genomic sequences as well as tests of protein molecular evolution.

  17. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
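
    Since the abstract uses contig N50 as its measure of assembly quality, a short sketch of the usual definition may help: N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy contig lengths are invented for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= half.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

    Merging contigs raises N50 without adding sequence, which is why N50 is normally read alongside total assembled length and gap counts, as the abstract does.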

  18. Genomic and single nucleotide polymorphism analysis of infectious bronchitis coronavirus.

    Science.gov (United States)

    Abolnik, Celia

    2015-06-01

    Infectious bronchitis virus (IBV) is a Gammacoronavirus that causes a highly contagious respiratory disease in chickens. A QX-like strain was analysed by high-throughput Illumina sequencing and genetic variation across the entire viral genome was explored at the sub-consensus level by single nucleotide polymorphism (SNP) analysis. Thirteen open reading frames (ORFs) in the order 5'-UTR-1a-1ab-S-3a-3b-E-M-4b-4c-5a-5b-N-6b-3'UTR were predicted. The relative frequencies of missense:silent SNPs were calculated to obtain a comparative measure of variability in specific genes. The most variable ORFs in descending order were E, 3b, 5'UTR, N, 1a, S, 1ab, M, 4c, 5a, 6b. The E and 3b protein products play key roles in coronavirus virulence, and RNA folding demonstrated that the mutations in the 5'UTR did not alter the predicted secondary structure. The frequency of SNPs in the Spike (S) protein ORF of 0.67% was below the genomic average of 0.76%. Only three SNPs were identified in the S1 subunit, none of which were located in hypervariable region (HVR) 1 or HVR2. The S2 subunit was considerably more variable, containing 87% of the polymorphisms detected across the entire S protein. The S2 subunit also contained a previously unreported multi-A insertion site and a stretch of four consecutive mutated amino acids, which mapped to the stalk region of the spike protein. Template-based protein structure modelling produced the first theoretical model of the IBV spike monomer. Given the lack of diversity observed at the sub-consensus level, the tenet that the HVRs in the S1 subunit are very tolerant of amino acid changes produced by genetic drift is questioned. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Mycobacterial species as case-study of comparative genome analysis.

    Science.gov (United States)

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems. Further, with more than 100 genome sequences available from this genus, comparative genome analysis can provide new insights into the evolutionary events of these species and help improve drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study we outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. The genomes were compared by length, GC content, and number of genes as reported by GenBank, RefSeq, and Prodigal. A BLAST matrix of these genomes was computed to summarize the similarity between species in a simple scheme. From the multiple genome analysis, the pan and core genomes were defined for twelve mycobacterial species. We also present the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree was constructed from the 16S rRNA gene for tuberculous and non-tuberculous mycobacteria.
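
    The pan and core genome mentioned in this abstract can be pictured as a set union and a set intersection. The sketch below is not the authors' pipeline: it assumes genes have already been grouped into families (for example from a BLAST comparison), and the function name pan_and_core, the strain labels, and the family identifiers are all hypothetical.

```python
def pan_and_core(gene_families_by_genome):
    """Return (pan, core) gene-family sets for a group of genomes.

    pan  = families present in at least one genome (union)
    core = families present in every genome (intersection)
    """
    family_sets = [set(f) for f in gene_families_by_genome.values()]
    if not family_sets:
        raise ValueError("need at least one genome")
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Hypothetical gene-family assignments (illustration only).
genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam3", "fam4", "fam5"},
}

pan, core = pan_and_core(genomes)
print(len(pan), sorted(core))
```

    As more genomes are added, the pan genome can only grow and the core genome can only shrink, which is why both are usually reported together.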

  20. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    Science.gov (United States)

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L).

  1. Amplified fragment length polymorphism: an adept technique for genome mapping, genetic differentiation, and intraspecific variation in protozoan parasites.

    Science.gov (United States)

    Kumar, Awanish; Misra, Pragya; Dube, Anuradha

    2013-02-01

    With the advent of polymerase chain reaction (PCR), genetic markers are now accessible for all organisms, including parasites. Amplified fragment length polymorphism (AFLP) is a PCR-based marker for the rapid screening of genetic diversity and intraspecific variation. It is a potent fingerprinting technique for genomic DNAs of any origin or complexity and rapidly generates a number of highly replicable markers that allow high-resolution genotyping. AFLPs are convenient and reliable in comparison to other markers like random amplified polymorphic DNA, restriction fragment length polymorphism, and simple sequence repeat in terms of time and cost efficiency, reproducibility, and resolution, as the technique does not require template DNA sequencing. In addition, AFLP essentially probes the entire genome at random, without prior sequence knowledge. AFLP markers have therefore emerged as an advanced type of genetic marker with broad application in genomic mapping, population genetics, and DNA fingerprinting, and are ideally suited as a screening tool for molecular markers linked with biological and clinical traits. This review describes the AFLP procedure and gives an overview of its applications in genome fingerprinting, where it is currently used in parasite genome research. We outline the AFLP procedure adapted for Leishmania genome study and discuss the benefits of AFLPs for assessing genetic variation and genome mapping over other existing molecular techniques. We highlight the possible use of AFLPs as genetic markers with broad application in parasitological research, because the technique allows random screening of the entire genome for linkage with genetic and clinical properties of the parasite. In this review, we have taken a pragmatic approach to the study of AFLP for genome mapping and polymorphism in protozoan parasites and conclude that AFLP is a very useful tool.

  2. Hyperstructures, genome analysis and I-cells

    DEFF Research Database (Denmark)

    Amar, P.; Ballet, P.; Barlovatz-Meimon, G.

    2002-01-01

    New concepts may prove necessary to profit from the avalanche of sequence data on the genome, transcriptome, proteome and interactome and to relate this information to cell physiology. Here, we focus on the concept of large activity-based structures, or hyperstructures, in which a variety of type...

  3. Genome-wide association studies of MRI-defined brain infarcts: Meta-analysis from the CHARGE consortium

    OpenAIRE

    2010-01-01

    Background and Purpose: Previous studies examining genetic associations with MRI-defined brain infarct have yielded inconsistent findings. We investigated genetic variation underlying covert MRI infarct in persons without histories of transient ischemic attack or stroke. We performed meta-analysis of genome-wide association studies of white participants in 6 studies comprising the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. Methods: Using 2.2 mi...

  4. Whole genome sequencing analysis of Plasmodium vivax using whole genome capture

    Directory of Open Access Journals (Sweden)

    Bright A

    2012-06-01

    Background: Malaria caused by Plasmodium vivax is an experimentally neglected severe disease with a substantial burden on human health. Because of technical limitations, little is known about the biology of this important human pathogen. Whole genome analysis methods on patient-derived material are thus likely to have a substantial impact on our understanding of P. vivax pathogenesis and epidemiology. For example, they will allow study of the evolution and population biology of the parasite, allow parasite transmission patterns to be characterized, and may facilitate the identification of new drug resistance genes. Because parasitemias are typically low and the parasite cannot be readily cultured, on-site leukocyte depletion of blood samples is typically needed to remove human DNA that may be 1000X more abundant than parasite DNA. These features have precluded the analysis of archived blood samples and require the presence of laboratories in close proximity to the collection of field samples for optimal pre-cryopreservation sample preparation. Results: Here we show that in-solution hybridization capture can be used to extract P. vivax DNA from human contaminating DNA in the laboratory without the need for on-site leukocyte filtration. Using a whole genome capture method, we were able to enrich P. vivax DNA from bulk genomic DNA from less than 0.5% to a median of 55% (range 20%-80%). This level of enrichment allows for efficient analysis of the samples by whole genome sequencing and does not introduce any gross biases into the data. With this method, we obtained greater than 5X coverage across 93% of the P. vivax genome for four P. vivax strains from Iquitos, Peru, which is similar to our results using leukocyte filtration (greater than 5X coverage across 96%). Conclusion: The whole genome capture technique will enable more efficient whole genome analysis of P. vivax from a larger geographic region and from valuable archived sample collections.

  5. Strategies for Integrated Analysis of Genetic, Epigenetic, and Gene Expression Variation in Cancer: Addressing the Challenges

    DEFF Research Database (Denmark)

    Thingholm, Louise Bruun; Andersen, Lars; Makalic, Enes

    2016-01-01

    The development and progression of cancer, a collection of diseases with complex genetic architectures, is facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integration of heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article will provide an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and also accommodate information about the biological and functional interactions between the elements that have been measured...

  6. Genomic Copy Number Variations of the Complement Component C4B Gene Are Associated With Chronic Central Serous Chorioretinopathy

    NARCIS (Netherlands)

    Breukink, M.B.; Schellevis, R.L.; Boon, C.J.F.; Fauser, S.; Hoyng, C.B.; Hollander, A.I. den; Jong, E.K.

    2015-01-01

    PURPOSE: Chronic central serous chorioretinopathy (cCSC) has recently been associated with variants in the complement factor H gene. To further investigate the role of the complement system in cCSC, the genomic copy number variations in the complement component 4 gene (C4) were studied. METHODS: C4A

  8. Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

    NARCIS (Netherlands)

    Kerstens, H.H.D.; Crooijmans, R.P.M.A.; Dibbits, B.W.; Vereijken, A.; Okimoto, R.; Groenen, M.A.M.

    2011-01-01

    Background: Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and

  9. Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber

    NARCIS (Netherlands)

    Zhang, Z.; Mao, L.; Chen, Junshi; Bu, F.; Li, G.; Sun, J.; Li, S.; Sun, H.; Jiao, C.; Blakely, R.; Pan, J.; Cai, R.; Luo, R.; Peer, Van de Y.; Jacobsen, E.; Fei, Z.; Huang, S.

    2015-01-01

    Structural variations (SVs) represent a major source of genetic diversity. However, the functional impact and formation mechanisms of SVs in plant genomes remain largely unexplored. Here, we report a nucleotide-resolution SV map of cucumber (Cucumis sativus) that comprises 26,788 SVs based on deep r

  10. Integration of genomic approaches to uncover sources of variation in age at puberty and reproductive longevity in sows

    Science.gov (United States)