Since the human genome draft sequence was in public for the first time in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples for large-scale studies of human genome variations: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 genomes Data (2,504 individuals) http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ If we can integrate all three data into a single volume of data, we should be able to conduct a more detailed analysis of human genome variations for a total number of 4,861 individuals (= 1,417+940+2,504 individuals). In fact, we successfully integrated these three data sets by use of information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that were not possible by an analysis of each of the three data sets. Here, we report the outcome of this kind of big data analyses and discuss evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.
Since the human genome draft sequence was in public for the first time in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples for large-scale studies of human
Full Text Available Niche adaptation has long been recognized to drive intra-species differentiation and speciation, yet knowledge about its relatedness with hereditary variation of microbial genomes is relatively limited. Using Leptospirillum ferriphilum species as a case study, we present a detailed analysis of genomic features of five recognized strains. Genome-to-genome distance calculation preliminarily determined the roles of spatial distance and environmental heterogeneity that potentially contribute to intra-species variation within L. ferriphilum species at the genome level. Mathematical models were further constructed to extrapolate the expansion of L. ferriphilum genomes (an ‘open’ pan-genome, indicating the emergence of novel genes with new sequenced genomes. The identification of diverse mobile genetic elements (MGEs (such as transposases, integrases, and phage-associated genes revealed the prevalence of horizontal gene transfer events, which is an important evolutionary mechanism that provides avenues for the recruitment of novel functionalities and further for the genetic divergence of microbial genomes. Comprehensive analysis also demonstrated that the genome reduction by gene loss in a broad sense might contribute to the observed diversification. We thus inferred a plausible explanation to address this observation: the community-dependent adaptation that potentially economizes the limiting resources of the entire community. Now that the introduction of new genes is accompanied by a parallel abandonment of some other ones, our results provide snapshots on the biological fitness cost of environmental adaptation within the L. ferriphilum genomes. In short, our genome-wide analyses bridge the relation between genetic variation of L. ferriphilum with its evolutionary adaptation.
Full Text Available BACKGROUND: The genus Camellia, belonging to the family Theaceae, is economically important group in flowering plants. Frequent interspecific hybridization together with polyploidization has made them become taxonomically "difficult taxa". The DNA content is often used to measure genome size variation and has largely advanced our understanding of plant evolution and genome variation. The goals of this study were to investigate patterns of interspecific and intraspecific variation of DNA contents and further explore genome size evolution in a phylogenetic context of the genus. METHODOLOGY/PRINCIPAL FINDINGS: The DNA amount in the genus was determined by using propidium iodide flow cytometry analysis for a total of 139 individual plants representing almost all sections of the two subgenera, Camellia and Thea. An improved WPB buffer was proven to be suitable for the Camellia species, which was able to counteract the negative effects of secondary metabolite and generated high-quality results with low coefficient of variation values (CV <5%. Our results showed trivial effects on different tissues of flowers, leaves and buds as well as cytosolic compounds on the estimation of DNA amount. The DNA content of C. sinensis var. assamica was estimated to be 1C = 3.01 pg by flow cytometric analysis, which is equal to a genome size of about 2940 Mb. CONCLUSION: Intraspecific and interspecific variations were observed in the genus Camellia, and as expected, the latter was larger than the former. Our study suggests a directional trend of increasing genome size in the genus Camellia probably owing to the frequent polyploidization events.
Full Text Available Deedu (DU Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related with adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR, neighboring Tibetan populations (including high-altitude candidates EPAS1, PKLR, and CYP2E1, as well as genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG. Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. Whole-genome sequence from a DU Mongolian (Tianjiao1 shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments.
Zhang, Yanliang; Xu, Qiuyue; Cai, Xuemei; Li, Yixun; Song, Guibo; Wang, Juan; Zhang, Rongchen; Dai, Yong; Duan, Yong
To analyze genomic copy number variations (CNVs) in two sisters with primary amenorrhea and hyperandrogenism. G-banding was performed for karyotype analysis. The whole genome of the two sisters were scanned and analyzed by array-based comparative genomic hybridization (array-CGH). The results were confirmed with real-time quantitative PCR (RT-qPCR). No abnormality was found by conventional G-banded chromosome analysis. Array-CGH has identified 11 identical CNVs from the sisters which, however, overlapped with CNVs reported by the Database of Genomic Variants (http://projects.tcag.ca/variation/). Therefore, they are likely to be benign. In addition, a -8.44 Mb 9p11.1-p13.1 duplication (38,561,587-47,002,387 bp, hg18) and a -80.9 kb 4q13.2 deletion (70,183,990-70,264,889 bp, hg18) were also detected in the elder and younger sister, respectively. The relationship between such CNVs and primary amenorrhea and hyperandrogenism was however uncertain. RT-qPCR results were in accordance with array-CGH. Two CNVs were detected in two sisters by array-CGH, for which further studies are needed to clarify their correlation with primary amenorrhea and hyperandrogenism.
Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti
With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most
Spain, S L; Pedroso, I; Kadeva, N; Miller, M B; Iacono, W G; McGue, M; Stergiakouli, E; Davey Smith, G; Putallaz, M; Lubinski, D; Meaburn, E L; Plomin, R; Simpson, M A
Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case-control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence.
Full Text Available Abstract Background The recent determination of the complete nucleotide sequence of several Mycobacterium tuberculosis (MTB genomes allows the use of comparative genomics as a tool for dissecting the nature and consequence of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C, along with the genomes of laboratory strains (H37Rv and H37Ra, provides new insights on the mechanisms of adaptation of this bacterium to the human host. Findings The genetic variation found in six M. tuberculosis strains does not involve significant genomic rearrangements. Most of the variation results from deletion and transposition events preferentially associated with insertion sequences and genes of the PE/PPE family but not with genes implicated in virulence. Using a Perl-based software islandsanalyser, which creates a representation of the genetic variation in the genome, we identified differences in the patterns of distribution and frequency of the polymorphisms across the genome. The identification of genes displaying strain-specific polymorphisms and the extrapolation of the number of strain-specific polymorphisms to an unlimited number of genomes indicates that the different strains contain a limited number of unique polymorphisms. Conclusion The comparison of multiple genomes demonstrates that the M. tuberculosis genome is currently undergoing an active process of gene decay, analogous to the adaptation process of obligate bacterial symbionts. This observation opens new perspectives into the evolution and the understanding of the pathogenesis of this bacterium.
Peng, Ting; Wang, Li; Li, Guisen
The APOL1 gene variants has been shown to be associated with an increased risk of multiple kinds of diseases, particularly in African Americans, but not in Caucasians and Asians. In this study, we explored the single nucleotide polymorphism (SNP) and haplotype diversity of APOL1 gene in different races provided by 1000 Genomes project. Variants of APOL1 gene in 1000 Genome Project were obtained and SNPs located in the regulatory region or coding region were selected for genetic variation analysis. Total 2504 individuals from 26 populations were classified as four groups that included Africa, Europe, Asia and Admixed populations. Tag SNPs were selected to evaluate the haplotype diversities in the four populations by HaploStats software. APOL1 gene was surrounded by some of the most polymorphic genes in the human genome, variation of APOL1 gene was common, with up to 613 SNP (1000 Genome Project reported) and 99 of them (16.2%) with MAF ≥ 1%. There were 79 SNPs in the URR and 92 SNPs in 3'UTR. Total 12 SNPs in URR and 24 SNPs in 3'UTR were considered as common variants with MAF ≥ 1%. It is worth noting that URR-1 was presents lower frequencies in European populations, while other three haplotypes taken an opposite pattern; 3'UTR presents several high-frequency variation sites in a short segment, and the differences of its haplotypes among different population were significant (P < 0.01), UTR-1 and UTR-5 presented much higher frequency in African population, while UTR-2, UTR-3 and UTR-4 were much lower. APOL1 coding region showed that two SNP of G1 with higher frequency are actually pull down the haplotype H-1 frequency when considering all populations pooled together, and the diversity among the four populations be widen by the G1 two mutation (P 1 = 3.33E-4 vs P 2 = 3.61E-30). The distributions of APOL1 gene variants and haplotypes were significantly different among the different populations, in either regulatory or coding regions. It could provide
Walker, Joseph F; Zanis, Michael J; Emery, Nancy C
Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.
Walker M Andrew
Full Text Available Abstract Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76. 2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10-2 per base pair of DNA and the average INDEL frequency was 2.06 × 10-2 per base pair of DNA. On an average, 60.33% of the SNPs were synonymous type while 39.67% were non-synonymous type. The mutation frequency, primarily in the form of external INDELs was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain were 60 (9a5c, 54 (Dixon, 83 (Ann1 and 9 (Temecula-1. A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions. Conclusion INDELs and strain specific genes
Jenny van Dongen
Full Text Available DNA methylation is one of the most extensively studied epigenetic marks in humans. Yet, it is largely unknown what causes variation in DNA methylation between individuals. The comparison of DNA methylation profiles of monozygotic (MZ twins offers a unique experimental design to examine the extent to which such variation is related to individual-specific environmental influences and stochastic events or to familial factors (DNA sequence and shared environment. We measured genome-wide DNA methylation in buccal samples from ten MZ pairs (age 8–19 using the Illumina 450k array and examined twin correlations for methylation level at 420,921 CpGs after QC. After selecting CpGs showing the most variation in the methylation level between subjects, the mean genome-wide correlation (rho was 0.54. The correlation was higher, on average, for CpGs within CpG islands (CGIs, compared to CGI shores, shelves and non-CGI regions, particularly at hypomethylated CpGs. This finding suggests that individual-specific environmental and stochastic influences account for more variation in DNA methylation in CpG-poor regions. Our findings also indicate that it is worthwhile to examine heritable and shared environmental influences on buccal DNA methylation in larger studies that also include dizygotic twins.
Schilter, K F; Reis, L M; Schneider, A; Bardakjian, T M; Abdul-Rahman, O; Kozel, B A; Zimmerman, H H; Broeckel, U; Semina, E V
Anophthalmia/microphthalmia (A/M) represent severe developmental ocular malformations. Currently, mutations in known genes explain less than 40% of A/M cases. We performed whole-genome copy number variation analysis in 60 patients affected with isolated or syndromic A/M. Pathogenic deletions of 3q26 (SOX2) were identified in four independent patients with syndromic microphthalmia. Other variants of interest included regions with a known role in human disease (likely pathogenic) as well as novel rearrangements (uncertain significance). A 2.2-Mb duplication of 3q29 in a patient with non-syndromic anophthalmia and an 877-kb duplication of 11p13 (PAX6) and a 1.4-Mb deletion of 17q11.2 (NF1) in two independent probands with syndromic microphthalmia and other ocular defects were identified; while ocular anomalies have been previously associated with 3q29 duplications, PAX6 duplications, and NF1 mutations in some cases, the ocular phenotypes observed here are more severe than previously reported. Three novel regions of possible interest included a 2q14.2 duplication which cosegregated with microphthalmia/microcornea and congenital cataracts in one family, and 2q21 and 15q26 duplications in two additional cases; each of these regions contains genes that are active during vertebrate ocular development. Overall, this study identified causative copy number mutations and regions with a possible role in ocular disease in 17% of A/M cases. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Cardone Maria Francesca
Full Text Available Grapevine is one of the most important crop plants in the world. Recently there was great expansion of genomics resources about grapevine genome, thus providing increasing efforts for molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs on plant genomes, few data are available on copy number variation (CNV. Furthermore association between structural variations and phenotypes has been described in only a few cases. We combined high throughput biotechnologies and bioinformatics tools, to reveal the first inter-varietal atlas of structural variation (SV for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We detected roughly 8% of the grapevine genome affected by genomic variations. Taken into account phenotypic differences existing among the studied varieties we performed comparison of SVs among them and the reference and next we performed an in-depth analysis of gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.
Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio
Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi
With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as
Maiden Martin CJ
Full Text Available Abstract Background The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner. Results The Bacterial Isolate Genome Sequence Database (BIGSDB is a scalable, open source, web-accessible database system that meets these needs, enabling phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. LIMS functionality of the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases and fine-grained authentication of access permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSDB are illustrated with the genera Neisseria and Streptococcus. The BIGSDB source code and documentation are available at http://pubmlst.org/software/database/bigsdb/. Conclusions Genomic data can be used to characterise bacterial isolates in many different ways but it can also be efficiently exploited for evolutionary or functional studies. BIGSDB
Fadista, João; Thomsen, Bo; Holm, Lars-Erik
to genetic variation in cattle. Results We designed and used a set of NimbleGen CGH arrays that tile across the assayable portion of the cattle genome with approximately 6.3 million probes, at a median probe spacing of 301 bp. This study reports the highest resolution map of copy number variation...... in the cattle genome, with 304 CNV regions (CNVRs) being identified among the genomes of 20 bovine samples from 4 dairy and beef breeds. The CNVRs identified covered 0.68% (22 Mb) of the genome, and ranged in size from 1.7 to 2,031 kb (median size 16.7 kb). About 20% of the CNVs co-localized with segmental...... duplications, while 30% encompass genes, of which the majority is involved in environmental response. About 10% of the human orthologous of these genes are associated with human disease susceptibility and, hence, may have important phenotypic consequences. Conclusions Together, this analysis provides a useful...
Gaelen R Burke
Full Text Available Insect parasitoids must complete part of their life cycle within or on another insect, ultimately resulting in the death of the host insect. One group of parasitoid wasps, the 'microgastroid complex' (Hymenoptera: Braconidae, engage in an association with beneficial symbiotic viruses that are essential for successful parasitism of hosts. These viruses, known as Bracoviruses, persist in an integrated form in the wasp genome, and activate to replicate in wasp ovaries during development to ultimately be delivered into host insects during parasitism. The lethal nature of host-parasitoid interactions, combined with the involvement of viruses in mediating these interactions, has led to the hypothesis that Bracoviruses are engaged in an arms race with hosts, resulting in recurrent adaptation in viral (and host genes. Deep sequencing was employed to characterize sequence variation across the encapsidated Bracovirus genome within laboratory and field populations of the parasitoid wasp species Microplitis demolitor. Contrary to expectations, there was a paucity of evidence for positive directional selection among virulence genes, which generally exhibited signatures of purifying selection. These data suggest that the dynamics of host-parasite interactions may not result in recurrent rounds of adaptation, and that adaptation may be more variable in time than previously expected.
Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gr?f, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio
Abstract High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium, are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic...
Cardoso, Joao; Andersen, Mikael Rørdam; Herrgard, Markus
scale and resolution by re-sequencing thousands of strains systematically. In this article, we review challenges in the integration and analysis of large-scale re-sequencing data, present an extensive overview of bioinformatics methods for predicting the effects of genetic variants on protein function......Genetic variation is the motor of evolution and allows organisms to overcome the environmental challenges they encounter. It can be both beneficial and harmful in the process of engineering cell factories for the production of proteins and chemicals. Throughout the history of biotechnology......, there have been efforts to exploit genetic variation in our favor to create strains with favorable phenotypes. Genetic variation can either be present in natural populations or it can be artificially created by mutagenesis and selection or adaptive laboratory evolution. On the other hand, unintended genetic...
Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology’s GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation is provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.
Matarin, Mar; Simon-Sanchez, Javier; Fung, Hon-Chung; Scholz, Sonja; Gibbs, J. Raphael; Hernandez, Dena G.; Crews, Cynthia; Britton, Angela; Wavrant De Vrieze, Fabienne; Brott, Thomas G.; Brown, Robert D.; Worrall, Bradford B.; Silliman, Scott; Case, L. Douglas; Hardy, John A.; Rich, Stephen S.; Meschia, James F.; Singleton, Andrew B.
Technological advances in molecular genetics allow rapid and sensitive identification of genomic copy number variants (CNVs). This, in turn, has sparked interest in the function such variation may play in disease. While a role for copy number mutations as a cause of Mendelian disorders is well established, it is unclear whether CNVs may affect risk for common complex disorders. We sought to investigate whether CNVs may modulate risk for ischemic stroke (IS) and to provide a catalog of CNVs in patients with this disorder by analyzing copy number metrics produced as a part of our previous genome-wide single-nucleotide polymorphism (SNP)-based association study of ischemic stroke in a North American white population. We examined CNVs in 263 patients with ischemic stroke (IS). Each identified CNV was compared with changes identified in 275 neurologically normal controls. Our analysis identified 247 CNVs, corresponding to 187 insertions (76%; 135 heterozygous; 25 homozygous duplications or triplications; 2 heterosomic) and 60 deletions (24%; 40 heterozygous deletions;3 homozygous deletions; 14 heterosomic deletions). Most alterations (81%) were the same as, or overlapped with, previously reported CNVs. We report here the first genome-wide analysis of CNVs in IS patients. In summary, our study did not detect any common genomic structural variation unequivocally linked to IS, although we cannot exclude that smaller CNVs or CNVs in genomic regions poorly covered by this methodology may confer risk for IS. The application of genome-wide SNP arrays now facilitates the evaluation of structural changes through the entire genome as part of a genome-wide genetic association study. PMID:18288507
Full Text Available Congenital heart defect (CHD is the most common cause of death from congenital anomaly. Among several candidate epigenetic mechanisms, DNA methylation may play an important role in the etiology of CHDs. We conducted a genome-wide DNA methylation analysis using an Illumina Infinium 450k human methylation assay in a cohort of 24 newborns who had aortic valve stenosis (AVS, with gestational-age matched controls. The study identified significantly-altered CpG methylation at 59 sites in 52 genes in AVS subjects as compared to controls (either hypermethylated or demethylated. Gene Ontology analysis identified biological processes and functions for these genes including positive regulation of receptor-mediated endocytosis. Consistent with prior clinical data, the molecular function categories as determined using DAVID identified low-density lipoprotein receptor binding, lipoprotein receptor binding and identical protein binding to be over-represented in the AVS group. A significant epigenetic change in the APOA5 and PCSK9 genes known to be involved in AVS was also observed. A large number CpG methylation sites individually demonstrated good to excellent diagnostic accuracy for the prediction of AVS status, thus raising possibility of molecular screening markers for this disorder. Using epigenetic analysis we were able to identify genes significantly involved in the pathogenesis of AVS.
Radhakrishna, Uppala; Albayrak, Samet; Alpay-Savasan, Zeynep; Zeb, Amna; Turkoglu, Onur; Sobolewski, Paul; Bahado-Singh, Ray O
Congenital heart defect (CHD) is the most common cause of death from congenital anomaly. Among several candidate epigenetic mechanisms, DNA methylation may play an important role in the etiology of CHDs. We conducted a genome-wide DNA methylation analysis using an Illumina Infinium 450k human methylation assay in a cohort of 24 newborns who had aortic valve stenosis (AVS), with gestational-age matched controls. The study identified significantly-altered CpG methylation at 59 sites in 52 genes in AVS subjects as compared to controls (either hypermethylated or demethylated). Gene Ontology analysis identified biological processes and functions for these genes including positive regulation of receptor-mediated endocytosis. Consistent with prior clinical data, the molecular function categories as determined using DAVID identified low-density lipoprotein receptor binding, lipoprotein receptor binding and identical protein binding to be over-represented in the AVS group. A significant epigenetic change in the APOA5 and PCSK9 genes known to be involved in AVS was also observed. A large number CpG methylation sites individually demonstrated good to excellent diagnostic accuracy for the prediction of AVS status, thus raising possibility of molecular screening markers for this disorder. Using epigenetic analysis we were able to identify genes significantly involved in the pathogenesis of AVS.
Simmons, Sheri L; Dibartolo, Genevieve; Denef, Vincent J; Goltsman, Daniela S Aliaga; Thelen, Michael P; Banfield, Jillian F
Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types ( approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of
Sheri L Simmons
Full Text Available Deeply sampled community genomic (metagenomic datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x. The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types ( approximately 94% sequence identity have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the
Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang
Abstract The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM dedicates to collect, integrate and visualize genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research a...
Li, Bi Jun; Li, Hong Lian; Meng, Zining; Zhang, Yong; Lin, Haoran; Yue, Gen Hua; Xia, Jun Hong
Discovering the nature and pattern of genome variation is fundamental in understanding phenotypic diversity among populations. Although several millions of single nucleotide polymorphisms (SNPs) have been discovered in tilapia, the genome-wide characterization of larger structural variants, such as copy number variation (CNV) regions has not been carried out yet. We conducted a genome-wide scan for CNVs in 47 individuals from three tilapia populations. Based on 254 Gb of high-quality paired-end sequencing reads, we identified 4642 distinct high-confidence CNVs. These CNVs account for 1.9% (12.411 Mb) of the used Nile tilapia reference genome. A total of 1100 predicted CNVs were found overlapping with exon regions of protein genes. Further association analysis based on linear model regression found 85 CNVs ranging between 300 and 27,000 base pairs significantly associated to population types (R 2 > 0.9 and P > 0.001). Our study sheds first insights on genome-wide CNVs in tilapia. These CNVs among and within tilapia populations may have functional effects on phenotypes and specific adaptation to particular environments.
Yau, C; Papaspiliopoulos, O; Roberts, G O; Holmes, C
We consider the development of Bayesian Nonparametric methods for product partition models such as Hidden Markov Models and change point models. Our approach uses a Mixture of Dirichlet Process (MDP) model for the unknown sampling distribution (likelihood) for the observations arising in each state and a computationally efficient data augmentation scheme to aid inference. The method uses novel MCMC methodology which combines recent retrospective sampling methods with the use of slice sampler variables. The methodology is computationally efficient, both in terms of MCMC mixing properties, and robustness to the length of the time series being investigated. Moreover, the method is easy to implement requiring little or no user-interaction. We apply our methodology to the analysis of genomic copy number variation.
Harrison, Abby; Lemey, Philippe; Hurles, Matthew; Moyes, Chris; Horn, Susanne; Pryor, Jan; Malani, Joji; Supuri, Mathias; Masta, Andrew; Teriboriki, Burentau; Toatu, Tebuka; Penny, David; Rambaut, Andrew; Shapiro, Beth
Hepatitis B virus (HBV) genomes are small, semi-double-stranded DNA circular genomes that contain alternating overlapping reading frames and replicate through an RNA intermediary phase. This complex biology has presented a challenge to estimating an evolutionary rate for HBV, leading to difficulties resolving the evolutionary and epidemiological history of the virus. Here, we re-examine rates of HBV evolution using a novel data set of 112 within-host, transmission history (pedigree) and among-host genomes isolated over 20 years from the indigenous peoples of the South Pacific, combined with 313 previously published HBV genomes. We employ Bayesian phylogenetic approaches to examine several potential causes and consequences of evolutionary rate variation in HBV. Our results reveal rate variation both between genotypes and across the genome, as well as strikingly slower rates when genomes are sampled in the Hepatitis B e antigen positive state, compared to the e antigen negative state. This Hepatitis B e antigen rate variation was found to be largely attributable to changes during the course of infection in the preCore and Core genes and their regulatory elements. PMID:21765983
Rao, Y S; Li, J; Zhang, R; Lin, X R; Xu, J G; Xie, L; Xu, Z Q; Wang, L; Gan, J K; Xie, X J; He, J; Zhang, X Q
Copy number variation (CNV) is an important source of genetic variation in organisms and a main factor that affects phenotypic variation. A comprehensive study of chicken CNV can provide valuable information on genetic diversity and facilitate future analyses of associations between CNV and economically important traits in chickens. In the present study, an F2 full-sib chicken population (554 individuals), established from a cross between Xinghua and White Recessive Rock chickens, was used to explore CNV in the chicken genome. Genotyping was performed using a chicken 60K SNP BeadChip. A total of 1,875 CNV were detected with the PennCNV algorithm, and the average number of CNV was 3.42 per individual. The CNV were distributed across 383 independent CNV regions (CNVR) and covered 41 megabases (3.97%) of the chicken genome. Seven CNVR in 108 individuals were validated by quantitative real-time PCR, and 81 of these individuals (75%) also were detected with the PennCNV algorithm. In total, 274 CNVR (71.54%) identified in the current study were previously reported. Of these, 147 (38.38%) were reported in at least 2 studies. Additionally, 109 of the CNVR (28.46%) discovered here are novel. A total of 709 genes within or overlapping with the CNVR was retrieved. Out of the 2,742 quantitative trait loci (QTL) collected in the chicken QTL database, 43 QTL had confidence intervals overlapping with the CNVR, and 32 CNVR encompassed one or more functional genes. The functional genes located in the CNVR are likely to be the QTG that are associated with underlying economic traits. This study considerably expands our insight into the structural variation in the genome of chickens and provides an important resource for genomic variation, especially for genomic structural variation related to economic traits in chickens. © 2016 Poultry Science Association Inc.
van Dongen, J.; Ehli, E.A.; Slieker, R.C.; Bartels, M.; Weber, Z.M.; Davies, G.E.; Slagboom, P.E.; Heijmans, B.T.; Boomsma, D.I.
DNA methylation is one of the most extensively studied epigenetic marks in humans. Yet, it is largely unknown what causes variation in DNA methylation between individuals. The comparison of DNA methylation profiles of monozygotic (MZ) twins offers a unique experimental design to examine the extent
HE Jie; GAO Yong; SUN Ye-qing
To analyze the biological effects of space environment, the diversity of genomic DNA between the space flight soybean 194(4126) with phenotype of good yield and good fruit quality induced by space flight and the soybean with ground control was studied by amplified fragment length polymorphism (AFLP) method, and the polymorphism of space flight soybean 194(4126) was 3.56%. The differences of protein expression of seeds and leaves between the two kinds of soybeans were analysed by two-dimensional electrophoresis, PDQuest software and MALDI-TOF mass spectrometry. Results show that the loss and decrease of protein expression in 194(4126) soybean are subjected to the space fight of seeds, and three special proteins including Dehydrin, MAT1 and ceQORH are identified. It is concluded that the space environment changes the phenotype and geno-type of soybeans due to the space flight of seeds.
U.S. Environmental Protection Agency — The data is in the form of genomic sequences deposited in a public database, growth curves, and bioinformatic analysis of sequences. This dataset is associated with...
Wesolowska, Agata; Schmiegelow, Kjeld
Genomic variation is the basis of interindividual differences in observable traits and disease susceptibility. Genetic studies are the driving force of personalized medicine, as many of the differences in treatment efficacy can be attributed to our genomic background. The rapid development...... a considerable amount of the phenotype variability, hence the major difficulty of interpretation lies in the complexity of molecular interactions. This PhD thesis describes the state-of-art of the functional human variation research (Chapter 1) and introduces childhood acute lymphoblastic leukaemia (ALL...... the thesis and includes some final remarks on the perspectives of genomic variation research and personalized medicine. In summary, this thesis demonstrates the feasibility of integrative analyses of genomic variations and introduces large-scale hypothesis-driven SNP exploration studies as an emerging...
Li, Xiu-Qing; Du, Donglei
C+G content (GC content or G+C content) is known to be correlated with genome/chromosome size in bacteria but the relationship for other kingdoms remains unclear. This study analyzed genome size, chromosome size, and base composition in most of the available sequenced genomes in various kingdoms. Genome size tends to increase during evolution in plants and animals, and the same is likely true for bacteria. The genomic C+G contents were found to vary greatly in microorganisms but were quite similar within each animal or plant subkingdom. In animals and plants, the C+G contents are ranked as follows: monocot plants>mammals>non-mammalian animals>dicot plants. The variation in C+G content between chromosomes within species is greater in animals than in plants. The correlation between average chromosome C+G content and chromosome length was found to be positive in Proteobacteria, Actinobacteria (but not in other analyzed bacterial phyla), Ascomycota fungi, and likely also in some plants; negative in some animals, insignificant in two protist phyla, and likely very weak in Archaea. Clearly, correlations between C+G content and chromosome size can be positive, negative, or not significant depending on the kingdoms/groups or species. Different phyla or species exhibit different patterns of correlation between chromosome-size and C+G content. Most chromosomes within a species have a similar pattern of variation in C+G content but outliers are common. The data presented in this study suggest that the C+G content is under genetic control by both trans- and cis- factors and that the correlation between C+G content and chromosome length can be positive, negative, or not significant in different phyla. PMID:24551092
Periwal, Vinita; Scaria, Vinod
Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: email@example.com.
Davies, G; Armstrong, N; Bis, J C; Bressler, J; Chouraki, V; Giddaluru, S; Hofer, E; Ibrahim-Verbaas, C A; Kirin, M; Lahti, J; van der Lee, S J; Le Hellard, S; Liu, T; Marioni, R E; Oldmeadow, C; Postmus, I; Smith, A V; Smith, J A; Thalamuthu, A; Thomson, R; Vitart, V; Wang, J; Yu, L; Zgaga, L; Zhao, W; Boxall, R; Harris, S E; Hill, W D; Liewald, D C; Luciano, M; Adams, H; Ames, D; Amin, N; Amouyel, P; Assareh, A A; Au, R; Becker, J T; Beiser, A; Berr, C; Bertram, L; Boerwinkle, E; Buckley, B M; Campbell, H; Corley, J; De Jager, P L; Dufouil, C; Eriksson, J G; Espeseth, T; Faul, J D; Ford, I; Scotland, Generation; Gottesman, R F; Griswold, M E; Gudnason, V; Harris, T B; Heiss, G; Hofman, A; Holliday, E G; Huffman, J; Kardia, S L R; Kochan, N; Knopman, D S; Kwok, J B; Lambert, J-C; Lee, T; Li, G; Li, S-C; Loitfelder, M; Lopez, O L; Lundervold, A J; Lundqvist, A; Mather, K A; Mirza, S S; Nyberg, L; Oostra, B A; Palotie, A; Papenberg, G; Pattie, A; Petrovic, K; Polasek, O; Psaty, B M; Redmond, P; Reppermund, S; Rotter, J I; Schmidt, H; Schuur, M; Schofield, P W; Scott, R J; Steen, V M; Stott, D J; van Swieten, J C; Taylor, K D; Trollor, J; Trompet, S; Uitterlinden, A G; Weinstein, G; Widen, E; Windham, B G; Jukema, J W; Wright, A F; Wright, M J; Yang, Q; Amieva, H; Attia, J R; Bennett, D A; Brodaty, H; de Craen, A J M; Hayward, C; Ikram, M A; Lindenberger, U; Nilsson, L-G; Porteous, D J; Räikkönen, K; Reinvang, I; Rudan, I; Sachdev, P S; Schmidt, R; Schofield, P R; Srikanth, V; Starr, J M; Turner, S T; Weir, D R; Wilson, J F; van Duijn, C; Launer, L; Fitzpatrick, A L; Seshadri, S; Mosley, T H; Deary, I J
General cognitive function is substantially heritable across the human life course from adolescence to old age. We investigated the genetic contribution to variation in this important, health- and well-being-related trait in middle-aged and older adults. We conducted a meta-analysis of genome-wide association studies of 31 cohorts (N=53 949) in which the participants had undertaken multiple, diverse cognitive tests. A general cognitive function phenotype was tested for, and created in each cohort by principal component analysis. We report 13 genome-wide significant single-nucleotide polymorphism (SNP) associations in three genomic regions, 6q16.1, 14q12 and 19q13.32 (best SNP and closest gene, respectively: rs10457441, P=3.93 × 10−9, MIR2113; rs17522122, P=2.55 × 10−8, AKAP6; rs10119, P=5.67 × 10−9, APOE/TOMM40). We report one gene-based significant association with the HMGN1 gene located on chromosome 21 (P=1 × 10−6). These genes have previously been associated with neuropsychiatric phenotypes. Meta-analysis results are consistent with a polygenic model of inheritance. To estimate SNP-based heritability, the genome-wide complex trait analysis procedure was applied to two large cohorts, the Atherosclerosis Risk in Communities Study (N=6617) and the Health and Retirement Study (N=5976). The proportion of phenotypic variation accounted for by all genotyped common SNPs was 29% (s.e.=5%) and 28% (s.e.=7%), respectively. Using polygenic prediction analysis, ~1.2% of the variance in general cognitive function was predicted in the Generation Scotland cohort (N=5487; P=1.5 × 10−17). In hypothesis-driven tests, there was significant association between general cognitive function and four genes previously associated with Alzheimer's disease: TOMM40, APOE, ABCG1 and MEF2C. PMID:25644384
Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang
The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM dedicates to collect, integrate and visualize genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13 262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly intuitive web interfaces for data submission, browse, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang
Abstract The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM dedicates to collect, integrate and visualize genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13 262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly intuitive web interfaces for data submission, browse, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. PMID:29069473
Schloissnig, Siegfried; Arumugam, Manimozhiyan; Sunagawa, Shinichi
Whereas large-scale efforts have rapidly advanced the understanding and practical impact of human genomic variation, the practical impact of variation is largely unexplored in the human microbiome. We therefore developed a framework for metagenomic variation analysis and applied it to 252 faecal...... polymorphism rates of 0.11 was more variable between gut microbial species than across human hosts. Subjects sampled at varying time intervals exhibited individuality and temporal stability of SNP variation patterns, despite considerable composition changes of their gut microbiota. This indicates...
Bandrés-Ciga, Sara; Ruz, Clara; Barrero, Francisco J; Escamilla-Sevilla, Francisco; Pelegrina, Javier; Vives, Francisco; Duran, Raquel
Parkinson's disease (PD) is the second most common neurodegenerative disease, whose prevalence is projected to be between 8.7 and 9.3 million by 2030. Until about 20 years ago, PD was considered to be the textbook example of a "non-genetic" disorder. Nowadays, PD is generally considered a multifactorial disorder that arises from the combination and complex interaction of genes and environmental factors. To date, a total of 7 genes including SNCA, LRRK2, PARK2, DJ-1, PINK 1, VPS35 and ATP13A2 have been seen to cause unequivocally Mendelian PD. Also, variants with incomplete penetrance in the genes LRRK2 and GBA are considered to be strong risk factors for PD worldwide. Although genetic studies have provided valuable insights into the pathogenic mechanisms underlying PD, the role of structural variation in PD has been understudied in comparison with other genomic variations. Structural genomic variations might substantially account for such genetic substrates yet to be discovered. The present review aims to provide an overview of the structural genomic variants implicated in the pathogenesis of PD.
Full Text Available Mycobacterium avium subspecies paratuberculosis (M. ap, the causative agent of Johne’s disease (JD, infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium strains isolated from various hosts and environments. Using Next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98% to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77 showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels were not detected, and only unique single nucleotide polymorphisms (SNPs were observed among the 6 M. ap strains. While most of the SNPs (~100 in M. ap genomes were non-synonymous, a total of ~ 6000 SNPs were detected among M. avium genomes, most of them were synonymous suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNPs-based phylo-genomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10 strain while the human isolate (M. ap 4B is closely related to the environmental strains, indicating environmental source to human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.
Acikel, Cengizhan; Aydin Son, Yesim; Celik, Cemil; Gul, Husamettin
Multifactor dimensionality reduction (MDR) is a nonparametric approach that can be used to detect relevant interactions between single-nucleotide polymorphisms (SNPs). The aim of this study was to build the best genomic model based on SNP associations and to identify candidate polymorphisms that are the underlying molecular basis of the bipolar disorders. This study was performed on Whole-Genome Association Study of Bipolar Disorder (dbGaP [database of Genotypes and Phenotypes] study accession number: phs000017.v3.p1) data. After preprocessing of the genotyping data, three classification-based data mining methods (ie, random forest, naïve Bayes, and k-nearest neighbor) were performed. Additionally, as a nonparametric, model-free approach, the MDR method was used to evaluate the SNP profiles. The validity of these methods was evaluated using true classification rate, recall (sensitivity), precision (positive predictive value), and F-measure. Random forests, naïve Bayes, and k-nearest neighbors identified 16, 13, and ten candidate SNPs, respectively. Surprisingly, the top six SNPs were reported by all three methods. Random forests and k-nearest neighbors were more successful than naïve Bayes, with recall values >0.95. On the other hand, MDR generated a model with comparable predictive performance based on five SNPs. Although different SNP profiles were identified in MDR compared to the classification-based models, all models mapped SNPs to the DOCK10 gene. Three classification-based data mining approaches, random forests, naïve Bayes, and k-nearest neighbors, have prioritized similar SNP profiles as predictors of bipolar disorders, in contrast to MDR, which has found different SNPs through analysis of two-way and three-way interactions. The reduced number of associated SNPs discovered by MDR, without loss in the classification performance, would facilitate validation studies and decision support models, and would reduce the cost to develop predictive and
M. Schaap (Michiel); R.J.L.F. Lemmers (Richard); R. Maassen (Roel); P.J. van der Vliet (Patrick); L.F. Hoogerheide (Lennart); H.K. van Dijk (Herman); N. Basturk (Nalan); P. de Knijff (Peter); S.M. van der Maarel (Silvère)
textabstractBackground: Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and
Schaap, M.; Lemmers, R.J.L.F.; Maassen, R.; van der Vliet, P.J.; Hoogerheide, L.F.; van Dijk, H.K.; Basturk, N.; de Knijff, P.; van der Maarel, S.M.
Background: Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and function is largely
The present study compares the morphological, palynologycal and genome size (C-value content) characteristics in the long-styled and short-styled plants in three Linum species, that is, ... The analysis of variance (ANOVA) test performed among the three Linum species showed a significant difference in 2C-value content.
Cirilli, Marco; Flati, Tiziano; Gioiosa, Silvia; Tagliaferri, Ilario; Ciacciulli, Angelo; Gao, Zhongshan; Gattolin, Stefano; Geuna, Filippo; Maggi, Francesco; Bottoni, Paolo; Rossini, Laura; Bassi, Daniele; Castrignanò, Tiziana; Chillemi, Giovanni
Applying next-generation sequencing (NGS) technologies to species of agricultural interest has the potential to accelerate the understanding and exploration of genetic resources. The storage, availability and maintenance of huge quantities of NGS-generated data remains a major challenge. The PeachVar-DB portal, available at http://hpc-bioinformatics.cineca.it/peach, is an open-source catalog of genetic variants present in peach (Prunus persica L. Batsch) and wild-related species of Prunus genera, annotated from 146 samples publicly released on the Sequence Read Archive (SRA). We designed a user-friendly web-based interface of the database, providing search tools to retrieve single nucleotide polymorphism (SNP) and InDel variants, along with useful statistics and information. PeachVar-DB results are linked to the Genome Database for Rosaceae (GDR) and the Phytozome database to allow easy access to other external useful plant-oriented resources. In order to extend the genetic diversity covered by the PeachVar-DB further, and to allow increasingly powerful comparative analysis, we will progressively integrate newly released data. © The Author 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: firstname.lastname@example.org.
Joshua Chang Mell
Full Text Available Many bacteria are able to efficiently bind and take up double-stranded DNA fragments, and the resulting natural transformation shapes bacterial genomes, transmits antibiotic resistance, and allows escape from immune surveillance. The genomes of many competent pathogens show evidence of extensive historical recombination between lineages, but the actual recombination events have not been well characterized. We used DNA from a clinical isolate of Haemophilus influenzae to transform competent cells of a laboratory strain. To identify which of the ~40,000 polymorphic differences had recombined into the genomes of four transformed clones, their genomes and their donor and recipient parents were deep sequenced to high coverage. Each clone was found to contain ~1000 donor polymorphisms in 3-6 contiguous runs (8.1±4.5 kb in length that collectively comprised ~1-3% of each transformed chromosome. Seven donor-specific insertions and deletions were also acquired as parts of larger donor segments, but the presence of other structural variation flanking 12 of 32 recombination breakpoints suggested that these often disrupt the progress of recombination events. This is the first genome-wide analysis of chromosomes directly transformed with DNA from a divergent genotype, connecting experimental studies of transformation with the high levels of natural genetic variation found in isolates of the same species.
Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis
Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...
Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael
With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710
Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D
Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas-70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul
The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data-previously only browseable through our FTP site-by focusing on particular samples, populations or data sets of interest. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Yan, Honghai; Martin, Sara L; Bekele, Wubishet A; Latta, Robert G; Diederichsen, Axel; Peng, Yuanying; Tinker, Nicholas A
Genome size is an indicator of evolutionary distance and a metric for genome characterization. Here, we report accurate estimates of genome size in 99 accessions from 26 species of Avena. We demonstrate that the average genome size of C genome diploid species (2C = 10.26 pg) is 15% larger than that of A genome species (2C = 8.95 pg), and that this difference likely accounts for a progression of size among tetraploid species, where AB genome configuration had similar genome sizes (average 2C = 25.74 pg). Genome size was mostly consistent within species and in general agreement with current information about evolutionary distance among species. Results also suggest that most of the polyploid species in Avena have experienced genome downsizing in relation to their diploid progenitors. Genome size measurements could provide additional quality control for species identification in germplasm collections, especially in cases where diploid and polyploid species have similar morphology.
Full Text Available To address whether there are differences of variation among repeat motif types and among taxonomic groups, we present here an analysis of variation and correlation of dinucleotide microsatellite repeats in eukaryotic genomes. Ten taxonomic groups were compared, those being primates, mammalia (excluding primates and rodentia, rodentia, birds, fish, amphibians and reptiles, insects, molluscs, plants and fungi, respectively. The data used in the analysis is from the literature published in the Journal of Molecular Ecology Notes. Analysis of variation reveals that there are no significant differences between AC and AG repeat motif types. Moreover, the number of alleles correlates positively with the copy number in both AG and AC repeats. Similar conclusions can be obtained from each taxonomic group. These results strongly suggest that the increase of SSR variation is almost linear with the increase of the copy number of each repeat motif. As well, the results suggest that the variability of SSR in the genomes of low-ranking species seem to be more than that of high-ranking species, excluding primates and fungi.
Liu, Fan; Chen, Yan; Zhu, Gu; Hysi, Pirro G; Wu, Sijie; Adhikari, Kaustubh; Breslin, Krystal; Pospiech, Ewelina; Hamer, Merel A; Peng, Fuduan; Muralidharan, Charanya; Acuna-Alonzo, Victor; Canizales-Quinteros, Samuel; Bedoya, Gabriel; Gallo, Carla; Poletti, Giovanni; Rothhammer, Francisco; Bortolini, Maria Catira; Gonzalez-Jose, Rolando; Zeng, Changqing; Xu, Shuhua; Jin, Li; Uitterlinden, André G; Ikram, M Arfan; van Duijn, Cornelia M; Nijsten, Tamar; Walsh, Susan; Branicki, Wojciech; Wang, Sijia; Ruiz-Linares, Andrés; Spector, Timothy D; Martin, Nicholas G; Medland, Sarah E; Kayser, Manfred
Shape variation of human head hair shows striking variation within and between human populations, while its genetic basis is far from being understood. We performed a series of genome-wide association studies (GWASs) and replication studies in a total of 28 964 subjects from 9 cohorts from multiple geographic origins. A meta-analysis of three European GWASs identified 8 novel loci (1p36.23 ERRFI1/SLC45A1, 1p36.22 PEX14, 1p36.13 PADI3, 2p13.3 TGFA, 11p14.1 LGR4, 12q13.13 HOXC13, 17q21.2 KRTAP, and 20q13.33 PTK6), and confirmed 4 previously known ones (1q21.3 TCHH/TCHHL1/LCE3E, 2q35 WNT10A, 4q21.21 FRAS1, and 10p14 LINC00708/GATA3), all showing genome-wide significant association with hair shape (P 5e-8). All except one (1p36.22 PEX14) were replicated with nominal significance in at least one of the 6 additional cohorts of European, Native American and East Asian origins. Three additional previously known genes (EDAR, OFCC1, and PRSS53) were confirmed at the nominal significance level. A multivariable regression model revealed that 14 SNPs from different genes significantly and independently contribute to hair shape variation, reaching a cross-validated AUC value of 0.66 (95% CI: 0.62-0.70) and an AUC value of 0.64 in an independent validation cohort, providing an improved accuracy compared with a previous model. Prediction outcomes of 2504 individuals from a multiethnic sample were largely consistent with general knowledge on the global distribution of hair shape variation. Our study thus delivers target genes and DNA variants for future functional studies to further evaluate the molecular basis of hair shape in humans. © The Author(s) 2017. Published by Oxford University Press.
Liu, Fan; Chen, Yan; Zhu, Gu; Hysi, Pirro G; Wu, Sijie; Adhikari, Kaustubh; Breslin, Krystal; Pośpiech, Ewelina; Hamer, Merel A; Peng, Fuduan; Muralidharan, Charanya; Acuna-Alonzo, Victor; Canizales-Quinteros, Samuel; Bedoya, Gabriel; Gallo, Carla; Poletti, Giovanni; Rothhammer, Francisco; Bortolini, Maria Catira; Gonzalez-Jose, Rolando; Zeng, Changqing; Xu, Shuhua; Jin, Li; Uitterlinden, André G; Ikram, M Arfan; van Duijn, Cornelia M; Nijsten, Tamar; Walsh, Susan; Branicki, Wojciech; Wang, Sijia; Ruiz-Linares, Andrés; Spector, Timothy D; Martin, Nicholas G; Medland, Sarah E; Kayser, Manfred
Abstract Shape variation of human head hair shows striking variation within and between human populations, while its genetic basis is far from being understood. We performed a series of genome-wide association studies (GWASs) and replication studies in a total of 28 964 subjects from 9 cohorts from multiple geographic origins. A meta-analysis of three European GWASs identified 8 novel loci (1p36.23 ERRFI1/SLC45A1, 1p36.22 PEX14, 1p36.13 PADI3, 2p13.3 TGFA, 11p14.1 LGR4, 12q13.13 HOXC13, 17q21.2 KRTAP, and 20q13.33 PTK6), and confirmed 4 previously known ones (1q21.3 TCHH/TCHHL1/LCE3E, 2q35 WNT10A, 4q21.21 FRAS1, and 10p14 LINC00708/GATA3), all showing genome-wide significant association with hair shape (P < 5e-8). All except one (1p36.22 PEX14) were replicated with nominal significance in at least one of the 6 additional cohorts of European, Native American and East Asian origins. Three additional previously known genes (EDAR, OFCC1, and PRSS53) were confirmed at the nominal significance level. A multivariable regression model revealed that 14 SNPs from different genes significantly and independently contribute to hair shape variation, reaching a cross-validated AUC value of 0.66 (95% CI: 0.62–0.70) and an AUC value of 0.64 in an independent validation cohort, providing an improved accuracy compared with a previous model. Prediction outcomes of 2504 individuals from a multiethnic sample were largely consistent with general knowledge on the global distribution of hair shape variation. Our study thus delivers target genes and DNA variants for future functional studies to further evaluate the molecular basis of hair shape in humans. PMID:29220522
Barbara E Stranger
Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis- to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
Full Text Available The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis- to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
Lisa L Ellis
Full Text Available We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions.
Ellis, Lisa L.; Huang, Wen; Quinn, Andrew M.; Ahuja, Astha; Alfrejd, Ben; Gomez, Francisco E.; Hjelmen, Carl E.; Moore, Kristi L.; Mackay, Trudy F. C.; Johnston, J. Spencer; Tarone, Aaron M.
We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions. PMID:25057905
Full Text Available Chemerin is an adipokine proposed to link obesity and chronic inflammation of adipose tissue. Genetic factors determining chemerin release from adipose tissue are yet unknown. We conducted a meta-analysis of genome-wide association studies (GWAS for serum chemerin in three independent cohorts from Europe: Sorbs and KORA from Germany and PPP-Botnia from Finland (total N = 2,791. In addition, we measured mRNA expression of genes within the associated loci in peripheral mononuclear cells by micro-arrays, and within adipose tissue by quantitative RT-PCR and performed mRNA expression quantitative trait and expression-chemerin association studies to functionally substantiate our loci. Heritability estimate of circulating chemerin levels was 16.2% in the Sorbs cohort. Thirty single nucleotide polymorphisms (SNPs at chromosome 7 within the retinoic acid receptor responder 2 (RARRES2/Leucine Rich Repeat Containing (LRRC61 locus reached genome-wide significance (p<5.0×10-8 in the meta-analysis (the strongest evidence for association at rs7806429 with p = 7.8×10-14, beta = -0.067, explained variance 2.0%. All other SNPs within the cluster were in linkage disequilibrium with rs7806429 (minimum r2 = 0.43 in the Sorbs cohort. The results of the subgroup analyses of males and females were consistent with the results found in the total cohort. No significant SNP-sex interaction was observed. rs7806429 was associated with mRNA expression of RARRES2 in visceral adipose tissue in women (p<0.05 after adjusting for age and body mass index. In conclusion, the present meta-GWAS combined with mRNA expression studies highlights the role of genetic variation in the RARRES2 locus in the regulation of circulating chemerin concentrations.
Ghosh, Sharmila; Qu, Zhipeng; Das, Pranab J.; Fang, Erica; Juras, Rytis; Cothran, E. Gus; McDonell, Sue; Kenney, Daniel G.; Lear, Teri L.; Adelson, David L.; Chowdhary, Bhanu P.; Raudsepp, Terje
We constructed a 400K WG tiling oligoarray for the horse and applied it for the discovery of copy number variations (CNVs) in 38 normal horses of 16 diverse breeds, and the Przewalski horse. Probes on the array represented 18,763 autosomal and X-linked genes, and intergenic, sub-telomeric and chrY sequences. We identified 258 CNV regions (CNVRs) across all autosomes, chrX and chrUn, but not in chrY. CNVs comprised 1.3% of the horse genome with chr12 being most enriched. American Miniature horses had the highest and American Quarter Horses the lowest number of CNVs in relation to Thoroughbred reference. The Przewalski horse was similar to native ponies and draft breeds. The majority of CNVRs involved genes, while 20% were located in intergenic regions. Similar to previous studies in horses and other mammals, molecular functions of CNV-associated genes were predominantly in sensory perception, immunity and reproduction. The findings were integrated with previous studies to generate a composite genome-wide dataset of 1476 CNVRs. Of these, 301 CNVRs were shared between studies, while 1174 were novel and require further validation. Integrated data revealed that to date, 41 out of over 400 breeds of the domestic horse have been analyzed for CNVs, of which 11 new breeds were added in this study. Finally, the composite CNV dataset was applied in a pilot study for the discovery of CNVs in 6 horses with XY disorders of sexual development. A homozygous deletion involving AKR1C gene cluster in chr29 in two affected horses was considered possibly causative because of the known role of AKR1C genes in testicular androgen synthesis and sexual development. While the findings improve and integrate the knowledge of CNVs in horses, they also show that for effective discovery of variants of biomedical importance, more breeds and individuals need to be analyzed using comparable methodological approaches. PMID:25340504
Full Text Available We constructed a 400K WG tiling oligoarray for the horse and applied it for the discovery of copy number variations (CNVs in 38 normal horses of 16 diverse breeds, and the Przewalski horse. Probes on the array represented 18,763 autosomal and X-linked genes, and intergenic, sub-telomeric and chrY sequences. We identified 258 CNV regions (CNVRs across all autosomes, chrX and chrUn, but not in chrY. CNVs comprised 1.3% of the horse genome with chr12 being most enriched. American Miniature horses had the highest and American Quarter Horses the lowest number of CNVs in relation to Thoroughbred reference. The Przewalski horse was similar to native ponies and draft breeds. The majority of CNVRs involved genes, while 20% were located in intergenic regions. Similar to previous studies in horses and other mammals, molecular functions of CNV-associated genes were predominantly in sensory perception, immunity and reproduction. The findings were integrated with previous studies to generate a composite genome-wide dataset of 1476 CNVRs. Of these, 301 CNVRs were shared between studies, while 1174 were novel and require further validation. Integrated data revealed that to date, 41 out of over 400 breeds of the domestic horse have been analyzed for CNVs, of which 11 new breeds were added in this study. Finally, the composite CNV dataset was applied in a pilot study for the discovery of CNVs in 6 horses with XY disorders of sexual development. A homozygous deletion involving AKR1C gene cluster in chr29 in two affected horses was considered possibly causative because of the known role of AKR1C genes in testicular androgen synthesis and sexual development. While the findings improve and integrate the knowledge of CNVs in horses, they also show that for effective discovery of variants of biomedical importance, more breeds and individuals need to be analyzed using comparable methodological approaches.
Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.
Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…
Chung, Ren-Hua; Chiu, Yen-Feng; Hung, Yi-Jen; Lee, Wen-Jane; Wu, Kwan-Dun; Chen, Hui-Ling; Lin, Ming-Wei; Chen, Yii-Der I; Quertermous, Thomas; Hsiung, Chao A
Fasting glucose and fasting insulin are glycemic traits closely related to diabetes, and understanding the role of genetic factors in these traits can help reveal the etiology of type 2 diabetes. Although single nucleotide polymorphisms (SNPs) in several candidate genes have been found to be associated with fasting glucose and fasting insulin, copy number variations (CNVs), which have been reported to be associated with several complex traits, have not been reported for association with these two traits. We aimed to identify CNVs associated with fasting glucose and fasting insulin. We conducted a genome-wide CNV association analysis for fasting plasma glucose (FPG) and fasting plasma insulin (FPI) using a family-based genome-wide association study sample from a Han Chinese population in Taiwan. A family-based CNV association test was developed in this study to identify common CNVs (i.e., CNVs with frequencies ≥ 5%), and a generalized estimating equation approach was used to test the associations between the traits and counts of global rare CNVs (i.e., CNVs with frequencies <5%). We found a significant genome-wide association for common deletions with a frequency of 5.2% in the Scm-like with four mbt domains 1 (SFMBT1) gene with FPG (association p-value = 2×10 -4 and an adjusted p-value = 0.0478 for multiple testing). No significant association was observed between global rare CNVs and FPG or FPI. The deletions in 20 individuals with DNA samples available were successfully validated using PCR-based amplification. The association of the deletions in SFMBT1 with FPG was further evaluated using an independent population-based replication sample obtained from the Taiwan Biobank. An association p-value of 0.065, which was close to the significance level of 0.05, for FPG was obtained by testing 9 individuals with CNVs in the SFMBT1 gene region and 11,692 individuals with normal copies in the replication cohort. Previous studies have found that SNPs in SFMBT1 are
Sun, Yaqi; Wang, Hongyang; Wang, Chao; Yu, Shaobo; Liu, Jing; Zhang, Yu; Fan, Bin; Li, Kui; Liu, Bang
Copy number variations (CNVs) represent a substantial source of structural variants in mammals and contribute to both normal phenotypic variability and disease susceptibility. Although low-resolution CNV maps are produced in many domestic animals, and several reports have been published about the CNVs of porcine genome, the differences between Chinese and western pigs still remain to be elucidated. In this study, we used Porcine SNP60 BeadChip and PennCNV algorithm to perform a genome-wide CNV detection in 302 individuals from six Chinese indigenous breeds (Tongcheng, Laiwu, Luchuan, Bama, Wuzhishan and Ningxiang pigs), three western breeds (Yorkshire, Landrace and Duroc) and one hybrid (Tongcheng×Duroc). A total of 348 CNV Regions (CNVRs) across genome were identified, covering 150.49 Mb of the pig genome or 6.14% of the autosomal genome sequence. In these CNVRs, 213 CNVRs were found to exist only in the six Chinese indigenous breeds, and 60 CNVRs only in the three western breeds. The characters of CNVs in four Chinese normal size breeds (Luchuan, Tongcheng and Laiwu pigs) and two minipig breeds (Bama and Wuzhishan pigs) were also analyzed in this study. Functional annotation suggested that these CNVRs possess a great variety of molecular function and may play important roles in phenotypic and production traits between Chinese and western breeds. Our results are important complementary to the CNV map in pig genome, which provide new information about the diversity of Chinese and western pig breeds, and facilitate further research on porcine genome CNVs. PMID:25198154
Zhan, Bujie; Fadista, João; Thomsen, Bo
Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome...... of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation...
Matschegewski, Claudia; Zetzsche, Holger; Hasan, Yaser; Leibeguth, Lena; Briggs, William; Ordon, Frank; Uptmoor, Ralf
Cauliflower (Brassica oleracea var. botrytis) is a vernalization-responsive crop. High ambient temperatures delay harvest time. The elucidation of the genetic regulation of floral transition is highly interesting for a precise harvest scheduling and to ensure stable market supply. This study aims at genetic dissection of temperature-dependent curd induction in cauliflower by genome-wide association studies and gene expression analysis. To assess temperature-dependent curd induction, two greenhouse trials under distinct temperature regimes were conducted on a diversity panel consisting of 111 cauliflower commercial parent lines, genotyped with 14,385 SNPs. Broad phenotypic variation and high heritability (0.93) were observed for temperature-related curd induction within the cauliflower population. GWA mapping identified a total of 18 QTL localized on chromosomes O1, O2, O3, O4, O6, O8, and O9 for curding time under two distinct temperature regimes. Among those, several QTL are localized within regions of promising candidate flowering genes. Inferring population structure and genetic relatedness among the diversity set assigned three main genetic clusters. Linkage disequilibrium (LD) patterns estimated global LD extent of r2 = 0.06 and a maximum physical distance of 400 kb for genetic linkage. Transcriptional profiling of flowering genes FLOWERING LOCUS C (BoFLC) and VERNALIZATION 2 (BoVRN2) was performed, showing increased expression levels of BoVRN2 in genotypes with faster curding. However, functional relevance of BoVRN2 and BoFLC2 could not consistently be supported, which probably suggests to act facultative and/or might evidence for BoVRN2/BoFLC-independent mechanisms in temperature-regulated floral transition in cauliflower. Genetic insights in temperature-regulated curd induction can underpin genetically informed phenology models and benefit molecular breeding strategies toward the development of thermo-tolerant cultivars. PMID:26442034
Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover
Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...
Full Text Available Besides single-nucleotide variants in the human genome, large-scale genomic variants, such as copy number variations (CNVs, are being increasingly discovered as a genetic source of human diversity and the pathogenic factors of diseases. Recent experimental findings have shed light on the links between different genome architectures and CNV mutagenesis. In this review, we summarize various genomic features and discuss their contributions to CNV formation. Genomic repeats, including both low-copy and high-copy repeats, play important roles in CNV instability, which was initially known as DNA recombination events. Furthermore, it has been found that human genomic repeats can also induce DNA replication errors and consequently result in CNV mutations. Some recent studies showed that DNA replication timing, which reflects the high-order information of genomic organization, is involved in human CNV mutations. Our review highlights that genome architecture, from DNA sequence to high-order genomic organization, is an important molecular factor in CNV mutagenesis and human genomic instability.
Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching
The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314
National Oceanic and Atmospheric Administration, Department of Commerce — Conduct analyses of epigenetic and genomic variation in Chinook salmon and steelhead to determine influence on phenotypic expression of life history traits. Genetic,...
Full Text Available To gain insight into the patterns of genetic variation and evolutionary relationships within and between bonobos and chimpanzees, we sequenced 150,000 base pairs of nuclear DNA divided among 15 autosomal regions as well as the complete mitochondrial genomes from 20 bonobos and 58 chimpanzees. Except for western chimpanzees, we found poor genetic separation of chimpanzees based on sample locality. In contrast, bonobos consistently cluster together but fall as a group within the variation of chimpanzees for many of the regions. Thus, while chimpanzees retain genomic variation that predates bonobo-chimpanzee speciation, extensive lineage sorting has occurred within bonobos such that much of their genome traces its ancestry back to a single common ancestor that postdates their origin as a group separate from chimpanzees.
Mills, Ryan E.; Walter, Klaudia; Stewart, Chip
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is......, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications...
Astolfi, P A; Salamini, F; Sgaramella, V
Theoretical and experimental evidences support the hypothesis that the genomes and the epigenomes may be different in the somatic cells of complex organisms. In the genome, the differences range from single base substitutions to chromosome number; in the epigenome, they entail multiple postsynthetic modifications of the chromatin. Somatic genome variations (SGV) may accumulate during development in response both to genetic programs, which may differ from tissue to tissue, and to environmental stimuli, which are often undetected and generally irreproducible. SGV may jeopardize physiological cellular functions, but also create novel coding and regulatory sequences, to be exposed to intraorganismal Darwinian selection. Genomes acknowledged as comparatively poor in genes, such as humans', could thus increase their pristine informational endowment. A better understanding of SGV will contribute to basic issues such as the "nature vs nurture" dualism and the inheritance of acquired characters. On the applied side, they may explain the low yield of cloning via somatic cell nuclear transfer, provide clues to some of the problems associated with transdifferentiation, and interfere with individual DNA analysis. SGV may be unique in the different cells types and in the different developmental stages, and thus explain the several hundred gaps persisting in the human genomes "completed" so far. They may compound the variations associated to our epigenomes and make of each of us an "(epi)genomic" mosaic. An ensuing paradigm is the possibility that a single genome (the ephemeral one assembled at fertilization) has the capacity to generate several different brains in response to different environments.
Zhang, Tongwu; Hu, Songnian; Zhang, Guangyu; Pan, Linlin; Zhang, Xiaowei; Al-Mssallem, Ibrahim S.; Yu, Jun
Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNP (single nucleotide polymorphism) is present between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV) in the rice cp genomes. There are two and four RCVs identified in Hassawi-1 when compared to 93–11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, with the longest length of 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicate ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta since genetic diversity between the two Hassawi cultivars is very low albeit an unknown historic origin of the wild-type Hassawi rice. PMID:22870184
G. Davies (Gail); N.J. Armstrong (Nicola J.); J.C. Bis (Joshua); J. Bressler (Jan); V. Chouraki (Vincent); S. Giddaluru (Sudheer); E. Hofer; C.A. Ibrahim-Verbaas (Carla); M. Kirin (Mirna); J. Lahti; S.J. van der Lee (Sven); S. Le Hellard (Stephanie); T. Liu; R.E. Marioni (Riccardo); C. Oldmeadow (Christopher); D. Postmus (Douwe); G.D. Smith; J.A. Smith (Jennifer A); A. Thalamuthu (Anbupalam); R. Thomson (Russell); V. Vitart (Veronique); J. Wang; L. Yu; L. Zgaga (Lina); W. Zhao (Wei); R. Boxall (Ruth); S.E. Harris (Sarah); W.D. Hill (W. David); D.C. Liewald (David C.); M. Luciano (Michelle); H.H.H. Adams (Hieab); D. Ames (David); N. Amin (Najaf); P. Amouyel (Philippe); A.A. Assareh; R. Au; J.T. Becker (James); A. Beiser; C. Berr (Claudine); L. Bertram (Lars); E.A. Boerwinkle (Eric); B.M. Buckley (Brendan M.); H. Campbell (Harry); J. Corley; P.L. De Jager; C. Dufouil (Carole); J.G. Eriksson (Johan G.); T. Espeseth (Thomas); J.D. Faul; I. Ford; G. Scotland (Generation); R.F. Gottesman (Rebecca); M.D. Griswold (Michael); V. Gudnason (Vilmundur); T.B. Harris; G. Heiss (Gerardo); A. Hofman (Albert); E.G. Holliday (Elizabeth); J.E. Huffman (Jennifer); S.L.R. Kardia (Sharon); N.A. Kochan (Nicole A.); D.S. Knopman (David); J.B. Kwok; J.-C. Lambert; T. Lee; G. Li; S.-C. Li; M. Loitfelder (Marisa); O.L. Lopez (Oscar); A.J. Lundervold; A. Lundqvist; R. Mather; S.S. Mirza (Saira); L. Nyberg; B.A. Oostra (Ben); A. Palotie (Aarno); G. Papenberg; A. Pattie (Alison); K. Petrovic (Katja); O. Polasek (Ozren); B.M. Psaty (Bruce); P. Redmond (Paul); S. Reppermund; J.I. Rotter; R. Schmidt (Reinhold); M. Schuur (Maaike); P.W. Schofield; R.J. Scott; V.M. Steen (Vidar); D.J. Stott (David J.); J.C. van Swieten (John); K.D. Taylor (Kent); J. Trollor; S. Trompet (Stella); A.G. Uitterlinden (André); G. Weinstein; E. Widen (Elisabeth); B.G. Windham (B Gwen); J.W. Jukema (Jan Wouter); A. Wright (Alan); M.J. Wright (Margaret); Q. Yang (Qiong Fang); H. Amieva (Hélène); J. Attia (John); D.A. Bennett (David); H. Brodaty (Henry); A.J. de Craen (Anton); C. Hayward; M.A. Ikram (Arfan); U. Lindenberger; L.-G. Nilsson; D.J. Porteous (David J.); K. Räikkönen (Katri); I. Reinvang (Ivar); I. Rudan (Igor); P.S. Sachdev (Perminder); R. Schmidt; P. Schofield (Peter); V. Srikanth; J.M. Starr (John); S.T. Turner (Stephen); D.R. Weir (David R.); J.F. Wilson (James F); C.M. van Duijn (Cornelia); L.J. Launer (Lenore); A.L. Fitzpatrick (Annette); S. Seshadri (Sudha); T.H. Mosley (Thomas H.); I.J. Deary (Ian J.)
textabstractGeneral cognitive function is substantially heritable across the human life course from adolescence to old age. We investigated the genetic contribution to variation in this important, health- and well-being-related trait in middle-aged and older adults. We conducted a meta-analysis of
Imamura, Daisuke; Morita, Masatomo; Sekizuka, Tsuyoshi; Mizuno, Tamaki; Takemura, Taichiro; Yamashiro, Tetsu; Chowdhury, Goutam; Pazhani, Gururaja P; Mukhopadhyay, Asish K; Ramamurthy, Thandavarayan; Miyoshi, Shin-Ichi; Kuroda, Makoto; Shinoda, Sumio; Ohnishi, Makoto
Cholera is an acute diarrheal disease and a major public health problem in many developing countries in Asia, Africa, and Latin America. Since the Bay of Bengal is considered the epicenter for the seventh cholera pandemic, it is important to understand the genetic dynamism of Vibrio cholerae from Kolkata, as a representative of the Bengal region. We analyzed whole genome sequence data of V. cholerae O1 isolated from cholera patients in Kolkata, India, from 2007 to 2014 and identified the heterogeneous genomic region in these strains. In addition, we carried out a phylogenetic analysis based on the whole genome single nucleotide polymorphisms to determine the genetic lineage of strains in Kolkata. This analysis revealed the heterogeneity of the Vibrio seventh pandemic island (VSP)-II in Kolkata strains. The ctxB genotype was also heterogeneous and was highly related to VSP-II types. In addition, phylogenetic analysis revealed the shifts in predominant strains in Kolkata. Two distinct lineages, 1 and 2, were found between 2007 and 2010. However, the proportion changed markedly in 2010 and lineage 2 strains were predominant thereafter. Lineage 2 can be divided into four sublineages, I, II, III and IV. The results of this study indicate that lineages 1 and 2-I were concurrently prevalent between 2007 and 2009, and lineage 2-III observed in 2010, followed by the predominance of lineage 2-IV in 2011 and continued until 2014. Our findings demonstrate that the epidemic of cholera in Kolkata was caused by several distinct strains that have been constantly changing within the genetic lineages of V. cholerae O1 in recent years.
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Reddy, Kondreddy Eswar; Thu, Ha Thi; Yoo, Mi Sun; Ramya, Mummadireddy; Reddy, Bheemireddy Anjana; Lien, Nguyen Thi Kim; Trang, Nguyen Thi Phuong; Duong, Bui Thi Thuy; Lee, Hyun-Jeong; Kang, Seung-Won; Quyen, Dong Van
Sacbrood virus (SBV) is one of the most common viral infections of honeybees. The entire genome sequence for nine SBV infecting honeybees, Apis cerana and Apis mellifera, in Vietnam, namely AcSBV-Viet1, AcSBV-Viet2, AcSBV-Viet3, AmSBV-Viet4, AcSBV-Viet5, AmSBV-Viet6, AcSBV-Viet7, AcSBV-Viet8, and AcSBV-Viet9, was determined. These sequences were aligned with seven previously reported complete genome sequences of SBV from other countries, and various genomic regions were compared. The Vietnamese SBVs (VN-SBVs) shared 91-99% identity with each other, and shared 89-94% identity with strains from other countries. The open reading frames (ORFs) of the VN-SBV genomes differed greatly from those of SBVs from other countries, especially in their VP1 sequences. The AmSBV-Viet6 and AcSBV-Viet9 genome encodes 17 more amino acids within this region than the other VN-SBVs. In a phylogenetic analysis, the strains AmSBV-Viet4, AcSBV-Viet2, and AcSBV-Viet3 were clustered in group with AmSBV-UK, AmSBV-Kor21, and AmSBV-Kor19 strains. Whereas, the strains AmSBV-Viet6 and AcSBV-Viet7 clustered separately with the AcSBV strains from Korea and AcSBV-VietSBM2. And the strains AcSBV-Viet8, AcSBV-Viet1, AcSBV-Viet5, and AcSBV-Viet9 clustered with the AcSBV-India, AcSBV-Kor and AcSBV-VietSBM2. In a Simplot graph, the VN-SBVs diverged stronger in their ORF regions than in their 5' or 3' untranslated regions. The VN-SBVs possess genetic characteristics which are more similar to the Asian AcSBV strains than to AmSBV-UK strain. Taken together, our data indicate that host specificity, geographic distance, and viral cross-infections between different bee species may explain the genetic diversity among the VN-SBVs in A. cerana and A. mellifera and other SBV strains. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP represents dense genotypes from 1,481 and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054
Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J
The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
Full Text Available The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57% of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%. Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
Stewart Lindsay B
Full Text Available Abstract Background Gene copy number variation (CNV is responsible for several important phenotypes of the malaria parasite Plasmodium falciparum, including drug resistance, loss of infected erythrocyte cytoadherence and alteration of receptor usage for erythrocyte invasion. Despite the known effects of CNV, little is known about its extent throughout the genome. Results We performed a whole-genome survey of CNV genes in P. falciparum using comparative genome hybridisation of a diverse set of 16 laboratory culture-adapted isolates to a custom designed high density Affymetrix GeneChip array. Overall, 186 genes showed hybridisation signals consistent with deletion or amplification in one or more isolate. There is a strong association of CNV with gene length, genomic location, and low orthology to genes in other Plasmodium species. Sub-telomeric regions of all chromosomes are strongly associated with CNV genes independent from members of previously described multigene families. However, ~40% of CNV genes were located in more central regions of the chromosomes. Among the previously undescribed CNV genes, several that are of potential phenotypic relevance are identified. Conclusion CNV represents a major form of genetic variation within the P. falciparum genome; the distribution of gene features indicates the involvement of highly non-random mutational and selective processes. Additional studies should be directed at examining CNV in natural parasite populations to extend conclusions to clinical settings.
Lee, Soo-Rang; Jo, Yeong-Seok; Park, Chan-Ho; Friedman, Jonathan M.; Olson, Matthew S.
Understanding the complex influences of landscape and anthropogenic elements that shape the population genetic structure of invasive species provides insight into patterns of colonization and spread. The application of landscape genomics techniques to these questions may offer detailed, previously undocumented insights into factors influencing species invasions. We investigated the spatial pattern of genetic variation and the influences of landscape factors on population similarity in an invasive riparian shrub, saltcedar (Tamarix L.) by analysing 1,997 genomewide SNP markers for 259 individuals from 25 populations collected throughout the southwestern United States. Our results revealed a broad-scale spatial genetic differentiation of saltcedar populations between the Colorado and Rio Grande river basins and identified potential barriers to population similarity along both river systems. River pathways most strongly contributed to population similarity. In contrast, low temperature and dams likely served as barriers to population similarity. We hypothesize that large-scale geographic patterns in genetic diversity resulted from a combination of early introductions from distinct populations, the subsequent influence of natural selection, dispersal barriers and founder effects during range expansion.
Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus
Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
This thesis described a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene trasnfer, even to the extent that we can use the gene content to make genome phylogenies.
Gion, Jean-Marc; Hudson, Corey J; Lesur, Isabelle; Vaillancourt, René E; Potts, Brad M; Freeman, Jules S
Meiotic recombination is a fundamental evolutionary process. It not only generates diversity, but influences the efficacy of natural selection and genome evolution. There can be significant heterogeneity in recombination rates within and between species, however this variation is not well understood outside of a few model taxa, particularly in forest trees. Eucalypts are forest trees of global economic importance, and dominate many Australian ecosystems. We studied recombination rate in Eucalyptus globulus using genetic linkage maps constructed in 10 unrelated individuals, and markers anchored to the Eucalyptus reference genome. This experimental design provided the replication to study whether recombination rate varied between individuals and chromosomes, and allowed us to study the genomic attributes and population genetic parameters correlated with this variation. Recombination rate varied significantly between individuals (range = 2.71 to 3.51 centimorgans/megabase [cM/Mb]), but was not significantly influenced by sex or cross type (F1 vs. F2). Significant differences in recombination rate between chromosomes were also evident (range = 1.98 to 3.81 cM/Mb), beyond those which were due to variation in chromosome size. Variation in chromosomal recombination rate was significantly correlated with gene density (r = 0.94), GC content (r = 0.90), and the number of tandem duplicated genes (r = -0.72) per chromosome. Notably, chromosome level recombination rate was also negatively correlated with the average genetic diversity across six species from an independent set of samples (r = -0.75). The correlations with genomic attributes are consistent with findings in other taxa, however, the direction of the correlation between diversity and recombination rate is opposite to that commonly observed. We argue this is likely to reflect the interaction of selection and specific genome architecture of Eucalyptus. Interestingly, the differences amongst
Full Text Available Drosophila melanogaster Meigen, 1830 has served as a model insect for over a century. Sequencing of the 11 additional Drosophila Fallen, 1823 species marks substantial progress in comparative genomics of this genus. By comparison, practically nothing is known about the genome size or genome sequences of parasitic wasps of Drosophila. Here, we present the first comparative analysis of genome size and karyotype structures of Drosophila parasitoids of the Leptopilina Förster, 1869 and Ganaspis Förster, 1869 species. The gametic genome size of Ganaspis xanthopoda (Ashmead, 1896 is larger than those of the three Leptopilina species studied. The genome sizes of all parasitic wasps studied here are also larger than those known for all Drosophila species. Surprisingly, genome sizes of these Drosophila parasitoids exceed the average value known for all previously studied Hymenoptera. The haploid chromosome number of both Leptopilina heterotoma (Thomson, 1862 and L. victoriae Nordlander, 1980 is ten. A chromosomal fusion appears to have produced a distinct karyotype for L. boulardi (Barbotin, Carton et Keiner-Pillault, 1979 (n = 9, whose genome size is smaller than that of wasps of the L. heterotoma clade. Like L. boulardi, the haploid chromosome number for G. xanthopoda is also nine. Our studies reveal a positive, but non linear, correlation between the genome size and total chromosome length in Drosophila parasitoids. These Drosophila parasitoids differ widely in their host range, and utilize different infection strategies to overcome host defense. Their comparative genomics, in relation to their exceptionally well-characterized hosts, will prove to be valuable for understanding the molecular basis of the host-parasite arms race and how such mechanisms shape the genetic structures of insect communities.
Ukkola-Vuoti, Liisa; Kanduri, Chakravarthi; Oikkonen, Jaana; Buck, Gemma; Blancher, Christine; Raijas, Pirre; Karma, Kai; Lähdesmäki, Harri; Järvelä, Irma
Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome wide copy number variations (CNVs) in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the scores of three music tests (COMB scores): auditory structuring ability, Seashores test for pitch and for time. Data on creativity in music (herein composing, improvising and/or arranging music) was surveyed using a web-based questionnaire.Several CNVRs containing genes that affect neurodevelopment, learning and memory were detected. A deletion at 5q31.1 covering the protocadherin-α gene cluster (Pcdha 1-9) was found co-segregating with low music test scores (COMB) in both sample sets. Pcdha is involved in neural migration, differentiation and synaptogenesis. Creativity in music was found to co-segregate with a duplication covering glucose mutarotase gene (GALM) at 2p22. GALM has influence on serotonin release and membrane trafficking of the human serotonin transporter. Interestingly, genes related to serotonergic systems have been shown to associate not only with psychiatric disorders but also with creativity and music perception. Both, Pcdha and GALM, are related to the serotonergic systems influencing cognitive and motor functions, important for music perception and practice. Finally, a 1.3 Mb duplication was identified in a subject with low COMB scores in the region previously linked with absolute pitch (AP) at 8q24. No differences in the CNV burden was detected among the high/low music test scores or creative/non-creative groups. In summary, CNVs and genes found in this study are related to cognitive functions. Our result suggests new candidate genes for music perception
Full Text Available Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome wide copy number variations (CNVs in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the scores of three music tests (COMB scores: auditory structuring ability, Seashores test for pitch and for time. Data on creativity in music (herein composing, improvising and/or arranging music was surveyed using a web-based questionnaire.Several CNVRs containing genes that affect neurodevelopment, learning and memory were detected. A deletion at 5q31.1 covering the protocadherin-α gene cluster (Pcdha 1-9 was found co-segregating with low music test scores (COMB in both sample sets. Pcdha is involved in neural migration, differentiation and synaptogenesis. Creativity in music was found to co-segregate with a duplication covering glucose mutarotase gene (GALM at 2p22. GALM has influence on serotonin release and membrane trafficking of the human serotonin transporter. Interestingly, genes related to serotonergic systems have been shown to associate not only with psychiatric disorders but also with creativity and music perception. Both, Pcdha and GALM, are related to the serotonergic systems influencing cognitive and motor functions, important for music perception and practice. Finally, a 1.3 Mb duplication was identified in a subject with low COMB scores in the region previously linked with absolute pitch (AP at 8q24. No differences in the CNV burden was detected among the high/low music test scores or creative/non-creative groups. In summary, CNVs and genes found in this study are related to cognitive functions. Our result suggests new candidate genes for
Oikkonen, Jaana; Buck, Gemma; Blancher, Christine; Raijas, Pirre; Karma, Kai; Lähdesmäki, Harri; Järvelä, Irma
Music perception and practice represent complex cognitive functions of the human brain. Recently, evidence for the molecular genetic background of music related phenotypes has been obtained. In order to further elucidate the molecular background of musical phenotypes we analyzed genome wide copy number variations (CNVs) in five extended pedigrees and in 172 unrelated subjects characterized for musical aptitude and creative functions in music. Musical aptitude was defined by combination of the scores of three music tests (COMB scores): auditory structuring ability, Seashores test for pitch and for time. Data on creativity in music (herein composing, improvising and/or arranging music) was surveyed using a web-based questionnaire. Several CNVRs containing genes that affect neurodevelopment, learning and memory were detected. A deletion at 5q31.1 covering the protocadherin-α gene cluster (Pcdha 1-9) was found co-segregating with low music test scores (COMB) in both sample sets. Pcdha is involved in neural migration, differentiation and synaptogenesis. Creativity in music was found to co-segregate with a duplication covering glucose mutarotase gene (GALM) at 2p22. GALM has influence on serotonin release and membrane trafficking of the human serotonin transporter. Interestingly, genes related to serotonergic systems have been shown to associate not only with psychiatric disorders but also with creativity and music perception. Both, Pcdha and GALM, are related to the serotonergic systems influencing cognitive and motor functions, important for music perception and practice. Finally, a 1.3 Mb duplication was identified in a subject with low COMB scores in the region previously linked with absolute pitch (AP) at 8q24. No differences in the CNV burden was detected among the high/low music test scores or creative/non-creative groups. In summary, CNVs and genes found in this study are related to cognitive functions. Our result suggests new candidate genes for music
Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo
We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.
Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning
Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.
Full Text Available Schizophrenia is a devastating neuropsychiatric disorder affecting approximately 1% of the global population, and the disease has imposed a considerable burden on families and society. Although, the exact cause of schizophrenia remains unknown, several lines of scientific evidence have revealed that genetic variants are strongly correlated with the development and early onset of the disease. In fact, the heritability among patients suffering from schizophrenia is as high as 80%. Genomic copy number variations (CNVs are one of the main forms of genomic variations, ubiquitously occurring in the human genome. An increasing number of studies have shown that CNVs account for population diversity and genetically related diseases, including schizophrenia. The last decade has witnessed rapid advances in the development of novel genomic technologies, which have led to the identification of schizophrenia-associated CNVs, insight into the roles of the affected genes in their intervals in schizophrenia, and successful manipulation of the target CNVs. In this review, we focus on the recent discoveries of important CNVs that are associated with schizophrenia and outline the potential values that the study of CNVs will bring to the areas of schizophrenia research, diagnosis, and therapy. Furthermore, with the help of the novel genetic tool known as the Clustered Regularly Interspaced Short Palindromic Repeats-associated nuclease 9 (CRISPR/Cas9 system, the pathogenic CNVs as genomic defects could be corrected. In conclusion, the recent novel findings of schizophrenia-associated CNVs offer an exciting opportunity for schizophrenia research to decipher the pathological mechanisms underlying the onset and development of schizophrenia as well as to provide potential clinical applications in genetic counseling, diagnosis, and therapy for this complex mental disease.
Langley, Charles H.; Stevens, Kristian; Cardeno, Charis; Lee, Yuh Chwen G.; Schrider, Daniel R.; Pool, John E.; Langley, Sasha A.; Suarez, Charlyn; Corbett-Detig, Russell B.; Kolaczkowski, Bryan; Fang, Shu; Nista, Phillip M.; Holloway, Alisha K.; Kern, Andrew D.; Dewey, Colin N.; Song, Yun S.; Hahn, Matthew W.; Begun, David J.
This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5′- and 3′-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species. PMID:22673804
Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao
Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied with the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here we explored within-host genetic diversity of HPV by performing deep sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52 and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC), and were deep-sequenced. After constructing a reference vial genome sequence for each specimen, nucleotide positions showing changes with > 0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with varying numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the tri-nucleotides context encompassing substituted bases revealed that Tp C pN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. Here, by conducting deep sequencing analyses, we show for the first time a comprehensive snapshot of the "within
Full Text Available Abstract Background Eimeria is a genus of parasites in the same phylum (Apicomplexa as human parasites such as Toxoplasma, Cryptosporidium and the malaria parasite Plasmodium. As an apicomplexan whose life-cycle involves a single host, Eimeria is a convenient model for understanding this group of organisms. Although the genomes of the Apicomplexa are diverse, that of Eimeria is unique in being composed of large alternating blocks of sequence with very different characteristics - an arrangement seen in no other organism. This arrangement has impeded efforts to fully sequence the genome of Eimeria, which remains the last of the major apicomplexans to be fully analyzed. In order to increase the value of the genome sequence data and aid in the effort to gain a better understanding of the Eimeria tenella genome, we constructed a whole genome map for the parasite. Results A total of 1245 contigs representing 70.0% of the whole genome assembly sequences (Wellcome Trust Sanger Institute were selected and subjected to marker selection. Subsequently, 2482 HAPPY markers were developed and typed. Of these, 795 were considered as usable markers, and utilized in the construction of a HAPPY map. Markers developed from chromosomally-assigned genes were then integrated into the HAPPY map and this aided the assignment of a number of linkage groups to their respective chromosomes. BAC-end sequences and contigs from whole genome sequencing were also integrated to improve and validate the HAPPY map. This resulted in an integrated HAPPY map consisting of 60 linkage groups that covers approximately half of the estimated 60 Mb genome. Further analysis suggests that the segmental organization first seen in Chromosome 1 is present throughout the genome, with repeat-poor (P regions alternating with repeat-rich (R regions. Evidence of copy-number variation between strains was also uncovered. Conclusions This paper describes the application of a whole genome mapping
Gary J. Olsen
Nesbo, Boucher and Doolittle (2001) used phylogenetic trees of four taxa to assess whether euryarchaeal genes share a common history. They have suggested that of the 521 genes examined, each of the three possible tree topologies relating the four taxa was supported essentially equal numbers of times. They suggest that this might be the result of numerous horizontal gene transfer events, essentially randomizing the relationships between gene histories (as inferred in the 521 gene trees) and organismal relationships (which would be a single underlying tree). Motivated by the fact that the order in which sequences are added to a multiple sequence alignment influences the alignment, and ultimately inferred tree, they were interested in the extent to which the variations among inferred trees might be due to variations in the alignment order. This bears directly on their efforts to evaluate and improve upon methods of multiple sequence alignment. They set out to analyze the influence of alignment order on the tree inferred for 43 genes shared among these same 4 taxa. Because alignments produced by CLUSTALW are directed by a rooted guide tree (the denderogram), there are 15 possible alignment orders of 4 taxa. For each gene they tested all 15 alignment orders, and as a 16th option, allowed CLUSTALW to generate its own guide tree. If we supply all 15 possible rooted guide trees, they expected that at least one of them should be as good at CLUSTAL's own guide tree, but most of the time they differed (sometimes being better than CLUSTAL's default tree and sometimes being worse). The difference seems to be that the user-supplied tree is not given meaningful branch lengths, which effect the assumed probability of amino acid changes. They examined the practicality of modifying CLUSTALW to improve its treatment of user-supplied guide trees. This work became ever increasing bogged down in finding and repairing minor bugs in the CLUSTALW code. This effort was put on hold
Full Text Available Abstract Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker. Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp, regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to description of embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in a beta testing with various high-volume published datasets, as well as positive
Checcucci, Alice; Mengoni, Alessio
Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that allows to provide information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE)-Joint Genome Institute (JGI). IMG platform contains both draft and complete genomes, sequenced by Joint Genome Institute and other public and available genomes. Genomes of strains belonging to Archaea, Bacteria, and Eukarya domains are present as well as those of viruses and plasmids. Here, we provide some essential features of IMG system and case study for pangenome analysis.
Joseph M Gonzales
Full Text Available The determinants of transcriptional regulation in malaria parasites remain elusive. The presence of a well-characterized gene expression cascade shared by different Plasmodium falciparum strains could imply that transcriptional regulation and its natural variation do not contribute significantly to the evolution of parasite drug resistance. To clarify the role of transcriptional variation as a source of stain-specific diversity in the most deadly malaria species and to find genetic loci that dictate variations in gene expression, we examined genome-wide expression level polymorphisms (ELPs in a genetic cross between phenotypically distinct parasite clones. Significant variation in gene expression is observed through direct co-hybridizations of RNA from different P. falciparum clones. Nearly 18% of genes were regulated by a significant expression quantitative trait locus. The genetic determinants of most of these ELPs resided in hotspots that are physically distant from their targets. The most prominent regulatory locus, influencing 269 transcripts, coincided with a Chromosome 5 amplification event carrying the drug resistance gene, pfmdr1, and 13 other genes. Drug selection pressure in the Dd2 parental clone lineage led not only to a copy number change in the pfmdr1 gene but also to an increased copy number of putative neighboring regulatory factors that, in turn, broadly influence the transcriptional network. Previously unrecognized transcriptional variation, controlled by polymorphic regulatory genes and possibly master regulators within large copy number variants, contributes to sweeping phenotypic evolution in drug-resistant malaria parasites.
Li, Yingrui; Zheng, Hancheng; Luo, Ruibang
Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...
This textbook provides an authoritative introduction to both classical and coalescent approaches to population genetics. Written for graduate students and advanced undergraduates by one of the world’s leading authorities in the field, the book focuses on the theoretical background of population...... genetics, while emphasizing the close interplay between theory and empiricism. Traditional topics such as genetic and phenotypic variation, mutation, migration, and linkage are covered and advanced by contemporary coalescent theory, which describes the genealogy of genes in a population, ultimately...... connecting them to a single common ancestor. Effects of selection, particularly genomic effects, are discussed with reference to molecular genetic variation. The book is designed for students of population genetics, bioinformatics, evolutionary biology, molecular evolution, and theoretical biology—as well...
Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk
Next generation sequencing technologies have been decreasing the costs and increasing the world-wide capacity for sequence production at an unprecedented rate, making the initiation of large scale projects aiming to sequence almost 2000 genomes . Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high throughput sequencing . We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging - ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.
Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest. 2014 Elsevier Ltd. All rights reserved.
Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.
Background Whole genome sequencing enables a high resolution view ofthe human genome and provides unique insights into genome structureat an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools while validatedalso include a number of
Song, Yun S.
Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine-scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features—including recombination rates, diversity, divergence, GC content, gene content, and sequence quality—is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and
Christiansen, Gunna; Andersen, H; Birkelund, Svend
DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed...... no identity or cluster formation between strains. Variation within M. hominis rRNA genes was analyzed by Southern hybridization of EcoRI-cleaved DNA hybridized with a cloned fragment of the rRNA gene from the mycoplasma strain PG50. Five of the M. hominis strains showed identical hybridization patterns....... These hybridization patterns were compared with those of 12 other mycoplasma species, which showed a much more complex band pattern. Cloned nonribosomal RNA gene fragments of M. hominis PG21 DNA were analyzed, and the fragments were used to demonstrate heterogeneity among the strains. A monoclonal antibody against...
Full Text Available Recent studies have found that copy number variations (CNVs are widespread in human and animal genomes. CNVs are a significant source of genetic variation, and have been shown to be associated with phenotypic diversity. However, the effect of CNVs on genetic variation in horses is not well understood. In the present study, CNVs in 6 different breeds of mare horses, Mongolia horse, Abaga horse, Hequ horse and Kazakh horse (all plateau breeds and Debao pony and Thoroughbred, were determined using aCGH. In total, seven hundred CNVs were identified ranging in size from 6.1 Kb to 0.57 Mb across all autosomes, with an average size of 43.08 Kb and a median size of 15.11 Kb. By merging overlapping CNVs, we found a total of three hundred and fifty-three CNV regions (CNVRs. The length of the CNVRs ranged from 6.1 Kb to 1.45 Mb with average and median sizes of 38.49 Kb and 13.1 Kb. Collectively, 13.59 Mb of copy number variation was identified among the horses investigated and accounted for approximately 0.61% of the horse genome sequence. Five hundred and eighteen annotated genes were affected by CNVs, which corresponded to about 2.26% of all horse genes. Through the gene ontology (GO, genetic pathway analysis and comparison of CNV genes among different breeds, we found evidence that CNVs involving 7 genes may be related to the adaptation to severe environment of these plateau horses. This study is the first report of copy number variations in Chinese horses, which indicates that CNVs are ubiquitous in the horse genome and influence many biological processes of the horse. These results will be helpful not only in mapping the horse whole-genome CNVs, but also to further research for the adaption to the high altitude severe environment for plateau horses.
Andersson Jan O
Full Text Available Abstract Background Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates, both in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.
Ku, Chee-Seng; Pawitan, Yudi; Sim, Xueling; Ong, Rick T H; Seielstad, Mark; Lee, Edmund J D; Teo, Yik-Ying; Chia, Kee-Seng; Salim, Agus
Research on the role of copy number variations (CNVs) in the genetic risk of diseases in Asian populations has been hampered by a relative lack of reference CNV maps for Asian populations outside the East Asians. In this article, we report the population characteristics of CNVs in Chinese, Malay, and Asian Indian populations in Singapore. Using the Illumina Human 1M Beadchip array, we identify 1,174 CNV loci in these populations that corroborated with findings when the same samples were typed on the Affymetrix 6.0 platform. We identify 441 novel loci not previously reported in the Database of Genomic Variations (DGV). We observe a considerable number of loci that span all three populations and were previously unreported, as well as population-specific loci that are quite common in the respective populations. From this we observe the distribution of CNVs in the Asian Indian population to be considerably different from the Chinese and Malay populations. About half of the deletion loci and three-quarters of duplication loci overlap UCSC genes. Tens of loci show population differentiation and overlap with genes previously known to be associated with genetic risk of diseases. One of these loci is the CYP2A6 deletion, previously linked to reduced susceptibility to lung cancer. (c) 2010 Wiley-Liss, Inc.
Padhukasahasram, Badri; Rannala, Bruce
Meiotic recombination occurs in the form of two different mechanisms called crossing-over and gene-conversion and both processes have an important role in shaping genetic variation in populations. Although variation in crossing-over rates has been studied extensively using sperm-typing experiments, pedigree studies and population genetic approaches, our knowledge of variation in gene-conversion parameters (ie, rates and mean tract lengths) remains far from complete. To explore variability in population gene-conversion rates and its relationship to crossing-over rate variation patterns, we have developed and validated using coalescent simulations a comprehensive Bayesian full-likelihood method that can jointly infer crossing-over and gene-conversion rates as well as tract lengths from population genomic data under general variable rate models with recombination hotspots. Here, we apply this new method to SNP data from multiple human populations and attempt to characterize for the first time the fine-scale variation in gene-conversion parameters along the human genome. We find that the estimated ratio of gene-conversion to crossing-over rates varies considerably across genomic regions as well as between populations. However, there is a great degree of uncertainty associated with such estimates. We also find substantial evidence for variation in the mean conversion tract length. The estimated tract lengths did not show any negative relationship with the local heterozygosity levels in our analysis.European Journal of Human Genetics advance online publication, 27 February 2013; doi:10.1038/ejhg.2013.30.
Full Text Available The "heterozygote instability" (HI hypothesis suggests that gene conversion events focused on heterozygous sites during meiosis locally increase the mutation rate, but this hypothesis remains largely untested. As humans left Africa they lost variability, which, if HI operates, should have reduced the mutation rate in non-Africans. Relative substitution rates were quantified in diverse humans using aligned whole genome sequences from the 1,000 genomes project. Substitution rate is consistently greater in Africans than in non-Africans, but only in diploid regions of the genome, consistent with a role for heterozygosity. Analysing the same data partitioned into a series of non-overlapping 2 Mb windows reveals a strong, non-linear correlation between the amount of heterozygosity lost "out of Africa" and the difference in substitution rate between Africans and non-Africans. Putative recent mutations, derived variants that occur only once among the 80 human chromosomes sampled, occur preferentially at the centre of 2 Kb windows that have elevated heterozygosity compared both with the same region in a closely related population and with an immediately adjacent region in the same population. More than half of all substitutions appear attributable to variation in heterozygosity. This observation provides strong support for HI with implications for many branches of evolutionary biology.
Full Text Available Abstract Background Although the Ciona intestinalis genome contains many allelic polymorphisms, there is only limited data analyzed systematically. Establishing a dense map of genetic variations in C. intestinalis is necessary not only for linkage analysis, but also for other experimental biology including molecular developmental and evolutionary studies, because animals from natural populations are typically used for experiments. Results Here, we identified over three million candidate short genomic variations within a 110 Mb euchromatin region among five C. intestinalis individuals. The average nucleotide diversity was approximately 1.1%. Genetic variations were found at a similar density in intergenic and gene regions. Non-synonymous and nonsense nucleotide substitutions were found in 12,493 and 1,214 genes accounting for 81.9% and 8.0% of the entire gene set, respectively, and over 60% of genes in the single animal encode non-identical proteins between maternal and paternal alleles. Conclusions Our results provide a framework for studying evolution of the animal genome, as well as a useful resource for a wide range of C. intestinalis researchers.
Full Text Available The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Settenl, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marscha, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Ma-Tthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca
Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring
The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)
htmlabstractWhole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch
Ågren, J Arvid; Greiner, Stephan; Johnson, Marc T J; Wright, Stephen I
Genome size varies dramatically across species, but despite an abundance of attention there is little agreement on the relative contributions of selective and neutral processes in governing this variation. The rate of sex can potentially play an important role in genome size evolution because of its effect on the efficacy of selection and transmission of transposable elements (TEs). Here, we used a phylogenetic comparative approach and whole genome sequencing to investigate the contribution of sex and TE content to genome size variation in the evening primrose (Oenothera) genus. We determined genome size using flow cytometry for 30 species that vary in genetic system and find that variation in sexual/asexual reproduction cannot explain the almost twofold variation in genome size. Moreover, using whole genome sequences of three species of varying genome sizes and reproductive system, we found that genome size was not associated with TE abundance; instead the larger genomes had a higher abundance of simple sequence repeats. Although it has long been clear that sexual reproduction may affect various aspects of genome evolution in general and TE evolution in particular, it does not appear to have played a major role in genome size evolution in the evening primroses. © 2015 The Author(s).
Full Text Available Abstract Background Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improving power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. Results We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. Conclusions The GWAMA (Genome-Wide Association Meta-Analysis software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.
Moraes, A P; Koehler, S; Cabral, J S; Gomes, S S L; Viccini, L F; Barros, F; Felix, L P; Guerra, M; Forni-Martins, E R
Orchidaceae is a widely distributed plant family with very diverse vegetative and floral morphology, and such variability is also reflected in their karyotypes. However, since only a low proportion of Orchidaceae has been analysed for chromosome data, greater diversity may await to be unveiled. Here we analyse both genome size (GS) and karyotype in two subtribes recently included in the broadened Maxillariinea to detect how much chromosome and GS variation there is in these groups and to evaluate which genome rearrangements are involved in the species evolution. To do so, the GS (14 species), the karyotype - based on chromosome number, heterochromatic banding and 5S and 45S rDNA localisation (18 species) - was characterised and analysed along with published data using phylogenetic approaches. The GS presented a high phylogenetic correlation and it was related to morphological groups in Bifrenaria (larger plants - higher GS). The two largest GS found among genera were caused by different mechanisms: polyploidy in Bifrenaria tyrianthina and accumulation of repetitive DNA in Scuticaria hadwenii. The chromosome number variability was caused mainly through descending dysploidy, and x=20 was estimated as the base chromosome number. Combining GS and karyotype data with molecular phylogeny, our data provide a more complete scenario of the karyotype evolution in Maxillariinae orchids, allowing us to suggest, besides dysploidy, that inversions and transposable elements as two mechanisms involved in the karyotype evolution. Such karyotype modifications could be associated with niche changes that occurred during species evolution. © 2016 German Botanical Society and The Royal Botanical Society of the Netherlands.
Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima\\'s D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.
Weier, Jingly F.; Ferlatte, Christy; Weier, Heinz-Ulli G.
In the mature chorion, one of the membranes that exist during pregnancy between the developing fetus and mother, human placental cells form highly specialized tissues composed of mesenchyme and floating or anchoring villi. Using fluorescence in situ hybridization, we found that human invasive cytotrophoblasts isolated from anchoring villi or the uterine wall had gained individual chromosomes; however, chromosome losses were detected infrequently. With chromosomes gained in what appeared to be a chromosome-specific manner, more than half of the invasive cytotrophoblasts in normal pregnancies were found to be hyperdiploid. Interestingly, the rates of hyperdiploid cells depended not only on gestational age, but were strongly associated with the extraembryonic compartment at the fetal-maternal interface from which they were isolated. Since hyperdiploid cells showed drastically reduced DNA replication as measured by bromodeoxyuridine incorporation, we conclude that aneuploidy is a part of the normal process of placentation potentially limiting the proliferative capabilities of invasive cytotrophoblasts. Thus, under the special circumstances of human reproduction, somatic genomic variations may exert a beneficial, anti-neoplastic effect on the organism.
Full Text Available Abstract Background Genomic selection is an effective tool for animal and plant breeding, allowing effective individual selection without phenotypic records through the prediction of genomic breeding value (GBV. To date, genomic selection has focused on a single trait. However, actual breeding often targets multiple correlated traits, and, therefore, joint analysis taking into consideration the correlation between traits, which might result in more accurate GBV prediction than analyzing each trait separately, is suitable for multi-trait genomic selection. This would require an extension of the prediction model for single-trait GBV to multi-trait case. As the computational burden of multi-trait analysis is even higher than that of single-trait analysis, an effective computational method for constructing a multi-trait prediction model is also needed. Results We described a Bayesian regression model incorporating variable selection for jointly predicting GBVs of multiple traits and devised both an MCMC iteration and variational approximation for Bayesian estimation of parameters in this multi-trait model. The proposed Bayesian procedures with MCMC iteration and variational approximation were referred to as MCBayes and varBayes, respectively. Using simulated datasets of SNP genotypes and phenotypes for three traits with high and low heritabilities, we compared the accuracy in predicting GBVs between multi-trait and single-trait analyses as well as between MCBayes and varBayes. The results showed that, compared to single-trait analysis, multi-trait analysis enabled much more accurate GBV prediction for low-heritability traits correlated with high-heritability traits, by utilizing the correlation structure between traits, while the prediction accuracy for uncorrelated low-heritability traits was comparable or less with multi-trait analysis in comparison with single-trait analysis depending on the setting for prior probability that a SNP has zero
Chen, Pingli; Shen, Zhikang; Ming, Luchang; Li, Yibo; Dan, Wenhan; Lou, Guangming; Peng, Bo; Wu, Bian; Li, Yanhua; Zhao, Da; Gao, Guanjun; Zhang, Qinglu; Xiao, Jinghua; Li, Xianghua; Wang, Gongwei; He, Yuqing
Rice seed storage protein (SSP) is an important source of nutrition and energy. Understanding the genetic basis of SSP content and mining favorable alleles that control it will be helpful for breeding new improved cultivars. An association analysis for SSP content was performed to identify underlying genes using 527 diverse Oryza sativa accessions grown in two environments. We identified more than 107 associations for five different traits, including the contents of albumin (Alb), globulin (Glo), prolamin (Pro), glutelin (Glu), and total SSP (Total). A total of 28 associations were located at previously reported QTLs or intervals. A lead SNP sf0709447538, associated for Glu content in the indica subpopulation in 2015, was further validated in near isogenic lines NIL(Zhenshan97) and NIL(Delong208), and the Glu phenotype had significantly difference between two NILs. The association region could be target for map-based cloning of the candidate genes. There were 13 associations in regions close to grain-quality-related genes; five lead single nucleotide polymorphisms (SNPs) were located less than 20 kb upstream from grain-quality-related genes ( PG5a , Wx , AGPS2a , RP6 , and, RM1 ). Several starch-metabolism-related genes ( AGPS2a , OsACS6 , PUL , GBSSII , and ISA2 ) were also associated with SSP content. We identified favorable alleles of functional candidate genes, such as RP6 , RM1 , Wx , and other four candidate genes by haplotype analysis and expression pattern. Genotypes of RP6 and RM1 with higher Pro were not identified in japonica and exhibited much higher expression levels in indica group. The lead SNP sf0601764762, repeatedly detected for Alb content in 2 years in the whole association population, was located in the Wx locus that controls the synthesis of amylose. And Alb content was significantly and negatively correlated with amylose content and the level of 2.3 kb Wx pre-mRNA examined in this study. The associations or candidate genes identified would
Full Text Available Rice seed storage protein (SSP is an important source of nutrition and energy. Understanding the genetic basis of SSP content and mining favorable alleles that control it will be helpful for breeding new improved cultivars. An association analysis for SSP content was performed to identify underlying genes using 527 diverse Oryza sativa accessions grown in two environments. We identified more than 107 associations for five different traits, including the contents of albumin (Alb, globulin (Glo, prolamin (Pro, glutelin (Glu, and total SSP (Total. A total of 28 associations were located at previously reported QTLs or intervals. A lead SNP sf0709447538, associated for Glu content in the indica subpopulation in 2015, was further validated in near isogenic lines NIL(Zhenshan97 and NIL(Delong208, and the Glu phenotype had significantly difference between two NILs. The association region could be target for map-based cloning of the candidate genes. There were 13 associations in regions close to grain-quality-related genes; five lead single nucleotide polymorphisms (SNPs were located less than 20 kb upstream from grain-quality-related genes (PG5a, Wx, AGPS2a, RP6, and, RM1. Several starch-metabolism-related genes (AGPS2a, OsACS6, PUL, GBSSII, and ISA2 were also associated with SSP content. We identified favorable alleles of functional candidate genes, such as RP6, RM1, Wx, and other four candidate genes by haplotype analysis and expression pattern. Genotypes of RP6 and RM1 with higher Pro were not identified in japonica and exhibited much higher expression levels in indica group. The lead SNP sf0601764762, repeatedly detected for Alb content in 2 years in the whole association population, was located in the Wx locus that controls the synthesis of amylose. And Alb content was significantly and negatively correlated with amylose content and the level of 2.3 kb Wx pre-mRNA examined in this study. The associations or candidate genes identified would provide
Prabha, Ratna; Singh, Dhananjaya P; Sinha, Swati; Ahmad, Khurshid; Rai, Anil
With the increasing accumulation of genomic sequence information of prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection pattern within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis reflects that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly in the genes whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria as per their GC content. Analysis of codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that majority of genes are associated with low codon bias. Codon selection pattern in cyanobacterial genomes reflected compositional constraints as major influencing factor. It is also identified that although, mutational constraint may play some role in affecting codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition coupled with environmental factors affected codon selection pattern in cyanobacterial genomes. Copyright © 2016 Elsevier B.V. All rights reserved.
Wei Tong; Qiang He; Yong-Jin Park
Mitochondrial genome variations have been detected despite the overall conservation of this gene content, which has been valuable for plant population genetics and evolutionary studies. Here, we describe mitochondrial variation architecture and our performance of a phylogenetic dissection of Korean landrace and weedy rice. A total of 4,717 variations across the mitochondrial genome were identified adjunct with 10 wild rice. Genetic diversity assessment revealed that wild rice has higher nucle...
Laughlin Rick E
Full Text Available Abstract Background The relative growth of the neocortex parallels the emergence of complex cognitive functions across species. To determine the regions of the mammalian genome responsible for natural variations in cortical volume, we conducted a complex trait analysis using 34 strains of recombinant inbred (Rl strains of mice (BXD, as well as their two parental strains (C57BL/6J and DBA/2J. We measured both neocortical volume and total brain volume in 155 coronally sectioned mouse brains that were Nissl stained and embedded in celloidin. After correction for shrinkage, the measured cortical and noncortical brain volumes were entered into a multiple regression analysis, which removed the effects of body size and age from the measurements. Marker regression and interval mapping were computed using WebQTL. Results An ANOVA revealed that more than half of the variance of these regressed phenotypes is genetically determined. We then identified the regions of the genome regulating this heritability. We located genomic regions in which a linkage disequilibrium was present using WebQTL as both a mapping engine and genomic database. For neocortex, we found a genome-wide significant quantitative trait locus (QTL on chromosome 11 (marker D11Mit19, as well as a suggestive QTL on chromosome 16 (marker D16Mit100. In contrast, for noncortex the effect of chromosome 11 was markedly reduced, and a significant QTL appeared on chromosome 19 (D19Mit22. Conclusion This classic pattern of double dissociation argues strongly for different genetic factors regulating relative cortical size, as opposed to brain volume more generally. It is likely, however, that the effects of proximal chromosome 11 extend beyond the neocortex strictly defined. An analysis of single nucleotide polymorphisms in these regions indicated that ciliary neurotrophic factor (Cntf is quite possibly the gene underlying the noncortical QTL. Evidence for a candidate gene modulating neocortical
Full Text Available Sorghum genotypes currently used for grain production in the United States were developed from African landraces that were imported starting in the mid-to-late 19(th century. Farmers and plant breeders selected genotypes for grain production with reduced plant height, early flowering, increased grain yield, adaptation to drought, and improved resistance to lodging, diseases and pests. DNA polymorphisms that distinguish three historically important grain sorghum genotypes, BTx623, BTx642 and Tx7000, were characterized by genome sequencing, genotyping by sequencing, genetic mapping, and pedigree-based haplotype analysis. The distribution and density of DNA polymorphisms in the sequenced genomes varied widely, in part because the lines were derived through breeding and selection from diverse Kafir, Durra, and Caudatum race accessions. Genomic DNA spanning dw1 (SBI-09 and dw3 (SBI-07 had identical haplotypes due to selection for reduced height. Lower SNP density in genes located in pericentromeric regions compared with genes located in euchromatic regions is consistent with background selection in these regions of low recombination. SNP density was higher in euchromatic DNA and varied >100-fold in contiguous intervals that spanned up to 300 Kbp. The localized variation in DNA polymorphism density occurred throughout euchromatic regions where recombination is elevated, however, polymorphism density was not correlated with gene density or DNA methylation. Overall, sorghum chromosomes contain distal euchromatic regions characterized by extensive, localized variation in DNA polymorphism density, and large pericentromeric regions of low gene density, diversity, and recombination.
Arquez, Moises; Uribe, Juan Esteban; Castro, Lyda Raquel
In this work we presented a comparative analysis of the mitochondrial genomes in gastropods. Nucleotide and amino acids composition was calculated and a comparative visual analysis of the start and termination codons was performed. The organization of the genome was compared calculating the number of intergenic sequences, the location of the genes and the number of reorganized genes (breakpoints) in comparison with the sequence that is presumed to be ancestral for the group. In order to calculate variations in the rates of molecular evolution within the group, the relative rate test was performed. In spite of the differences in the size of the genomes, the amino acids number is conserved. The nucleotide and amino acid composition is similar between Vetigastropoda, Ceanogastropoda and Neritimorpha in comparison to Heterobranchia and Patellogastropoda. The mitochondrial genomes of the group are very compact with few intergenic sequences, the only exception is the genome of Patellogastropoda with 26,828 bp. Start codons of the Heterobranchia and Patellogastropoda are very variable and there is also an increase in genome rearrangements for these two groups. Generally, the hypothesis of constant rates of molecular evolution between the groups is rejected, except when the genomes of Caenogastropoda and Vetigastropoda are compared.
Caporale, Lynn Helena
This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.
Keywords. insertion–deletion variations; haematological disease; tumours; human genetics. Journal of Genetics ... domly selected healthy Korean individuals using a blood genomic DNA ... Bioinformatics annotation and 3-D protein structure analysis. In this study ..... 2009 A genome-wide meta-analysis identifies. Journal of ...
Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin
Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986
Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David
Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful...... for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps...... more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...
Naeem, Raeece; Hidayah, Lailatul; Preston, Mark D.; Clark, Taane G.; Pain, Arnab
Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis
Zuccolo, Andrea; Sebastian, Aswathy; Talag, Jayson; Yu, Yeisoo; Kim, HyeRan; Collura, Kristi; Kudrna, Dave; Wing, Rod A
The genus Oryza is composed of 10 distinct genome types, 6 diploid and 4 polyploid, and includes the world's most important food crop - rice (Oryza sativa [AA]). Genome size variation in the Oryza is more than 3-fold and ranges from 357 Mbp in Oryza glaberrima [AA] to 1283 Mbp in the polyploid Oryza ridleyi [HHJJ]. Because repetitive elements are known to play a significant role in genome size variation, we constructed random sheared small insert genomic libraries from 12 representative Oryza species and conducted a comprehensive study of the repetitive element composition, distribution and phylogeny in this genus. Particular attention was paid to the role played by the most important classes of transposable elements (Long Terminal Repeats Retrotransposons, Long interspersed Nuclear Elements, helitrons, DNA transposable elements) in shaping these genomes and in their contributing to genome size variation. We identified the elements primarily responsible for the most strikingly genome size variation in Oryza. We demonstrated how Long Terminal Repeat retrotransposons belonging to the same families have proliferated to very different extents in various species. We also showed that the pool of Long Terminal Repeat Retrotransposons is substantially conserved and ubiquitous throughout the Oryza and so its origin is ancient and its existence predates the speciation events that originated the genus. Finally we described the peculiar behavior of repeats in the species Oryza coarctata [HHKK] whose placement in the Oryza genus is controversial. Long Terminal Repeat retrotransposons are the major component of the Oryza genomes analyzed and, along with polyploidization, are the most important contributors to the genome size variation across the Oryza genus. Two families of Ty3-gypsy elements (RIRE2 and Atlantys) account for a significant portion of the genome size variations present in the Oryza genus.
Collins, Ryan L; Brand, Harrison; Redin, Claire E.; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T.; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A.; Lucente, Diane; Levy, Brynn; Sanders, Jan-Stephan; Wapner, Ronald J.; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E.
Background: Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. Results: We sequenced 689 participants with autism spectrum disorder (ASD) and other
Lund, Bendik; Wesolowska-Andersen, Agata; Lausen, Birgitte
Objectives: To investigate association of host genomic variation and risk of infections during treatment for childhood acute lymphoblastic leukaemia (ALL). Methods: We explored association of 34 000 singlenucleotide polymorphisms (SNPs) related primarily to pharmacogenomics and immune function...
Song, Yutong; Gorbatsevych, Oleksandr; Liu, Ying; Mugavero, JoAnn; Shen, Sam H; Ward, Charles B; Asare, Emmanuel; Jiang, Ping; Paul, Aniko V; Mueller, Steffen; Wimmer, Eckard
Computer design and chemical synthesis generated viable variants of poliovirus type 1 (PV1), whose ORF (6,189 nucleotides) carried up to 1,297 "Max" mutations (excess of overrepresented synonymous codon pairs) or up to 2,104 "SD" mutations (randomly scrambled synonymous codons). "Min" variants (excess of underrepresented synonymous codon pairs) are nonviable except for P2 Min , a variant temperature-sensitive at 33 and 39.5 °C. Compared with WT PV1, P2 Min displayed a vastly reduced specific infectivity (si) (WT, 1 PFU/118 particles vs. P2 Min , 1 PFU/35,000 particles), a phenotype that will be discussed broadly. Si of haploid PV presents cellular infectivity of a single genotype. We performed a comprehensive analysis of sequence and structures of the PV genome to determine if evolutionary conserved cis-acting packaging signal(s) were preserved after recoding. We showed that conserved synonymous sites and/or local secondary structures that might play a role in determining packaging specificity do not survive codon pair recoding. This makes it unlikely that numerous "cryptic, sequence-degenerate, dispersed RNA packaging signals mapping along the entire viral genome" [Patel N, et al. (2017) Nat Microbiol 2:17098] play the critical role in poliovirus packaging specificity. Considering all available evidence, we propose a two-step assembly strategy for +ssRNA viruses: step I, acquisition of packaging specificity, either ( a ) by specific recognition between capsid protein(s) and replication proteins (poliovirus), or ( b ) by the high affinity interaction of a single RNA packaging signal (PS) with capsid protein(s) (most +ssRNA viruses so far studied); step II, cocondensation of genome/capsid precursors in which an array of hairpin structures plays a role in virion formation.
Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander
We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.
In the first study of its kind, an international team of genomics researchers has identified new regions of the human genome that are associated with skin color variation in some African populations, opening new avenues for research on skin diseases and cancer in all populations.
Abecasis, Goncalo R.; Auton, Adam; Brooks, Lisa D.
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination ...
Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.
J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.
Csala, Attila; Voorbraak, Frans P. J. M.; Zwinderman, Aeilko H.; Hof, Michel H.
Motivation: Recent technological developments have enabled the possibility of genetic and genomic integrated data analysis approaches, where multiple omics datasets from various biological levels are combined and used to describe (disease) phenotypic variations. The main goal is to explain and
Hannemann, Juliane; Meyer-Staeckling, Sönke; Kemming, Dirk; Alpers, Iris; Joosse, Simon A; Pospisil, Heike; Kurtz, Stefan; Görndt, Jennifer; Püschel, Klaus; Riethdorf, Sabine; Pantel, Klaus; Brandt, Burkhard
During cancer progression, specific genomic aberrations arise that can determine the scope of the disease and can be used as predictive or prognostic markers. The detection of specific gene amplifications or deletions in single blood-borne or disseminated tumour cells that may give rise to the development of metastases is of great clinical interest but technically challenging. In this study, we present a method for quantitative high-resolution genomic analysis of single cells. Cells were isolated under permanent microscopic control followed by high-fidelity whole genome amplification and subsequent analyses by fine tiling array-CGH and qPCR. The assay was applied to single breast cancer cells to analyze the chromosomal region centred by the therapeutical relevant EGFR gene. This method allows precise quantitative analysis of copy number variations in single cell diagnostics.
Full Text Available During cancer progression, specific genomic aberrations arise that can determine the scope of the disease and can be used as predictive or prognostic markers. The detection of specific gene amplifications or deletions in single blood-borne or disseminated tumour cells that may give rise to the development of metastases is of great clinical interest but technically challenging. In this study, we present a method for quantitative high-resolution genomic analysis of single cells. Cells were isolated under permanent microscopic control followed by high-fidelity whole genome amplification and subsequent analyses by fine tiling array-CGH and qPCR. The assay was applied to single breast cancer cells to analyze the chromosomal region centred by the therapeutical relevant EGFR gene. This method allows precise quantitative analysis of copy number variations in single cell diagnostics.
Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
For the past 20 years, the Clusters of Orthologous Genes (COG) database had been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COG have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Full Text Available Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome for soybean seed development analysis contains several virus-associated sequences. In this study, we identified five viruses, including soybean mosaic virus (SMV, infecting soybean by de novo transcriptome assembly followed by blast search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs for SMVs in the soybean seed transcriptome revealing 780 SNVs, which were evenly distributed on the SMV genome. Four SNVs, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.
Drury, C; Dale, K E; Panlilio, J M; Miller, S V; Lirman, D; Larson, E A; Bartels, E; Crawford, D L; Oleksiak, M F
Acropora cervicornis, a threatened, keystone reef-building coral has undergone severe declines (>90 %) throughout the Caribbean. These declines could reduce genetic variation and thus hamper the species' ability to adapt. Active restoration strategies are a common conservation approach to mitigate species' declines and require genetic data on surviving populations to efficiently respond to declines while maintaining the genetic diversity needed to adapt to changing conditions. To evaluate active restoration strategies for the staghorn coral, the genetic diversity of A. cervicornis within and among populations was assessed in 77 individuals collected from 68 locations along the Florida Reef Tract (FRT) and in the Dominican Republic. Genotyping by Sequencing (GBS) identified 4,764 single nucleotide polymorphisms (SNPs). Pairwise nucleotide differences (π) within a population are large (~37 %) and similar to π across all individuals. This high level of genetic diversity along the FRT is similar to the diversity within a small, isolated reef. Much of the genetic diversity (>90 %) exists within a population, yet GBS analysis shows significant variation along the FRT, including 300 SNPs with significant FST values and significant divergence relative to distance. There are also significant differences in SNP allele frequencies over small spatial scales, exemplified by the large FST values among corals collected within Miami-Dade county. Large standing diversity was found within each population even after recent declines in abundance, including significant, potentially adaptive divergence over short distances. The data here inform conservation and management actions by uncovering population structure and high levels of diversity maintained within coral collections among sites previously shown to have little genetic divergence. More broadly, this approach demonstrates the power of GBS to resolve differences among individuals and identify subtle genetic structure
Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan
mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...
Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua
Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assemble genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
James W Kijas
Full Text Available The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability.
Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W
The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants. Copyright © 2015 Jun et al.
Full Text Available The complex process of allopolyploid speciation includes various mechanisms ranging from species crosses and hybrid genome doubling to genome alterations and the establishment of new allopolyploids as persisting natural entities. Currently, little is known about the genetic mechanisms that underlie hybrid genome doubling, despite the fact that natural allopolyploid formation is highly dependent on this phenomenon. We examined the genetic basis for the spontaneous genome doubling of triploid F1 hybrids between the direct ancestors of allohexaploid common wheat (Triticum aestivum L., AABBDD genome, namely Triticumturgidum L. (AABB genome and Aegilopstauschii Coss. (DD genome. An Ae. tauschii intraspecific lineage that is closely related to the D genome of common wheat was identified by population-based analysis. Two representative accessions, one that produces a high-genome-doubling-frequency hybrid when crossed with a T. turgidum cultivar and the other that produces a low-genome-doubling-frequency hybrid with the same cultivar, were chosen from that lineage for further analyses. A series of investigations including fertility analysis, immunostaining, and quantitative trait locus (QTL analysis showed that (1 production of functional unreduced gametes through nonreductional meiosis is an early step key to successful hybrid genome doubling, (2 first division restitution is one of the cytological mechanisms that cause meiotic nonreduction during the production of functional male unreduced gametes, and (3 six QTLs in the Ae. tauschii genome, most of which likely regulate nonreductional meiosis and its subsequent gamete production processes, are involved in hybrid genome doubling. Interlineage comparisons of Ae. tauschii's ability to cause hybrid genome doubling suggested an evolutionary model for the natural variation pattern of the trait in which non-deleterious mutations in six QTLs may have important roles. The findings of this study demonstrated
Majeský, Ľuboš; Schwarzacher, Trude; Gornall, Richard; Heslop-Harrison, Pat
Chloroplast DNA sequences show substantial variation between higher plant species, and less variation within species, so are typically excellent markers to investigate evolutionary, population and genetic relationships and phylogenies. We sequenced the plastomes of Taraxacum obtusifrons Markl. (O978); T. stridulum Trávniček ined. (S3); and T. amplum Markl. (A978), three apomictic triploid (2n = 3x = 24) dandelions from the T. officinale agg. We aimed to characterize the variation in plastomes, define relationships and correlations with the apomictic microspecies status, and refine placement of the microspecies in the evolutionary or phylogenetic context of the Asteraceae. The chloroplast genomes of accessions O978 and S3 were identical and 151,322 bp long (where the nuclear genes are known to show variation), while A978 was 151,349 bp long. All three genomes contained 135 unique genes, with an additional copy of the trnF-GGA gene in the LSC region and 20 duplicated genes in the IR region, along with short repeats, the typical major Inverted Repeats (IR1 and IR2, 24,431bp long), and Large and Small Single Copy regions (LSC 83,889bp and SSC 18,571bp in O978). Between the two Taraxacum plastomes types, we identified 28 SNPs. The distribution of polymorphisms suggests some parts of the Taraxacum plastome are evolving at a slower rate. There was a hemi-nested inversion in the LSC region that is common to Asteraceae, and an SSC inversion from ndhF to rps15 found only in some Asteraceae lineages. A comparative repeat analysis showed variation between Taraxacum and the phylogenetically close genus Lactuca, with many more direct repeats of 40bp or more in Lactuca (1% larger plastome than Taraxacum). When individual genes and non-coding regions were for Asteraceae phylogeny reconstruction, not all showed the same evolutionary scenario suggesting care is needed for interpretation of relationships if a limited number of markers are used. Studying genotypic diversity in
Wang, Jing; He, Ximiao; Ruan, Jue
Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Udgivelsesdato: 2005-Jan-1...
Full Text Available Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. The mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the amount of advantageous mutations it takes to get human (Homo sapiens DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms, and estimating the amount of mutations it takes to transform yeast (Saccharomyces cerevisiae DNA to the DNA of a human. We calculate the typical number of mutations occurring annually through the organism's average life span and the average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be determined by accounting for the differences in the chromosomal structure and phenomena like horizontal gene transfer.
Nagirnaja, Liina; Palta, Priit; Kasak, Laura
Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs an...
Full Text Available Recently, many studies in livestock have focused on the identification of Copy Number Variants (CNVs using high-density Single Nucleotide Polymorphism (SNP arrays, but few have focused on studying chicken ecotypes coming from many locations. CNVs are polymorphisms, which may influence phenotype and are an important source of genetic variation in populations. The aim of this study was to explore the genetic difference and structure, using a high density SNP chip in 936 individuals from seven different countries (Brazil, Italy, Egypt, Mexico, Rwanda, Sri Lanka and Uganda. The DNA was genotyped with the Affymetrix Axiom®600k Chicken Genotyping Array and processed with stringent quality controls to obtain 559,201 SNPs in 915 individuals. The Log R Ratio (LRR and the B Allele Frequency of SNPs were used to perform the CNV calling with PennCNV software based on a Hidden Markov Model analysis and the LRR was used to perform CNV detection with SVS Golden Helix software.After filtering, a total of 19,027 CNVs were detected with the SVS software, while 9,065 CNVs were identified with the Penn CNV software. The CNVs were summarized in 7,001 Copy Number Variant Regions (CNVRs and 4,414 CNVRs, using the software BedTool.The consensus analysis across the CNVRs allowed the identification of 2,820 consensus CNVR, of which 1,721 were gain, 637 loss and 462 complex, for a total length of 53 Mb corresponding to the 5 % of the GalGal5 chicken autosomes. Only the consensus CNV regions obtained from both detections were considered for further analysis.The intersection analysis performed between the chicken gene database (Gallus_gallus-5.0 and the 1,927 consensus CNVRs allowed the identification (within or partial overlap of a total of 2,354 unique genes with an official gene ID. The CNVRs identified here represent the first comprehensive mapping in several worldwide populations, using a high-density SNP chip.
Cao, Minh Duc; Allison, Lloyd; Dix, Trevor
Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirement. Another class of tree construction methods, namely distance-based methods, require a measure of distances between any two genomes. Some measures such as evolutionary edit distance of gene order and gene content are computational expensive or do not perform well when the gene content of the organisms are similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
Brown, D W; Butchko, R A E; Proctor, R H
Fusarium verticillioides (teleomorph Gibberella moniliformis) can be either an endophyte of maize, causing no visible disease, or a pathogen-causing disease of ears, stalks, roots and seedlings. At any stage, this fungus can synthesize fumonisins, a family of mycotoxins structurally similar to the sphingolipid sphinganine. Ingestion of fumonisin-contaminated maize has been associated with a number of animal diseases, including cancer in rodents, and exposure has been correlated with human oesophageal cancer in some regions of the world, and some evidence suggests that fumonisins are a risk factor for neural tube defects. A primary goal of the authors' laboratory is to eliminate fumonisin contamination of maize and maize products. Understanding how and why these toxins are made and the F. verticillioides-maize disease process will allow one to develop novel strategies to limit tissue destruction (rot) and fumonisin production. To meet this goal, genomic sequence data, expressed sequence tags (ESTs) and microarrays are being used to identify F. verticillioides genes involved in the biosynthesis of toxins and plant pathogenesis. This paper describes the current status of F. verticillioides genomic resources and three approaches being used to mine microarray data from a wild-type strain cultured in liquid fumonisin production medium for 12, 24, 48, 72, 96 and 120h. Taken together, these approaches demonstrate the power of microarray technology to provide information on different biological processes.
Urban Alexander E
Full Text Available Abstract Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC, a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read. All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions. A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models. This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions. We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole
Full Text Available Melampsora larici-populina is a fungal pathogen responsible for foliar rust disease on poplar trees, which causes damage to forest plantations worldwide, particularly in Northern Europe. The reference genome of the isolate 98AG31 was previously sequenced using a whole genome shotgun strategy, revealing a large genome of 101 megabases containing 16,399 predicted genes, which included secreted protein genes representing poplar rust candidate effectors. In the present study, the genomes of 15 isolates collected over the past 20 years throughout the French territory, representing distinct virulence profiles, were characterized by massively parallel sequencing to assess genetic variation in the poplar rust fungus. Comparison to the reference genome revealed striking structural variations. Analysis of coverage and sequencing depth identified large missing regions between isolates related to the mating type loci. More than 611,824 single-nucleotide polymorphism (SNP positions were uncovered overall, indicating a remarkable level of polymorphism. Based on the accumulation of non-synonymous substitutions in coding sequences and the relative frequencies of synonymous and non-synonymous polymorphisms (i.e. PN/PS, we identify candidate genes that may be involved in fungal pathogenesis. Correlation between non-synonymous SNPs in genes encoding secreted proteins and pathotypes of the studied isolates revealed candidate genes potentially related to virulences 1, 6 and 8 of the poplar rust fungus.
Yuhki, Naoya; O'Brien, S.J.
The major histocompatibility complex (MHC) is a multigene complex of tightly linked homologous genes that encode cell surface antigens that play a key role in immune regulation and response to foreign antigens. In most species, MHC gene products display extreme antigenic polymorphism, and their variability has been interpreted to reflect an adaptive strategy for accommodating rapidly evolving infectious agents that periodically afflict natural populations. Determination of the extent of MHC variation has been limited to populations in which skin grafting is feasible or for which serological reagents have been developed. The authors present here a quantitative analysis of restriction fragment length polymorphism of MHC class I genes in several mammalian species (cats, rodents, humans) known to have very different levels of genetic diversity based on functional MHC assays and on allozyme surveys. When homologous class I probes were employed, a notable concordance was observed between the extent of MHC restriction fragment variation and functional MHC variation detected by skin grafts or genome-wide diversity estimated by allozyme screens. These results confirm the genetically depauperate character of the African cheetah, Acinonyx jubatus, and the Asiatic lion, Panthera leo persica; further, they support the use of class I MHC molecular reagents in estimating the extent and character of genetic diversity in natural populations
Simon, Lauriane; Rabanal, Fernando A; Dubos, Tristan; Oliver, Cecilia; Lauber, Damien; Poulet, Axel; Vogt, Alexander; Mandlbauer, Ariane; Le Goff, Samuel; Sommer, Andreas; Duborjal, Hervé; Tatout, Christophe; Probst, Aline V
Organized in tandem repeat arrays in most eukaryotes and transcribed by RNA polymerase III, expression of 5S rRNA genes is under epigenetic control. To unveil mechanisms of transcriptional regulation, we obtained here in depth sequence information on 5S rRNA genes from the Arabidopsis thaliana genome and identified differential enrichment in epigenetic marks between the three 5S rDNA loci situated on chromosomes 3, 4 and 5. We reveal the chromosome 5 locus as the major source of an atypical, long 5S rRNA transcript characteristic of an open chromatin structure. 5S rRNA genes from this locus translocated in the Landsberg erecta ecotype as shown by linkage mapping and chromosome-specific FISH analysis. These variations in 5S rDNA locus organization cause changes in the spatial arrangement of chromosomes in the nucleus. Furthermore, 5S rRNA gene arrangements are highly dynamic with alterations in chromosomal positions through translocations in certain mutants of the RNA-directed DNA methylation pathway and important copy number variations among ecotypes. Finally, variations in 5S rRNA gene sequence, chromatin organization and transcripts indicate differential usage of 5S rDNA loci in distinct ecotypes. We suggest that both the usage of existing and new 5S rDNA loci resulting from translocations may impact neighboring chromatin organization.
Yuhki, Naoya; O' Brien, S.J. (National Cancer Institute, Frederick, MD (USA))
The major histocompatibility complex (MHC) is a multigene complex of tightly linked homologous genes that encode cell surface antigens that play a key role in immune regulation and response to foreign antigens. In most species, MHC gene products display extreme antigenic polymorphism, and their variability has been interpreted to reflect an adaptive strategy for accommodating rapidly evolving infectious agents that periodically afflict natural populations. Determination of the extent of MHC variation has been limited to populations in which skin grafting is feasible or for which serological reagents have been developed. The authors present here a quantitative analysis of restriction fragment length polymorphism of MHC class I genes in several mammalian species (cats, rodents, humans) known to have very different levels of genetic diversity based on functional MHC assays and on allozyme surveys. When homologous class I probes were employed, a notable concordance was observed between the extent of MHC restriction fragment variation and functional MHC variation detected by skin grafts or genome-wide diversity estimated by allozyme screens. These results confirm the genetically depauperate character of the African cheetah, Acinonyx jubatus, and the Asiatic lion, Panthera leo persica; further, they support the use of class I MHC molecular reagents in estimating the extent and character of genetic diversity in natural populations.
Zagradišnik, Boris; Krgović, Danijela; Herodež, Špela Stangler; Zagorac, Andreja; Ćižmarević, Bogdan; Vokač, Nadja Kokalj
Copy number variations (CNSs) of large genomic regions are an important mechanism implicated in the development of head and neck cancer, however, for most changes their exact role is not well understood. The aim of this study was to find possible associations between gains/losses of genomic regions and clinically distinct subgroups of head and neck cancer patients. Array comparative genomic hybridization (aCGH) analysis was performed on DNA samples in 64 patients with cancer in oral cavity, oropharynx or hypopharynx. Overlapping genomic regions created from gains and losses were used for statistical analysis. Following regions were overrepresented: in tumors with stage I or II a gain of 2.98 Mb on 6p21.2-p11 and a gain of 7.4 Mb on 8q11.1-q11.23; in tumors with grade I histology a gain of 1.1 Mb on 8q24.13, a loss of a large part of p arm of chromosome 3, a loss of a 1.24 Mb on 6q14.3, and a loss of terminal 32 Mb region of 8p23.3; in cases with affected lymph nodes a gain of 0.75 Mb on 3q24, and a gain of 0.9 Mb on 3q26.32-q26.33; in cases with unaffected lymph nodes a gain of 1.1 Mb on 8q23.3, in patients not treated with surgery a gain of 12.2 Mb on 7q21.3-q22.3 and a gain of 0.33 Mb on 20q11.22. Our study identified several genomic regions of interest which appear to be associated with various clinically distinct subgroups of head and neck cancer. They represent a potentially important source of biomarkers useful for the clinical management of head and neck cancer. In particular, the PIK3CA and AGTR1 genes could be singled out to predict the lymph node involvement.
Full Text Available Fen Pan,1 Hong Zhang,1 Xiaoyan Dong,2 Weixing Ye,3 Ping He,4 Shulin Zhang,4 Jeff Xianchao Zhu,5 Nanbert Zhong1,2,6 1Department of Clinical Laboratory, Shanghai Children’s Hospital, Shanghai Jiaotong University, Shanghai, China; 2Department of Respiratory, Shanghai Children’s Hospital, Shanghai Jiaotong University, Shanghai, China; 3Shanghai Personal Biotechnology Co., Ltd, Shanghai, China; 4Department of Medical Microbiology and Immunology, Shanghai Jiao Tong University School of Medicine, Shanghai, China; 5Zhejiang Bioruida Biotechnology co. Ltd, Zhejiang, China; 6New York State Institute for Basic Research in Developmental Disabilities, Staten Island, NY, USA Introduction: Multidrug resistance in Streptococcus pneumoniae has emerged as a serious problem to public health. A further understanding of the genetic diversity in antibiotic-resistant S. pneumoniae isolates is needed. Methods: We conducted whole-genome resequencing for 25 pneumococcal strains isolated from children with different antimicrobial resistance profiles. Comparative analysis focus on detection of single-nucleotide polymorphisms (SNPs and insertions and deletions (indels was conducted. Moreover, phylogenetic analysis was applied to investigate the genetic relationship among these strains. Results: The genome size of the isolates was ~2.1 Mbp, covering >90% of the total estimated size of the reference genome. The overall G+C% content was ~39.5%, and there were 2,200–2,400 open reading frames. All isolates with different drug resistance profiles harbored many indels (range 131–171 and SNPs (range 16,103–28,128. Genetic diversity analysis showed that the variation of different genes were associated with specific antibiotic resistance. Known antibiotic resistance genes (pbps, murMN, ciaH, rplD, sulA, and dpr were identified, and new genes (regR, argH, trkH, and PTS-EII closely related with antibiotic resistance were found, although these genes were primarily annotated
Coll, Francesc; Preston, Mark; Guerra-Assunç ã o, José Afonso; Hill-Cawthorn, Grant; Harris, David; Perdigã o, Joã o; Viveiros, Miguel; Portugal, Isabel; Drobniewski, Francis; Gagneux, Sebastien; Glynn, Judith R.; Pain, Arnab; Parkhill, Julian; McNerney, Ruth; Martin, Nigel; Clark, Taane G.
://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest
Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark
Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.
Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.
Full Text Available Human endogenous retroviruses (HERV sequences account for about 8% of the human genome. Through comparative genomics and literature mining, we identified a total of 29 human-specific HERV-K insertions. We characterized them focusing on their structure and flanking sequence. The results showed that four of the human-specific HERV-K insertions deleted human genomic sequences via non-classical insertion mechanisms. Interestingly, two of the human-specific HERV-K insertion loci contained two HERV-K internals and three LTR elements, a pattern which could be explained by LTR-LTR ectopic recombination or template switching. In addition, we conducted a polymorphic test and observed that twelve out of the 29 elements are polymorphic in the human population. In conclusion, human-specific HERV-K elements have inserted into human genome since the divergence of human and chimpanzee, causing human genomic changes. Thus, we believe that human-specific HERV-K activity has contributed to the genomic divergence between humans and chimpanzees, as well as within the human population.
Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.
Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins. Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall a significant genomic drift was seen favoring GC -> AT point mutations in line with the chemical mutagenesis strategy used for generation of the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid in CHO DXB11; transcriptionally active and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find similar to 2...
Joseph, Bindu; Corwin, Jason A.; Kliebenstein, Daniel J.
Recent studies are starting to show that genetic control over stochastic variation is a key evolutionary solution of single celled organisms in the face of unpredictable environments. This has been expanded to show that genetic variation can alter stochastic variation in transcriptional processes within multi-cellular eukaryotes. However, little is known about how genetic diversity can control stochastic variation within more non-cell autonomous phenotypes. Using an Arabidopsis reciprocal RIL population, we showed that there is significant genetic diversity influencing stochastic variation in the plant metabolome, defense chemistry, and growth. This genetic diversity included loci specific for the stochastic variation of each phenotypic class that did not affect the other phenotypic classes or the average phenotype. This suggests that the organism's networks are established so that noise can exist in one phenotypic level like metabolism and not permeate up or down to different phenotypic levels. Further, the genomic variation within the plastid and mitochondria also had significant effects on the stochastic variation of all phenotypic classes. The genetic influence over stochastic variation within the metabolome was highly metabolite specific, with neighboring metabolites in the same metabolic pathway frequently showing different levels of noise. As expected from bet-hedging theory, there was more genetic diversity and a wider range of stochastic variation for defense chemistry than found for primary metabolism. Thus, it is possible to begin dissecting the stochastic variation of whole organismal phenotypes in multi-cellular organisms. Further, there are loci that modulate stochastic variation at different phenotypic levels. Finding the identity of these genes will be key to developing complete models linking genotype to phenotype. PMID:25569687
Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P
While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
Cui, Peng; Ding, Feng; Lin, Qiang; Zhang, Lingfang; Li, Ang; Zhang, Zhang; Hu, Songnian; Yu, Jun
Here, we evaluate the contribution of two major biological processes—DNA replication and transcription—to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, high-resolution replicating map of Hela cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to locate in late-replicating genomic regions and genes in such regions have a higher SNP density compared to those in early-replication regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.
Here, we evaluate the contribution of two major biological processes—DNA replication and transcription—to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, high-resolution replicating map of Hela cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to locate in late-replicating genomic regions and genes in such regions have a higher SNP density compared to those in early-replication regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.
Zhang, Ke; Zhang, Li-Jie; Fang, Ya-Hong; Jin, Xin-Na; Qi, Lei; Wu, Xue-Chang; Zheng, Dao-Qiong
Genomic structural variation (GSV) is a ubiquitous phenomenon observed in the genomes of Saccharomyces cerevisiae strains with different genetic backgrounds; however, the physiological and phenotypic effects of GSV are not well understood. Here, we first revealed the genetic characteristics of a widely used industrial S. cerevisiae strain, ZTW1, by whole genome sequencing. ZTW1 was identified as an aneuploidy strain and a large-scale GSV was observed in the ZTW1 genome compared with the genome of a diploid strain YJS329. These GSV events led to copy number variations (CNVs) in many chromosomal segments as well as one whole chromosome in the ZTW1 genome. Changes in the DNA dosage of certain functional genes directly affected their expression levels and the resultant ZTW1 phenotypes. Moreover, CNVs of large chromosomal regions triggered an aneuploidy stress in ZTW1. This stress decreased the proliferation ability and tolerance of ZTW1 to various stresses, while aneuploidy response stress may also provide some benefits to the fermentation performance of the yeast, including increased fermentation rates and decreased byproduct generation. This work reveals genomic characters of the bioethanol S. cerevisiae strain ZTW1 and suggests that GSV is an important kind of mutation that changes the traits of industrial S. cerevisiae strains. © FEMS 2016. All rights reserved. For permissions, please e-mail: email@example.com.
Castro, Mariana; Castro, Sílvia; Loureiro, João
In the last decade, genomic studies using DNA markers have strongly influenced the current phylogeny of angiosperms. Genome size and ploidy level have contributed to this discussion, being considered important characters in biosystematics, ecology and population biology. Despite the recent increase in studies related to genome size evolution and polyploidy incidence, only a few are available for Scrophulariaceae. In this context, we assessed the value of genome size, mostly as a taxonomic marker, and the role of polyploidy as a process of genesis and maintenance of plant diversity in Scrophulariaceae sensu lato in the Iberian Peninsula. Large-scale analyses of genome size and ploidy-level variation across the Iberian Peninsula were performed using flow cytometry. One hundred and sixty-two populations of 59 distinct taxa were analysed. A bibliographic review on chromosome counts was also performed. From the 59 sampled taxa, 51 represent first estimates of genome size. The majority of the Scrophulariaceae species presented very small to small genome sizes (2C ≤ 7.0 pg). Furthermore, in most of the analysed genera it was possible to use this character to separate several taxa, independently if these genera were homoploid or heteroploid. Also, some genome-related phenomena were detected, such as intraspecific variation of genome size in some genera and the possible occurrence of dysploidy in Verbascum spp. With respect to polyploidy, despite a few new DNA ploidy levels having been detected in Veronica, no multiple cytotypes have been found in any taxa. This work contributed with important basic scientific knowledge on genome size and polyploid incidence in the Scrophulariaceae, providing important background information for subsequent studies, with several perspectives for future studies being opened.
María Clara Arteaga
Full Text Available The present dataset comprises 36,931 SNPs genotyped in 46 maize landraces native to Mexico as well as the teosinte subspecies Zea maiz ssp. parviglumis and ssp. mexicana. These landraces were collected directly from farmers mostly between 2006 and 2010. We accompany these data with a short description of the variation within each landrace, as well as maps, principal component analyses and neighbor joining trees showing the distribution of the genetic diversity relative to landrace, geographical features and maize biogeography. High levels of genetic variation were detected for the maize landraces (HE = 0.234 to 0.318 (mean 0.311, while slightly lower levels were detected in Zea m. mexicana and Zea m. parviglumis (HE = 0.262 and 0.234, respectively. The distribution of genetic variation was better explained by environmental variables given by the interaction of altitude and latitude than by landrace identity. This dataset is a follow up product of the Global Native Maize Project, an initiative to update the data on Mexican maize landraces and their wild relatives, and to generate information that is necessary for implementing the Mexican Biosafety Law. Keywords: Maize, Teosinte, Maize SNP50K BeadChip, Mexican landraces, Proyecto Global de Maíces Nativos
Arteaga, María Clara; Moreno-Letelier, Alejandra; Mastretta-Yanes, Alicia; Vázquez-Lobo, Alejandra; Breña-Ochoa, Alejandra; Moreno-Estrada, Andrés; Eguiarte, Luis E.; Piñero, Daniel
The present dataset comprises 36,931 SNPs genotyped in 46 maize landraces native to Mexico as well as the teosinte subspecies Zea maiz ssp. parviglumis and ssp. mexicana. These landraces were collected directly from farmers mostly between 2006 and 2010. We accompany these data with a short description of the variation within each landrace, as well as maps, principal component analyses and neighbor joining trees showing the distribution of the genetic diversity relative to landrace, geographical features and maize biogeography. High levels of genetic variation were detected for the maize landraces (HE = 0.234 to 0.318 (mean 0.311), while slightly lower levels were detected in Zea m. mexicana and Zea m. parviglumis (HE = 0.262 and 0.234, respectively). The distribution of genetic variation was better explained by environmental variables given by the interaction of altitude and latitude than by landrace identity. This dataset is a follow up product of the Global Native Maize Project, an initiative to update the data on Mexican maize landraces and their wild relatives, and to generate information that is necessary for implementing the Mexican Biosafety Law. PMID:26981357
Postmus, Iris; Warren, Helen R.; Trompet, Stella; Arsenault, Benoit J.; Avery, Christy L.; Bis, Joshua C.; Chasman, Daniel I.; de Keyser, Catherine E.; Deshmukh, Harshal A.; Evans, Daniel S.; Feng, QiPing; Li, Xiaohui; Smit, Roelof A. J.; Smith, Albert V.; Sun, Fangui; Taylor, Kent D.; Arnold, Alice M.; Barnes, Michael R.; Barratt, Bryan J.; Betteridge, John; Boekholdt, S. Matthijs; Boerwinkle, Eric; Buckley, Brendan M.; Chen, Y.-D. Ida; de Craen, Anton J. M.; Cummings, Steven R.; Denny, Joshua C.; Dubé, Marie Pierre; Durrington, Paul N.; Eiriksdottir, Gudny; Ford, Ian; Guo, Xiuqing; Harris, Tamara B.; Heckbert, Susan R.; Hofman, Albert; Hovingh, G. Kees; Kastelein, John J. P.; Launer, Leonore J.; Liu, Ching-Ti; Liu, Yongmei; Lumley, Thomas; McKeigue, Paul M.; Munroe, Patricia B.; Neil, Andrew; Nickerson, Deborah A.; Nyberg, Fredrik; O'Brien, Eoin; O'Donnell, Christopher J.; Post, Wendy; Poulter, Neil; Vasan, Ramachandran S.; Rice, Kenneth; Rich, Stephen S.; Rivadeneira, Fernando; Sattar, Naveed; Sever, Peter; Shaw-Hawkins, Sue; Shields, Denis C.; Slagboom, P. Eline; Smith, Nicholas L.; Smith, Joshua D.; Sotoodehnia, Nona; Stanton, Alice; Stott, David J.; Stricker, Bruno H.; Stürmer, Til; Uitterlinden, André G.; Wei, Wei-Qi; Westendorp, Rudi G. J.; Whitsel, Eric A.; Wiggins, Kerri L.; Wilke, Russell A.; Ballantyne, Christie M.; Colhoun, Helen M.; Cupples, L. Adrienne; Franco, Oscar H.; Gudnason, Vilmundur; Hitman, Graham; Palmer, Colin N. A.; Psaty, Bruce M.; Ridker, Paul M.; Stafford, Jeanette M.; Stein, Charles M.; Tardif, Jean-Claude; Caulfield, Mark J.; Jukema, J. Wouter; Rotter, Jerome I.; Krauss, Ronald M.
In addition to lowering low density lipoprotein cholesterol (LDL-C), statin therapy also raises high density lipoprotein cholesterol (HDL-C) levels. Inter-individual variation in HDL-C response to statins may be partially explained by genetic variation. We performed a meta-analysis of genome-wide
Al-Mezel, Saleh Abdullah R; Ansari, Qamrul Hasan
""There is a real need for this book. It is useful for people who work in areas of nonlinear analysis, optimization theory, variational inequalities, and mathematical economics.""-Nan-Jing Huang, Sichuan University, Chengdu, People's Republic of China
Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G B
To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) considering 850 individuals coming from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 millions obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing project, especially when defining populations is difficult. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Ribeiro, Ângela M; Zepeda-Mendoza, M Lisandra; Bertelsen, Mads F; Kristensen, Annemarie T; Jarvis, Erich D; Gilbert, M Thomas P; da Fonseca, Rute R
Hemostasis is a defense mechanism that enhances an organism's survival by minimizing blood loss upon vascular injury. In vertebrates, hemostasis has been evolving with the cardio-vascular and hemodynamic systems over the last 450 million years. Birds and mammals have very similar vascular and hemodynamic systems, thus the mechanism that blocks ruptures in the vasculature is expected to be the same. However, the speed of the process varies across vertebrates, and is particularly slow for birds. Understanding the differences in the hemostasis pathway between birds and mammals, and placing them in perspective to other vertebrates may provide clues to the genetic contribution to variation in blood clotting phenotype in vertebrates. We compiled genomic data corresponding to key elements involved in hemostasis across vertebrates to investigate its genetic basis and understand how it affects fitness. We found that: i) fewer genes are involved in hemostasis in birds compared to mammals; and ii) the largest differences concern platelet membrane receptors and components from the kallikrein-kinin system. We propose that lack of the cytoplasmic domain of the GPIb receptor subunit alpha could be a strong contributor to the prolonged bleeding phenotype in birds. Combined analysis of laboratory assessments of avian hemostasis with the first avian phylogeny based on genomic-scale data revealed that differences in hemostasis within birds are not explained by phylogenetic relationships, but more so by genetic variation underlying components of the hemostatic process, suggestive of natural selection. This work adds to our understanding of the evolution of hemostasis in vertebrates. The overlap with the inflammation, complement and renin-angiotensin (blood pressure regulation) pathways is a potential driver of rapid molecular evolution in the hemostasis network. Comparisons between avian species and mammals allowed us to hypothesize that the observed mammalian innovations might have
Background Thoroughbred racehorses are subject to non-traumatic distal limb bone fractures that occur during racing and exercise. Susceptibility to fracture may be due to underlying disturbances in bone metabolism which have a genetic cause. Fracture risk has been shown to be heritable in several species but this study is the first genetic analysis of fracture risk in the horse. Results Fracture cases (n = 269) were horses that sustained catastrophic distal limb fractures while racing on UK racecourses, necessitating euthanasia. Control horses (n = 253) were over 4 years of age, were racing during the same time period as the cases, and had no history of fracture at the time the study was carried out. The horses sampled were bred for both flat and National Hunt (NH) jump racing. 43,417 SNPs were employed to perform a genome-wide association analysis and to estimate the proportion of genetic variance attributable to the SNPs on each chromosome using restricted maximum likelihood (REML). Significant genetic variation associated with fracture risk was found on chromosomes 9, 18, 22 and 31. Three SNPs on chromosome 18 (62.05 Mb – 62.15 Mb) and one SNP on chromosome 1 (14.17 Mb) reached genome-wide significance (p fracture than cases, p = 1 × 10-4), while a second haplotype increases fracture risk (cases at 3.39 times higher risk of fracture than controls, p = 0.042). Conclusions Fracture risk in the Thoroughbred horse is a complex condition with an underlying genetic basis. Multiple genomic regions contribute to susceptibility to fracture risk. This suggests there is the potential to develop SNP-based estimators for genetic risk of fracture in the Thoroughbred racehorse, using methods pioneered in livestock genetics such as genomic selection. This information would be useful to racehorse breeders and owners, enabling them to reduce the risk of injury in their horses. PMID:24559379
Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor
Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48percent of basidiomycete proteins are unique to the phylum with nearly half of those (22percent) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.
Hou, Yong; Wu, Kui; Shi, Xulian
methods, focusing particularly on variations detection. Low-coverage whole-genome sequencing revealed that DOP-PCR had the highest duplication ratio, but an even read distribution and the best reproducibility and accuracy for detection of copy-number variations (CNVs). However, MDA had significantly...... performance using SCRS amplified by different WGA methods. It will guide researchers to determine which WGA method is best suited to individual experimental needs at single-cell level....
Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing; O'Connor, Timothy D; Abecasis, Gonçalo R; Wojcik, Genevieve L; Gignoux, Christopher R; Gourraud, Pierre-Antoine; Lizee, Antoine; Hansen, Mark; Genuario, Rob; Bullis, Dave; Lawley, Cindy; Kenny, Eimear E; Bustamante, Carlos; Beaty, Terri H; Mathias, Rasika A; Barnes, Kathleen C; Qin, Zhaohui S
A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an 'African Diaspora Power Chip' (ADPC), a genotyping array consisting of tagging SNPs, useful in comprehensively identifying African specific genetic variation. This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we show the results of designing and producing such a microchip array. This novel array covers African specific variation far better than other commercially available arrays, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations. A recent study cataloging variation in continental African populations suggests this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry.
Pemberton, Trevor J; Szpiech, Zachary A
Genomic regions of autozygosity (ROAs) represent segments of individual genomes that are homozygous for haplotypes inherited identical-by-descent (IBD) from a common ancestor. ROAs are nonuniformly distributed across the genome, and increased ROA levels are a reported risk factor for numerous complex diseases. Previously, we hypothesized that long ROAs are enriched for deleterious homozygotes as a result of young haplotypes with recent deleterious mutations-relatively untouched by purifying selection-being paired IBD as a consequence of recent parental relatedness, a pattern supported by ROA and whole-exome sequence data on 27 individuals. Here, we significantly bolster support for our hypothesis and expand upon our original analyses using ROA and whole-genome sequence data on 2,436 individuals from The 1000 Genomes Project. Considering CADD deleteriousness scores, we reaffirm our previous observation that long ROAs are enriched for damaging homozygotes worldwide. We show that strongly damaging homozygotes experience greater enrichment than weaker damaging homozygotes, while overall enrichment varies appreciably among populations. Mendelian disease genes and those encoding FDA-approved drug targets have significantly increased rates of gain in damaging homozygotes with increasing ROA coverage relative to all other genes. In genes implicated in eight complex phenotypes for which ROA levels have been identified as a risk factor, rates of gain in damaging homozygotes vary across phenotypes and populations but frequently differ significantly from non-disease genes. These findings highlight the potential confounding effects of population background in the assessment of associations between ROA levels and complex disease risk, which might underlie reported inconsistencies in ROA-phenotype associations. Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
van den Broek, M; Bolat, I; Nijkamp, J F; Ramos, E; Luttik, M A H; Koopman, F; Geertman, J M; de Ridder, D; Pronk, J T; Daran, J-M
Lager brewing strains of Saccharomyces pastorianus are natural interspecific hybrids originating from the spontaneous hybridization of Saccharomyces cerevisiae and Saccharomyces eubayanus. Over the past 500 years, S. pastorianus has been domesticated to become one of the most important industrial microorganisms. Production of lager-type beers requires a set of essential phenotypes, including the ability to ferment maltose and maltotriose at low temperature, the production of flavors and aromas, and the ability to flocculate. Understanding of the molecular basis of complex brewing-related phenotypic traits is a prerequisite for rational strain improvement. While genome sequences have been reported, the variability and dynamics of S. pastorianus genomes have not been investigated in detail. Here, using deep sequencing and chromosome copy number analysis, we showed that S. pastorianus strain CBS1483 exhibited extensive aneuploidy. This was confirmed by quantitative PCR and by flow cytometry. As a direct consequence of this aneuploidy, a massive number of sequence variants was identified, leading to at least 1,800 additional protein variants in S. pastorianus CBS1483. Analysis of eight additional S. pastorianus strains revealed that the previously defined group I strains showed comparable karyotypes, while group II strains showed large interstrain karyotypic variability. Comparison of three strains with nearly identical genome sequences revealed substantial chromosome copy number variation, which may contribute to strain-specific phenotypic traits. The observed variability of lager yeast genomes demonstrates that systematic linking of genotype to phenotype requires a three-dimensional genome analysis encompassing physical chromosomal structures, the copy number of individual chromosomes or chromosomal regions, and the allelic variation of copies of individual genes. Copyright © 2015, van den Broek et al.
Weng, Mao-Lun; Ruhlman, Tracey A; Gibby, Mary; Jansen, Robert K
The phylogeny of 58 Pelargonium species was estimated using five plastid markers (rbcL, matK, ndhF, rpoC1, trnL-F) and one mitochondrial gene (nad5). The results confirmed the monophyly of three major clades and four subclades within Pelargonium but also indicate the need to revise some sectional classifications. This phylogeny was used to examine karyotype evolution in the genus: plotting chromosome sizes, numbers and 2C-values indicates that genome size is significantly correlated with chromosome size but not number. Accelerated rates of nucleotide substitution have been previously detected in both plastid and mitochondrial genes in Pelargonium, but sparse taxon sampling did not enable identification of the phylogenetic distribution of these elevated rates. Using the multigene phylogeny as a constraint, we investigated lineage- and locus-specific heterogeneity of substitution rates in Pelargonium for an expanded number of taxa and demonstrated that both plastid and mitochondrial genes have had accelerated substitution rates but with markedly disparate patterns. In the plastid, the exons of rpoC1 have significantly accelerated substitution rates compared to its intron and the acceleration was mainly due to nonsynonymous substitutions. In contrast, the mitochondrial gene, nad5, experienced substantial acceleration of synonymous substitution rates in three internal branches of Pelargonium, but this acceleration ceased in all terminal branches. Several lineages also have dN/dS ratios significantly greater than one for rpoC1, indicating that positive selection is acting on this gene, whereas the accelerated synonymous substitutions in the mitochondrial gene are the result of elevated mutation rates. Published by Elsevier Inc.
Full Text Available Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%. K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs. Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential.
Lin, Dongdong; Zhang, Jigang; Li, Jingyao; Calhoun, Vince D; Deng, Hong-Wen; Wang, Yu-Ping
The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understandings of the interplay of these genomic factors as well as their influences on the complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA) method for this problem. Conventional CCA method does not work effectively if the number of data samples is significantly less than that of biomarkers, which is a typical case for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome such difficulty, mostly using penalizations with l-1 norm (CCA-l1) or the combination of l-1and l-2 norm (CCA-elastic net). However, they overlook the structural or group effect within genomic data in the analysis, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group). We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features. The CCA-sparse group method incorporates group effects of features into the correlation analysis while performs individual feature
van der Vyver C
Full Text Available Christell van der Vyver1, B Juan Vorster2, Karl J Kunert3, Christopher A Cullis41Institute for Plant Biotechnology, Department of Genetics, University of Stellenbosch, Stellenbosch, South Africa; 2Department of Plant Production and Soil Science, and 3Department of Plant Science, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa; 4Case Western Reserve University, Department of Biology, Cleveland, OH, USAAbstract: Seeds from an inbred Vigna unguiculata (cowpea cultivar were gamma-irradiated with a dose of 180 Gy in order to identify and characterize possible mutations. Three techniques, ie, random amplified polymorphic DNA, microsatellites, and representational difference analysis, were used to characterize possible DNA variation among the mutants and nonirradiated control plants both immediately after irradiation and in subsequent generations. A large portion of putative radiation-induced genome changes had significant similarities to chloroplast sequences. The frequency of mutation at three of these isolated polymorphic regions with chloroplast similarity was further determined by polymerase chain reaction screening using a large number of individual parental, M1, and M2 plants. Analysis of these sequences indicated that the rate at which various regions of the genome is mutated in irradiation experiments differs significantly and also that mutations have variable “repair” rates. Furthermore, regions of the nuclear DNA derived from the chloroplast genome are highly susceptible to modification by radiation treatment. Overall, data have provided detailed information on the effects of gamma irradiation on the cowpea genome and about the ability of the plant to repair these genome changes in subsequent plant generations.Keywords: mutation breeding, gamma radiation, genetic mutations, cowpea, representational difference analysis
Full Text Available Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L. and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs, 1.9 million InDels, and 182,398 putative structural variations (SVs. Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries...... these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination...... and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations...
Wallberg, Andreas; Glémin, Sylvain; Webster, Matthew T.
Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5 – 50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution. PMID:25902173
Full Text Available A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP. Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Full Text Available Various approaches can be applied to uncover the genetic basis of natural phenotypic variation, each with their specific strengths and limitations. Here, we use a replicated genome-wide association approach (Pool-GWAS to fine-scale map genomic regions contributing to natural variation in female abdominal pigmentation in Drosophila melanogaster, a trait that is highly variable in natural populations and highly heritable in the laboratory. We examined abdominal pigmentation phenotypes in approximately 8000 female European D. melanogaster, isolating 1000 individuals with extreme phenotypes. We then used whole-genome Illumina sequencing to identify single nucleotide polymorphisms (SNPs segregating in our sample, and tested these for associations with pigmentation by contrasting allele frequencies between replicate pools of light and dark individuals. We identify two small regions near the pigmentation genes tan and bric-à-brac 1, both corresponding to known cis-regulatory regions, which contain SNPs showing significant associations with pigmentation variation. While the Pool-GWAS approach suffers some limitations, its cost advantage facilitates replication and it can be applied to any non-model system with an available reference genome.
Bastide, Héloïse; Betancourt, Andrea; Nolte, Viola; Tobler, Raymond; Stöbe, Petra; Futschik, Andreas; Schlötterer, Christian
Various approaches can be applied to uncover the genetic basis of natural phenotypic variation, each with their specific strengths and limitations. Here, we use a replicated genome-wide association approach (Pool-GWAS) to fine-scale map genomic regions contributing to natural variation in female abdominal pigmentation in Drosophila melanogaster, a trait that is highly variable in natural populations and highly heritable in the laboratory. We examined abdominal pigmentation phenotypes in approximately 8000 female European D. melanogaster, isolating 1000 individuals with extreme phenotypes. We then used whole-genome Illumina sequencing to identify single nucleotide polymorphisms (SNPs) segregating in our sample, and tested these for associations with pigmentation by contrasting allele frequencies between replicate pools of light and dark individuals. We identify two small regions near the pigmentation genes tan and bric-à-brac 1, both corresponding to known cis-regulatory regions, which contain SNPs showing significant associations with pigmentation variation. While the Pool-GWAS approach suffers some limitations, its cost advantage facilitates replication and it can be applied to any non-model system with an available reference genome.
Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui; Kim, Su Yeon; Korneliussen, Thorfinn; Vinckenbosch, Nicolas; Tian, Geng; Huerta-Sanchez, Emilia; Feder, Alison F; Grarup, Niels; Jørgensen, Torben; Jiang, Tao; Witte, Daniel R; Sandbæk, Annelli; Hellmann, Ines; Lauritzen, Torsten; Hansen, Torben; Pedersen, Oluf; Wang, Jun; Nielsen, Rasmus
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.
Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.
Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. PMID:22679494
Amaradasa, B Sajeewa; Everhart, Sydney E
Pathogen exposure to sublethal doses of fungicides may result in mutations that may represent an important and largely overlooked mechanism of introducing new genetic variation into strictly clonal populations, including acquisition of fungicide resistance. We tested this hypothesis using the clonal plant pathogen, Sclerotinia sclerotiorum. Nine susceptible isolates were exposed independently to five commercial fungicides with different modes of action: boscalid (respiration inhibitor), iprodione (unclear mode of action), thiophanate methyl (inhibition of microtubulin synthesis) and azoxystrobin and pyraclostrobin (quinone outside inhibitors). Mycelium of each isolate was inoculated onto a fungicide gradient and sub-cultured from the 50-100% inhibition zone for 12 generations and experiment repeated. Mutational changes were assessed for all isolates at six neutral microsatellite (SSR) loci and for a subset of isolates using amplified fragment length polymorphisms (AFLPs). SSR analysis showed 12 of 85 fungicide-exposed isolates had a total of 127 stepwise mutations with 42 insertions and 85 deletions. Most stepwise deletions were in iprodione- and azoxystrobin-exposed isolates (n = 40/85 each). Estimated mutation rates were 1.7 to 60-fold higher for mutated loci compared to that expected under neutral conditions. AFLP genotyping of 33 isolates (16 non-exposed control and 17 fungicide exposed) generated 602 polymorphic alleles. Cluster analysis with principal coordinate analysis (PCoA) and discriminant analysis of principal components (DAPC) identified fungicide-exposed isolates as a distinct group from non-exposed control isolates (PhiPT = 0.15, P = 0.001). Dendrograms based on neighbor-joining also supported allelic variation associated with fungicide-exposure. Fungicide sensitivity of isolates measured throughout both experiments did not show consistent trends. For example, eight isolates exposed to boscalid had higher EC50 values at the end of the experiment, and
B Sajeewa Amaradasa
Full Text Available Pathogen exposure to sublethal doses of fungicides may result in mutations that may represent an important and largely overlooked mechanism of introducing new genetic variation into strictly clonal populations, including acquisition of fungicide resistance. We tested this hypothesis using the clonal plant pathogen, Sclerotinia sclerotiorum. Nine susceptible isolates were exposed independently to five commercial fungicides with different modes of action: boscalid (respiration inhibitor, iprodione (unclear mode of action, thiophanate methyl (inhibition of microtubulin synthesis and azoxystrobin and pyraclostrobin (quinone outside inhibitors. Mycelium of each isolate was inoculated onto a fungicide gradient and sub-cultured from the 50-100% inhibition zone for 12 generations and experiment repeated. Mutational changes were assessed for all isolates at six neutral microsatellite (SSR loci and for a subset of isolates using amplified fragment length polymorphisms (AFLPs. SSR analysis showed 12 of 85 fungicide-exposed isolates had a total of 127 stepwise mutations with 42 insertions and 85 deletions. Most stepwise deletions were in iprodione- and azoxystrobin-exposed isolates (n = 40/85 each. Estimated mutation rates were 1.7 to 60-fold higher for mutated loci compared to that expected under neutral conditions. AFLP genotyping of 33 isolates (16 non-exposed control and 17 fungicide exposed generated 602 polymorphic alleles. Cluster analysis with principal coordinate analysis (PCoA and discriminant analysis of principal components (DAPC identified fungicide-exposed isolates as a distinct group from non-exposed control isolates (PhiPT = 0.15, P = 0.001. Dendrograms based on neighbor-joining also supported allelic variation associated with fungicide-exposure. Fungicide sensitivity of isolates measured throughout both experiments did not show consistent trends. For example, eight isolates exposed to boscalid had higher EC50 values at the end of the
Amaradasa, B. Sajeewa
Pathogen exposure to sublethal doses of fungicides may result in mutations that may represent an important and largely overlooked mechanism of introducing new genetic variation into strictly clonal populations, including acquisition of fungicide resistance. We tested this hypothesis using the clonal plant pathogen, Sclerotinia sclerotiorum. Nine susceptible isolates were exposed independently to five commercial fungicides with different modes of action: boscalid (respiration inhibitor), iprodione (unclear mode of action), thiophanate methyl (inhibition of microtubulin synthesis) and azoxystrobin and pyraclostrobin (quinone outside inhibitors). Mycelium of each isolate was inoculated onto a fungicide gradient and sub-cultured from the 50–100% inhibition zone for 12 generations and experiment repeated. Mutational changes were assessed for all isolates at six neutral microsatellite (SSR) loci and for a subset of isolates using amplified fragment length polymorphisms (AFLPs). SSR analysis showed 12 of 85 fungicide-exposed isolates had a total of 127 stepwise mutations with 42 insertions and 85 deletions. Most stepwise deletions were in iprodione- and azoxystrobin-exposed isolates (n = 40/85 each). Estimated mutation rates were 1.7 to 60-fold higher for mutated loci compared to that expected under neutral conditions. AFLP genotyping of 33 isolates (16 non-exposed control and 17 fungicide exposed) generated 602 polymorphic alleles. Cluster analysis with principal coordinate analysis (PCoA) and discriminant analysis of principal components (DAPC) identified fungicide-exposed isolates as a distinct group from non-exposed control isolates (PhiPT = 0.15, P = 0.001). Dendrograms based on neighbor-joining also supported allelic variation associated with fungicide-exposure. Fungicide sensitivity of isolates measured throughout both experiments did not show consistent trends. For example, eight isolates exposed to boscalid had higher EC50 values at the end of the experiment
Kristoffer T Bæk
Full Text Available Staphylococcus aureus strains of the 8325 lineage, especially 8325-4 and derivatives lacking prophage, have been used extensively for decades of research. We report herein the results of our deep sequence analysis of strain 8325-4. Assignment of sequence variants compared with the reference strain 8325 (NRS77/PS47 required correction of errors in the 8325 reference genome, and reassessment of variation previously attributed to chemical mutagenesis of the restriction-defective RN4220. Using an extensive strain pedigree analysis, we discovered that 8325-4 contains 16 single nucleotide polymorphisms (SNP arising prior to the construction of RN4220. We identified 5 indels in 8325-4 compared with 8325. Three indels correspond to expected Φ11, 12, 13 excisions, one indel is explained by a sequence assembly artifact, and the final indel (Δ63bp in the spa-sarS intergenic region is common to only a sub-lineage of 8325-4 strains including SH1000. This deletion was found to significantly decrease (75% steady state sarS but not spa transcript levels in post-exponential phase. The sub-lineage 8325-4 was also found to harbor 4 additional SNPs. We also found large sequence variation between 8325, 8325-4 and RN4220 in a cluster of repetitive hypothetical proteins (SA0282 homologs near the Ess secretion cluster. The overall 8325-4 SNP set results in 17 alterations within coding sequences. Remarkably, we discovered that all tested strains of the 8325-4 lineage lack phenol soluble modulin α3 (PSMα3, a virulence determinant implicated in neutrophil chemotaxis, biofilm architecture and surface spreading. Collectively, our results clarify and define the 8325-4 pedigree and reveal clear evidence that mutations existing throughout all branches of this lineage, including the widely used RN6390 and SH1000 strains, could conceivably impact virulence regulation.
Josep M Comeron
Full Text Available The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS. Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and
Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X
PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels.The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.
ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong
We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~ 30× sequencing depths. EGP1 had 4.7 million variants, where 198,877 were novel variants while EGP2 had 209,109 novel variants out of 4.8 million variants. The mitochondrial haplogroup of the two individuals were identified to be H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related with Egyptians obesity. Comparison of these genomes with African and Western-Asian genomes can provide insights on Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia. Copyright © 2017. Published by Elsevier B.V.
Stephen B Montgomery
Full Text Available Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
Mohanty, Sujata; Khanna, Radhika
Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.
McGuire Patrick E
chromosomal regions. The net effect of these factors in T. aestivum is large variation in diversity among genomes and chromosomes, which impacts the development of SNP markers and their practical utility. Accumulation of new mutations in older polyploid species, such as wild emmer, results in increased diversity and its more uniform distribution across the genome.
Posey, Jennifer E; Harel, Tamar; Liu, Pengfei; Rosenfeld, Jill A; James, Regis A; Coban Akdemir, Zeynep H; Walkiewicz, Magdalena; Bi, Weimin; Xiao, Rui; Ding, Yan; Xia, Fan; Beaudet, Arthur L; Muzny, Donna M; Gibbs, Richard A; Boerwinkle, Eric; Eng, Christine M; Sutton, V Reid; Shaw, Chad A; Plon, Sharon E; Yang, Yaping; Lupski, James R
Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes. We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing; our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated with the use of terms from the Human Phenotype Ontology. A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available, and found that de novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes; both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients in whom the phenotype resulted from two distinct mendelian disorders that affected different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P=1.77×10 -7 ). In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes encoding proteins that interact within
Li, Jie; Wang, Shunli; Li, Shanshan; Ge, Pei; Li, Xiaohui; Ma, Wujun; Zeller, F J; Hsam, Sai L K; Yan, Yueming
The α-gliadins are associated with human celiac disease. A total of 23 noninterrupted full open reading frame α-gliadin genes and 19 pseudogenes were cloned and sequenced from C, M, N, and U genomes of four diploid Aegilops species. Sequence comparison of α-gliadin genes from Aegilops and Triticum species demonstrated an existence of extensive allelic variations in Gli-2 loci of the four Aegilops genomes. Specific structural features were found including the compositions and variations of two polyglutamine domains (QI and QII) and four T cell stimulatory toxic epitopes. The mean numbers of glutamine residues in the QI domain in C and N genomes and the QII domain in C, N, and U genomes were much higher than those in Triticum genomes, and the QI domain in C and N genomes and the QII domain in C, M, N, and U genomes displayed greater length variations. Interestingly, the types and numbers of four T cell stimulatory toxic epitopes in α-gliadins from the four Aegilops genomes were significantly less than those from Triticum A, B, D, and their progenitor genomes. Relationships between the structural variations of the two polyglutamine domains and the distributions of four T cell stimulatory toxic epitopes were found, resulting in the α-gliadin genes from the Aegilops and Triticum genomes to be classified into three groups.
Full Text Available The association between severity of illness of children with osteomyelitis caused by Methicillin-resistant Staphylococcus aureus (MRSA and genomic variation of the causative organism has not been previously investigated. The purpose of this study is to assess genomic heterogeneity among MRSA isolates from children with osteomyelitis who have diverse severity of illness.Children with osteomyelitis were prospectively studied between 2010 and 2011. Severity of illness of the affected children was determined from clinical and laboratory parameters. MRSA isolates were analyzed with next generation sequencing (NGS and optical mapping. Sequence data was used for multi-locus sequence typing (MLST, phylogenetic analysis by maximum likelihood (PAML, and identification of virulence genes and single nucleotide polymorphisms (SNP relative to reference strains.The twelve children studied demonstrated severity of illness scores ranging from 0 (mild to 9 (severe. All isolates were USA300, ST 8, SCC mec IVa MRSA by MLST. The isolates differed from reference strains by 2 insertions (40 Kb each and 2 deletions (10 and 25 Kb but had no rearrangements or copy number variations. There was a higher occurrence of virulence genes among study isolates when compared to the reference strains (p = 0.0124. There were an average of 11 nonsynonymous SNPs per strain. PAML demonstrated heterogeneity of study isolates from each other and from the reference strains.Genomic heterogeneity exists among MRSA isolates causing osteomyelitis among children in a single community. These variations may play a role in the pathogenesis of variation in clinical severity among these children.
Kato, Mamoru; Kawaguchi, Takahisa; Ishikawa, Shumpei; Umeda, Takayoshi; Nakamichi, Reiichiro; Shapero, Michael H; Jones, Keith W; Nakamura, Yusuke; Aburatani, Hiroyuki; Tsunoda, Tatsuhiko
Copy number variations (CNVs) are universal genetic variations, and their association with disease has been increasingly recognized. We designed high-density microarrays for CNVs, and detected 3000-4000 CNVs (4-6% of the genomic sequence) per population that included CNVs previously missed because of smaller sizes and residing in segmental duplications. The patterns of CNVs across individuals were surprisingly simple at the kilo-base scale, suggesting the applicability of a simple genetic analysis for these genetic loci. We utilized the probabilistic theory to determine integer copy numbers of CNVs and employed a recently developed phasing tool to estimate the population frequencies of integer copy number alleles and CNV-SNP haplotypes. The results showed a tendency toward a lower frequency of CNV alleles and that most of our CNVs were explained only by zero-, one- and two-copy alleles. Using the estimated population frequencies, we found several CNV regions with exceptionally high population differentiation. Investigation of CNV-SNP linkage disequilibrium (LD) for 500-900 bi- and multi-allelic CNVs per population revealed that previous conflicting reports on bi-allelic LD were unexpectedly consistent and explained by an LD increase correlated with deletion-allele frequencies. Typically, the bi-allelic LD was lower than SNP-SNP LD, whereas the multi-allelic LD was somewhat stronger than the bi-allelic LD. After further investigation of tag SNPs for CNVs, we conclude that the customary tagging strategy for disease association studies can be applicable for common deletion CNVs, but direct interrogation is needed for other types of CNVs.
Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Herrera, Javier; Talavera, Salvador
Studying the spatial distribution of cytotypes and genome size in plants can provide valuable information about the evolution of polyploid complexes. Here, the spatial distribution of cytological races and the amount of DNA in Dianthus broteri, an Iberian carnation with several ploidy levels, is investigated. Sample chromosome counts and flow cytometry (using propidium iodide) were used to determine overall genome size (2C value) and ploidy level in 244 individuals of 25 populations. Both fresh and dried samples were investigated. Differences in 2C and 1Cx values among ploidy levels within biogeographical provinces were tested using ANOVA. Geographical correlations of genome size were also explored. Extensive variation in chromosomes numbers (2n = 2x = 30, 2n = 4x = 60, 2n = 6x = 90 and 2n = 12x =180) was detected, and the dodecaploid cytotype is reported for the first time in this genus. As regards cytotype distribution, six populations were diploid, 11 were tetraploid, three were hexaploid and five were dodecaploid. Except for one diploid population containing some triploid plants (2n = 45), the remaining populations showed a single cytotype. Diploids appeared in two disjunct areas (south-east and south-west), and so did tetraploids (although with a considerably wider geographic range). Dehydrated leaf samples provided reliable measurements of DNA content. Genome size varied significantly among some cytotypes, and also extensively within diploid (up to 1.17-fold) and tetraploid (1.22-fold) populations. Nevertheless, variations were not straightforwardly congruent with ecology and geographical distribution. Dianthus broteri shows the highest diversity of cytotypes known to date in the genus Dianthus. Moreover, some cytotypes present remarkable internal genome size variation. The evolution of the complex is discussed in terms of autopolyploidy, with primary and secondary contact zones.
Leong-Škorničková, Jana; Šída, Otakar; Jarolímová, Vlasta; Sabu, Mamyil; Fér, Tomáš; Trávníček, Pavel; Suda, Jan
Background and Aims Genome size and chromosome numbers are important cytological characters that significantly influence various organismal traits. However, geographical representation of these data is seriously unbalanced, with tropical and subtropical regions being largely neglected. In the present study, an investigation was made of chromosomal and genome size variation in the majority of Curcuma species from the Indian subcontinent, and an assessment was made of the value of these data for taxonomic purposes. Methods Genome size of 161 homogeneously cultivated plant samples classified into 51 taxonomic entities was determined by propidium iodide flow cytometry. Chromosome numbers were counted in actively growing root tips using conventional rapid squash techniques. Key Results Six different chromosome counts (2n = 22, 42, 63, >70, 77 and 105) were found, the last two representing new generic records. The 2C-values varied from 1·66 pg in C. vamana to 4·76 pg in C. oligantha, representing a 2·87-fold range. Three groups of taxa with significantly different homoploid genome sizes (Cx-values) and distinct geographical distribution were identified. Five species exhibited intraspecific variation in nuclear DNA content, reaching up to 15·1 % in cultivated C. longa. Chromosome counts and genome sizes of three Curcuma-like species (Hitchenia caulina, Kaempferia scaposa and Paracautleya bhatii) corresponded well with typical hexaploid (2n = 6x = 42) Curcuma spp. Conclusions The basic chromosome number in the majority of Indian taxa (belonging to subgenus Curcuma) is x = 7; published counts correspond to 6x, 9x, 11x, 12x and 15x ploidy levels. Only a few species-specific C-values were found, but karyological and/or flow cytometric data may support taxonomic decisions in some species alliances with morphological similarities. Close evolutionary relationships among some cytotypes are suggested based on the similarity in homoploid genome sizes and geographical grouping
Machado, Heather E; Bergland, Alan O; O'Brien, Katherine R; Behrman, Emily L; Schmidt, Paul S; Petrov, Dmitri A
Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster. © 2015 John Wiley & Sons Ltd.
Mocellin, Simone; Tropea, Saveria; Benna, Clara; Rossi, Carlo Riccardo
Dysfunction of the circadian clock and single polymorphisms of some circadian genes have been linked to cancer susceptibility, although data are scarce and findings inconsistent. We aimed to investigate the association between circadian pathway genetic variation and risk of developing common cancers based on the findings of genome-wide association studies (GWASs). Single nucleotide polymorphisms (SNPs) of 17 circadian genes reported by three GWAS meta-analyses dedicated to breast (Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) Consortium; cases, n = 15,748; controls, n = 18,084), prostate (Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE) Consortium; cases, n = 14,160; controls, n = 12,724) and lung carcinoma (Transdisciplinary Research In Cancer of the Lung (TRICL) Consortium; cases, n = 12,160; controls, n = 16,838) in patients of European ancestry were utilized to perform pathway analysis by means of the adaptive rank truncated product (ARTP) method. Data were also available for the following subgroups: estrogen receptor negative breast cancer, aggressive prostate cancer, squamous lung carcinoma and lung adenocarcinoma. We found a highly significant statistical association between circadian pathway genetic variation and the risk of breast (pathway P value = 1.9 × 10 -6 ; top gene RORA, gene P value = 0.0003), prostate (pathway P value = 4.1 × 10 -6 ; top gene ARNTL, gene P value = 0.0002) and lung cancer (pathway P value = 6.9 × 10 -7 ; top gene RORA, gene P value = 2.0 × 10 -6 ), as well as all their subgroups. Out of 17 genes investigated, 15 were found to be significantly associated with the risk of cancer: four genes were shared by all three malignancies (ARNTL, CLOCK, RORA and RORB), two by breast and lung cancer (CRY1 and CRY2) and three by prostate and lung cancer (NPAS2, NR1D1 and PER3), whereas four genes were specific for lung cancer
Full Text Available Human height is a highly heritable trait considered as an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aim to identify sequence variants associated with adult height by a genome-wide association study of copy number variants (CNVs in Chinese.Genome-wide CNV association analyses were conducted in 1,625 unrelated Chinese adults and sex specific subgroup for height variation, respectively. Height was measured with a stadiometer. Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs. We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height.We detected 10 significant association signals for height (p<0.05 in the whole population, 9 and 11 association signals for Chinese female and male population, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41 × 10(-4 was found to be significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048. Confirmatory real time PCR experiments lent further support for CNV validation. Compared to female subjects with two copies of the CNP, carriers of three copies had an average of 8.1% decrease in height. An important candidate gene, ubiquitin-protein ligase NEDD4-like (NEDD4L, was detected at this region, which plays important roles in bone metabolism by binding to bone formation regulators.Our findings suggest the important genetic variants underlying height variation in Chinese.
Full Text Available Cardiovascular diseases are a large contributor to causes of early death in developed countries. Some of these conditions, such as sudden cardiac death and atrial fibrillation, stem from arrhythmias—a spectrum of conditions with abnormal electrical activity in the heart. Genome-wide association studies can identify single nucleotide variations (SNVs that may predispose individuals to developing acquired forms of arrhythmias. Through manual curation of published genome-wide association studies, we have collected a comprehensive list of 75 SNVs associated with cardiac arrhythmias. Ten of the SNVs result in amino acid changes and can be used in proteomic-based detection methods. In an effort to identify additional non-synonymous mutations that affect the proteome, we analyzed the post-translational modification S-nitrosylation, which is known to affect cardiac arrhythmias. We identified loss of seven known S-nitrosylation sites due to non-synonymous single nucleotide variations (nsSNVs. For predicted nitrosylation sites we found 1429 proteins where the sites are modified due to nsSNV. Analysis of the predicted S-nitrosylation dataset for over- or under-representation (compared to the complete human proteome of pathways and functional elements shows significant statistical over-representation of the blood coagulation pathway. Gene Ontology (GO analysis displays statistically over-represented terms related to muscle contraction, receptor activity, motor activity, cystoskeleton components, and microtubule activity. Through the genomic and proteomic context of SNVs and S-nitrosylation sites presented in this study, researchers can look for variation that can predispose individuals to cardiac arrhythmias. Such attempts to elucidate mechanisms of arrhythmia thereby add yet another useful parameter in predicting susceptibility for cardiac diseases.
Kegel, Jessica U; John, Uwe; Valentin, Klaus; Frickenhaus, Stephan
Emiliania huxleyi, a key player in the global carbon cycle is one of the best studied coccolithophores with respect to biogeochemical cycles, climatology, and host-virus interactions. Strains of E. huxleyi show phenotypic plasticity regarding growth behaviour, light-response, calcification, acidification, and virus susceptibility. This phenomenon is likely a consequence of genomic differences, or transcriptomic responses, to environmental conditions or threats such as viral infections. We used an E. huxleyi genome microarray based on the sequenced strain CCMP1516 (reference strain) to perform comparative genomic hybridizations (CGH) of 16 E. huxleyi strains of different geographic origin. We investigated the genomic diversity and plasticity and focused on the identification of genes related to virus susceptibility and coccolith production (calcification). Among the tested 31940 gene models a core genome of 14628 genes was identified by hybridization among 16 E. huxleyi strains. 224 probes were characterized as specific for the reference strain CCMP1516. Compared to the sequenced E. huxleyi strain CCMP1516 variation in gene content of up to 30 percent among strains was observed. Comparison of core and non-core transcripts sets in terms of annotated functions reveals a broad, almost equal functional coverage over all KOG-categories of both transcript sets within the whole annotated genome. Within the variable (non-core) genome we identified genes associated with virus susceptibility and calcification. Genes associated with virus susceptibility include a Bax inhibitor-1 protein, three LRR receptor-like protein kinases, and mitogen-activated protein kinase. Our list of transcripts associated with coccolith production will stimulate further research, e.g. by genetic manipulation. In particular, the V-type proton ATPase 16 kDa proteolipid subunit is proposed to be a plausible target gene for further calcification studies.
Full Text Available Majority of the 38 known cat species are classified as small and they inhabit five of the seven continents. They survive in a vast range of habitats but still 12 out of the 18 threatened felids are small cats. However, there has not been enough progress in the field of small cat research as they generally get overshadowed by the charismatic big cats. Here we attempt to create a resource for small cat research especially of the genus Felis which has six species out of which two are classified as vulnerable by IUCN and at least one more is at risk. We collected tissue samples of four Felis chaus (Jungle cat from central India and used available whole genome sequences of nine individuals from four other Felis species, two individuals of Prionailurus bengalensis and an Otocolobus manul. These whole genome sequences were filtered and aligned with the already published domestic cat (Felis catus genome assembly. Felids are closely related species and reads from all species in our study aligned with the domestic cat genome with a rate of at least 93%. We estimated the existing genomic variation by calculating heterozygous SNP encounter rate. So far, it seems that all wild cats have more genetic variation than Felis catus species. This can be attributed to the inbreeding in these cats. Among the wild cats, Felis silvestris seems to have the highest level of genetic variation. To understand the reasons behind the distribution of genetic variation in small cats, we estimated the demographic histories of each of the species using PSMC. This method can only detect demographic changes more than 1000 generations ago. We observe that roughly all species share a parallel history in terms of population increase. The most interesting and important feature might be that all wild small cat population sizes increased exponentially around twenty thousand years ago as opposed to domestic cat and big cats which declined around this time. Another interesting feature of
Raman, Gurusamy; Park, SeonJoo
Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.
Full Text Available Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC region (82,805 bp, with some variations in the inverted repeat region A (IRA/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19 was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA and ribosomal protein subunit L23 (rpl23 genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.
Kasperaviciūte, Dalia; Catarino, Claudia B; Heinzen, Erin L; Depondt, Chantal; Cavalleri, Gianpiero L; Caboclo, Luis O; Tate, Sarah K; Jamnadas-Khoda, Jenny; Chinthapalli, Krishna; Clayton, Lisa M S; Shianna, Kevin V; Radtke, Rodney A; Mikati, Mohamad A; Gallentine, William B; Husain, Aatif M; Alhusaini, Saud; Leppert, David; Middleton, Lefkos T; Gibson, Rachel A; Johnson, Michael R; Matthews, Paul M; Hosford, David; Heuser, Kjell; Amos, Leslie; Ortega, Marcos; Zumsteg, Dominik; Wieser, Heinz-Gregor; Steinhoff, Bernhard J; Krämer, Günter; Hansen, Jörg; Dorn, Thomas; Kantanen, Anne-Mari; Gjerstad, Leif; Peuralinna, Terhi; Hernandez, Dena G; Eriksson, Kai J; Kälviäinen, Reetta K; Doherty, Colin P; Wood, Nicholas W; Pandolfo, Massimo; Duncan, John S; Sander, Josemir W; Delanty, Norman; Goldstein, David B; Sisodiya, Sanjay M
Partial epilepsies have a substantial heritability. However, the actual genetic causes are largely unknown. In contrast to many other common diseases for which genetic association-studies have successfully revealed common variants associated with disease risk, the role of common variation in partial epilepsies has not yet been explored in a well-powered study. We undertook a genome-wide association-study to identify common variants which influence risk for epilepsy shared amongst partial epilepsy syndromes, in 3445 patients and 6935 controls of European ancestry. We did not identify any genome-wide significant association. A few single nucleotide polymorphisms may warrant further investigation. We exclude common genetic variants with effect sizes above a modest 1.3 odds ratio for a single variant as contributors to genetic susceptibility shared across the partial epilepsies. We show that, at best, common genetic variation can only have a modest role in predisposition to the partial epilepsies when considered across syndromes in Europeans. The genetic architecture of the partial epilepsies is likely to be very complex, reflecting genotypic and phenotypic heterogeneity. Larger meta-analyses are required to identify variants of smaller effect sizes (odds ratio<1.3) or syndrome-specific variants. Further, our results suggest research efforts should also be directed towards identifying the multiple rare variants likely to account for at least part of the heritability of the partial epilepsies. Data emerging from genome-wide association-studies will be valuable during the next serious challenge of interpreting all the genetic variation emerging from whole-genome sequencing studies.
Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
Full Text Available Brain arteriovenous malformations (BAVM are clusters of abnormal blood vessels, with shunting of blood from the arterial to venous circulation and a high risk of rupture and intracranial hemorrhage. Most BAVMs are sporadic, but also occur in patients with Hereditary Hemorrhagic Telangiectasia, a Mendelian disorder caused by mutations in genes in the transforming growth factor beta (TGFβ signaling pathway.To investigate whether copy number variations (CNVs contribute to risk of sporadic BAVM, we performed a genome-wide association study in 371 sporadic BAVM cases and 563 healthy controls, all Caucasian. Cases and controls were genotyped using the Affymetrix 6.0 array. CNVs were called using the PennCNV and Birdsuite algorithms and analyzed via segment-based and gene-based approaches. Common and rare CNVs were evaluated for association with BAVM.A CNV region on 1p36.13, containing the neuroblastoma breakpoint family, member 1 gene (NBPF1, was significantly enriched with duplications in BAVM cases compared to controls (P = 2.2×10(-9; NBPF1 was also significantly associated with BAVM in gene-based analysis using both PennCNV and Birdsuite. We experimentally validated the 1p36.13 duplication; however, the association did not replicate in an independent cohort of 184 sporadic BAVM cases and 182 controls (OR = 0.81, P = 0.8. Rare CNV analysis did not identify genes significantly associated with BAVM.We did not identify common CNVs associated with sporadic BAVM that replicated in an independent cohort. Replication in larger cohorts is required to elucidate the possible role of common or rare CNVs in BAVM pathogenesis.
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.
The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640
Full Text Available Abstract Background The goat (Capra hircus represents one of the most important farm animal species. It is reared in all continents with an estimated world population of about 800 million of animals. Despite its importance, studies on the goat genome are still in their infancy compared to those in other farm animal species. Comparative mapping between cattle and goat showed only a few rearrangements in agreement with the similarity of chromosome banding. We carried out a cross species cattle-goat array comparative genome hybridization (aCGH experiment in order to identify copy number variations (CNVs in the goat genome analysing animals of different breeds (Saanen, Camosciata delle Alpi, Girgentana, and Murciano-Granadina using a tiling oligonucleotide array with ~385,000 probes designed on the bovine genome. Results We identified a total of 161 CNVs (an average of 17.9 CNVs per goat, with the largest number in the Saanen breed and the lowest in the Camosciata delle Alpi goat. By aggregating overlapping CNVs identified in different animals we determined CNV regions (CNVRs: on the whole, we identified 127 CNVRs covering about 11.47 Mb of the virtual goat genome referred to the bovine genome (0.435% of the latter genome. These 127 CNVRs included 86 loss and 41 gain and ranged from about 24 kb to about 1.07 Mb with a mean and median equal to 90,292 bp and 49,530 bp, respectively. To evaluate whether the identified goat CNVRs overlap with those reported in the cattle genome, we compared our results with those obtained in four independent cattle experiments. Overlapping between goat and cattle CNVRs was highly significant (P Conclusions We describe a first map of goat CNVRs. This provides information on a comparative basis with the cattle genome by identifying putative recurrent interspecies CNVs between these two ruminant species. Several goat CNVs affect genes with important biological functions. Further studies are needed to evaluate the
Full Text Available High-throughput sequencing has helped to reveal the close relationship between Prevotella and periodontal disease, but the roles of subspecies diversity and genomic variation within this genus in periodontal diseases still need to be investigated. We performed a comparative genome analysis of 48 Prevotella intermedia and Prevotella nigrescens isolates that from the same cohort of subjects to identify the main drivers of their pathogenicity and adaptation to different environments. The comparisons were done between two species and between disease and health based on pooled sequences. The results showed that both P. intermedia and P. nigrescens have highly dynamic genomes and can take up various exogenous factors through horizontal gene transfer. The major differences between disease-derived and health-derived samples of P. intermedia and P. nigrescens were factors related to genome modification and recombination, indicating that the Prevotella isolates from disease sites may be more capable of genomic reconstruction. We also identified genetic elements specific to each sample, and found that disease groups had more unique virulence factors related to capsule and lipopolysaccharide synthesis, secretion systems, proteinases, and toxins, suggesting that strains from disease sites may have more specific virulence, particularly for P. intermedia. The differentially represented pathways between samples from disease and health were related to energy metabolism, carbohydrate and lipid metabolism, and amino acid metabolism, consistent with data from the whole subgingival microbiome in periodontal disease and health. Disease-derived samples had gained or lost several metabolic genes compared to healthy-derived samples, which could be linked with the difference in virulence performance between diseased and healthy sample groups. Our findings suggest that P. intermedia and P. nigrescens may serve as “crucial substances” in subgingival plaque, which may
Zhang, Yifei; Zhen, Min; Zhan, Yalin; Song, Yeqing; Zhang, Qian; Wang, Jinfeng
High-throughput sequencing has helped to reveal the close relationship between Prevotella and periodontal disease, but the roles of subspecies diversity and genomic variation within this genus in periodontal diseases still need to be investigated. We performed a comparative genome analysis of 48 Prevotella intermedia and Prevotella nigrescens isolates that from the same cohort of subjects to identify the main drivers of their pathogenicity and adaptation to different environments. The comparisons were done between two species and between disease and health based on pooled sequences. The results showed that both P. intermedia and P. nigrescens have highly dynamic genomes and can take up various exogenous factors through horizontal gene transfer. The major differences between disease-derived and health-derived samples of P. intermedia and P. nigrescens were factors related to genome modification and recombination, indicating that the Prevotella isolates from disease sites may be more capable of genomic reconstruction. We also identified genetic elements specific to each sample, and found that disease groups had more unique virulence factors related to capsule and lipopolysaccharide synthesis, secretion systems, proteinases, and toxins, suggesting that strains from disease sites may have more specific virulence, particularly for P. intermedia . The differentially represented pathways between samples from disease and health were related to energy metabolism, carbohydrate and lipid metabolism, and amino acid metabolism, consistent with data from the whole subgingival microbiome in periodontal disease and health. Disease-derived samples had gained or lost several metabolic genes compared to healthy-derived samples, which could be linked with the difference in virulence performance between diseased and healthy sample groups. Our findings suggest that P. intermedia and P. nigrescens may serve as "crucial substances" in subgingival plaque, which may reflect changes in
Full Text Available Abstract Background Studies of the Xenopus organizer have laid the foundation for our understanding of the conserved signaling pathways that pattern vertebrate embryos during gastrulation. The two primary activities of the organizer, BMP and Wnt inhibition, can regulate a spectrum of genes that pattern essentially all aspects of the embryo during gastrulation. As our knowledge of organizer signaling grows, it is imperative that we begin knitting together our gene-level knowledge into genome-level signaling models. The goal of this paper was to identify complete lists of genes regulated by different aspects of organizer signaling, thereby providing a deeper understanding of the genomic mechanisms that underlie these complex and fundamental signaling events. Results To this end, we ectopically overexpress Noggin and Dkk-1, inhibitors of the BMP and Wnt pathways, respectively, within ventral tissues. After isolating embryonic ventral halves at early and late gastrulation, we analyze the transcriptional response to these molecules within the generated ectopic organizers using oligonucleotide microarrays. An efficient statistical analysis scheme, combined with a new Gene Ontology biological process annotation of the Xenopus genome, allows reliable and faithful clustering of molecules based upon their roles during gastrulation. From this data, we identify new organizer-related expression patterns for 19 genes. Moreover, our data sub-divides organizer genes into separate head and trunk organizing groups, which each show distinct responses to Noggin and Dkk-1 activity during gastrulation. Conclusion Our data provides a genomic view of the cohorts of genes that respond to Noggin and Dkk-1 activity, allowing us to separate the role of each in organizer function. These patterns demonstrate a model where BMP inhibition plays a largely inductive role during early developmental stages, thereby initiating the suites of genes needed to pattern dorsal tissues
Postmus, Iris; Warren, Helen R; Trompet, Stella
BACKGROUND: In addition to lowering low density lipoprotein cholesterol (LDL-C), statin therapy also raises high density lipoprotein cholesterol (HDL-C) levels. Inter-individual variation in HDL-C response to statins may be partially explained by genetic variation. METHODS AND RESULTS: We performed...... a meta-analysis of genome-wide association studies (GWAS) to identify variants with an effect on statin-induced high density lipoprotein cholesterol (HDL-C) changes. The 123 most promising signals with p
Carelli Francesco N
Full Text Available Abstract Background Segmental duplications (SDs are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. SDs show at the sequence level the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data is plentiful for mammals, not much was known about the representation of SDs in plant genomes. In this regard, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera genome (PN40024. Results We demonstrate that recent SDs (> 94% identity and >= 10 kb in size are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence. We detected mitochondrial and plastid DNA and genes (10% of gene annotation in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.
M.H.M. de Moor; P.T. Costa Jr; A. Terracciano; R.F. Krueger; E.J.C. de Geus (Eco); T. Toshiko; B.W.J.H. Penninx (Brenda); T. Esko; P.A.F. Madden (Pamela); J. Derringer; N. Amin (Najaf); G.A.H.M. Willemsen (Gonneke); J.J. Hottenga (Jouke Jan); M.A. Distel (Marijn); M. Uda (Manuela); S. Sanna (Serena); P. Spinhoven; C.A. Hartman; P.F. Sullivan (Patrick); A. Realo; J. Allik; A.C. Heath; M.L. Pergadia; P. Lin; R. Grucza; T. Nutile; M. Ciullo; D. Rujescu (Dan); I. Giegling (Ina); B. Konte; E. Widen (Elisabeth); D.L. Cousminer (Diana); J.G. Eriksson; A. Palotie; L. Peltonen; M. Luciano (Michelle); A. Tenesa (Albert); G. Davies; L.M. Lopez; N.K. Hansell (Narelle); S.E. Medland (Sarah Elizabeth); L. Ferrucci; D. Schlessinger; G.W. Montgomery; M.J. Wright (Margaret); Y.S. Aulchenko (Yurii); A.C.J.W. Janssens (Cécile); B.A. Oostra (Ben); A. Metspalu (Andres); I.J. Deary; K. Räikkönen (Katri); L.J. Bierut (Laura); N.G. Martin; C.M. van Duijn (Cornelia); D.I. Boomsma (Dorret); G.R. Abecasis (Gonçalo); A. Agrawal (Arpana)
textabstractPersonality can be thought of as a set of characteristics that influence people's thoughts, feelings and behavior across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide
Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping
Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains large number of foreign DNA and repeated sequences undergone frequently intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for the evolution research of plant mt genomes. We utilized 454 technology for sequencing and combined with Fosmid library of the Gossypium hirsutum mt genome screening and positive clones sequencing and conducted a series of evolutionary analysis on Cycas taitungensis and 24 angiosperms mt genomes. After data assembling and contigs joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G.hirsutum mt genome is 621,884 bp in length, and contained 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes and species closely related share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes suggest that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
Full Text Available Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide much more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in family Salicaceae. Phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Incongruences of phylogeny, however, are observed in genus Populus, which most likely results from homoplasy. By comparing three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes are lost or became pseudogenes, infer that the chloroplast genes horizontally transferred to the nucleus genome. Based on the complete nucleus genome sequences from two Salicaceae species, we remarkably identify that the entire chloroplast genome is indeed transferred and integrated to the nucleus genome in the individual of the reference genome of P. trichocarpa at least once. This observation, along with presence of the large nuclear plastid DNA (NUPTs and NUPTs-containing multiple chloroplast genes in their original order in the chloroplast genome, favors the DNA-mediated hypothesis of organelle to nucleus DNA transfer. Overall, the phylogenomic analysis using chloroplast complete genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in
Full Text Available In the present study, the near complete mitochondrial genome (mitogenome of Junonia iphita (Lepidoptera: Nymphalidae: Nymphalinae was determined to be 14,892 bp. The gene order and orientation are identical to those in other butterfly species. The phylogenetic tree constructed from the whole mitogenomes using the 13 protein coding genes (PCGs defines the genetic relatedness of the two J. iphita species collected from two different regions. All the Junonia species clustered together, and were further subdivided into clade one consisting of J. almana and J. orithya and clade two comprising of the two J. iphita which were collected from Indo and Indochinese subregions separated by river barrier. Comparison between the two J. iphita sequences revealed minor variations and Single Nucleotide Polymorphisms were identified at 51 sites amounting to 0.4% of the entire mitochondrial genome.
Roberta L Hannibal
Full Text Available Discovery of lineage-specific somatic copy number variation (CNV in mammals has led to debate over whether CNVs are mutations that propagate disease or whether they are a normal, and even essential, aspect of cell biology. We show that 1,000 N polyploid trophoblast giant cells (TGCs of the mouse placenta contain 47 regions, totaling 138 Megabases, where genomic copies are underrepresented (UR. UR domains originate from a subset of late-replicating heterochromatic regions containing gene deserts and genes involved in cell adhesion and neurogenesis. While lineage-specific CNVs have been identified in mammalian cells, classically in the immune system where V(DJ recombination occurs, we demonstrate that CNVs form during gestation in the placenta by an underreplication mechanism, not by recombination nor deletion. Our results reveal that large scale CNVs are a normal feature of the mammalian placental genome, which are regulated systematically during embryogenesis and are propagated by a mechanism of underreplication.
Thudi, Mahendar; Khan, Aamir W; Kumar, Vinay; Gaur, Pooran M; Katta, Krishnamohan; Garg, Vanika; Roorkiwal, Manish; Samineni, Srinivasan; Varshney, Rajeev K
Chickpea (Cicer arietinum L.) is the second most important grain legume cultivated by resource poor farmers in South Asia and Sub-Saharan Africa. In order to harness the untapped genetic potential available for chickpea improvement, we re-sequenced 35 chickpea genotypes representing parental lines of 16 mapping populations segregating for abiotic (drought, heat, salinity), biotic stresses (Fusarium wilt, Ascochyta blight, Botrytis grey mould, Helicoverpa armigera) and nutritionally important (protein content) traits using whole genome re-sequencing approach. A total of 192.19 Gb data, generated on 35 genotypes of chickpea, comprising 973.13 million reads, with an average sequencing depth of ~10 X for each line. On an average 92.18 % reads from each genotype were aligned to the chickpea reference genome with 82.17 % coverage. A total of 2,058,566 unique single nucleotide polymorphisms (SNPs) and 292,588 Indels were detected while comparing with the reference chickpea genome. Highest number of SNPs were identified on the Ca4 pseudomolecule. In addition, copy number variations (CNVs) such as gene deletions and duplications were identified across the chickpea parental genotypes, which were minimum in PI 489777 (1 gene deletion) and maximum in JG 74 (1,497). A total of 164,856 line specific variations (144,888 SNPs and 19,968 Indels) with the highest percentage were identified in coding regions in ICC 1496 (21 %) followed by ICCV 97105 (12 %). Of 539 miscellaneous variations, 339, 138 and 62 were inter-chromosomal variations (CTX), intra-chromosomal variations (ITX) and inversions (INV) respectively. Genome-wide SNPs, Indels, CNVs, PAVs, and miscellaneous variations identified in different mapping populations are a valuable resource in genetic research and helpful in locating genes/genomic segments responsible for economically important traits. Further, the genome-wide variations identified in the present study can be used for developing high density SNP arrays for
Stickney, A L; Dunowska, M; Cave, N J
The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regards to their mutation and replication rates, plasma viral loads, cell tropism and the ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would compliment vaccine field efficacy studies and provide the basis to make informed recommendations on its use.
Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Full Text Available Copy number variations (CNVs refer to large insertions, deletions and duplications in the genomic structure ranging from one thousand to several million bases in size. Since the development of next generation sequencing technology, several methods have been well built for detection of copy number variations with high credibility and accuracy. Evidence has shown that CNV occurring in gene region could lead to phenotypic changes due to the alteration in gene structure and dosage. However, it still remains unexplored whether CNVs underlie the phenotypic differences between Chinese and Western domestic pigs. Based on the read-depth methods, we investigated copy number variations using 49 individuals derived from both Chinese and Western pig breeds. A total of 3,131 copy number variation regions (CNVRs were identified with an average size of 13.4 Kb in all individuals during domestication, harboring 1,363 genes. Among them, 129 and 147 CNVRs were Chinese and Western pig specific, respectively. Gene functional enrichments revealed that these CNVRs contribute to strong disease resistance and high prolificacy in Chinese domestic pigs, but strong muscle tissue development in Western domestic pigs. This finding is strongly consistent with the morphologic characteristics of Chinese and Western pigs, indicating that these group-specific CNVRs might have been preserved by artificial selection for the favored phenotypes during independent domestication of Chinese and Western pigs. In this study, we built high-resolution CNV maps in several domestic pig breeds and discovered the group specific CNVs by comparing Chinese and Western pigs, which could provide new insight into genomic variations during pigs' independent domestication, and facilitate further functional studies of CNV-associated genes.
Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.
Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary
Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T
SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species including primitive single cell algae to angiosperms with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. Present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research
Full Text Available The tsetse fly Glossina fuscipes fuscipes (Gff is the insect vector of the two forms of Human African Trypanosomiasis (HAT that exist in Uganda. Understanding Gff population dynamics, and the underlying genetics of epidemiologically relevant phenotypes is key to reducing disease transmission. Using ddRAD sequence technology, complemented with whole-genome sequencing, we developed a panel of ∼73,000 single-nucleotide polymorphisms (SNPs distributed across the Gff genome that can be used for population genomics and to perform genome-wide-association studies. We used these markers to estimate genomic patterns of linkage disequilibrium (LD in Gff, and used the information, in combination with outlier-locus detection tests, to identify candidate regions of the genome under selection. LD in individual populations decays to half of its maximum value (r2max/2 between 1359 and 2429 bp. The overall LD estimated for the species reaches r2max/2 at 708 bp, an order of magnitude slower than in Drosophila. Using 53 infected (Trypanosoma spp. and uninfected flies from four genetically distinct Ugandan populations adapted to different environmental conditions, we were able to identify SNPs associated with the infection status of the fly and local environmental adaptation. The extent of LD in Gff likely facilitated the detection of loci under selection, despite the small sample size. Furthermore, it is probable that LD in the regions identified is much higher than the average genomic LD due to strong selection. Our results show that even modest sample sizes can reveal significant genetic associations in this species, which has implications for future studies given the difficulties of collecting field specimens with contrasting phenotypes for association analysis.
Lu, Jianqi; Lou, Haiyi; Fu, Ruiqing; Lu, Dongsheng; Zhang, Feng; Wu, Zhendong; Zhang, Xi; Li, Changhua; Fang, Baijun; Pu, Fangfang; Wei, Jingning; Wei, Qian; Zhang, Chao; Wang, Xiaoji; Lu, Yan; Yan, Shi; Yang, Yajun; Jin, Li; Xu, Shuhua
Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly the Han Chinese which is the largest ethnic group in the world. To build a representative CNV map for the Han Chinese population. We conducted a genome-wide CNV study involving 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, representing a less-biased panel compared with the currently available data. We detected CNVs by using 4.2M NimbleGen comparative genomic hybridisation array and whole-genome deep sequencing of 51 samples to optimise the filtering conditions in CNV discovery. A comprehensive Han Chinese CNV map was built based on a set of high-quality variants (positive predictive value >0.8, with sizes ranging from 369 bp to 4.16 Mb and a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), and more than half are novel to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, which was indicative of the subpopulation structure within the Han Chinese population. Our data are complementary to public data sources, and the CNV map may facilitate in the identification of pathogenic CNVs and further biomedical research studies involving the Han Chinese population. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
The human cytochrome P450 3A subfamily metabolises endogenous substances and approximately half of all currently available drugs. There is marked inter-individual variation in hepatic expression of the major adult isoform, CYP3A4; the genetic component of this variability is estimated at 60-90% and, as yet, remains largely uncharacterised. Elucidation of genetic factors determining CYP3A4 activity would permit personalised dose-adjustment in therapies with CYP3A4 drug substrates. CYP3A4 genom...
Antaki, Danny; Brandler, William M; Sebat, Jonathan
Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here, we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.
Santure, Anna W; De Cauwer, Isabelle; Robinson, Matthew R; Poissant, Jocelyn; Sheldon, Ben C; Slate, Jon
Clutch size and egg mass are life history traits that have been extensively studied in wild bird populations, as life history theory predicts a negative trade-off between them, either at the phenotypic or at the genetic level. Here, we analyse the genomic architecture of these heritable traits in a wild great tit (Parus major) population, using three marker-based approaches - chromosome partitioning, quantitative trait locus (QTL) mapping and a genome-wide association study (GWAS). The variance explained by each great tit chromosome scales with predicted chromosome size, no location in the genome contains genome-wide significant QTL, and no individual SNPs are associated with a large proportion of phenotypic variation, all of which may suggest that variation in both traits is due to many loci of small effect, located across the genome. There is no evidence that any regions of the genome contribute significantly to both traits, which combined with a small, nonsignificant negative genetic covariance between the traits, suggests the absence of genetic constraints on the independent evolution of these traits. Our findings support the hypothesis that variation in life history traits in natural populations is likely to be determined by many loci of small effect spread throughout the genome, which are subject to continued input of variation by mutation and migration, although we cannot exclude the possibility of an additional input of major effect genes influencing either trait. © 2013 John Wiley & Sons Ltd.
Amplified fragment length polymorphism (AFLP) and methylation-sensitive amplification polymorphism (MSAP) analysis were used to investigate the genome of two sibling tobacco cultivars, Yunyan85 and Yunyan87, their parent K326 and the other tobacco cultivar NC89. AFLP analysis indicated that, the genome primary ...
Gussow, Ayal B; Copeland, Brett R; Dhindsa, Ryan S; Wang, Quanli; Petrovski, Slavé; Majoros, William H; Allen, Andrew S; Goldstein, David B
There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case control studies using whole-genome sequences.
Davies Jonathan J
Full Text Available Abstract Background The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software are available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results We have created a user-friendly java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA of cancer genomes, can be accessed at http://sigma.bccrc.ca.
Ehsani, Ali Reza
In the last decade rapid development in biotechnologies has made it possible to extract extensive information about practically all levels of biological organization. An ever-increasing number of studies are reporting miltilayered datasets on the entire DNA sequence, transceroption, protein...... expression, and metabolite abundance of more and more populations in a multitude of invironments. However, a solid model for including all of this complex information in one analysis, to disentangle genetic variation and the underlying genetic architecture of complex traits and diseases, has not yet been...
Full Text Available We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a identification of horizontally transferred genes, (b identification of genomic islands with special properties and (c binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a calculation of the k-mer based barcode image for a provided DNA sequence; (b detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c clustering of provided DNA sequences into groups having similar barcodes; and (d homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.
Boocock, James; Chagné, David; Merriman, Tony R; Black, Michael A
Copy number variation (CNV) is a common feature of eukaryotic genomes, and a growing body of evidence suggests that genes affected by CNV are enriched in processes that are associated with environmental responses. Here we use next generation sequence (NGS) data to detect copy-number variable regions (CNVRs) within the Malus x domestica genome, as well as to examine their distribution and impact. CNVRs were detected using NGS data derived from 30 accessions of M. x domestica analyzed using the read-depth method, as implemented in the CNVrd2 software. To improve the reliability of our results, we developed a quality control and analysis procedure that involved checking for organelle DNA, not repeat masking, and the determination of CNVR identity using a permutation testing procedure. Overall, we identified 876 CNVRs, which spanned 3.5 % of the apple genome. To verify that detected CNVRs were not artifacts, we analyzed the B- allele-frequencies (BAF) within a single nucleotide polymorphism (SNP) array dataset derived from a screening of 185 individual apple accessions and found the CNVRs were enriched for SNPs having aberrant BAFs (P apple scab. We present the first analysis and catalogue of CNVRs in the M. x domestica genome. The enrichment of the CNVRs with R gene models and their overlap with gene loci of agricultural significance draw attention to a form of unexplored genetic variation in apple. This research will underpin further investigation of the role that CNV plays within the apple genome.
Genomes of solid tumors are often highly rearranged and these rearrangements promote cancer progression through disruption of genes mediating immortality, survival, metastasis, and resistance to therapy...
Abou Zaki, Natalia; Salloum, Tamara; Osman, Marwan; Rafei, Rayane; Hamze, Monzer; Tokajian, Sima
Brucella melitensis is the main causative agent of the zoonotic disease brucellosis. This study aimed at typing and characterizing genetic variation in 33 Brucella isolates recovered from patients in Lebanon. Bruce-ladder multiplex PCR and PCR-RFLP of omp31, omp2a and omp2b were performed. Sixteen representative isolates were chosen for draft-genome sequencing and analyzed to determine variations in virulence, resistance, genomic islands, prophages and insertion sequences. Comparative whole-genome single nucleotide polymorphism analysis was also performed. The isolates were confirmed to be B. melitensis. Genome analysis revealed multiple virulence determinants and efflux pumps. Genome comparisons and single nucleotide polymorphisms divided the isolates based on geographical distribution but revealed high levels of similarity between the strains. Sequence divergence in B. melitensis was mainly due to lateral gene transfer of mobile elements. This is the first report of an in-depth genomic characterization of B. melitensis in Lebanon. © FEMS 2017. All rights reserved. For permissions, please e-mail: email@example.com.
Full Text Available Abstract Background Whole exome sequencing (WES has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in medical genetics history, impacting both fundamental research, and diagnostic methods leading to personalized medicine. A plethora of efficient algorithms has been developed to ensure the variant discovery. They generally lead to ~20,000 variations that have to be narrow down to find the potential pathogenic allelic variant(s and the affected gene(s. For this purpose, commonly adopted procedures which implicate various filtering strategies have emerged: exclusion of common variations, type of the allelics variants, pathogenicity effect prediction, modes of inheritance and multiple individuals for exome comparison. To deal with the expansion of WES in medical genomics individual laboratories, new convivial and versatile software tools have to implement these filtering steps. Non-programmer biologists have to be autonomous combining themselves different filtering criteria and conduct a personal strategy depending on their assumptions and study design. Results We describe EVA (Exome Variation Analyzer, a user-friendly web-interfaced software dedicated to the filtering strategies for medical WES. Thanks to different modules, EVA (i integrates and stores annotated exome variation data as strictly confidential to the project owner, (ii allows to combine the main filters dealing with common variations, molecular types, inheritance mode and multiple samples, (iii offers the browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, (iv and finally offers export files and cross-links to external useful databases and softwares for further prioritization of the small subset of sorted candidate variations and genes. We report a demonstrative case study that allowed to identify a new candidate gene
Pan, Qingchun; Li, Lin; Yang, Xiaohong; Tong, Hao; Xu, Shutu; Li, Zhigang; Li, Weiya; Muehlbauer, Gary J; Li, Jiansheng; Yan, Jianbing
Meiotic recombination is a major driver of genetic diversity, species evolution, and agricultural improvement. Thus, an understanding of the genetic recombination landscape across the maize (Zea mays) genome will provide insight and tools for further study of maize evolution and improvement. Here, we used c. 50 000 single nucleotide polymorphisms to precisely map recombination events in 12 artificial maize segregating populations. We observed substantial variation in the recombination frequency and distribution along the ten maize chromosomes among the 12 populations and identified 143 recombination hot regions. Recombination breakpoints were partitioned into intragenic and intergenic events. Interestingly, an increase in the number of genes containing recombination events was accompanied by a decrease in the number of recombination events per gene. This kept the overall number of intragenic recombination events nearly invariable in a given population, suggesting that the recombination variation observed among populations was largely attributed to intergenic recombination. However, significant associations between intragenic recombination events and variation in gene expression and agronomic traits were observed, suggesting potential roles for intragenic recombination in plant phenotypic diversity. Our results provide a comprehensive view of the maize recombination landscape, and show an association between recombination, gene expression and phenotypic variation, which may enhance crop genetic improvement. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Denise E. Costich
Full Text Available Switchgrass ( L., a native perennial dominant of the prairies of North America, has been targeted as a model herbaceous species for biofeedstock development. A flow-cytometric survey of a core set of 11 primarily upland polyploid switchgrass accessions indicated that there was considerable variation in genome size within each accession, particularly at the octoploid (2 = 8 = 72 chromosome ploidy level. Highly variable chromosome counts in mitotic cell preparations indicated that aneuploidy was more common in octoploids (86.3% than tetraploids (23.2%. Furthermore, the incidence of hyper- versus hypoaneuploidy is equivalent in tetraploids. This is clearly not the case in octoploids, where close to 90% of the aneuploid counts are lower than the euploid number. Cytogenetic investigation using fluorescent in situ hybridization (FISH revealed an unexpected degree of variation in chromosome structure underlying the apparent genomic instability of this species. These results indicate that rapid advances in the breeding of polyploid biofuel feedstocks, based on the molecular-genetic dissection of biomass characteristics and yield, will be predicated on the continual improvement of our understanding of the cytogenetics of these species.
Creixell, Pau; Reimand, Jueri; Haider, Syed
Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been...
The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has
Burkholderia cepacia is an important organism in bioremediation of environmental pollutants and it is also of increasing interest as a human pathogen. The genomic organization of B. cepacia is being studied in order to better understand its unusual adaptive capacity and genome pl...
Babbitt, Gregory A; Cotter, C R
One prominent pattern of mutational frequency, long appreciated in comparative genomics, is the bias of purine/pyrimidine conserving substitutions (transitions) over purine/pyrimidine altering substitutions (transversions). Traditionally, this transitional bias has been thought to be driven by the underlying rates of DNA mutation and/or repair. However, recent sequencing studies of mutation accumulation lines in model organisms demonstrate that substitutions generally do not accumulate at rates that would indicate a transitional bias. These observations have called into question a very basic assumption of molecular evolution; that naturally occurring patterns of molecular variation in noncoding regions accurately reflect the underlying processes of randomly accumulating neutral mutation in nuclear genomes. Here, in Saccharomyces yeasts, we report a very strong inverse association (r = -0.951, P < 0.004) between the genome-wide frequency of substitutions and their average energetic effect on nucleosome formation, as predicted by a structurally based energy model of DNA deformation around the nucleosome core. We find that transitions occurring at sites positioned nearest the nucleosome surface, which are believed to function most importantly in nucleosome formation, alter the deformation energy of DNA to the nucleosome core by only a fraction of the energy changes typical of most transversions. When we examined the same substitutions set against random background sequences as well as an existing study reporting substitutions arising in mutation accumulation lines of Saccharomyces cerevisiae, we failed to find a similar relationship. These results support the idea that natural selection acting to functionally conserve chromatin organization may contribute significantly to genome-wide transitional bias, even in noncoding regions. Because nucleosome core structure is highly conserved across eukaryotes, our observations may also help to further explain locally elevated
Full Text Available Crops are often cultivated in regions where they will face environmental adversities; resulting in substantial yield loss which can ultimately lead to food and societal problems. Thus, significant efforts have been made to breed stress tolerant cultivars in an attempt to minimize these problems and to produce more stability with respect to crop yields across broad geographies. Since stress tolerance is a complex and multi-genic trait, advancements with classical breeding approaches have been challenging. On the other hand, molecular breeding, which is based on transgenics, marker-assisted selection and genome editing technologies; holds great promise to enable farmers to better cope with these challenges. However, identification of the key genetic components underlying the trait is critical and will serve as the foundation for future crop genetic improvement. Recently, genome-wide association studies have made significant contributions to facilitate the discovery of natural variation contributing to stress tolerance in crops. From these studies, the identified loci can serve as targets for genomic selection or editing to enable the molecular design of new cultivars. Here, we summarize research progress on this issue and focus on the genetic basis of drought tolerance as revealed by genome-wide association studies and quantitative trait loci mapping. Although many favorable loci have been identified, elucidation of their molecular mechanisms contributing to increased stress tolerance still remains a challenge. Thus, continuous efforts are still required to functionally dissect this complex trait through comprehensive approaches, such as system biological studies. It is expected that proper application of the acquired knowledge will enable the development of stress tolerant cultivars; allowing agricultural production to become more sustainable under dynamic environmental conditions.
MacRae, Sheila L; Zhang, Quanwei; Lemetre, Christophe; Seim, Inge; Calder, Robert B; Hoeijmakers, Jan; Suh, Yousin; Gladyshev, Vadim N; Seluanov, Andrei; Gorbunova, Vera; Vijg, Jan; Zhang, Zhengdong D
Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM genes appeared to be strongly conserved, with copy number variation in only four genes. Interestingly, we found NMR to have a higher copy number of CEBPG, a regulator of DNA repair, and TINF2, a protector of telomere integrity. NMR, as well as human, was also found to have a lower rate of germline nucleotide substitution than the mouse. Together, the data suggest that the long-lived NMR, as well as human, has more robust GM than mouse and identifies new targets for the analysis of the exceptional longevity of the NMR. © 2015 The Authors. Aging Cell published by the Anatomical Society and John Wiley & Sons Ltd.
Full Text Available As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations; such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.
Buske Orion J
Full Text Available Abstract Background As genome-wide experiments and annotations become more prevalent, researchers increasingly require tools to help interpret data at this scale. Many functional genomics experiments involve partitioning the genome into labeled segments, such that segments sharing the same label exhibit one or more biochemical or functional traits. For example, a collection of ChlP-seq experiments yields a compendium of peaks, each labeled with one or more associated DNA-binding proteins. Similarly, manually or automatically generated annotations of functional genomic elements, including cis-regulatory modules and protein-coding or RNA genes, can also be summarized as genomic segmentations. Results We present a software toolkit called Segtools that simplifies and automates the exploration of genomic segmentations. The software operates as a series of interacting tools, each of which provides one mode of summarization. These various tools can be pipelined and summarized in a single HTML page. We describe the Segtools toolkit and demonstrate its use in interpreting a collection of human histone modification data sets and Plasmodium falciparum local chromatin structure data sets. Conclusions Segtools provides a convenient, powerful means of interpreting a genomic segmentation.
Full Text Available Recent developments have led to an enormous increase of publicly available large genomic data, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes, and allowing for a myriad of large scale studies on human genetic variation. However, the tools currently available are insufficient when the goal concerns some analyses of data sets encompassing more than hundreds of base pairs and when considering haplotype sequences of single nucleotide polymorphisms (SNPs. Here, we present a new and potent tool to deal with large data sets allowing the computation of a variety of summary statistics of population genetic data, increasing the speed of data analysis.
Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V
BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse\\'s genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
Full Text Available Banana (Musa spp. is an important crop in the African Great Lakes region in terms of income and food security, with the highest per capita consumption worldwide. Pests, diseases and climate change hamper sustainable production of bananas. New breeding tools with increased crossbreeding efficiency are being investigated to breed for resistant, high yielding hybrids of East African Highland banana (EAHB. These include genomic selection (GS, which will benefit breeding through increased genetic gain per unit time. Understanding trait variation and the correlation among economically important traits is an essential first step in the development and selection of suitable GS models for banana. In this study, we tested the hypothesis that trait variations in bananas are not affected by cross combination, cycle, field management and their interaction with genotype. A training population created using EAHB breeding material and its progeny was phenotyped in two contrasting conditions. A high level of correlation among vegetative and yield related traits was observed. Therefore, genomic selection models could be developed for traits that are easily measured. It is likely that the predictive ability of traits that are difficult to phenotype will be similar to less difficult traits they are highly correlated with. Genotype response to cycle and field management practices varied greatly with respect to traits. Yield related traits accounted for 31-35% of principal component variation under low and high input field management conditions. Resistance to Black Sigatoka was stable across cycles but varied under different field management depending on the genotype. The best cross combination was 1201K-1xSH3217 based on selection response (R of hybrids. Genotyping using simple sequence repeat (SSR markers revealed that the training population was genetically diverse, reflecting a complex pedigree background, which was mostly influenced by the male parents.
Nyine, Moses; Uwimana, Brigitte; Swennen, Rony; Batte, Michael; Brown, Allan; Christelová, Pavla; Hřibová, Eva; Lorenzen, Jim; Doležel, Jaroslav
Banana (Musa spp.) is an important crop in the African Great Lakes region in terms of income and food security, with the highest per capita consumption worldwide. Pests, diseases and climate change hamper sustainable production of bananas. New breeding tools with increased crossbreeding efficiency are being investigated to breed for resistant, high yielding hybrids of East African Highland banana (EAHB). These include genomic selection (GS), which will benefit breeding through increased genetic gain per unit time. Understanding trait variation and the correlation among economically important traits is an essential first step in the development and selection of suitable GS models for banana. In this study, we tested the hypothesis that trait variations in bananas are not affected by cross combination, cycle, field management and their interaction with genotype. A training population created using EAHB breeding material and its progeny was phenotyped in two contrasting conditions. A high level of correlation among vegetative and yield related traits was observed. Therefore, genomic selection models could be developed for traits that are easily measured. It is likely that the predictive ability of traits that are difficult to phenotype will be similar to less difficult traits they are highly correlated with. Genotype response to cycle and field management practices varied greatly with respect to traits. Yield related traits accounted for 31-35% of principal component variation under low and high input field management conditions. Resistance to Black Sigatoka was stable across cycles but varied under different field management depending on the genotype. The best cross combination was 1201K-1xSH3217 based on selection response (R) of hybrids. Genotyping using simple sequence repeat (SSR) markers revealed that the training population was genetically diverse, reflecting a complex pedigree background, which was mostly influenced by the male parents.
Nyine, Moses; Uwimana, Brigitte; Swennen, Rony; Batte, Michael; Brown, Allan; Christelová, Pavla; Hřibová, Eva; Lorenzen, Jim
Banana (Musa spp.) is an important crop in the African Great Lakes region in terms of income and food security, with the highest per capita consumption worldwide. Pests, diseases and climate change hamper sustainable production of bananas. New breeding tools with increased crossbreeding efficiency are being investigated to breed for resistant, high yielding hybrids of East African Highland banana (EAHB). These include genomic selection (GS), which will benefit breeding through increased genetic gain per unit time. Understanding trait variation and the correlation among economically important traits is an essential first step in the development and selection of suitable GS models for banana. In this study, we tested the hypothesis that trait variations in bananas are not affected by cross combination, cycle, field management and their interaction with genotype. A training population created using EAHB breeding material and its progeny was phenotyped in two contrasting conditions. A high level of correlation among vegetative and yield related traits was observed. Therefore, genomic selection models could be developed for traits that are easily measured. It is likely that the predictive ability of traits that are difficult to phenotype will be similar to less difficult traits they are highly correlated with. Genotype response to cycle and field management practices varied greatly with respect to traits. Yield related traits accounted for 31–35% of principal component variation under low and high input field management conditions. Resistance to Black Sigatoka was stable across cycles but varied under different field management depending on the genotype. The best cross combination was 1201K-1xSH3217 based on selection response (R) of hybrids. Genotyping using simple sequence repeat (SSR) markers revealed that the training population was genetically diverse, reflecting a complex pedigree background, which was mostly influenced by the male parents. PMID:28586365
Hess, Jon E; Campbell, Nathan R; Close, David A; Docker, Margaret F; Narum, Shawn R
Unlike most anadromous fishes that have evolved strict homing behaviour, Pacific lamprey (Entosphenus tridentatus) seem to lack philopatry as evidenced by minimal population structure across the species range. Yet unexplained findings of within-region population genetic heterogeneity coupled with the morphological and behavioural diversity described for the species suggest that adaptive genetic variation underlying fitness traits may be responsible. We employed restriction site-associated DNA sequencing to genotype 4439 quality filtered single nucleotide polymorphism (SNP) loci for 518 individuals collected across a broad geographical area including British Columbia, Washington, Oregon and California. A subset of putatively neutral markers (N = 4068) identified a significant amount of variation among three broad populations: northern British Columbia, Columbia River/southern coast and 'dwarf' adults (F(CT) = 0.02, P ≪ 0.001). Additionally, 162 SNPs were identified as adaptive through outlier tests, and inclusion of these markers revealed a signal of adaptive variation related to geography and life history. The majority of the 162 adaptive SNPs were not independent and formed four groups of linked loci. Analyses with matsam software found that 42 of these outlier SNPs were significantly associated with geography, run timing and dwarf life history, and 27 of these 42 SNPs aligned with known genes or highly conserved genomic regions using the genome browser available for sea lamprey. This study provides both neutral and adaptive context for observed genetic divergence among collections and thus reconciles previous findings of population genetic heterogeneity within a species that displays extensive gene flow. © 2012 John Wiley & Sons Ltd.
Full Text Available Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels, and complex variants (1 to 6,000 bp. While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified two-fold more gene associations in males as compared to females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals.
Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles
Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provide model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: firstname.lastname@example.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072
Stephen D Bentley
Full Text Available The bacterium Neisseria meningitidis is commonly found harmlessly colonising the mucosal surfaces of the human nasopharynx. Occasionally strains can invade host tissues causing septicaemia and meningitis, making the bacterium a major cause of morbidity and mortality in both the developed and developing world. The species is known to be diverse in many ways, as a product of its natural transformability and of a range of recombination and mutation-based systems. Previous work on pathogenic Neisseria has identified several mechanisms for the generation of diversity of surface structures, including phase variation based on slippage-like mechanisms and sequence conversion of expressed genes using information from silent loci. Comparison of the genome sequences of two N. meningitidis strains, serogroup B MC58 and serogroup A Z2491, suggested further mechanisms of variation, including C-terminal exchange in specific genes and enhanced localised recombination and variation related to repeat arrays. We have sequenced the genome of N. meningitidis strain FAM18, a representative of the ST-11/ET-37 complex, providing the first genome sequence for the disease-causing serogroup C meningococci; it has 1,976 predicted genes, of which 60 do not have orthologues in the previously sequenced serogroup A or B strains. Through genome comparison with Z2491 and MC58 we have further characterised specific mechanisms of genetic variation in N. meningitidis, describing specialised loci for generation of cell surface protein variants and measuring the association between noncoding repeat arrays and sequence variation in flanking genes. Here we provide a detailed view of novel genetic diversification mechanisms in N. meningitidis. Our analysis provides evidence for the hypothesis that the noncoding repeat arrays in neisserial genomes (neisserial intergenic mosaic elements provide a crucial mechanism for the generation of surface antigen variants. Such variation will have an
Matsuda, Fumio; Nakabayashi, Ryo; Yang, Zhigang; Okazaki, Yozo; Yonemaru, Jun-ichi; Ebana, Kaworu; Yano, Masahiro; Saito, Kazuki
Plants produce structurally diverse secondary (specialized) metabolites to increase their fitness for survival under adverse environments. Several bioactive compounds for new drugs have been identified through screening of plant extracts. In this study, genome-wide association studies (GWAS) were conducted to investigate the genetic architecture behind the natural variation of rice secondary metabolites. GWAS using the metabolome data of 175 rice accessions successfully identified 323 associations among 143 single nucleotide polymorphisms (SNPs) and 89 metabolites. The data analysis highlighted that levels of many metabolites are tightly associated with a small number of strong quantitative trait loci (QTLs). The tight association may be a mechanism generating strains with distinct metabolic composition through the crossing of two different strains. The results indicate that one plant species produces more diverse phytochemicals than previously expected, and plants still contain many useful compounds for human applications. PMID:25267402
Bohlin, J; Snipen, L; Hardy, S.P.
the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content......Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how...... both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exits when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content...
Angley, L P; Combs, M; Firth, C; Frye, M J; Lipkin, I; Richardson, J L; Munshi-South, J
Brown rats (Rattus norvegicus) are a globally distributed pest. Urban habitats can support large infestations of rats, posing a potential risk to public health from the parasites and pathogens they carry. Despite the potential influence of rodent-borne zoonotic diseases on human health, it is unclear how urban habitats affect the structure and transmission dynamics of ectoparasite and microbial communities (all referred to as "parasites" hereafter) among rat colonies. In this study, we use ecological data on parasites and genomic sequencing of their rat hosts to examine associations between spatial proximity, genetic relatedness and the parasite communities associated with 133 rats at five sites in sections of New York City with persistent rat infestations. We build on previous work showing that rats in New York carry a wide variety of parasites and report that these communities differ significantly among sites, even across small geographical distances. Ectoparasite community similarity was positively associated with geographical proximity; however, there was no general association between distance and microbial communities of rats. Sites with greater overall parasite diversity also had rats with greater infection levels and parasite species richness. Parasite community similarity among sites was not linked to genetic relatedness of rats, suggesting that these communities are not associated with genetic similarity among host individuals or host dispersal among sites. Discriminant analysis identified site-specific associations of several parasite species, suggesting that the presence of some species within parasite communities may allow researchers to determine the sites of origin for newly sampled rats. The results of our study help clarify the roles that colony structure and geographical proximity play in determining the ecology of R. norvegicus as a significant urban reservoir of zoonotic diseases. Our study also highlights the spatial variation present in urban
Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.
It was also shown that the profile generated by taking all dinucleotides together ... Keywords. genome signature; DRAP; HIV-1; chaos game representation. Journal of .... be used to quantify low levels of variation as are observed within species ..... Dayton A.I., Sodroski J.G., Rosen C.A., Goh W.C. and Haseltine. W.A. 1986 ...
The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has given rise to the parallel development of other high-throughput approaches such as determining mRNA expression level changes, gene-deletion phenotypes, chromosomal location of DNA binding proteins, cel...
Keeling Patrick J
Full Text Available Abstract Background Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements
Matt C. Estep
Full Text Available Generation of ∼2200 Sanger sequence reads or ∼10,000 454 reads for seven Lour. DNA samples (five species allowed identification of the highly repetitive DNA content in these genomes. The 14 most abundant repeats in these species were identified and partially assembled. Annotation indicated that they represent nine long terminal repeat (LTR retrotransposon families, three tandem satellite repeats, one long interspersed element (LINE retroelement, and one DNA transposon. All of these repeats are most closely related to repetitive elements in other closely related plants and are not products of horizontal transfer from their host species. These repeats were differentially abundant in each species, with the LTR retrotransposons and satellite repeats most responsible for variation in genome size. Each species had some repetitive elements that were more abundant and some less abundant than the other species examined, indicating that no single element or any unilateral growth or decrease trend in genome behavior was responsible for variation in genome size and composition. Genome sizes were determined by flow sorting, and the values of 615 Mb [ (L. Kuntze], 1330 Mb [ (Willd. Vatke], 1425 Mb [ (Delile Benth.] and 2460 Mb ( Benth. suggest a ploidy series, a prediction supported by repetitive DNA sequence analysis. Phylogenetic analysis using six chloroplast loci indicated the ancestral relationships of the five most agriculturally important species, with the unexpected result that the one parasite of dicotyledonous plants ( was found to be more closely related to some of the grass parasites than many of the grass parasites are to each other.
Marie-Claude N. Laffitte
Full Text Available Leishmania has a plastic genome, and drug pressure can select for gene copy number variation (CNV. CNVs can apply either to whole chromosomes, leading to aneuploidy, or to specific genomic regions. For the latter, the amplification of chromosomal regions occurs at the level of homologous direct or inverted repeated sequences leading to extrachromosomal circular or linear amplified DNAs. This ability of Leishmania to respond to drug pressure by CNVs has led to the development of genomic screens such as Cos-Seq, which has the potential of expediting the discovery of drug targets for novel promising drug candidates.
Joseph, Bindu; Corwin, Jason A.; Li, Baohua
Understanding genome to phenotype linkages has been greatly enabled by genomic sequencing. However, most genome analysis is typically confined to the nuclear genome. We conducted a metabolomic QTL analysis on a reciprocal RIL population structured to examine how variation in the organelle genomes...... was a central hub in the epistatic network controlling the plant metabolome. This epistatic influence manifested such that the cytoplasmic background could alter or hide pairwise epistasis between nuclear loci. Thus, cytoplasmic genetic variation plays a central role in controlling natural variation...... in metabolomic networks. This suggests that cytoplasmic genomes must be included in any future analysis of natural variation....
Johanna J Kenyon
Full Text Available Lipooligosaccharide (LOS is a complex surface structure that is linked to many pathogenic properties of Acinetobacter baumannii. In A. baumannii, the genes responsible for the synthesis of the outer core (OC component of the LOS are located between ilvE and aspS. The content of the OC locus is usually variable within a species, and examination of 6 complete and 227 draft A. baumannii genome sequences available in GenBank non-redundant and Whole Genome Shotgun databases revealed nine distinct new types, OCL4-OCL12, in addition to the three known ones. The twelve gene clusters fell into two distinct groups, designated Group A and Group B, based on similarities in the genes present. OCL6 (Group B was unique in that it included genes for the synthesis of L-Rhamnosep. Genetic exchange of the different configurations between strains has occurred as some OC forms were found in several different sequence types (STs. OCL1 (Group A was the most widely distributed being present in 18 STs, and OCL6 was found in 16 STs. Variation within clones was also observed, with more than one OC locus type found in the two globally disseminated clones, GC1 and GC2, that include the majority of multiply antibiotic resistant isolates. OCL1 was the most abundant gene cluster in both GC1 and GC2 genomes but GC1 isolates also carried OCL2, OCL3 or OCL5, and OCL3 was also present in GC2. As replacement of the OC locus in the major global clones indicates the presence of sub-lineages, a PCR typing scheme was developed to rapidly distinguish Group A and Group B types, and to distinguish the specific forms found in GC1 and GC2 isolates.
Bhaskar, Anand; Song, Yun S
The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
Bhaskar, Anand; Song, Yun S.
The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the “folded” SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes’ rule of signs for polynomials to the Laplace transform of piecewise continuous functions. PMID:28018011
Full Text Available Abstract Background Addison's disease (AD is caused by an autoimmune destruction of the adrenal cortex. The pathogenesis is multi-factorial, involving genetic components and hitherto unknown environmental factors. The aim of the present study was to investigate if gene dosage in the form of copy number variation (CNV could add to the repertoire of genetic susceptibility to autoimmune AD. Methods A genome-wide study using the Affymetrix GeneChip® Genome-Wide Human SNP Array 6.0 was conducted in 26 patients with AD. CNVs in selected genes were further investigated in a larger material of patients with autoimmune AD (n = 352 and healthy controls (n = 353 by duplex Taqman real-time polymerase chain reaction assays. Results We found that low copy number of UGT2B28 was significantly more frequent in AD patients compared to controls; conversely high copy number of ADAM3A was associated with AD. Conclusions We have identified two novel CNV associations to ADAM3A and UGT2B28 in AD. The mechanism by which this susceptibility is conferred is at present unclear, but may involve steroid inactivation (UGT2B28 and T cell maturation (ADAM3A. Characterization of these proteins may unravel novel information on the pathogenesis of autoimmunity.
Background Addison's disease (AD) is caused by an autoimmune destruction of the adrenal cortex. The pathogenesis is multi-factorial, involving genetic components and hitherto unknown environmental factors. The aim of the present study was to investigate if gene dosage in the form of copy number variation (CNV) could add to the repertoire of genetic susceptibility to autoimmune AD. Methods A genome-wide study using the Affymetrix GeneChip® Genome-Wide Human SNP Array 6.0 was conducted in 26 patients with AD. CNVs in selected genes were further investigated in a larger material of patients with autoimmune AD (n = 352) and healthy controls (n = 353) by duplex Taqman real-time polymerase chain reaction assays. Results We found that low copy number of UGT2B28 was significantly more frequent in AD patients compared to controls; conversely high copy number of ADAM3A was associated with AD. Conclusions We have identified two novel CNV associations to ADAM3A and UGT2B28 in AD. The mechanism by which this susceptibility is conferred is at present unclear, but may involve steroid inactivation (UGT2B28) and T cell maturation (ADAM3A). Characterization of these proteins may unravel novel information on the pathogenesis of autoimmunity. PMID:21851588
Long, Yi; Su, Ying; Ai, Huashui; Zhang, Zhiyan; Yang, Bin; Ruan, Guorong; Xiao, Shijun; Liao, Xinjun; Ren, Jun; Huang, Lusheng; Ding, Nengshui
Umbilical hernia (UH) is one of the most common congenital defects in pigs, leading to considerable economic loss and serious animal welfare problems. To test whether copy number variations (CNVs) contribute to pig UH, we performed a case-control genome-wide CNV association study on 905 pigs from the Duroc, Landrace and Yorkshire breeds using the Porcine SNP60 BeadChip and penncnv algorithm. We first constructed a genomic map comprising 6193 CNVs that pertain to 737 CNV regions. Then, we identified eight CNVs significantly associated with the risk for UH in the three pig breeds. Six of seven significantly associated CNVs were validated using quantitative real-time PCR. Notably, a rare CNV (CNV14:13030843-13059455) encompassing the NUGGC gene was strongly associated with UH (permutation-corrected P = 0.0015) in Duroc pigs. This CNV occurred exclusively in seven Duroc UH-affected individuals. SNPs surrounding the CNV did not show association signals, indicating that rare CNVs may play an important role in complex pig diseases such as UH. The NUGGC gene has been implicated in human omphalocele and inguinal hernia. Our finding supports that CNVs, including the NUGGC CNV, contribute to the pathogenesis of pig UH. © 2016 Stichting International Foundation for Animal Genetics.
Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin
Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937
Bassett, Anne S; Lowther, Chelsea; Merico, Daniele; Costain, Gregory; Chow, Eva W C; van Amelsvoort, Therese; McDonald-McGinn, Donna; Gur, Raquel E; Swillen, Ann; Van den Bree, Marianne; Murphy, Kieran; Gothelf, Doron; Bearden, Carrie E; Eliez, Stephan; Kates, Wendy; Philip, Nicole; Sashi, Vandana; Campbell, Linda; Vorstman, Jacob; Cubells, Joseph; Repetto, Gabriela M; Simon, Tony; Boot, Erik; Heung, Tracy; Evers, Rens; Vingerhoets, Claudia; van Duin, Esther; Zackai, Elaine; Vergaelen, Elfi; Devriendt, Koen; Vermeesch, Joris R; Owen, Michael; Murphy, Clodagh; Michaelovosky, Elena; Kushan, Leila; Schneider, Maude; Fremont, Wanda; Busa, Tiffany; Hooper, Stephen; McCabe, Kathryn; Duijff, Sasja; Isaev, Karin; Pellecchia, Giovanna; Wei, John; Gazzellone, Matthew J; Scherer, Stephen W; Emanuel, Beverly S; Guo, Tingwei; Morrow, Bernice E; Marshall, Christian R
Chromosome 22q11.2 deletion syndrome (22q11.2DS) is associated with a more than 20-fold increased risk for developing schizophrenia. The aim of this study was to identify additional genetic factors (i.e., "second hits") that may contribute to schizophrenia expression. Through an international consortium, the authors obtained DNA samples from 329 psychiatrically phenotyped subjects with 22q11.2DS. Using a high-resolution microarray platform and established methods to assess copy number variation (CNV), the authors compared the genome-wide burden of rare autosomal CNV, outside of the 22q11.2 deletion region, between two groups: a schizophrenia group and those with no psychotic disorder at age ≥25 years. The authors assessed whether genes overlapped by rare CNVs were overrepresented in functional pathways relevant to schizophrenia. Rare CNVs overlapping one or more protein-coding genes revealed significant between-group differences. For rare exonic duplications, six of 19 gene sets tested were enriched in the schizophrenia group; genes associated with abnormal nervous system phenotypes remained significant in a stepwise logistic regression model and showed significant interactions with 22q11.2 deletion region genes in a connectivity analysis. For rare exonic deletions, the schizophrenia group had, on average, more genes overlapped. The additional rare CNVs implicated known (e.g., GRM7, 15q13.3, 16p12.2) and novel schizophrenia risk genes and loci. The results suggest that additional rare CNVs overlapping genes outside of the 22q11.2 deletion region contribute to schizophrenia risk in 22q11.2DS, supporting a multigenic hypothesis for schizophrenia. The findings have implications for understanding expression of psychotic illness and herald the importance of whole-genome sequencing to appreciate the overall genomic architecture of schizophrenia.
Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 byanalyzing differences in allele distribution.
Dec 20, 2006 ... progestins, as well as lipids, cholesterol metabolites, and. Genome ... Gene structure analysis shows strong conservation of exon structures among orthologoues. ..... earlier subfamily classification of NRs (Nuclear Receptors.
This issue is the collection of the papers presented at the 25th NIRS symposium on Human, Mouse Genome Analysis and Radiation Biology. The 14 of the presented papers are indexed individually. (J.P.N.)
Illa, Eudald; Sargent, Daniel J; Lopez Girona, Elena; Bushakra, Jill; Cestaro, Alessandro; Crowhurst, Ross; Pindo, Massimo; Cabrera, Antonio; van der Knaap, Esther; Iezzoni, Amy; Gardiner, Susan; Velasco, Riccardo; Arús, Pere; Chagné, David; Troggio, Michela
Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae.
Full Text Available Abstract Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae.
Full Text Available Since the first report of a genome-wide association study (GWAS on human age-related macular degeneration, GWAS has successfully been used to discover genetic variants for a variety of complex human diseases and/or traits, and thousands of associated loci have been identified. However, the underlying mechanisms for these loci remain largely unknown. To make these GWAS findings more useful, it is necessary to perform in-depth data mining. The data analysis in the post-GWAS era will include the following aspects: fine-mapping of susceptibility regions to identify susceptibility genes for elucidating the biological mechanism of action; joint analysis of susceptibility genes in different diseases; integration of GWAS, transcriptome, and epigenetic data to analyze expression and methylation quantitative trait loci at the whole-genome level, and find single-nucleotide polymorphisms that influence gene expression and DNA methylation; genome-wide association analysis of disease-related DNA copy number variations. Applying these strategies and methods will serve to strengthen GWAS data to enhance the utility and significance of GWAS in improving understanding of the genetics of complex diseases or traits and translate these findings for clinical applications. Keywords: Genome-wide association study, Data mining, Integrative data analysis, Polymorphism, Copy number variation
Full Text Available Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these
Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France
Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables to characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture about genome organization evolution therefore requires to account for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In a first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of a same species allowed to identify genomes with diverging gene order with respect to their conspecific. The inter-species analysis indicates that pathogens are more often unstable with respect to non-pathogens. In a second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were
Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming
The data presented in this article are related to the published entitled "Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria" (Liu et al., 2017) . Genome analysis revealed B. velezensis LS69 has a good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.
Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming
The data presented in this article are related to the published entitled “Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria” (Liu et al., 2017) . Genome analysis revealed B. velezensis LS69 has a good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.
Full Text Available The data presented in this article are related to the published entitled “Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria” (Liu et al., 2017 . Genome analysis revealed B. velezensis LS69 has a good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.
Permutation Multivariate Analysis of Variance ( PerMANOVA ). We used PerMANOVA to test the null-hypothesis of no... permutation -based version of the multivariate analysis of variance (MANOVA). PerMANOVA uses the distances between samples to partition variance and...coli. Antibiotics, bacteria, community analysis , diabetes, pyrosequencing, wound, wound therapy, 16S rRNA gene Genomic Analysis of Complex
Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M
The genus Mycobacterium represents more than 120 species including important pathogens of human and cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostics tools for controlling Mycobacterial diseases. In this present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K—10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435 , M. ulcerans Agy99,and M. vanbaalenii PYR—1, For this purpose a comparison has been done based on their length of genomes, GC content, number of genes in different data bases (Genbank, Refseq, and Prodigal). The BLAST matrix of these genomes has been figured to give a lot of information about the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv which can give a good overview of this genome. And for examining the phylogenetic relationships among these bacteria, a phylogenic tree has been constructed from 16S rRNA gene for tuberculosis and non tuberculosis Mycobacteria to understand the evolutionary events of these species.
agricultural and biological importance. Its capacity to form symbiotic relationships with rhizobia and microrrhizal fungi has fascinated researchers for years. Lotus has a small genome of approximately 470 Mb and a short life cycle of 2 to 3 months, which has made Lotus a model legume plant for many molecular...
Homologous sequences were used in the definition of synteny relationships and subsequent identification of the shared disease response genes. The homologous genes within the human genome were then identified and aligned to the bovine radiation hybrid map in order to identify the mouse/bovine homologous regions.
Full Text Available List Contact us PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods Genome analysis... methods Data detail Data name Genome analysis methods DOI 10.18908/lsdba.nbdc01194-01-005 De...scription of data contents The current status and related information of the genomic analysis about each org...anism (March, 2014). In the case of organisms carried out genomic analysis, the d...e File name: pgdbj_dna_marker_linkage_map_genome_analysis_methods_en.zip File URL: ftp://ftp.biosciencedbc.j
Kumar, Narender; Mariappan, Vanitha; Baddam, Ramani; Lankapalli, Aditya K; Shaik, Sabiha; Goh, Khean-Lee; Loke, Mun Fai; Perkins, Tim; Benghezal, Mohammed; Hasnain, Seyed E; Vadivelu, Jamuna; Marshall, Barry J; Ahmed, Niyaz
The discordant prevalence of Helicobacter pylori and its related diseases, for a long time, fostered certain enigmatic situations observed in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand dynamics of host-pathogen interaction and genome evolution. In this study, we extensively analyzed and compared genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints, among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in the genome content would provide significant insights into various adaptive and host modulation strategies harnessed by H. pylori to effectively persist in a host-specific manner. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Guillaud-Bataille, Marine; Valent, Alexander; Soularue, Pascal; Perot, Christine; Inda, Maria Mar; Receveur, Aline; Smaïli, Sadek; Roest Crollius, Hugues; Bénard, Jean; Bernheim, Alain; Gidrol, Xavier; Danglot, Gisèle
Comparative genomic hybridization to bacterial artificial chromosome (BAC)-arrays (array-CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci, and the reliable detection of local one-copy-level variations. We report a genome-wide amplification method allowing the same measurement sensitivity, using 1 ng of starting genomic DNA, instead of the classical 1 microg usually necessary. Using a discrete series of DNA fragments, we defined the parameters adapted to the most faithful ligation-mediated PCR amplification and the limits of the technique. The optimized protocol allows a 3000-fold DNA amplification, retaining the quantitative characteristics of the initial genome. Validation of the amplification procedure, using DNA from 10 tumour cell lines hybridized to BAC-arrays of 1500 spots, showed almost perfectly superimposed ratios for the non-amplified and amplified DNAs. Correlation coefficients of 0.96 and 0.99 were observed for regions of low-copy-level variations and all regions, respectively (including in vivo amplified oncogenes). Finally, labelling DNA using two nucleotides bearing the same fluorophore led to a significant increase in reproducibility and to the correct detection of one-copy gain or loss in >90% of the analysed data, even for pseudotriploid tumour genomes.
Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K
A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.
Claire L Anderson; Thomas L Kubisiak; C Dana Nelson; Jason A Smith; John M Davis
The genome size of the pine fusiform rust pathogen Cronartium quercuum f.sp. fusiforme (Cqf) was determined by flow cytometric analysis of propidium iodide-stained, intact haploid pycniospores with haploid spores of two genetically well characterized fungal species, Sclerotinia sclerotiorum and Puccinia graminis f.sp. tritici, as size standards. The Cqf haploid genome...
Full Text Available In this study, we identified and compared nucleotide-binding site (NBS domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China. Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.
Maurice - Van Eijndhoven, M.H.T.; Bovenhuis, H.; Veerkamp, R.F.; Calus, M.P.L.
The aim of this study was to identify if genomic variations associated with fatty acid (FA) composition are similar between the Holstein-Friesian (HF) and native dual-purpose breeds used in the Dutch dairy industry. Phenotypic and genotypic information were available for the breeds Meuse-Rhine-Yssel
Zhang, Z.; Mao, L.; Chen, Junshi; Bu, F.; Li, G.; Sun, J.; Li, S.; Sun, H.; Jiao, C.; Blakely, R.; Pan, J.; Cai, R.; Luo, R.; Peer, Van de Y.; Jacobsen, E.; Fei, Z.; Huang, S.
Structural variations (SVs) represent a major source of genetic diversity. However, the functional impact and formation mechanisms of SVs in plant genomes remain largely unexplored. Here, we report a nucleotide-resolution SV map of cucumber (Cucumis sativas) that comprises 26,788 SVs based on deep
Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A
Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 ( CHD8 ) and LDL receptor-related protein 1 ( LRP1 ), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.
Full Text Available Segregation of chromosomes during the first meiotic division relies on crossovers established during prophase. Although crossovers are strictly regulated so that at least one occurs per chromosome, individual variation in crossover levels is not uncommon. In an analysis of different inbred strains of male mice, we identified among-strain variation in the number of foci for the crossover-associated protein MLH1. We report studies of strains with "low" (CAST/EiJ, "medium" (C3H/HeJ, and "high" (C57BL/6J genome-wide MLH1 values to define factors responsible for this variation. We utilized immunofluorescence to analyze the number and distribution of proteins that function at different stages in the recombination pathway: RAD51 and DMC1, strand invasion proteins acting shortly after double-strand break (DSB formation, MSH4, part of the complex stabilizing double Holliday junctions, and the Bloom helicase BLM, thought to have anti-crossover activity. For each protein, we identified strain-specific differences that mirrored the results for MLH1; i.e., CAST/EiJ mice had the lowest values, C3H/HeJ mice intermediate values, and C57BL/6J mice the highest values. This indicates that differences in the numbers of DSBs (as identified by RAD51 and DMC1 are translated into differences in the number of crossovers, suggesting that variation in crossover levels is established by the time of DSB formation. However, DSBs per se are unlikely to be the primary determinant, since allelic variation for the DSB-inducing locus Spo11 resulted in differences in the numbers of DSBs but not the number of MLH1 foci. Instead, chromatin conformation appears to be a more important contributor, since analysis of synaptonemal complex length and DNA loop size also identified consistent strain-specific differences; i.e., crossover frequency increased with synaptonemal complex length and was inversely related to chromatin loop size. This indicates a relationship between recombination
Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M
The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G
The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.
Shen, Chao; Li, Ximei; Zhang, Ruiting; Lin, Zhongxu
Recombination is crucial for genetic evolution, which not only provides new allele combinations but also influences the biological evolution and efficacy of natural selection. However, recombination variation is not well understood outside of the complex species' genomes, and it is particularly unclear in Gossypium. Cotton is the most important natural fibre crop and the second largest oil-seed crop. Here, we found that the genetic and physical maps distances did not have a simple linear relationship. Recombination rates were unevenly distributed throughout the cotton genome, which showed marked changes along the chromosome lengths and recombination was completely suppressed in the centromeric regions. Recombination rates significantly varied between A-subgenome (At) (range = 1.60 to 3.26 centimorgan/megabase [cM/Mb]) and D-subgenome (Dt) (range = 2.17 to 4.97 cM/Mb), which explained why the genetic maps of At and Dt are similar but the physical map of Dt is only half that of At. The translocation regions between A02 and A03 and between A04 and A05, and the inversion regions on A10, D10, A07 and D07 indicated relatively high recombination rates in the distal regions of the chromosomes. Recombination rates were positively correlated with the densities of genes, markers and the distance from the centromere, and negatively correlated with transposable elements (TEs). The gene ontology (GO) categories showed that genes in high recombination regions may tend to response to environmental stimuli, and genes in low recombination regions are related to mitosis and meiosis, which suggested that they may provide the primary driving force in adaptive evolution and assure the stability of basic cell cycle in a rapidly changing environment. Global knowledge of recombination rates will facilitate genetics and breeding in cotton.
Glessner, Joseph T; Wang, Kai; Cai, Guiqing; Korvatska, Olena; Kim, Cecilia E; Wood, Shawn; Zhang, Haitao; Estes, Annette; Brune, Camille W; Bradfield, Jonathan P; Imielinski, Marcin; Frackelton, Edward C; Reichert, Jennifer; Crawford, Emily L; Munson, Jeffrey; Sleiman, Patrick M A; Chiavacci, Rosetta; Annaiah, Kiran; Thomas, Kelly; Hou, Cuiping; Glaberson, Wendy; Flory, James; Otieno, Frederick; Garris, Maria; Soorya, Latha; Klei, Lambertus; Piven, Joseph; Meyer, Kacie J; Anagnostou, Evdokia; Sakurai, Takeshi; Game, Rachel M; Rudd, Danielle S; Zurawiecki, Danielle; McDougle, Christopher J; Davis, Lea K; Miller, Judith; Posey, David J; Michaels, Shana; Kolevzon, Alexander; Silverman, Jeremy M; Bernier, Raphael; Levy, Susan E; Schultz, Robert T; Dawson, Geraldine; Owley, Thomas; McMahon, William M; Wassink, Thomas H; Sweeney, John A; Nurnberger, John I; Coon, Hilary; Sutcliffe, James S; Minshew, Nancy J; Grant, Struan F A; Bucan, Maja; Cook, Edwin H; Buxbaum, Joseph D; Devlin, Bernie; Schellenberg, Gerard D; Hakonarson, Hakon
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins. Previous studies focusing on candidate genes or genomic regions have identified several copy number variations (CNVs) that are associated with an increased risk of ASDs. Here we present the results from a whole-genome CNV study on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who were genotyped with approximately 550,000 single nucleotide polymorphism markers, in an attempt to comprehensively identify CNVs conferring susceptibility to ASDs. Positive findings were evaluated in an independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry. Besides previously reported ASD candidate genes, such as NRXN1 (ref. 10) and CNTN4 (refs 11, 12), several new susceptibility genes encoding neuronal cell-adhesion molecules, including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to controls (P = 9.5 x 10(-3)). Furthermore, CNVs within or surrounding genes involved in the ubiquitin pathways, including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs not observed in controls (P = 3.3 x 10(-3)). We also identified duplications 55 kilobases upstream of complementary DNA AK123120 (P = 3.6 x 10(-6)). Although these variants may be individually rare, they target genes involved in neuronal cell-adhesion or ubiquitin degradation, indicating that these two important gene networks expressed within the central nervous system may contribute to the genetic susceptibility of ASD.
Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun
Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower as compared to that of the nuclear genome and neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provides the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
Full Text Available Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis on asynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but less than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.
Bouyioukos, Costas; Elati, Mohamed; Képès, François
Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species, has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets which tend to be incomplete, and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying on one hand the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate the capabilities to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.
Nicholas R Thomson
Full Text Available The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing, the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common
Rellstab Christian; Gugerli Felix; Eckert Andrew J.; Hancock Angela M.; Holderegger Rolf
Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next generation sequencing which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel data sets describing environmental factors have greatly improved and increasingly become pu...
Identification of balanced chromosomal rearrangements previously unknown among participants in the 1000 Genomes Project: implications for interpretation of structural variation in genomes and the future of clinical cytogenetics.
Dong, Zirui; Wang, Huilin; Chen, Haixiao; Jiang, Hui; Yuan, Jianying; Yang, Zhenjun; Wang, Wen-Jing; Xu, Fengping; Guo, Xiaosen; Cao, Ye; Zhu, Zhenzhen; Geng, Chunyu; Cheung, Wan Chee; Kwok, Yvonne K; Yang, Huanming; Leung, Tak Yeung; Morton, Cynthia C; Cheung, Sau Wai; Choy, Kwong Wai
PurposeRecent studies demonstrate that whole-genome sequencing enables detection of cryptic rearrangements in apparently balanced chromosomal rearrangements (also known as balanced chromosomal abnormalities, BCAs) previously identified by conventional cytogenetic methods. We aimed to assess our analytical tool for detecting BCAs in the 1000 Genomes Project without knowing which bands were affected.MethodsThe 1000 Genomes Project provides an unprecedented integrated map of structural variants in phenotypically normal subjects, but there is no information on potential inclusion of subjects with apparent BCAs akin to those traditionally detected in diagnostic cytogenetics laboratories. We applied our analytical tool to 1,166 genomes from the 1000 Genomes Project with sufficient physical coverage (8.25-fold).ResultsWith this approach, we detected four reciprocal balanced translocations and four inversions, ranging in size from 57.9 kb to 13.3 Mb, all of which were confirmed by cytogenetic methods and polymerase chain reaction studies. One of these DNAs has a subtle translocation that is not readily identified by chromosome analysis because of the similarity of the banding patterns and size of exchanged segments, and another results in disruption of all transcripts of an OMIM gene.ConclusionOur study demonstrates the extension of utilizing low-pass whole-genome sequencing for unbiased detection of BCAs including translocations and inversions previously unknown in the 1000 Genomes Project.GENETICS in MEDICINE advance online publication, 2 November 2017; doi:10.1038/gim.2017.170.
Maurice-Van Eijndhoven, M H T; Bovenhuis, H; Veerkamp, R F; Calus, M P L
The aim of this study was to identify if genomic variations associated with fatty acid (FA) composition are similar between the Holstein-Friesian (HF) and native dual-purpose breeds used in the Dutch dairy industry. Phenotypic and genotypic information were available for the breeds Meuse-Rhine-Yssel (MRY), Dutch Friesian (DF), Groningen White Headed (GWH), and HF. First, the reliability of genomic breeding values of the native Dutch dual-purpose cattle breeds MRY, DF, and GWH was evaluated using single nucleotide polymorphism (SNP) effects estimated in HF, including all SNP or subsets with stronger associations in HF. Second, the genomic variation of the regions associated with FA composition in HF (regions on Bos taurus autosome 5, 14, and 26), were studied in the different breeds. Finally, similarities in genotype and allele frequencies between MRY, DF, GWH, and HF breeds were assessed for specific regions associated with FA composition. On average across the traits, the highest reliabilities of genomic prediction were estimated for GWH (0.158) and DF (0.116) when the 8 to 22 SNP with the strongest association in HF were included. With the same set of SNP, GEBV for MRY were the least reliable (0.022). This indicates that on average only 2 (MRY) to 16% (GWH) of the genomic variation in HF is shared with the native Dutch dual-purpose breeds. The comparison of predicted variances of different regions associated with milk and milk fat composition showed that breeds clearly differed in genomic variation within these regions. Finally, the correlations of allele frequencies between breeds across the 8 to 22 SNP with the strongest association in HF were around 0.8 between the Dutch native dual-purpose breeds, whereas the correlations between the native breeds and HF were clearly lower and around 0.5. There was no consistent relationship between the reliabilities of genomic prediction for a specific breed and the correlation between the allele frequencies of this breed
Full Text Available Although great progress in genome-wide association studies (GWAS has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interactional analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked. The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001
Thind, Anupriya Kaur
Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.
Luo, Li; Zhu, Yun; Xiong, Momiao
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
Fungi are microorganisms whose astounding variety can be found in every conceivable ecosystem on the planet. Fungi are nutrient recyclers, playing an irreplaceable role in the carbon cycle. They grow on land and in the sea, on plants and animals and in the soil. They feed us as mushrooms, and drive
Fungi are microorganisms whose astounding variety can be found in every conceivable ecosystem on the planet. Fungi are nutrient recyclers, playing an irreplaceable role in the carbon cycle. They grow on land and in the sea, on plants and animals and in the soil. They feed us as mushrooms, and drive our economy as bioreactors. They leaven our bread and brew our beer, nourish our crops and spoil our food. They even directly play a role in human health. Fungi are, however, far more complex organ...
Jia, Guanqing; Huang, Xuehui; Zhi, Hui; Zhao, Yan; Zhao, Qiang; Li, Wenjun; Chai, Yang; Yang, Lifang; Liu, Kunyan; Lu, Hengyun; Zhu, Chuanrang; Lu, Yiqi; Zhou, Congcong; Fan, Danlin; Weng, Qijun; Guo, Yunli; Huang, Tao; Zhang, Lei; Lu, Tingting; Feng, Qi; Hao, Hangfei; Liu, Hongkuan; Lu, Ping; Zhang, Ning; Li, Yuhui; Guo, Erhu; Wang, Shujun; Wang, Suying; Liu, Jinrong; Zhang, Wenfei; Chen, Guoqiu; Zhang, Baojin; Li, Wei; Wang, Yongfang; Li, Haiquan; Zhao, Baohua; Li, Jiayang; Diao, Xianmin; Han, Bin
Foxtail millet (Setaria italica) is an important grain crop that is grown in arid regions. Here we sequenced 916 diverse foxtail millet varieties, identified 2.58 million SNPs and used 0.8 million common SNPs to construct a haplotype map of the foxtail millet genome. We classified the foxtail millet varieties into two divergent groups that are strongly correlated with early and late flowering times. We phenotyped the 916 varieties under five different environments and identified 512 loci associated with 47 agronomic traits by genome-wide association studies. We performed a de novo assembly of deeply sequenced genomes of a Setaria viridis accession (the wild progenitor of S. italica) and an S. italica variety and identified complex interspecies and intraspecies variants. We also identified 36 selective sweeps that seem to have occurred during modern breeding. This study provides fundamental resources for genetics research and genetic improvement in foxtail millet.
Van Eyk, Jennifer; Dunn, M. J
... to cardiovascular disease. By exploring the various strategies and technical aspects of both, using examples from cardiac or vascular biology, the limitations and the potential of these methods can be clearly seen. The book is divided into three sections: the first focuses on genomics, the second on proteomics, and the third provides an overview of the importance of these two scientific disciplines in drug and diagnostic discovery. The goal of this book is the transfer of their hard-earned lessons to the growing num...
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Full Text Available The transition from commensalism to pathogenicity of Candida albicans reflects both the host inability to mount specific immune responses and the microorganism’s dimorphic switch efficiency. In this study, we used whole genome sequencing and microarray analysis to investigate the genomic determinants of the phenotypic changes observed in two C. albicans clinical isolates (YL1 and YQ2. In vitro experiments employing epithelial, microglial, and peripheral blood mononuclear cells were thus used to evaluate C. albicans isolates interaction with first line host defenses, measuring adhesion, susceptibility to phagocytosis, and induction of secretory responses. Moreover, a murine model of peritoneal infection was used to compare the in vivo pathogenic potential of the two isolates. Genome sequence and gene expression analysis of C. albicans YL1 and YQ2 showed significant changes in cellular pathways involved in environmental stress response, adhesion, filamentous growth, invasiveness, and dimorphic transition. This was in accordance with the observed marked phenotypic differences in biofilm production, dimorphic switch efficiency, cell adhesion, invasion, and survival to phagocyte-mediated host defenses. The mutations in key regulators of the hyphal growth pathway in the more virulent strain corresponded to an overall greater number of budding yeast cells released. Compared to YQ2, YL1 consistently showed enhanced pathogenic potential, since in vitro, it was less susceptible to ingestion by phagocytic cells and more efficient in invading epithelial cells, while in vivo YL1 was more effective than YQ2 in recruiting inflammatory cells, eliciting IL-1β response and eluding phagocytic cells. Overall, these results indicate an unexpected isolate-specific variation in pathways important for host invasion and colonization, showing how the genetic background of C. albicans may greatly affect its behavior both in vitro and in vivo. Based on this approach, we
Bonito, Andrea; Pasciak, Joseph E.
is captured well enough by the coarsest grid. The main argument hinges on a perturbation analysis from an auxiliary variational algorithm defined directly on the smooth surface. In addition, the vanishing mean value constraint is imposed on each level, thereby
Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W
Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information but that needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 public available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, either based on partial or full gene-length, results in biased estimates of genetic differences and does not capture the true degree of similarity or differences of complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
Santoni, Daniele; Romano-Spica, Vincenzo
Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archea, 149 bacteria). The components of the vector represent the aggregation rate of the genes belonging to each of 18 COGs classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. On the contrary genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. Self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and Lateral Gene Transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.
Zakham, F.; Belayachi, L.; Ussery, David
. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435 , M. ulcerans Agy99,and M. vanbaalenii PYR—1, For this purpose a comparison has been done based on their length...... defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv which can give a good overview of this genome. And for examining the phylogenetic relationships among these bacteria, a phylogenic tree has been constructed from 16S rRNA gene...... the evolutionary events of these species and improving drugs, vaccines, and diagnostics tools for controlling Mycobacterial diseases. In this present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K—10, M. bovis AF2122/97, M. bovis BCG str...
Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng
Transposable elements account for approximately 30 % of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact or truncated transposase genes, respectively, while the mMoshan type is miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment were amplified predominately in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, all these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus.
Full Text Available Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS, which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish. Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org. The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.
Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael
High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.
Pavlović-Lažetić Gordana M
Full Text Available Abstract Background We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphism (SNP, insertions and deletions, and second, to group them according to sequence similarity, eventually pointing to phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphism such as insertions or deletions and the number and positions of SNPs. Results The nucleotide structure of all 38 isolates is presented. Based on insertions and deletions and dissimilarity due to SNPs, the dataset of all the isolates has been qualitatively classified into three groups each having their own subgroups. These are the A-group with "regular" isolates (no insertions / deletions except for 5' and 3' ends, the B-group of isolates with "long insertions", and the C-group of isolates with "many individual" insertions and deletions. The isolate with the smallest average number of SNPs, compared to other isolates, has been identified (TWH. The density distribution of SNPs, insertions and deletions for each group or subgroup, as well as cumulatively for all the isolates is also presented, along with the gene map for TWH. Since individual SNPs may have occurred at random, positions corresponding to multiple SNPs (occurring in two or more isolates are identified and presented. This result revises some previous results of a similar type. Amino acid changes caused by multiple SNPs are also identified (for the annotated sequences, as well as presupposed amino acid changes for non-annotated ones. Exact SNP positions for the isolates in each group or subgroup are presented. Finally, a phylogenetic tree for the SARS-CoV isolates has been produced using the CLUSTALW program, showing high compatibility with former qualitative classification. Conclusions The comparative study of SARS-CoV isolates provides essential information for genome
Wang, Jun; Dayem Ullah, Abu Z; Chelala, Claude
The vast majority of germline and somatic variations occur in the noncoding part of the genome, only a small fraction of which are believed to be functional. From the tens of thousands of noncoding variations detectable in each genome, identifying and prioritizing driver candidates with putative functional significance is challenging. To address this, we implemented IW-Scoring, a new Integrative Weighted Scoring model to annotate and prioritise functionally relevant noncoding variations. We evaluate 11 scoring methods, and apply an unsupervised spectral approach for subsequent selective integration into two linear weighted functional scoring schemas for known and novel variations. IW-Scoring produces stable high-quality performance as the best predictors for three independent data sets. We demonstrate the robustness of IW-Scoring in identifying recurrent functional mutations in the TERT promoter, as well as disease SNPs in proximity to consensus motifs and with gene regulatory effects. Using follicular lymphoma as a paradigmatic cancer model, we apply IW-Scoring to locate 11 recurrently mutated noncoding regions in 14 follicular lymphoma genomes, and validate 9 of these regions in an extension cohort, including the promoter and enhancer regions of PAX5. Overall, IW-Scoring demonstrates greater versatility in identifying trait- and disease-associated noncoding variants. Scores from IW-Scoring as well as other methods are freely available from http://www.snp-nexus.org/IW-Scoring/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
Full Text Available Salmonella enterica serovar Typhimurium, a gram-negative facultative rod-shaped bacterium causing salmonellosis and foodborne disease, is one of the most common isolated Salmonella serovars in both developed and developing nations. Several S. Typhimurium genomes have been completed and many more genome-sequencing projects are underway. Comparative genome analysis of the multiple strains leads to a better understanding of the evolution of S. Typhimurium and its pathogenesis. S. Typhimurium strain UK-1 (belongs to phage type 1 is highly virulent when orally administered to mice and chickens and efficiently colonizes lymphoid tissues of these species. These characteristics make this strain a good choice for use in vaccine development. In fact, UK-1 has been used as the parent strain for a number of nonrecombinant and recombinant vaccine strains, including several commercial vaccines for poultry. In this study, we conducted a thorough comparative genome analysis of the UK-1 strain with other S. Typhimurium strains and examined the phenotypic impact of several genomic differences. Whole genomic comparison highlights an extremely close relationship between the UK-1 strain and other S. Typhimurium strains; however, many interesting genetic and genomic variations specific to UK-1 were explored. In particular, the deletion of a UK-1-specific gene that is highly similar to the gene encoding the T3SS effector protein NleC exhibited a significant decrease in oral virulence in BALB/c mice. The complete genetic complements in UK-1, especially those elements that contribute to virulence or aid in determining the diversity within bacterial species, provide key information in evaluating the functional characterization of important genetic determinants and for development of vaccines.
Davila Olivas, Nelson H.; Kruijer, Willem; Gort, Gerrit; Wijnen, Cris L.; Loon, van Joop J.A.; Dicke, Marcel
Plants are commonly exposed to abiotic and biotic stresses. We used 350 Arabidopsis thaliana accessions grown under controlled conditions. We employed genome-wide association analysis to investigate the genetic architecture and underlying loci involved in genetic variation in resistance to: two
to protein: through epigenetic modifications, transcription regulators or post-transcriptional controls. The following papers concern several layers of gene regulation with questions answered by different HTS approaches. Genome-wide screening of epigenetic changes by ChIP-seq allowed us to study both spatial...... and temporal alterations of histone modifications (Papers I and II). Coupling the data with machine learning approaches, we established a prediction framework to assess the most informative histone marks as well as their most influential nucleosome positions in predicting the promoter usages. (Papers I...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’exonuclease as well as nonsensemediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V...
Full Text Available Routine screening of lung transplant recipients and hospital patients for respiratory virus infections allowed to identify human rhinovirus (HRV in the upper and lower respiratory tracts, including immunocompromised hosts chronically infected with the same strain over weeks or months. Phylogenetic analysis of 144 HRV-positive samples showed no apparent correlation between a given viral genotype or species and their ability to invade the lower respiratory tract or lead to protracted infection. By contrast, protracted infections were found almost exclusively in immunocompromised patients, thus suggesting that host factors rather than the virus genotype modulate disease outcome, in particular the immune response. Complete genome sequencing of five chronic cases to study rhinovirus genome adaptation showed that the calculated mutation frequency was in the range observed during acute human infections. Analysis of mutation hot spot regions between specimens collected at different times or in different body sites revealed that non-synonymous changes were mostly concentrated in the viral capsid genes VP1, VP2 and VP3, independent of the HRV type. In an immunosuppressed lung transplant recipient infected with the same HRV strain for more than two years, both classical and ultra-deep sequencing of samples collected at different time points in the upper and lower respiratory tracts showed that these virus populations were phylogenetically indistinguishable over the course of infection, except for the last month. Specific signatures were found in the last two lower respiratory tract populations, including changes in the 5'UTR polypyrimidine tract and the VP2 immunogenic site 2. These results highlight for the first time the ability of a given rhinovirus to evolve in the course of a natural infection in immunocompromised patients and complement data obtained from previous experimental inoculation studies in immunocompetent volunteers.
Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh
Jolly Emmitt R
Full Text Available Abstract Background A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.
Full Text Available Copy number variations (CNV include net gains or losses of part or whole chromosomal regions. They differ from copy neutral loss of heterozygosity (cn-LOH events which do not induce any net change in the copy number and are often associated with uniparental disomy. These phenomena have long been reported to be associated with diseases and particularly in cancer. Losses/gains of genomic regions are often correlated with lower/higher gene expression. On the other hand, loss of heterozygosity (LOH and cn-LOH are common events in cancer and may be associated with the loss of a functional tumor suppressor gene. Therefore, identifying recurrent CNV and cn-LOH events can be important as they may highlight common biological components and give insights into the development or mechanisms of a disease. However, no currently available tools allow a comprehensive whole-genome visualization of recurrent CNVs and cn-LOH in groups of samples providing absolute quantification of the aberrations leading to the loss of potentially important information.To overcome these limitations, we developed aCNViewer (Absolute CNV Viewer, a visualization tool for absolute CNVs and cn-LOH across a group of samples. aCNViewer proposes three graphical representations: dendrograms, bi-dimensional heatmaps showing chromosomal regions sharing similar abnormality patterns, and quantitative stacked histograms facilitating the identification of recurrent absolute CNVs and cn-LOH. We illustrated aCNViewer using publically available hepatocellular carcinomas (HCCs Affymetrix SNP Array data (Fig 1A. Regions 1q and 8q present a similar percentage of total gains but significantly different copy number gain categories (p-value of 0.0103 with a Fisher exact test, validated by another cohort of HCCs (p-value of 5.6e-7 (Fig 2B.aCNViewer is implemented in python and R and is available with a GNU GPLv3 license on GitHub https://github.com/FJD-CEPH/aCNViewer and Docker https://hub.docker.com/r/fjdceph/acnviewer/.aCNViewer@cephb.fr.
Renault, Victor; Tost, Jörg; Pichon, Fabien; Wang-Renault, Shu-Fang; Letouzé, Eric; Imbeaud, Sandrine; Zucman-Rossi, Jessica; Deleuze, Jean-François; How-Kit, Alexandre
Copy number variations (CNV) include net gains or losses of part or whole chromosomal regions. They differ from copy neutral loss of heterozygosity (cn-LOH) events which do not induce any net change in the copy number and are often associated with uniparental disomy. These phenomena have long been reported to be associated with diseases and particularly in cancer. Losses/gains of genomic regions are often correlated with lower/higher gene expression. On the other hand, loss of heterozygosity (LOH) and cn-LOH are common events in cancer and may be associated with the loss of a functional tumor suppressor gene. Therefore, identifying recurrent CNV and cn-LOH events can be important as they may highlight common biological components and give insights into the development or mechanisms of a disease. However, no currently available tools allow a comprehensive whole-genome visualization of recurrent CNVs and cn-LOH in groups of samples providing absolute quantification of the aberrations leading to the loss of potentially important information. To overcome these limitations, we developed aCNViewer (Absolute CNV Viewer), a visualization tool for absolute CNVs and cn-LOH across a group of samples. aCNViewer proposes three graphical representations: dendrograms, bi-dimensional heatmaps showing chromosomal regions sharing similar abnormality patterns, and quantitative stacked histograms facilitating the identification of recurrent absolute CNVs and cn-LOH. We illustrated aCNViewer using publically available hepatocellular carcinomas (HCCs) Affymetrix SNP Array data (Fig 1A). Regions 1q and 8q present a similar percentage of total gains but significantly different copy number gain categories (p-value of 0.0103 with a Fisher exact test), validated by another cohort of HCCs (p-value of 5.6e-7) (Fig 2B). aCNViewer is implemented in python and R and is available with a GNU GPLv3 license on GitHub https://github.com/FJD-CEPH/aCNViewer and Docker https
Pu, Lianrong; Lin, Yu; Pevzner, Pavel A
Although segmental duplications (SDs) represent hotbeds for genomic rearrangements and emergence of new genes, there are still no easy-to-use tools for identifying SDs. Moreover, while most previous studies focused on recently emerged SDs, detection of ancient SDs remains an open problem. We developed an SDquest algorithm for SD finding and applied it to analyzing SDs in human, gorilla, and mouse genomes. Our results demonstrate that previous studies missed many SDs in these genomes and show that SDs account for at least 6.05% of the human genome (version hg19), a 17% increase as compared to the previous estimate. Moreover, SDquest classified 6.42% of the latest GRCh38 version of the human genome as SDs, a large increase as compared to previous studies. We thus propose to re-evaluate evolution of SDs based on their accurate representation across multiple genomes. Toward this goal, we analyzed the complex mosaic structure of SDs and decomposed mosaic SDs into elementary SDs, a prerequisite for follow-up evolutionary analysis. We also introduced the concept of the breakpoint graph of mosaic SDs that revealed SD hotspots and suggested that some SDs may have originated from circular extrachromosomal DNA (ecDNA), not unlike ecDNA that contributes to accelerated evolution in cancer. © 2018 Pu et al.; Published by Cold Spring Harbor Laboratory Press.
Full Text Available Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we finally revealed the conserved essential genes set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacteria survival especially over long period natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.
S.L. Macrae (Sheila L.); Q. Zhang (Quanwei); C. Lemetre (Christophe); I. Seim (Inge); R.B. Calder (Robert B.); J.H.J. Hoeijmakers (Jan); Y. Suh (Yousin); V.N. Gladyshev (Vadim N.); A. Seluanov (Andrei); V. Gorbunova (Vera); J. Vijg (Jan); Z.D. Zhang (Zhengdong D.)
textabstractGenome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM
Kamanu, Frederick Kinyua
Elia, Josephine; Glessner, Joseph T; Wang, Kai; Takahashi, Nagahide; Shtir, Corina J; Hadley, Dexter; Sleiman, Patrick M A; Zhang, Haitao; Kim, Cecilia E; Robison, Reid; Lyon, Gholson J; Flory, James H; Bradfield, Jonathan P; Imielinski, Marcin; Hou, Cuiping; Frackelton, Edward C; Chiavacci, Rosetta M; Sakurai, Takeshi; Rabin, Cara; Middleton, Frank A; Thomas, Kelly A; Garris, Maria; Mentch, Frank; Freitag, Christine M; Steinhausen, Hans-Christoph; Todorov, Alexandre A; Reif, Andreas; Rothenberger, Aribert; Franke, Barbara; Mick, Eric O; Roeyers, Herbert; Buitelaar, Jan; Lesch, Klaus-Peter; Banaschewski, Tobias; Ebstein, Richard P; Mulas, Fernando; Oades, Robert D; Sergeant, Joseph; Sonuga-Barke, Edmund; Renner, Tobias J; Romanos, Marcel; Romanos, Jasmin; Warnke, Andreas; Walitza, Susanne; Meyer, Jobst; Pálmason, Haukur; Seitz, Christiane; Loo, Sandra K; Smalley, Susan L; Biederman, Joseph; Kent, Lindsey; Asherson, Philip; Anney, Richard J L; Gaynor, J William; Shaw, Philip; Devoto, Marcella; White, Peter S; Grant, Struan F A; Buxbaum, Joseph D; Rapoport, Judith L; Williams, Nigel M; Nelson, Stanley F; Faraone, Stephen V; Hakonarson, Hakon
Attention deficit hyperactivity disorder (ADHD) is a common, heritable neuropsychiatric disorder of unknown etiology. We performed a whole-genome copy number variation (CNV) study on 1,013 cases with ADHD and 4,105 healthy children of European ancestry using 550,000 SNPs. We evaluated statistically significant findings in multiple independent cohorts, with a total of 2,493 cases with ADHD and 9,222 controls of European ancestry, using matched platforms. CNVs affecting metabotropic glutamate receptor genes were enriched across all cohorts (P = 2.1 × 10−9). We saw GRM5 (encoding glutamate receptor, metabotropic 5) deletions in ten cases and one control (P = 1.36 × 10−6). We saw GRM7 deletions in six cases, and we saw GRM8 deletions in eight cases and no controls. GRM1 was duplicated in eight cases. We experimentally validated the observed variants using quantitative RT-PCR. A gene network analysis showed that genes interacting with the genes in the GRM family are enriched for CNVs in ~10% of the cases (P = 4.38 × 10−10) after correction for occurrence in the controls. We identified rare recurrent CNVs affecting glutamatergic neurotransmission genes that were overrepresented in multiple ADHD cohorts. PMID:22138692
Through this book, researchers and students will learn to use R for analysis of large-scale genomic data and how to create routines to automate analytical steps. The philosophy behind the book is to start with real world raw datasets and perform all the analytical steps needed to reach final results. Though theory plays an important role, this is a practical book for advanced undergraduate and graduate classes in bioinformatics, genomics and statistical genetics or for use in lab sessions. This book is also designed to be used by students in computer science and statistics who want to learn the practical aspects of genomic analysis without delving into algorithmic details. The datasets used throughout the book may be downloaded from the publisher’s website. Chapters show how to handle and manage high-throughput genomic data, create automated workflows and speed up analyses in R. A wide range of R packages useful for working with genomic data are illustrated with practical examples. In recent years R has b...
Evan H Hurowitz
Full Text Available We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale.We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90% confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs tend to be longer or shorter than average; these functional classes were similar in both human and yeast.Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.
Hurowitz, Evan H; Drori, Iddo; Stodden, Victoria C; Donoho, David L; Brown, Patrick O
We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.
Alison R Frand
Full Text Available Although the molting cycle is a hallmark of insects and nematodes, neither the endocrine control of molting via size, stage, and nutritional inputs nor the enzymatic mechanism for synthesis and release of the exoskeleton is well understood. Here, we identify endocrine and enzymatic regulators of molting in C. elegans through a genome-wide RNA-interference screen. Products of the 159 genes discovered include annotated transcription factors, secreted peptides, transmembrane proteins, and extracellular matrix enzymes essential for molting. Fusions between several genes and green fluorescent protein show a pulse of expression before each molt in epithelial cells that synthesize the exoskeleton, indicating that the corresponding proteins are made in the correct time and place to regulate molting. We show further that inactivation of particular genes abrogates expression of the green fluorescent protein reporter genes, revealing regulatory networks that might couple the expression of genes essential for molting to endocrine cues. Many molting genes are conserved in parasitic nematodes responsible for human disease, and thus represent attractive targets for pesticide and pharmaceutical development.
Olsen, Line; Hansen, Thomas; Djurovic, Srdjan
in a combined analysis of three case-control samples from Denmark, Norway and Iceland. A total of 1897 cases (n=1223 unipolar and n=463 bipolar) and 11 231 controls were analyzed for CNVs at the 10 genomic loci, but we found no combined association between these CNVs and affective disorders....
Mohammadnejad, Afsaneh; Brasch-Andersen, Charlotte; Haagerup, Annette
Background: Allergic Rhinitis (AR) is a complex disorder that affects many people around the world. There is a high genetic contribution to the development of the AR, as twins and family studies have estimated heritability of more than 33%. Due to the complex nature of the disease, single SNP...... analysis has limited power in identifying the genetic variations for AR. We combined genome-wide association analysis (GWAS) with polygenic risk score (PRS) in exploring the genetic basis underlying the disease. Methods: We collected clinical data on 631 Danish subjects with AR cases consisting of 434...... sibling pairs and unrelated individuals and control subjects of 197 unrelated individuals. SNP genotyping was done by Affymetrix Genome-Wide Human SNP Array 5.0. SNP imputation was performed using "IMPUTE2". Using additive effect model, GWAS was conducted in discovery sample, the genotypes...
The plant pleiotropic drug resistance (PDR) family of ATP-binding cassette (ABC) transporters has comprehensively been researched in relation to transport of antifungal agents and resistant pathogens. In our study, analyses of the whole family of PDR genes present in the potato genome were provided. This analysis ...
Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E
Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl
Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. Although large portion of sequence context was
Christoforides, Alexis; Carpten, John D; Weiss, Glen J; Demeure, Michael J; Von Hoff, Daniel D; Craig, David W
The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations--changes specific to a tumor and not within an individual's germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.
A.R. Wood (Andrew); T. Esko (Tõnu); J. Yang (Jian); S. Vedantam (Sailaja); T.H. Pers (Tune); S. Gustafsson (Stefan); A.Y. Chu (Audrey Y); K. Estrada Gil (Karol); J. Luan; Z. Kutalik; N. Amin (Najaf); M.L. Buchkovich (Martin); D.C. Croteau-Chonka (Damien); F.R. Day (Felix); Y. Duan (Yanan); M. Fall (Magnus); R.S.N. Fehrmann (Rudolf); T. Ferreira (Teresa); A.U. Jackson (Anne); J. Karjalainen (Juha); K.S. Lo (Ken Sin); A. Locke (Adam); R. Mägi (Reedik); E. Mihailov (Evelin); E. Porcu (Eleonora); J.C. Randall (Joshua); A. Scherag (Andre); A.A.E. Vinkhuyzen (Anna A.); H.J. Westra (Harm-Jan); T.W. Winkler (Thomas W.); T. Workalemahu (Tsegaselassie); J.H. Zhao (Jing Hua); D. Absher (Devin); E. Albrecht (Eva); J. Baron (Jeffrey); M. Beekman (Marian); A. Demirkan (Ayşe); G.B. Ehret (Georg); B. Feenstra; M.F. Feitosa (Mary Furlan); K. Fischer (Krista); R.M. Fraser (Ross); A. Goel (Anuj); J. Gong (Jian); A.E. Justice (Anne); S. Kanoni (Stavroula); M.E. Kleber (Marcus); K. Kristiansson (Kati); U. Lim (Unhee); V. Lotay (Vaneet); J.C. Lui (Julian C); M. Mangino (Massimo); I.M. Leach (Irene Mateo); M.C. Medina-Gomez (Carolina); M.A. Nalls (Michael); A.S. Dimas (Antigone); C. Palmer (Cameron); D. Pasko (Dorota); S. Pechlivanis (Sonali); I. Prokopenko (Inga); J.S. Ried (Janina); S. Ripke (Stephan); D. Shungin (Dmitry); A. Stancáková (Alena); R.J. Strawbridge (Rona); Y.J. Sung (Yun Ju); T. Tanaka (Toshiko); A. Teumer (Alexander); S. Trompet (Stella); S.W. Van Der Laan (Sander W.); J. van Setten (Jessica); J.V. van Vliet-Ostaptchouk (Jana); Z. Wang (Zhaoming); L. Yengo (Loic); W. Zhang (Weihua); U. Afzal (Uzma); J. Ärnlöv (Johan); G.M. Arscott (Gillian M.); S. Bandinelli (Stefania); A. Barrett (Angela); C. Bellis (Claire); A.J. Bennett (Amanda); C. Berne (Christian); M. Blüher (Matthias); J.L. Bolton (Jennifer); Y. Böttcher (Yvonne); H.A. Boyd; M. Bruinenberg (M.); B.M. Buckley (Brendan M.); S. Buyske (Steven); I.H. Caspersen (Ida H.); P.S. Chines (Peter); R. Clarke (Robert); S. Claudi-Boehm (Simone); M.N. Cooper (Matthew); E.W. Daw (E Warwick); P.A. De Jong (Pim A); J. Deelen (Joris); G. Delgado; J.C. Denny (Josh C); R.A.M. Dhonukshe-Rutten (Rosalie); M. Dimitriou (Maria); A.S.F. Doney (Alex); M. Dörr (Marcus); N. Eklund (Niina); E. Eury (Elodie); L. Folkersen (Lasse); M. Garcia (Melissa); F. Geller (Frank); V. Giedraitis (Vilmantas); A. Go (Attie); H. Grallert (Harald); T.B. Grammer (Tanja B); J. Gräßler (Jürgen); H. Grönberg (Henrik); L.C.P.G.M. de Groot (Lisette); C.J. Groves (Christopher J.); J. Haessler (Jeff); P. Hall (Per); T. Haller (Toomas); G. Hallmans (Göran); M. Hannemann (Mario); C.A. Hartman (Catharina); M. Hassinen (Maija); C. Hayward (Caroline); N.L. Heard-Costa (Nancy); Q. Helmer (Quinta); G. Hemani; A.K. Henders (Anjali); H.L. Hillege (Hans); M.A. Hlatky (Mark); W. Hoffmann (Wolfgang); P. Hoffmann (Per); O.L. Holmen (Oddgeir); J.J. Houwing-Duistermaat (Jeanine); T. Illig (Thomas); A. Isaacs (Aaron); A.L. James (Alan); J. Jeff (Janina); B. Johansen (Berit); A. Johansson (Åsa); G.J. Jolley (Jason); T. Juliusdottir (Thorhildur); M.J. Junttila (Juhani); M.M.L. Kho (Marcia); L. Kinnunen (Leena); N. Klopp (Norman); T. Kocher; W. Kratzer (Wolfgang); P. Lichtner (Peter); L. Lind (Lars); J. Lindström (Jaana); S. Lobbens (Stéphane); M. Lorentzon (Mattias); Y. Lu (Yingchang); V. Lyssenko (Valeriya); P.K. Magnusson (Patrik); A. Mahajan (Anubha); M. Maillard (Marc); W.L. McArdle (Wendy); C.A. McKenzie (Colin A.); S. McLachlan (Stela); P.J. McLaren (Paul J); C. Menni (Cristina); S. Merger (Sigrun); L. Milani (Lili); A. Moayyeri (Alireza); K.L. Monda (Keri); M.A. Morken (Mario); G. Müller (Gabriele); M. Müller-Nurasyid (Martina); A.W. Musk (Arthur); N. Narisu (Narisu); M. Nauck (Matthias); I.M. Nolte (Ilja M.); M.M. Nöthen (Markus); L. Oozageer (Laticia); S. Pilz (Stefan); N.W. Rayner (Nigel William); F. Renström (Frida); N.R. Robertson (Neil R.); L.M. Rose (Lynda M.); R. Roussel (Ronan); S. Sanna (Serena); H. Scharnagl (Hubert); S. Scholtens (Salome); F.R. Schumacher (Fredrick R); H. Schunkert (Heribert); R.A. Scott (Robert); J.S. Sehmi (Joban); T. Seufferlein (Thomas); J. Shi (Jianxin); K. Silventoinen (Karri); J.H. Smit (Johannes); G.D. Smith; J. Smolonska (Joanna); A. Stanton (Alice); K. Stirrups (Kathy); D.J. Stott (David J); H.M. Stringham (Heather); J. Sundstrom (Johan); M. Swertz (Morris); A.C. Syvanen; B. Tayo (Bamidele); G. Thorleifsson (Gudmar); J.P. Tyrer (Jonathan); S. Van Dijk (Suzanne); N.M. van Schoor (Natasja); N. van der Velde (Nathalie); D. van Heemst (Diana); F.V.A. Van Oort (Floor V A); S.H.H.M. Vermeulen (Sita); N. Verweij (Niek); J.M. Vonk (Judith M); L. Waite (Lindsay); M. Waldenberger (Melanie); R. Wennauer (Roman); L.R. Wilkens (Lynne R.); C. Willenborg (Christina); T. Wilsgaard (Tom); M.K. Wojczynski (Mary ); A. Wong (Andrew); A. Wright (Alan); Q. Zhang (Qunyuan); D. Arveiler (Dominique); S.J.L. Bakker (Stephan); J. Beilby (John); R.N. Bergman (Richard); S.M. Bergmann (Sven); R. Biffar; J. Blangero (John); D.I. Boomsma (Dorret); S.R. Bornstein (Stefan R.); P. Bovet (Pascal); P. Brambilla (Paolo); M.J. Brown (Morris); H. Campbell (Harry); M. Caulfield (Mark); A. Chakravarti (Aravinda); F.S. Collins (Francis); D.C. Crawford (Dana); L.A. Cupples (Adrienne); J. Danesh (John); U. de Faire (Ulf); H.M. den Ruijter (Hester ); R. Erbel (Raimund); J. Erdmann (Jeanette); J. Eriksson; M. Farrall (Martin); E. Ferrannini (Ele); J. Ferrieres (Jean); I. Ford; N.G. Forouhi (Nita); T. Forrester (Terrence); R.T. Gansevoort (Ron); P.V. Gejman (Pablo); C. Gieger (Christian); A. Golay (Alain); R.F. Gottesman (Rebecca); V. Gudnason (Vilmundur); U. Gyllensten (Ulf); D.W. Haas (David W); A.S. Hall (Alistair); T.B. Harris (Tamara); A.T. Hattersley (Andrew); A.C. Heath (Andrew C); C. Hengstenberg (Christian); A.A. Hicks (Andrew); L.A. Hindorff (Lucia A); A. Hingorani (Aroon); A. Hofman (Albert); G.K. Hovingh (Kees); S.E. Humphries (Steve E.); S.C. Hunt (Steven); E. Hypponen (Elina); K.B. Jacobs (Kevin); M.-R. Jarvelin (Marjo-Riitta); P. Jousilahti (Pekka); A. Jula (Antti); J. Kaprio (Jaakko); J.J.P. Kastelein (John); M.H. Kayser (Manfred); F. Kee (Frank); S. Keinanen-Kiukaanniemi (Sirkka); L.A.L.M. Kiemeney (Bart); J.S. Kooner (Jaspal S.); C. Kooperberg (Charles); S. Koskinen (Seppo); P. Kovacs (Peter); A. Kraja (Aldi); M. Kumari (Meena); J. Kuusisto (Johanna); T.A. Lakka (Timo); C. Langenberg (Claudia); L. Le Marchand (Loic); T. Lehtimäki (Terho); S. Lupoli (Sara); P.A. Madden; S. Männistö (Satu); P. Manunta (Paolo); A. Marette (Andre'); T.C. Matise (Tara C.); B. McKnight (Barbara); T. Meitinger (Thomas); F.L. Moll (Frans); G.W. Montgomery (Grant W.); A.D. Morris (Andrew); A.P. Morris (Andrew); J.C. Murray (Jeffrey); M. Nelis (Mari); C. Ohlsson (Claes); A.J. Oldehinkel (Albertine); K.K. Ong (Ken K.); W.H. Ouwehand (Willem); G. Pasterkamp (Gerard); A. Peters (Annette); P.P. Pramstaller (Peter Paul); J.F. Price (Jackie F.); L. Qi (Lu); O. Raitakari (Olli); T. Rankinen (Tuomo); D.C. Rao (Dabeeru C.); T.K. Rice (Treva K.); M.D. Ritchie (Marylyn D.); I. Rudan (Igor); V. Salomaa (Veikko); N.J. Samani (Nilesh); J. Saramies (Jouko); M.A. Sarzynski (Mark A.); P.E.H. Schwarz (Peter E. H.); S. Sebert (Sylvain); P. Sever (Peter); A.R. Shuldiner (Alan); J. Sinisalo (Juha); V. Steinthorsdottir (Valgerdur); R.P. Stolk; J.-C. Tardif (Jean-Claude); A. Tönjes (Anke); A. Tremblay (Angelo); E. Tremoli (Elena); J. Virtamo (Jarmo); M.-C. Vohl (Marie-Claude); P. Amouyel (Philippe); F.W. Asselbergs (Folkert W.); T.L. Assimes (Themistocles); M. Bochud (Murielle); B.O. Boehm (Bernhard); E.A. Boerwinkle (Eric); E.P. Bottinger (Erwin P.); C. Bouchard (Claude); S. Cauchi (Stéphane); J.C. Chambers (John C.); S.J. Chanock (Stephen); R.S. Cooper (Richard S.); P.I.W. de Bakker (Paul); G.V. Dedoussis (George); L. Ferrucci (Luigi); P.W. Franks; P. Froguel (Philippe); L. Groop (Leif); C.A. Haiman (Christopher); A. Hamsten (Anders); M.G. Hayes (M. Geoffrey); J. Hui (Jennie); D. Hunter (David); K. Hveem (Kristian); J.W. Jukema (Jan Wouter); R.C. Kaplan (Robert); M. Kivimaki (Mika); D. Kuh (Diana); M. Laakso (Markku); Y. Liu (YongMei); N.G. Martin (Nicholas); W. März (Winfried); M. Melbye (Mads); S. Moebus (Susanne); P. Munroe (Patricia); I. Njølstad (Inger); B.A. Oostra (Ben); C.N.A. Palmer (Colin); N.L. Pedersen (Nancy L.); M. Perola (Markus); L. Perusse (Louis); U. Peters (Ulrike); J.E. Powell (Joseph); C. Power (Christine); T. Quertermous (Thomas); R. Rauramaa (Rainer); E. Reinmaa (Eva); P.M. Ridker (Paul); F. Rivadeneira Ramirez (Fernando); J.I. Rotter (Jerome I.); T. Saaristo (Timo); D. Saleheen; D. Schlessinger (David); P.E. Slagboom (P Eline); H. Snieder (Harold); T.D. Spector (Timothy); K. Strauch (Konstantin); M. Stumvoll (Michael); J. Tuomilehto (Jaakko); M. Uusitupa (Matti); P. van der Harst (Pim); H. Völzke (Henry); M. Walker (Mark); N.J. Wareham (Nick); H. Watkins (Hugh); H.E. Wichmann (Heinz Erich); J.F. Wilson (James F); P. Zanen (Pieter); P. Deloukas (Panagiotis); I.M. Heid (Iris); C.M. Lindgren (Cecilia); K.L. Mohlke (Karen); E.K. Speliotes (Elizabeth); U. Thorsteinsdottir (Unnur); I.E. Barroso (Inês); C.S. Fox (Caroline S.); K.E. North (Kari); D.P. Strachan (David P.); J.S. Beckmann (Jacques); S.I. Berndt (Sonja); M. Boehnke (Michael); I.B. Borecki (Ingrid); M.I. McCarthy (Mark); A. Metspalu (Andres); J-A. Zwart (John-Anker); A.G. Uitterlinden (André); C.M. van Duijn (Cornelia); L. Franke (Lude); C.J. Willer (Cristen); A. Price (Alkes); G. Lettre (Guillaume); R.J.F. Loos (Ruth); M.N. Weedon (Michael); E. Ingelsson (Erik); J.R. O´Connell; G.R. Abecasis (Gonçalo); D.I. Chasman (Daniel); D. Anderson (Denise); M.E. Goddard (Michael); P.M. Visscher (Peter); J.N. Hirschhorn (Joel); T.M. Frayling (Timothy)
textabstractUsing genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated
Wood, Andrew R.; Esko, Tonu; Yang, Jian; Vedantam, Sailaja; Pers, Tune H.; Gustafsson, Stefan; Chu, Audrey Y.; Estrada, Karol; Luan, Jian'an; Kutalik, Zoltán; Amin, Najaf; Buchkovich, Martin L.; Croteau-Chonka, Damien C.; Day, Felix R.; Duan, Yanan; Fall, Tove; Fehrmann, Rudolf; Ferreira, Teresa; Jackson, Anne U.; Karjalainen, Juha; Lo, Ken Sin; Locke, Adam E.; Mägi, Reedik; Mihailov, Evelin; Porcu, Eleonora; Randall, Joshua C.; Scherag, André; Vinkhuyzen, Anna A. E.; Westra, Harm-Jan; Winkler, Thomas W.; Workalemahu, Tsegaselassie; Zhao, Jing Hua; Absher, Devin; Albrecht, Eva; Anderson, Denise; Baron, Jeffrey; Beekman, Marian; Demirkan, Ayse; Ehret, Georg B.; Feenstra, Bjarke; Feitosa, Mary F.; Fischer, Krista; Fraser, Ross M.; Goel, Anuj; Gong, Jian; Justice, Anne E.; Kanoni, Stavroula; Kleber, Marcus E.; Kristiansson, Kati; Lim, Unhee; Lotay, Vaneet; Lui, Julian C.; Mangino, Massimo; Mateo Leach, Irene; Medina-Gomez, Carolina; Nalls, Michael A.; Nyholt, Dale R.; Palmer, Cameron D.; Pasko, Dorota; Pechlivanis, Sonali; Prokopenko, Inga; Ried, Janina S.; Ripke, Stephan; Shungin, Dmitry; Stancáková, Alena; Strawbridge, Rona J.; Sung, Yun Ju; Tanaka, Toshiko; Teumer, Alexander; Trompet, Stella; van der Laan, Sander W.; van Setten, Jessica; van Vliet-Ostaptchouk, Jana V.; Wang, Zhaoming; Yengo, Loïc; Zhang, Weihua; Afzal, Uzma; Arnlöv, Johan; Arscott, Gillian M.; Bandinelli, Stefania; Barrett, Amy; Bellis, Claire; Bennett, Amanda J.; Berne, Christian; Blüher, Matthias; Bolton, Jennifer L.; Böttcher, Yvonne; Boyd, Heather A.; Bruinenberg, Marcel; Buckley, Brendan M.; Buyske, Steven; Caspersen, Ida H.; Chines, Peter S.; Clarke, Robert; Claudi-Boehm, Simone; Cooper, Matthew; Daw, E. Warwick; de Jong, Pim A.; Deelen, Joris; Delgado, Graciela; Denny, Josh C.; Dhonukshe-Rutten, Rosalie; Dimitriou, Maria; Doney, Alex S. F.; Dörr, Marcus; Eklund, Niina; Eury, Elodie; Folkersen, Lasse; Garcia, Melissa E.; Geller, Frank; Giedraitis, Vilmantas; Go, Alan S.; Grallert, Harald; Grammer, Tanja B.; Gräßler, Jürgen; Grönberg, Henrik; de Groot, Lisette C. P. G. M.; Groves, Christopher J.; Haessler, Jeffrey; Hall, Per; Haller, Toomas; Hallmans, Goran; Hannemann, Anke; Hartman, Catharina A.; Hassinen, Maija; Hayward, Caroline; Heard-Costa, Nancy L.; Helmer, Quinta; Hemani, Gibran; Henders, Anjali K.; Hillege, Hans L.; Hlatky, Mark A.; Hoffmann, Wolfgang; Hoffmann, Per; Holmen, Oddgeir; Houwing-Duistermaat, Jeanine J.; Illig, Thomas; Isaacs, Aaron; James, Alan L.; Jeff, Janina; Johansen, Berit; Johansson, Åsa; Jolley, Jennifer; Juliusdottir, Thorhildur; Junttila, Juhani; Kho, Abel N.; Kinnunen, Leena; Klopp, Norman; Kocher, Thomas; Kratzer, Wolfgang; Lichtner, Peter; Lind, Lars; Lindström, Jaana; Lobbens, Stéphane; Lorentzon, Mattias; Lu, Yingchang; Lyssenko, Valeriya; Magnusson, Patrik K. E.; Mahajan, Anubha; Maillard, Marc; McArdle, Wendy L.; McKenzie, Colin A.; McLachlan, Stela; McLaren, Paul J.; Menni, Cristina; Merger, Sigrun; Milani, Lili; Moayyeri, Alireza; Monda, Keri L.; Morken, Mario A.; Müller, Gabriele; Müller-Nurasyid, Martina; Musk, Arthur W.; Narisu, Narisu; Nauck, Matthias; Nolte, Ilja M.; Nöthen, Markus M.; Oozageer, Laticia; Pilz, Stefan; Rayner, Nigel W.; Renstrom, Frida; Robertson, Neil R.; Rose, Lynda M.; Roussel, Ronan; Sanna, Serena; Scharnagl, Hubert; Scholtens, Salome; Schumacher, Fredrick R.; Schunkert, Heribert; Scott, Robert A.; Sehmi, Joban; Seufferlein, Thomas; Shi, Jianxin; Silventoinen, Karri; Smit, Johannes H.; Smith, Albert Vernon; Smolonska, Joanna; Stanton, Alice V.; Stirrups, Kathleen; Stott, David J.; Stringham, Heather M.; Sundström, Johan; Swertz, Morris A.; Syvänen, Ann-Christine; Tayo, Bamidele O.; Thorleifsson, Gudmar; Tyrer, Jonathan P.; van Dijk, Suzanne; van Schoor, Natasja M.; van der Velde, Nathalie; van Heemst, Diana; van Oort, Floor V. A.; Vermeulen, Sita H.; Verweij, Niek; Vonk, Judith M.; Waite, Lindsay L.; Waldenberger, Melanie; Wennauer, Roman; Wilkens, Lynne R.; Willenborg, Christina; Wilsgaard, Tom; Wojczynski, Mary K.; Wong, Andrew; Wright, Alan F.; Zhang, Qunyuan; Arveiler, Dominique; Bakker, Stephan J. L.; Beilby, John; Bergman, Richard N.; Bergmann, Sven; Biffar, Reiner; Blangero, John; Boomsma, Dorret I.; Bornstein, Stefan R.; Bovet, Pascal; Brambilla, Paolo; Brown, Morris J.; Campbell, Harry; Caulfield, Mark J.; Chakravarti, Aravinda; Collins, Rory; Collins, Francis S.; Crawford, Dana C.; Cupples, L. Adrienne; Danesh, John; de Faire, Ulf; den Ruijter, Hester M.; Erbel, Raimund; Erdmann, Jeanette; Eriksson, Johan G.; Farrall, Martin; Ferrannini, Ele; Ferrières, Jean; Ford, Ian; Forouhi, Nita G.; Forrester, Terrence; Gansevoort, Ron T.; Gejman, Pablo V.; Gieger, Christian; Golay, Alain; Gottesman, Omri; Gudnason, Vilmundur; Gyllensten, Ulf; Haas, David W.; Hall, Alistair S.; Harris, Tamara B.; Hattersley, Andrew T.; Heath, Andrew C.; Hengstenberg, Christian; Hicks, Andrew A.; Hindorff, Lucia A.; Hingorani, Aroon D.; Hofman, Albert; Hovingh, G. Kees; Humphries, Steve E.; Hunt, Steven C.; Hypponen, Elina; Jacobs, Kevin B.; Jarvelin, Marjo-Riitta; Jousilahti, Pekka; Jula, Antti M.; Kaprio, Jaakko; Kastelein, John J. P.; Kayser, Manfred; Kee, Frank; Keinanen-Kiukaanniemi, Sirkka M.; Kiemeney, Lambertus A.; Kooner, Jaspal S.; Kooperberg, Charles; Koskinen, Seppo; Kovacs, Peter; Kraja, Aldi T.; Kumari, Meena; Kuusisto, Johanna; Lakka, Timo A.; Langenberg, Claudia; Le Marchand, Loic; Lehtimäki, Terho; Lupoli, Sara; Madden, Pamela A. F.; Männistö, Satu; Manunta, Paolo; Marette, André; Matise, Tara C.; McKnight, Barbara; Meitinger, Thomas; Moll, Frans L.; Montgomery, Grant W.; Morris, Andrew D.; Morris, Andrew P.; Murray, Jeffrey C.; Nelis, Mari; Ohlsson, Claes; Oldehinkel, Albertine J.; Ong, Ken K.; Ouwehand, Willem H.; Pasterkamp, Gerard; Peters, Annette; Pramstaller, Peter P.; Price, Jackie F.; Qi, Lu; Raitakari, Olli T.; Rankinen, Tuomo; Rao, D. C.; Rice, Treva K.; Ritchie, Marylyn; Rudan, Igor; Salomaa, Veikko; Samani, Nilesh J.; Saramies, Jouko; Sarzynski, Mark A.; Schwarz, Peter E. H.; Sebert, Sylvain; Sever, Peter; Shuldiner, Alan R.; Sinisalo, Juha; Steinthorsdottir, Valgerdur; Stolk, Ronald P.; Tardif, Jean-Claude; Tönjes, Anke; Tremblay, Angelo; Tremoli, Elena; Virtamo, Jarmo; Vohl, Marie-Claude; Amouyel, Philippe; Asselbergs, Folkert W.; Assimes, Themistocles L.; Bochud, Murielle; Boehm, Bernhard O.; Boerwinkle, Eric; Bottinger, Erwin P.; Bouchard, Claude; Cauchi, Stéphane; Chambers, John C.; Chanock, Stephen J.; Cooper, Richard S.; de Bakker, Paul I. W.; Dedoussis, George; Ferrucci, Luigi; Franks, Paul W.; Froguel, Philippe; Groop, Leif C.; Haiman, Christopher A.; Hamsten, Anders; Hayes, M. Geoffrey; Hui, Jennie; Hunter, David J.; Hveem, Kristian; Jukema, J. Wouter; Kaplan, Robert C.; Kivimaki, Mika; Kuh, Diana; Laakso, Markku; Liu, Yongmei; Martin, Nicholas G.; März, Winfried; Melbye, Mads; Moebus, Susanne; Munroe, Patricia B.; Njølstad, Inger; Oostra, Ben A.; Palmer, Colin N. A.; Pedersen, Nancy L.; Perola, Markus; Pérusse, Louis; Peters, Ulrike; Powell, Joseph E.; Power, Chris; Quertermous, Thomas; Rauramaa, Rainer; Reinmaa, Eva; Ridker, Paul M.; Rivadeneira, Fernando; Rotter, Jerome I.; Saaristo, Timo E.; Saleheen, Danish; Schlessinger, David; Slagboom, P. Eline; Snieder, Harold; Spector, Tim D.; Strauch, Konstantin; Stumvoll, Michael; Tuomilehto, Jaakko; Uusitupa, Matti; van der Harst, Pim; Völzke, Henry; Walker, Mark; Wareham, Nicholas J.; Watkins, Hugh; Wichmann, H.-Erich; Wilson, James F.; Zanen, Pieter; Deloukas, Panos; Heid, Iris M.; Lindgren, Cecilia M.; Mohlke, Karen L.; Speliotes, Elizabeth K.; Thorsteinsdottir, Unnur; Barroso, Inês; Fox, Caroline S.; North, Kari E.; Strachan, David P.; Beckmann, Jacques S.; Berndt, Sonja I.; Boehnke, Michael; Borecki, Ingrid B.; McCarthy, Mark I.; Metspalu, Andres; Stefansson, Kari; Uitterlinden, André G.; van Duijn, Cornelia M.; Franke, Lude; Willer, Cristen J.; Price, Alkes L.; Lettre, Guillaume; Loos, Ruth J. F.; Weedon, Michael N.; Ingelsson, Erik; O'Connell, Jeffrey R.; Abecasis, Goncalo R.; Chasman, Daniel I.; Goddard, Michael E.; Visscher, Peter M.; Hirschhorn, Joel N.; Frayling, Timothy M.; McCarty, Catherine A.; Starren, Justin; Peissig, Peggy; Berg, Richard; Rasmussen, Luke; Linneman, James; Miller, Aaron; Choudary, Vidhu; Chen, Lin; Waudby, Carol; Kitchner, Terrie; Reeser, Jonathan; Fost, Norman; Wilke, Russell A.; Chisholm, Rex L.; Avila, Pedro C.; Greenland, Philip; Hayes, M. Geoff; Kho, Abel; Kibbe, Warren A.; Lemke, Amy A.; Lowe, William L.; Smith, Maureen E.; Wolf, Wendy A.; Pacheco, Jennifer A.; Thompson, William K.; Humowiecki, Joel; Law, May; Chute, Christopher; Kullo, Iftikar; Koenig, Barbara; de Andrade, Mariza; Bielinski, Suzette; Pathak, Jyotishman; Savova, Guergana; Wu, Joel; Henriksen, Joan; Ding, Keyue; Hart, Lacey; Palbicki, Jeremy; Larson, Eric B.; Newton, Katherine; Ludman, Evette; Spangler, Leslie; Hart, Gene; Carrell, David; Jarvik, Gail; Crane, Paul; Burke, Wylie; Fullerton, Stephanie Malia; Trinidad, Susan Brown; Carlson, Chris; Hutchinson, Fred; McDavid, Andrew; Roden, Dan M.; Clayton, Ellen; Haines, Jonathan L.; Masys, Daniel R.; Churchill, Larry R.; Cornfield, Daniel; Crawford, Dana; Darbar, Dawood; Denny, Joshua C.; Malin, Bradley A.; Ritchie, Marylyn D.; Schildcrout, Jonathan S.; Xu, Hua; Ramirez, Andrea Havens; Basford, Melissa; Pulley, Jill; Alizadeh, Behrooz Z.; de Boer, Rudolf A.; Boezen, H. Marike; van der Klauw, Melanie M.; Navis, Gerjan; Ormel, Johan; Postma, Dirkje S.; Rosmalen, Judith G. M.; Slaets, Joris P.; Wolffenbuttel, Bruce H. R.; Wijmenga, Cisca; Kathiresan, Sekar; Voight, Benjamin F.; Purcell, Shaun; Musunuru, Kiran; Ardissino, Diego; Mannucci, Pier M.; Anand, Sonia; Engert, James C.; Reilly, Muredach P.; Rader, Daniel J.; Morgan, Thomas; Spertus, John A.; Stoll, Monika; Girelli, Domenico; McKeown, Pascal P.; Patterson, Chris C.; Siscovick, David S.; O'Donnell, Christopher J.; Elosua, Roberto; Peltonen, Leena; Schwartz, Stephen M.; Melander, Olle; Altshuler, David; Merlini, Pier Angelica; Berzuini, Carlo; Bernardinelli, Luisa; Peyvandi, Flora; Tubaro, Marco; Celli, Patrizia; Ferrario, Maurizio; Fetiveau, Raffaela; Marziliano, Nicola; Casari, Giorgio; Galli, Michele; Ribichini, Flavio; Rossi, Marco; Bernardi, Francesco; Zonzin, Pietro; Piazza, Alberto; Yee, Jean; Friedlander, Yechiel; Marrugat, Jaume; Lucas, Gavin; Subirana, Isaac; Sala, Joan; Ramos, Rafael; Meigs, James B.; Williams, Gordon; Nathan, David M.; MacRae, Calum A.; Havulinna, Aki S.; Berglund, Goran; Asselta, Rosanna; Duga, Stefano; Spreafico, Marta; Daly, Mark J.; Nemesh, James; Korn, Joshua M.; McCarroll, Steven A.; Surti, Aarti; Guiducci, Candace; Gianniny, Lauren; Mirel, Daniel; Parkin, Melissa; Burtt, Noel; Gabriel, Stacey B.; Thompson, John R.; Braund, Peter S.; Wright, Benjamin J.; Balmforth, Anthony J.; Ball, Stephen G.; Schunkert, I. Heribert; Linsel-Nitschke, Patrick; Lieb, Wolfgang; Ziegler, Andreas; König, Inke R.; Fischer, Marcus; Stark, Klaus; Grosshennig, Anika; Preuss, Michael; Schreiber, Stefan; Ouwehand, Willem; Scholz, Michael; Cambien, Francois; Goodall, Alison; Li, Mingyao; Chen, Zhen; Wilensky, Robert; Matthai, William; Qasim, Atif; Hakonarson, Hakon H.; Devaney, Joe; Burnett, Mary-Susan; Pichard, Augusto D.; Kent, Kenneth M.; Satler, Lowell; Lindsay, Joseph M.; Waksman, Ron; Knouff, Christopher W.; Waterworth, Dawn M.; Walker, Max C.; Mooser, Vincent; Epstein, Stephen E.; Scheffold, Thomas; Berger, Klaus; Huge, Andreas; Martinelli, Nicola; Olivieri, Oliviero; Corrocher, Roberto; Hólm, Hilma; Do, Ron; Xie, Changchun; Siscovick, David; Matise, Tara; Buyske, Steve; Higashio, Julia; Williams, Rasheeda; Nato, Andrew; Ambite, Jose Luis; Deelman, Ewa; Manolio, Teri; Hindorff, Lucia; Heiss, Gerardo; Taylor, Kira; Franceschini, Nora; Avery, Christy; Graff, Misa; Lin, Danyu; Quibrera, Miguel; Cochran, Barbara; Kao, Linda; Umans, Jason; Cole, Shelley; MacCluer, Jean; Person, Sharina; Pankow, James; Gross, Myron; Fornage, Myriam; Durda, Peter; Jenny, Nancy; Patsy, Bruce; Arnold, Alice; Buzkova, Petra; Haines, Jonathan; Murdock, Deborah; Glenn, Kim; Brown-Gentry, Kristin; Thornton-Wells, Tricia; Dumitrescu, Logan; Bush, William S.; Mitchell, Sabrina L.; Goodloe, Robert; Wilson, Sarah; Boston, Jonathan; Malinowski, Jennifer; Restrepo, Nicole; Oetjens, Matthew; Fowke, Jay; Zheng, Wei; Spencer, Kylee; Pendergrass, Sarah; Le Marchand, Loïc; Wilkens, Lynne; Park, Lani; Tiirikainen, Maarit; Kolonel, Laurence; Cheng, Iona; Wang, Hansong; Shohet, Ralph; Haiman, Christopher; Stram, Daniel; Henderson, Brian; Monroe, Kristine; Schumacher, Fredrick; Anderson, Garnet; Prentice, Ross; LaCroix, Andrea; Wu, Chunyuan; Carty, Cara; Rosse, Stephanie; Young, Alicia; Haessler, Jeff; Kocarnik, Jonathan; Lin, Yi; Jackson, Rebecca; Duggan, David; Kuller, Lew
Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ∼2,000, ∼3,700
Rangkuti, Farania Gama Ardhina
Pathogens lie behind the deadliest pandemics in history. To date, AIDS pandemic has resulted in more than 25 million fatal cases, while tuberculosis and malaria annually claim more than 2 million lives. Comparative genomic analyses are needed to gain insights into the molecular mechanisms of pathogens, but the abundance of biological data dictates that such studies cannot be performed without the assistance of computational approaches. This explains the significant need for computational pipelines for genome assembly and analyses. The aim of this research is to develop such pipelines. This work utilizes various bioinformatics approaches to analyze the high-throughput genomic sequence data that has been obtained from several strains of bacterial pathogens. A pipeline has been compiled for quality control for sequencing and assembly, and several protocols have been developed to detect contaminations. Visualization has been generated of genomic data in various formats, in addition to alignment, homology detection and sequence variant detection. We have also implemented a metaheuristic algorithm that significantly improves bacterial genome assemblies compared to other known methods. Experiments on Mycobacterium tuberculosis H37Rv data showed that our method resulted in improvement of N50 value of up to 9697% while consistently maintaining high accuracy, covering around 98% of the published reference genome. Other improvement efforts were also implemented, consisting of iterative local assemblies and iterative correction of contiguated bases. Our result expedites the genomic analysis of virulent genes up to single base pair resolution. It is also applicable to virtually every pathogenic microorganism, propelling further research in the control of and protection from pathogen-associated diseases.
Analysis of Price Variation and Market Integration of Prosopis Africana (guill. ... select five markets based on the presence of traders selling the commodity in the markets ... T- test result showed that Prosopis africana seed trade is profitable and ...
Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.
Recovery and analysis of multiple Neanderthal autosomalsequences using a metagenomic approach reveals that modern humans andNeanderthals split ~;400,000 years ago, without significant evidence ofsubsequent admixture.
Genetic variants associated with traits such as age at puberty and litter size could provide insight into the underlying genetic sources of variation impacting sow reproductive longevity and productivity. Genomewide characterization and gene expression profiling were used using gilts from the Univer...
Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David
Fungi of the phylum Basidiomycota, make up some 37percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a `core proteome? based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.
Kim, Yong-Min; Kim, Seungill; Koo, Namjin; Shin, Ah-Young; Yeom, Seon-In; Seo, Eunyoung; Park, Seong-Jin; Kang, Won-Hee; Kim, Myung-Shin; Park, Jieun; Jang, Insu; Kim, Pan-Gyu; Byeon, Iksu; Kim, Min-Seo; Choi, JinHyuk; Ko, Gunhwan; Hwang, JiHye; Yang, Tae-Jin; Choi, Sang-Bong; Lee, Je Min; Lim, Ki-Byung; Lee, Jungho; Choi, Ik-Young; Park, Beom-Seok; Kwon, Suk-Yoon; Choi, Doil; Kim, Ryan W
Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprised of a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Predicted gene modeling detected 87,603 genes, mostly supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicates that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Voigt, Anja; Schöfl, Gerhard; Saluz, Hans Peter
Chlamydiaceae are a family of obligate intracellular pathogens causing a wide range of diseases in animals and humans, and facing unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host-range and/or virulence as compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution and predicted type III secreted effector proteins. This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights in the genome architecture of C. psittaci and proposes a number of novel candidate genes mostly of yet unknown function that may be important for pathogen-host interactions.
Full Text Available Chlamydiaceae are a family of obligate intracellular pathogens causing a wide range of diseases in animals and humans, and facing unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis.A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host-range and/or virulence as compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution and predicted type III secreted effector proteins.This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights in the genome architecture of C. psittaci and proposes a number of novel candidate genes mostly of yet unknown function that may be important for pathogen-host interactions.
Zhang, Fantao; Xu, Tao; Mao, Linyong; Yan, Shuangyong; Chen, Xiwen; Wu, Zhenfeng; Chen, Rui; Luo, Xiangdong; Xie, Jiankun; Gao, Shan
It is widely accepted that cultivated rice (Oryza sativa L.) was domesticated from common wild rice (Oryza rufipogon Griff.). Compared to other studies which concentrate on rice origin, this study is to genetically elucidate the substantially phenotypic and physiological changes from wild rice to cultivated rice at the whole genome level. Instead of comparing two assembled genomes, this study directly compared the Dongxiang wild rice (DXWR) Illumina sequencing reads with the Nipponbare (O. sativa) complete genome without assembly of the DXWR genome. Based on the results from the comparative genomics analysis, structural variations (SVs) between DXWR and Nipponbare were determined to locate deleted genes which could have been acquired by Nipponbare during rice domestication. To overcome the limit of the SV detection, the DXWR transcriptome was also sequenced and compared with the Nipponbare transcriptome to discover the genes which could have been lost in DXWR during domestication. Both 1591 Nipponbare-acquired genes and 206 DXWR-lost transcripts were further analyzed using annotations from multiple sources. The NGS data are available in the NCBI SRA database with ID SRP070627. These results help better understanding the domestication from wild rice to cultivated rice at the whole genome level and provide a genomic data resource for rice genetic research or breeding. One finding confirmed transposable elements contribute greatly to the genome evolution from wild rice to cultivated rice. Another finding suggested the photophosphorylation and oxidative phosphorylation system in cultivated rice could have adapted to environmental changes simultaneously during domestication.
Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette
One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov http://hcv.lanl.gov http://hfv.lanl.gov.
Feltus Frank A
Full Text Available Abstract Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18 to duodecaploid (12X = 108. Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective. Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions The construction of the first switchgrass BAC library and comparative analysis of
Xu, Kelin; Jin, Li; Xiong, Momiao
Epistasis plays an essential rule in understanding the regulation mechanisms and is an essential component of the genetic architecture of the gene expressions. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position level read count curves. A single number for measuring gene expression which is widely used for microarray measured gene expression analysis is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pair-wises SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and use all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genome Project. The numbers of pairs of significantly interacting genes after Bonferroni correction
Full Text Available The relationships between the levels of transcripts and the levels of the proteins they encode have not been examined comprehensively in mammals, although previous work in plants and yeast suggest a surprisingly modest correlation. We have examined this issue using a genetic approach in which natural variations were used to perturb both transcript levels and protein levels among inbred strains of mice. We quantified over 5,000 peptides and over 22,000 transcripts in livers of 97 inbred and recombinant inbred strains and focused on the 7,185 most heritable transcripts and 486 most reliable proteins. The transcript levels were quantified by microarray analysis in three replicates and the proteins were quantified by Liquid Chromatography-Mass Spectrometry using O(18-reference-based isotope labeling approach. We show that the levels of transcripts and proteins correlate significantly for only about half of the genes tested, with an average correlation of 0.27, and the correlations of transcripts and proteins varied depending on the cellular location and biological function of the gene. We examined technical and biological factors that could contribute to the modest correlation. For example, differential splicing clearly affects the analyses for certain genes; but, based on deep sequencing, this does not substantially contribute to the overall estimate of the correlation. We also employed genome-wide association analyses to map loci controlling both transcript and protein levels. Surprisingly, little overlap was observed between the protein- and transcript-mapped loci. We have typed numerous clinically relevant traits among the strains, including adiposity, lipoprotein levels, and tissue parameters. Using correlation analysis, we found that a low number of clinical trait relationships are preserved between the protein and mRNA gene products and that the majority of such relationships are specific to either the protein levels or transcript levels
Pritykin, Yuri; Ghersi, Dario; Singh, Mona
Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
Liu, Stephen V; Lenkiewicz, Elizabeth; Evers, Lisa; Holley, Tara; Kiefer, Jeffrey; Demeure, Michael J; Ramanathan, Ramesh K; Von Hoff, Daniel D; Barrett, Michael T; Ruiz, Christian; Glatz, Katharina; Bubendorf, Lukas; Eng, Cathy
It is widely accepted that many cancers arise as a result of an acquired genomic instability and the subsequent evolution of tumor cells with variable patterns of selected and background aberrations. The presence and behaviors of distinct neoplastic cell populations within a patient's tumor may underlie multiple clinical phenotypes in cancers. A goal of many current cancer genome studies is the identification of recurring selected driver events that can be advanced for the development of personalized therapies. Unfortunately, in the majority of rare tumors, this type of analysis can be particularly challenging. Large series of specimens for analysis are simply not available, allowing recurring patterns to remain hidden. In this paper, we highlight the use of DNA content-based flow sorting to identify and isolate DNA-diploid and DNA-aneuploid populations from tumor biopsies as a strategy to comprehensively study the genomic composition and behaviors of individual cancers in a series of rare solid tumors: intrahepatic cholangiocarcinoma, anal carcinoma, adrenal leiomyosarcoma, and pancreatic neuroendocrine tumors. We propose that the identification of highly selected genomic events in distinct tumor populations within each tumor can identify candidate driver events that can facilitate the development of novel, personalized treatment strategies for patients with cancer. (paper)
We design and analyze variational and non-variational multigrid algorithms for the Laplace-Beltrami operator on a smooth and closed surface. In both cases, a uniform convergence for the V -cycle algorithm is obtained provided the surface geometry is captured well enough by the coarsest grid. The main argument hinges on a perturbation analysis from an auxiliary variational algorithm defined directly on the smooth surface. In addition, the vanishing mean value constraint is imposed on each level, thereby avoiding singular quadratic forms without adding additional computational cost. Numerical results supporting our analysis are reported. In particular, the algorithms perform well even when applied to surfaces with a large aspect ratio. © 2011 American Mathematical Society.
Full Text Available Our goal was to test the hypothesis that inter-individual genomic copy number variation in control samples is a confounding factor in the non-invasive prenatal detection of fetal microdeletions via the sequence-based analysis of maternal plasma DNA. The database of genomic variants (DGV was used to determine the "Genomic Variants Frequency" (GVF for each 50kb region in the human genome. Whole genome sequencing of fifteen karyotypically normal maternal plasma and six CVS DNA controls samples was performed. The coefficient of variation of relative read counts (cv.RTC for these samples was determined for each 50kb region. Maternal plasma from two pregnancies affected with a chromosome 5p microdeletion was also sequenced, and analyzed using the GCREM algorithm. We found strong correlation between high variance in read counts and GVF amongst controls. Consequently we were unable to confirm the presence of the microdeletion via sequencing of maternal plasma samples obtained from two sequential affected pregnancies. Caution should be exercised when performing NIPT for microdeletions. It is vital to develop our understanding of the factors that impact the sensitivity and specificity of these approaches. In particular, benign copy number variation amongst controls is a major confounder, and their effects should be corrected bioinformatically.
Margaret J Mackinnon
Full Text Available Mechanisms for differential regulation of gene expression may underlie much of the phenotypic variation and adaptability of malaria parasites. Here we describe transcriptional variation among culture-adapted field isolates of Plasmodium falciparum, the species responsible for most malarial disease. It was found that genes coding for parasite protein export into the red cell cytosol and onto its surface, and genes coding for sexual stage proteins involved in parasite transmission are up-regulated in field isolates compared with long-term laboratory isolates. Much of this variability was associated with the loss of small or large chromosomal segments, or other forms of gene copy number variation that are prevalent in the P. falciparum genome (copy number variants, CNVs. Expression levels of genes inside these segments were correlated to that of genes outside and adjacent to the segment boundaries, and this association declined with distance from the CNV boundary. This observation could not be explained by copy number variation in these adjacent genes. This suggests a local-acting regulatory role for CNVs in transcription of neighboring genes and helps explain the chromosomal clustering that we observed here. Transcriptional co-regulation of physical clusters of adaptive genes may provide a way for the parasite to readily adapt to its highly heterogeneous and strongly selective environment.
Brown, T. A. (Terence A.)
... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...
Louise Bruun Thingholm
Full Text Available The development and progression of cancer, a collection of diseases with complex genetic architectures, is facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integration of heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article will provide an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and also accommodate information about the biological and functional interactions between the elements that have been measured – making the assessment of disease risk against a composite genomic factor possible. The focus of this review is to provide an overview and introduction to the main strategies and to discuss where there is a need for further development.
Chung, Kyong-Sook; Weber, Jaime A; Hipp, Andrew L
High intraspecific cytogenetic variation in the sedge genus Carex (Cyperaceae) is hypothesized to be due to the "diffuse" or non-localized centromeres, which facilitate chromosome fission and fusion. If chromosome number changes are dominated by fission and fusion, then chromosome evolution will result primarily in changes in the potential for recombination among populations. Chromosome duplications, on the other hand, entail consequent opportunities for divergent evolution of paralogs. In this study, we evaluate whether genome size and chromosome number covary within species. We used flow cytometry to estimate genome sizes in Carex scoparia var. scoparia, sampling 99 plants (23 populations) in the Chicago region, and we used meiotic chromosome observations to document chromosome numbers and chromosome pairing relations. Chromosome numbers range from 2n = 62 to 2n = 68, and nuclear DNA 1C content from 0.342 to 0.361 pg DNA. Regressions of DNA content on chromosome number are nonsignificant for data analyzed by individual or population, and a regression model that excludes slope is favored over a model in which chromosome number predicts genome size. Chromosome rearrangements within cytogenetically variable Carex species are more likely a consequence of fission and fusion than of duplication and deletion. Moreover, neither genome size nor chromosome number is spatially autocorrelated, which suggests the potential for rapid chromosome evolution by fission and fusion at a relatively fine geographic scale (<350 km). These findings have important implications for ecological restoration and speciation within the largest angiosperm genus of the temperate zone.
Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor
Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.
Zhao Patrick X
Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L., a species with high economic value but limited genomic resources. Results The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TCs sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs were evaluated and validated using high resolution melting (HRM analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1 chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.
Full Text Available The vertebrate retina is comprised of seven major cell types that are generated in overlapping but well-defined intervals. To identify genes that might regulate retinal development, gene expression in the developing retina was profiled at multiple time points using serial analysis of gene expression (SAGE. The expression patterns of 1,051 genes that showed developmentally dynamic expression by SAGE were investigated using in situ hybridization. A molecular atlas of gene expression in the developing and mature retina was thereby constructed, along with a taxonomic classification of developmental gene expression patterns. Genes were identified that label both temporal and spatial subsets of mitotic progenitor cells. For each developing and mature major retinal cell type, genes selectively expressed in that cell type were identified. The gene expression profiles of retinal Müller glia and mitotic progenitor cells were found to be highly similar, suggesting that Müller glia might serve to produce multiple retinal cell types under the right conditions. In addition, multiple transcripts that were evolutionarily conserved that did not appear to encode open reading frames of more than 100 amino acids in length ("noncoding RNAs" were found to be dynamically and specifically expressed in developing and mature retinal cell types. Finally, many photoreceptor-enriched genes that mapped to chromosomal intervals containing retinal disease genes were identified. These data serve as a starting point for functional investigations of the roles of these genes in retinal development and physiology.
Full Text Available Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes are higher than that between human and teleost fish genomes. Elephant shark contains putative four Hox clusters indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.
Full Text Available Synonymous codon usage of protein coding genes of thirty two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. One of the major factors influencing codon usage is identified to be compositional bias. Codons ending with either C or G are preferred in highly expressed genes among which C ending codons are highly preferred over G ending codons. A strong negative correlation between effective number of codons (Nc and GC3s content was also observed, showing that the codon usage was effected by gene nucleotide composition. Translational selection is also identified to play a role in shaping the codon usage operative at the level of translational accuracy. High level of heterogeneity is seen among and between the genomes. Length of genes is also identified to influence the codon usage in 11 out of 32 phage genomes. Mycobacteriophage Cooper is identified to be the highly biased genome with better translation efficiency comparing well with the host specific tRNA genes.
Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W
A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP, as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.
Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR on Amazon EC2 instances and Google Compute Engine (GCE, using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2 for E.coli and 53.5% (95% CI: 34.4-72.6 for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1 and 173.9% (95% CI: 134.6-213.1 more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
Full Text Available Abstract Background Macrocyclic lactone (ML anthelmintics are used for chemoprophylaxis for heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms, despite apparent compliance to recommended chemoprophylaxis with approved preventives, has led to such cases being considered as suspected lack of efficacy (LOE. Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced susceptibility isolates or has a genetic profile similar to known ML-susceptible isolates. Methods In this study, the 90Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, as well as the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST, which allow the genetic structure of each population (isolate to be compared at the level of single nucleotide polymorphisms (SNP across the genome, have been calculated. Forty-one previously reported SNP, that appeared to differentiate between susceptible and LOE and confirmed reduced susceptibility isolates, were also investigated in the JYD-34 isolate. Results The FST analysis, and the analysis of the 41 SNP that appeared to differentiate reduced susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously
Full Text Available LPT123-TC171 is a salt-tolerant (ST and drought-tolerant (DT rice line that was selected from somaclonal variation of the original Leuang Pratew 123 (LPT123 rice cultivar. The objective of this study was to identify the changes in the rice genome that possibly lead to ST and/or DT characteristics. The genomes of LPT123 and LPT123-TC171 were comparatively studied at the four levels of whole chromosomes (chromosome structure including telomeres, transposable elements, and DNA sequence changes by using next-generation sequencing analysis. Compared with LPT123, the LPT123-TC171 line displayed no changes in the ploidy level, but had a significant deficiency of chromosome ends (telomeres. The functional genome analysis revealed new aspects of the genome response to the in vitro cultivation condition, where exome sequencing revealed the molecular spectrum and pattern of changes in the somaclonal variant compared with the parental LPT123 cultivar. Mutation detection was performed, and the degree of mutations was evaluated to estimate the impact of mutagenesis on the protein functions. Mutations within the known genes responding to both drought and salt stress were detected in 493 positions, while mutations within the genes responding to only salt stress were found in 100 positions. The possible functions of the mutated genes contributing to salt or drought tolerance were discussed. It was concluded that the ST and DT characteristics in the somaclonal variegated line resulted from the base changes in the salt- and drought-responsive genes rather than the changes in chromosome structure or the large duplication or deletion in the specific region of the genome.
Full Text Available Abstract Background Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to
Huckins, L M; Hatzikotoulas, K; Southam, L; Thornton, L M; Steinberg, J; Aguilera-McKay, F; Treasure, J; Schmidt, U; Gunasinghe, C; Romero, A; Curtis, C; Rhodes, D; Moens, J; Kalsi, G; Dempster, D; Leung, R; Keohane, A; Burghardt, R; Ehrlich, S; Hebebrand, J; Hinney, A; Ludolph, A; Walton, E; Deloukas, P; Hofman, A; Palotie, A; Palta, P; van Rooij, F J A; Stirrups, K; Adan, R; Boni, C; Cone, R; Dedoussis, G; van Furth, E; Gonidakis, F; Gorwood, P; Hudson, J; Kaprio, J; Kas, M; Keski-Rahonen, A; Kiezebrink, K; Knudsen, G-P; Slof-Op 't Landt, M C T; Maj, M; Monteleone, A M; Monteleone, P; Raevuori, A H; Reichborn-Kjennerud, T; Tozzi, F; Tsitsika, A; van Elburg, A; Adan, R A H; Alfredsson, L; Ando, T; Andreassen, O A; Aschauer, H; Baker, J H; Barrett, J C; Bencko, V; Bergen, A W; Berrettini, W H; Birgegard, A; Boni, C; Boraska Perica, V; Brandt, H; Breen, G; Bulik, C M; Carlberg, L; Cassina, M; Cichon, S; Clementi, M; Cohen-Woods, S; Coleman, J; Cone, R D; Courtet, P; Crawford, S; Crow, S; Crowley, J; Danner, U N; Davis, O S P; de Zwaan, M; Dedoussis, G; Degortes, D; DeSocio, J E; Dick, D M; Dikeos, D; Dina, C; Ding, B; Dmitrzak-Weglarz, M; Docampo, E; Duncan, L; Egberts, K; Ehrlich, S; Escaramís, G; Esko, T; Espeseth, T; Estivill, X; Favaro, A; Fernández-Aranda, F; Fichter, M M; Finan, C; Fischer, K; Floyd, J A B; Foretova, L; Forzan, M; Franklin, C S; Gallinger, S; Gambaro, G; Gaspar, H A; Giegling, I; Gonidakis, F; Gorwood, P; Gratacos, M; Guillaume, S; Guo, Y; Hakonarson, H; Halmi, K A; Hatzikotoulas, K; Hauser, J; Hebebrand, J; Helder, S; Herms, S; Herpertz-Dahlmann, B; Herzog, W; Hilliard, C E; Hinney, A; Hübel, C; Huckins, L M; Hudson, J I; Huemer, J; Inoko, H; Janout, V; Jiménez-Murcia, S; Johnson, C; Julià, A; Juréus, A; Kalsi, G; Kaminska, D; Kaplan, A S; Kaprio, J; Karhunen, L; Karwautz, A; Kas, M J H; Kaye, W; Kennedy, J L; Keski-Rahkonen, A; Kiezebrink, K; Klareskog, L; Klump, K L; Knudsen, G P S; Koeleman, B P C; Koubek, D; La Via, M C; Landén, M; Le Hellard, S; Levitan, R D; Li, D; Lichtenstein, P; Lilenfeld, L; Lissowska, J; Lundervold, A; Magistretti, P; Maj, M; Mannik, K; Marsal, S; Martin, N; Mattingsdal, M; McDevitt, S; McGuffin, P; Merl, E; Metspalu, A; Meulenbelt, I; Micali, N; Mitchell, J; Mitchell, K; Monteleone, P; Monteleone, A M; Mortensen, P; Munn-Chernoff, M A; Navratilova, M; Nilsson, I; Norring, C; Ntalla, I; Ophoff, R A; O'Toole, J K; Palotie, A; Pante, J; Papezova, H; Pinto, D; Rabionet, R; Raevuori, A; Rajewski, A; Ramoz, N; Rayner, N W; Reichborn-Kjennerud, T; Ripatti, S; Roberts, M; Rotondo, A; Rujescu, D; Rybakowski, F; Santonastaso, P; Scherag, A; Scherer, S W; Schmidt, U; Schork, N J; Schosser, A; Slachtova, L; Sladek, R; Slagboom, P E; Slof-Op 't Landt, M C T; Slopien, A; Soranzo, N; Southam, L; Steen, V M; Strengman, E; Strober, M; Sullivan, P F; Szatkiewicz, J P; Szeszenia-Dabrowska, N; Tachmazidou, I; Tenconi, E; Thornton, L M; Tortorella, A; Tozzi, F; Treasure, J; Tsitsika, A; Tziouvas, K; van Elburg, A A; van Furth, E F; Wagner, G; Walton, E; Watson, H; Wichmann, H-E; Widen, E; Woodside, D B; Yanovski, J; Yao, S; Yilmaz, Z; Zeggini, E; Zerwas, S; Zipfel, S; Collier, D A; Sullivan, P F; Breen, G; Bulik, C M; Zeggini, E
Anorexia nervosa (AN) is a complex neuropsychiatric disorder presenting with dangerously low body weight, and a deep and persistent fear of gaining weight. To date, only one genome-wide significant locus associated with AN has been identified. We performed an exome-chip based genome-wide association studies (GWAS) in 2158 cases from nine populations of European origin and 15 485 ancestrally matched controls. Unlike previous studies, this GWAS also probed association in low-frequency and rare variants. Sixteen independent variants were taken forward for in silico and de novo replication (11 common and 5 rare). No findings reached genome-wide significance. Two notable common variants were identified: rs10791286, an intronic variant in OPCML (P=9.89 × 10−6), and rs7700147, an intergenic variant (P=2.93 × 10−5). No low-frequency variant associations were identified at genome-wide significance, although the study was well-powered to detect low-frequency variants with large effect sizes, suggesting that there may be no AN loci in this genomic search space with large effect sizes. PMID:29155802
Background The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. The presence of genomic diversity in domestic livestock species therefore, is of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as to facilitate rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers such as e.g. single nucleotide polymorphisms (SNPs) the most abundant source of genetic variation within the genome. Results Alignment of next generation sequencing data of 32 individual turkeys from different populations was used for the discovery of 5.49 million SNPs, which subsequently were used for the analysis of genetic diversity among the different populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of all individuals from the different turkey populations ranged from 0.17-2.73 SNPs/Kb, while heterozygosity of populations ranged from 0.73-1.64 SNPs/Kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/Kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed state of fixation towards alleles different than wild alleles. Conclusion The turkey genome is much less diverse with a relatively low frequency of heterozygous SNPs as compared to other livestock species like chicken and pig. The whole genome SNP discovery