Hall, Barry G
SNP-association studies are a starting point for identifying genes that may be responsible for specific phenotypes, such as disease traits. The vast bulk of tools for SNP-association studies are directed toward SNPs in the human genome, and I am unaware of any tools designed specifically for such studies in bacterial or viral genomes. The PPFS (Predict Phenotypes From SNPs) package described here is an add-on to kSNP, a program that can identify SNPs in a data set of hundreds of microbial genomes. PPFS identifies those SNPs that are non-randomly associated with a phenotype based on the χ² probability, then uses those diagnostic SNPs for two distinct, but related, purposes: (1) to predict the phenotypes of strains whose phenotypes are unknown, and (2) to identify those diagnostic SNPs that are most likely to be causally related to the phenotype. In the example illustrated here, from a set of 68 E. coli genomes, for 67 of which the pathogenicity phenotype was known, there were 418,500 SNPs. Using the phenotypes of 36 of those strains, PPFS identified 207 diagnostic SNPs. The diagnostic SNPs predicted the phenotypes of all of the genomes with 97% accuracy. PPFS then identified 97 SNPs whose probability of being causally related to the pathogenic phenotype was >0.999. In a second example, from a set of 116 E. coli genome sequences, using the phenotypes of 65 strains, PPFS identified 101 SNPs that predicted the source host (human or non-human) with 90% accuracy.
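As an illustration of the χ² association step mentioned in the abstract above, the following is a minimal sketch, not the PPFS implementation. It assumes a biallelic SNP and a binary phenotype, and the function name `chi2_2x2` and the input layout (one allele and one phenotype label per strain) are hypothetical. The p-value uses the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(√(x/2)).

```python
import math
from collections import Counter


def chi2_2x2(alleles, phenotypes):
    """Pearson chi-square statistic and p-value (1 d.f.) for one biallelic
    SNP against a binary phenotype. Hypothetical helper for illustration."""
    if len(alleles) != len(phenotypes):
        raise ValueError("one allele and one phenotype per strain expected")
    allele_values = sorted(set(alleles))
    pheno_values = sorted(set(phenotypes))
    if len(allele_values) != 2 or len(pheno_values) != 2:
        raise ValueError("needs exactly two alleles and two phenotypes")

    counts = Counter(zip(alleles, phenotypes))  # observed 2x2 table
    n = len(alleles)
    stat = 0.0
    for a in allele_values:
        row_total = sum(counts[(a, p)] for p in pheno_values)
        for p in pheno_values:
            col_total = sum(counts[(x, p)] for x in allele_values)
            expected = row_total * col_total / n
            stat += (counts[(a, p)] - expected) ** 2 / expected

    # Survival function of chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A SNP whose p-value falls below a chosen cutoff would be treated as non-randomly associated with the phenotype; the cutoff and any multiple-testing correction are left to the caller in this sketch.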
Sham P C
Background: Large scale genome-wide association studies have become popular since the introduction of high throughput genotyping platforms. Efficient management of the vast array of data generated poses many challenges. Description: We have developed an open source web-based data management system for the large amount of genotype data generated from the Affymetrix GeneChip® Mapping Array and Affymetrix Genome-Wide Human SNP Array platforms. The database supports genotype calling using DM, BRLMM, BRLMM-P or Birdseed algorithms provided by the Affymetrix Power Tools. The genotype and corresponding pedigree data are stored in a relational database for efficient downstream data manipulation and analysis, such as calculation of allele and genotype frequencies, sample identity checking, and export of genotype data in various file formats for analysis using commonly-available software. A novel method for genotyping error estimation is implemented using linkage disequilibrium information from the HapMap project. All functionalities are accessible via a web-based user interface. Conclusion: OpenADAM provides an open source database system for management of Affymetrix genome-wide association SNP data.
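The allele and genotype frequency calculation mentioned in the abstract above can be sketched as follows. This is an illustrative example only, not OpenADAM code: the function name and the input format (two-character diploid calls such as "AB", with `None` for a missing call) are assumptions.

```python
from collections import Counter


def genotype_and_allele_frequencies(genotypes):
    """Return (genotype_freq, allele_freq) for one SNP.

    `genotypes` is a list of two-character diploid calls such as "AA" or
    "AB"; None marks a missing call and is ignored (assumed convention).
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        raise ValueError("no non-missing genotype calls")

    # Sort the two alleles so that "AB" and "BA" count as one genotype.
    geno_counts = Counter("".join(sorted(g)) for g in calls)
    allele_counts = Counter(allele for g in calls for allele in g)

    n = len(calls)
    geno_freq = {g: c / n for g, c in geno_counts.items()}
    # Each diploid sample contributes two alleles.
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    return geno_freq, allele_freq
```

Frequencies here are computed over non-missing calls only, which is one common convention; a real system would also need to decide how to report the missing rate.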
Qiu, Ping; Wang, Luquan; Kostich, Mitch; Ding, Wei; Simon, Jason S; Greene, Jonathan R
Carcinogenesis occurs, at least in part, due to the accumulation of mutations in critical genes that control the mechanisms of cell proliferation, differentiation and death. Publicly accessible databases contain millions of expressed sequence tag (EST) and single nucleotide polymorphism (SNP) records, which have the potential to assist in the identification of SNPs overrepresented in tumor tissue. An in silico SNP-tumor association study was performed utilizing tissue library and SNP information available in NCBI's dbEST (release 092002) and dbSNP (build 106). A total of 4865 SNPs were identified that were present at higher allele frequencies in tumor compared to normal tissues. A subset of 327 (6.7%) SNPs induce amino acid changes to the protein coding sequences. This approach identified several SNPs which have been previously associated with carcinogenesis, as well as a number of SNPs that now warrant further investigation. This novel in silico approach can assist in prioritization of genes and SNPs in the effort to elucidate the genetic mechanisms underlying the development of cancer.
Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron
As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718
Gardner, Shea N; Hall, Barry G
Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
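The idea of alignment-free, k-mer based SNP discovery described in the abstract above can be illustrated with a toy sketch. This is not the kSNP algorithm: it assumes an odd k, treats the central base of each k-mer as the candidate SNP position, ignores reverse complements, and does not filter loci where a single genome carries more than one allele. The function name `kmer_snps` and the dictionary input are hypothetical.

```python
from collections import defaultdict


def kmer_snps(genomes, k):
    """Find candidate SNPs as k-mers that share both flanks but differ in
    the central base. `genomes` maps a genome name to its sequence.

    Toy illustration only: forward strand only, no conflict filtering.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    mid = k // 2

    # flank pattern -> central base -> set of genomes carrying that base
    loci = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            flank = kmer[:mid] + "." + kmer[mid + 1:]
            loci[flank][kmer[mid]].add(name)

    # Keep only loci where more than one central base was observed.
    return {
        flank: {base: sorted(names) for base, names in alleles.items()}
        for flank, alleles in loci.items()
        if len(alleles) > 1
    }
```

For example, `kmer_snps({"g1": "ACGTACGGA", "g2": "ACGTTCGGA"}, 5)` reports the single locus `GT.CG`, with allele A in g1 and allele T in g2. No multiple sequence alignment or reference genome is involved, which is the property the abstract emphasizes.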
Background: The typical objective of genome-wide association (GWA) studies is to identify single-nucleotide polymorphisms (SNPs) and corresponding genes with the strongest evidence of association (the 'most-significant SNPs/genes' approach). Borrowing ideas from micro-array data analysis, we propose a new method, named RS-SNP, for detecting sets of genes enriched in SNPs moderately associated to the phenotype. RS-SNP assesses whether the number of significant SNPs, with p-value P ≤ α, belonging to a given SNP set is statistically significant. The rationale of the proposed method is that two kinds of null hypotheses are taken into account simultaneously. In the first null model the genotype and the phenotype are assumed to be independent random variables and the null distribution is the probability of the number of significant SNPs in the set being greater than observed by chance. The second null model assumes the number of significant SNPs in the set depends on the size of the set and not on the identity of the SNPs in it. Statistical significance is assessed using non-parametric permutation tests. Results: We applied RS-SNP to the Crohn's disease (CD) data set collected by the Wellcome Trust Case Control Consortium (WTCCC) and compared the results with GENGEN, an approach recently proposed in literature. The enrichment analysis using RS-SNP and the set of pathways contained in the MSigDB C2 CP pathway collection highlighted 86 pathways rich in SNPs weakly associated to CD. Of these, 47 were also indicated to be significant by GENGEN. Similar results were obtained using the MSigDB C5 pathway collection. Many of the pathways found to be enriched by RS-SNP have a well-known connection to CD and often with inflammatory diseases. Conclusions: The proposed method is a valuable alternative to other techniques for enrichment analysis of SNP sets. It is well founded from a theoretical and statistical perspective. Moreover, the experimental comparison with GENGEN highlights that it is
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus
Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for association and conservation studies. Genome-wide SNP development in many teleost species is still challenging because of genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS) provides an efficient reduced-representation method to cut the cost of SNP detection; however, most recent GBS applications have been reported in plants. In this work, we applied an EcoRI-NlaIII based GBS protocol to the teleost large yellow croaker, an important commercial fish in China and East Asia, and report the first whole-genome SNP development for the species. A total of 69,845 high-quality SNP markers, evenly distributed along the genome, were detected in at least 80% of 500 individuals. Nearly 95% of randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. Association studies with muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content discovered 39 significant SNP markers, contributing up to ∼63% of the genetic variance explained by all markers. Functional genes involved in the fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, the PPT2 gene, previously identified in an association study of plasma n-3 and n-6 polyunsaturated fatty acid levels in humans, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS can produce quality SNP markers in a cost-efficient manner in a teleost genome. The developed SNP markers and the EPA- and DHA-associated SNP loci provide invaluable resources for the population structure, conservation genetics and genomic selection of large yellow croaker and other fish.
Schaub, M. A.; Boyle, A. P.; Kundaje, A.; Batzoglou, S.; Snyder, M.
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
The identification of statistical SNP-SNP interactions may help explain the genetic etiology of many human diseases, but exhaustive genome-wide searches for these interactions have been difficult, due to a lack of power in most datasets. We aimed to use data from the Resource for Genetic Epidemiology Research on Adult Health and Aging (GERA) study to search for SNP-SNP interactions associated with 10 common diseases. FastEpistasis and BOOST were used to evaluate all pairwise interactions among approximately N = 300,000 single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) ≥ 0.15, for the dichotomous outcomes of allergic rhinitis, asthma, cardiac disease, depression, dermatophytosis, type 2 diabetes, dyslipidemia, hemorrhoids, hypertensive disease, and osteoarthritis. A total of N = 45,171 subjects were included after quality control steps were applied. These data were divided into discovery and replication subsets; the discovery subset had > 80% power, under selected models, to detect genome-wide significant interactions (P < 10⁻¹²). Interactions were also evaluated for enrichment in particular SNP features, including functionality, prior disease relevancy, and marginal effects. No interaction in any disease was significant in both the discovery and replication subsets. Enrichment analysis suggested that, for some outcomes, interactions involving SNPs with marginal effects were more likely to be nominally replicated, compared to interactions without marginal effects. If SNP-SNP interactions play a role in the etiology of the studied conditions, they likely have weak effect sizes, involve lower-frequency variants, and/or involve complex models of interaction that are not captured well by the methods that were utilized.
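To give a sense of scale for the exhaustive pairwise search described in the abstract above, the short sketch below counts the SNP pairs among roughly 300,000 SNPs and computes a Bonferroni-style per-test threshold. The abstract does not state how its significance threshold was derived, so the family-wise α of 0.05 and the function name are assumptions made only for illustration.

```python
import math


def pairwise_bonferroni(n_snps, alpha=0.05):
    """Number of unordered SNP pairs and the Bonferroni per-test threshold.

    alpha = 0.05 is an assumed family-wise error rate, not a value taken
    from the study.
    """
    n_pairs = math.comb(n_snps, 2)  # n * (n - 1) / 2 unordered pairs
    return n_pairs, alpha / n_pairs


n_pairs, threshold = pairwise_bonferroni(300_000)
print(f"{n_pairs:,} pairs, per-test threshold {threshold:.2e}")
```

With 300,000 SNPs there are 44,999,850,000 pairs, so a Bonferroni correction at α = 0.05 gives a per-test threshold of about 1.1 × 10⁻¹², which is on the same order as the significance level quoted in the abstract.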
Sulovari, Arvis; Li, Dawei
Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly-used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep
Hall, Michael J; Ruth, Karen J; Chen, David Yt; Gross, Laura M; Giri, Veda N
Advancements in genomic testing have led to the identification of single nucleotide polymorphisms (SNPs) associated with prostate cancer. The clinical utility of SNP tests to evaluate prostate cancer risk is unclear. Studies have not examined predictors of interest in novel genomic SNP tests for prostate cancer risk in a diverse population. Consecutive participants in the Fox Chase Prostate Cancer Risk Assessment Program (PRAP) (n = 40) and unselected men from surgical urology clinics (n = 40) completed a one-time survey. Items examined interest in genomic SNP testing for prostate cancer risk, knowledge, impact of unsolicited findings, and psychosocial factors including health literacy. Knowledge of genomic SNP tests was low in both groups, but interest was higher among PRAP men (p testing in both groups. Multivariable modeling identified several predictors of higher interest in a genomic SNP test including higher perceived risk (p = 0.025), indicating zero reasons for not wanting testing (vs ≥1 reason) (p = 0.013), and higher health literacy (p = 0.016). Knowledge of genomic SNP testing was low in this sample, but higher among high-risk men. High-risk status may increase interest in novel genomic tests, while low literacy may lessen interest.
Ariel M Pani
Intellectual disability (ID) affects 2-3% of the population and may occur with or without multiple congenital anomalies (MCA) or other medical conditions. Established genetic syndromes and visible chromosome abnormalities account for a substantial percentage of ID diagnoses, although for approximately 50% the molecular etiology is unknown. Individuals with features suggestive of various syndromes but lacking their associated genetic anomalies pose a formidable clinical challenge. With the advent of microarray techniques, submicroscopic genome alterations not associated with known syndromes are emerging as a significant cause of ID and MCA. High-density SNP microarrays were used to determine genome wide copy number in 42 individuals: 7 with confirmed alterations in the WS region but atypical clinical phenotypes, 31 with ID and/or MCA, and 4 controls. One individual from the first group had the most telomeric gene in the WS critical region deleted along with 2 Mb of flanking sequence. A second person had the classic WS deletion and a rearrangement on chromosome 5p within the Cri du Chat syndrome (OMIM:123450) region. Six individuals from the ID/MCA group had large rearrangements (3 deletions, 3 duplications), one of whom had a large inversion associated with a deletion that was not detected by the SNP arrays. Combining SNP microarray analyses and qPCR allowed us to clone and sequence 21 deletion breakpoints in individuals with atypical deletions in the WS region and/or ID or MCA. Comparison of these breakpoints to databases of genomic variation revealed that 52% occurred in regions harboring structural variants in the general population. For two probands the genomic alterations were flanked by segmental duplications, which frequently mediate recurrent genome rearrangements; these may represent new genomic disorders. While SNP arrays and related technologies can identify potentially pathogenic deletions and duplications, obtaining sequence information
Kohane Isaac S
Background: Single Nucleotide Polymorphisms (SNPs) are an increasingly important tool for genetic and biomedical research. Although current genomic databases contain information on several million SNPs and are growing at a very fast rate, the true value of a SNP in this context is a function of the quality of the annotations that characterize it. Retrieving and analyzing such data for a large number of SNPs often represents a major bottleneck in the design of large-scale association studies. Description: SNPper is a web-based application designed to facilitate the retrieval and use of human SNPs for high-throughput research purposes. It provides a rich local database generated by combining SNP data with the Human Genome sequence and with several other data sources, and offers the user a variety of querying, visualization and data export tools. In this paper we describe the structure and organization of the SNPper database, we review the available data export and visualization options, and we describe how the architecture of SNPper and its specialized data structures support high-volume SNP analysis. Conclusions: The rich annotation database and the powerful data manipulation and presentation facilities it offers make SNPper a very useful online resource for SNP research. Its success proves the great need for integrated and interoperable resources in the field of computational biology, and shows how such systems may play a critical role in supporting the large-scale computational analysis of our genome.
Dana B Hancock
Genome-wide association studies have identified numerous genetic loci for spirometric measures of pulmonary function, forced expiratory volume in one second (FEV₁), and its ratio to forced vital capacity (FEV₁/FVC). Given that cigarette smoking adversely affects pulmonary function, we conducted genome-wide joint meta-analyses (JMA) of single nucleotide polymorphism (SNP) and SNP-by-smoking (ever-smoking or pack-years) associations on FEV₁ and FEV₁/FVC across 19 studies (total N = 50,047). We identified three novel loci not previously associated with pulmonary function. SNPs in or near DNER (smallest P_JMA = 5.00×10⁻¹¹), HLA-DQB1 and HLA-DQA2 (smallest P_JMA = 4.35×10⁻⁹), and KCNJ2 and SOX9 (smallest P_JMA = 1.28×10⁻⁸) were associated with FEV₁/FVC or FEV₁ in meta-analysis models including SNP main effects, smoking main effects, and SNP-by-smoking (ever-smoking or pack-years) interaction. The HLA region has been widely implicated for autoimmune and lung phenotypes, unlike the other novel loci, which have not been widely implicated. We evaluated DNER, KCNJ2, and SOX9 and found them to be expressed in human lung tissue. DNER and SOX9 further showed evidence of differential expression in human airway epithelium in smokers compared to non-smokers. Our findings demonstrated that joint testing of SNP and SNP-by-environment interaction identified novel loci associated with complex traits that are missed when considering only the genetic main effects.
Gardner, Shea [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Slezak, Tom [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs. The method is fast to compute, finding SNPs and building a SNP phylogeny in seconds to hours. We use it to identify thousands of putative SNPs from all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle as input hundreds of gigabases of sequence in a single run. The algorithm is based on k-mer analysis using a suffix array, so we call it saSNP.
Spencer, Amy V; Cox, Angela; Lin, Wei-Yu; Easton, Douglas F; Michailidou, Kyriaki; Walters, Kevin
There is a large amount of functional genetic data available, which can be used to inform fine-mapping association studies (in diseases with well-characterised disease pathways). Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires the specification of the effect size. If the information needed to estimate a priori the probability density for the effect sizes for causal SNPs in a genomic region isn't consistent or isn't available, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance, and a coherent approach to using SNP-level functional data, to inform the prior probability of causal association. Through simulation we show that when ranking SNPs by our empirical Bayes factor in a fine-mapping study, the causal SNP rank is generally as high or higher than the rank using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP-specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNP ranks compared to ranking with identical prior probabilities of association. We demonstrate the use of our methods by applying them to the fine mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer case and 43,000 healthy control samples. © 2016 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.
Background: Genetic markers are pivotal to modern genomics research; however, discovery and genotyping of molecular markers in oat has been hindered by the size and complexity of the genome, and by a scarcity of sequence data. The purpose of this study was to generate oat expressed sequence tag (EST) information, develop a bioinformatics pipeline for SNP discovery, and establish a method for rapid, cost-effective, and straightforward genotyping of SNP markers in complex polyploid genomes such as oat. Results: Based on cDNA libraries of four cultivated oat genotypes, approximately 127,000 contigs were assembled from approximately one million Roche 454 sequence reads. Contigs were filtered through a novel bioinformatics pipeline to eliminate ambiguous polymorphism caused by subgenome homology, and 96 in silico SNPs were selected from 9,448 candidate loci for validation using high-resolution melting (HRM) analysis. Of these, 52 (54%) were polymorphic between parents of the Ogle1040 × TAM O-301 (OT) mapping population, with 48 segregating as single Mendelian loci, and 44 being placed on the existing OT linkage map. Ogle and TAM amplicons from 12 primers were sequenced for SNP validation, revealing complex polymorphism in seven amplicons but general sequence conservation within SNP loci. Whole-amplicon interrogation with HRM revealed insertions, deletions, and heterozygotes in secondary oat germplasm pools, generating multiple alleles at some primer targets. To validate marker utility, 36 SNP assays were used to evaluate the genetic diversity of 34 diverse oat genotypes. Dendrogram clusters corresponded generally to known genome composition and genetic ancestry. Conclusions: The high-throughput SNP discovery pipeline presented here is a rapid and effective method for identification of polymorphic SNP alleles in the oat genome. The current-generation HRM system is a simple and highly-informative platform for SNP genotyping. These techniques provide
Potkin Steven G
Background Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can only explain a small fraction of the genetic risks associated with the complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is pathway analysis, which considers the genetic variations underlying a biological pathway jointly, rather than separately as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs. Results We describe a SNP-based pathway enrichment method for GWAS. The method consists of the following two main steps: (1) for a given pathway, using an adaptive truncated product statistic to identify all representative (potentially more than one) SNPs of each gene, calculating the average number of representative SNPs for the genes, then re-selecting the representative SNPs of genes in the pathway based on this number; and (2) ranking all selected SNPs by the significance of their statistical association with a trait of interest, and testing if the set of SNPs from a particular pathway is significantly enriched with high ranks using a weighted Kolmogorov-Smirnov test. We applied our method to two large genetically distinct GWAS data sets of schizophrenia, one
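Step (2) of the method above tests whether a pathway's SNPs sit near the top of a ranked list. The abstract does not spell out the weighting, so the following is only a sketch of a weighted Kolmogorov-Smirnov running-sum statistic in the style used by gene set enrichment analysis. The function name, the `weight` exponent, and the toy scores are assumptions for illustration.

```python
def weighted_ks_enrichment(ranked_scores, in_set, weight=1.0):
    """Weighted KS-style enrichment score.

    ranked_scores: association scores sorted from most to least significant
                   (e.g. -log10 p-values in descending order).
    in_set: parallel list of booleans, True where the SNP belongs to the pathway.
    Returns the signed maximum deviation of the running sum from zero.
    """
    n = len(ranked_scores)
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("pathway must contain some but not all SNPs")
    hit_total = sum(abs(s) ** weight for s, h in zip(ranked_scores, in_set) if h)
    if hit_total == 0:
        raise ValueError("pathway SNPs must carry non-zero scores")
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for s, h in zip(ranked_scores, in_set):
        if h:
            running += abs(s) ** weight / hit_total   # step up, weighted by score
        else:
            running -= miss_step                      # step down, uniform
        if abs(running) > abs(best):
            best = running
    return best

# Toy example: pathway SNPs occupy the two top ranks.
print(weighted_ks_enrichment([5, 4, 3, 2, 1], [True, True, False, False, False]))
```

In practice the significance of such a score would usually be judged against a permutation distribution, which is omitted here.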
Genome-Wide Association Mapping for Intelligence in Military Working Dogs: Canine Cohort, Canine Intelligence Assessment Regimen, Genome-Wide Single Nucleotide Polymorphism (SNP) Typing, and Unsupervised Classification Algorithm for Genome-Wide Association Data Analysis
SNP Array v2. A ‘proof-of-concept’ advanced data mining algorithm for unsupervised analysis of genome-wide association study (GWAS) dataset was... Opal F AUS Yes U141 Peggs F AUS Yes U142 Taxi F AUS Yes U143 Riso MI MAL Yes U144 Szarik MI GSD Yes U145 Astor MI MAL Yes U146 Roy MC MAL Yes... mining of genetic studies in general, and especially GWAS. As a proof-of-concept, a classification analysis of the WG SNP typing dataset of a
Mohammadnejad, Afsaneh; Brasch-Andersen, Charlotte; Haagerup, Annette
Background: Allergic Rhinitis (AR) is a complex disorder that affects many people around the world. There is a high genetic contribution to the development of the AR, as twins and family studies have estimated heritability of more than 33%. Due to the complex nature of the disease, single SNP...... analysis has limited power in identifying the genetic variations for AR. We combined genome-wide association analysis (GWAS) with polygenic risk score (PRS) in exploring the genetic basis underlying the disease. Methods: We collected clinical data on 631 Danish subjects with AR cases consisting of 434...... sibling pairs and unrelated individuals and control subjects of 197 unrelated individuals. SNP genotyping was done by Affymetrix Genome-Wide Human SNP Array 5.0. SNP imputation was performed using "IMPUTE2". Using additive effect model, GWAS was conducted in discovery sample, the genotypes...
Background With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of SNPs for disease susceptibility prediction is a challenging task. This study aimed to use single nucleotide polymorphisms (SNPs) to predict psoriasis by searching GWAS data. Methods In total, we had 2,798 samples and 451,724 SNPs. The process for searching for a set of SNPs to predict susceptibility to psoriasis consisted of two steps. The first was to search for the top 1,000 SNPs with high accuracy for prediction of psoriasis from the GWAS dataset. The second was to search for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance. Results The best test harmonic mean of sensitivity and specificity for predicting psoriasis by sIB was 0.674 (95% CI: 0.650-0.698), while only 0.520 (95% CI: 0.472-0.524) was reported for predicting disease by LDA. Our results indicate that the new classifier sIB performs better than LDA in the study. Conclusions The fact that a small set of SNPs can predict disease status with average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.
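The performance measure quoted above, the harmonic mean of sensitivity and specificity, can be computed from a confusion matrix as in this minimal sketch. The function name and the counts are hypothetical and are not taken from the study.

```python
def harmonic_mean_sens_spec(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of cases predicted as cases
    specificity = tn / (tn + fp)   # fraction of controls predicted as controls
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)

# Hypothetical test set: 100 cases, 100 controls.
print(harmonic_mean_sens_spec(tp=80, fn=20, tn=60, fp=40))
```

Unlike plain accuracy, this measure drops sharply when either sensitivity or specificity is poor, which matters when cases and controls are unbalanced.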
Dozmorov, Mikhail G; Cara, Lukas R; Giles, Cory B; Wren, Jonathan D
The growing amount of regulatory data from the ENCODE, Roadmap Epigenomics and other consortia provides a wealth of opportunities to investigate the functional impact of single nucleotide polymorphisms (SNPs). Yet, given the large number of regulatory datasets, researchers face the challenge of how to use them efficiently to interpret the functional impact of SNP sets. We developed the GenomeRunner web server to automate systematic statistical analysis of SNP sets within a regulatory context. Besides defining the functional impact of SNP sets, GenomeRunner implements novel regulatory similarity/differential analyses, and cell type-specific regulatory enrichment analysis. Validated against literature- and disease ontology-based approaches, analysis of 39 disease/trait-associated SNP sets demonstrated that the functional impact of SNP sets corresponds to known disease relationships. We identified a group of autoimmune diseases with SNPs distinctly enriched in the enhancers of T helper cell subpopulations, and demonstrated relevant cell type-specificity of the functional impact of other SNP sets. In summary, we show how systematic analysis of genomic data within a regulatory context can help interpret the functional impact of SNP sets. GenomeRunner web server is freely available at http://www.integrativegenomics.org/. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Hand Melanie L
GoldenGate™ assay is capable of high-throughput co-dominant SNP allele detection, and minimises the problems associated with SNP genotyping in a polyploid by effectively reducing the complexity to a diploid system. This SNP collection may now be refined and used in applications such as cultivar identification, genetic linkage map construction, genome-wide association studies and genomic selection in tall fescue. The bioinformatic pipeline described here represents an effective general method for SNP discovery within outbreeding allopolyploid species.
Chagné, D.; Crowhurst, R.N.; Troggio, M.; Davey, M.W.; Gilmore, B.; Lawley, C.; Vanderzande, S.; Hellens, R.P.; Kumar, S.; Cestaro, A.; Velasco, R.; Main, D.; Rees, J.D.; Iezzoni, A.F.; Mockler, T.; Wilhelm, L.; Weg, van de W.E.; Gardiner, S.E.; Bassil, N.; Peace, C.
As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide
Background: We conducted a genome-wide association study (GWAS) to identify specific genetic variants that underlie susceptibility to disease caused by Staphylococcus aureus in humans. Methods: Cases (n=309) and controls (n=2,925) were genotyped at 508,921 single nucleotide polymorphisms (SNPs). Cases had at least one laboratory and clinician confirmed disease caused by S. aureus whereas controls did not. R-package (for SNP association), EIGENSOFT (to estimate and adjust for population stratification), and gene- (VEGAS) and pathway-based (DAVID, PANTHER, and Ingenuity Pathway Analysis) analyses were performed. Results: No SNP reached genome-wide significance. Four SNPs exceeded the p... Conclusion: We identified potential susceptibility genes for S. aureus diseases in this preliminary study but confirmation by other studies is needed. The observed associations could be relevant given the complexity of S. aureus as a pathogen and its ability to exploit multiple biological pathways to cause infections in humans.
Orr, N; Back, W; Gu, J; Leegwater, P; Govindarajan, P; Conroy, J; Ducro, B; Van Arendonk, J A M; MacHugh, D E; Ennis, S; Hill, E W; Brama, P A J
The recent completion of the horse genome and commercial availability of an equine SNP genotyping array has facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of inheritance, to a 2-MB region of chromosome 14 using just 10 affected animals and 10 controls. We successfully genotyped 34,429 SNPs that were tested for association with dwarfism using chi-square tests. The most significant SNP in our study, BIEC2-239376 (P(2df) = 4.54 × 10⁻⁵, P(rec) = 7.74 × 10⁻⁶), is located close to a gene implicated in human dwarfism. Fine-mapping and resequencing analyses did not aid in further localization of the causative variant, and replication of our findings in independent sample sets will be necessary to confirm these results. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.
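The two p-values reported above, P(2df) and P(rec), correspond to a genotype test with two degrees of freedom and a recessive-model test with one. The abstract does not describe the exact implementation, so the following is a sketch using Pearson's chi-square on genotype counts. The function names and the counts are hypothetical, and with only 10 cases and 10 controls an exact test might be preferred in practice.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (obs - expected) ** 2 / expected
    return chi2

def genotype_test_2df(cases, controls):
    """cases/controls: counts of [AA, AB, BB]; assumes all three genotypes are observed."""
    chi2 = pearson_chi2([cases, controls])
    return chi2, math.exp(-chi2 / 2.0)             # chi-square tail probability, 2 df

def recessive_test_1df(cases, controls):
    """Collapse to [AA + AB, BB] to test a recessive effect of allele B."""
    collapsed = [[c[0] + c[1], c[2]] for c in (cases, controls)]
    chi2 = pearson_chi2(collapsed)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail probability, 1 df

# Hypothetical counts for 10 affected and 10 control animals.
print(genotype_test_2df([0, 2, 8], [6, 4, 0]))
print(recessive_test_1df([0, 2, 8], [6, 4, 0]))
```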
Porter Christopher J
Background SNP microarrays are designed to genotype Single Nucleotide Polymorphisms (SNPs). These microarrays report hybridization of DNA fragments and therefore can be used for the purpose of detecting genomic fragments. Results Here, we demonstrate that a SNP microarray can be effectively used in this way to perform chromatin immunoprecipitation (ChIP) on chip as an alternative to tiling microarrays. We illustrate this novel application by mapping whole genome histone H4 hyperacetylation in human myoblasts and myotubes. We detect clusters of hyperacetylated histone H4, often spanning across up to 300 kilobases of genomic sequence. Using complementary genome-wide analyses of gene expression by DNA microarray we demonstrate that these clusters of hyperacetylated histone H4 tend to be associated with expressed genes. Conclusion The use of a SNP array for a ChIP-on-chip application (ChIP on SNP-chip) will be of great value to laboratories whose interest is the determination of general rules regarding the relationship of specific chromatin modifications to transcriptional status throughout the genome and to examine the asymmetric modification of chromatin at heterozygous loci.
Mourad, Amira M I; Sallam, Ahmed; Belamkar, Vikas; Wegulo, Stephen; Bowden, Robert; Jin, Yue; Mahdy, Ezzat; Bakheit, Bahy; El-Wafaa, Atif A; Poland, Jesse; Baenziger, Peter S
Stem rust (caused by Puccinia graminis f. sp. tritici Erikss. & E. Henn.) is a major disease in wheat (Triticum aestivum L.). However, in recent years it has occurred rarely in Nebraska due to weather and the effective selection and gene pyramiding of resistance genes. To understand the genetic basis of stem rust resistance in Nebraska winter wheat, we applied a genome-wide association study (GWAS) to a set of 270 winter wheat genotypes (A-set). Genotyping was carried out using genotyping-by-sequencing and ∼35,000 high-quality SNPs were identified. The tested genotypes were evaluated for their resistance to the common stem rust race in Nebraska (QFCSC) in two replications. Marker-trait association identified 32 SNP markers, which were significantly (Bonferroni corrected P < 0.05) associated with the resistance on chromosome 2D. The chromosomal location of the significant SNPs (chromosome 2D) matched the location of the Sr6 gene, which was expected in these genotypes based on pedigree information. A highly significant linkage disequilibrium (LD, r²) was found between the significant SNPs and the specific SSR marker for the Sr6 gene (Xcfd43). This suggests the significant SNP markers are tagging the Sr6 gene. Out of the 32 significant SNPs, eight SNPs were in six genes that are annotated as being linked to disease resistance in the IWGSC RefSeq v1.0. The 32 significant SNP markers were located in nine haplotype blocks. All the 32 significant SNPs were validated in a set of 60 different genotypes (V-set) using single marker analysis. SNP markers identified in this study can be used in marker-assisted selection, genomic selection, and to develop KASP (Kompetitive Allele Specific PCR) markers for the Sr6 gene. Novel SNPs for the Sr6 gene, an important stem rust resistance gene, were identified and validated in this study. These SNPs can be used to improve stem rust resistance in wheat.
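The "Bonferroni corrected P < 0.05" criterion above is equivalent to comparing each raw p-value with 0.05 divided by the number of tests. A minimal sketch follows; the function name and the p-values are hypothetical, and 35,000 is used only because the abstract reports roughly that many SNPs.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test threshold and the indices of p-values that pass it."""
    threshold = alpha / len(p_values)
    hits = [i for i, p in enumerate(p_values) if p < threshold]
    return threshold, hits

# With about 35,000 SNPs the per-SNP threshold is roughly 1.4e-6.
print(0.05 / 35000)

# Toy example with four hypothetical p-values.
print(bonferroni_significant([1e-8, 0.01, 2e-6, 0.5]))
```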
Marete, Andrew Gitahi; Sahana, Goutam; Fritz, Sebastian
Using a combination of data from the BovineSNP50 BeadChip SNP array (Illumina, San Diego, CA) and a EuroGenomics (Amsterdam, the Netherlands) custom single nucleotide polymorphism (SNP) chip with SNP pre-selected from whole genome sequence data, we carried out an association study of milking speed...... associated with milking speed. As clinical mastitis and somatic cell score have an unfavorable genetic correlation with milking speed, we tested whether the most significant SNP on these 22 chromosomes associated with milking speed were also associated with clinical mastitis or somatic cell score. Nine...... hundred seventy-one genome-wide significant SNP were associated with milking speed. Of these, 86 were associated with clinical mastitis and 198 with somatic cell score. The most significant association signals for milking speed were observed on chromosomes 7, 8, 10, 14, and 18. The most significant signal...
Background The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus -- FeLV, feline coronavirus -- FECV, feline immunodeficiency virus -- FIV) that are homologues to human scourges (cancer, SARS, and AIDS respectively). However, to realize this bio-medical potential, a high density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
Calus Mario PL
Background Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype information are not in agreement. Methods Straightforward tests to detect Mendelian inconsistencies exist that count the number of opposing homozygous marker (e.g. SNP) genotypes between parent and offspring (PAR-OFF). Here, we develop two tests to identify Mendelian inconsistencies between sibs. The first test counts SNP with opposing homozygous genotypes between sib pairs (SIBCOUNT). The second test compares pedigree and SNP-based relationships (SIBREL). All tests iteratively remove animals based on decreasing numbers of inconsistent parents and offspring or sibs. The PAR-OFF test, followed by either SIB test, was applied to a dataset comprising 2,078 genotyped cows and 211 genotyped sires. Theoretical expectations for distributions of test statistics of all three tests were calculated and compared to empirically derived values. Type I and II error rates were calculated after applying the tests to the edited data, while Mendelian inconsistencies were introduced by permuting pedigree against genotype data for various proportions of animals. Results Both SIB tests identified animal pairs for which pedigree and genomic relationships could be considered as inconsistent by visual inspection of a scatter plot of pairwise pedigree and SNP-based relationships. After removal of 235 animals with the PAR-OFF test, SIBCOUNT (SIBREL) identified 18 (22) additional inconsistent animals. Seventeen animals were identified by both methods. The numbers of incorrectly deleted animals (Type I error) were equally low for both methods, while the numbers of incorrectly non-deleted animals (Type II error) were considerably higher for SIBREL compared to SIBCOUNT. Conclusions
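The counting step shared by the PAR-OFF and SIBCOUNT tests above can be sketched as follows, assuming genotypes are coded as the number of copies of one allele (0, 1, or 2) and missing calls as `None`. The coding, the function name, and the example genotypes are assumptions. The thresholds for declaring an inconsistency and the iterative removal described in the abstract are not reproduced.

```python
def count_opposing_homozygotes(geno_a, geno_b):
    """Count loci where one animal is homozygous 0 and the other homozygous 2.

    geno_a, geno_b: equal-length sequences of allele counts (0, 1, 2) or None for missing.
    """
    count = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue                     # skip loci with a missing call
        if {a, b} == {0, 2}:
            count += 1                   # opposing homozygous genotypes
    return count

# Hypothetical parent and offspring genotypes at six SNPs.
parent = [0, 2, 1, 0, 2, None]
offspring = [2, 2, 0, 0, 0, 2]
print(count_opposing_homozygotes(parent, offspring))
```

A true parent and offspring share one allele at every locus, so any non-zero count beyond what genotyping error would explain points to a pedigree or sample mix-up.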
Orr, J.L.; Back, W.; Gu, J.; Leegwater, P.H.; Govindarajan, P.; Conroy, J.; Ducro, B.J.; Arendonk, van J.A.M.
The recent completion of the horse genome and commercial availability of an equine SNP genotyping array has facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of
Background Mitochondrial single nucleotide polymorphisms (mtSNPs) constitute important data when trying to shed some light on human diseases and cancers. Unfortunately, providing relevant mtSNP genotyping information in mtDNA databases in a neatly organized and transparent visual manner still remains a challenge. Amongst the many methods reported for SNP genotyping, determining the restriction fragment length polymorphisms (RFLPs) is still one of the most convenient and cost-saving methods. In this study, we prepared the visualization of the mtDNA genome in a way which integrates the RFLP genotyping information with mitochondria related cancers and diseases in a user-friendly, intuitive and interactive manner. The inherent problem associated with mtDNA sequences in BLAST of the NCBI database was also solved. Description V-MitoSNP provides complete mtSNP information for four different kinds of inputs: (1) color-coded visual input by selecting genes of interest on the genome graph, (2) keyword search by locus, disease and mtSNP rs# ID, (3) visualized input of nucleotide range by clicking the selected region of the mtDNA sequence, and (4) sequences mtBLAST. The V-MitoSNP output provides 500 bp (base pairs) flanking sequences for each SNP coupled with the RFLP enzyme and the corresponding natural or mismatched primer sets. The output format enables users to see the SNP genotype pattern of the RFLP by virtual electrophoresis of each mtSNP. The rate of successful design of enzymes and primers for RFLPs in all mtSNPs was 99.1%. The RFLP information was validated by actual agarose electrophoresis and showed successful results for all mtSNPs tested. The mtBLAST function in V-MitoSNP provides the gene information within the input sequence rather than providing the complete mitochondrial chromosome as in the NCBI BLAST database. All mtSNPs with rs number entries in NCBI are integrated in the corresponding SNP in V-MitoSNP. Conclusion V-MitoSNP is a web
Excoffier, Laurent; Dupanloup, Isabelle; Huerta-Sánchez, Emilia; Sousa, Vitor C.; Foll, Matthieu
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets. PMID:24204310
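The site frequency spectrum that the framework above takes as input is a simple summary of SNP data. The sketch below computes an unfolded single-population SFS from haplotypes coded 0 (ancestral) and 1 (derived). The coding, the function name, and the toy data are assumptions, and the simulation-based composite-likelihood inference itself is not reproduced.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k is the number of sites with k derived alleles.

    haplotypes: list of equal-length sequences, one per sampled chromosome,
                with 0 for the ancestral and 1 for the derived allele.
    """
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    for site in zip(*haplotypes):        # iterate over sites (columns)
        sfs[sum(site)] += 1              # derived allele count at this site
    return sfs

# Three sampled chromosomes, four sites.
haps = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
print(site_frequency_spectrum(haps))
```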
Zheng, Jie; Gaunt, Tom R; Day, Ian N M
Genome-Wide Association Studies (GWAS) frequently incorporate meta-analysis within their framework. However, conditional analysis of individual-level data, which is an established approach for fine mapping of causal sites, is often precluded where only group-level summary data are available for analysis. Here, we present a numerical and graphical approach, "sequential sentinel SNP regional association plot" (SSS-RAP), which estimates regression coefficients (beta) with their standard errors using the meta-analysis summary results directly. Under an additive model, typical for genes with small effect, the effect for a sentinel SNP can be transformed to the predicted effect for a possibly dependent SNP through a 2×2 2-SNP haplotypes table. The approach assumes Hardy-Weinberg equilibrium for test SNPs. SSS-RAP is available as a Web-tool (http://apps.biocompute.org.uk/sssrap/sssrap.cgi). To develop and illustrate SSS-RAP we analyzed lipid and ECG traits data from the British Women's Heart and Health Study (BWHHS), evaluated a meta-analysis for ECG trait and presented several simulations. We compared results with existing approaches such as model selection methods and conditional analysis. Generally findings were consistent. SSS-RAP represents a tool for testing independence of SNP association signals using meta-analysis data, and is also a convenient approach based on biological principles for fine mapping in group level summary data. © 2012 Blackwell Publishing Ltd/University College London.
Background High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and inter specific breeding applications in less domesticated and less funded plant genera. Results We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased. Conclusions This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%. The development of a much larger array of informative SNPs across
Calus, M.P.L.; Mulder, H.A.; Bastiaansen, J.W.M.
Background Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype
Leekitcharoenphon, Pimlapas; Kaas, Rolf Sommer; Thomsen, Martin Christen Frølund
identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed...... to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic...... skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic tree from WGS data. Results Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can...
Watson-Haigh Nathan S
Background Whole genome association studies using highly dense single nucleotide polymorphisms (SNPs) are a set of methods to identify DNA markers associated with variation in a particular complex trait of interest. One of the main outcomes from these studies is a subset of statistically significant SNPs. Finding the potential biological functions of such SNPs can be an important step towards further use in human and agricultural populations (e.g., for identifying genes related to susceptibility to complex diseases or genes playing key roles in development or performance). The current challenge is that the information holding the clues to SNP functions is distributed across many different databases. Efficient bioinformatics tools are therefore needed to seamlessly integrate up-to-date functional information on SNPs. Many web services have arisen to meet the challenge but most work only within the framework of human medical research. Although we acknowledge the importance of human research, we identify there is a need for SNP annotation tools for other organisms. Description We introduce an R package called FunctSNP, which is the user interface to custom built species-specific databases. The local relational databases contain SNP data together with functional annotations extracted from online resources. FunctSNP provides a unified bioinformatics resource to link SNPs with functional knowledge (e.g., genes, pathways, ontologies). We also introduce dbAutoMaker, a suite of Perl scripts, which can be scheduled to run periodically to automatically create/update the customised SNP databases. We illustrate the use of FunctSNP with a livestock example, but the approach and software tools presented here can be applied also to human and other organisms. Conclusions Finding the potential functional significance of SNPs is important when further using the outcomes from whole genome association studies. FunctSNP is unique in that it is the only R
De La Vega, Francisco M; Dailey, David; Ziegle, Janet; Williams, Julie; Madden, Dawn; Gilbert, Dennis A
Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies--specifically for candidate-gene, candidate-region, and whole-genome association studies--will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry--a new generation of the TaqMan probe-based, 5' nuclease assay--and high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.
Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong
-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All...... of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.......25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered...
GWAS has greatly facilitated the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNPs individually and are limited by low power and reproducibility since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and the principal components analysis based approach (PCA) using simulated datasets under the scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with a linear or identical-by-state kernel function are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.
Zhang, Han; Wheeler, William; Song, Lei; Yu, Kai
As meta-analysis results published by consortia of genome-wide association studies (GWASs) become increasingly available, many association summary statistics-based multi-locus tests have been developed to jointly evaluate multiple single-nucleotide polymorphisms (SNPs) to reveal novel genetic architectures of various complex traits. The validity of these approaches relies on the accurate estimate of z-score correlations at considered SNPs, which in turn requires knowledge on the set of SNPs assessed by each study participating in the meta-analysis. However, this exact SNP coverage information is usually unavailable from the meta-analysis results published by GWAS consortia. In the absence of the coverage information, researchers typically estimate the z-score correlations by making oversimplified coverage assumptions. We show through real studies that such a practice can generate highly inflated type I errors, and we demonstrate the proper way to incorporate correct coverage information into multi-locus analyses. We advocate that consortia should make SNP coverage information available when posting their meta-analysis results, and that investigators who develop analytic tools for joint analyses based on summary data should pay attention to the variation in SNP coverage and adjust for it appropriately. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Background The ability to transport and store DNA at room temperature in low volumes has the advantage of optimising cost, time and storage space. Blood spots on adapted filter papers are popular for this, with FTA (Flinders Technology Associates) Whatman™ technology being one of the most recent. Plant material, plasmids, viral particles, bacteria and animal blood have been stored and transported successfully using this technology. However, porcine DNA extraction from FTA Whatman™ cards is a relatively new approach; it allows nucleic acids to be ready for downstream applications such as PCR, whole genome amplification and sequencing, and its subsequent application to single nucleotide polymorphism microarrays has hitherto been under-explored. Findings DNA was extracted from FTA Whatman™ cards (following adaptations of the manufacturer’s instructions), whole genome amplified and subsequently analysed to validate the integrity of the DNA for downstream SNP analysis. DNA was successfully extracted from 288/288 samples and amplified by WGA. Allele dropout post WGA was observed in less than 2% of samples and there was no clear evidence of amplification bias or contamination. Acceptable call rates on porcine SNP chips were also achieved using DNA extracted and amplified in this way. Conclusions DNA extracted from FTA Whatman cards is of a high enough quality and quantity following whole genomic amplification to perform meaningful SNP chip studies. PMID:22974252
This is because the SNPs on BovineSNP50 and GGP-80K assays were ascertained as being common in European taurine breeds. Lower MAF and SNP informativeness observed in this study limits the application of these assays in breed assignment, and could have other implications for genome-wide studies in South ...
Obesity represents a major global public health problem that increases the risk for cardiovascular or metabolic disease. Pigs represent an exceptional biomedical model related to energy metabolism and obesity in humans. To pinpoint causal genetic factors for a common form of obesity, we conducted local genomic de novo sequencing, 18.2 Mb, of a porcine QTL region affecting fatness traits, and carried out SNP association studies for backfat thickness and intramuscular fat content in pigs. In order to relate the association studies in pigs to human obesity, we performed a targeted genome wide association study for subcutaneous fat thickness in a cohort population of 8,842 Korean individuals. These combined association studies in human and pig revealed a significant SNP located in a gene family with sequence similarity 73, member A (FAM73A) associated with subscapular skin-fold thickness in humans (rs4121165, GC-corrected p-value = 0.0000175) and with backfat thickness in pigs (ASGA0029495, p-value = 0.000031). Our combined association studies also suggest that eight neuronal genes are responsible for subcutaneous fat thickness: NEGR1, SLC44A5, PDE4B, LPHN2, ELTD1, ST6GALNAC3, ST6GALNAC5, and TTLL7. These results provide strong support for a major involvement of the CNS in the genetic predisposition to a common form of obesity.
Laying performance is an important economic trait of goose production. As laying performance is of low heritability, it is of significance to develop a marker-assisted selection (MAS) strategy for this trait. Definition of sequence variation related to the target trait is a prerequisite of quantitating MAS, but little is presently known about the goose genome, which greatly hinders the identification of genetic markers for the laying traits of geese. Recently developed restriction site-associated DNA (RAD) sequencing is a possible approach for discerning large-scale single nucleotide polymorphism (SNP) and reducing the complexity of a genome without having reference genomic information available. In the present study, we developed a pooled RAD sequencing strategy for detecting geese laying-related SNP. Two DNA pools were constructed, each consisting of equal amounts of genomic DNA from 10 individuals with either high estimated breeding value (HEBV) or low estimated breeding value (LEBV). A total of 139,013 SNP were obtained from 42,291,356 sequences, of which 18,771,943 were for LEBV and 23,519,413 were for HEBV cohorts. Fifty-five SNP which had different allelic frequencies in the two DNA pools were further validated by individual-based AS-PCR genotyping in the LEBV and HEBV cohorts. Ten out of 55 SNP exhibited distinct allele distributions in these two cohorts. These 10 SNP were further genotyped in a goose population of 492 geese to verify the association with egg numbers. The result showed that 8 of 10 SNP were associated with egg numbers. Additionally, linear regression analysis revealed that SNP Record-111407, 106975 and 112359 were involved in a multiple-gene network affecting laying performance. We used IPCR to extend the unknown regions flanking the candidate RAD tags. The obtained sequences were subjected to BLAST to retrieve the orthologous genes in either ducks or chickens. Five novel genes were cloned for geese which harbored the
Bjelland, D W; Weigel, K A; Vukasinovic, N; Nkrumah, J D
The effects of increased pedigree inbreeding in dairy cattle populations have been well documented and result in a negative impact on profitability. Recent advances in genotyping technology have allowed researchers to move beyond pedigree analysis and study inbreeding at a molecular level. In this study, 5,853 animals were genotyped for 54,001 single nucleotide polymorphisms (SNP); 2,913 cows had phenotypic records including a single lactation for milk yield (from either lactation 1, 2, 3, or 4), reproductive performance, and linear type conformation. After removing SNP with poor call rates, low minor allele frequencies, and departure from Hardy-Weinberg equilibrium, 33,025 SNP remained for analyses. Three measures of genomic inbreeding were evaluated: percent homozygosity (FPH), inbreeding calculated from runs of homozygosity (FROH), and inbreeding derived from a genomic relationship matrix (FGRM). Average FPH was 60.5±1.1%, average FROH was 3.8±2.1%, and average FGRM was 20.8±2.3%, where animals with larger values for each of the genomic inbreeding indices were considered more inbred. Decreases in total milk yield to 205d postpartum of 53, 20, and 47kg per 1% increase in FPH, FROH, and FGRM, respectively, were observed. Increases in days open per 1% increase in FPH (1.76 d), FROH (1.72 d), and FGRM (1.06 d) were also noted, as well as increases in maternal calving difficulty (0.09, 0.03, and 0.04 on a 5-point scale for FPH, FROH, and FGRM, respectively). Several linear type traits, such as strength (-0.40, -0.11, and -0.19), rear legs rear view (-0.35, -0.16, and -0.14), front teat placement (0.35, 0.25, 0.18), and teat length (-0.24, -0.14, and -0.13) were also affected by increases in FPH, FROH, and FGRM, respectively. Overall, increases in each measure of genomic inbreeding in this study were associated with negative effects on production and reproductive ability in dairy cows. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc
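As a rough illustration of the simplest of the three measures named above, percent homozygosity can be computed as the share of an animal's called SNP genotypes that are homozygous. The sketch below assumes genotypes coded 0, 1, 2 (copies of one allele) with `None` for a missing call; that coding, the function name, and the example animal are illustrative assumptions, not the pipeline used in the study.

```python
from typing import Optional, Sequence


def percent_homozygosity(genotypes: Sequence[Optional[int]]) -> float:
    """Return the percentage of called genotypes that are homozygous.

    Assumed coding (not taken from the study): 0 and 2 are the two
    homozygous classes, 1 is heterozygous, None is a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    homozygous = sum(1 for g in called if g in (0, 2))
    return 100.0 * homozygous / len(called)


# Hypothetical animal with 10 SNPs, one of them uncalled.
animal = [0, 2, 1, 0, None, 2, 2, 1, 0, 1]
print(f"FPH = {percent_homozygosity(animal):.1f}%")
```

Missing calls are dropped from the denominator here; a real analysis would also apply the call-rate, minor allele frequency, and Hardy-Weinberg filters that the abstract describes before computing any index.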
Bianco, Luca; Cestaro, Alessandro; Sargent, Daniel James; Banchi, Elisa; Derdak, Sophia; Di Guardo, Mario; Salvi, Silvio; Jansen, Johannes; Viola, Roberto; Gut, Ivo; Laurens, Francois; Chagné, David; Velasco, Riccardo; van de Weg, Eric; Troggio, Michela
High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs. PMID:25303088
Background Genome-wide association studies (GWAS) do not provide a full account of the heritability of genetic diseases since gene-gene interactions, also known as epistasis, are not considered in single locus GWAS. To address this problem, a considerable number of methods have been developed for identifying disease-associated gene-gene interactions. However, these methods typically fail to identify interacting markers explaining more of the disease heritability over single locus GWAS, since many of the interactions significant for disease are obscured by uninformative marker interactions, e.g., linkage disequilibrium (LD). Results In this study, we present a novel SNP interaction prioritization algorithm, named iLOCi (Interacting Loci). This algorithm accounts for marker dependencies separately in case and control groups. Disease-associated interactions are then prioritized according to a novel ranking score calculated from the difference in marker dependencies for every possible pair between case and control groups. The analysis of a typical GWAS dataset can be completed in less than a day on a standard workstation with parallel processing capability. The proposed framework was validated using simulated data and applied to real GWAS datasets using the Wellcome Trust Case Control Consortium (WTCCC) data. The results from simulated data showed the ability of iLOCi to identify various types of gene-gene interactions, especially for high-order interaction. From the WTCCC data, we found that among the top ranked interacting SNP pairs, several mapped to genes previously known to be associated with disease, and interestingly, other previously unreported genes with biologically related roles. Conclusion iLOCi is a powerful tool for uncovering true disease interacting markers and thus can provide a more complete understanding of the genetic basis underlying complex disease.
The program is available for download at http://www4a.biotec.or.th/GI/tools/iloci.
Wagner Mark C
Background Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation. Results A method and SPR Opt (SNP and PCR-RFLP Optimization) software to perform a comprehensive, whole-genome analysis to forensically discriminate multiple sequences is presented. Tools for the optimization of forensic typing using Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species are described. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes to enable maximum isolate discrimination based on sequence information. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to their co-segregation across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. Those set combinations requiring that membership in the fewest haplotypes be queried (i.e. the fewest assays be performed) are found. These analyses highlight variable regions based on existing sequence data. These markers may be heterogeneous among unsequenced isolates as well, and thus may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. Phylogenetic trees created based on SNPs, PCR-RFLPs, and full genomes are compared for SARS virus, illustrating that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes. Conclusion This is the first software to optimize the selection of forensic markers to maximize information gained from the fewest assays, accepting whole or partial genome sequence data as input. As
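The grouping and set-selection idea described in this abstract can be sketched in a few lines. The example below is not SPR Opt's algorithm, which the abstract does not specify; it is a minimal brute-force illustration under stated assumptions: each variable site is given as a string with one allele per isolate, sites that split the isolates identically are pooled into one co-segregation pattern, and every smallest combination of patterns that gives each isolate a unique profile is returned. All names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def group_by_pattern(sites: Dict[str, str]) -> Dict[Pattern, List[str]]:
    """Pool variable sites that co-segregate across the isolates.

    `sites` maps a site name to a string holding one allele per isolate.
    Alleles are relabelled by order of first appearance, so "AAGG" and
    "TTCC" both become (0, 0, 1, 1) and land in the same group.
    """
    groups: Dict[Pattern, List[str]] = {}
    for name, alleles in sites.items():
        seen: Dict[str, int] = {}
        pattern = tuple(seen.setdefault(a, len(seen)) for a in alleles)
        groups.setdefault(pattern, []).append(name)
    return groups


def smallest_discriminating_sets(patterns: Sequence[Pattern]) -> List[Tuple[Pattern, ...]]:
    """Return every smallest set of patterns that separates all isolates.

    Exhaustive search over combinations: fine for a toy example,
    exponential in the number of patterns in general.
    """
    if not patterns:
        return []
    n_isolates = len(patterns[0])
    for k in range(1, len(patterns) + 1):
        hits = []
        for combo in combinations(patterns, k):
            # One profile per isolate: its allele label in each chosen pattern.
            profiles = {tuple(p[i] for p in combo) for i in range(n_isolates)}
            if len(profiles) == n_isolates:
                hits.append(combo)
        if hits:
            return hits
    return []


# Hypothetical data: four isolates, four variable sites.
sites = {"s1": "AAGG", "s2": "TTCC", "s3": "ACAC", "s4": "GGGT"}
groups = group_by_pattern(sites)
best = smallest_discriminating_sets(list(groups))
for combo in best:
    print([groups[p] for p in combo])
```

In this toy input, s1 and s2 co-segregate, so only one of them needs to be assayed; pairing that pattern with s3 is enough to tell the four isolates apart.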
BACKGROUND: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. RESULTS: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. CONCLUSIONS: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It does low level and flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.
Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M
Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gain in genomic prediction accuracies on de-regressed EBV was slightly smaller (i.e. 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and they were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.
The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was the design of a 50-60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed: Alpine, Boer, Creole, Katjang, Saanen and Savanna), comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools, proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years.
Background Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs, are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from
Samantha A Brooks
Lavender Foal Syndrome (LFS) is a lethal inherited disease of horses with a suspected autosomal recessive mode of inheritance. LFS has been primarily diagnosed in a subgroup of the Arabian breed, the Egyptian Arabian horse. The condition is characterized by multiple neurological abnormalities and a dilute coat color. Candidate genes based on comparative phenotypes in mice and humans include the ras-associated protein RAB27a (RAB27A) and myosin Va (MYO5A). Here we report mapping of the locus responsible for LFS using a small set of 36 horses segregating for LFS. These horses were genotyped using a newly available single nucleotide polymorphism (SNP) chip containing 56,402 discriminatory elements. The whole genome scan identified an associated region containing these two functional candidate genes. Exon sequencing of the MYO5A gene from an affected foal revealed a single base deletion in exon 30 that changes the reading frame and introduces a premature stop codon. A PCR-based Restriction Fragment Length Polymorphism (PCR-RFLP) assay was designed and used to investigate the frequency of the mutant gene. All affected horses tested were homozygous for this mutation. Heterozygous carriers were detected in high frequency in families segregating for this trait, and the frequency of carriers in unrelated Egyptian Arabians was 10.3%. The mapping and discovery of the LFS mutation represents the first successful use of whole-genome SNP scanning in the horse for any trait. The RFLP assay can be used to assist breeders in avoiding carrier-to-carrier matings and thus in preventing the birth of affected foals.
Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P
Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.
Barnett, Ian; Mukherjee, Rajarshi; Lin, Xihong
It is of substantial interest to study the effects of genes, genetic pathways, and networks on the risk of complex diseases. These genetic constructs each contain multiple SNPs, which are often correlated and function jointly, and might be large in number. However, only a sparse subset of SNPs in a genetic construct is generally associated with the disease of interest. In this article, we propose the generalized higher criticism (GHC) to test for the association between an SNP set and a disease outcome. The higher criticism is a test traditionally used in high-dimensional signal detection settings when marginal test statistics are independent and the number of parameters is very large. However, these assumptions do not always hold in genetic association studies, due to linkage disequilibrium among SNPs and the finite number of SNPs in an SNP set in each genetic construct. The proposed GHC overcomes the limitations of the higher criticism by allowing for arbitrary correlation structures among the SNPs in an SNP-set, while performing accurate analytic p-value calculations for any finite number of SNPs in the SNP-set. We obtain the detection boundary of the GHC test. We compared empirically using simulations the power of the GHC method with existing SNP-set tests over a range of genetic regions with varied correlation structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association study. Supplementary materials for this article are available online. PMID:28736464
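For orientation, the classical higher criticism statistic that the GHC generalizes can be sketched as below. This is only the independent-test form computed from ordered p-values, not the authors' GHC, which additionally accounts for correlation among SNPs; the function name and the `alpha0` search fraction are illustrative assumptions.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Classical higher criticism statistic for independent p-values.

    HC = max over the smallest alpha0 * n ordered p-values of
         sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    alpha0 is an illustrative choice, not a value taken from the abstract.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    ordered = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        p = ordered[i - 1]
        if not 0.0 < p < 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Evenly spread p-values (no signal) versus the same set with two very small ones.
null_p = [(i - 0.5) / 100 for i in range(1, 101)]
sparse_p = [1e-6, 1e-6] + null_p[2:]
print(higher_criticism(null_p), higher_criticism(sparse_p))
```

A few very small p-values among many unremarkable ones drive the statistic up sharply, which is the sparse-signal setting the abstract describes.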
Objective Holsteins are known as the world’s highest-milk producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. Methods This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 Beadchip) of 911 Korean Holstein individuals. We inferred genomic estimated breeding values based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. Results We identified 9, 6, and 17 significant genetic regions related to milk production, fat and protein, respectively. These genes are newly reported in the genetic association with milk traits of Holstein. Conclusion This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand the knowledge of the polygenic nature of milk production in Holsteins.
James W Kijas
The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability.
Maurice-Van Eijndhoven, M H T; Bovenhuis, H; Veerkamp, R F; Calus, M P L
The aim of this study was to identify if genomic variations associated with fatty acid (FA) composition are similar between the Holstein-Friesian (HF) and native dual-purpose breeds used in the Dutch dairy industry. Phenotypic and genotypic information were available for the breeds Meuse-Rhine-Yssel (MRY), Dutch Friesian (DF), Groningen White Headed (GWH), and HF. First, the reliability of genomic breeding values of the native Dutch dual-purpose cattle breeds MRY, DF, and GWH was evaluated using single nucleotide polymorphism (SNP) effects estimated in HF, including all SNP or subsets with stronger associations in HF. Second, the genomic variation of the regions associated with FA composition in HF (regions on Bos taurus autosome 5, 14, and 26), were studied in the different breeds. Finally, similarities in genotype and allele frequencies between MRY, DF, GWH, and HF breeds were assessed for specific regions associated with FA composition. On average across the traits, the highest reliabilities of genomic prediction were estimated for GWH (0.158) and DF (0.116) when the 8 to 22 SNP with the strongest association in HF were included. With the same set of SNP, GEBV for MRY were the least reliable (0.022). This indicates that on average only 2 (MRY) to 16% (GWH) of the genomic variation in HF is shared with the native Dutch dual-purpose breeds. The comparison of predicted variances of different regions associated with milk and milk fat composition showed that breeds clearly differed in genomic variation within these regions. Finally, the correlations of allele frequencies between breeds across the 8 to 22 SNP with the strongest association in HF were around 0.8 between the Dutch native dual-purpose breeds, whereas the correlations between the native breeds and HF were clearly lower and around 0.5. There was no consistent relationship between the reliabilities of genomic prediction for a specific breed and the correlation between the allele frequencies of this breed
Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David
Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were responsive to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands. TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions
Liu, Zhizhen; Liu, Jinding; Wang, Jiaqi; Chen, Deqing; Liu, Zidong; Shi, Jie; Li, Zeqin; Li, Wenyan; Zhang, Gengqian; Du, Bing
Unbalanced DNA mixtures remain a difficult problem in forensic practice. DIP-STRs are useful markers for detection of minor DNA, but they are not widespread in the human genome and have long amplicons. In this study, we proposed a novel type of genetic marker, termed DIP-SNP. DIP-SNP refers to the combination of an INDEL and a SNP within less than 300 bp of the human genome. The multiplex PCR and SNaPshot assay were established for 14 DIP-SNP markers in a Chinese Han population from Shanxi, China. This novel compound marker allows detection of the minor DNA contributor with sensitivity from 1:50 to 1:1000 in a DNA mixture of any gender with 1 ng-10 ng DNA template. Most of the DIP-SNP markers had a relatively high probability of informative alleles with an average I value of 0.33. In all, we proposed DIP-SNP as a novel kind of genetic marker for detection of the minor contributor in an unbalanced DNA mixture and established the detection method by combining the multiplex PCR and SNaPshot assay. DIP-SNP polymorphisms are promising markers for forensic or clinical mixture examination because they are shorter, more widespread and more sensitive. Copyright © 2018 Elsevier Inc. All rights reserved.
B. G. Welderufael
Because mastitis is very frequent and unavoidable, adding recovery information into the analysis for genetic evaluation of mastitis is of great interest from an economic and animal welfare point of view. Here we have performed genome-wide association studies (GWAS) to identify associated single nucleotide polymorphisms (SNPs) and investigate the genetic background not only for susceptibility to – but also for recoverability from mastitis. Somatic cell count records from 993 Danish Holstein cows genotyped for a total of 39378 autosomal SNP markers were used for the association analysis. Single SNP regression analysis was performed using the statistical software package DMU. The substitution effect of each SNP was tested with a t-test, and a genome-wide significance level of P-value < 10⁻⁴ was used to declare significant SNP-trait association. A number of significant SNP variants were identified for both traits. Many of the SNP variants associated either with susceptibility to – or recoverability from mastitis were located in or very near to genes that have been reported for their role in the immune system. Genes involved in lymphocyte development (e.g., MAST3 and STAB2) and genes involved in macrophage recruitment and regulation of inflammation (PDGFD and PTX3) were suggested as possible causal genes for susceptibility to – and recoverability from mastitis, respectively. However, this is the first GWAS study for recoverability from mastitis and our results need to be validated. The findings in the current study are, therefore, a starting point for further investigations in identifying causal genetic variants or chromosomal regions for both susceptibility to – and recoverability from mastitis.
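The single-SNP regression described above can be illustrated with a minimal sketch: the phenotype is regressed on the allele count (0, 1 or 2), the slope is the allele substitution effect, and the slope is tested against zero. This is not the DMU implementation; the function name is hypothetical, fixed effects and relationships are ignored, and a normal approximation to the t distribution is assumed to keep the example dependency-free (reasonable for samples of several hundred animals). Only the 10⁻⁴ threshold comes from the abstract.

```python
import math
from statistics import NormalDist

SIGNIFICANCE_LEVEL = 1e-4  # genome-wide threshold stated in the abstract


def snp_substitution_test(genotypes, phenotypes):
    """Regress phenotype on allele count; return (effect, t statistic, p-value).

    The p-value uses a normal approximation to the t distribution,
    which is an assumption of this sketch.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matched vectors with at least 3 records")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    effect = sxy / sxx  # allele substitution effect (regression slope)
    intercept = mean_y - effect * mean_g
    rss = sum((y - intercept - effect * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        return effect, math.inf, 0.0  # perfect fit
    t_stat = effect / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return effect, t_stat, p_value


# Toy data: 30 animals, true substitution effect of 1.0 plus small noise.
genotypes = [i % 3 for i in range(30)]
noise = [((i * 7) % 5 - 2) * 0.1 for i in range(30)]
phenotypes = [g + e for g, e in zip(genotypes, noise)]
effect, t_stat, p_value = snp_substitution_test(genotypes, phenotypes)
print(effect, t_stat, p_value < SIGNIFICANCE_LEVEL)
```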
Erdoğan, Onur; Aydin Son, Yeşim
Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations, in which only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers, but there is a need to determine which SNP subsets and patients' clinical data are informative for the diagnosis. Data mining approaches have the highest potential for extracting knowledge from genomic datasets and for selecting the representative SNPs as well as the most effective and informative clinical features for the clinical diagnosis of diseases. In this study, we have applied one of the widely used data mining classification methodologies, the "decision tree", to associate the SNP biomarkers and significant clinical data with Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting AD is presented.
Abstract Background Thoroughbred horses have been selected for traits contributing to speed and stamina for centuries. It is widely recognized that inherited variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance, and that muscle phenotypes in particular are important. Results A genome-wide SNP-association study for optimum racing distance was performed using the EquineSNP50 Bead Chip genotyping array in a cohort of n = 118 elite Thoroughbred racehorses divergent for race distance aptitude. In a cohort-based association test we evaluated genotypic variation at 40,977 SNPs between horses suited to short distance (≤ 8 f) and middle-long distance (> 8 f) races. The most significant SNP was located on chromosome 18: BIEC2-417495 ~690 kb from the gene encoding myostatin (MSTN) [P_unadj. = 6.96 × 10⁻⁶]. Considering best race distance as a quantitative phenotype, a peak of association on chromosome 18 (chr18:65809482-67545806) comprising eight SNPs encompassing a 1.7 Mb region was observed. Again, similar to the cohort-based analysis, the most significant SNP was BIEC2-417495 (P_unadj. = 1.61 × 10⁻⁹; P_Bonf. = 6.58 × 10⁻⁵). In a candidate gene study we have previously reported a SNP (g.66493737C>T) in MSTN associated with best race distance in Thoroughbreds; however, its functional and genome-wide relevance were uncertain. Additional re-sequencing in the flanking regions of the MSTN gene revealed four novel 3' UTR SNPs and a 227 bp SINE insertion polymorphism in the 5' UTR promoter sequence. Linkage disequilibrium was highest between g.66493737C>T and BIEC2-417495 (r² = 0.86). Conclusions Comparative association tests consistently demonstrated the g.66493737C>T SNP as the superior variant in the prediction of distance aptitude in racehorses (g.66493737C>T, P = 1.02 × 10⁻¹⁰; BIEC2-417495, P_unadj. = 1.61 × 10⁻⁹). Functional investigations will be required to determine whether this
Abstract Background Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed. Results We developed new quality indices to measure the quality of SNP arrays and/or DNA samples and investigated their statistical properties. The indices quantify a departure of estimated individual-level allele frequencies (AFs) from expected frequencies via standardized distances. The proposed quality indices followed lognormal distributions in several large genomic studies that we empirically evaluated. AF reference data and quality index reference data for different SNP array platforms were established based on samples from various reference populations. Furthermore, a confidence interval method based on the underlying empirical distributions of quality indices was developed to identify poor-quality SNP arrays and/or DNA samples. Analyses of authentic biological data and simulated data show that this new method is sensitive and specific for the detection of poor-quality SNP arrays and/or DNA samples. Conclusions This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program that utilizes these methods called SNP Array Quality Control (SAQC). SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays. The program is available online (http://www.stat.sinica.edu.tw/hsinchou/genetics/quality/SAQC.htm).
Parker Gaddis, K L; Megonigal, J H; Clay, J S; Wolfe, C W
Ketosis is one of the most frequently reported metabolic health events in dairy herds. Several genetic analyses of ketosis in dairy cattle have been conducted; however, few have focused specifically on Jersey cattle. The objectives of this research included estimating variance components for susceptibility to ketosis and identification of genomic regions associated with ketosis in Jersey cattle. Voluntary producer-recorded health event data related to ketosis were available from Dairy Records Management Systems (Raleigh, NC). Standardization was implemented to account for the various acronyms used by producers to designate an incidence of ketosis. Events were restricted to the first reported incidence within 60 d after calving in first through fifth parities. After editing, there were a total of 42,233 records from 23,865 cows. A total of 1,750 genotyped animals were used for genomic analyses using 60,671 markers. Because of the binary nature of the trait, a threshold animal model was fitted using THRGIBBS1F90 (version 2.110) using only pedigree information, and genomic information was incorporated using a single-step genomic BLUP approach. Individual single nucleotide polymorphism (SNP) effects and the proportion of variance explained by 10-SNP windows were calculated using postGSf90 (version 1.38). Heritability of susceptibility to ketosis was 0.083 [standard deviation (SD) = 0.021] and 0.078 (SD = 0.018) in pedigree-based and genomic analyses, respectively. The marker with the largest associated effect was located on chromosome 10 at 66.3 Mbp. The 10-SNP window explaining the largest proportion of variance (0.70%) was located on chromosome 6 beginning at 56.1 Mbp. Gene Ontology (GO) and Medical Subject Heading (MeSH) enrichment analyses identified several overrepresented processes and terms related to immune function. Our results indicate that there is a genetic component related to ketosis susceptibility in Jersey cattle and, as such, genetic selection for
Switchgrass (Panicum virgatum L.) is a perennial grass that has been designated as an herbaceous model biofuel crop for the United States of America. To facilitate accelerated breeding programs of switchgrass, we developed both an association panel and linkage populations for genome-wide association study (GWAS) and genomic selection (GS). All of the 840 individuals were then genotyped using genotyping by sequencing (GBS), generating 350 GB of sequence in total. As a highly heterozygous polyploid (tetraploid and octoploid) species lacking a reference genome, switchgrass is highly intractable with earlier methodologies of single nucleotide polymorphism (SNP) discovery. To access the genetic diversity of species like switchgrass, we developed a SNP discovery pipeline based on a network approach called the Universal Network-Enabled Analysis Kit (UNEAK). Complexities that hinder single nucleotide polymorphism discovery, such as repeats, paralogs, and sequencing errors, are easily resolved with UNEAK. Here, 1.2 million putative SNPs were discovered in a diverse collection of primarily upland, northern-adapted switchgrass populations. Further analysis of this data set revealed the fundamentally diploid nature of tetraploid switchgrass. Taking advantage of the high conservation of genome structure between switchgrass and foxtail millet (Setaria italica (L.) P. Beauv.), two parent-specific, synteny-based, ultra high-density linkage maps containing a total of 88,217 SNPs were constructed. Also, our results showed clear patterns of isolation-by-distance and isolation-by-ploidy in natural populations of switchgrass. Phylogenetic analysis supported a general south-to-north migration path of switchgrass. In addition, this analysis suggested that upland tetraploid arose from upland octoploid. All together, this study provides unparalleled insights into the diversity, genomic complexity, population structure, phylogeny, phylogeography, ploidy, and evolutionary dynamics
Payseur, Bret A; Place, Michael; Weber, James L
Patterns of linkage disequilibrium (LD) reveal the action of evolutionary processes and provide crucial information for association mapping of disease genes. Although recent studies have described the landscape of LD among single nucleotide polymorphisms (SNPs) from across the human genome, associations involving other classes of molecular variation remain poorly understood. In addition to recombination and population history, mutation rate and process are expected to shape LD. To test this idea, we measured associations between short-tandem-repeat polymorphisms (STRPs), which can mutate rapidly and recurrently, and SNPs in 721 regions across the human genome. We directly compared STRP-SNP LD with SNP-SNP LD from the same genomic regions in the human HapMap populations. The intensity of STRP-SNP LD, measured by the average of D', was reduced, consistent with the action of recurrent mutation. Nevertheless, a higher fraction of STRP-SNP pairs than SNP-SNP pairs showed significant LD, on both short (up to 50 kb) and long (cM) scales. These results reveal the substantial effects of mutational processes on LD at STRPs and provide important measures of the potential of STRPs for association mapping of disease genes.
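As a reminder of how the D' measure used above is computed, the following sketch evaluates Lewontin's D' for a pair of biallelic loci from phased haplotypes. STRPs are multiallelic, and the abstract does not say how D' was extended to them, so this two-allele version and its function name are illustrative assumptions only.

```python
def d_prime(haplotypes):
    """Lewontin's D' for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs, alleles coded 0/1.
    D = p_AB - p_A * p_B, scaled by the largest |D| possible given the allele frequencies.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n
    p_b = sum(1 for _, b in haplotypes if b == 1) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return abs(d) / d_max


# Complete association versus independence between the two loci.
print(d_prime([(1, 1)] * 5 + [(0, 0)] * 5))        # complete LD
print(d_prime([(1, 1), (1, 0), (0, 1), (0, 0)]))   # no LD
```

Recurrent mutation at one locus creates the missing haplotype classes, which pulls D' below 1 even without recombination; that is the effect the abstract attributes to STRP mutation.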
Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark
Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS with selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed
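The two-stage sampling idea can be sketched roughly as follows. The abstract does not give the cut-off values, the rule for splitting informative SNPs into two sub-groups, or the mixing proportions, so the thresholds, the half-and-half split, the function names and the SNP identifiers below are all hypothetical; only the overall scheme (discard irrelevant SNPs, then always include highly informative ones in each subspace) follows the text.

```python
import random


def partition_snps(p_values, cutoff=0.05, strong_cutoff=1e-4):
    """Stage 1: drop irrelevant SNPs; stage 2: split the rest into two sub-groups.

    p_values maps SNP id -> association p-value. Both thresholds are
    illustrative assumptions, not values from the abstract.
    """
    strong = [snp for snp, p in p_values.items() if p <= strong_cutoff]
    weak = [snp for snp, p in p_values.items() if strong_cutoff < p <= cutoff]
    return strong, weak


def sample_subspace(strong, weak, size, rng):
    """Draw a feature subspace for one tree node from the two sub-groups only.

    At least one highly informative SNP is always included; the roughly
    half-and-half mix is an assumption of this sketch.
    """
    if not strong:
        raise ValueError("no highly informative SNPs available")
    if size < 1:
        raise ValueError("subspace size must be at least 1")
    size = min(size, len(strong) + len(weak))
    n_strong = max(1, min(len(strong), size // 2))
    n_weak = min(len(weak), size - n_strong)
    n_strong = min(len(strong), size - n_weak)  # top up if weak SNPs run short
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


# Hypothetical SNP ids and p-values for illustration.
p_values = {"rs1": 1e-8, "rs2": 1e-5, "rs3": 0.01, "rs4": 0.03, "rs5": 0.5, "rs6": 0.9}
strong, weak = partition_snps(p_values)
print(sample_subspace(strong, weak, size=3, rng=random.Random(0)))
```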
Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong
Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing is becoming more and more prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level, through integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibriums, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press. All rights reserved.
Ren, Jing; Chen, Liang; Jin, Xiaoli; Zhang, Miaomiao; You, Frank M; Wang, Jirui; Frenkel, Vladimir; Yin, Xuegui; Nevo, Eviatar; Sun, Dongfa; Luo, Ming-Cheng; Peng, Junhua
Whole-genome scans with large numbers of genetic markers provide the opportunity to investigate local adaptation in natural populations and identify candidate genes under positive selection. In the present study, adaptive genetic differentiation associated with solar radiation was investigated using 695 polymorphic SNP markers in wild emmer wheat originating from a micro-site at Yehudiyya, Israel. The test involved two solar radiation niches: (1) sun, in-between trees; and (2) shade, under tree canopy, separated by a distance of 2-4 m. Analysis of molecular variance showed a small (0.53%) but significant portion of overall variation between the sun and shade micro-niches, indicating a non-ignorable genetic differentiation between sun and shade habitats. Fifty SNP markers showed a medium (0.05 ≤ F_ST ≤ 0.15) or high genetic differentiation (F_ST > 0.15). A total of 21 outlier loci under positive selection were identified by using four different F_ST-outlier testing algorithms. The markers and genome locations under positive selection are consistent with the known patterns of selection. These results suggested that genetic differentiation between sun and shade habitats is substantial, radiation-associated, and therefore ecologically determined. Hence, the results of this study reflected effects of natural selection through solar radiation on EST-related SNP genetic diversity, resulting presumably in different adaptive complexes at a micro-scale divergence. The present work highlights the evolutionary theory and application significance of solar radiation-driven natural selection in wheat improvement.
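To make the F_ST thresholds above concrete, here is a small sketch that computes a per-SNP F_ST between two equally weighted populations from allele frequencies and applies the medium and high cut-offs quoted in the abstract. The abstract does not state which F_ST estimator was used, so the simple heterozygosity-based form, the function names and the "low" label for values below 0.05 are assumptions.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic SNP in two equally weighted populations.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def classify_differentiation(fst):
    """Apply the thresholds quoted in the abstract (0.05 and 0.15)."""
    if fst > 0.15:
        return "high"
    if fst >= 0.05:
        return "medium"
    return "low"  # label for values below 0.05 is an assumption


# Hypothetical allele frequencies in the sun and shade niches.
for sun, shade in [(0.40, 0.60), (0.35, 0.65), (0.30, 0.70)]:
    fst = fst_two_populations(sun, shade)
    print(sun, shade, round(fst, 3), classify_differentiation(fst))
```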
Cregan Perry B
Abstract Background Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs) or microsatellite markers for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. Results We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied for analyzing sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on the UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated as a part of this package as an optional feature. The SNP-PHAGE package is being made available open source at http://bfgl.anri.barc.usda.gov/ML/snp-phage/. Conclusion SNP-PHAGE provides a bioinformatics
Climer, Sharlee; Yang, Wei; de las Fuentes, Lisa; Dávila-Román, Victor G; Gu, C Charles
Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of so-correlated SNPs are identified; and (3) frequencies of these clusters in disease cases and controls are compared to identify disease-associated multi-SNP patterns. This method identified 42 candidate multi-SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation-contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles was found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease-associated multi-SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets. © 2014 WILEY PERIODICALS, INC.
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which can be daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who wish to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone
van der Spek, D; van Arendonk, J A M; Bovenhuis, H
Performing a genome-wide association study (GWAS) might add to a better understanding of the development of claw disorders and the need for trimming. Therefore, the aim of the current study was to perform a GWAS on claw disorders and trimming status and to validate the results for claw disorders based on an independent data set. Data consisted of 20,474 cows with phenotypes for claw disorders and 50,238 cows with phenotypes for trimming status. Recorded claw disorders used in the current study were double sole (DS), interdigital hyperplasia (IH), sole hemorrhage (SH), sole ulcer (SU), white line separation (WLS), a combination of infectious claw disorders consisting of (inter-)digital dermatitis and heel erosion, and a combination of laminitis-related claw disorders (DS, SH, SU, and WLS). Of the cows with phenotypes for claw disorders, 1,771 cows were genotyped and these cow data were used for the GWAS on claw disorders. A SNP was considered significant when the false discovery rate≤0.05 and suggestive when the false discovery rate≤0.20. An independent data set of 185 genotyped bulls having at least 5 daughters with phenotypes (6,824 daughters in total) for claw disorders was used to validate significant and suggestive SNP detected based on the cow data. To analyze the trait "trimming status" (i.e., the need for claw trimming), a data set with 327 genotyped bulls having at least 5 daughters with phenotypes (18,525 daughters in total) was used. Based on the cow data, in total 10 significant and 45 suggestive SNP were detected for claw disorders. The 10 significant SNP were associated with SU, and mainly located on BTA8. The suggestive SNP were associated with DS, IH, SU, and laminitis-related claw disorders. Three of the suggestive SNP were validated in the data set of 185 bulls, and were located on BTA13, BTA14, and BTA17. For infectious claw disorders, SH, and WLS, no significant or suggestive SNP associations were detected. For trimming status, 1 significant
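The significance rule quoted in this abstract (significant at false discovery rate ≤ 0.05, suggestive at ≤ 0.20) can be illustrated with a short sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, and the function names `bh_qvalues` and `classify` are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: the study's exact FDR procedure is not stated; BH is
    used here only as a common choice.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def classify(pvalues, significant=0.05, suggestive=0.20):
    """Label each SNP using the two FDR thresholds from the abstract."""
    labels = []
    for q in bh_qvalues(pvalues):
        if q <= significant:
            labels.append("significant")
        elif q <= suggestive:
            labels.append("suggestive")
        else:
            labels.append("none")
    return labels


if __name__ == "__main__":
    # Toy p-values, not data from the study.
    print(classify([0.001, 0.03, 0.04, 0.5]))
```

With only four toy p-values the adjustment is mild; in a real scan with tens of thousands of SNPs the same raw p-value would receive a much larger adjusted value.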
Consistent but indirect evidence has implicated genetic factors in smoking behavior. We report meta-analyses of several smoking phenotypes within cohorts of the Tobacco and Genetics Consortium (n = 74,053). We also partnered with the European Network of Genetic and Genomic Epidemiology (ENGAGE) and Oxford-GlaxoSmithKline (Ox-GSK) consortia to follow up the 15 most significant regions (n > 140,000). We identified three loci associated with number of cigarettes smoked per day. The strongest association was a synonymous 15q25 SNP in the nicotinic receptor gene CHRNA3 (rs1051730[A], beta = 1.03, standard error (s.e.) = 0.053, P = 2.8 × 10⁻⁷³). Two 10q25 SNPs (rs1329650[G], beta = 0.367, s.e. = 0.059, P = 5.7 × 10⁻¹⁰; and rs1028936[A], beta = 0.446, s.e. = 0.074, P = 1.3 × 10⁻⁹) and one 9q13 SNP in EGLN2 (rs3733829[G], beta = 0.333, s.e. = 0.058, P = 1.0 × 10⁻⁸) also exceeded genome-wide significance for cigarettes per day. For smoking initiation, eight SNPs exceeded genome-wide significance, with the strongest association at a nonsynonymous SNP in BDNF on chromosome 11 (rs6265[C], odds ratio (OR) = 1.06, 95% confidence interval (CI) 1.04-1.08, P = 1.8 × 10⁻⁸). One SNP located near DBH on chromosome 9 (rs3025343[G], OR = 1.12, 95% CI 1.08-1.18, P = 3.6 × 10⁻⁸) was significantly associated with smoking cessation.
Sambo, Francesco; Di Camillo, Barbara; Toffolo, Gianna; Cobelli, Claudio
The increasing interest in rare genetic variants and epistatic genetic effects on complex phenotypic traits is currently pushing genome-wide association study design towards datasets of increasing size, both in the number of studied subjects and in the number of genotyped single nucleotide polymorphisms (SNPs). This, in turn, is leading to a compelling need for new methods for compression and fast retrieval of SNP data. We present a novel algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies. Our algorithm is based on two main ideas: (i) compress linkage disequilibrium blocks in terms of differences with a reference SNP and (ii) compress reference SNPs exploiting information on their call rate and minor allele frequency. Tested on two SNP datasets and compared with several state-of-the-art software tools, our compression algorithm is shown to be competitive in terms of compression rate and to outperform all tools in terms of time to load compressed data. Our compression and decompression algorithms are implemented in a C++ library, are released under the GNU General Public License and are freely downloadable from http://www.dei.unipd.it/~sambofra/snpack.html. © The Author 2014. Published by Oxford University Press.
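Idea (i) of this abstract, storing the SNPs of a linkage disequilibrium block as differences from a reference SNP, can be sketched in a few lines. This is only a toy illustration and not the authors' file format or C++ library; the genotype coding (0/1/2 minor-allele counts) and the names `encode_block` and `decode_block` are assumptions.

```python
def encode_block(reference, others):
    """Encode each SNP of a block as {sample_index: genotype} for the
    samples where it differs from the reference SNP.

    Genotypes are assumed to be coded 0/1/2 (hypothetical coding).
    """
    diffs = [
        {i: g for i, (g, r) in enumerate(zip(snp, reference)) if g != r}
        for snp in others
    ]
    return reference, diffs


def decode_block(reference, diffs):
    """Rebuild the full genotype vectors from the reference and the diffs."""
    decoded = []
    for diff in diffs:
        snp = list(reference)
        for i, g in diff.items():
            snp[i] = g
        decoded.append(snp)
    return decoded


if __name__ == "__main__":
    ref = [0, 1, 2, 0, 1]
    block = [[0, 1, 2, 0, 1], [0, 1, 1, 0, 1]]
    _, diffs = encode_block(ref, block)
    print(diffs)                      # few entries when SNPs are in strong LD
    print(decode_block(ref, diffs) == block)
```

SNPs in strong linkage disequilibrium with the reference differ in few samples, so their difference maps stay small, which is the intuition behind the reported compression gains.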
Brown, Allan F; Yousef, Gad G; Chebrolu, Kranthi K; Byrd, Robert W; Everhart, Koyt W; Thomas, Aswathy; Reid, Robert W; Parkin, Isobel A P; Sharpe, Andrew G; Oliver, Rebekah; Guzman, Ivette; Jackson, Eric W
A high-resolution genetic linkage map of B. oleracea was developed from a B. napus SNP array. The work will facilitate genetic and evolutionary studies in Brassicaceae. A broccoli population, VI-158 × BNC, consisting of 150 F2:3 families was used to create a saturated Brassica oleracea (diploid: CC) linkage map using a recently developed rapeseed (Brassica napus) (tetraploid: AACC) Illumina Infinium single nucleotide polymorphism (SNP) array. The map consisted of 547 non-redundant SNP markers spanning 948.1 cM across nine chromosomes with an average interval size of 1.7 cM. As the SNPs are anchored to the genomic reference sequence of the rapid cycling B. oleracea TO1000, we were able to estimate that the map provides 96% coverage of the diploid genome. Carotenoid analysis of 2 years of data identified 3 QTLs on two chromosomes that are associated with up to half of the phenotypic variation associated with the accumulation of total or individual compounds. By searching the genome sequences of the two related diploid species (B. oleracea and B. rapa), we further identified putative carotenoid candidate genes in the region of these QTLs. This is the first description of the use of a B. napus SNP array to rapidly construct high-density genetic linkage maps of one of the constituent diploid species. The unambiguous nature of these markers with regard to genomic sequences provides evidence as to the nature of the genes underlying the QTL, and demonstrates the value and impact this resource will have on Brassica research.
Studer, Bruno; Kölliker, Roland
In recent years, single nucleotide polymorphism (SNP) markers have emerged as the marker technology of choice for plant genetics and breeding applications. Besides the efficient technologies available for SNP discovery even in complex genomes, one of the main reasons for this is the availabil...
Jee, Sun Ha; Sull, Jae Woong; Lee, Jong-Eun; Shin, Chol; Park, Jongkeun; Kimm, Heejin; Cho, Eun-Young; Shin, Eun-Soon; Yun, Ji Eun; Park, Ji Wan; Kim, Sang Yeun; Lee, Sun Ju; Jee, Eun Jung; Baik, Inkyung; Kao, Linda; Yoon, Sungjoo Kim; Jang, Yangsoo; Beaty, Terri H.
Adiponectin is associated with obesity and insulin resistance. To date, there has been no genome-wide association study (GWAS) of adiponectin levels in Asians. Here we present a GWAS of a cohort of Korean volunteers. A total of 4,001 subjects were genotyped by using a genome-wide marker panel in a two-stage design (979 subjects initially and 3,022 in a second stage). Another 2,304 subjects were used for follow-up replication studies with selected markers. In the discovery phase, the top SNP associated with mean log adiponectin was rs3865188 in CDH13 on chromosome 16 (p = 1.69 × 10⁻¹⁵ in the initial sample, p = 6.58 × 10⁻³⁹ in the second genome-wide sample, and p = 2.12 × 10⁻³² in the replication sample). The meta-analysis p value for rs3865188 in all 6,305 individuals was 2.82 × 10⁻⁸³. The association of rs3865188 with high-molecular-weight adiponectin (p = 7.36 × 10⁻⁵⁸) was even stronger in the third sample. A reporter assay that evaluated the effects of a CDH13 promoter SNP in complete linkage disequilibrium with rs3865188 revealed that the major allele increased expression 2.2-fold. This study clearly shows that genetic variants in CDH13 influence adiponectin levels in Korean adults. PMID:20887962
Cole M McQueen
Pneumonia caused by Rhodococcus equi is a common cause of disease and death in foals. Although agent and environmental factors contribute to the incidence of this disease, the genetic factors influencing the clinical outcomes of R. equi pneumonia are ill-defined. Here, we performed independent single nucleotide polymorphism (SNP)- and copy number variant (CNV)-based genome-wide association studies to identify genomic loci associated with R. equi pneumonia in foals. Foals at a large Quarter Horse breeding farm were categorized into 3 groups: (1) foals with R. equi pneumonia (clinical group [N = 43]); (2) foals with ultrasonographic evidence of pulmonary lesions that never developed clinical signs of pneumonia (subclinical group [N = 156]); and (3) foals without clinical signs or ultrasonographic evidence of pneumonia (unaffected group [N = 49]). From each group, 24 foals were randomly selected and used for independent SNP- and CNV-based genome-wide association studies (GWAS). The SNP-based GWAS identified a region on chromosome 26 that had moderate evidence of association with R. equi pneumonia when comparing clinical and subclinical foals. A joint analysis including all study foals revealed a 3- to 4-fold increase in odds of disease for a homozygous SNP within the associated region when comparing the clinical group with either of the other 2 groups of foals or their combination. The region contains the transient receptor potential cation channel, subfamily M, member 2 (TRPM2) gene, which is involved in neutrophil function. No associations were identified in the CNV-based GWAS. Collectively, these data identify a region on chromosome 26 associated with R. equi pneumonia in foals, providing evidence that genetic factors may indeed contribute to this important disease of foals.
Christine E McLaren
The existence of multiple inherited disorders of iron metabolism in man, rodents and other vertebrates suggests genetic contributions to iron deficiency. To identify new genomic locations associated with iron deficiency, a genome-wide association study (GWAS) was performed using DNA collected from white men aged ≥25 y and women ≥50 y in the Hemochromatosis and Iron Overload Screening (HEIRS) Study with serum ferritin (SF) ≤12 µg/L (cases) and iron replete controls (SF >100 µg/L in men, SF >50 µg/L in women). Regression analysis was used to examine the association between case-control status (336 cases, 343 controls) and quantitative serum iron measures and 331,060 single nucleotide polymorphism (SNP) genotypes, with replication analyses performed in a sample of 71 cases and 161 controls from a population of white male and female veterans screened at a US Veterans Affairs (VA) medical center. Five SNPs identified in the GWAS met genome-wide statistical significance for association with at least one iron measure: rs2698530 on chr. 2p14; rs3811647 on chr. 3q22, a known SNP in the transferrin (TF) gene region; rs1800562 on chr. 6p22, the C282Y mutation in the HFE gene; rs7787204 on chr. 7p21; and rs987710 on chr. 22q11 (GWAS observed P < 1.51 × 10⁻⁷ for all). An association between total iron binding capacity and SNP rs3811647 in the TF gene (GWAS observed P = 7.0 × 10⁻⁹, corrected P = 0.012) was replicated within the VA samples (observed P = 0.012). Associations with the C282Y mutation in the HFE gene also were replicated. The joint analysis of the HEIRS and VA samples revealed strong associations between rs2698530 on chr. 2p14 and iron status outcomes. These results confirm a previously-described TF polymorphism and implicate one potential new locus as a target for gene identification.
Rao, Y S; Li, J; Zhang, R; Lin, X R; Xu, J G; Xie, L; Xu, Z Q; Wang, L; Gan, J K; Xie, X J; He, J; Zhang, X Q
Copy number variation (CNV) is an important source of genetic variation in organisms and a main factor that affects phenotypic variation. A comprehensive study of chicken CNV can provide valuable information on genetic diversity and facilitate future analyses of associations between CNV and economically important traits in chickens. In the present study, an F2 full-sib chicken population (554 individuals), established from a cross between Xinghua and White Recessive Rock chickens, was used to explore CNV in the chicken genome. Genotyping was performed using a chicken 60K SNP BeadChip. A total of 1,875 CNV were detected with the PennCNV algorithm, and the average number of CNV was 3.42 per individual. The CNV were distributed across 383 independent CNV regions (CNVR) and covered 41 megabases (3.97%) of the chicken genome. Seven CNVR in 108 individuals were validated by quantitative real-time PCR, and 81 of these individuals (75%) also were detected with the PennCNV algorithm. In total, 274 CNVR (71.54%) identified in the current study were previously reported. Of these, 147 (38.38%) were reported in at least 2 studies. Additionally, 109 of the CNVR (28.46%) discovered here are novel. A total of 709 genes within or overlapping with the CNVR was retrieved. Out of the 2,742 quantitative trait loci (QTL) collected in the chicken QTL database, 43 QTL had confidence intervals overlapping with the CNVR, and 32 CNVR encompassed one or more functional genes. The functional genes located in the CNVR are likely to be the QTG that are associated with underlying economic traits. This study considerably expands our insight into the structural variation in the genome of chickens and provides an important resource for genomic variation, especially for genomic structural variation related to economic traits in chickens. © 2016 Poultry Science Association Inc.
Mathew J Barber
Statins effectively lower total and plasma LDL-cholesterol, but the magnitude of decrease varies among individuals. To identify single nucleotide polymorphisms (SNPs) contributing to this variation, we performed a combined analysis of genome-wide association (GWA) results from three trials of statin efficacy. Bayesian and standard frequentist association analyses were performed on untreated and statin-mediated changes in LDL-cholesterol, total cholesterol, HDL-cholesterol, and triglyceride on a total of 3932 subjects using data from three studies: Cholesterol and Pharmacogenetics (40 mg/day simvastatin, 6 weeks), Pravastatin/Inflammation CRP Evaluation (40 mg/day pravastatin, 24 weeks), and Treating to New Targets (10 mg/day atorvastatin, 8 weeks). Genotype imputation was used to maximize genomic coverage and to combine information across studies. Phenotypes were normalized within each study to account for systematic differences among studies, and a fixed-effects combined analysis of the combined sample was performed to detect consistent effects across studies. Two SNP associations were assessed as having posterior probability greater than 50%, indicating that they were more likely than not to be genuinely associated with statin-mediated lipid response. SNP rs8014194, located within the CLMN gene on chromosome 14, was strongly associated with statin-mediated change in total cholesterol with an 84% probability by Bayesian analysis, and a p-value exceeding conventional levels of genome-wide significance by frequentist analysis (P = 1.8 × 10⁻⁸). This SNP was less significantly associated with change in LDL-cholesterol (posterior probability = 0.16, P = 4.0 × 10⁻⁶). Bayesian analysis also assigned a 51% probability that rs4420638, located in APOC1 and near APOE, was associated with change in LDL-cholesterol. Using combined GWA analysis from three clinical trials involving nearly 4,000 individuals treated with simvastatin, pravastatin, or atorvastatin, we
Kim, Young Uk; Kim, Young Jin; Lee, Jong-Young; Park, Kiejung
Genome-wide association studies (GWAS) have become popular as an approach for the identification of large numbers of phenotype-associated variants. However, differences in genetic architecture and environmental factors mean that the effect of variants can vary across populations. Understanding population genetic diversity is valuable for the investigation of possible population specific and independent effects of variants. EvoSNP-DB aims to provide information regarding genetic diversity among East Asian populations, including Chinese, Japanese, and Korean. Non-redundant SNPs (1.6 million) were genotyped in 54 Korean trios (162 samples) and were compared with 4 million SNPs from HapMap phase II populations. EvoSNP-DB provides two user interfaces for data query and visualization, and integrates scores of genetic diversity (Fst and VarLD) at the level of SNPs, genes, and chromosome regions. EvoSNP-DB is a web-based application that allows users to navigate and visualize measurements of population genetic differences in an interactive manner, and is available online at [http://biomi.cdc.go.kr/EvoSNP/].
Genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. However, whether these associations can be consistently replicated, vary with disease aggressiveness (tumor stage and grade) and/or interact with non-genetic potential risk factors or other SNPs is unknown. We therefore genotyped 39 SNPs from regions identified by several prostate cancer GWAS in 10,501 prostate cancer cases and 10,831 controls from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We replicated 36 out of 39 SNPs (P-values ranging from 0.01 to 10⁻²⁸). Two SNPs located near KLK3 associated with PSA levels showed differential association with Gleason grade (rs2735839, P = 0.0001 and rs266849, P = 0.0004; case-only test), where the alleles associated with decreasing PSA levels were inversely associated with low-grade (as defined by Gleason grade < 8) tumors but positively associated with high-grade tumors. No other SNP showed differential associations according to disease stage or grade. We observed no effect modification by SNP for association with age at diagnosis, family history of prostate cancer, diabetes, BMI, height, smoking or alcohol intake. Moreover, we found no evidence of pair-wise SNP-SNP interactions. While these SNPs represent new independent risk factors for prostate cancer, we saw little evidence for effect modification by other SNPs or by the environmental factors examined.
The objective of this research was to identify genomic regions associated with clinical mastitis (MAST) in US Holsteins using producer-reported data. Genome-wide association studies (GWAS) were performed on deregressed PTA using GEMMA v. 0.94. Genotypes included 60,671 SNP for all predictor bulls (n...
Haberland, A M; König von Borstel, U; Simianer, H; König, S
Reliable selection criteria are required for young riding horses to increase genetic gain by increasing accuracy of selection and decreasing generation intervals. In this study, selection strategies incorporating genomic breeding values (GEBVs) were evaluated. Relevant stages of selection in sport horse breeding programs were analyzed by applying selection index theory. Results in terms of accuracies of indices (r(TI)) and relative selection response indicated that information on single nucleotide polymorphism (SNP) genotypes considerably increases the accuracy of breeding values estimated for young horses without own or progeny performance. In a first scenario, the correlation between the breeding value estimated from the SNP genotype and the true breeding value (= accuracy of GEBV) was fixed to a relatively low value of r(mg) = 0.5. For a low heritability trait (h² = 0.15), and an index for a young horse based only on information from both parents, additional genomic information doubles r(TI) from 0.27 to 0.54. When the conventional information source 'own performance' is included in the aforementioned index, additional SNP information increases r(TI) by 40%. Thus, particularly with regard to traits of low heritability, genomic information can provide a tool for well-founded selection decisions early in life. In a further approach, different sources of breeding values (e.g. GEBV and estimated breeding values (EBVs) from different countries) were combined into an overall index when altering accuracies of EBVs and correlations between traits. In summary, we showed that genomic selection strategies have the potential to contribute to a substantial reduction in generation intervals in horse breeding programs.
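The accuracies quoted in this abstract (0.27 for a parent-based index at heritability 0.15, rising to 0.54 once a GEBV with accuracy 0.5 is added) can be reproduced with a small selection-index sketch. The assumptions here are not stated in the abstract and are mine: each unrelated parent contributes one own-performance record, and the GEBV and the parental information are correlated only through the true breeding value. The function names are hypothetical.

```python
from math import sqrt


def parent_average_accuracy(h2):
    """Accuracy of an index built from one phenotypic record on each of
    two unrelated parents: sqrt(h2 / 2). (Assumed information source.)"""
    return sqrt(h2 / 2.0)


def combined_accuracy(r1, r2, r12):
    """Accuracy of the optimal index combining two information sources
    with accuracies r1 and r2 and mutual correlation r12."""
    return sqrt((r1**2 + r2**2 - 2.0 * r1 * r2 * r12) / (1.0 - r12**2))


if __name__ == "__main__":
    h2 = 0.15     # heritability from the abstract
    r_mg = 0.5    # accuracy of the GEBV from the abstract
    r_pa = parent_average_accuracy(h2)
    # Assumption: sources are correlated only via the true breeding value,
    # so their correlation is the product of their accuracies.
    r_ti = combined_accuracy(r_pa, r_mg, r_pa * r_mg)
    print(round(r_pa, 2), round(r_ti, 2))
```

Under these assumptions the two rounded values agree with the figures reported in the abstract, although the authors' actual index may have been set up differently.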
Kulbrock, Maike; Lehner, Stefanie; Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar
Equine recurrent uveitis (ERU) is a common eye disease affecting 3–15% of the horse population. A genome-wide association study (GWAS) using the Illumina equine SNP50 bead chip was performed to identify loci conferring risk to ERU. The sample included a total of 144 German warmblood horses. A GWAS showed a significant single nucleotide polymorphism (SNP) on horse chromosome (ECA) 20 at 49.3 Mb, with IL-17A and IL-17F being the closest genes. This locus explained 23% of the phenotypic variance for ERU. A GWAS taking into account the severity of ERU revealed a SNP on ECA18 near the crystallin gene cluster CRYGA-CRYGF. For both genomic regions on ECA18 and 20, significantly associated haplotypes containing the genome-wide significant SNPs could be demonstrated. In conclusion, our results are indicative of a genetic component regulating the possible critical role of IL-17A and IL-17F in the pathogenesis of ERU. The associated SNP on ECA18 may be indicative of cataract formation in the course of ERU. PMID:23977091
Welderufael, B. G.; Løvendahl, Peter; de Koning, Dirk-Jan; Janss, Lucas L. G.; Fikse, W. F.
Because mastitis is very frequent and unavoidable, adding recovery information into the analysis for genetic evaluation of mastitis is of great interest from economical and animal welfare point of view. Here we have performed genome-wide association studies (GWAS) to identify associated single nucleotide polymorphisms (SNPs) and investigate the genetic background not only for susceptibility to – but also for recoverability from mastitis. Somatic cell count records from 993 Danish Holstein cows genotyped for a total of 39378 autosomal SNP markers were used for the association analysis. Single SNP regression analysis was performed using the statistical software package DMU. Substitution effect of each SNP was tested with a t-test and a genome-wide significance level of P-value mastitis were located in or very near to genes that have been reported for their role in the immune system. Genes involved in lymphocyte developments (e.g., MAST3 and STAB2) and genes involved in macrophage recruitment and regulation of inflammations (PDGFD and PTX3) were suggested as possible causal genes for susceptibility to – and recoverability from mastitis, respectively. However, this is the first GWAS study for recoverability from mastitis and our results need to be validated. The findings in the current study are, therefore, a starting point for further investigations in identifying causal genetic variants or chromosomal regions for both susceptibility to – and recoverability from mastitis. PMID:29755506
van Manen, Daniëlle; Delaneau, Olivier; Kootstra, Neeltje A.; Boeser-Nunnink, Brigitte D.; Limou, Sophie; Bol, Sebastiaan M.; Burger, Judith A.; Zwinderman, Aeilko H.; Moerland, Perry D.; van 't Slot, Ruben; Zagury, Jean-François; van 't Wout, Angélique B.; Schuitemaker, Hanneke
Background: AIDS develops typically after 7–11 years of untreated HIV-1 infection, with extremes of very rapid disease progression (<2 years) and long-term non-progression (>15 years). To reveal additional host genetic factors that may impact on the clinical course of HIV-1 infection, we designed a genome-wide association study (GWAS) in 404 participants of the Amsterdam Cohort Studies on HIV-1 infection and AIDS. Methods: The association of SNP genotypes with the clinical course of HIV-1 infection was tested in Cox regression survival analyses using AIDS-diagnosis and AIDS-related death as endpoints. Results: Multiple SNPs not previously identified were found to be strongly associated with disease progression after HIV-1 infection, albeit not at genome-wide significance. However, three independent SNPs in the top ten associations between SNP genotypes and time between seroconversion and AIDS-diagnosis, and one from the top ten associations between SNP genotypes and time between seroconversion and AIDS-related death, had P-values smaller than 0.05 in the French Genomics of Resistance to Immunodeficiency Virus cohort on disease progression. Conclusions: Our study emphasizes that the use of different phenotypes in GWAS may be useful to unravel the full spectrum of host genetic factors that may be associated with the clinical course of HIV-1 infection. PMID:21811574
Boichard, Didier A; Boussaha, Mekki; Capitan, Aurélien
This article presents the strategy to evaluate candidate mutations underlying QTL or responsible for genetic defects, based upon the design and large-scale use of the Eurogenomics custom SNP chip set up for bovine genomic selection. Some variants under study originated from mapping genetic defect...
Ai, XianTao; Liang, YaJun; Wang, JunDuo; Zheng, JuYun; Gong, ZhaoLong; Guo, JiangPing; Li, XueYuan; Qu, YanYing
Cotton (Gossypium spp.) is the most important natural textile fiber crop, and Gossypium hirsutum L. is responsible for 90% of the annual cotton crop in the world. Information on cotton genetic diversity and population structure is essential for new breeding lines. In this study, we analyzed population structure and genetic diversity of 288 elite Gossypium hirsutum cultivar accessions collected from around the world, and especially from China, using genome-wide single nucleotide polymorphism (SNP) markers. The average polymorphism information content (PIC) was 0.25, indicating a relatively low degree of genetic diversity. Population structure analysis revealed extensive admixture and identified three subgroups. Phylogenetic analysis supported the subgroups identified by STRUCTURE. The results from both population structure and phylogenetic analysis were, for the most part, in agreement with pedigree information. Analysis of molecular variance revealed that a larger amount of variation was due to diversity within the groups. Establishment of genetic diversity and population structure from this study could be useful for genetic and genomic analysis and systematic utilization of the standing genetic variation in upland cotton.
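To put the reported average PIC of 0.25 in context, the sketch below computes polymorphism information content from allele frequencies using the commonly cited Botstein et al. formula, PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ². Whether the study used exactly this estimator is an assumption, and the function name `pic` is hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker.

    allele_freqs: allele frequencies that sum to 1.
    Formula (assumed to match the study's usage):
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(2.0 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - correction


if __name__ == "__main__":
    # A biallelic SNP with equal allele frequencies gives the maximum
    # possible PIC for two alleles.
    print(pic([0.5, 0.5]))
    # A rarer minor allele lowers the value (toy frequencies).
    print(pic([0.8, 0.2]))
```

Because a biallelic SNP cannot exceed a PIC of 0.375 under this formula, an average of 0.25 should be read against that ceiling rather than against the range of multi-allelic markers.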
In Brassica napus breeding, traits related to commercial success are of highest importance for plant breeders. However, such traits can only be assessed in an advanced developmental stage and require high experimental effort due to their quantitative inheritance and the importance of genotype × environment interaction. Molecular markers genetically linked to such traits have the potential to accelerate the breeding process of B. napus by marker-assisted selection. Therefore, the objectives of this study were to identify (i) genome regions associated with the examined agronomic and seed quality traits, (ii) the interrelationship of population structure and the detected associations, and (iii) candidate genes for the revealed associations. The diversity set used in this study consisted of 405 Brassica napus inbred lines which were genotyped using a 6K single nucleotide polymorphism (SNP) array and phenotyped for agronomic and seed quality traits in field trials. In a genome-wide association study, we detected a total of 112 associations between SNPs and the seed quality traits as well as 46 SNP-trait associations for the agronomic traits with a P-value 100 and a sequence identity of > 70 % to A. thaliana or B. rapa could be found for the agronomic SNP-trait associations and 187 hits of potential candidate genes for the seed quality SNP-trait associations.
Theodore S. Kalbfleisch
Background: Moose (Alces alces) colonized the North American continent from Asia less than 15,000 years ago, and spread across the boreal forest regions of Canada and the northern United States (US). Contemporary populations have low genetic diversity, due either to low number of individuals in the original migration (founder effect), and/or subsequent population bottlenecks in North America. Genetic tests based on informative single nucleotide polymorphism (SNP) markers are helpful in forensic and wildlife conservation activities, but have been difficult to develop for moose, due to the lack of a reference genome assembly and whole genome sequence (WGS) data. Methods: WGS data were generated for four individual moose from the US states of Alaska, Idaho, Wyoming, and Vermont with minimum and average genome coverage depths of 14- and 19-fold, respectively. Cattle and sheep reference genomes were used for aligning sequence reads and identifying moose SNPs. Results: Approximately 11% and 9% of moose WGS reads aligned to cattle and sheep genomes, respectively. The reads clustered at genomic segments, where sequence identity between these species was greater than 95%. In these segments, average mapped read depth was approximately 19-fold. Sets of 46,005 and 36,934 high-confidence SNPs were identified from cattle and sheep comparisons, respectively, with 773 and 552 of those having minor allele frequency of 0.5 and conserved flanking sequences in all three species. Among the four moose, heterozygosity and allele sharing of SNP genotypes were consistent with decreasing levels of moose genetic diversity from west to east. A minimum set of 317 SNPs, informative across all four moose, was selected as a resource for future SNP assay design. Conclusions: All SNPs and associated information are available, without restriction, to support development of SNP-based tests for animal identification, parentage determination, and estimating
The purpose of this study was to compare results obtained from various methodologies for genome-wide association studies, when applied to real data, in terms of number and commonality of regions identified and their genetic variance explained, computational speed, and possible pitfalls in interpretations of results. Methodologies include: two iteratively reweighted single-step genomic BLUP procedures (ssGWAS1 and ssGWAS2), a single-marker model (CGWAS), and BayesB. The ssGWAS methods utilize genomic breeding values (GEBVs) based on combined pedigree, genomic and phenotypic information, while CGWAS and BayesB only utilize phenotypes from genotyped animals or pseudo-phenotypes. In this study, ssGWAS was performed by converting GEBVs to SNP marker effects. Unequal variances for markers were incorporated for calculating weights into a new genomic relationship matrix. SNP weights were refined iteratively. The data were body weight at 6 weeks on 274,776 broiler chickens, of which 4553 were genotyped using a 60k SNP chip. Comparison of genomic regions was based on genetic variances explained by local SNP regions (20 SNPs). After 3 iterations, the noise in ssGWAS1 was greatly reduced and results were similar to those of CGWAS, with 4 out of the top 10 regions in common. In contrast, for BayesB, the plot was dominated by a single region explaining 23.1% of the genetic variance. This same region was found by ssGWAS1 with the same rank, but the amount of genetic variation attributed to the region was only 3%. These findings emphasize the need for caution when comparing and interpreting results from various methods, and highlight that detected associations, and strength of association, strongly depend on methodologies and details of implementations. BayesB appears to overly shrink regions to zero, while overestimating the amount of genetic variation attributed to the remaining SNP effects. The real world is most likely a compromise between methods and remains to
Antanaviciute, Laima; Fernández-Fernández, Felicidad; Jansen, Johannes; Banchi, Elisa; Evans, Katherine M; Viola, Roberto; Velasco, Riccardo; Dunwell, Jim M; Troggio, Michela; Sargent, Daniel J
A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the 'Golden Delicious' genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the 'Golden Delicious' pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been
Clive J Hoggart
Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
Akpinar, Bala Ani; Lucas, Stuart; Budak, Hikmet
Single-nucleotide polymorphisms (SNPs) are the most prevalent type of variation in genomes and are increasingly being used as molecular markers in diversity analyses, mapping and cloning of genes, and germplasm characterization. However, only a few studies reported large-scale SNP discovery in Aegilops tauschii, restricting their potential use as markers for the low-polymorphic D genome. Here, we report 68,592 SNPs found on the gene-related sequences of the 5D chromosome of Ae. tauschii genotype MvGB589 using genomic and transcriptomic sequences from seven Ae. tauschii accessions, including AL8/78, the only genotype for which a draft genome sequence is available at present. We also suggest a workflow to compare SNP positions in homologous regions on the 5D chromosome of Triticum aestivum, bread wheat, to mark single nucleotide variations between these closely related species. Overall, the identified SNPs define a density of 4.49 SNPs per kilobase, among the highest reported for the genic regions of Ae. tauschii so far. To our knowledge, this study also presents the first chromosome-specific SNP catalog in Ae. tauschii that should facilitate the association of these SNPs with morphological traits on chromosome 5D to be ultimately targeted for wheat improvement.
David G. Riley
Gestation length, birth weight, and weaning weight of F2 Nelore-Angus calves (n = 737) with designed extensive full-sibling and half-sibling relatedness were evaluated for association with 34,957 SNP markers. In analyses of birth weight, random relatedness was modeled three ways: (1) none, (2) random animal, pedigree-based relationship matrix, or (3) random animal, genomic relationship matrix. Detected birth weight-SNP associations were 1,200, 735, and 31 for those parameterizations, respectively; each additional model refinement removed associations that apparently were a result of the built-in stratification by relatedness. Subsequent analyses of gestation length and weaning weight modeled genomic relatedness; there were 40 and 26 trait-marker associations detected for those traits, respectively. Birth weight associations were on BTA14 except for a single marker on BTA5. Gestation length associations included 37 SNP on BTA21, 2 on BTA27 and one on BTA3. Weaning weight associations were on BTA14 except for a single marker on BTA10. Twenty-one SNP markers on BTA14 were detected in both birth and weaning weight analyses.
Background Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array. Results SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex. Conclusions This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in
Hsu, Yi-Hsiang; Liu, Youfang; Hannan, Marian T.; Maixner, William; Smith, Shad B.; Diatchenko, Luda; Golightly, Yvonne M.; Menz, Hylton B.; Kraus, Virginia B.; Doherty, Michael; Wilson, A.G.; Jordan, Joanne M.
Objective Hallux valgus (HV) affects ~36% of Caucasian adults. Although considered highly heritable, the underlying genetic determinants are unclear. We conducted the first genome-wide association study (GWAS) aimed to identify genetic variants associated with HV. Methods HV was assessed in 3 Caucasian cohorts (n=2,263, n=915, and n=1,231 participants, respectively). In each cohort, a GWAS was conducted using 2.5M imputed single nucleotide polymorphisms (SNPs). Mixed-effect regression with the additive genetic model adjusted for age, sex, weight and within-family correlations was used for both sex-specific and combined analyses. To combine GWAS results across cohorts, fixed-effect inverse-variance meta-analyses were used. Following meta-analyses, top-associated findings were also examined in an African American cohort (n=327). Results The proportion of HV variance explained by genome-wide genotyped SNPs was 50% in men and 48% in women. A higher proportion of genetic determinants of HV was sex-specific. The most significantly associated SNP in men was rs9675316 located on chr17q23-q24 near the AXIN2 gene (p=5.46×10⁻⁷); the most significantly associated SNP in women was rs7996797 located on chr13q14.1-q14.2 near the ESD gene (p=7.21×10⁻⁷). Genome-wide significant SNP-by-sex interaction was found for SNP rs1563374 located on chr11p15.1 near the MRGPRX3 gene (interaction p-value =4.1×10⁻⁹). The association signals diminished when combining men and women. Conclusion Findings suggest that the potential pathophysiological mechanisms of HV are complex and strongly underlined by sex-specific interactions. The identified genetic variants imply contribution of biological pathways observed in osteoarthritis as well as new pathways, influencing skeletal development and inflammation. PMID:26337638
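The fixed-effect inverse-variance meta-analysis mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `fixed_effect_meta` and the toy per-cohort effect sizes are hypothetical, and the p-value assumes a two-sided normal approximation.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-cohort effect estimates using inverse-variance weights.

    betas: per-cohort effect estimates for one SNP (hypothetical inputs)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and the same length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Toy example with three made-up cohorts (values are illustrative only).
beta, se, z, p = fixed_effect_meta([0.10, 0.30, 0.20], [0.10, 0.10, 0.10])
print(beta, se, z, p)
```

With equal standard errors the pooled estimate reduces to the plain mean of the cohort estimates; cohorts with smaller standard errors would otherwise dominate the pooled value.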
Genome-wide association studies (GWASs) have identified low-penetrance common variants (i.e., single nucleotide polymorphisms, SNPs) associated with breast cancer susceptibility. Although GWASs are primarily focused on single-locus effects, gene-gene interactions (i.e., epistasis) are also assumed to contribute to the genetic risks for complex diseases including breast cancer. While it has been hypothesized that moderately ranked (P value based) weak single-locus effects in GWASs could potentially harbor valuable information for evaluating epistasis, we lack systematic efforts to investigate SNPs showing consistent associations with weak statistical significance across independent discovery and replication stages. The objectives of this study were (i) to select SNPs showing single-locus effects with weak statistical significance for breast cancer in a GWAS and/or candidate-gene studies; (ii) to replicate these SNPs in an independent set of breast cancer cases and controls; and (iii) to explore their potential SNP-SNP interactions contributing to breast cancer susceptibility. A total of 17 SNPs related to DNA repair, modification and metabolism pathway genes were selected since these pathways offer a priori knowledge for potential epistatic interactions and an overall role in breast carcinogenesis. The study design included predominantly Caucasian women (2,795 cases and 4,505 controls) from Alberta, Canada. We observed two two-way SNP-SNP interactions (APEX1-rs1130409 and RPAP1-rs2297381; MLH1-rs1799977 and MDM2-rs769412) in logistic regression that conferred elevated risks for breast cancer (P(interaction) < 7.3 × 10⁻³). Logic regression identified an interaction involving four SNPs (MBD2-rs4041245, MLH1-rs1799977, MDM2-rs769412, BRCA2-rs1799943) (P(permutation) = 2.4 × 10⁻³). SNPs involved in SNP-SNP interactions also showed single-locus effects with weak statistical significance, while BRCA2-rs1799943 showed stronger statistical significance (P
Yoshimura, Yoshinaga; Ohtake, Tomoko; Okada, Hajime; Fujimoto, Kenzo [School of Materials Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292 (Japan); Ami, Takehiro [Innovation Plaza Ishikawa, Japan Science and Technology Agency, 2-13 Asahidai, Nomi, Ishikawa 923-1211 (Japan); Tsukaguchi, Tadashi [Faculty of Bioresources and Environmental Sciences, Ishikawa Prefectural University, 1-308 Suematsu, Nonoichi, Ishikawa 921-8836 (Japan)
We describe a simple and inexpensive single-nucleotide polymorphism (SNP) typing method, using DNA photoligation with 5-carboxyvinyl-2'-deoxyuridine and two fluorophores. This SNP-typing method facilitates qualitative determination of genes from indica and japonica rice, and showed a high degree of single nucleotide specificity up to 10 000. This method can be used in the SNP typing of actual genomic DNA samples from food crops.
Zhan, Bujie; Fadista, João; Thomsen, Bo
Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome...... of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation...
Luo, Li; Zhu, Yun; Xiong, Momiao
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
Sawayama, Eitaro; Noguchi, Daiki; Nakayama, Kei; Takagi, Motohiro
We previously reported a body color deformity in juvenile red sea bream, which shows transparency in the juvenile stage because of delayed chromatophore development compared with normal individuals, and this finding suggested a genetic cause based on parentage assessments. To conduct marker-assisted selection to eliminate broodstock inheriting the causative gene, developing DNA markers associated with the phenotype was needed. We first conducted SNP mining based on AFLP analysis using bulked-DNA from normal and transparent individuals. One SNP was identified from a transparent-specific AFLP fragment, which was significantly associated with transparent individuals. Two alleles (A/G) were observed in this locus, and the genotype G/G was dominantly observed in the transparent groups (97.1%) collected from several production lots produced from different broodstock populations. A few normal individuals inherited the G/G genotype (5.0%), but the A/A and A/G genotypes were dominantly observed in the normal groups. The homologous region of the SNP was searched using a medaka genome database, and intron 12 of the Nell2a gene (located on chromosome 6 of the medaka genome) was highly matched. We also mapped the red sea bream Nell2a gene on the previously developed linkage maps, and this gene was mapped on a male linkage group, LG4-M. The newly found SNP was useful in eliminating broodstock possessing the causative gene of the body color transparency observed in juvenile stage of red sea bream.
Jason W Sahl
Burkholderia pseudomallei is the causative agent of melioidosis and a potential bioterrorism agent. In the development of medical countermeasures against B. pseudomallei infection, the US Food and Drug Administration (FDA) Animal Rule recommends using well-characterized strains in animal challenge studies. In this study, whole genome sequence data were generated for 6 B. pseudomallei isolates previously identified as candidates for animal challenge studies; an additional 5 isolates were sequenced that were associated with human inhalational melioidosis. A core genome single nucleotide polymorphism (SNP) phylogeny inferred from a concatenated SNP alignment from the 11 isolates sequenced in this study and a diverse global collection of isolates demonstrated the diversity of the proposed Animal Rule isolates. To understand the genomic composition of each isolate, a large-scale blast score ratio (LS-BSR) analysis was performed on the entire pan-genome; this demonstrated the variable composition of genes across the panel and also helped to identify genes unique to individual isolates. In addition, a set of ~550 genes associated with pathogenesis in B. pseudomallei were screened against the 11 sequenced genomes with LS-BSR. Differential gene distribution for 54 virulence-associated genes was observed between genomes and three of these genes were correlated with differential virulence observed in animal challenge studies using BALB/c mice. Differentially conserved genes and SNPs associated with disease severity were identified and could be the basis for future studies investigating the pathogenesis of B. pseudomallei. Overall, the genetic characterization of the 11 proposed Animal Rule isolates provides context for future studies involving B. pseudomallei pathogenesis, differential virulence, and efficacy to therapeutics.
Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar
Equine guttural pouch tympany (GPT) is a hereditary condition affecting foals in their first months of life. Complex segregation analyses in Arabian and German warmblood horses showed the involvement of a major gene as very likely. Genome-wide linkage and association analyses including a high density marker set of single nucleotide polymorphisms (SNPs) were performed to map the genomic region harbouring the potential major gene for GPT. A total of 85 Arabian and 373 German warmblood horses were genotyped on the Illumina equine SNP50 beadchip. Non-parametric multipoint linkage analyses showed genome-wide significance on horse chromosomes (ECA) 3 for German warmblood at 16–26 Mb and 34–55 Mb and for Arabian on ECA15 at 64–65 Mb. Genome-wide association analyses confirmed the linked regions for both breeds. In Arabian, genome-wide association was detected at 64 Mb within the region with the highest linkage peak on ECA15. For German warmblood, signals for genome-wide association were close to the peak region of linkage at 52 Mb on ECA3. The odds ratio for the SNP with the highest genome-wide association was 0.12 for the Arabian. In conclusion, the refinement of the regions with the Illumina equine SNP50 beadchip is an important step to unravel the responsible mutations for GPT. PMID:22848553
Background Specific genetic contributions for preeclampsia (PE) are currently unknown. This genome-wide association study (GWAS) aims to identify maternal single nucleotide polymorphisms (SNPs) and copy-number variants (CNVs) involved in the etiology of PE. Methods A genome-wide scan was performed on 177 PE cases (diagnosed according to National Heart, Lung and Blood Institute guidelines) and 116 normotensive controls. White female study subjects from Iowa were genotyped on Affymetrix SNP 6.0 microarrays. CNV calls made using a combination of four detection algorithms (Birdseye, Canary, PennCNV, and QuantiSNP) were merged using CNVision and screened with stringent prioritization criteria. Due to limited DNA quantities and the deleterious nature of copy-number deletions, it was decided a priori that only deletions would be selected for assay on the entire case-control dataset using quantitative real-time PCR. Results The top four SNP candidates had an allelic or genotypic p-value between 10⁻⁵ and 10⁻⁶; however, none surpassed the Bonferroni-corrected significance threshold. Three recurrent rare deletions meeting prioritization criteria detected in multiple cases were selected for targeted genotyping. A locus of particular interest was found showing an enrichment of case deletions in 19q13.31 (5/169 cases and 1/114 controls), which encompasses the PSG11 gene contiguous to a highly plastic genomic region. All algorithm calls for these regions were assay confirmed. Conclusions CNVs may confer risk for PE and represent interesting regions that warrant further investigation. Top SNP candidates identified from the GWAS, although not genome-wide significant, may be useful to inform future studies in PE genetics.
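To make the Bonferroni comparison in the abstract above concrete, here is a minimal sketch. The number of tests (1,000,000) and the significance level (0.05) are assumptions chosen for illustration; the abstract does not state how many SNPs were actually tested, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_bonferroni(p_values, alpha, n_tests):
    """Return one boolean per p-value: True if it is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p < threshold for p in p_values]


# Assumed values for illustration only: 1,000,000 tests at alpha = 0.05.
N_TESTS = 1_000_000
ALPHA = 0.05
# P-values in the 1e-5 to 1e-6 range, like those reported for the top SNP candidates.
candidate_p_values = [1e-5, 5e-6, 1e-6]
print(bonferroni_threshold(ALPHA, N_TESTS))
print(passes_bonferroni(candidate_p_values, ALPHA, N_TESTS))
```

Under these assumed numbers, p-values between 10⁻⁵ and 10⁻⁶ fall well short of the corrected threshold, which is consistent with the abstract's statement that no SNP reached genome-wide significance.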
Stephanie N Lewis
Genome-wide association studies (GWAS) have proven useful as a method for identifying genetic variations associated with diseases. In this study, we analyzed GWAS data for 61 diseases and phenotypes to elucidate common associations based on single nucleotide polymorphisms (SNPs). The study was an expansion on a previous study on identifying disease associations via data from a single GWAS on seven diseases. Adjustments to the originally reported study included expansion of the SNP dataset using Linkage Disequilibrium (LD) and refinement of the four levels of analysis to encompass SNP, SNP block, gene, and pathway level comparisons. A pair-wise comparison between diseases and phenotypes was performed at each level and the Jaccard similarity index was used to measure the degree of association between two diseases/phenotypes. Disease relatedness networks (DRNs) were used to visualize our results. We saw predominant relatedness between Multiple Sclerosis, type 1 diabetes, and rheumatoid arthritis for the first three levels of analysis. Expected relatedness was also seen between lipid- and blood-related traits. The predominant associations between Multiple Sclerosis, type 1 diabetes, and rheumatoid arthritis can be validated by clinical studies. The diseases have been proposed to share a systemic inflammation phenotype that can result in progression of additional diseases in patients with one of these three diseases. We also noticed unexpected relationships between metabolic and neurological diseases at the pathway comparison level. The less significant relationships found between diseases require a more detailed literature review to determine validity of the predictions. The results from this study serve as a first step towards a better understanding of seemingly unrelated diseases and phenotypes with similar symptoms or modes of treatment.
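The pair-wise Jaccard comparison described in the abstract above can be sketched as follows. The disease labels and SNP identifiers are made-up placeholders, and the helper names `jaccard` and `pairwise_jaccard` are hypothetical; the same functions would apply unchanged to SNP-block, gene, or pathway sets.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity index: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(associations):
    """Compute the Jaccard index for every unordered pair of diseases.

    associations: dict mapping a disease name to its set of associated features.
    Returns a dict keyed by (disease_1, disease_2) in sorted order.
    """
    return {
        (x, y): jaccard(associations[x], associations[y])
        for x, y in combinations(sorted(associations), 2)
    }


# Placeholder data: the identifiers below are invented for illustration.
toy_associations = {
    "disease_A": {"snp_1", "snp_2", "snp_3"},
    "disease_B": {"snp_2", "snp_3", "snp_4"},
    "disease_C": {"snp_5"},
}
for pair, score in pairwise_jaccard(toy_associations).items():
    print(pair, score)
```

The resulting pair scores could serve as edge weights in a relatedness network, with an edge drawn only when the index exceeds some chosen cutoff.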
Lomonaco, Sara; Furumoto, Emily J; Loquasto, Joseph R; Morra, Patrizia; Grassi, Ausilia; Roberts, Robert F
Identification at the genus, species, and strain levels is desirable when a probiotic microorganism is added to foods. Strains of Bifidobacterium animalis ssp. lactis (BAL) are commonly used worldwide in dairy products supplemented with probiotic strains. However, strain discrimination is difficult because of the high degree of genome identity (99.975%) between different genomes of this subspecies. Typing of monomorphic species can be carried out efficiently by targeting informative single nucleotide polymorphisms (SNP). Findings from a previous study analyzing both reference and commercial strains of BAL identified SNP that could be used to discriminate common strains into 8 groups. This paper describes development of a minisequencing assay based on the primer extension reaction (PER) targeting multiple SNP that can allow strain differentiation of BAL. Based on previous data, 6 informative SNP were selected for further testing, and a multiplex preliminary PCR was optimized to amplify the DNA regions containing the selected SNP. Extension primers (EP) annealing immediately adjacent to the selected SNP were developed and tested in simplex and multiplex PER to evaluate their performance. Twenty-five strains belonging to 9 distinct genomic clusters of B. animalis ssp. lactis were selected and analyzed using the developed minisequencing assay, simultaneously targeting the 6 selected SNP. Fragment analysis was subsequently carried out in duplicate and demonstrated that the assay yielded 8 specific profiles separating the most commonly used commercial strains. This novel multiplex PER approach provides a simple, rapid, flexible SNP-based subtyping method for proper characterization and identification of commercial probiotic strains of BAL from fermented dairy products. To assess the usefulness of this method, DNA was extracted from yogurt manufactured with and without the addition of B. animalis ssp. lactis BB-12. Extracted DNA was then subjected to the minisequencing
With ever-increasing numbers of microbial genomes being sequenced, efficient tools are needed to perform strain-level identification of any newly sequenced genome. Here, we present the SNP identification for strain typing (SNIT) pipeline, a fast and accurate software system that compares a newly sequenced bacterial genome with other genomes of the same species to identify single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). Based on this information, the pipeline analyzes the polymorphic loci present in all input genomes to identify the genome that has the fewest differences with the newly sequenced genome. Similarly, for each of the other genomes, SNIT identifies the input genome with the fewest differences. Results from five bacterial species show that the SNIT pipeline identifies the correct closest neighbor with 75% to 100% accuracy. The SNIT pipeline is available for download at http://www.bhsai.org/snit.html
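The "fewest differences" idea in the abstract above can be sketched in a few lines. This is an illustration of the general principle only, not the SNIT implementation: the data layout (a dict from locus position to allele per genome) and the function names are assumptions, and only loci present in both genomes are compared.

```python
def count_differences(genome_a, genome_b):
    """Count loci, among those present in both genomes, where the alleles differ."""
    shared_loci = genome_a.keys() & genome_b.keys()
    return sum(1 for locus in shared_loci if genome_a[locus] != genome_b[locus])


def closest_genome(query, references):
    """Return the name of the reference genome with the fewest differences from query.

    references: dict mapping genome name to {locus: allele}.
    Ties are broken by alphabetical order of the genome name.
    """
    if not references:
        raise ValueError("at least one reference genome is required")
    return min(
        sorted(references),
        key=lambda name: count_differences(query, references[name]),
    )


# Toy alleles at three polymorphic loci (values are illustrative only).
new_genome = {101: "A", 250: "C", 999: "G"}
reference_genomes = {
    "strain_X": {101: "A", 250: "C", 999: "T"},  # 1 difference
    "strain_Y": {101: "T", 250: "G", 999: "G"},  # 2 differences
}
print(closest_genome(new_genome, reference_genomes))
```

Restricting the comparison to loci present in every input genome, as the abstract describes, avoids counting missing data as a difference.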
Tosser-klopp, G.; Bardou, P.; Bouchez, O.; Cabau, C.; Crooijmans, R.P.M.A.; Dong, Y.; Donnadieu-Tonon, C.; Eggen, A.; Heuven, H.C.M.; Jamli, S.; Jiken, A.J.; Klopp, C.; Lawley, C.T.; McEwen, J.; Martin, P.; Moreno, C.R.; Mulsant, P.; Nabihoudine, I.; Pailhoux, E.; Palhiere, I.; Rupp, R.; Sarry, J.; Sayre, B.L.; Tircazes, A.; Wang, J.; Wang, W.; Zhang, W.G.
The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a
Donninger, Howard; Barnoud, Thibaut; Nelson, Nick; Kassler, Suzanna; Clark, Jennifer; Cummins, Timothy D.; Powell, David W.; Nyante, Sarah; Millikan, Robert C.; Clark, Geoffrey J.
RASSF1A is one of the most frequently inactivated tumor suppressors yet identified in human cancer. It is pro-apoptotic and appears to function as a scaffolding protein that interacts with a variety of other tumor suppressors to modulate their function. It can also complex with the Ras oncoprotein and may serve to integrate pro-growth and pro-death signaling pathways. A SNP has been identified that is present in approximately 29% of European populations [rs2073498, A(133)S]. Several studies have now presented evidence that this SNP is associated with an enhanced risk of developing breast cancer. We have used a proteomics based approach to identify multiple differences in the pattern of protein/protein interactions mediated by the wild type compared to the SNP variant protein. We have also identified a significant difference in biological activity between wild type and SNP variant protein. However, we have found only a very modest association of the SNP with breast cancer predisposition.
Background We have engaged in an international program designated the Bank On A Cure, which has established DNA banks from multiple cooperative and institutional clinical trials, and a platform for examining the association of genetic variations with disease risk and outcomes in multiple myeloma. We describe the development and content of a novel custom SNP panel that contains 3404 SNPs in 983 genes, representing cellular functions and pathways that may influence disease severity at diagnosis, toxicity, progression or other treatment outcomes. A systematic search of national databases was used to identify non-synonymous coding SNPs and SNPs within transcriptional regulatory regions. To explore SNP associations with progression-free survival (PFS) we compared SNP profiles of short term (less than 1 year, n = 70) versus long term progression-free survivors (greater than 3 years, n = 73) in two phase III clinical trials. Results Quality controls were established, demonstrating an accurate and robust screening panel for genetic variations, and some initial racial comparisons of allelic variation were done. A variety of analytical approaches, including machine learning tools for data mining and recursive partitioning analyses, demonstrated predictive value of the SNP panel in survival. While the entire SNP panel showed genotype predictive association with PFS, some SNP subsets were identified within drug response, cellular signaling and cell cycle genes. Conclusion A targeted gene approach was undertaken to develop an SNP panel that can test for associations with clinical outcomes in myeloma. The initial analysis provided some predictive power, demonstrating that genetic variations in the myeloma patient population may influence PFS.
Hong, Joon Ki; Jeong, Yong Dae; Cho, Eun Seok; Choi, Tae Jeong; Kim, Yong Min; Cho, Kyu Ho; Lee, Jae Bong; Lim, Hyun Tae; Lee, Deuk Hwan
The genetic effects of an individual on the phenotypes of its social partners, such as its pen mates, are known as social genetic effects. This study aims to identify the candidate genes for social (pen-mates') average daily gain (ADG) in pigs by using the genome-wide association approach. Social ADG (sADG) was the average ADG of unrelated pen-mates (strangers). We used the phenotype data (16,802 records) after correcting for batch (week), sex, pen, number of strangers (1 to 7 pigs) in the pen, full-sib rate (0% to 80%) within pen, and age at the end of the test. A total of 1,041 pigs from Landrace breeds were genotyped using the Illumina PorcineSNP60 v2 BeadChip panel, which comprised 61,565 single nucleotide polymorphism (SNP) markers. After quality control, 909 individuals and 39,837 markers remained for sADG in genome-wide association study. We detected five new SNPs, all on chromosome 6, which have not been associated with social ADG or other growth traits to date. One SNP was inside the prostaglandin F2α receptor ( PTGFR ) gene, another SNP was located 22 kb upstream of gene interferon-induced protein 44 ( IFI44 ), and the last three SNPs were between 161 kb and 191 kb upstream of the EGF latrophilin and seven transmembrane domain-containing protein 1 ( ELTD1 ) gene. PTGFR, IFI44, and ELTD1 were never associated with social interaction and social genetic effects in any of the previous studies. The identification of several genomic regions, and candidate genes associated with social genetic effects reported here, could contribute to a better understanding of the genetic basis of interaction traits for ADG. In conclusion, we suggest that the PTGFR, IFI44, and ELTD1 may be used as a molecular marker for sADG, although their functional effect was not defined yet. Thus, it will be of interest to execute association studies in those genes.
Methanopyrus spp. are usually isolated from harsh niches, such as those with high osmotic pressure and extreme temperature. However, the molecular mechanisms of their environmental adaptation are poorly understood. Archaeal species are commonly considered primitive organisms, and the evolutionary placement of archaea is a fundamental and intriguing scientific question. We sequenced the genomes of Methanopyrus strains SNP6 and KOL6, isolated from the Atlantic and Iceland, respectively. Comparative genomic analysis revealed genetic diversity and instability implicated in niche adaptation, including a number of transporter- and integrase/transposase-related genes. Pan-genome analysis also defined the gene pool of Methanopyrus spp., in addition to a ~120-kb genomic region of plasticity impacting cognate genomic architecture. We believe that Methanopyrus genomics could facilitate efficient investigation and recognition of archaeal phylogenetic diversity patterns, as well as improve understanding of the biological roles and significance of these versatile microbes.
The analysis of next-generation sequence (NGS) data is often a fragmented step-wise process. For example, multiple pieces of software are typically needed to map NGS reads, extract variant sites, and construct a DNA sequence matrix containing only single nucleotide polymorphisms (i.e., a SNP matrix) for a set of individuals. The management and chaining of these software pieces and their outputs can often be a cumbersome and difficult task. Here, we present CFSAN SNP Pipeline, which combines into a single package the mapping of NGS reads to a reference genome with Bowtie2, processing of those mapping (BAM) files using SAMtools, identification of variant sites using VarScan, and production of a SNP matrix using custom Python scripts. We also introduce a Python package (CFSAN SNP Mutator) that, when given a reference genome, will generate variants of known position against which we validate our pipeline. We created 1,000 simulated Salmonella enterica subsp. enterica serovar Agona genomes at 100× and 20× coverage, each containing 500 SNPs, 20 single-base insertions and 20 single-base deletions. For the 100× dataset, the CFSAN SNP Pipeline recovered 98.9% of the introduced SNPs and had a false positive rate of 1.04 × 10⁻⁶; for the 20× dataset 98.8% of SNPs were recovered and the false positive rate was 8.34 × 10⁻⁷. Based on these results, CFSAN SNP Pipeline is a robust and accurate tool that is among the first to combine into a single executable the myriad steps required to produce a SNP matrix from NGS data. Such a tool is useful to those working in an applied setting (e.g., food safety traceback investigations) as well as to those interested in evolutionary questions.
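The validation described above compares called SNP positions with the positions that were deliberately introduced. A minimal Python sketch of that comparison follows. The function name `evaluate_calls`, the toy positions, and the definition of the false positive rate (wrongly called sites divided by all sites that carry no introduced SNP) are assumptions for illustration; the abstract does not state the exact formula the authors used.

```python
def evaluate_calls(truth, called, genome_length):
    """Compare called SNP positions with the positions that were introduced.

    Returns (recovery, false_positive_rate). The false positive rate is
    taken here as wrongly called sites over all sites without an
    introduced SNP, which is an assumed definition.
    """
    truth, called = set(truth), set(called)
    true_pos = truth & called      # introduced SNPs that were recovered
    false_pos = called - truth     # called sites that were never introduced
    recovery = len(true_pos) / len(truth)
    fpr = len(false_pos) / (genome_length - len(truth))
    return recovery, fpr


# Hypothetical toy data: 4 introduced SNPs in a 10,000 bp genome.
truth = {10, 250, 900, 4200}
called = {10, 250, 4200, 7777}
recovery, fpr = evaluate_calls(truth, called, genome_length=10_000)
print(f"recovered {recovery:.1%}, false positive rate {fpr:.2e}")
```

With many simulated genomes, the same function could be applied per genome and the two rates averaged.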
Wu, Xiaoping; Fang, Ming; Liu, Lin
.Results: The Illumina BovineSNP50 BeadChip was used to identify single nucleotide polymorphisms (SNPs) that are associated with body conformation traits. A least absolute shrinkage and selection operator (LASSO) was applied to detect multiple SNPs simultaneously for 29 body conformation traits with 1,314 Chinese...... Holstein cattle and 52,166 SNPs. Totally, 59 genome-wide significant SNPs associated with 26 conformation traits were detected by genome-wide association analysis; five SNPs were within previously reported QTL regions (Animal Quantitative Trait Loci (QTL) database) and 11 were very close to the reported...... SNPs. Twenty-two SNPs were located within annotated gene regions, while the remainder were 0.6-826 kb away from known genes. Some of the genes had clear biological functions related to conformation traits. By combining information about the previously reported QTL regions and the biological functions...
Chang, Hsueh-Wei; Cheng, Yu-Huei; Chuang, Li-Yeh; Yang, Cheng-Hong
PCR-restriction fragment length polymorphism (RFLP) assay is a cost-effective method for SNP genotyping and mutation detection, but the manual mining for restriction enzyme sites is challenging and cumbersome. Three years after we constructed SNP-RFLPing, a freely accessible database and analysis tool for restriction enzyme mining of SNPs, significant improvements over the 2006 version have been made and incorporated into the latest version, SNP-RFLPing 2. The primary aim of SNP-RFLPing 2 is to provide comprehensive PCR-RFLP information with multiple functionality about SNPs, such as SNP retrieval to multiple species, different polymorphism types (bi-allelic, tri-allelic, tetra-allelic or indels), gene-centric searching, HapMap tagSNPs, gene ontology-based searching, miRNAs, and SNP500Cancer. The RFLP restriction enzymes and the corresponding PCR primers for the natural and mutagenic types of each SNP are simultaneously analyzed. All the RFLP restriction enzyme prices are also provided to aid selection. Furthermore, the previously encountered updating problems for most SNP related databases are resolved by an on-line retrieval system. The user interfaces for functional SNP analyses have been substantially improved and integrated. SNP-RFLPing 2 offers a new and user-friendly interface for RFLP genotyping that can be used in association studies and is freely available at http://bio.kuas.edu.tw/snp-rflping2.
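The core idea of restriction enzyme mining for a SNP can be sketched in a few lines of Python: an enzyme is useful for PCR-RFLP genotyping when its recognition site is present with one allele and absent with the other. This is only an illustration, not the SNP-RFLPing 2 implementation; the function name `distinguishing_enzymes`, the four-enzyme table, and the example flanking sequences are assumptions. The recognition sites listed are palindromic, so only one strand is scanned.

```python
# Small illustrative enzyme table (recognition sites only).
ENZYMES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
    "MspI": "CCGG",
}


def distinguishing_enzymes(left_flank, alleles, right_flank, enzymes=ENZYMES):
    """Return {enzyme: [alleles whose sequence contains the site]}.

    Only enzymes that recognise some, but not all, alleles are kept,
    because only those give allele-specific fragment patterns.
    """
    result = {}
    for name, site in enzymes.items():
        cut = [a for a in alleles if site in left_flank + a + right_flank]
        if 0 < len(cut) < len(alleles):
            result[name] = cut
    return result


# Hypothetical T/C SNP: the T allele completes an EcoRI site (GAATTC).
hits = distinguishing_enzymes("ACGTGAAT", ["T", "C"], "CAGGT")
print(hits)
```

A real tool would also need degenerate recognition sites, non-palindromic enzymes on both strands, and primer design for mutagenic sites, none of which is covered here.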
Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo
Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features would limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with an individual or pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations consisted of pairing four- or five-cutter restriction enzymes than the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.
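An in silico estimate of genome coverage for an enzyme pair can be sketched as a double digest followed by size selection. The Python example below is a simplified illustration and not the IgCoverage algorithm: the function name `double_digest_coverage`, the size window, the rule that a fragment must be flanked by one cut from each enzyme, and the toy genome are all assumptions. The PstI (CTGCA^G) and MspI (C^CGG) sites and cut offsets are the standard ones.

```python
import re

# Recognition site and cut offset within the site.
SITES = {"PstI": ("CTGCAG", 5), "MspI": ("CCGG", 1)}


def double_digest_coverage(genome, enzyme_a, enzyme_b, min_len, max_len):
    """Fraction of the genome in size-selected fragments flanked by one
    cut from each enzyme (a simplified model of two-enzyme GBS)."""
    cuts = []
    for enzyme in (enzyme_a, enzyme_b):
        site, offset = SITES[enzyme]
        # A lookahead pattern finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", genome):
            cuts.append((match.start() + offset, enzyme))
    cuts.sort()
    covered = 0
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        length = pos2 - pos1
        if enz1 != enz2 and min_len <= length <= max_len:
            covered += length
    return covered / len(genome)


# Hypothetical 60 bp toy genome with one PstI site and two MspI sites.
genome = "A" * 10 + "CTGCAG" + "T" * 20 + "CCGG" + "A" * 10 + "CCGG" + "T" * 6
coverage = double_digest_coverage(genome, "PstI", "MspI", min_len=10, max_len=30)
print(f"coverage: {coverage:.3f}")
```

In this toy genome only the 22 bp PstI-MspI fragment counts; the 14 bp MspI-MspI fragment is excluded because both ends come from the same enzyme.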
Egg number (EN), egg laying rate (LR) and age at first egg (AFE) are important production traits related to egg production in the poultry industry. To better understand the genetic architecture of dynamic EN during the whole laying cycle and to provide the precise positions of associated variants for EN, LR and AFE, laying records from 21 to 72 weeks of age were collected individually for 1,534 F2 hens produced by reciprocal crosses between White Leghorn and Dongxiang Blue-shelled chicken, and their genotypes were assayed by chicken 600 K Affymetrix high density genotyping arrays. Subsequently, pedigree and SNP-based genetic parameters were estimated and a genome-wide association study (GWAS) was conducted on EN, LR and AFE. The heritability estimates were similar between pedigree and SNP-based estimates, varying from 0.17 to 0.36. In the GWA analysis, we identified nine genome-wide significant loci associated with EN of the laying periods from 21 to 26 weeks, 27 to 36 weeks and 37 to 72 weeks. Analysis of GTF2A1 and CLSPN suggested that they influenced the function of ovary and uterus, and may be considered as relevant candidates. The identified SNP rs314448799 for accumulative EN from 21 to 40 weeks on chromosome 5 created phenotypic differences of 6.86 eggs between two homozygous genotypes, which could be potentially applied to molecular breeding for EN selection. Moreover, our findings showed that LR was a moderately polygenic trait. The suggestive significant region on chromosome 16 for AFE suggested a relationship between sexual maturity and immunity in the current population. The present study comprehensively evaluates the role of genetic variants in the development of egg laying. The findings will be helpful for the investigation of causative gene function and for future marker-assisted selection and genomic selection in chickens.
Egg weight (EW) is an economically important trait and displays a consecutive increase with the hen's age. Because extremely large eggs cause a range of problems in the poultry industry, we performed a genome-wide association study (GWAS) in order to identify genomic variations that are associated with EW. We utilized the Affymetrix 600 K high density SNP array in a population of 1,078 hens at seven time points from day at first egg to 80 weeks of age (EW80). Results reveal that a 90 Kb genomic region (169.42 Mb ~ 169.51 Mb) in GGA1 is significantly associated with EW36 and is also potentially associated with egg weight at 28, 56, and 66 weeks of age. The leading SNP could account for 3.66% of the phenotypic variation, while two promising genes (DLEU7 and MIR15A) can be mapped to this narrow significant region and may affect EW in a pleiotropic manner. In addition, one gene (CECR2) on GGA1 and two genes (MEIS1 and SPRED2) on GGA3, which are involved in the processes of embryogenesis and organogenesis, were also considered to be candidates related to first egg weight (FEW) and EW56, respectively. The findings of our study could provide a useful theoretical basis for generating eggs of ideal size through marker-assisted breeding selection.
Buitenhuis, Bart; Janss, Luc L G; Poulsen, Nina Aagaard
provide new possibilities to change the milk fat composition by selective breeding. In this study a genome wide association scan (GWAS) in the DH and DJ was performed for a detailed milk fatty acid (FA) profile using the HD bovine SNP array and subsequently a biological pathway analysis based on the SNP...
Manivannan, Abinaya; Kim, Jin-Hee; Yang, Eun-Young; Ahn, Yul-Kyun; Lee, Eun-Su; Choi, Sena; Kim, Do-Sun
Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in worldwide cuisines. Therefore, the domestication of pepper has been carried out since antiquity. Owing to meet the growing demand for pepper with high quality, organoleptic property, nutraceutical contents, and disease tolerance, genomics assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has reformed the plant breeding technology especially in the area of molecular marker assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind the vital physiological processes. In addition, the NGS methods facilitate the genome-wide discovery of DNA based markers linked to key genes involved in important biological phenomenon. Among the molecular markers, single nucleotide polymorphism (SNP) indulges various benefits in comparison with other existing DNA based markers. The present review concentrates on the impact of NGS approaches in the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided in the current endeavor can be utilized for the betterment of pepper breeding in future.
Brym, P; Bojarojć-Nosowicz, B; Oleński, K; Hering, D M; Ruść, A; Kaczmarczyk, E; Kamiński, S
The mechanisms of leukemogenesis induced by bovine leukemia virus (BLV) and the processes underlying the phenomenon of differential host response to BLV infection still remain poorly understood. The aim of the study was to screen the entire cattle genome to identify markers and candidate genes that might be involved in host response to bovine leukemia virus infection. A genome-wide association study was performed using Holstein cows naturally infected by BLV. The data set included 43 cows (BLV positive) and 30 cows (BLV negative) genotyped for 54,609 SNP markers (Illumina Bovine SNP50 BeadChip). The BLV status of cows was determined by serum ELISA, nested-PCR and hematological counts. Linear regression analysis with a false discovery rate and a kinship matrix (computed on the autosomal SNPs) was used to find out which SNP markers significantly differentiate BLV-positive and BLV-negative cows. Nine markers reached genome-wide significance. The most significant SNPs were located on chromosomes 23 (rs41583098), 3 (rs109405425, rs110785500) and 8 (rs43564499) in close vicinity of the patatin-like phospholipase domain containing 1 (PNPLA1); adaptor-related protein complex 4, beta 1 subunit (AP4B1); tripartite motif-containing 45 (TRIM45) and cell division cycle associated 2 (CDCA2) genes, respectively. Furthermore, a list of 41 candidate genes was composed based on their proximity to significant markers (within a distance of ca. 1 Mb) and functional involvement in processes potentially underlying BLV-induced pathogenesis. In conclusion, it was demonstrated that host response to BLV infection involves nine sub-regions of the cattle genome (represented by 9 SNP markers), containing many genes which, based on the literature, could be involved in enzootic bovine leukemia progression. A new group of promising candidate genes associated with the host response to BLV infection was identified and could therefore be a target for future studies. The functions of candidate genes
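The abstract above mentions false discovery rate control without naming the procedure. As an illustration only, the Python sketch below implements the Benjamini-Hochberg step-up procedure, a common choice for this purpose; whether the authors used this exact procedure is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of tests declared significant by the
    Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Every test ranked at or below k_max is rejected, even if its own
    # p-value missed its rank-specific threshold.
    return sorted(order[:k_max])


# Hypothetical marker p-values.
significant = benjamini_hochberg([0.001, 0.03, 0.035, 0.9])
print(significant)
```

In this example the second p-value (0.03) exceeds its own threshold of 0.025, but it is still rejected because the third (0.035) passes its threshold of 0.0375; this is the step-up behaviour.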
Charlier, Carole; Coppieters, Wouter; Rollin, Frédéric
The widespread use of elite sires by means of artificial insemination in livestock breeding leads to the frequent emergence of recessive genetic defects, which cause significant economic and animal welfare concerns. Here we show that the availability of genome-wide, high-density SNP panels, combi...
Brynildsrud, Ola; Bohlin, Jon; Scheffer, Lonneke; Eldholm, Vegard
Genome-wide association studies (GWAS) have become indispensable in human medicine and genomics, but very few have been carried out on bacteria. Here we introduce Scoary, an ultra-fast, easy-to-use, and widely applicable software tool that scores the components of the pan-genome for associations to observed phenotypic traits while accounting for population stratification, with minimal assumptions about evolutionary processes. We call our approach pan-GWAS to distinguish it from traditional, single nucleotide polymorphism (SNP)-based GWAS. Scoary is implemented in Python and is available under an open source GPLv3 license at https://github.com/AdmiralenOla/Scoary .
Genome-wide association studies (GWAS) using single nucleotide polymorphisms (SNPs) have identified more than 50 loci associated with estimated glomerular filtration rate (eGFR), a measure of kidney function. However, significant SNPs account for a small proportion of eGFR variability. Other forms of genetic variation have not been comprehensively evaluated for association with eGFR. In this study, we assess whether changes in germline DNA copy number are associated with GFR estimated from serum creatinine, eGFRcrea. We used hidden Markov models (HMMs) to identify copy number polymorphic regions (CNPs) from high-throughput SNP arrays for 2,514 African (AA) and 8,645 European ancestry (EA) participants in the Atherosclerosis Risk in Communities (ARIC) study. Separately for the EA and AA cohorts, we used Bayesian Gaussian mixture models to estimate copy number at regions identified by the HMM or previously reported in the HapMap Project. We identified 312 and 464 autosomal CNPs among individuals of EA and AA, respectively. Multivariate models adjusted for SNP-derived covariates of population structure identified one CNP in the EA cohort near genome-wide statistical significance (Bonferroni-adjusted p = 0.067) located on chromosome 5 (876-880kb). Overall, our findings suggest a limited role of CNPs in explaining eGFR variability.
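A Bonferroni-adjusted p-value, as reported above, is the raw p-value multiplied by the number of tests and capped at 1. The short Python sketch below illustrates the arithmetic; the function name and the raw p-values are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(pvalues):
    """Multiply each raw p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical raw p-values for three tested regions.
raw = [0.0002, 0.01, 0.5]
adjusted = bonferroni_adjust(raw)
print(adjusted)
```

Equivalently, a raw p-value can be compared with alpha divided by the number of tests; the two formulations give the same decisions.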
Johnson Andrew D
Background: The number of genome-wide association studies (GWAS) is growing rapidly, leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may potentially strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results. Methods: We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, making this database freely available here. In doing so, we met and describe here a number of challenges to creating an open access database of GWAS results. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS. Results: Using a genomic bin-based density analysis to search for highly associated regions of the genome, positive control loci (e.g., MHC loci) were detected with high sensitivity. Likewise, an analysis of highly repeated SNPs across GWAS identified replicated loci (e.g., APOE, LPL). At the same time we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significant thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results. Genes relating to cell adhesion functions were highly over-represented among significant associations (p -14, a finding
Cronin, Matthew A; Rincon, Gonzalo; Meredith, Robert W; MacNeil, Michael D; Islas-Trejo, Alma; Cánovas, Angela; Medrano, Juan F
We assessed the relationships of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) with high throughput genomic sequencing data with an average coverage of 25× for each species. A total of 1.4 billion 100-bp paired-end reads were assembled using the polar bear and annotated giant panda (Ailuropoda melanoleuca) genome sequences as references. We identified 13.8 million single nucleotide polymorphisms (SNP) in the 3 species aligned to the polar bear genome. These data indicate that polar bears and brown bears share more SNP with each other than either does with black bears. Concatenation and coalescence-based analysis of consensus sequences of approximately 1 million base pairs of ultraconserved elements in the nuclear genome resulted in a phylogeny with black bears as the sister group to brown and polar bears, and all brown bears are in a separate clade from polar bears. Genotypes for 162 SNP loci of 336 bears from Alaska and Montana showed that the species are genetically differentiated and there is geographic population structure of brown and black bears but not polar bears.
Norris Belinda J
Background: Fleece rot (FR) and body-strike of Merino sheep by the sheep blowfly Lucilia cuprina are major problems for the Australian wool industry, causing significant losses as a result of increased management costs coupled with reduced wool productivity and quality. In addition to direct effects on fleece quality, fleece rot is a major predisposing factor to blowfly strike on the body of sheep. In order to investigate the genetic drivers of resistance to fleece rot, we constructed a combined ovine-bovine cDNA microarray of almost 12,000 probes including 6,125 skin expressed sequence tags and 5,760 anonymous clones obtained from skin subtracted libraries derived from fleece rot resistant and susceptible animals. This microarray platform was used to profile the gene expression changes between skin samples of six resistant and six susceptible animals taken immediately before, during and after FR induction. Mixed-model equations were employed to normalize the data and 155 genes were found to be differentially expressed (DE). Ten DE genes were selected for validation using real-time PCR on independent skin samples. The genomic regions of a further 5 DE genes were surveyed to identify single nucleotide polymorphisms (SNP) that were genotyped across three populations for their associations with fleece rot resistance. Results: The majority of the DE genes originated from the fleece rot subtracted libraries, and over-represented gene ontology terms included defense response to bacterium and epidermis development, indicating a role of these processes in modulating the sheep's response to fleece rot. We focused on genes that contribute to the physical barrier function of skin, including keratins, collagens, fibulin and lipid proteins, to identify SNPs that were associated with fleece rot scores. Conclusions: We identified FBLN1 (fibulin) and FABP4 (fatty acid binding protein 4) as key factors in sheep's resistance to fleece rot. Validation of these
Ortega-Alonso, Alfredo; Ekelund, Jesper; Sarin, Antti-Pekka; Miettunen, Jouko; Veijola, Juha; Järvelin, Marjo-Riitta; Hennah, William
The current study examined quantitative measures of psychosis proneness in a nonpsychotic population, in order to elucidate their underlying genetic architecture and to observe if there is any commonality to that already detected in the studies of individuals with overt psychotic conditions, such as schizophrenia and bipolar disorder. Heritability, univariate and multivariate genome-wide association (GWAs) tests, including a series of comprehensive gene-based association analyses, were developed in 4269 nonpsychotic persons participating in the Northern Finland Birth Cohort 1966 study with information on the following psychometric measures: Hypomanic Personality, Perceptual Aberration, Physical and Social Anhedonia (also known as Chapman's Schizotypia scales), and Schizoidia scale. Genome-wide genetic data was available for ~9.84 million SNPs. Heritability estimates ranged from 16% to 27%. Phenotypic, genetic and environmental correlations ranged from 0.04-0.43, 0.25-0.73, and 0.12-0.43, respectively. Univariate GWAs tests revealed an intronic SNP (rs12449097) at the TMC7 gene (16p12.3) that significantly associated (P = 3.485 × 10⁻⁸) with the hypomanic scale. Bivariate GWAs tests including the hypomanic and physical anhedonia scales suggested a further borderline significant SNP (rs188320715; P-value = 5.261 × 10⁻⁸, ~572 kb downstream the ARID1B gene at 6q25.3). Gene-based tests highlighted 20 additional genes of which 5 had previously been associated to schizophrenia and/or bipolar disorder: CSMD1, CCDC141, SLC1A2, CACNA1C, and SNAP25. Altogether the findings explained from 3.7% to 14.1% of the corresponding trait heritability. In conclusion, this study provides preliminary genomic evidence suggesting that qualitatively similar biological factors may underlie different psychosis proneness measures, some of which could further predispose to schizophrenia and bipolar disorder. © The Author 2017. Published by Oxford University Press on behalf of the Maryland
clinical experience in implementing whole-genome high-resolution SNP arrays to investigate 33 patients with syndromic and .... Online Mendelian Inheritance in Man database (OMIM, ..... of damaged mitochondria through either autophagy or mito- ..... malformations: associations with maternal and infant character- istics in a ...
van den Berg, Stéphanie M; de Moor, Marleen H M; Verweij, K. J. H.
small sample sizes of those studies. Here, we report on a large meta-analysis of GWA studies for extraversion in 63,030 subjects in 29 cohorts. Extraversion item data from multiple personality inventories were harmonized across inventories and cohorts. No genome-wide significant associations were found...... at the single nucleotide polymorphism (SNP) level but there was one significant hit at the gene level for a long non-coding RNA site (LOC101928162). Genome-wide complex trait analysis in two large cohorts showed that the additive variance explained by common SNPs was not significantly different from zero...
Young-onset hypertension has a stronger genetic component than its late-onset counterpart; thus, the identification of genes related to its susceptibility is a critical issue for the prevention and management of this disease. We carried out a two-stage association scan to map young-onset hypertension susceptibility genes. The first-stage analysis, a genome-wide association study, analyzed 175 matched case-control pairs; the second-stage analysis, a confirmatory association study, verified the results of the first stage based on a total of 1,008 patients and 1,008 controls. Single-locus association tests, multilocus association tests and pair-wise gene-gene interaction tests were performed to identify young-onset hypertension susceptibility genes. After considering stringent adjustments for multiple testing, gene annotation and single-nucleotide polymorphism (SNP) quality, four SNPs from two SNP triplets with strong association signals (−log10(p) > 7) and 13 SNPs from 8 interactive SNP pairs with strong interactive signals (−log10(p) > 8) were carefully re-examined. The confirmatory study verified the association for a SNP quartet 219 kb and 495 kb downstream of LOC344371 (a hypothetical gene) and RASGRP3 on chromosome 2p22.3, respectively. The latter has been implicated in the abnormal vascular responsiveness to endothelin-1 and angiotensin II in diabetic-hypertensive rats. Intrinsic synergy involving IMPG1 on chromosome 6q14.2-q15 was also verified. IMPG1 encodes interphotoreceptor matrix proteoglycan 1, which has cation binding capacity. The genes are novel hypertension targets identified in this first genome-wide hypertension association study of the Han Chinese population.
Luciano, Michelle; Huffman, Jennifer E.; Arias-Vásquez, Alejandro; Vinkhuyzen, Anna A. E.; Middeldorp, Christel M.; Giegling, Ina; Payton, Antony; Davies, Gail; Zgaga, Lina; Janzing, Joost; Ke, Xiayi; Galesloot, Tessel; Hartmann, Annette M.; Ollier, William; Tenesa, Albert; Hayward, Caroline; Verhagen, Maaike; Montgomery, Grant W.; Hottenga, Jouke-Jan; Konte, Bettina; Starr, John M.; Vitart, Veronique; Vos, Pieter E.; Madden, Pamela A. F.; Willemsen, Gonneke; Konnerth, Heike; Horan, Michael A.; Porteous, David J.; Campbell, Harry; Vermeulen, Sita H.; Heath, Andrew C.; Wright, Alan; Polasek, Ozren; Kovacevic, Sanja B.; Hastie, Nicholas D.; Franke, Barbara; Boomsma, Dorret I.; Martin, Nicholas G.; Rujescu, Dan; Wilson, James F.; Buitelaar, Jan; Pendleton, Neil; Rudan, Igor; Deary, Ian J.
Measures of personality and psychological distress are correlated and exhibit genetic covariance. We conducted univariate genome-wide SNP (~2.5 million) and gene-based association analyses of these traits and examined the overlap in results across traits, including a prediction analysis of
Palomba, Grazia; Loi, Angela; Porcu, Eleonora; Cossu, Antonio; Zara, Ilenia; Budroni, Mario; Dei, Mariano; Lai, Sandra; Mulas, Antonella; Olmeo, Nina; Ionta, Maria Teresa; Atzori, Francesco; Cuccuru, Gianmauro; Pitzalis, Maristella; Zoledziewska, Magdalena; Olla, Nazario; Lovicu, Mario; Pisano, Marina; Abecasis, Gonçalo R; Uda, Manuela; Tanda, Francesco; Michailidou, Kyriaki; Easton, Douglas F; Chanock, Stephen J; Hoover, Robert N; Hunter, David J; Schlessinger, David; Sanna, Serena; Crisponi, Laura; Palmieri, Giuseppe
Despite progress in identifying genes associated with breast cancer, many more risk loci exist. Genome-wide association analyses in genetically-homogeneous populations, such as that of Sardinia (Italy), could represent an additional approach to detect low penetrance alleles. We performed a genome-wide association study comparing 1431 Sardinian patients with non-familial, BRCA1/2-mutation-negative breast cancer to 2171 healthy Sardinian blood donors. DNA was genotyped using GeneChip Human Mapping 500 K Arrays or Genome-Wide Human SNP Arrays 6.0. To increase genomic coverage, genotypes of additional SNPs were imputed using data from HapMap Phase II. After quality control filtering of genotype data, 1367 cases (9 men) and 1658 controls (1156 men) were analyzed on a total of 2,067,645 SNPs. Overall, 33 genomic regions (67 candidate SNPs) were associated with breast cancer risk at the p < 10(-6) level. Twenty of these regions contained defined genes, including one already associated with breast cancer risk: TOX3. With a lower threshold for preliminary significance of p < 10(-5), we identified 11 additional SNPs in FGFR2, a well-established breast cancer-associated gene. Ten candidate SNPs were selected, excluding those already associated with breast cancer, for technical validation as well as replication in 1668 samples from the same population. Only SNP rs345299, located in intron 1 of VAV3, remained suggestively associated (p-value, 1.16 x 10(-5)), but it did not associate with breast cancer risk in pooled data from two large, mixed-population cohorts. This study indicated the role of TOX3 and FGFR2 as breast cancer susceptibility genes in BRCA1/2-wild-type breast cancer patients from the Sardinian population (doi:10.1186/s12885-015-1392-9).
Recent genetic association studies have implicated several candidate susceptibility variants for schizophrenia among general populations. Rs1344706, an intronic SNP within ZNF804A, was identified as one of the most compelling candidate risk SNPs for schizophrenia in Europeans through genome-wide association studies (GWASs) and replications as well as large-scale meta-analyses. However, in Han Chinese, the results for rs1344706 are inconsistent, and whether rs1344706 is an authentic risk SNP for schizophrenia in Han Chinese is inconclusive. Here, we conducted a systematic meta-analysis of rs1344706 with schizophrenia in the Chinese population by combining all available case-control samples (N = 12), including a total of 8,982 cases and 12,342 controls. The results of our meta-analysis were not able to confirm an association of the rs1344706 A-allele with schizophrenia (p = 0.10, odds ratio = 1.06, 95% confidence interval = 0.99-1.13). Such absence of association was further confirmed by the non-superiority test (p = 0.0003), suggesting that rs1344706 is not a risk SNP for schizophrenia in Han Chinese. Detailed examinations of individual samples revealed potential sampling bias in previous replication studies in Han Chinese. The absence of rs1344706 association in Han Chinese suggests a potential genetic heterogeneity in the susceptibility of schizophrenia on this locus and also demonstrates the difficulties in replicating genome-wide association findings of schizophrenia across different ethnic populations.
Wan Juhari, Wan Khairunnisa; Md Tamrin, Nur Aida; Mat Daud, Mohd Hanif Ridzuan; Isa, Hatin Wan; Mohd Nasir, Nurfazreen; Maran, Sathiya; Abdul Rajab, Nur Shafawati; Ahmad Amin Noordin, Khairul Bariah; Nik Hassan, Nik Norliza; Tearle, Rick; Razali, Rozaimi; Merican, Amir Feisal; Zilfalil, Bin Alwi
The sequencing of the genomes of two members of the Royal Kelantan Malay family will provide insights into Kelantan Malay whole genome sequences. The two Kelantan Malay genomes were analyzed for the SNP markers associated with thalassemia and Helicobacter pylori infection. Helicobacter pylori infection was reported to have a low prevalence in the north-east as compared to the west coast of Peninsular Malaysia, and beta-thalassemia is known to be one of the most common inherited genetic disorders in Malaysia. By combining SNP information from the literature, GWAS studies and NCBI ClinVar, 18 unique SNPs were selected for further analysis. Of these 18 SNPs, 10 came from a previous study of Helicobacter pylori infection among Malay patients, 6 were from NCBI ClinVar and 2 were from GWAS studies. The analysis reveals that both Royal Kelantan Malay genomes shared all 10 SNPs identified by Maran (Single Nucleotide Polymorphims (SNPs) genotypic profiling of Malay patients with and without Helicobacter pylori infection in Kelantan, 2011) and one SNP from a GWAS study. In addition, the analysis reveals that both Royal Kelantan Malay genomes shared 3 SNP markers, HBG1 (rs1061234), HBB (rs1609812) and BCL11A (rs766432), all of which were associated with beta-thalassemia. Our findings suggest that the Royal Kelantan Malays carry the SNPs which are associated with protection against Helicobacter pylori infection. In addition, they also carry SNPs which are associated with beta-thalassemia. These findings are in line with the findings of other researchers who conducted studies on thalassemia and Helicobacter pylori infection in the non-royal Malay population.
Dreger, Dayna L.; Rimbault, Maud; Davis, Brian W.; Bhatnagar, Adrienne; Parker, Heidi G.
ABSTRACT In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design approach for genome-wide association studies (GWAS) and analysis of whole-genome sequence (WGS) experiments are inextricable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals, and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds that, together, allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. PMID:27874836
Tsuruta, S; Lourenco, D A L; Misztal, I; Lawlor, T J
The objective of this study was to investigate the feasibility of genomic evaluation for cow mortality and milk production using a single-step methodology. Genomic relationships between cow mortality and milk production were also analyzed. Data included 883,887 (866,700) first-parity, 733,904 (711,211) second-parity, and 516,256 (492,026) third-parity records on cow mortality (305-d milk yields) of Holsteins from Northeast states in the United States. The pedigree consisted of up to 1,690,481 animals including 34,481 bulls genotyped with 36,951 SNP markers. Analyses were conducted with a bivariate threshold-linear model for each parity separately. Genomic information was incorporated as a genomic relationship matrix in the single-step BLUP. Traditional and genomic estimated breeding values (GEBV) were obtained with Gibbs sampling using fixed variances, whereas reliabilities were calculated from variances of GEBV samples. Genomic EBV were then converted into single nucleotide polymorphism (SNP) marker effects. Those SNP effects were categorized according to values corresponding to 1 to 4 standard deviations. Moving averages and variances of SNP effects were calculated for windows of 30 adjacent SNP, and Manhattan plots were created for SNP variances with the same window size. Using Gibbs sampling, the reliability for genotyped bulls for cow mortality was 28 to 30% in EBV and 70 to 72% in GEBV. The reliability for genotyped bulls for 305-d milk yields was 53 to 65% in EBV and 81 to 85% in GEBV. Correlations of SNP effects between mortality and 305-d milk yields within categories were the highest with the largest SNP effects and reached >0.7 at 4 standard deviations. All SNP regions explained less than 0.6% of the genetic variance for both traits, except regions close to the DGAT1 gene, which explained up to 2.5% for cow mortality and 4% for 305-d milk yields. Reliability for GEBV with a moderate number of genotyped animals can be calculated by Gibbs samples. Genomic
Duncan, Laramie; Yilmaz, Zeynep; Gaspar, Helena; Walters, Raymond; Goldstein, Jackie; Anttila, Verneri; Bulik-Sullivan, Brendan; Ripke, Stephan; Thornton, Laura; Hinney, Anke; Daly, Mark; Sullivan, Patrick F; Zeggini, Eleftheria; Breen, Gerome; Bulik, Cynthia M
The authors conducted a genome-wide association study of anorexia nervosa and calculated genetic correlations with a series of psychiatric, educational, and metabolic phenotypes. Following uniform quality control and imputation procedures using the 1000 Genomes Project (phase 3) in 12 case-control cohorts comprising 3,495 anorexia nervosa cases and 10,982 controls, the authors performed standard association analysis followed by a meta-analysis across cohorts. Linkage disequilibrium score regression was used to calculate genome-wide common variant heritability (single-nucleotide polymorphism [SNP]-based heritability [h 2 SNP ]), partitioned heritability, and genetic correlations (r g ) between anorexia nervosa and 159 other phenotypes. Results were obtained for 10,641,224 SNPs and insertion-deletion variants with minor allele frequencies >1% and imputation quality scores >0.6. The h 2 SNP of anorexia nervosa was 0.20 (SE=0.02), suggesting that a substantial fraction of the twin-based heritability arises from common genetic variation. The authors identified one genome-wide significant locus on chromosome 12 (rs4622308) in a region harboring a previously reported type 1 diabetes and autoimmune disorder locus. Significant positive genetic correlations were observed between anorexia nervosa and schizophrenia, neuroticism, educational attainment, and high-density lipoprotein cholesterol, and significant negative genetic correlations were observed between anorexia nervosa and body mass index, insulin, glucose, and lipid phenotypes. Anorexia nervosa is a complex heritable phenotype for which this study has uncovered the first genome-wide significant locus. Anorexia nervosa also has large and significant genetic correlations with both psychiatric phenotypes and metabolic traits. The study results encourage a reconceptualization of this frequently lethal disorder as one with both psychiatric and metabolic etiology.
Lakshmi K Matukumalli
The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in humans has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.
Arnedo, Javier; Svrakic, Dragan M; Del Val, Coral; Romero-Zaliz, Rocío; Hernández-Cuervo, Helena; Fanous, Ayman H; Pato, Michele T; Pato, Carlos N; de Erausquin, Gabriel A; Cloninger, C Robert; Zwir, Igor
The authors sought to demonstrate that schizophrenia is a heterogeneous group of heritable disorders caused by different genotypic networks that cause distinct clinical syndromes. In a large genome-wide association study of cases with schizophrenia and controls, the authors first identified sets of interacting single-nucleotide polymorphisms (SNPs) that cluster within particular individuals (SNP sets) regardless of clinical status. Second, they examined the risk of schizophrenia for each SNP set and tested replicability in two independent samples. Third, they identified genotypic networks composed of SNP sets sharing SNPs or subjects. Fourth, they identified sets of distinct clinical features that cluster in particular cases (phenotypic sets or clinical syndromes) without regard for their genetic background. Fifth, they tested whether SNP sets were associated with distinct phenotypic sets in a replicable manner across the three studies. The authors identified 42 SNP sets associated with a 70% or greater risk of schizophrenia, and confirmed 34 (81%) or more with similar high risk of schizophrenia in two independent samples. Seventeen networks of SNP sets did not share any SNP or subject. These disjoint genotypic networks were associated with distinct gene products and clinical syndromes (i.e., the schizophrenias) varying in symptoms and severity. Associations between genotypic networks and clinical syndromes were complex, showing multifinality and equifinality. The interactive networks explained the risk of schizophrenia more than the average effects of all SNPs (24%). Schizophrenia is a group of heritable disorders caused by a moderate number of separate genotypic networks associated with several distinct clinical syndromes.
Wang, Jia; Xian, Xiaohua; Xu, Xinfu; Qu, Cunmin; Lu, Kun; Li, Jiana; Liu, Liezhao
Seed coat color is an extremely important breeding characteristic of Brassica napus. To elucidate the factors affecting the genetic architecture of seed coat color, a genome-wide association study (GWAS) of seed coat color was conducted with a diversity panel comprising 520 B. napus cultivars and inbred lines. In total, 22 single-nucleotide polymorphisms (SNPs) distributed on 7 chromosomes were found to be associated with seed coat color. The most significant SNPs were found in 2014 near Bn-scaff_15763_1-p233999, only 43.42 kb away from BnaC06g17050D, which is orthologous to Arabidopsis thaliana TRANSPARENT TESTA 12 (TT12), an important gene involved in the transportation of proanthocyanidin precursors into the vacuole. Two of eight repeatedly detected SNPs can be identified and digested by restriction enzymes. Candidate gene mining revealed that the relevant regions of significant SNP loci on the A09 and C08 chromosomes are highly homologous. Moreover, a comparison of the GWAS results to those of previous quantitative trait locus (QTL) studies showed that 11 SNPs were located in the confidence intervals of the QTLs identified in previous studies based on linkage analyses or association mapping. Our results provide insights into the genetic basis of seed coat color in B. napus, and the beneficial allele, SNP information, and candidate genes should be useful for selecting yellow seeds in B. napus breeding.
Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi Xuan; Han, Bin; Kurata, Nori
. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all
Zhang, Hao; Shaffer, John R.; Hansen, Thomas; Esserlind, Ann-Louise; Boyd, Heather A.; Nohr, Ellen A.; Timpson, Nicholas J.; Fatemifar, Ghazaleh; Paternoster, Lavinia; Evans, David M.; Weyant, Robert J.; Levy, Steven M.; Lathrop, Mark; Smith, George Davey; Murray, Jeffrey C.; Olesen, Jes; Werge, Thomas; Marazita, Mary L.; Sørensen, Thorkild I. A.; Melbye, Mads
The sequence and timing of permanent tooth eruption is thought to be highly heritable and can have important implications for the risk of malocclusion, crowding, and periodontal disease. We conducted a genome-wide association study of number of permanent teeth erupted between age 6 and 14 years, analyzed as age-adjusted standard deviation score averaged over multiple time points, based on childhood records for 5,104 women from the Danish National Birth Cohort. Four loci showed association at Peruption and were also known to influence height and breast cancer, respectively. The two other loci pointed to genomic regions without any previous significant genome-wide association study results. The intronic SNP rs7924176 in ADK could be linked to gene expression in monocytes. The combined effect of the four genetic variants was most pronounced between age 10 and 12 years, where children with 6 to 8 delayed tooth eruption alleles had on average 3.5 (95% confidence interval: 2.9–4.1) fewer permanent teeth than children with 0 or 1 of these alleles. PMID:21931568
Cingolani, Pablo; Platts, Adrian; Wang, Le Lily; Coon, Melissa; Nguyen, Tung; Wang, Luan; Land, Susan J; Lu, Xiangyi; Ruden, Douglas M
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
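The coding-effect categories described in this abstract can be illustrated with a minimal Python sketch. This is not SnpEff's implementation: the function name `classify_codon_change` and the category labels are hypothetical, and the sketch assumes the standard genetic code and a single in-frame codon substitution.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated
# with bases in the order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded residue.

    Hypothetical helper for illustration only; labels are not SnpEff's.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


# Ratio of non-synonymous to synonymous SNP counts quoted in the abstract.
ns_ratio = 4467 / 15842
```

For instance, `classify_codon_change("TGG", "TGA")` reports a stop gain because tryptophan is replaced by a stop codon, and `ns_ratio` rounds to the N/S value of about 0.28 given in the abstract.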
Tian, Hong-Li; Wang, Feng-Ge; Zhao, Jiu-Ran; Yi, Hong-Mei; Wang, Lu; Wang, Rui; Yang, Yang; Song, Wei
Single nucleotide polymorphisms (SNPs) are abundant and evenly distributed throughout the maize ( Zea mays L.) genome. SNPs have several advantages over simple sequence repeats, such as ease of data comparison and integration, high-throughput processing of loci, and identification of associated phenotypes. SNPs are thus ideal for DNA fingerprinting, genetic diversity analysis, and marker-assisted breeding. Here, we developed a high-throughput and compatible SNP array, maizeSNP3072, containing 3072 SNPs developed from the maizeSNP50 array. To improve genotyping efficiency, a high-quality cluster file, maizeSNP3072_GT.egt, was constructed. All 3072 SNP loci were localized within different genes, where they were distributed in exons (43 %), promoters (21 %), 3' untranslated regions (UTRs; 22 %), 5' UTRs (9 %), and introns (5 %). The average genotyping failure rate using these SNPs was only 6 %, or 3 % using the cluster file to call genotypes. The genotype consistency of repeat sample analysis on Illumina GoldenGate versus Infinium platforms exceeded 96.4 %. The minor allele frequency (MAF) of the SNPs averaged 0.37 based on data from 309 inbred lines. The 3072 SNPs were highly effective for distinguishing among 276 examined hybrids. Comparative analysis using Chinese varieties revealed that the 3072SNP array showed a better marker success rate and higher average MAF values, evaluation scores, and variety-distinguishing efficiency than the maizeSNP50K array. The maizeSNP3072 array thus can be successfully used in DNA fingerprinting identification of Chinese maize varieties and shows potential as a useful tool for germplasm resource evaluation and molecular marker-assisted breeding.
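The minor allele frequency (MAF) reported for the array can be computed from genotype calls along the following lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: genotypes are assumed to be two-character strings such as `AG`, the locus is assumed to be biallelic, and the function name `minor_allele_frequency` and the missing-call symbol `-` are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the MAF at one biallelic SNP locus.

    genotypes: iterable of 2-character diploid calls, e.g. "AA", "AG".
    Calls that are empty or contain '-' are treated as missing and skipped
    (an assumed convention for this sketch).
    """
    counts = Counter()
    for call in genotypes:
        if not call or "-" in call:
            continue
        counts.update(call)  # counts each of the two alleles in the call
    if len(counts) > 2:
        raise ValueError("sketch assumes a biallelic locus")
    if len(counts) < 2:
        return 0.0  # monomorphic locus or no usable calls
    return min(counts.values()) / sum(counts.values())


# Four lines: A is seen 5 times and G 3 times among 8 alleles.
example_maf = minor_allele_frequency(["AA", "AG", "GG", "AA"])
```

Averaging this value over all loci on an array would give a figure comparable to the mean MAF quoted in the abstract, assuming the same set of lines is used.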
Gardner, S; Jaing, C
The overall goal of this project is to forensically characterize 100 unknown Burkholderia isolates in the US-Australia collaboration. We will identify genome-wide single nucleotide polymorphisms (SNPs) from B. pseudomallei and near neighbor species including B. mallei, B. thailandensis and B. oklahomensis. We will design microarray probes to detect these SNP markers and analyze 100 Burkholderia genomic DNAs extracted from environmental, clinical and near neighbor isolates from Australian collaborators on the Burkholderia SNP microarray. We will analyze the microarray genotyping results to characterize the genetic diversity of these new isolates and triage the samples for whole genome sequencing. In this interim report, we described the SNP analysis and the microarray probe design for the Burkholderia SNP microarray.
Rachel Maree Jones
Research has proposed that autistic-like traits in the general population lie on a continuum, with clinical Autism Spectrum Disorder (ASD) representing the extreme end of this distribution. Inherent in this proposal is that biological mechanisms associated with clinical ASD may also underpin variation in autistic-like traits within the general population. A genome-wide association study using 2,462,046 single nucleotide polymorphisms (SNPs) was undertaken for ASD in 965 individuals from the Western Australian Pregnancy Cohort (Raine) Study. No SNP associations reached genome-wide significance (p < 5.0 x 10^-8). However, investigations into nominal observed SNP associations (p < 1.0 x 10^-5) add support to two positional candidate genes previously implicated in ASD aetiology, PRKCB1 and CBLN1. The rs198198 SNP (p = 9.587 x 10^-6) is located within an intron of the protein kinase C, beta 1 (PRKCB1) gene on chromosome 16p11. The PRKCB1 gene has been previously reported in linkage and association studies for ASD, and its mRNA expression has been shown to be significantly down regulated in ASD cases compared with controls. The rs16946931 SNP (p = 1.78 x 10^-6) is located in a region flanking the Cerebellin 1 (CBLN1) gene on chromosome 16q12.1. The CBLN1 gene is involved with synaptogenesis and is part of a gene family previously implicated in ASD. This GWA study is only the second to examine SNPs associated with autistic-like traits in the general population, and provides evidence to support roles for the PRKCB1 and CBLN1 genes in risk of clinical ASD.
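The genome-wide significance threshold quoted in this abstract is conventionally motivated as a Bonferroni correction of a 0.05 error rate for roughly one million independent tests. The short sketch below makes that arithmetic explicit and checks the two reported p-values against it; the figure of one million tests is the usual convention and an assumption here, not a number taken from the abstract, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed convention: about one million independent common-variant tests.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)

# p-values reported in the abstract for the two highlighted SNPs.
reported = {"rs198198": 9.587e-6, "rs16946931": 1.78e-6}
significant = {snp: p < GENOME_WIDE for snp, p in reported.items()}
```

Both entries of `significant` are false, which matches the abstract's statement that the associations are nominal rather than genome-wide significant.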
Joanna M Biernacka
Genome-wide association studies (GWAS) have revealed many single nucleotide polymorphisms (SNPs) associated with complex traits. Although these studies frequently fail to identify statistically significant associations, the top association signals from GWAS may be enriched for true associations. We therefore investigated the association of alcohol dependence with 43 SNPs selected from association signals in the first two published GWAS of alcoholism. Our analysis of 808 alcohol-dependent cases and 1,248 controls provided evidence of association of alcohol dependence with SNP rs1614972 in the ADH1C gene (unadjusted p = 0.0017). Because the GWAS study that originally reported association of alcohol dependence with this SNP included only men, we also performed analyses in sex-specific strata. The results suggest that this SNP has a similar effect in both sexes (men: OR (95% CI) = 0.80 (0.66, 0.95); women: OR (95% CI) = 0.83 (0.66, 1.03)). We also observed marginal evidence of association of the rs1614972 minor allele with lower alcohol consumption in the non-alcoholic controls (p = 0.081), and independently in the alcohol-dependent cases (p = 0.046). Despite a number of potential differences between the samples investigated by the prior GWAS and the current study, data presented here provide additional support for the association of SNP rs1614972 in ADH1C with alcohol dependence and extend this finding by demonstrating association with consumption levels in both non-alcoholic and alcohol-dependent populations. Further studies should investigate the association of other polymorphisms in this gene with alcohol dependence and related alcohol-use phenotypes.
Complex diseases are often highly heritable. However, for many complex traits only a small proportion of the heritability can be explained by observed genetic variants in traditional genome-wide association (GWA) studies. Moreover, for some of those traits few significant SNPs have been identified. Single SNP association methods test for association at a single SNP, ignoring the effect of other SNPs. We show using a simple multi-locus odds model of complex disease that moderate to large effect sizes of causal variants may be estimated as relatively small effect sizes in single SNP association testing. This underestimation effect is most severe for diseases influenced by numerous risk variants. We relate the underestimation effect to the concept of non-collapsibility found in the statistics literature. As described, continuous phenotypes generated with linear genetic models are not affected by this underestimation effect. Since many GWA studies apply single SNP analysis to dichotomous phenotypes, previously reported results potentially underestimate true effect sizes, thereby impeding identification of true effect SNPs. Therefore, when a multi-locus model of disease risk is assumed, a multi-SNP analysis may be more appropriate.
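The underestimation effect described in this abstract can be reproduced with a small exact calculation. The sketch below is an illustration under simplifying assumptions rather than the authors' model: two independent binary risk variants act additively on the log-odds scale, and all names and parameter values (`marginal_odds_ratio`, baseline risk 0.1, odds ratios 3 and 10, carrier frequency 0.5) are hypothetical.

```python
import math


def expit(x):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_odds_ratio(baseline_risk, or_snp1, or_snp2, freq_snp2):
    """Odds ratio for SNP1 when SNP2 is ignored (averaged over).

    Disease risk follows a two-locus logistic model in which each SNP
    multiplies the odds by its own conditional odds ratio. SNP2 is carried
    with probability freq_snp2 independently of SNP1, so it is not a confounder.
    """
    b0 = math.log(baseline_risk / (1.0 - baseline_risk))
    b1 = math.log(or_snp1)
    b2 = math.log(or_snp2)

    def risk_given_snp1(g1):
        # Average the conditional risks over carriers and non-carriers of SNP2.
        risk_without_snp2 = expit(b0 + b1 * g1)
        risk_with_snp2 = expit(b0 + b1 * g1 + b2)
        return (1.0 - freq_snp2) * risk_without_snp2 + freq_snp2 * risk_with_snp2

    p0 = risk_given_snp1(0)
    p1 = risk_given_snp1(1)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Conditional odds ratio of SNP1 is 3; the single-SNP analysis sees less.
single_snp_or = marginal_odds_ratio(0.1, 3.0, 10.0, 0.5)
```

With these illustrative values the single-SNP odds ratio is about 2.28 even though the conditional odds ratio is 3 and the second variant is independent of the first. Setting `or_snp2` to 1 recovers the conditional value, which shows that the shrinkage comes from non-collapsibility of the odds ratio and not from confounding.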
Davies, G; Marioni, R E; Liewald, D C; Hill, W D; Hagenaars, S P; Harris, S E; Ritchie, S J; Luciano, M; Fawns-Ritchie, C; Lyall, D; Cullen, B; Cox, S R; Hayward, C; Porteous, D J; Evans, J; McIntosh, A M; Gallacher, J; Craddock, N; Pell, J P; Smith, D J; Gale, C R; Deary, I J
People's differences in cognitive functions are partly heritable and are associated with important life outcomes. Previous genome-wide association (GWA) studies of cognitive functions have found evidence for polygenic effects yet, to date, there are few replicated genetic associations. Here we use data from the UK Biobank sample to investigate the genetic contributions to variation in tests of three cognitive functions and in educational attainment. GWA analyses were performed for verbal–numerical reasoning (N=36 035), memory (N=112 067), reaction time (N=111 483) and for the attainment of a college or a university degree (N=111 114). We report genome-wide significant single-nucleotide polymorphism (SNP)-based associations in 20 genomic regions, and significant gene-based findings in 46 regions. These include findings in the ATXN2, CYP2DG, APBA1 and CADM2 genes. We report replication of these hits in published GWA studies of cognitive function, educational attainment and childhood intelligence. There is also replication, in UK Biobank, of SNP hits reported previously in GWA studies of educational attainment and cognitive function. GCTA-GREML analyses, using common SNPs (minor allele frequency>0.01), indicated significant SNP-based heritabilities of 31% (s.e.m.=1.8%) for verbal–numerical reasoning, 5% (s.e.m.=0.6%) for memory, 11% (s.e.m.=0.6%) for reaction time and 21% (s.e.m.=0.6%) for educational attainment. Polygenic score analyses indicate that up to 5% of the variance in cognitive test scores can be predicted in an independent cohort. The genomic regions identified include several novel loci, some of which have been associated with intracranial volume, neurodegeneration, Alzheimer's disease and schizophrenia. PMID:27046643
Blackleg, caused by Leptosphaeria maculans, is a significant disease which affects the sustainable production of canola. This study reports a genome-wide association study based on 18,804 polymorphic SNPs to identify loci associated with qualitative and quantitative resistance to L. maculans. Genomic regions delimited by 503 significant SNP markers that are associated with resistance, evaluated using 12 single-spore isolates and pathotypes from four canola stubble, were identified. Several significant associations were detected at known disease resistance loci, including in the vicinity of the recently cloned Rlm2/LepR3 genes, and at new loci on chromosomes A01/C01, A02/C02, A03/C03, A05/C05, A06, A08, and A09. In addition, we validated statistically significant associations on A01, A07 and A10 in four genetic mapping populations, demonstrating that GWAS marker loci are indeed associated with resistance to L. maculans. One of the novel loci identified for the first time, Rlm12, conveys adult plant resistance and mapped within 13.2 kb of an Arabidopsis R gene of the TIR-NBS class. We showed that resistance loci are located in the vicinity of R genes of A. thaliana and B. napus on the sequenced genome of B. napus cv. Darmor-bzh. Significantly associated SNP markers provide a valuable tool to enrich germplasm for favorable alleles in order to improve the level of resistance to L. maculans in canola.
Ryan, Michael; Diekhans, Mark; Lien, Stephanie; Liu, Yun; Karchin, Rachel
LS-SNP/PDB is a new WWW resource for genome-wide annotation of human non-synonymous (amino acid changing) SNPs. It serves high-quality protein graphics rendered with UCSF Chimera molecular visualization software. The system is kept up-to-date by an automated, high-throughput build pipeline that systematically maps human nsSNPs onto Protein Data Bank structures and annotates several biologically relevant features. LS-SNP/PDB is available at (http://ls-snp.icm.jhu.edu/ls-snp-pdb) and via links from protein data bank (PDB) biology and chemistry tabs, UCSC Genome Browser Gene Details and SNP Details pages and PharmGKB Gene Variants Downloads/Cross-References pages.
Walter, Stefan; Atzmon, Gil; Demerath, Ellen W; Garcia, Melissa E; Kaplan, Robert C; Kumari, Meena; Lunetta, Kathryn L; Milaneschi, Yuri; Tanaka, Toshiko; Tranah, Gregory J; Völker, Uwe; Yu, Lei; Arnold, Alice; Benjamin, Emelia J; Biffar, Reiner; Buchman, Aron S; Boerwinkle, Eric; Couper, David; De Jager, Philip L; Evans, Denis A; Harris, Tamara B; Hoffmann, Wolfgang; Hofman, Albert; Karasik, David; Kiel, Douglas P; Kocher, Thomas; Kuningas, Maris; Launer, Lenore J; Lohman, Kurt K; Lutsey, Pamela L; Mackenbach, Johan; Marciante, Kristin; Psaty, Bruce M; Reiman, Eric M; Rotter, Jerome I; Seshadri, Sudha; Shardell, Michelle D; Smith, Albert V; van Duijn, Cornelia; Walston, Jeremy; Zillikens, M Carola; Bandinelli, Stefania; Baumeister, Sebastian E; Bennett, David A; Ferrucci, Luigi; Gudnason, Vilmundur; Kivimaki, Mika; Liu, Yongmei; Murabito, Joanne M; Newman, Anne B; Tiemeier, Henning; Franceschini, Nora
Human longevity and healthy aging show moderate heritability (20%-50%). We conducted a meta-analysis of genome-wide association studies from 9 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for 2 outcomes: (1) all-cause mortality, and (2) survival free of major disease or death. No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p < 5 × 10⁻⁸). We found 14 independent SNPs that predicted risk of death, and 8 SNPs that predicted event-free survival (p < 10⁻⁵). These SNPs are in or near genes that are highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer's disease. In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings. These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity. Copyright © 2011 Elsevier Inc. All rights reserved.
Harold T Bae
Personality traits have been shown to be associated with longevity and healthy aging. In order to discover novel genetic modifiers associated with personality traits as related with longevity, we performed a genome-wide association study (GWAS) on personality factors assessed by NEO-FFI in individuals enrolled in the Long Life Family Study (LLFS), a study of 583 families (N up to 4595) with clustering for longevity in the United States and Denmark. Three SNPs, in almost perfect LD, associated with agreeableness reached genome-wide significance (p<10⁻⁸) and replicated in an additional sample of 1279 LLFS subjects, although one (rs9650241) failed to replicate and the other two were not available in two independent replication cohorts, the Baltimore Longitudinal Study of Aging and the New England Centenarian Study. Based on 10,000,000 permutations, the empirical p-value of 2×10⁻⁷ was observed for the genome-wide significant SNPs. Seventeen SNPs that reached marginal statistical significance in the two previous GWASs (p-value < 10⁻⁴ and 10⁻⁵) were also marginally significantly associated in this study (p-value < 0.05), although none of the associations passed the Bonferroni correction. In addition, we tested age-by-SNP interactions and found some significant associations. Since scores of personality traits in LLFS subjects change in the oldest ages, and genetic factors outweigh environmental factors to achieve extreme ages, these age-by-SNP interactions could be a proxy for complex gene-gene interactions affecting personality traits and longevity.
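The Bonferroni correction mentioned in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the correction is applied across the seventeen SNPs tested at a nominal level of 0.05; the abstract does not state the number of tests actually used, and the p-values below are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as passing (True) or failing (False) the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Assumption: 17 tests at alpha = 0.05 (illustrative, not taken from the study).
    print(bonferroni_threshold(0.05, 17))
    # Hypothetical p-values: nominally significant, but judged against alpha / 4.
    print(passes_bonferroni([0.04, 0.01, 0.002, 0.03]))
```

Under this assumption a SNP with p just below 0.05 is nominally significant yet falls well short of the corrected threshold, which matches the pattern the abstract reports.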
Lin, Hui Yi; Chen, Dung Tsa; Huang, Po Yu
Motivation: Testing SNP-SNP interactions is considered as a key for overcoming bottlenecks of genetic association studies. However, related statistical methods for testing SNP-SNP interactions are underdeveloped. Results: We propose the SNP Interaction Pattern Identifier (SIPI), which tests 45...
Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon
Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.
Tongtawee, Taweesak; Dechsukhum, Chavaboon; Leeanansaksiri, Wilairat; Kaewpitoon, Soraya; Kaewpitoon, Natthawut; Loyd, Ryan A; Matrakool, Likit; Panpimanmas, Sukij
Helicobacter pylori plays an important role in gastric cancer, which has a relatively low incidence in Thailand. MDM2 is a major negative regulator of p53, the key tumor suppressor involved in tumorigenesis of the majority of human cancers. Whether its expression might explain the relative lack of gastric cancer in Thailand was assessed here. This single-center study was conducted in the northeast region of Thailand. Gastric mucosa from 100 patients with Helicobacter pylori associated gastritis was analyzed for MDM2 SNP309 using real-time PCR hybridization (light-cycler) probes. In the total of 100 Helicobacter pylori associated gastritis cases, the incidence of SNP309 T/T homozygous was 78%, with SNP309 G/T heterozygous found in 19% and SNP309 G/G homozygous in 3%. The results show SNP309 T/T and SNP309 G/T to be rather common in the Thai population. Our study indicates that the MDM2 SNP309 G/G homozygous genotype might be a risk factor for gastric cancer in Thailand and the fact that it is infrequent could explain to some extent the low incidence of gastric cancer in the Thai population.
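The genotype percentages reported above can be turned into allele frequencies with simple counting. The sketch below does so and also computes Hardy-Weinberg expected genotype counts; the Hardy-Weinberg comparison is an illustrative addition and is not an analysis described in the abstract. Counts are taken as 78, 19 and 3 because the percentages refer to 100 patients.

```python
def allele_frequencies(n_tt: int, n_gt: int, n_gg: int):
    """Return (freq_T, freq_G) from genotype counts at a biallelic SNP."""
    n = n_tt + n_gt + n_gg
    if n <= 0:
        raise ValueError("at least one genotyped individual is required")
    # Each T/T carries two T alleles, each G/T carries one.
    freq_t = (2 * n_tt + n_gt) / (2 * n)
    return freq_t, 1.0 - freq_t


def hardy_weinberg_expected(n_tt: int, n_gt: int, n_gg: int):
    """Expected (T/T, G/T, G/G) counts under Hardy-Weinberg proportions."""
    n = n_tt + n_gt + n_gg
    p, q = allele_frequencies(n_tt, n_gt, n_gg)
    return p * p * n, 2 * p * q * n, q * q * n


if __name__ == "__main__":
    # Genotype counts from the abstract: 78 T/T, 19 G/T, 3 G/G.
    print(allele_frequencies(78, 19, 3))
    print(hardy_weinberg_expected(78, 19, 3))
```

With these counts the T allele frequency is 175/200 = 0.875 and the G allele frequency is 0.125, so the expected G/G count under Hardy-Weinberg proportions is about 1.6 out of 100.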
Sahana, G; Guldbrandtsen, B; Bendixen, C
A genome-wide association study was conducted using a mixed model analysis for QTL for fertility traits in Danish and Swedish Holstein cattle. The analysis incorporated 2,531 progeny tested bulls, and a total of 36 387 SNP markers on 29 bovine autosomes were used. Eleven fertility traits were ana...
Bigdeli, Tim B.; Ripke, Stephan; Bacanu, Silviu-Alin
Genome-wide association studies (GWAS) of schizophrenia have yielded more than 100 common susceptibility variants, and strongly support a substantial polygenic contribution of a large number of small allelic effects. It has been hypothesized that familial schizophrenia is largely a consequence...... of inherited rather than environmental factors. We investigated the extent to which familiality of schizophrenia is associated with enrichment for common risk variants detectable in a large GWAS. We analyzed single nucleotide polymorphism (SNP) data for cases reporting a family history of psychotic illness (N...... history subgroup. Comparison of genome-wide polygenic risk scores based on GWAS summary statistics indicated a significant enrichment for SNP effects among family history positive compared to family history negative cases (Nagelkerke's R2=0.0021; P=0.00331; P-value threshold
Jinam, Timothy A; Phipps, Maude E; Saitou, Naruya
Southeast Asia houses various culturally and linguistically diverse ethnic groups. In Malaysia, where the Malay, Chinese, and Indian ethnic groups form the majority, there exist minority groups such as the "negritos" who are believed to be descendants of the earliest settlers of Southeast Asia. Here we report patterns of genetic substructure and admixture in two Malaysian negrito populations (Jehai and Kensiu), using ~50,000 genome-wide single-nucleotide polymorphism (SNP) data. We found traces of recent admixture in both the negrito populations, particularly in the Jehai, with the Malay through principal component analysis and STRUCTURE analysis software, which suggested that the admixture was as recent as one generation ago. We also identified significantly differentiated nonsynonymous SNPs and haplotype blocks related to intracellular transport, metabolic processes, and detection of stimulus. These results highlight the different levels of admixture experienced by the two Malaysian negritos. Delineating admixture and differentiated genomic regions should be of importance in designing and interpretation of molecular anthropology and disease association studies. Copyright © 2013 Wayne State University Press, Detroit, Michigan 48201-1309.
Yang, Bin; Cui, Leilei; Perez-Enciso, Miguel; Traspov, Aleksei; Crooijmans, Richard P M A; Zinovieva, Natalia; Schook, Lawrence B; Archibald, Alan; Gatphayak, Kesinee; Knorr, Christophe; Triantafyllidis, Alex; Alexandri, Panoraia; Semiadi, Gono; Hanotte, Olivier; Dias, Deodália; Dovč, Peter; Uimari, Pekka; Iacolina, Laura; Scandura, Massimo; Groenen, Martien A M; Huang, Lusheng; Megens, Hendrik-Jan
Pigs were domesticated independently in Eastern and Western Eurasia early during the agricultural revolution, and have since been transported and traded across the globe. Here, we present a worldwide survey on 60K genome-wide single nucleotide polymorphism (SNP) data for 2093 pigs, including 1839 domestic pigs representing 122 local and commercial breeds, 215 wild boars, and 39 out-group suids, from Asia, Europe, America, Oceania and Africa. The aim of this study was to infer global patterns in pig domestication and diversity related to demography, migration, and selection. A deep phylogeographic division reflects the dichotomy between early domestication centers. In the core Eastern and Western domestication regions, Chinese pigs show differentiation between breeds due to geographic isolation, whereas this is less pronounced in European pigs. The inferred European origin of pigs in the Americas, Africa, and Australia reflects European expansion during the sixteenth to nineteenth centuries. Human-mediated introgression, which is due, in particular, to importing Chinese pigs into the UK during the eighteenth and nineteenth centuries, played an important role in the formation of modern pig breeds. Inbreeding levels vary markedly between populations, from almost no runs of homozygosity (ROH) in a number of Asian wild boar populations, to up to 20% of the genome covered by ROH in a number of Southern European breeds. Commercial populations show moderate ROH statistics. For domesticated pigs and wild boars in Asia and Europe, we identified highly differentiated loci that include candidate genes related to muscle and body development, central nervous system, reproduction, and energy balance, which are putatively under artificial selection. Key events related to domestication, dispersal, and mixing of pigs from different regions are reflected in the 60K SNP data, including the globalization that has recently become full circle since Chinese pig breeders in the past
Panitz, Frank; Stengaard, Henrik; Hornshoj, Henrik
MOTIVATION: Single nucleotide polymorphisms (SNPs) analysis is an important means to study genetic variation. A fast and cost-efficient approach to identify large numbers of novel candidates is the SNP mining of large scale sequencing projects. The increasing availability of sequence trace data...... manual annotation, which is immediately accessible and can be easily shared with external collaborators. RESULTS: Large-scale SNP mining of polymorphisms based on porcine EST sequences yielded more than 7900 candidate SNPs in coding regions (cSNPs), which were annotated relative to the human genome. Non...
Background: Breast cancer predisposition genes identified to date (e.g., BRCA1 and BRCA2) are responsible for less than 5% of all breast cancer cases. Many studies have shown that the cancer risks associated with individual commonly occurring single nucleotide polymorphisms (SNPs) are incremental. However, polygenic models suggest that multiple commonly occurring low to modestly penetrant SNPs of cancer related genes might have a greater effect on a disease when considered in combination. Methods: In an attempt to identify the breast cancer risk conferred by SNP interactions, we have studied 19 SNPs from genes involved in major cancer related pathways. All SNPs were genotyped by TaqMan 5'nuclease assay. The association between the case-control status and each individual SNP, measured by the odds ratio and its corresponding 95% confidence interval, was estimated using unconditional logistic regression models. At the second stage, two-way interactions were investigated using multivariate logistic models. The robustness of the interactions, which were observed among SNPs with stronger functional evidence, was assessed using a bootstrap approach, and correction for multiple testing based on the false discovery rate (FDR) principle. Results: None of these SNPs contributed to breast cancer risk individually. However, we have demonstrated evidence for gene-gene (SNP-SNP) interaction among these SNPs, which were associated with increased breast cancer risk. Our study suggests cross talk between the SNPs of the DNA repair and immune system (XPD-[Lys751Gln] and IL10-[G(-1082)A]), cell cycle and estrogen metabolism (CCND1-[Pro241Pro] and COMT-[Met108/158Val]), cell cycle and DNA repair (BARD1-[Pro24Ser] and XPD-[Lys751Gln]), and within carcinogen metabolism (GSTP1-[Ile105Val] and COMT-[Met108/158Val]) pathways. Conclusion: The importance of these pathways and their communication in breast cancer predisposition has been emphasized previously, but their
Onay, Venüs Ümmiye; Ozcelik, Hilmi; Briollais, Laurent; Knight, Julia A; Shi, Ellen; Wang, Yuanyuan; Wells, Sean; Li, Hong; Rajendram, Isaac; Andrulis, Irene L
Breast cancer predisposition genes identified to date (e.g., BRCA1 and BRCA2) are responsible for less than 5% of all breast cancer cases. Many studies have shown that the cancer risks associated with individual commonly occurring single nucleotide polymorphisms (SNPs) are incremental. However, polygenic models suggest that multiple commonly occurring low to modestly penetrant SNPs of cancer related genes might have a greater effect on a disease when considered in combination. In an attempt to identify the breast cancer risk conferred by SNP interactions, we have studied 19 SNPs from genes involved in major cancer related pathways. All SNPs were genotyped by TaqMan 5'nuclease assay. The association between the case-control status and each individual SNP, measured by the odds ratio and its corresponding 95% confidence interval, was estimated using unconditional logistic regression models. At the second stage, two-way interactions were investigated using multivariate logistic models. The robustness of the interactions, which were observed among SNPs with stronger functional evidence, was assessed using a bootstrap approach, and correction for multiple testing based on the false discovery rate (FDR) principle. None of these SNPs contributed to breast cancer risk individually. However, we have demonstrated evidence for gene-gene (SNP-SNP) interaction among these SNPs, which were associated with increased breast cancer risk. Our study suggests cross talk between the SNPs of the DNA repair and immune system (XPD-[Lys751Gln] and IL10-[G(-1082)A]), cell cycle and estrogen metabolism (CCND1-[Pro241Pro] and COMT-[Met108/158Val]), cell cycle and DNA repair (BARD1-[Pro24Ser] and XPD-[Lys751Gln]), and within carcinogen metabolism (GSTP1-[Ile105Val] and COMT-[Met108/158Val]) pathways. The importance of these pathways and their communication in breast cancer predisposition has been emphasized previously, but their biological interactions through SNPs have not been described
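The per-SNP association measure named in this abstract, an odds ratio with its 95% confidence interval, can be sketched from a 2×2 table. This is a simplified stand-in rather than the authors' procedure: they fitted unconditional logistic regression models, and the cross-product ratio below coincides with the logistic regression estimate only for a single binary predictor with no covariates. The Wald interval, the function name, and the counts are all illustrative assumptions.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carrier cases      b: non-carrier cases
    c: carrier controls   d: non-carrier controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that includes 1 would correspond to no detectable individual effect, which is the situation the abstract reports for each of the 19 SNPs considered alone.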
Suicidal behavior (SB) has a complex etiology involving genes and environment. One of the genetic components in SB could be copy number variations (CNVs), as CNVs are implicated in neurodevelopmental disorders. However, a recently published genome-wide and case-control study did not observe any significant role of CNVs in SB. Here we complemented these initial observations by instead using a family-based trio-sample that is robust to control biases, having severe suicide attempt (SA) in offspring as main outcome (n = 660 trios). We first tested for CNV associations on the genome-wide Illumina 1M SNP-array by using FBAT-CNV methodology, which allows for evaluating CNVs without reliance on CNV calling algorithms, analogous to a common SNP-based GWAS. We observed association of certain T-cell receptor markers, but this likely reflected inter-individual variation in somatic rearrangements rather than association with SA outcome. Next, we used the PennCNV software to call 385 putative rare (100 kb) CNVs, observed in n = 225 SA offspring. Nine SA offspring had rare CNV calls in a set of previously schizophrenia-associated loci, indicating the importance of such CNVs in certain SA subjects. Several additional, very large (>1MB sized) CNV calls in 15 other SA offspring also spanned pathogenic regions or other neural genes of interest. Overall, 45 SA had CNVs enriched for 65 medically relevant genes previously shown to be affected by CNVs, which were characterized by a neurodevelopmental biology. A neurodevelopmental implication was partly congruent with our previous SNP-based GWAS, but follow-up analysis here indicated that carriers of rare CNVs had a decreased burden of common SNP risk-alleles compared to non-carriers. In conclusion, while CNVs did not show genome-wide association by the FBAT-CNV methodology, our preliminary observations indicate rare pathogenic CNVs affecting neurodevelopmental functions in a subset of SA, who were distinct from SA having
de Tayrac, Marie; Roth, Marie-Paule; Jouanolle, Anne-Marie; Coppin, Hélène; le Gac, Gérald; Piperno, Alberto; Férec, Claude; Pelucchi, Sara; Scotet, Virginie; Bardou-Jacquet, Edouard; Ropert, Martine; Bouvet, Régis; Génin, Emmanuelle; Mosser, Jean; Deugnier, Yves
Hereditary hemochromatosis (HH) is the most common form of genetic iron loading disease. It is mainly related to the homozygous C282Y/C282Y mutation in the HFE gene that is, however, a necessary but not a sufficient condition to develop clinical and even biochemical HH. This suggests that modifier genes are likely involved in the expressivity of the disease. Our aim was to identify such modifier genes. We performed a genome-wide association study (GWAS) using DNA collected from 474 unrelated C282Y homozygotes. Associations were examined for both quantitative iron burden indices and clinical outcomes with 534,213 single nucleotide polymorphisms (SNP) genotypes, with replication analyses in an independent sample of 748 C282Y homozygotes from four different European centres. One SNP met genome-wide statistical significance for association with transferrin concentration (rs3811647, GWAS p value of 7×10⁻⁹ and replication p value of 5×10⁻¹³). This SNP, located within intron 11 of the TF gene, had a pleiotropic effect on serum iron (GWAS p value of 4.9×10⁻⁶ and replication p value of 3.2×10⁻⁶). Both serum transferrin and iron levels were associated with serum ferritin levels, amount of iron removed and global clinical stage (pHFE-associated HH (HFE-HH) patients, identified the rs3811647 polymorphism in the TF gene as the only SNP significantly associated with iron metabolism through serum transferrin and iron levels. Because these two outcomes were clearly associated with the biochemical and clinical expression of the disease, an indirect link between the rs3811647 polymorphism and the phenotypic presentation of HFE-HH is likely. Copyright © 2014 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.
Dikmen, Serdal; Cole, John B.; Null, Daniel J.; Hansen, Peter J.
Heat stress compromises production, fertility, and health of dairy cattle. One mitigation strategy is to select individuals that are genetically resistant to heat stress. Most of the negative effects of heat stress on animal performance are a consequence of either physiological adaptations to regulate body temperature or adverse consequences of failure to regulate body temperature. Thus, selection for regulation of body temperature during heat stress could increase thermotolerance. The objective was to perform a genome-wide association study (GWAS) for rectal temperature (RT) during heat stress in lactating Holstein cows and identify SNPs associated with genes that have large effects on RT. Records on afternoon RT where the temperature-humidity index was ≥78.2 were obtained from 4,447 cows sired by 220 bulls, resulting in 1,440 useable genotypes from the Illumina BovineSNP50 BeadChip with 39,759 SNP. For GWAS, 2, 3, 4, 5, and 10 adjacent SNP were averaged to identify consensus genomic regions associated with RT. The largest proportion of SNP variance (0.07 to 0.44%) was explained by markers flanking the region between 28,877,547 and 28,907,154 bp on Bos taurus autosome (BTA) 24. That region is flanked by U1 (28,822,883 to 28,823,043) and NCAD (28,992,666 to 29,241,119). In addition, the SNP at 58,500,249 bp on BTA 16 explained 0.08% and 0.11% of the SNP variance for 2- and 3-SNP analyses, respectively. That contig includes SNORA19, RFWD2 and SCARNA3. Other SNPs associated with RT were located on BTA 16 (close to CEP170 and PLD5), BTA 5 (near SLCO1C1 and PDE3A), BTA 4 (near KBTBD2 and LSM5), and BTA 26 (located in GOT1, a gene implicated in protection from cellular stress). In conclusion, there are QTL for RT in heat-stressed dairy cattle. These SNPs could prove useful in genetic selection and for identification of genes involved in physiological responses to heat stress. PMID:23935954
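The averaging of 2, 3, 4, 5, and 10 adjacent SNPs described in this abstract can be sketched as a sliding-window mean over per-SNP values ordered along a chromosome. The abstract does not say whether the windows overlap or exactly which per-SNP quantity is averaged, so overlapping windows and the hypothetical values below are assumptions made for illustration.

```python
def window_means(values, window: int):
    """Mean of every run of `window` adjacent values (overlapping windows)."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical per-SNP values in map order, not data from the study.
    per_snp = [1.0, 2.0, 3.0, 4.0]
    for size in (2, 3):
        print(size, window_means(per_snp, size))
```

Regions that stay near the top across several window sizes would be the kind of consensus signal the abstract refers to.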
Saccone, Scott F [Washington University, St. Louis]; Chesler, Elissa J [ORNL]; Bierut, Laura J [Washington University, St. Louis]; Kalivas, Peter J [Medical College of South Carolina, Charleston]; Lerman, Caryn [University of Pennsylvania]; Saccone, Nancy L [Washington University, St. Louis]; Uhl, George R [Johns Hopkins University]; Li, Chuan-Yun [Peking University]; Philip, Vivek M [ORNL]; Edenberg, Howard [Indiana University]; Sherry, Steven [National Center for Biotechnology Information]; Feolo, Michael [National Center for Biotechnology Information]; Moyzis, Robert K [Johns Hopkins University]; Rutter, Joni L [National Institute of Drug Abuse]
Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.
Dunston Georgia M
Background: Admixture mapping is a powerful approach for identifying genetic variants involved in human disease that exploits the unique genomic structure in recently admixed populations. To use existing published panels of ancestry-informative markers (AIMs) for admixture mapping, markers have to be genotyped de novo for each admixed study sample and samples representing the ancestral parental populations. The increased availability of dense marker data on commercial chips has made it feasible to develop panels wherein the markers need not be predetermined. Results: We developed two panels of AIMs (~2,000 markers each) based on the Affymetrix Genome-Wide Human SNP Array 6.0 for admixture mapping with African American samples. These two AIM panels had good map power that was higher than that of a denser panel of ~20,000 random markers as well as other published panels of AIMs. As a test case, we applied the panels in an admixture mapping study of hypertension in African Americans in the Washington, D.C. metropolitan area. Conclusions: Developing marker panels for admixture mapping from existing genome-wide genotype data offers two major advantages: (1) no de novo genotyping needs to be done, thereby saving costs, and (2) markers can be filtered for various quality measures and replacement markers (to minimize gaps) can be selected at no additional cost. Panels of carefully selected AIMs have two major advantages over panels of random markers: (1) the map power from sparser panels of AIMs is higher than that of ~10-fold denser panels of random markers, and (2) clusters can be labeled based on information from the parental populations. With current technology, chip-based genome-wide genotyping is less expensive than genotyping ~20,000 random markers. The major advantage of using random markers is the absence of ascertainment effects resulting from the process of selecting markers. The ability to develop marker panels informative for ancestry from
Yin, Chang Shik; Park, Hi Joon; Chung, Joo-Ho; Lee, Hye-Jung; Lee, Byung-Cheol
Four-constitution medicine (FCM), also known as Sasang constitutional medicine, and the heritage of the long history of individualized acupuncture medicine tradition, is one of the holistic and traditional systems of constitution to appraise and categorize individual differences into four major types. This study first reports a genome-wide association study on FCM, to explore the genetic basis of FCM and facilitate the integration of FCM with conventional individual differences research. Healthy individuals of the Korean population were classified into the four constitutional types (FCTs). A total of 353,202 single nucleotide polymorphisms (SNPs) were typed using whole genome amplified samples, and six-way comparison of FCM types provided lists of significantly differential SNPs. In one-to-one FCT comparisons, 15,944 SNPs were significantly differential, and 5 SNPs were commonly significant in all of the three comparisons. In one-to-two FCT comparisons, 22,616 SNPs were significantly differential, and 20 SNPs were commonly significant in all of the three comparison groups. This study presents the association between genome-wide SNP profiles and the categorization of the FCM, and it could further provide a starting point of genome-based identification and research of the constitutions of FCM.
BACKGROUND: Possible single nucleotide polymorphism (SNP) interactions in breast cancer are usually not investigated in genome-wide association studies. Previously, we proposed a particle swarm optimization (PSO) method to compute these kinds of SNP interactions. However, this PSO is not guaranteed to find the best result in every run, especially when high-dimensional data is investigated for SNP-SNP interactions. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we propose the IPSO algorithm to improve the reliability of PSO for the identification of the best protective SNP barcodes (SNP combinations and genotypes with maximum difference between cases and controls) associated with breast cancer. SNP barcodes containing different numbers of SNPs were computed. The top five SNP barcode results are retained for computing the next SNP barcode with a one-SNP increase for each processing step. Based on the simulated data for 23 SNPs of six steroid hormone metabolism and signalling-related genes, the performance of our proposed IPSO algorithm is evaluated. Among 23 SNPs, 13 SNPs displayed significant odds ratio (OR) values (1.268 to 0.848; p<0.05) for breast cancer. Based on the IPSO algorithm, the joint effects in terms of SNP barcodes with two to seven SNPs show significantly decreasing OR values (0.84 to 0.57; p<0.05 to 0.001). Using the PSO algorithm, two to four SNPs show significantly decreasing OR values (0.84 to 0.77; p<0.05 to 0.001). Based on the results of 20 simulations, medians of the maximum differences for each SNP barcode generated by IPSO are higher than by PSO. The interquartile ranges of the boxplot, as well as the upper and lower hinges for each n-SNP barcode (n = 3∼10), are narrower in IPSO than in PSO, suggesting that IPSO is highly reliable for SNP barcode identification. CONCLUSIONS/SIGNIFICANCE: Overall, the proposed IPSO algorithm is robust in providing exact identification of the best protective SNP barcodes for breast cancer.
We characterized 11 single nucleotide polymorphism (SNP) markers for the Chinese black sleeper, Bostrychus sinensis. These markers were isolated from a genomic library and tested in ten geographically distant individuals of B. sinensis. Polymorphisms of these SNP loci were assessed using a wild population including ...
Zhang, Linsheng; Znoyko, Iya; Costa, Luciano J; Conlin, Laura K; Daber, Robert D; Self, Sally E; Wolff, Daynna J
Chronic lymphocytic leukemia (CLL) is a clinically heterogeneous disease. The methods currently used for monitoring CLL and determining conditions for treatment are limited in their ability to predict disease progression, patient survival, and response to therapy. Although clonal diversity and the acquisition of new chromosomal abnormalities during the disease course (clonal evolution) have been associated with disease progression, their prognostic potential has been underappreciated because cytogenetic and fluorescence in situ hybridization (FISH) studies have a restricted ability to detect genomic abnormalities and clonal evolution. We hypothesized that whole genome analysis using high resolution single nucleotide polymorphism (SNP) microarrays would be useful to detect diversity and infer clonal evolution to offer prognostic information. In this study, we used the Infinium Omni1 BeadChip (Illumina, San Diego, CA) array for the analysis of genetic variation and percent mosaicism in 25 non-selected CLL patients to explore the prognostic value of the assessment of clonal diversity in patients with CLL. We calculated the percentage of mosaicism for each abnormality by applying a mathematical algorithm to the genotype frequency data and by manual determination using the Simulated DNA Copy Number (SiDCoN) tool, which was developed from a computer model of mosaicism. At least one genetic abnormality was identified in each case, and the SNP data was 98% concordant with FISH results. Clonal diversity, defined as the presence of two or more genetic abnormalities with differing percentages of mosaicism, was observed in 12 patients (48%), and the diversity correlated with the disease stage. Clonal diversity was present in most cases of advanced disease (Rai stages III and IV) or those with previous treatment, whereas 9 of 13 patients without detected clonal diversity were asymptomatic or clinically stable. In conclusion, SNP microarray studies with simultaneous evaluation
Zhao Patrick X
Abstract Background Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting (HRM) analysis for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TC) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs was evaluated and validated using HRM analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.
Restriction-site associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in non-model organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD-seq data analyses rely on assumptions on nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species.
Piglet uniformity (PU) and farrowing interval (FI) are important reproductive traits related to production and economic profits in the pig industry. However, the genetic architecture of the longitudinal trends of reproductive traits still remains elusive. Herein, we performed a genome-wide association study (GWAS) to detect potential genetic variation and candidate genes underlying the phenotypic records at different parities for PU and FI in a population of 884 Large White pigs. In total, 12 significant SNPs were detected on SSC1, 3, 4, 9, and 14, which collectively explained 1–1.79% of the phenotypic variance for PU from parity 1 to 4, and 2.58–4.11% for FI at different stages. Of these, seven SNPs were located within 16 QTL regions related to swine reproductive traits. One QTL region was associated with birth body weight (related to PU) and contained the peak SNP MARC0040730, and another was associated with plasma FSH concentration (related to FI) and contained the SNP MARC0031325. Finally, some positional candidate genes for PU and FI were identified because of their roles in prenatal skeletal muscle development, fetal energy substrate, pre-implantation, and the expression of mammary gland epithelium. Identification of novel variants and candidate genes will greatly advance our understanding of the genetic mechanisms of PU and FI, and suggest a specific opportunity for improving marker assisted selection or genomic selection in pigs.
Wen, Zixiang; Boyse, John F; Song, Qijian; Cregan, Perry B; Wang, Dechun
Crop improvement always involves selection of specific alleles at genes controlling traits of agronomic importance, likely resulting in detectable signatures of selection within the genome of modern soybean (Glycine max L. Merr.). The identification of these signatures of selection is meaningful from the perspective of evolutionary biology and for uncovering the genetic architecture of agronomic traits. To this end, two populations of soybean, consisting of 342 landraces and 1062 improved lines, were genotyped with the SoySNP50K Illumina BeadChip containing 52,041 single nucleotide polymorphisms (SNPs), and systematically phenotyped for 9 agronomic traits. A cross-population composite likelihood ratio (XP-CLR) method was used to screen the signals of selective sweeps. A total of 125 candidate selection regions were identified, many of which harbored genes potentially involved in crop improvement. To further investigate whether these candidate regions were in fact enriched for genes affected by selection, genome-wide association studies (GWAS) were conducted on 7 selection traits targeted in soybean breeding (grain yield, plant height, lodging, maturity date, seed coat color, seed protein and oil content) and 2 non-selection traits (pubescence and flower color). Major genomic regions associated with selection traits overlapped with candidate selection regions, whereas no overlap of this kind occurred for the non-selection traits, suggesting that the selection sweeps identified are associated with traits of agronomic importance. Multiple novel loci and refined map locations of known loci related to these traits were also identified. These findings illustrate that comparative genomic analyses, especially when combined with GWAS, are a promising approach to dissect the genetic architecture of complex traits.
Liu, Z; Goddard, M E; Hayes, B J; Reinhardt, F; Reents, R
Routine genomic evaluations in animal breeding are usually based on either a BLUP with genomic relationship matrix (GBLUP) or single nucleotide polymorphism (SNP) BLUP model. For a multi-step genomic evaluation, these 2 alternative genomic models were proven to give equivalent predictions for genomic reference animals. The model equivalence was verified also for young genotyped animals without phenotypes. Due to incomplete linkage disequilibrium of SNP markers to genes or causal mutations responsible for genetic inheritance of quantitative traits, SNP markers cannot explain all the genetic variance. A residual polygenic effect is normally fitted in the genomic model to account for the incomplete linkage disequilibrium. In this study, we start by showing the proof that the multi-step GBLUP and SNP BLUP models are equivalent for the reference animals, when they have a residual polygenic effect included. Second, the equivalence of both multi-step genomic models with a residual polygenic effect was also verified for young genotyped animals without phenotypes. Additionally, we derived formulas to convert genomic estimated breeding values of the GBLUP model to its components, direct genomic values and residual polygenic effect. Third, we made a proof that the equivalence of these 2 genomic models with a residual polygenic effect holds also for single-step genomic evaluation. Both the single-step GBLUP and SNP BLUP models lead to equal prediction for genotyped animals with phenotypes (e.g., reference animals), as well as for (young) genotyped animals without phenotypes. Finally, these 2 single-step genomic models with a residual polygenic effect were proven to be equivalent for estimation of SNP effects, too. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
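The equivalence described in this abstract can be checked numerically in its simplest form. The sketch below is an illustration under stated assumptions, not the authors' derivation: it omits the residual polygenic effect and all fixed effects that the paper handles, assumes the genomic relationship matrix is G = ZZ'/k (k = number of SNPs) with the genetic variance equal to k times the SNP-effect variance, and uses a made-up genotype matrix `Z`, phenotype vector `y`, and variance ratio `lam`. Exact rational arithmetic is used so the two predictions can be compared with `==`.

```python
from fractions import Fraction

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def add_diag(a, value):
    """Return a copy of square matrix a with value added to its diagonal."""
    n = len(a)
    return [[a[i][j] + (value if i == j else 0) for j in range(n)]
            for i in range(n)]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a square, nonsingular)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# Hypothetical data (assumption): 4 animals, 3 SNPs coded -1/0/1.
Z = [[1, 0, -1], [0, 1, 1], [-1, -1, 0], [1, 1, 0]]
y = [3, 1, -2, 2]
k = len(Z[0])
lam = Fraction(2)  # assumed ratio of residual variance to SNP-effect variance

# SNP BLUP: estimate SNP effects, then sum them per animal.
Zt = transpose(Z)
u_hat = solve(add_diag(matmul(Zt, Z), lam), matvec(Zt, y))
g_snp = matvec(Z, u_hat)

# GBLUP: predict genomic values directly through G = ZZ'/k.
G = [[Fraction(x, k) for x in row] for row in matmul(Z, Zt)]
g_gblup = matvec(G, solve(add_diag(G, lam / k), y))

print(g_snp == g_gblup)
```

The two routes agree because Z(Z'Z + λI)⁻¹Z' = ZZ'(ZZ' + λI)⁻¹, so the choice between them is mainly computational: SNP BLUP solves a system of size k, GBLUP one whose size is the number of animals.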
de Moor, Marleen H.M.; van den Berg, Stéphanie M.; Verweij, Karin J.H.; Krueger, Robert F.; Luciano, Michelle; Vasquez, Alejandro Arias; Matteson, Lindsay K.; Derringer, Jaime; Esko, Tõnu; Amin, Najaf; Gordon, Scott D.; Hansell, Narelle K.; Hart, Amy B.; Seppälä, Ilkka; Huffman, Jennifer E.; Konte, Bettina; Lahti, Jari; Lee, Minyoung; Miller, Mike; Nutile, Teresa; Tanaka, Toshiko; Teumer, Alexander; Viktorin, Alexander; Wedenoja, Juho; Abecasis, Goncalo R.; Adkins, Daniel E.; Agrawal, Arpana; Allik, Jüri; Appel, Katja; Bigdeli, Timothy B.; Busonero, Fabio; Campbell, Harry; Costa, Paul T.; Smith, George Davey; Davies, Gail; de Wit, Harriet; Ding, Jun; Engelhardt, Barbara E.; Eriksson, Johan G.; Fedko, Iryna O.; Ferrucci, Luigi; Franke, Barbara; Giegling, Ina; Grucza, Richard; Hartmann, Annette M.; Heath, Andrew C.; Heinonen, Kati; Henders, Anjali K.; Homuth, Georg; Hottenga, Jouke-Jan; Janzing, Joost; Jokela, Markus; Karlsson, Robert; Kemp, John P.; Kirkpatrick, Matthew G.; Latvala, Antti; Lehtimäki, Terho; Liewald, David C.; Madden, Pamela A.F.; Magri, Chiara; Magnusson, Patrik K.E.; Marten, Jonathan; Maschio, Andrea; Medland, Sarah E.; Mihailov, Evelin; Milaneschi, Yuri; Montgomery, Grant W.; Nauck, Matthias; Ouwens, Klaasjan G.; Palotie, Aarno; Pettersson, Erik; Polasek, Ozren; Qian, Yong; Pulkki-Råback, Laura; Raitakari, Olli T.; Realo, Anu; Rose, Richard J.; Ruggiero, Daniela; Schmidt, Carsten O.; Slutske, Wendy S.; Sorice, Rossella; Starr, John M.; Pourcain, Beate St; Sutin, Angelina R.; Timpson, Nicholas J.; Trochet, Holly; Vermeulen, Sita; Vuoksimaa, Eero; Widen, Elisabeth; Wouda, Jasper; Wright, Margaret J.; Zgaga, Lina; Scotland, Generation; Porteous, David; Minelli, Alessandra; Palmer, Abraham A.; Rujescu, Dan; Ciullo, Marina; Hayward, Caroline; Rudan, Igor; Metspalu, Andres; Kaprio, Jaakko; Deary, Ian J.; Räikkönen, Katri; Wilson, James F.; Keltikangas-Järvinen, Liisa; Bierut, Laura J.; Hettema, John M.; Grabe, Hans J.; van Duijn, Cornelia M.; Evans, 
David M.; Schlessinger, David; Pedersen, Nancy L.; Terracciano, Antonio; McGue, Matt; Penninx, Brenda W.J.H.; Martin, Nicholas G.; Boomsma, Dorret I.
Importance Neuroticism is a personality trait that is briefly defined by emotional instability. It is a robust genetic risk factor for Major Depressive Disorder (MDD) and other psychiatric disorders. Hence, neuroticism is an important phenotype for psychiatric genetics. The Genetics of Personality Consortium (GPC) has created a resource for genome-wide association analyses of personality traits in over 63,000 participants (including MDD cases). Objective To identify genetic variants associated with neuroticism by performing a meta-analysis of genome-wide association (GWA) results based on 1000Genomes imputation, to evaluate if common genetic variants as assessed by Single Nucleotide Polymorphisms (SNPs) explain variation in neuroticism by estimating SNP-based heritability, and to examine whether SNPs that predict neuroticism also predict MDD. Setting 30 cohorts with genome-wide genotype, personality and MDD data from the GPC. Participants The study included 63,661 participants from 29 discovery cohorts and 9,786 participants from a replication cohort. Participants came from Europe, the United States or Australia. Main outcome measure(s) Neuroticism scores harmonized across all cohorts by Item Response Theory (IRT) analysis, and clinically assessed MDD case-control status. Results A genome-wide significant SNP was found in the MAGI1 gene (rs35855737; P=9.26 × 10⁻⁹ in the discovery meta-analysis, and P=2.38 × 10⁻⁸ in the meta-analysis of all 30 cohorts). Common genetic variants explain 15% of the variance in neuroticism. Polygenic scores based on the meta-analysis of neuroticism in 27 of the discovery cohorts significantly predicted neuroticism in 2 independent cohorts. Importantly, polygenic scores also predicted MDD in these cohorts. Conclusions and relevance This study identifies a novel locus for neuroticism. The variant is located in a known gene that has been associated with bipolar disorder and schizophrenia in previous studies. In addition, the study
Sebastiaan M Bol
HIV-1 infected macrophages play an important role in rendering resting T cells permissive for infection, in spreading HIV-1 to T cells, and in the pathogenesis of AIDS dementia. During highly active anti-retroviral treatment (HAART), macrophages keep producing virus because tissue penetration of antiretrovirals is suboptimal and the efficacy of some is reduced. Thus, to cure HIV-1 infection with antiretrovirals we will also need to efficiently inhibit viral replication in macrophages. The majority of the current drugs block the action of viral enzymes, whereas there is an abundance of yet unidentified host factors that could be targeted. We here present results from a genome-wide association study identifying novel genetic polymorphisms that affect in vitro HIV-1 replication in macrophages. Monocyte-derived macrophages from 393 blood donors were infected with HIV-1 and viral replication was determined using Gag p24 antigen levels. Genomic DNA from individuals with macrophages that had relatively low (n = 96) or high (n = 96) p24 production was used for SNP genotyping with the Illumina 610 Quad beadchip. A total of 494,656 SNPs that passed quality control were tested for association with HIV-1 replication in macrophages, using linear regression. We found a strong association between in vitro HIV-1 replication in monocyte-derived macrophages and SNP rs12483205 in DYRK1A (p = 2.16 × 10⁻⁵). While the association was not genome-wide significant (p<1 × 10⁻⁷), we could replicate this association using monocyte-derived macrophages from an independent group of 31 individuals (p = 0.0034). Combined analysis of the initial and replication cohort increased the strength of the association (p = 4.84 × 10⁻⁶). In addition, we found this SNP to be associated with HIV-1 disease progression in vivo in two independent cohort studies (p = 0.035 and p = 0.0048). These findings suggest that the kinase DYRK1A is involved in the replication of HIV-1, in vitro in macrophages
Pierson, Tyler Mark; Simeonov, Dimitre R; Sincan, Murat; Adams, David A; Markello, Thomas; Golas, Gretchen; Fuentes-Fajardo, Karin; Hansen, Nancy F; Cherukuri, Praveen F; Cruz, Pedro; Blackstone, Craig; Tifft, Cynthia; Boerkoel, Cornelius F; Gahl, William A
Fatty acid hydroxylase-associated neurodegeneration due to fatty acid 2-hydroxylase deficiency presents with a wide range of phenotypes including spastic paraplegia, leukodystrophy, and/or brain iron deposition. All previously described families with this disorder were consanguineous, with homozygous mutations in the probands. We describe a 10-year-old male, from a non-consanguineous family, with progressive spastic paraplegia, dystonia, ataxia, and cognitive decline associated with a sural axonal neuropathy. The use of high-throughput sequencing techniques combined with SNP array analyses revealed a novel paternally derived missense mutation and an overlapping novel maternally derived ∼28-kb genomic deletion in FA2H. This patient provides further insight into the consistent features of this disorder and expands our understanding of its phenotypic presentation. The presence of a sural nerve axonal neuropathy had not been previously associated with this disorder and so may extend the phenotype. PMID:22146942
Heather E Wheeler
Chemotherapeutic agents are used in the treatment of many cancers, yet variable resistance and toxicities among individuals limit successful outcomes. Several studies have indicated outcome differences associated with ancestry among patients with various cancer types. Using both traditional SNP-based and newly developed gene-based genome-wide approaches, we investigated the genetics of chemotherapeutic susceptibility in lymphoblastoid cell lines derived from 83 African Americans, a population for which there is a disparity in the number of genome-wide studies performed. To account for population structure in this admixed population, we incorporated local ancestry information into our association model. We tested over 2 million SNPs and identified 325, 176, 240, and 190 SNPs that were suggestively associated with cytarabine-, 5'-deoxyfluorouridine (5'-DFUR)-, carboplatin-, and cisplatin-induced cytotoxicity, respectively (p≤10⁻⁴). Importantly, some of these variants are found only in populations of African descent. We also show that cisplatin-susceptibility SNPs are enriched for carboplatin-susceptibility SNPs. Using a gene-based genome-wide association approach, we identified 26, 11, 20, and 41 suggestive candidate genes for association with cytarabine-, 5'-DFUR-, carboplatin-, and cisplatin-induced cytotoxicity, respectively (p≤10⁻³). Fourteen of these genes showed evidence of association with their respective chemotherapeutic phenotypes in the Yoruba from Ibadan, Nigeria (p<0.05), including TP53I11, COPS5 and GAS8, which are known to be involved in tumorigenesis. Although our results require further study, we have identified variants and genes associated with chemotherapeutic susceptibility in African Americans by using an approach that incorporates local ancestry information.
Chuang, Li-Yeh; Lane, Hsien-Yuan; Lin, Yu-Da; Lin, Ming-Teng; Yang, Cheng-Hong; Chang, Hsueh-Wei
Facial emotion perception (FEP) can affect social function. We previously reported that parts of five tested single-nucleotide polymorphisms (SNPs) in the MET and AKT1 genes may individually affect FEP performance. However, the effects of SNP-SNP interactions on FEP performance remain unclear. This study compared patients with high and low FEP performances (n = 89 and 93, respectively). A particle swarm optimization (PSO) algorithm was used to identify the best SNP barcodes (i.e., the SNP combinations and genotypes that revealed the largest differences between the high and low FEP groups). The analyses of individual SNPs showed no significant differences between the high and low FEP groups. However, comparisons of multiple SNP-SNP interactions involving different combinations of two to five SNPs showed that the best PSO-generated SNP barcodes were significantly associated with high FEP score. The analyses of the joint effects of the best SNP barcodes for two to five interacting SNPs also showed that the best SNP barcodes had significantly higher odds ratios (2.119 to 3.138; P < 0.05) compared to other SNP barcodes. In conclusion, the proposed PSO algorithm effectively identifies the best SNP barcodes that have the strongest associations with FEP performance. This study also proposes a computational methodology for analyzing complex SNP-SNP interactions in social cognition domains such as recognition of facial emotion.
Gregersen Peter K
Abstract Background Case-control genetic studies of complex human diseases can be confounded by population stratification. This issue can be addressed using panels of ancestry informative markers (AIMs) that can provide substantial population substructure information. Previously, we described a panel of 128 SNP AIMs that were designed as a tool for ascertaining the origins of subjects from Europe, Sub-Saharan Africa, Americas, and East Asia. Results In this study, genotypes from Human Genome Diversity Panel populations were used to further evaluate a 93 SNP AIM panel, a subset of the 128 AIM set, for distinguishing continental origins. Using both model-based and relatively model-independent methods, we here confirm the ability of this AIM set to distinguish diverse population groups that were not previously evaluated. This study included multiple population groups from Oceania, South Asia, East Asia, Sub-Saharan Africa, North and South America, and Europe. In addition, the 93 AIM set provides population substructure information that can, for example, distinguish Arab and Ashkenazi from Northern European population groups and Pygmy from other Sub-Saharan African population groups. Conclusion These data provide additional support for using the 93 AIM set to efficiently identify continental subject groups for genetic studies, to identify study population outliers, and to control for admixture in association studies.
Yuan, Han; Dougherty, Joseph D.
Lay Abstract Autism spectrum disorders (ASDs) are pervasive developmental disorders which have both a genetic and environmental component. One source of the environmental component is the in utero (prenatal) environment. The maternal genome can potentially contribute to the risk of autism in children by altering this prenatal environment. In this study, the possibility of maternal genotype effects was explored by looking for common variants (single nucleotide polymorphisms, or SNPs) in the maternal genome associated with increased risk of autism in children. We performed a case/control genome-wide association study (GWAS) using mothers of probands as cases and either fathers of probands or normal females as controls, using two collections of families with autism. We did not identify any SNP that reached significance and thus a common variant of large effect is unlikely. However, there was evidence for the possibility of a large number of alleles each carrying a small effect. This suggested that if there is a contribution to autism risk through common-variant maternal genetic effects, it may be the result of multiple loci of small effects. We did not investigate rare variants in this study. Scientific Abstract Like most psychiatric disorders, autism spectrum disorders have both a genetic and an environmental component. While previous studies have clearly demonstrated the contribution of in utero (prenatal) environment on autism risk, most of them focused on transient environmental factors. Based on a recent sibling study, we hypothesized that environmental factors could also come from the maternal genome, which would result in persistent effects across siblings. In this study, the possibility of maternal genotype effects was examined by looking for common variants (single nucleotide polymorphisms, or SNPs) in the maternal genome associated with increased risk of autism in children. A case/control genome-wide association study (GWAS) was performed using mothers of
Siedlinski, Mateusz; Cho, Michael H.; Bakke, Per; Gulsvik, Amund; Lomas, David A.; Anderson, Wayne; Kong, Xiangyang; Rennard, Stephen I.; Beaty, Terri H.; Hokanson, John E.; Crapo, James D.; Silverman, Edwin K.
Background Cigarette smoking is a major risk factor for COPD and COPD severity. Previous genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with the number of cigarettes smoked per day (CPD) and a Dopamine Beta-Hydroxylase (DBH) locus associated with smoking cessation in multiple populations. Objective To identify SNPs associated with lifetime average and current CPD, age at smoking initiation, and smoking cessation in COPD subjects. Methods GWAS were conducted in 4 independent cohorts encompassing 3,441 ever-smoking COPD subjects (GOLD stage II or higher). Untyped SNPs were imputed using HapMap (phase II) panel. Results from all cohorts were meta-analyzed. Results Several SNPs near the HLA region on chromosome 6p21 and in an intergenic region on chromosome 2q21 showed associations with age at smoking initiation, both with the lowest p=2×10⁻⁷. No SNPs were associated with lifetime average CPD, current CPD or smoking cessation with p<10⁻⁶. Nominally significant associations with candidate SNPs within alpha-nicotinic acetylcholine receptors 3/5 (CHRNA3/CHRNA5; e.g. p=0.00011 for SNP rs1051730) and Cytochrome P450 2A6 (CYP2A6; e.g. p=2.78×10⁻⁵ for a nonsynonymous SNP rs1801272) regions were observed for lifetime average CPD, however only CYP2A6 showed evidence of significant association with current CPD. A candidate SNP (rs3025343) in the DBH was significantly (p=0.015) associated with smoking cessation. Conclusion We identified two candidate regions associated with age at smoking initiation in COPD subjects. Associations of CHRNA3/CHRNA5 and CYP2A6 loci with CPD and DBH with smoking cessation are also likely of importance in the smoking behaviors of COPD patients. PMID:21685187
Seung Hwan Lee
This genome-wide association study (GWAS) was conducted to identify major loci that are significantly associated with carcass weight, and their effects, in order to provide increased understanding of the genetic architecture of carcass weight in Hanwoo. This genome-wide association study identified one major chromosome region ranging from 23 Mb to 25 Mb on chromosome 14 as being associated with carcass weight in Hanwoo. Significant Bonferroni-corrected genome-wide associations (P<1.52×10⁻⁶) were detected for 6 Single Nucleotide Polymorphic (SNP) loci for carcass weight on chromosome 14. The most significant SNP was BTB-01280026 (P = 4.02×10⁻¹¹), located in the 25 Mb region on Bos taurus autosome 14 (BTA14). The other 5 significant SNPs were Hapmap27934-BTC-065223 (P = 4.04×10⁻¹¹) in 25.2 Mb, BTB-01143580 (P = 6.35×10⁻¹¹) in 24.3 Mb, Hapmap30932-BTC-011225 (P = 5.92×10⁻¹⁰) in 24.8 Mb, Hapmap27112-BTC-063342 (P = 5.18×10⁻⁹) in 25.4 Mb, and Hapmap24414-BTC-073009 (P = 7.38×10⁻⁸) in 25.4 Mb, all on BTA 14. One SNP (BTB-01143580; P = 6.35×10⁻¹¹) lies independently from the other 5 SNPs. The 5 SNPs that lie together showed a large Linkage disequilibrium (LD) block (block size of 553 kb) with LD coefficients ranging from 0.53 to 0.89 within the block. The most significant SNPs accounted for 6.73% to 10.55% of additive genetic variance, which is quite a large proportion of the total additive genetic variance. The most significant SNP (BTB-01280026; P = 4.02×10⁻¹¹) had 16.96 kg of allele substitution effect, and the second most significant SNP (Hapmap27934-BTC-065223; P = 4.04×10⁻¹¹) had 18.06 kg of effect on carcass weight, which correspond to 44% and 47%, respectively, of the phenotypic standard deviation for carcass weight in Hanwoo cattle. Our results demonstrated that carcass weight was affected by a major Quantitative Trait Locus (QTL) with a large effect and by many SNPs with small effects that are normally
Pearce, Madison E; Alikhan, Nabil-Fareed; Dallman, Timothy J; Zhou, Zhemin; Grant, Kathie; Maiden, Martin C J
Multi-country outbreaks of foodborne bacterial disease present challenges in their detection, tracking, and notification. As food is increasingly distributed across borders, such outbreaks are becoming more common. This increases the need for high-resolution, accessible, and replicable isolate typing schemes. Here we evaluate a core genome multilocus typing (cgMLST) scheme for the high-resolution reproducible typing of Salmonella enterica (S. enterica) isolates, by its application to a large European outbreak of S. enterica serovar Enteritidis. This outbreak had been extensively characterised using single nucleotide polymorphism (SNP)-based approaches. The cgMLST analysis was congruent with the original SNP-based analysis, the epidemiological data, and whole genome MLST (wgMLST) analysis. Combination of the cgMLST and epidemiological data confirmed that the genetic diversity among the isolates predated the outbreak, and was likely present at the infection source. There was consequently no link between country of isolation and genetic diversity, but the cgMLST clusters were congruent with date of isolation. Furthermore, comparison with publicly available Enteritidis isolate data demonstrated that the cgMLST scheme presented is highly scalable, enabling outbreaks to be contextualised within the Salmonella genus. The cgMLST scheme is therefore shown to be a standardised and scalable typing method, which allows Salmonella outbreaks to be analysed and compared across laboratories and jurisdictions. Copyright © 2018. Published by Elsevier B.V.
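Typing schemes such as cgMLST represent each isolate as a list of allele numbers, one per core-genome locus, and isolates are commonly compared by counting the loci at which their alleles differ. The sketch below illustrates that idea only; the profiles, the isolate names, and the choice to skip loci missing in either isolate are assumptions for illustration and are not taken from the scheme evaluated in this abstract.

```python
from itertools import combinations

def allelic_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ.

    Loci with a missing call (None) in either profile are skipped,
    which is one common convention, assumed here.
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )

# Hypothetical allele profiles over five loci (assumption, not study data).
profiles = {
    "isolate_1": [1, 4, 2, 7, None],
    "isolate_2": [1, 4, 3, 7, 5],
    "isolate_3": [2, 4, 3, 9, 5],
}

distances = {
    (x, y): allelic_distance(profiles[x], profiles[y])
    for x, y in combinations(sorted(profiles), 2)
}
for pair, d in distances.items():
    print(pair, d)
```

Because the comparison uses only allele numbers, profiles produced in different laboratories can be compared directly as long as they share the same locus set and allele nomenclature.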
Bradley J Foresman
Barley yellow dwarf viruses (BYDVs) are responsible for the disease barley yellow dwarf (BYD) and affect many cereals including oat (Avena sativa L.). Until recently, the molecular marker technology in oat has not allowed for many marker-trait association studies to determine the genetic mechanisms for tolerance. A genome-wide association study (GWAS) was performed on 428 spring oat lines using a recently developed high-density oat single nucleotide polymorphism (SNP) array as well as a SNP-based consensus map. Marker-trait associations were performed using a Q-K mixed model approach to control for population structure and relatedness. Six significant SNP-trait associations representing two QTL were found on chromosomes 3C (Mrg17) and 18D (Mrg04). This is the first report of BYDV tolerance QTL on chromosome 3C (Mrg17) and 18D (Mrg04). Haplotypes using the two QTL were evaluated and distinct classes for tolerance were identified based on the number of favorable alleles. A large number of lines carrying both favorable alleles were observed in the panel.
Interspecies hybridization on DNA resequencing microarrays: efficiency of sequence recovery and accuracy of SNP detection in human, ape, and codfish mitochondrial DNA genomes sequenced on a human-specific MitoChip
Carr Steven M
Abstract Background Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA) genomes. However, costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes. Results In the human genome, 99.67% of the sequence was recovered with 100.0% accuracy. Accuracy of SNP identification declines log-linearly with sequence divergence from the reference, from 0.067 to 0.247 errors per SNP in the chimpanzee and gorilla genomes, respectively. Efficiency of sequence recovery declines with the increase of the number of interspecific SNPs in the 25b interval tiled by the reference oligonucleotides. In the gorilla genome, which differs from the human reference by 10%, and in which 46% of these 25b regions contain 3 or more SNP differences from the reference, only 88% of the sequence is recoverable. In the codfish genome, which differs from the reference by > 30%, less than 4% of the sequence is recoverable, in short islands ≥ 12b that are conserved between primates and fish. Conclusion Experimental DNAs bind inefficiently to homologous reference oligonucleotide sets on a re-sequencing microarray when their sequences differ by
Sun, Yaqi; Wang, Hongyang; Wang, Chao; Yu, Shaobo; Liu, Jing; Zhang, Yu; Fan, Bin; Li, Kui; Liu, Bang
Copy number variations (CNVs) represent a substantial source of structural variants in mammals and contribute to both normal phenotypic variability and disease susceptibility. Although low-resolution CNV maps have been produced in many domestic animals, and several reports have been published about the CNVs of the porcine genome, the differences between Chinese and western pigs still remain to be elucidated. In this study, we used the Porcine SNP60 BeadChip and the PennCNV algorithm to perform a genome-wide CNV detection in 302 individuals from six Chinese indigenous breeds (Tongcheng, Laiwu, Luchuan, Bama, Wuzhishan and Ningxiang pigs), three western breeds (Yorkshire, Landrace and Duroc) and one hybrid (Tongcheng×Duroc). A total of 348 CNV Regions (CNVRs) across the genome were identified, covering 150.49 Mb of the pig genome or 6.14% of the autosomal genome sequence. Of these CNVRs, 213 were found to exist only in the six Chinese indigenous breeds, and 60 only in the three western breeds. The characteristics of CNVs in four Chinese normal size breeds (Luchuan, Tongcheng and Laiwu pigs) and two minipig breeds (Bama and Wuzhishan pigs) were also analyzed in this study. Functional annotation suggested that these CNVRs possess a great variety of molecular functions and may play important roles in phenotypic and production traits between Chinese and western breeds. Our results are an important complement to the CNV map of the pig genome, provide new information about the diversity of Chinese and western pig breeds, and facilitate further research on porcine genome CNVs. PMID:25198154
Eynard, Sonia E.
Concern about the status of livestock breeds and their conservation has increased as selection and small population sizes caused loss of genetic diversity. Meanwhile, dense SNP chips and whole genome sequences (WGS) became available, providing opportunities to accurately quantify the impact of
Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.
Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619
A whole genome association (WGA) study was performed to detect significant polymorphisms for meat quality traits in an F2 cross population (N = 478) that was generated with Korean native pig sires and Landrace dams at the National Livestock Research Institute, Songwhan, Korea. The animals were genotyped using Illumina porcine 60k SNP beadchips, in which a set of 46,865 SNPs were available for the WGA analyses on ten carcass quality traits: live weight, crude protein, crude lipids, crude ash, water holding capacity, drip loss, shear force, CIE L, CIE a and CIE b. Phenotypes were regressed on additive and dominance effects for each SNP using a simple linear regression model, after adjusting for sex, sire and slaughter stage as fixed effects. With the significant SNPs for each trait (p<0.001), a stepwise regression procedure was applied to determine the best set of SNPs with the additive and/or dominance effects. A total of 106 SNPs, or quantitative trait loci (QTL), were detected, and about 32 to 66% of the total phenotypic variation was explained by the significant SNPs for each trait. The QTL were identified in most porcine chromosomes (SSCs), in which the majority of the QTL were detected in SSCs 1, 2, 12, 13, 14 and 16. Several QTL clusters were identified on SSCs 12, 16 and 17, and a cluster of QTL influencing crude protein, crude lipid, drip loss, shear force, CIE a and CIE b was located between 20 and 29 Mb of SSC12. A pleiotropic QTL for drip loss, CIE L and CIE b was also detected on SSC16. These QTL need to be validated in commercial pig populations for genetic improvement in meat quality via marker-assisted selection.
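The per-SNP regression on additive and dominance effects mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it ignores the fixed effects (sex, sire, slaughter stage), and the function name and the 0/1/2 genotype coding are assumptions made here for illustration. With an intercept, an additive term and a dominance term, the least-squares fit for one SNP is determined by the three genotype-class means, so the two effects can be read directly from those means.

```python
from statistics import mean

def additive_dominance_effects(genotypes, phenotypes):
    """Estimate additive (a) and dominance (d) effects for one SNP.

    genotypes: copies of the alternative allele per animal (0, 1 or 2);
               this coding is an assumption of the sketch.
    phenotypes: trait values in the same order.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")
    m0, m1, m2 = (mean(groups[g]) for g in (0, 1, 2))
    additive = (m2 - m0) / 2          # half the difference between homozygotes
    dominance = m1 - (m0 + m2) / 2    # heterozygote deviation from the midpoint
    return additive, dominance

# Toy data: two animals per genotype class.
a, d = additive_dominance_effects([0, 0, 1, 1, 2, 2],
                                  [1.0, 3.0, 5.0, 7.0, 4.0, 6.0])
```

In a real analysis the fixed effects would be fitted jointly, which needs a general linear-model solver rather than class means.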
Mdladla, K; Dzomba, E F; Huson, H J; Muchadeyi, F C
The sustainability of goat farming in marginal areas of southern Africa depends on local breeds that are adapted to specific agro-ecological conditions. Unimproved non-descript goats are the main genetic resources used for the development of commercial meat-type breeds of South Africa. Little is known about genetic diversity and the genetics of adaptation of these indigenous goat populations. This study investigated the genetic diversity, population structure and breed relations, linkage disequilibrium, effective population size and persistence of gametic phase in goat populations of South Africa. Three locally developed meat-type breeds of the Boer (n = 33), Savanna (n = 31), Kalahari Red (n = 40), a feral breed of Tankwa (n = 25) and unimproved non-descript village ecotypes (n = 110) from four goat-producing provinces of the Eastern Cape, KwaZulu-Natal, Limpopo and North West were assessed using the Illumina Goat 50K SNP Bead Chip assay. The proportion of SNPs with minor allele frequencies >0.05 ranged from 84.22% in the Tankwa to 97.58% in the Xhosa ecotype, with a mean of 0.32 ± 0.13 across populations. Principal components analysis, admixture and pairwise FST identified Tankwa as a genetically distinct population and supported clustering of the populations according to their historical origins. Genome-wide FST identified 101 markers potentially under positive selection in the Tankwa. Average linkage disequilibrium was highest in the Tankwa (r(2) = 0.25 ± 0.26) and lowest in the village ecotypes (r(2) range = 0.09 ± 0.12 to 0.11 ± 0.14). We observed an effective population size of 100 kb with the exception of those in Savanna and Tswana populations. This study highlights the high level of genetic diversity in South African indigenous goats as well as the utility of the genome-wide SNP marker panels in genetic studies of these populations. © 2016 Stichting International Foundation for Animal Genetics.
Single nucleotide polymorphisms (SNPs) have been increasingly utilized to investigate somatic genetic abnormalities in premalignancy and cancer. LOH is a common alteration observed during cancer development, and SNP assays have been used to identify LOH at specific chromosomal regions. The design of such studies requires consideration of the resolution for detecting LOH throughout the genome and identification of the number and location of SNPs required to detect genetic alterations in specific genomic regions. Our study evaluated SNP distribution patterns and used probability models, Monte Carlo simulation, and real human subject genotype data to investigate the relationships between the number of SNPs, SNP HET rates, and the sensitivity (resolution) for detecting LOH. We report that variances of SNP heterozygosity rate in dbSNP are high for a large proportion of SNPs. Two statistical methods proposed for directly inferring SNP heterozygosity rates require much smaller sample sizes (intermediate sizes) and are feasible for practical use in SNP selection or verification. Using HapMap data, we showed that a region of LOH greater than 200 kb can be reliably detected, with losses smaller than 50 kb having a substantially lower detection probability when using all SNPs currently in the HapMap database. Higher densities of SNPs may exist in certain local chromosomal regions that provide some opportunities for reliably detecting LOH of segment sizes smaller than 50 kb. These results suggest that the interpretation of the results from genome-wide scans for LOH using commercial arrays needs to consider the relationships among inter-SNP distance, detection probability, and sample size for a specific study. New experimental designs for LOH studies would also benefit from considering the power of detection and sample sizes required to accomplish the proposed aims.
He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L
SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.
The papers published in this Special Issue “SNP arrays” (Single Nucleotide Polymorphism Arrays) focus on several perspectives associated with arrays of this type. The papers range from a case report to reviews, thereby targeting wider audiences working in this field. The research focus of SNP arrays is often human cancers but this Issue expands that focus to include areas such as rare conditions, animal breeding and bioinformatics tools. Given the limited scope, the spectrum of papers is nothing short of remarkable and even from a technical point of view these papers will contribute to the field at a general level. Three of the papers published in this Special Issue focus on the use of various SNP array approaches in the analysis of three different cancer types. Two of the papers concentrate on two very different rare conditions, applying the SNP arrays slightly differently. Finally, two other papers evaluate the use of the SNP arrays in the context of genetic analysis of livestock. The findings reported in these papers help to close gaps in the current literature and also to give guidelines for future applications of SNP arrays.
Background: SNP (Single Nucleotide Polymorphism) markers are rapidly becoming the markers of choice for applications in breeding because of next generation sequencing technology developments. For SNP development by NGS technologies, correct assembly of the huge amounts of sequence data generated is essential. Little is known about assembler performance, especially when dealing with highly heterogeneous species that show a high genome complexity, and what the possible consequences are of differences in assemblies on SNP retrieval. This study tested two assemblers (CAP3 and CLC) on 454 data from four lily genotypes and compared results with respect to SNP retrieval. Results: CAP3 assembly resulted in higher numbers of contigs, lower numbers of reads per contig, and shorter average read lengths compared to CLC. Blast comparisons showed that CAP3 contigs were highly redundant. In contrast, CLC in rare cases combined paralogs in one contig. Redundant and chimeric contigs may lead to erroneous SNPs. Filtering for redundancy can be done by blasting selected SNP markers to the contigs and discarding all the SNP markers that show more than one blast hit. Results on chimeric contigs showed that only four out of 2,421 SNP markers were selected from chimeric contigs. Conclusion: In practice, CLC performs better in assembling highly heterogeneous genome sequences compared to CAP3, and consequently SNP retrieval is more efficient. Additionally a simple flow scheme is suggested for SNP marker retrieval that can be valid for all non-model species.
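The redundancy filter described in this abstract (discard every SNP marker that shows more than one blast hit against the contigs) could be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes tab-separated BLAST output whose first two columns are the query (marker) ID and the subject (contig) ID, and it counts distinct contigs rather than individual alignments. The function name is hypothetical.

```python
from collections import defaultdict

def unique_hit_markers(blast_rows):
    """Return marker IDs that hit exactly one distinct contig.

    blast_rows: iterable of tab-separated lines; column 1 is assumed to be
    the marker ID and column 2 the contig ID.
    """
    hits = defaultdict(set)
    for row in blast_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = row.split("\t")
        hits[fields[0]].add(fields[1])
    # Keep only markers whose hits all fall on a single contig.
    return sorted(q for q, subjects in hits.items() if len(subjects) == 1)

rows = [
    "snp1\tcontigA\t99.0",
    "snp2\tcontigA\t98.0",
    "snp2\tcontigB\t97.0",   # second contig: snp2 is treated as redundant
    "snp3\tcontigC\t100.0",
    "snp3\tcontigC\t95.0",   # second alignment to the same contig
]
kept = unique_hit_markers(rows)
```

Whether several alignments to the same contig should also disqualify a marker is a design choice the abstract does not settle; counting raw rows per marker instead of distinct contigs would give the stricter filter.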
Background: Until recently, only a small number of low- and mid-throughput methods have been used for single nucleotide polymorphism (SNP) discovery and genotyping in grapevine (Vitis vinifera L.). However, following completion of the sequence of the highly heterozygous genome of Pinot Noir, it has been possible to identify millions of electronic SNPs (eSNPs), thus providing a valuable source for high-throughput genotyping methods. Results: Herein we report the first application of the SNPlex™ genotyping system in grapevine aiming at the anchoring of a eukaryotic genome. This approach combines robust SNP detection with automated assay readout and data analysis. 813 candidate eSNPs were developed from non-repetitive contigs of the assembled genome of Pinot Noir and tested in 90 progeny of a Syrah × Pinot Noir cross. 563 new SNP-based markers were obtained and mapped. The efficiency rate of 69% was enhanced to 80% when multiple displacement amplification (MDA) methods were used for preparation of genomic DNA for the SNPlex assay. Conclusion: Unlike other SNP genotyping methods used to investigate thousands of SNPs in a few genotypes, or a few SNPs in around a thousand genotypes, the SNPlex genotyping system represents a good compromise to investigate several hundred SNPs in a hundred or more samples simultaneously. Therefore, the use of the SNPlex assay, coupled with whole genome amplification (WGA), is a good solution for future applications in well-equipped laboratories.
Jakobsen, M. A.; Sprogoe, U.
Background/Case Studies: The antigens of the Knops (Kn) blood group system are associated with SNPs located on exon 29 and (to a lesser extent) on exon 26 of the complement receptor 1 (CR1) gene. Because of a lack of proper typing antibodies, serologic detection of Kn antigens is not feasible. We designed a genomic assay based on sequencing targeting the SNPs underlying the antigens of the Knops system. Study Design/Methods: Samples from a total of 105 blood donors and 2 patients were examined for polymorphisms in CR1 exon 29 by using PCR and subsequent Sanger sequencing. Results/Findings: With regard to Kn a and b antigens, we found SNP frequencies to be 90.5% for G/G (4681)* associated with Kn(a+b-) and 9.5% for G/A associated with Kn(a+b+). None of the 107 patients/donors were found to be homozygous for A/A associated with Kn(a-b+). The frequencies of SNPs associated with the KCAM antigen...
Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F
High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.
Background Thoroughbred racehorses are subject to non-traumatic distal limb bone fractures that occur during racing and exercise. Susceptibility to fracture may be due to underlying disturbances in bone metabolism which have a genetic cause. Fracture risk has been shown to be heritable in several species but this study is the first genetic analysis of fracture risk in the horse. Results Fracture cases (n = 269) were horses that sustained catastrophic distal limb fractures while racing on UK racecourses, necessitating euthanasia. Control horses (n = 253) were over 4 years of age, were racing during the same time period as the cases, and had no history of fracture at the time the study was carried out. The horses sampled were bred for both flat and National Hunt (NH) jump racing. 43,417 SNPs were employed to perform a genome-wide association analysis and to estimate the proportion of genetic variance attributable to the SNPs on each chromosome using restricted maximum likelihood (REML). Significant genetic variation associated with fracture risk was found on chromosomes 9, 18, 22 and 31. Three SNPs on chromosome 18 (62.05 Mb – 62.15 Mb) and one SNP on chromosome 1 (14.17 Mb) reached genome-wide significance (p fracture than cases, p = 1 × 10-4), while a second haplotype increases fracture risk (cases at 3.39 times higher risk of fracture than controls, p = 0.042). Conclusions Fracture risk in the Thoroughbred horse is a complex condition with an underlying genetic basis. Multiple genomic regions contribute to susceptibility to fracture risk. This suggests there is the potential to develop SNP-based estimators for genetic risk of fracture in the Thoroughbred racehorse, using methods pioneered in livestock genetics such as genomic selection. This information would be useful to racehorse breeders and owners, enabling them to reduce the risk of injury in their horses. PMID:24559379
Background: It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information. Methods: The study was performed on milk, fat, and protein yield, using genotype data on 35,706 SNPs and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data. Results: We showed that genome-wide breeding values prediction based only on identity-by-descent genomic relationships within the known pedigree was as or more reliable than that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding values prediction comes from animals with known common ancestors less than four generations back in the pedigree. Conclusions: Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information. Although, in principle, genomic selection based on identity-by-state does not require
Senn, Helen; Ogden, Rob; Cezard, Timothee; Gharbi, Karim; Iqbal, Zamin; Johnson, Eric; Kamps-Hughes, Nick; Rosell, Frank; McEwing, Ross
In this study, we used restriction site-associated DNA (RAD) sequencing to discover SNP markers suitable for population genetic and parentage analysis with the aim of using them for monitoring the reintroduction of the Eurasian beaver (Castor fiber) to Scotland. In the absence of a reference genome for beaver, we built contigs and discovered SNPs within them using paired-end RAD data, so as to have sufficient flanking region around the SNPs to conduct marker design. To do this, we used a simple pipeline which catalogued the Read 1 data in stacks and then used the assembler cortex_var to conduct de novo assembly and genotyping of multiple samples using the Read 2 data. The analysis of around 1.1 billion short reads of sequence data was reduced to a set of 2579 high-quality candidate SNP markers that were polymorphic in Norwegian and Bavarian beaver. Both laboratory validation of a subset of eight of the SNPs (1.3% error) and internal validation by confirming patterns of Mendelian inheritance in a family group (0.9% error) confirmed the success of this approach. © 2013 John Wiley & Sons Ltd.
Background: A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers. Description: GLIDERS is an easy to use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range). The intuitive web interface handles both manual entry of SNP IDs as well as allowing users to upload files of SNP IDs. The user can limit the resulting inter-SNP associations with easy to use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r2 (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab delimited text file. Conclusion: GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
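The r2 values this abstract filters on are the standard pairwise LD statistic. As a rough illustration of how r2 is obtained from haplotype and allele frequencies (this is not GLIDERS code; the function and parameter names are hypothetical), one can compute D = p_AB - p_A * p_B and then r2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)):

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Return r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and
          allele B at SNP 2
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Complete LD: allele A always travels with allele B.
complete = ld_r_squared(0.5, 0.5, 0.5)
# Linkage equilibrium: haplotype frequency equals the product of allele frequencies.
independent = ld_r_squared(0.25, 0.5, 0.5)
```

A pair would pass the tool's lowest threshold when the returned value is at least 0.3.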
Zanella, Ricardo; Peixoto, Jane O; Cardoso, Fernando F; Cardoso, Leandro L; Biegelmeyer, Patrícia; Cantão, Maurício E; Otaviano, Antonio; Freitas, Marcelo S; Caetano, Alexandre R; Ledur, Mônica C
Genetic improvement in livestock populations can be achieved without significantly affecting genetic diversity if mating systems and selection decisions take genetic relationships among individuals into consideration. The objective of this study was to examine the genetic diversity of two commercial breeds of pigs. Genotypes from 1168 Landrace (LA) and 1094 Large White (LW) animals from a commercial breeding program in Brazil were obtained using the Illumina PorcineSNP60 Beadchip. Inbreeding estimates based on pedigree (F x) and genomic information using runs of homozygosity (F ROH) and the single nucleotide polymorphisms (SNP) by SNP inbreeding coefficient (F SNP) were obtained. Linkage disequilibrium (LD), correlation of linkage phase (r) and effective population size (N e ) were also estimated. Estimates of inbreeding obtained with pedigree information were lower than those obtained with genomic data in both breeds. We observed that the extent of LD was slightly larger at shorter distances between SNPs in the LW population than in the LA population, which indicates that the LW population was derived from a smaller N e . Estimates of N e based on genomic data were equal to 53 and 40 for the current populations of LA and LW, respectively. The correlation of linkage phase between the two breeds was equal to 0.77 at distances up to 50 kb, which suggests that genome-wide association and selection should be performed within breed. Although selection intensities have been stronger in the LA breed than in the LW breed, levels of genomic and pedigree inbreeding were lower for the LA than for the LW breed. The use of genomic data to evaluate population diversity in livestock animals can provide new and more precise insights about the effects of intense selection for production traits. Resulting information and knowledge can be used to effectively increase response to selection by appropriately managing the rate of inbreeding, minimizing negative effects of inbreeding
Daniel Zanetti Scherrer
Background: Evidence suggests that paraoxonase 1 (PON1) confers important antioxidant and anti-inflammatory properties when associated with high-density lipoprotein (HDL). Objective: To investigate the relationships between p.Q192R SNP of PON1, biochemical parameters and carotid atherosclerosis in an asymptomatic, normolipidemic Brazilian population sample. Methods: We studied 584 volunteers (females n = 326, males n = 258; 19-75 years of age). Total genomic DNA was extracted and the SNP was detected in the TaqMan® SNP OpenArray® genotyping platform (Applied Biosystems, Foster City, CA). Plasma lipoproteins and apolipoproteins were determined and PON1 activity was measured using paraoxon as a substrate. High-resolution B-mode ultrasonography was used to measure cIMT and the presence of carotid atherosclerotic plaques in a subgroup of individuals (n = 317). Results: The presence of p.192Q was associated with a significant increase in PON1 activity (RR = 12.30 (11.38); RQ = 46.96 (22.35); QQ = 85.35 (24.83) μmol/min; p Conclusion: In low-risk individuals, the presence of the p.192Q variant of PON1 is associated with a beneficial plasma lipid profile but not with carotid atherosclerosis.
Background: One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results: The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75% respectively are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten
Increasing interest is being placed on the detection of genes, or genomic regions, that have been targeted by selection because identifying signatures of selection can lead to a better understanding of genotype-phenotype relationships. A common strategy for the detection of selection signatures is to compare samples from distinct populations and to search for genomic regions with outstanding genetic differentiation. The aim of this study was to detect selective signatures in layer chicken populations using a recently proposed approach, hapFLK, which exploits linkage disequilibrium information while accounting appropriately for the hierarchical structure of populations. We performed the analysis on 70 individuals from three commercial layer breeds (White Leghorn, White Rock and Rhode Island Red), genotyped for approximately 1 million SNPs. We found a total of 41 and 107 regions with outstanding differentiation or similarity using hapFLK and its single SNP counterpart FLK respectively. Annotation of selection signature regions revealed various genes and QTL corresponding to production traits, for which layer breeds were selected. A number of the detected genes were associated with growth and carcass traits, including IGF-1R, AGRP and STAT5B. We also annotated an interesting gene associated with the dark brown feather color mutational phenotype in chickens (SOX10). We compared FST, FLK and hapFLK and demonstrated that exploiting linkage disequilibrium information and accounting for hierarchical population structure decreased the false detection rate.
Struan F A Grant
Recently an association was demonstrated between the single nucleotide polymorphism (SNP) rs9939609, within the FTO locus, and obesity as a consequence of a genome-wide association (GWA) study of type 2 diabetes in adults. We examined the effects of two perfect surrogates for this SNP plus 11 other SNPs at this locus with respect to our childhood obesity cohort, consisting of both Caucasians and African Americans (AA). Utilizing data from our ongoing GWA study in our cohort of 418 Caucasian obese children (BMI ≥ 95th percentile), 2,270 Caucasian controls (BMI < 95th percentile), 578 AA obese children and 1,424 AA controls, we investigated the association of the previously reported variation at the FTO locus with the childhood form of this disease in both ethnicities. The minor allele frequencies (MAF) of rs8050136 and rs3751812 (perfect surrogates for rs9939609, i.e. both r² = 1) in the Caucasian cases were 0.448 and 0.443 respectively, while they were 0.391 and 0.386 in Caucasian controls respectively, yielding for both an odds ratio (OR) of 1.27 (95% CI 1.08-1.47; P = 0.0022). Furthermore, the MAFs of rs8050136 and rs3751812 in the AA cases were 0.449 and 0.115 respectively, while they were 0.436 and 0.090 in AA controls respectively, yielding an OR of 1.05 (95% CI 0.91-1.21; P = 0.49) and of 1.31 (95% CI 1.050-1.643; P = 0.017) respectively. Investigating all 13 SNPs present on the Illumina HumanHap550 BeadChip in this region of linkage disequilibrium, rs3751812 was the only SNP conferring significant risk in AA. We have therefore replicated and refined the association in an AA cohort and distilled a tag-SNP, rs3751812, which captures the ancestral origin of the actual mutation. As such, variants in the FTO gene confer a similar magnitude of risk of obesity to children as to their adult counterparts and appear to have a global impact.
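The odds ratio reported for rs8050136 in the Caucasian cohort can be roughly reproduced from the published minor allele frequencies and sample sizes. The sketch below assumes an allelic (2 x 2 allele count) test with a Wald confidence interval, which the abstract does not confirm as the authors' method; the function name is hypothetical.

```python
import math


def allelic_odds_ratio(maf_cases, maf_controls, n_cases, n_controls, z=1.96):
    """Allelic odds ratio with a Wald confidence interval.

    Allele counts are reconstructed from minor allele frequencies,
    assuming every diploid individual contributes two alleles.
    """
    minor_cases = maf_cases * 2 * n_cases
    major_cases = (1 - maf_cases) * 2 * n_cases
    minor_controls = maf_controls * 2 * n_controls
    major_controls = (1 - maf_controls) * 2 * n_controls

    odds_ratio = (minor_cases * major_controls) / (major_cases * minor_controls)
    # Standard error of log(OR) for a 2 x 2 table
    se_log_or = math.sqrt(
        1 / minor_cases + 1 / major_cases + 1 / minor_controls + 1 / major_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# rs8050136 in the Caucasian cohort: 418 cases, 2,270 controls
or_value, ci_low, ci_high = allelic_odds_ratio(0.448, 0.391, 418, 2270)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Under these assumptions the estimate comes out near 1.26 with an interval of about 1.09 to 1.47, close to the reported 1.27 (1.08-1.47); the small gap is plausibly due to rounding of the published frequencies.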
Sahana, Goutam; Guldbrandtsen, Bernt; Bendixen, Christian
Six genomic regions affecting clinical mastitis were identified through a GWAS study with imputed BovineHD chip genotype data in the Nordic Holstein cattle population. The association analyses were carried out using a SNP-by-SNP analysis by fitting the regression of allele dosage and a polygenic...... Effect Predictor (VEP) vers. 2.6 using ENSEMBL vers. 67 databases. Candidate polymorphisms affecting clinical mastitis were selected based on their association with the traits and functional annotations. A strong positional candidate gene for mastitis resistance on chromosome-6 is the NPFFR2 which...... Factor Receptor Alpha (LIFR) emerged as a strong candidate gene for mastitis resistance. The LIFR gene is involved in acute phase response and is expressed in saliva and mammary gland....
The release of build 10.2 of the swine genome was a marked improvement over previous builds and has proven extremely useful. However, as most know, there are regions of the genome that this particular build does not accurately represent. For instance, nearly 25% of the 62,162 SNP on the Illumina Por...
Doran, Anthony G; Berry, Donagh P; Creevey, Christopher J
Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits, however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation. Following adjustment for false discovery (q-value carcass traits using a single SNP regression approach. Using a Bayesian approach, 46 QTL were associated (posterior probability > 0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500kbs of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway. A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth such as glucagon and leptin. Several biological pathways, including PPAR signaling, were
Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.
This thesis describes investigations of various applications of genome-wide SNP (single nucleotide polymorphism) markers. The set of SNP markers was identified by a GBS (genotyping by sequencing) strategy. The resulting dataset of 129,156 SNPs across 83 tetraploid varieties was
Bol, Sebastiaan M.; Moerland, Perry D.; Limou, Sophie; van Remmerden, Yvonne; Coulonges, Cédric; van Manen, Daniëlle; Herbeck, Joshua T.; Fellay, Jacques; Sieberer, Margit; Sietzema, Jantine G.; van 't Slot, Ruben; Martinson, Jeremy; Zagury, Jean-François; Schuitemaker, Hanneke; van 't Wout, Angélique B.
Background: HIV-1 infected macrophages play an important role in rendering resting T cells permissive for infection, in spreading HIV-1 to T cells, and in the pathogenesis of AIDS dementia. During highly active anti-retroviral treatment (HAART), macrophages keep producing virus because tissue penetration of antiretrovirals is suboptimal and the efficacy of some is reduced. Thus, to cure HIV-1 infection with antiretrovirals we will also need to efficiently inhibit viral replication in macrophages. The majority of the current drugs block the action of viral enzymes, whereas there is an abundance of yet unidentified host factors that could be targeted. We here present results from a genome-wide association study identifying novel genetic polymorphisms that affect in vitro HIV-1 replication in macrophages. Methodology/Principal Findings: Monocyte-derived macrophages from 393 blood donors were infected with HIV-1 and viral replication was determined using Gag p24 antigen levels. Genomic DNA from individuals with macrophages that had relatively low (n = 96) or high (n = 96) p24 production was used for SNP genotyping with the Illumina 610 Quad beadchip. A total of 494,656 SNPs that passed quality control were tested for association with HIV-1 replication in macrophages, using linear regression. We found a strong association between in vitro HIV-1 replication in monocyte-derived macrophages and SNP rs12483205 in DYRK1A (p = 2.16 × 10⁻⁵). While the association was not genome-wide significant (p < 1 × 10⁻⁷), we could replicate this association using monocyte-derived macrophages from an independent group of 31 individuals (p = 0.0034). Combined analysis of the initial and replication cohort increased the strength of the association (p = 4.84 × 10⁻⁶). In addition, we found this SNP to be associated with HIV-1 disease progression in vivo in two independent cohort studies (p = 0.035 and p = 0.0048). Conclusions/Significance: These findings suggest that
Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick
Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP
Psychosis Endophenotypes International Consortium; Wellcome Trust Case-Control Consortium; Bramon, E.; Pirinen, M.; Strange, A.; Lin, K.; Freeman, C.; Bellenguez, C.; Su, Z.; Band, G.; Pearson, R.; Vukcevic, D.; Langford, C.; Deloukas, P.; Hunt, S.
BACKGROUND: Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories. METHODS: 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives, and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 69...
Tosato, Sarah; Myin-germeys, Inez; Barroso, Ines; Bender, Stephan; Giegling, Ina; Arranz, Maria J.; Donnelly, Peter; Bellenguez, Celine; Brown, Matthew A.; Lawrie, Stephen; Kalaydjieva, Luba; Vukcevic, Damjan; Kahn, Rene S.; Dronov, Serge; Walshe, Muriel
Background: Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories. Methods: 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives, and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 695,19...
Davies, G; Harris, S E; Reynolds, C A; Payton, A; Knight, H M; Liewald, D C; Lopez, L M; Luciano, M; Gow, A J; Corley, J; Henderson, R; Murray, C; Pattie, A; Fox, H C; Redmond, P; Lutz, M W; Chiba-Falek, O; Linnertz, C; Saith, S; Haggarty, P; McNeill, G; Ke, X; Ollier, W; Horan, M; Roses, A D; Ponting, C P; Porteous, D J; Tenesa, A; Pickles, A; Starr, J M; Whalley, L J; Pedersen, N L; Pendleton, N; Visscher, P M; Deary, I J
Cognitive decline is a feared aspect of growing old. It is a major contributor to lower quality of life and loss of independence in old age. We investigated the genetic contribution to individual differences in nonpathological cognitive ageing in five cohorts of older adults. We undertook a genome-wide association analysis using 549 692 single-nucleotide polymorphisms (SNPs) in 3511 unrelated adults in the Cognitive Ageing Genetics in England and Scotland (CAGES) project. These individuals have detailed longitudinal cognitive data from which phenotypes measuring each individual's cognitive changes were constructed. One SNP, rs2075650, located in TOMM40 (translocase of the outer mitochondrial membrane 40 homolog), had a genome-wide significant association with cognitive ageing (P = 2.5 × 10⁻⁸). This result was replicated in a meta-analysis of three independent Swedish cohorts (P = 2.41 × 10⁻⁶). An Apolipoprotein E (APOE) haplotype (adjacent to TOMM40), previously associated with cognitive ageing, had a significant effect on cognitive ageing in the CAGES sample (P = 2.18 × 10⁻⁸; females, P = 1.66 × 10⁻¹¹; males, P = 0.01). Fine SNP mapping of the TOMM40/APOE region identified both APOE (rs429358; P = 3.66 × 10⁻¹¹) and TOMM40 (rs11556505; P = 2.45 × 10⁻⁸) as loci that were associated with cognitive ageing. Imputation and conditional analyses in the discovery and replication cohorts strongly suggest that this effect is due to APOE (rs429358). Functional genomic analysis indicated that SNPs in the TOMM40/APOE region have a functional, regulatory non-protein-coding effect. The APOE region is significantly associated with nonpathological cognitive ageing. The identity and mechanism of one or multiple causal variants remain unclear.
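To see why P values of this size count as genome-wide significant, one common convention is a Bonferroni correction over the number of SNPs tested. The sketch below assumes a family-wise alpha of 0.05 and applies it to the 549,692 SNPs and the discovery P values quoted above; the abstract does not state which threshold the authors used, so both the correction method and the function name are assumptions.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Number of SNPs tested in the CAGES discovery analysis
threshold = bonferroni_threshold(549_692)

# Discovery-stage P values quoted in the abstract
discovery_p = {
    "rs2075650 (TOMM40)": 2.5e-8,
    "rs429358 (APOE)": 3.66e-11,
    "rs11556505 (TOMM40)": 2.45e-8,
}

# True where the SNP would pass the Bonferroni threshold
passes = {snp: p < threshold for snp, p in discovery_p.items()}

print(f"threshold = {threshold:.2e}")
for snp, ok in passes.items():
    print(snp, "passes" if ok else "fails")
```

Under this assumption the threshold is roughly 9.1 × 10⁻⁸, and all three discovery P values fall below it. The replication meta-analysis is not held to the same threshold because it tests a single pre-specified SNP.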
Len J Wade
The rapid progress in rice genotyping must be matched by advances in phenotyping. A better understanding of genetic variation in rice for drought response and root traits, and practical methods for studying them, are needed. In this study, the OryzaSNP set (20 diverse genotypes that have been genotyped for SNP markers) was phenotyped in a range of field and container studies to study the diversity of rice root growth and response to drought. Of the root traits measured across more than 20 root experiments, root dry weight showed the most stable genotypic performance across studies. The environment (E) component had the strongest effect on yield and root traits. We identified genomic regions correlated with root dry weight, percent deep roots, maximum root depth, and grain yield based on a correlation analysis with the phenotypes and aus, indica, or japonica introgression regions using the SNP data. Two genomic regions were identified as hot spots in which root traits and grain yield were co-located: on chromosome 1 (39.7-40.7 Mb) and on chromosome 8 (20.3-21.9 Mb). Across experiments, the soil type/growth medium showed more correlations with plant growth than the container dimensions. Although the correlations among studies and genetic co-location of root traits from a range of study systems point to their potential utility to represent responses in field studies, the best correlations were observed when the two setups had some similar properties. Due to the co-location of the identified genomic regions (from introgression block analysis) with QTL for a number of previously reported root and drought traits, these regions are good candidates for detailed characterization to contribute to understanding rice improvement for response to drought. This study also highlights the utility of characterizing a small set of 20 genotypes for root growth, drought response, and related genomic regions.
Lu, Timothy Tehua; Lao, Oscar; Nothnagel, Michael
of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself. A second marker set, specifically selected for ancestry sensitivity using singular value decomposition, performed even more poorly and was no more capable......Genetic matching potentially provides a means to alleviate the effects of incomplete Mendelian randomization in population-based gene-disease association studies. We therefore evaluated the genetic-matched pair study design on the basis of genome-wide SNP data (309,790 markers; Affymetrix Gene......Chip Human Mapping 500K Array) from 2457 individuals, sampled at 23 different recruitment sites across Europe. Using pair-wise identity-by-state (IBS) as a matching criterion, we tried to derive a subset of markers that would allow identification of the best overall matching (BOM) partner for a given...
Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896
Pre-harvest sprouting (PHS) is a major abiotic factor affecting grain weight and quality, and is caused by an early break in seed dormancy. Association mapping (AM) is used to detect correlations between phenotypes and genotypes based on linkage disequilibrium (LD) in wheat breeding programs. We evaluated seed dormancy in 80 Chinese wheat founder parents in five environments and performed a genome-wide association study using 6,057 markers, including 93 simple sequence repeat (SSR), 1,472 diversity array technology (DArT), and 4,492 single nucleotide polymorphism (SNP) markers. The general linear model (GLM) and the mixed linear model (MLM) were used in this study, and two significant markers (tPt-7980 and wPt-6457) were identified. Both markers were located on Chromosome 1B, with wPt-6457 having been identified in a previously reported chromosomal position. The significantly associated loci contain essential information for cloning genes related to resistance to PHS and can be used in wheat breeding programs.
Yang, Wenzhao; Tempelman, Robert J
Hierarchical mixed effects models have been demonstrated to be powerful for predicting genomic merit of livestock and plants, on the basis of high-density single-nucleotide polymorphism (SNP) marker panels, and their use is being increasingly advocated for genomic predictions in human health. Two particularly popular approaches, labeled BayesA and BayesB, are based on specifying all SNP-associated effects to be independent of each other. BayesB extends BayesA by allowing a large proportion of SNP markers to be associated with null effects. We further extend these two models to specify SNP effects as being spatially correlated due to the chromosomally proximal effects of causal variants. These two models, that we respectively dub as ante-BayesA and ante-BayesB, are based on a first-order nonstationary antedependence specification between SNP effects. In a simulation study involving 20 replicate data sets, each analyzed at six different SNP marker densities with average LD levels ranging from r(2) = 0.15 to 0.31, the antedependence methods had significantly (P 0. 24) with differences exceeding 3%. A cross-validation study was also conducted on the heterogeneous stock mice data resource (http://mus.well.ox.ac.uk/mouse/HS/) using 6-week body weights as the phenotype. The antedependence methods increased cross-validation prediction accuracies by up to 3.6% compared to their classical counterparts (P benchmark data sets and demonstrated that the antedependence methods were more accurate than their classical counterparts for genomic predictions, even for individuals several generations beyond the training data.
Filippi, Carla V; Aguirre, Natalia; Rivas, Juan G; Zubrzycki, Jeremias; Puebla, Andrea; Cordes, Diego; Moreno, Maria V; Fusari, Corina M; Alvarez, Daniel; Heinz, Ruth A; Hopp, Horacio E; Paniego, Norma B; Lia, Veronica V
Argentina has a long tradition of sunflower breeding, and its germplasm is a valuable genetic resource worldwide. However, knowledge of the genetic constitution and variability levels of the Argentinean germplasm is still scarce, rendering the global map of cultivated sunflower diversity incomplete. In this study, 42 microsatellite loci and 384 single nucleotide polymorphisms (SNPs) were used to characterize the first association mapping population used for quantitative trait loci mapping in sunflower, along with a selection of allied open-pollinated and composite populations from the germplasm bank of the National Institute of Agricultural Technology of Argentina. The ability of different kinds of markers to assess genetic diversity and population structure was also evaluated. The analysis of polymorphism in the set of sunflower accessions studied here showed that both the microsatellites and SNP markers were informative for germplasm characterization, although to different extents. In general, the estimates of genetic variability were moderate. The average genetic diversity, as quantified by the expected heterozygosity, was 0.52 for SSR loci and 0.29 for SNPs. Within SSR markers, those derived from non-coding regions were able to capture higher levels of diversity than EST-SSR. A significant correlation was found between SSR and SNP- based genetic distances among accessions. Bayesian and multivariate methods were used to infer population structure. Evidence for the existence of three different genetic groups was found consistently across data sets (i.e., SSR, SNP and SSR + SNP), with the maintainer/restorer status being the most prevalent characteristic associated with group delimitation. The present study constitutes the first report comparing the performance of SSR and SNP markers for population genetics analysis in cultivated sunflower. We show that the SSR and SNP panels examined here, either used separately or in conjunction, allowed consistent
Douglas Mark Ruden
This paper describes a new program SnpSift for filtering differential DNA sequence variants between two or more experimental genomes after genotoxic chemical exposure. Here, we illustrate how SnpSift can be used to identify candidate phenotype-relevant variants including single nucleotide polymorphisms (SNPs), multiple nucleotide polymorphisms (MNPs), insertions and deletions (InDels) in mutant strains isolated from genome-wide chemical mutagenesis of Drosophila melanogaster. First, the genomes of two independently-isolated mutant fly strains that are allelic for a novel recessive male-sterile locus generated by genotoxic chemical exposure were sequenced using the Illumina next-generation DNA sequencer to obtain 20- to 29-fold coverage of the euchromatic sequences. The sequencing reads were processed and variants were called using standard bioinformatic tools. Next, SnpEff was used to annotate all sequence variants and their potential mutational effects on associated genes. Then, SnpSift was used to filter and select differential variants that potentially disrupt a common gene in the two allelic mutant strains. The potential causative DNA lesions were partially validated by capillary sequencing of PCR-amplified DNA in the genetic interval as defined by meiotic mapping and deletions that remove defined regions of the chromosome. Of the five candidate genes located in the genetic interval, the Pka-like gene CG12069 was found to carry a separate premature stop codon mutation in each of the two allelic mutants, whereas the other four candidate genes within the interval have wild-type sequences. The Pka-like gene is therefore a strong candidate gene for the male-sterile locus. These results demonstrate that combining SnpEff and SnpSift can expedite the identification of candidate phenotype-causative mutations in chemically-mutagenized Drosophila strains. This technique can also be used to characterize the variety of mutations generated by genotoxic
Javanrouh, Niloufar; Daneshpour, Maryam S; Soltanian, Ali Reza; Tapak, Leili
Obesity is a serious health problem that leads to low quality of life and early mortality. For the purposes of prevention and gene therapy for such a worldwide disease, the genome-wide association study is a powerful tool for finding SNPs associated with increased risk of obesity. For association analysis, kernel machine regression, a generalized regression method, has the advantage of considering epistasis effects as well as the correlation between individuals due to unknown factors. In this study, information on the people who participated in the Tehran cardio-metabolic genetic study was used. They were genotyped for the chromosomal region 16q12.2 (build hg38), in which 986 variations were evaluated. Kernel machine regression and single-SNP analysis were used to assess the association between obesity and the genotyped SNP data. We found that the SNP sets associated with obesity were mostly in the FTO (P = 0.01), AIKTIP (P = 0.02) and MMP2 (P = 0.02) genes. Moreover, two SNPs, i.e., rs10521296 and rs11647470, showed significant association with obesity using kernel regression (P = 0.02). In conclusion, significant sets were randomly distributed throughout the region, with more density around the FTO, AIKTIP and MMP2 genes. Furthermore, two intergenic SNPs showed significant association after using kernel machine regression. Therefore, more studies have to be conducted to assess their functionality or precise mechanism. Copyright © 2018 Elsevier B.V. All rights reserved.
Li, Shengting; Ma, Lijia; Li, Heng
Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate the study of SNP finding and analysis within the framework of medical...
Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc
Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...
Seyerle, Amanda A; Lin, Henry J; Gogarten, Stephanie M; Stilp, Adrienne; Méndez Giráldez, Raul; Soliman, Elsayed; Baldassari, Antoine; Graff, Mariaelisa; Heckbert, Susan; Kerr, Kathleen F; Kooperberg, Charles; Rodriguez, Carlos; Guo, Xiuqing; Yao, Jie; Sotoodehnia, Nona; Taylor, Kent D; Whitsel, Eric A; Rotter, Jerome I; Laurie, Cathy C; Avery, Christy L
PR interval (PR) is a heritable electrocardiographic measure of atrial and atrioventricular nodal conduction. Changes in PR duration may be associated with atrial fibrillation, heart failure and all-cause mortality. Hispanic/Latino populations have high burdens of cardiovascular morbidity and mortality, are highly admixed and represent exceptional opportunities for novel locus identification. However, they remain chronically understudied. We present the first genome-wide association study (GWAS) of PR in 14 756 participants of Hispanic/Latino ancestry from three studies. Study-specific summary results of the association between 1000 Genomes Phase 1 imputed single-nucleotide polymorphisms (SNPs) and PR assumed an additive genetic model and were adjusted for global ancestry, study centre/region and clinical covariates. Results were combined using fixed-effects, inverse variance weighted meta-analysis. Sequential conditional analyses were used to identify independent signals. Replication of novel loci was performed in populations of Asian, African and European descent. ENCODE and RoadMap data were used to annotate results. We identified a novel genome-wide association (PPR at ID2 (rs6730558), which replicated in Asian and European populations (PPR loci to Hispanics/Latinos. Bioinformatics annotation provided evidence for regulatory function in cardiac tissue. Further, for six loci that generalised, the Hispanic/Latino index SNP was genome-wide significant and identical to (or in high linkage disequilibrium with) the previously identified GWAS lead SNP. Our results suggest that genetic determinants of PR are consistent across race/ethnicity, but extending studies to admixed populations can identify novel associations, underscoring the importance of conducting genetic studies in diverse populations. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise
Ashraf, Bilal; Janss, Luc; Jensen, Just
sample). The GBSeq data can be used directly in genomic models in the form of individual SNP allele-frequency estimates (e.g., reference reads/total reads per polymorphic site per individual), but is subject to measurement error due to the low sequencing depth per individual. Due to technical reasons....... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction...... for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...
Beaty, Terri H; Murray, Jeffrey C; Marazita, Mary L
Case-parent trios were used in a genome-wide association study of cleft lip with and without cleft palate. SNPs near two genes not previously associated with cleft lip with and without cleft palate (MAFB, most significant SNP rs13041247, with odds ratio (OR) per minor allele = 0.704, 95% CI 0.635...
The 6q25.1 locus was first identified via a genome-wide association study (GWAS) in Chinese women and marked by single nucleotide polymorphism (SNP) rs2046210, approximately 180 Kb upstream of ESR1. There have been conflicting reports about the association of this locus with breast cancer in Europeans, and a GWAS in Europeans identified a different SNP, tagged here by rs12662670. We examined the associations of both SNPs in up to 61,689 cases and 58,822 controls from forty-four studies collaborating in the Breast Cancer Association Consortium, of which four studies were of Asian and 39 of European descent. Logistic regression was used to estimate odds ratios (OR) and 95% confidence intervals (CI). Case-only analyses were used to compare SNP effects in Estrogen Receptor positive (ER+) versus negative (ER-) tumours. Models including both SNPs were fitted to investigate whether the SNP effects were independent. Both SNPs are significantly associated with breast cancer risk in both ethnic groups. Per-allele ORs are higher in Asian than in European studies [rs2046210: OR (A/G) = 1.36 (95% CI 1.26-1.48), p = 7.6 × 10(-14) in Asians and 1.09 (95% CI 1.07-1.11), p = 6.8 × 10(-18) in Europeans. rs12662670: OR (G/T) = 1.29 (95% CI 1.19-1.41), p = 1.2 × 10(-9) in Asians and 1.12 (95% CI 1.08-1.17), p = 3.8 × 10(-9) in Europeans]. SNP rs2046210 is associated with a significantly greater risk of ER- than ER+ tumours in Europeans [OR (ER-) = 1.20 (95% CI 1.15-1.25), p = 1.8 × 10(-17) versus OR (ER+) = 1.07 (95% CI 1.04-1.1), p = 1.3 × 10(-7), p(heterogeneity) = 5.1 × 10(-6)]. In these Asian studies, by contrast, there is no clear evidence of a differential association by tumour receptor status. Each SNP is associated with risk after adjustment for the other SNP. These results suggest the presence of two variants at 6q25.1 each independently associated with breast cancer risk in Asians and in Europeans. Of these two, the one tagged by rs2046210 is associated with a greater
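For readers unfamiliar with how an odds ratio and its 95% confidence interval relate, the sketch below computes both from a 2x2 table of allele counts using the log-scale (Woolf) approximation. It is a simplification: the study itself used logistic regression, and the function name and counts here are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (all counts > 0).

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, the product of its two bounds equals the squared odds ratio.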
Acikel, Cengizhan; Aydin Son, Yesim; Celik, Cemil; Gul, Husamettin
Multifactor dimensionality reduction (MDR) is a nonparametric approach that can be used to detect relevant interactions between single-nucleotide polymorphisms (SNPs). The aim of this study was to build the best genomic model based on SNP associations and to identify candidate polymorphisms that are the underlying molecular basis of the bipolar disorders. This study was performed on Whole-Genome Association Study of Bipolar Disorder (dbGaP [database of Genotypes and Phenotypes] study accession number: phs000017.v3.p1) data. After preprocessing of the genotyping data, three classification-based data mining methods (ie, random forest, naïve Bayes, and k-nearest neighbor) were performed. Additionally, as a nonparametric, model-free approach, the MDR method was used to evaluate the SNP profiles. The validity of these methods was evaluated using true classification rate, recall (sensitivity), precision (positive predictive value), and F-measure. Random forests, naïve Bayes, and k-nearest neighbors identified 16, 13, and ten candidate SNPs, respectively. Surprisingly, the top six SNPs were reported by all three methods. Random forests and k-nearest neighbors were more successful than naïve Bayes, with recall values >0.95. On the other hand, MDR generated a model with comparable predictive performance based on five SNPs. Although different SNP profiles were identified in MDR compared to the classification-based models, all models mapped SNPs to the DOCK10 gene. Three classification-based data mining approaches, random forests, naïve Bayes, and k-nearest neighbors, have prioritized similar SNP profiles as predictors of bipolar disorders, in contrast to MDR, which has found different SNPs through analysis of two-way and three-way interactions. The reduced number of associated SNPs discovered by MDR, without loss in the classification performance, would facilitate validation studies and decision support models, and would reduce the cost to develop predictive and
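The four evaluation measures listed in this abstract (true classification rate, recall, precision and F-measure) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts passed to it are illustrative assumptions, not values from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0          # true classification rate
    recall = tp / (tp + fn) if (tp + fn) else 0.0           # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0        # positive predictive value
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }
```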
Fernández Ana I
Background: The traditional strategy to map QTL is to use linkage analysis employing a limited number of markers. These analyses report wide QTL confidence intervals, making it very difficult to identify the gene and polymorphisms underlying the QTL effects. The arrival of genome-wide panels of SNPs makes thousands of markers available, increasing the information content and therefore the likelihood of detecting and fine mapping QTL regions. The aims of the current study are to confirm previous QTL regions for growth and body composition traits in different generations of an Iberian x Landrace intercross (IBMAP) and especially to identify new ones with narrow confidence intervals by employing the PorcineSNP60 BeadChip in linkage analyses. Results: Three generations (F3, Backcross 1 and Backcross 2) of the IBMAP and their related animals were genotyped with the PorcineSNP60 BeadChip. A total of 8,417 SNPs equidistantly distributed across autosomes were selected after filtering by quality, position and frequency to perform the QTL scan. The joint and separate analyses of the different IBMAP generations confirmed QTL regions previously identified in chromosomes 4 and 6 and revealed new ones, mainly for backfat thickness in chromosomes 4, 5, 11, 14 and 17 and shoulder weight in chromosomes 1, 2, 9 and 13, and many others at the chromosome-wide significance level. In addition, most of the detected QTLs displayed narrow confidence intervals, making the selection of positional candidate genes easier. Conclusions: The use of a higher density of markers has made it possible to confirm results obtained in previous QTL scans carried out with microsatellites. Moreover, several new QTL regions have now been identified in regions probably not covered by markers in previous scans, and most of these QTLs displayed narrow confidence intervals. Finally, prominent putative biological and positional candidate genes underlying those QTL effects are listed based on recent porcine
Davies, G; Armstrong, N; Bis, J C; Bressler, J; Chouraki, V; Giddaluru, S; Hofer, E; Ibrahim-Verbaas, C A; Kirin, M; Lahti, J; van der Lee, S J; Le Hellard, S; Liu, T; Marioni, R E; Oldmeadow, C; Postmus, I; Smith, A V; Smith, J A; Thalamuthu, A; Thomson, R; Vitart, V; Wang, J; Yu, L; Zgaga, L; Zhao, W; Boxall, R; Harris, S E; Hill, W D; Liewald, D C; Luciano, M; Adams, H; Ames, D; Amin, N; Amouyel, P; Assareh, A A; Au, R; Becker, J T; Beiser, A; Berr, C; Bertram, L; Boerwinkle, E; Buckley, B M; Campbell, H; Corley, J; De Jager, P L; Dufouil, C; Eriksson, J G; Espeseth, T; Faul, J D; Ford, I; Scotland, Generation; Gottesman, R F; Griswold, M E; Gudnason, V; Harris, T B; Heiss, G; Hofman, A; Holliday, E G; Huffman, J; Kardia, S L R; Kochan, N; Knopman, D S; Kwok, J B; Lambert, J-C; Lee, T; Li, G; Li, S-C; Loitfelder, M; Lopez, O L; Lundervold, A J; Lundqvist, A; Mather, K A; Mirza, S S; Nyberg, L; Oostra, B A; Palotie, A; Papenberg, G; Pattie, A; Petrovic, K; Polasek, O; Psaty, B M; Redmond, P; Reppermund, S; Rotter, J I; Schmidt, H; Schuur, M; Schofield, P W; Scott, R J; Steen, V M; Stott, D J; van Swieten, J C; Taylor, K D; Trollor, J; Trompet, S; Uitterlinden, A G; Weinstein, G; Widen, E; Windham, B G; Jukema, J W; Wright, A F; Wright, M J; Yang, Q; Amieva, H; Attia, J R; Bennett, D A; Brodaty, H; de Craen, A J M; Hayward, C; Ikram, M A; Lindenberger, U; Nilsson, L-G; Porteous, D J; Räikkönen, K; Reinvang, I; Rudan, I; Sachdev, P S; Schmidt, R; Schofield, P R; Srikanth, V; Starr, J M; Turner, S T; Weir, D R; Wilson, J F; van Duijn, C; Launer, L; Fitzpatrick, A L; Seshadri, S; Mosley, T H; Deary, I J
General cognitive function is substantially heritable across the human life course from adolescence to old age. We investigated the genetic contribution to variation in this important, health- and well-being-related trait in middle-aged and older adults. We conducted a meta-analysis of genome-wide association studies of 31 cohorts (N=53 949) in which the participants had undertaken multiple, diverse cognitive tests. A general cognitive function phenotype was tested for, and created in each cohort by principal component analysis. We report 13 genome-wide significant single-nucleotide polymorphism (SNP) associations in three genomic regions, 6q16.1, 14q12 and 19q13.32 (best SNP and closest gene, respectively: rs10457441, P=3.93 × 10−9, MIR2113; rs17522122, P=2.55 × 10−8, AKAP6; rs10119, P=5.67 × 10−9, APOE/TOMM40). We report one gene-based significant association with the HMGN1 gene located on chromosome 21 (P=1 × 10−6). These genes have previously been associated with neuropsychiatric phenotypes. Meta-analysis results are consistent with a polygenic model of inheritance. To estimate SNP-based heritability, the genome-wide complex trait analysis procedure was applied to two large cohorts, the Atherosclerosis Risk in Communities Study (N=6617) and the Health and Retirement Study (N=5976). The proportion of phenotypic variation accounted for by all genotyped common SNPs was 29% (s.e.=5%) and 28% (s.e.=7%), respectively. Using polygenic prediction analysis, ~1.2% of the variance in general cognitive function was predicted in the Generation Scotland cohort (N=5487; P=1.5 × 10−17). In hypothesis-driven tests, there was significant association between general cognitive function and four genes previously associated with Alzheimer's disease: TOMM40, APOE, ABCG1 and MEF2C. PMID:25644384
Gretchen H. Roffler; Stephen J. Amish; Seth Smith; Ted Cosart; Marty Kardos; Michael K. Schwartz; Gordon Luikart
Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding...
Cingolani, Pablo; Platts, Adrian; Wang, Le Lily; Coon, Melissa; Nguyen, Tung; Wang, Luan; Land, Susan J.; Lu, Xiangyi; Ruden, Douglas M.
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5′UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5′ and 3′ UTRs are reservoirs for genetic variations that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory. PMID:22728672
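The coding-effect categories described in this abstract can be illustrated with a small codon-level classifier. This is a sketch of the general idea under the standard genetic code, not SnpEff's implementation or interface; the function name and category labels are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, alt_codon):
    """Label the effect of a single-codon change (illustrative categories)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"
```

For example, `classify_coding_snp("TGG", "TGA")` reports a stop gain, because tryptophan is replaced by a stop codon. The N/S ratio quoted in the abstract follows directly from the two counts: 4,467 / 15,842 ≈ 0.28.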
Li, Yongle; Ruperao, Pradeep; Batley, Jacqueline; Edwards, David; Khan, Tanveer; Colmer, Timothy D; Pang, Jiayin; Siddique, Kadambot H M; Sutton, Tim
Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs). We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS) model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3), p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS) model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implications for implementing GS in plant breeding programs.
Nicholette D Palmer
African Americans are disproportionately affected by type 2 diabetes (T2DM), yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a Genome Wide Association Study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort and 122 SNPs (n = 98 independent loci) were further tested through genotyping three additional validation cohorts followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P<0.0071), were directionally consistent in the Replication cohort and were associated with T2DM in subjects without nephropathy (P<0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P<2.5×10(-8)). SNP rs7560163 (P = 7.0×10(-9), OR (95% CI) = 0.75 (0.67-0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P<0.05) and reached more nominal levels of significance (P<2.5×10(-5)) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele and implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
Manichaikul, Ani; Hoffman, Eric A.; Smolonska, Joanna; Gao, Wei; Cho, Michael H.; Baumhauer, Heather; Budoff, Matthew; Austin, John H. M.; Washko, George R.; Carr, J. Jeffrey; Kaufman, Joel D.; Pottinger, Tess; Powell, Charles A.; Wijmenga, Cisca; Zanen, Pieter; Groen, Harry J. M.; Postma, Dirkje S.; Wanner, Adam; Rouhani, Farshid N.; Brantly, Mark L.; Powell, Rhea; Smith, Benjamin M.; Rabinowitz, Dan; Raffel, Leslie J.; Hinckley Stukovsky, Karen D.; Crapo, James D.; Beaty, Terri H.; Hokanson, John E.; Silverman, Edwin K.; Dupuis, Josée; O’Connor, George T.; Boezen, H. Marike; Rich, Stephen S.
Rationale: Pulmonary emphysema overlaps partially with spirometrically defined chronic obstructive pulmonary disease and is heritable, with moderately high familial clustering. Objectives: To complete a genome-wide association study (GWAS) for the percentage of emphysema-like lung on computed tomography in the Multi-Ethnic Study of Atherosclerosis (MESA) Lung/SNP Health Association Resource (SHARe) Study, a large, population-based cohort in the United States. Methods: We determined percent emphysema and upper-lower lobe ratio in emphysema defined by lung regions less than −950 HU on cardiac scans. Genetic analyses were reported combined across four race/ethnic groups: non-Hispanic white (n = 2,587), African American (n = 2,510), Hispanic (n = 2,113), and Chinese (n = 704) and stratified by race and ethnicity. Measurements and Main Results: Among 7,914 participants, we identified regions at genome-wide significance for percent emphysema in or near SNRPF (rs7957346; P = 2.2 × 10−8) and PPT2 (rs10947233; P = 3.2 × 10−8), both of which replicated in an additional 6,023 individuals of European ancestry. Both single-nucleotide polymorphisms were previously implicated as genes influencing lung function, and analyses including lung function revealed independent associations for percent emphysema. Among Hispanics, we identified a genetic locus for upper-lower lobe ratio near the α-mannosidase–related gene MAN2B1 (rs10411619; P = 1.1 × 10−9; minor allele frequency [MAF], 4.4%). Among Chinese, we identified single-nucleotide polymorphisms associated with upper-lower lobe ratio near DHX15 (rs7698250; P = 1.8 × 10−10; MAF, 2.7%) and MGAT5B (rs7221059; P = 2.7 × 10−8; MAF, 2.6%), which acts on α-linked mannose. Among African Americans, a locus near a third α-mannosidase–related gene, MAN1C1 (rs12130495; P = 9.9 × 10−6; MAF, 13.3%) was associated with percent emphysema. Conclusions: Our results suggest that some genes previously identified as
Fabyano Fonseca e Silva
Genome association analyses have been successful in identifying quantitative trait loci (QTLs) for pig body weights measured at a single age. However, when considering the whole weight trajectories over time in the context of genome association analyses, it is important to look at the markers that affect growth curve parameters. The easiest way to consider them is via the two-step method, in which the growth curve parameters and marker effects are estimated separately, thereby resulting in a reduction of the statistical power and the precision of estimates. One efficient solution is to adopt nonlinear mixed models (NMM), which enable a joint modeling of the individual growth curves and marker effects. Our aim was to propose a genome association analysis for growth curves in pigs based on NMM as well as to compare it with the traditional two-step method. In addition, we also aimed to identify the nearest candidate genes related to significant SNP (single nucleotide polymorphism) markers. The NMM presented a higher number of significant SNPs for adult weight (A) and maturity rate (K), and provided a direct way to test SNP significance simultaneously for both the A and K parameters. Furthermore, all significant SNPs from the two-step method were also reported in the NMM analysis. The ontology of the three candidate genes (SH3BGRL2, MAPK14, and MYL9) derived from significant SNPs (simultaneously affecting A and K) allows us to make inferences with regard to their contribution to the pig growth process in the population studied.
While circulating levels of soluble Intercellular Adhesion Molecule 1 (sICAM-1) have been associated with diverse conditions including myocardial infarction, stroke, malaria, and diabetes, comprehensive analysis of the common genetic determinants of sICAM-1 is not available. In a genome-wide association study conducted among 6,578 participants in the Women's Genome Health Study, we find that three SNPs at the ICAM1 (19p13.2) locus (rs1799969, rs5498 and rs281437) are non-redundantly associated with plasma sICAM-1 concentrations at a genome-wide significance level (P<5x10(-8)), thus extending prior results from linkage and candidate gene studies. We also find that a single SNP (rs507666, P = 5.1x10(-29)) at the ABO (9q34.2) locus is highly correlated with sICAM-1 concentrations. The novel association at the ABO locus provides evidence for a previously unknown regulatory role of histo-blood group antigens in inflammatory adhesion processes.
Deben, Christophe; Op de Beeck, Ken; Van den Bossche, Jolien; Jacobs, Julie; Lardon, Filip; Wouters, An; Peeters, Marc; Van Camp, Guy; Rolfo, Christian; Deschoolmeester, Vanessa; Pauwels, Patrick
Objectives: Two functional polymorphisms in the MDM2 promoter region, SNP309T>G and SNP285G>C, have been shown to impact MDM2 expression and cancer risk. Currently available data on the prognostic value of MDM2 SNP309 in non-small cell lung cancer (NSCLC) is contradictory and unavailable for SNP285. The goal of this study was to clarify the role of these MDM2 SNPs in the outcome of NSCLC patients. Materials and Methods: In this study we genotyped SNP309 and SNP285 in 98 NSCLC adenocarcinoma patients and determined MDM2 mRNA and protein levels. In addition, we assessed the prognostic value of these common SNPs on overall and progression free survival, taking into account the TP53 status of the tumor. Results and Conclusion: We found that the SNP285C allele, but not the SNP309G allele, was significantly associated with increased MDM2 mRNA expression levels (p = 0.025). However, we did not observe an association with MDM2 protein levels for SNP285. The SNP309G allele was significantly associated with the presence of wild type TP53 (p = 0.047) and showed a strong trend towards increased MDM2 protein levels (p = 0.068). In addition, patients harboring the SNP309G allele showed a worse overall survival, but only in the presence of wild type TP53. The SNP285C allele was significantly associated with an early age of diagnosis and metastasis. Additionally, the SNP285C allele acted as an independent predictor for worse progression free survival (HR = 3.97; 95% CI = 1.51 - 10.42; p = 0.005). Our data showed that both SNP309 (in the presence of wild type TP53) and SNP285 act as negative prognostic markers for NSCLC patients, implicating a prominent role for these variants in the outcome of these patients. PMID:28819417
Background The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. The presence of genomic diversity in domestic livestock species therefore, is of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as to facilitate rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers such as e.g. single nucleotide polymorphisms (SNPs) the most abundant source of genetic variation within the genome. Results Alignment of next generation sequencing data of 32 individual turkeys from different populations was used for the discovery of 5.49 million SNPs, which subsequently were used for the analysis of genetic diversity among the different populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of all individuals from the different turkey populations ranged from 0.17-2.73 SNPs/Kb, while heterozygosity of populations ranged from 0.73-1.64 SNPs/Kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/Kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed state of fixation towards alleles different than wild alleles. Conclusion The turkey genome is much less diverse with a relatively low frequency of heterozygous SNPs as compared to other livestock species like chicken and pig. The whole genome SNP discovery
Aslam Muhammad L
whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey.
Meisel, S.F.; Beeken, R.J.; Jaarsveld, C.H.M. van; Wardle, J.
AIM: We tested the hypothesis that the obesity-associated FTO SNP rs9939609 would be associated with clinically significant weight gain (≥5% of initial body weight) in the first year of university, a time identified as high risk for weight gain. METHODS: We collected anthropometric data from
Xu, Li-Xin; Holland, Heidrun; Kirsten, Holger; Ahnert, Peter; Krupp, Wolfgang; Bauer, Manfred; Schober, Ralf; Mueller, Wolf; Fritzsch, Dominik; Meixensberger, Jürgen; Koschny, Ronald
According to the World Health Organization gangliogliomas are classified as well-differentiated and slowly growing neuroepithelial tumors, composed of neoplastic mature ganglion and glial cells. It is the most frequent tumor entity observed in patients with long-term epilepsy. Comprehensive cytogenetic and molecular cytogenetic data including high-resolution genomic profiling (single nucleotide polymorphism (SNP)-array) of gangliogliomas are scarce but necessary for a better oncological understanding of this tumor entity. For a detailed characterization at the single cell and cell population levels, we analyzed genomic alterations of three gangliogliomas using trypsin-Giemsa banding (GTG-banding) and by spectral karyotyping (SKY) in combination with SNP-array and gene expression array experiments. By GTG and SKY, we could confirm frequently detected chromosomal aberrations (losses within chromosomes 10, 13 and 22; gains within chromosomes 5, 7, 8 and 12), and identify so far unknown genetic aberrations like the unbalanced non-reciprocal translocation t(1;18)(q21;q21). Interestingly, we report on the second so far detected ganglioglioma with ring chromosome 1. Analyses of SNP-array data from two of the tumors and respective germline DNA (peripheral blood) identified few small gains and losses and a number of copy-neutral regions with loss of heterozygosity (LOH) in germline and in tumor tissue. In comparison to germline DNA, tumor tissues did not show substantial regions with significant loss or gain or with newly developed LOH. Gene expression analyses of tumor-specific genes revealed similarities in the profile of the analyzed samples regarding different relevant pathways. Taken together, we describe overlapping but also distinct and novel genetic aberrations of three gangliogliomas. © 2014 Japanese Society of Neuropathology.
Strain-specific genomic diversity in the Mycobacterium tuberculosis complex (MTBC) is an important factor in pathogenesis that may affect virulence, transmissibility, host response and emergence of drug resistance. Several systems have been proposed to classify MTBC strains into distinct lineages and families. Here, we investigate single-nucleotide polymorphisms (SNPs) as robust (stable) markers of genetic variation for phylogenetic analysis. We identify ∼92k SNPs across a global collection of 1,601 genomes. The SNP-based phylogeny is consistent with the gold-standard regions of difference (RD) classification system. Of the ∼7k strain-specific SNPs identified, 62 markers are proposed to discriminate known circulating strains. This SNP-based barcode is the first to cover all main lineages, and classifies a greater number of sublineages than current alternatives. It may be used to classify clinical isolates to evaluate tools to control the disease, including therapeutics and vaccines whose effectiveness may vary by strain type. © 2014 Macmillan Publishers Limited.
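A SNP barcode of this kind amounts to a lookup from marker positions to lineage-defining alleles. The sketch below shows that lookup under stated assumptions: the lineage names, genome positions and alleles are placeholders, since the abstract does not list the 62 published markers.

```python
# Hypothetical barcode: lineage -> {genome position: lineage-defining allele}.
# These positions and alleles are placeholders, not the published markers.
BARCODE = {
    "lineage_A": {1000: "T", 2500: "G"},
    "lineage_B": {1800: "C"},
}

def assign_lineages(calls, barcode=BARCODE):
    """Return lineages whose marker alleles are all present in an isolate.

    calls: dict mapping genome position -> observed base for one isolate.
    """
    hits = []
    for lineage, markers in barcode.items():
        # A lineage is assigned only if every one of its markers matches.
        if all(calls.get(pos) == allele for pos, allele in markers.items()):
            hits.append(lineage)
    return sorted(hits)
```

Requiring every marker to match is one possible design choice; a tolerant variant could accept a lineage when most markers agree, which would be more robust to missing calls.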
Gonçalves da Silva, Anders; Barendse, William; Kijas, James W; Barris, Wes C; McWilliam, Sean; Bunch, Rowan J; McCullough, Russell; Harrison, Blair; Hoelzel, A Rus; England, Phillip R
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms apply a strategy for identifying putative SNPs based on filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep-sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact of not accounting for systematic sequencing errors when filtering polymorphisms identified during SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize 'bycatch', that is, polymorphisms caused by sequencing or assembly error. An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand-bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single-copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms. © 2014 John Wiley & Sons Ltd.
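Three of the predictors named above (depth of coverage, variant-supporting reads and strand bias) lend themselves to a simple candidate filter, sketched here. The thresholds, the strand-bias formula and the record layout are assumptions chosen for illustration; the abstract does not state the rules the authors applied.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    depth: int        # total reads covering the site
    alt_forward: int  # variant-supporting reads on the forward strand
    alt_reverse: int  # variant-supporting reads on the reverse strand

def passes_filters(c, min_depth=10, min_alt_reads=3, max_strand_bias=0.8):
    """Keep a candidate only if it clears all three (hypothetical) thresholds."""
    alt_reads = c.alt_forward + c.alt_reverse
    if c.depth < min_depth or alt_reads < min_alt_reads:
        return False
    # Strand bias here: 0.0 = variant reads split evenly, 1.0 = one strand only.
    strand_bias = abs(c.alt_forward - c.alt_reverse) / alt_reads
    return strand_bias <= max_strand_bias
```

A variant seen on only one strand is a common signature of systematic error, which is the kind of artefact the abstract argues random-error filters miss.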
Mota, R R; Guimarães, S E F; Fortes, M R S; Hayes, B; Silva, F F; Verardo, L L; Kelly, M J; de Campos, C F; Guimarães, J D; Wenceslau, R R; Penitente-Filho, J M; Garcia, J F; Moore, S
We performed a genome-wide mapping for the age at first calving (AFC) with the goal of annotating candidate genes that regulate fertility in Nellore cattle. Phenotypic data from 762 cows and 777k SNP genotypes from 2,992 bulls and cows were used. Single nucleotide polymorphism (SNP) effects based on the single-step GBLUP methodology were blocked into adjacent windows of 1 Megabase (Mb) to explain the genetic variance. SNP windows explaining more than 0.40% of the AFC genetic variance were identified on chromosomes 2, 8, 9, 14, 16 and 17. From these windows, we identified 123 protein-coding genes that were used to build gene networks. From the association study and derived gene networks, putative candidate genes (e.g., PAPPA, PREP, FER1L6, TPR, NMNAT1, ACAD10, PCMTD1, CRH, OPKR1, NPBWR1 and NCOA2) and transcription factors (TF) (STAT1, STAT3, RELA, E2F1 and EGR1) were strongly associated with female fertility (e.g., negative regulation of luteinizing hormone secretion, folliculogenesis and establishment of uterine receptivity). Evidence suggests that AFC inheritance is complex and controlled by multiple loci across the genome. As several windows explaining a higher proportion of the genetic variance were identified on chromosome 14, further studies investigating the interaction across haplotypes to better understand the molecular architecture behind AFC in Nellore cattle should be undertaken. © 2017 Blackwell Verlag GmbH.
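The windowing step can be illustrated by binning SNPs into adjacent 1 Mb windows and expressing each window's share of the total as a percentage. This is a simplified sketch: it assumes each SNP already has an additive variance contribution, whereas single-step GBLUP software derives window variances from the estimated SNP effects and genotypes. The function name and input layout are hypothetical.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1 Mb windows, as in the abstract

def window_variance_percent(snps, window_bp=WINDOW_BP):
    """Percentage of total variance attributed to each genomic window.

    snps: iterable of (chromosome, position_bp, variance_contribution),
          where the per-SNP contribution is an assumed, precomputed input.
    """
    totals = defaultdict(float)
    for chrom, pos, var in snps:
        # Integer division assigns each SNP to a non-overlapping window.
        totals[(chrom, pos // window_bp)] += var
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {win: 100.0 * v / grand_total for win, v in totals.items()}
```

Windows exceeding a chosen threshold (0.40% in the abstract) would then be carried forward for gene annotation.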
Lane, Jérôme; McLaren, Paul J.; Dorrell, Lucy; Shianna, Kevin V.; Stemke, Amanda; Pelak, Kimberly; Moore, Stephen; Oldenburg, Johannes; Alvarez-Roman, Maria Teresa; Angelillo-Scherrer, Anne; Boehlen, Francoise; Bolton-Maggs, Paula H.B.; Brand, Brigit; Brown, Deborah; Chiang, Elaine; Cid-Haro, Ana Rosa; Clotet, Bonaventura; Collins, Peter; Colombo, Sara; Dalmau, Judith; Fogarty, Patrick; Giangrande, Paul; Gringeri, Alessandro; Iyer, Rathi; Katsarou, Olga; Kempton, Christine; Kuriakose, Philip; Lin, Judith; Makris, Mike; Manco-Johnson, Marilyn; Tsakiris, Dimitrios A.; Martinez-Picado, Javier; Mauser-Bunschoten, Evelien; Neff, Anne; Oka, Shinichi; Oyesiku, Lara; Parra, Rafael; Peter-Salonen, Kristiina; Powell, Jerry; Recht, Michael; Shapiro, Amy; Stine, Kimo; Talks, Katherine; Telenti, Amalio; Wilde, Jonathan; Yee, Thynn Thynn; Wolinsky, Steven M.; Martinson, Jeremy; Hussain, Shehnaz K.; Bream, Jay H.; Jacobson, Lisa P.; Carrington, Mary; Goedert, James J.; Haynes, Barton F.; McMichael, Andrew J.; Goldstein, David B.; Fellay, Jacques
Human genetic variation contributes to differences in susceptibility to HIV-1 infection. To search for novel host resistance factors, we performed a genome-wide association study (GWAS) in hemophilia patients highly exposed to potentially contaminated factor VIII infusions. Individuals with hemophilia A and a documented history of factor VIII infusions before the introduction of viral inactivation procedures (1979–1984) were recruited from 36 hemophilia treatment centers (HTCs), and their genome-wide genetic variants were compared with those from matched HIV-infected individuals. Homozygous carriers of known CCR5 resistance mutations were excluded. Single nucleotide polymorphisms (SNPs) and inferred copy number variants (CNVs) were tested using logistic regression. In addition, we performed a pathway enrichment analysis, a heritability analysis, and a search for epistatic interactions with CCR5 Δ32 heterozygosity. A total of 560 HIV-uninfected cases were recruited: 36 (6.4%) were homozygous for CCR5 Δ32 or m303. After quality control and SNP imputation, we tested 1 081 435 SNPs and 3686 CNVs for association with HIV-1 serostatus in 431 cases and 765 HIV-infected controls. No SNP or CNV reached genome-wide significance. The additional analyses did not reveal any strong genetic effect. Highly exposed, yet uninfected hemophiliacs form an ideal study group to investigate host resistance factors. Using a genome-wide approach, we did not detect any significant associations between SNPs and HIV-1 susceptibility, indicating that common genetic variants of major effect are unlikely to explain the observed resistance phenotype in this population. PMID:23372042
Background: Diagnostic analysis of patients with developmental disorders has improved over recent years largely due to the use of microarray technology. Array methods that facilitate copy number analysis have enabled the diagnosis of up to 20% more patients with previously normal karyotyping results. A substantial number of patients remain undiagnosed, however. Methods and Results: Using the Genome-Wide Human SNP array 6.0, we analyzed 35 patients with a developmental disorder of unknown cause and normal array comparative genomic hybridization (array CGH) results, in order to characterize previously undefined genomic aberrations. We detected no seemingly pathogenic copy number aberrations. Most of the vast amount of data produced by the array was polymorphic and non-informative. Filtering of this data, based on copy number variant (CNV) population frequencies as well as phenotypically relevant genes, enabled pinpointing regions of allelic homozygosity that included candidate genes correlating to the phenotypic features in four patients, but results could not be confirmed. Conclusions: In this study, the use of an ultra high-resolution SNP array did not contribute to further diagnose patients with developmental disorders of unknown cause. The statistical power of these results is limited by the small size of the patient cohort, and interpretation of these negative results can only be applied to the patients studied here. We present the results of our study and the recurrence of clustered allelic homozygosity present in this material, as detected by the SNP 6.0 array.
A high density genetic linkage map for the complex allotetraploid crop species Brassica napus (oilseed rape) was constructed in a late-generation recombinant inbred line (RIL) population, using genome-wide single nucleotide polymorphism (SNP) markers assayed by the Brassica 60 K Infinium BeadChip Array. The linkage map contains 9164 SNP markers covering 1832.9 cM. 1232 bins account for 7648 of the markers. A subset of 2795 SNP markers, with an average distance of 0.66 cM between adjacent markers, was applied for QTL mapping of seed colour and the cell wall fiber components acid detergent lignin (ADL), cellulose and hemicellulose. After phenotypic analyses across four different environments a total of 11 QTL were detected for seed colour and fiber traits. The high-density map considerably improved QTL resolution compared to the previous low-density maps. A previously identified major QTL with very high effects on seed colour and ADL was pinpointed to a narrow genome interval on chromosome A09, while a minor QTL explaining 8.1% to 14.1% of variation for ADL was detected on chromosome C05. Five and three QTL accounting for 4.7% to 21.9% and 7.3% to 16.9% of the phenotypic variation for cellulose and hemicellulose, respectively, were also detected. To our knowledge this is the first description of QTL for seed cellulose and hemicellulose in B. napus, representing interesting new targets for improving oil content. The high density SNP genetic map enables navigation from interesting B. napus QTL to Brassica genome sequences, giving useful new information for understanding the genetics of key seed quality traits in rapeseed.
Polyploid species play significant roles in agriculture and food production. Many crop species are polyploid, such as potato, wheat, strawberry, and sugarcane. Genotyping has been a daunting task for genetic studies of polyploid crops, which lag far behind the diploid crop species. The single nucleotide polymorphism (SNP) array is considered to be one of the high-throughput, relatively cost-efficient and automated genotyping approaches. However, there are significant challenges for SNP identification in complex, polyploid genomes, which have seriously slowed SNP discovery and array development in polyploid species. Ploidy is a significant factor impacting SNP qualities and validation rates of SNP markers in SNP arrays, which have been proven to be a very important tool for genetic studies and molecular breeding. In this review, we (1) discussed the pros and cons of SNP arrays in general for high-throughput genotyping, (2) presented the challenges of and solutions to SNP calling in polyploid species, (3) summarized the SNP selection criteria and considerations of SNP array design for polyploid species, (4) illustrated SNP array applications in several different polyploid crop species, then (5) discussed challenges, available software, and their accuracy comparisons for genotype calling based on SNP array data in polyploids, and finally (6) provided a series of SNP array design and genotype calling recommendations. This review presents a complete overview of SNP array development and applications in polyploid crops, which will benefit research in molecular breeding and genetics of crops with complex genomes.
Most of the previously reported loci for total immunoglobulin E (IgE) levels are related to Th2 cell-dependent pathways. We undertook a genome-wide association study (GWAS) to identify genetic loci responsible for IgE regulation. A total of 479,940 single nucleotide polymorphisms (SNPs) were tested for association with total serum IgE levels in 1180 Japanese adults. Fine-mapping with SNP imputation demonstrated 6 candidate regions: the PYHIN1/IFI16, MHC classes I and II, LEMD2, GRAMD1B, and chr13:60576338 regions. Replication of these candidate loci in each region was assessed in 2 independent Japanese cohorts (n = 1110 and 1364, respectively). SNP rs3130941 in the HLA-C region was consistently associated with total IgE levels in 3 independent populations, and the meta-analysis yielded genome-wide significance (P = 1.07×10⁻¹⁰). Using our GWAS results, we also assessed the reproducibility of previously reported gene associations with total IgE levels. Nine of 32 candidate genes identified by a literature search were associated with total IgE levels after correction for multiple testing. Our findings demonstrate that SNPs in the HLA-C region are strongly associated with total serum IgE levels in the Japanese population and that some of the previously reported genetic associations are replicated across ethnic groups.
The mouse double minute 2 (MDM2) gene encodes a phosphoprotein that interacts with P53 and negatively regulates its activity. The SNP309 polymorphism (T-G) in the promoter of the MDM2 gene has been reported to be associated with enhanced MDM2 expression and tumor development. Studies investigating the association between the MDM2 SNP309 polymorphism and colorectal cancer (CRC) risk reported conflicting results. We performed a meta-analysis of all available studies to explore the association of this polymorphism with CRC risk. All studies published up to July 2013 on the association between the MDM2 SNP309 polymorphism and CRC risk were identified by searching the electronic databases PubMed, EMBASE, and the Chinese Biomedical Literature database (CBM). The association between the MDM2 SNP309 polymorphism and CRC risk was assessed by odds ratios (ORs) together with their 95% confidence intervals (CIs). A total of 14 case-control studies including 4460 CRC cases and 4828 controls were identified. We did not find a significant association between the MDM2 SNP309 polymorphism and CRC risk in any genetic model in the overall population. However, in subgroup analysis by ethnicity, significant associations were found in Asians (TG vs. TT: OR = 1.197, 95% CI = 1.055-1.358, P = 0.005; GG+TG vs. TT: OR = 1.246, 95% CI = 1.106-1.404, P = 0.000) and Africans. When stratified by HWE in controls, significantly increased risk was also found among the studies consistent with HWE (TG vs. TT: OR = 1.166, 95% CI = 1.037-1.311, P = 0.010). In subgroup analyses according to p53 mutation status and gender, no significant association was detected. The present meta-analysis suggests that MDM2 is a candidate gene for CRC susceptibility. The MDM2 SNP309 polymorphism may be a risk factor for CRC in Asians.
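For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained for a single genotype contrast such as TG vs. TT, a minimal Python sketch follows. It uses the standard log-odds (Woolf) approximation for one 2x2 table; the function name and argument layout are hypothetical, and the pooled estimates in the abstract come from a meta-analysis across studies, which this sketch does not attempt to reproduce.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald-type confidence interval for one 2x2 table.

    a: cases with the risk genotype      b: cases with the reference genotype
    c: controls with the risk genotype   d: controls with the reference genotype
    z: normal quantile; 1.959964 gives a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level, which is how intervals such as 1.055-1.358 in the abstract are read.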
Gross, Arnd; Tönjes, Anke; Scholz, Markus
When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression assuming independent normally distributed residuals results in an increased type I error and the power of the test is also affected in a more complicated manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977 characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness. We derive explicit and easy-to-apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. Relatedness structure also impacts the degree of variance inflation as shown for example family structures. Variance inflation is smallest for HapMap trios, followed by a synthetic family study corresponding to the trio data but with larger sample size than HapMap. The next strongest inflation is observed for the Sorbs, and finally, for a synthetic family study with a more extreme relatedness structure but with similar sample size as the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation while the opposite holds for larger significance levels. When genomic control
Laing, Chad R; Buchanan, Cody; Taboada, Eduardo N; Zhang, Yongxiang; Karmali, Mohamed A; Thomas, James E; Gannon, Victor Pj
Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. Future genotyping methods should provide data that can be stored centrally and
Karmali Mohamed A
Background: Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. Results: In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. Conclusion: The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. Future genotyping
Nielsen, Rasmus; Williamson, Scott; Kim, Yuseob
of the selection coefficient. To illustrate the method, we apply our approach to data from the Seattle SNP project and to Chromosome 2 data from the HapMap project. In Chromosome 2, the most extreme signal is found in the lactase gene, which previously has been shown to be undergoing positive selection. Evidence...
Recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models, but also on the quality and quantity of variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly with low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at both ends of reads or sites containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low-coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).
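The F-score used to compare SNP callers combines sensitivity (recall) and precision into one number. A minimal sketch follows, assuming the balanced F1 form (the harmonic mean of precision and recall); the abstract does not state which F-score variant was used, and the function name is hypothetical.

```python
def f_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-score (F1) from counts of called SNPs.

    true_pos:  called SNPs that are real
    false_pos: called SNPs that are not real
    false_neg: real SNPs that were not called
    """
    if true_pos == 0:
        return 0.0  # no correct calls, so both precision and recall are 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two terms, a caller cannot obtain a high F-score by trading away sensitivity for precision or the reverse, which is why it suits low-coverage comparisons where both tend to suffer.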
Intramuscular fat (IMF) content and fatty acid composition affect the organoleptic quality and nutritional value of pork. A genome-wide association study was performed on 138 Duroc pigs genotyped with a 60k SNP chip to detect biologically relevant genomic variants influencing fat content and composition. Despite the limited sample size, the genome-wide association study was powerful enough to detect the association between fatty acid composition and a known haplotypic variant in SCD (SSC14) and to reveal an association of IMF and fatty acid composition in the LEPR region (SSC6). The association of LEPR was later validated with an independent set of 853 pigs using a candidate quantitative trait nucleotide. The SCD gene is responsible for the biosynthesis of oleic acid (C18:1) from stearic acid. This locus affected the stearic to oleic desaturation index (C18:1/C18:0), C18:1, and saturated (SFA) and monounsaturated (MUFA) fatty acid content. These effects were consistently detected in gluteus medius, longissimus dorsi, and subcutaneous fat. The association of LEPR with fatty acid composition was detected only in muscle and was, at least in part, a consequence of its effect on IMF content, with increased IMF resulting in more SFA, less polyunsaturated fatty acids (PUFA), and a greater SFA/PUFA ratio. Marker substitution effects estimated with a subset of 65 animals were used to predict the genomic estimated breeding values of 70 animals born 7 years later. Although predictions with the whole SNP chip information were in relatively high correlation with observed SFA, MUFA, and C18:1/C18:0 (0.48-0.60), IMF content and composition were in general better predicted by using only SNPs at the SCD and LEPR loci, in which case the correlation between predicted and observed values was in the range of 0.36 to 0.54 for all traits. Results indicate that markers in the SCD and LEPR genes can be useful to select for optimum fatty acid profiles of pork.
Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs). We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS) model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3), p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS) model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implications for implementing GS in plant breeding programs.
Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16
Dalia Z. Alomari
Mineral concentrations in cereals are important for human health, especially for people who depend mainly on a cereal diet. In this study, we carried out a genome-wide association study (GWAS) of calcium concentrations in wheat (Triticum aestivum L.) grains using a European wheat diversity panel of 353 varieties [339 winter wheat (WW) plus 14 spring wheat (SW)] and phenotypic data based on two field seasons. High genotyping densities of single-nucleotide polymorphism (SNP) markers were obtained from the application of the 90k iSELECT ILLUMINA chip and a 35k Affymetrix chip. Inductively coupled plasma optical emission spectrometry (ICP-OES) was used to measure the calcium concentrations of the wheat grains. Best linear unbiased estimates (BLUEs) for calcium were calculated across the seasons and ranged from 288.20 to 647.50 μg g⁻¹ DW among the varieties, with a mean of 438.102 μg g⁻¹ DW, and the heritability was 0.73. A total of 485 SNP marker–trait associations (MTAs) were detected in data obtained from grains cultivated in both of the two seasons and BLUE values by considering associations with a -log10(P-value) ≥ 3.0. Among these SNP markers, we detected 276 markers with a positive allele effect and 209 markers with a negative allele effect. These MTAs were found on all chromosomes except chromosomes 3D, 4B, and 4D. The most significant association was located on chromosome 5A (114.5 cM) and was linked to a gene encoding cation/sugar symporter activity as a potential candidate gene. Additionally, a number of candidate genes for the uptake or transport of calcium were located near significantly associated SNPs. This analysis highlights a number of genomic regions and candidate genes for further analysis as well as the challenges faced when mapping environmentally variable traits in genetically highly diverse variety panels. The research demonstrates the feasibility of the GWAS approach for illuminating the genetic architecture of
Fan, B; Gorbach, D M; Rothschild, M F
Significant progress on pig genetics and genomics research has been witnessed in recent years due to the integration of advanced molecular biology techniques, bioinformatics and computational biology, and the collaborative efforts of researchers in the swine genomics community. Progress on expanding the linkage map has slowed down, but the efforts have created a higher-resolution physical map integrating the clone map and BAC end sequence. The number of QTL mapped is still growing and most of the updated QTL mapping results are available through PigQTLdb. Additionally, expression studies using high-throughput microarrays and other gene expression techniques have made significant advancements. The number of identified non-coding RNAs is rapidly increasing and their exact regulatory functions are being explored. A publishable draft (build 10) of the swine genome sequence was available for the pig genomics community by the end of December 2010. Build 9 of the porcine genome is currently available with Ensembl annotation; manual annotation is ongoing. These drafts provide useful tools for such endeavors as comparative genomics and SNP scans for fine QTL mapping. A recent community-wide effort to create a 60K porcine SNP chip has greatly facilitated whole-genome association analyses, haplotype block construction and linkage disequilibrium mapping, which can contribute to whole-genome selection. The future 'systems biology' that integrates and optimizes the information from all research levels can enhance the pig community's understanding of the full complexity of the porcine genome. These recent technological advances and where they may lead are reviewed. Copyright © 2011 S. Karger AG, Basel.
Hari Deo Upadhyaya
Identification of potential genes/alleles governing the complex seed-protein content (SPC) trait is essential in marker-assisted breeding for quality trait improvement of chickpea. Hence, the present study utilized an integrated genomics-assisted breeding strategy encompassing trait association analysis, selective genotyping in a traditional bi-parental mapping population and differential expression profiling for the first time to understand the complex genetic architecture of the quantitative SPC trait in chickpea. For GWAS (genome-wide association study), high-throughput genotyping information of 16376 genome-based SNPs (single nucleotide polymorphisms) discovered from a structured population of 336 sequenced desi and kabuli accessions [with 150-200 kb LD (linkage disequilibrium) decay] was utilized. This led to identification of the seven most effective genomic loci (genes) associated [10 to 20%, with 41% combined PVE (phenotypic variation explained)] with the SPC trait in chickpea. Regardless of the diverse desi and kabuli genetic backgrounds, a comparable level of association potential of the identified seven genomic loci with the SPC trait was observed. Five SPC-associated genes were validated successfully in parental accessions and homozygous individuals of an intra-specific desi RIL (recombinant inbred line) mapping population (ICC 12299 x ICC 4958) by selective genotyping. The seed-specific expression, including differential up-regulation (> 4-fold) of six SPC-associated genes particularly in accessions, parents and homozygous individuals of the aforementioned mapping population with a high level of contrasting seed-protein content (21-22%) was evident. Collectively, the integrated genomic approach delineated diverse naturally occurring novel functional SNP allelic variants in six potential candidate genes regulating the SPC trait in chickpea. Of these, a non-synonymous SNP allele-carrying zinc finger transcription factor gene exhibiting strong association with SPC trait
Zhen-lin ZHANG; Jin-wei HE; Yue-juan QIN; Yun-qiu HU; Miao LI; Yu-juan LIU; Hao ZHANG; Wei-wei HU
Aim: To assess the contribution of single nucleotide polymorphisms (SNPs) and haplotypes in the peroxisome proliferator-activated receptor-γ co-activator-1 (PPARGC1) and adiponectin genes to normal bone mineral density (BMD) variation in healthy Chinese women and men. Methods: We performed population-based (ANOVA) and family-based (quantitative trait locus transmission disequilibrium test) association studies of the PPARGC1 and adiponectin genes. SNPs in the 2 genes were genotyped. BMD was measured using dual-energy X-ray absorptiometry in the lumbar spine and hip in 401 nuclear families with a total of 1260 subjects, including 458 premenopausal women, 20-40 years of age; 401 postmenopausal women (mothers), 43-74 years of age; and 401 men (fathers), 49-76 years of age. Results: Significant within-family association was found between the Thr394Thr polymorphism in the PPARGC1 gene and peak BMD in the femoral neck (P=0.026). Subsequent permutations were in agreement with this significant within-family association result (P=0.016), but the Thr394Thr SNP only accounted for 0.7% of the variation in femoral neck peak BMD. However, no significant within-family association was detected between any SNP in the adiponectin gene and peak BMD. Although no significant association was found between BMD and SNPs in the PPARGC1 and adiponectin genes in either men or postmenopausal women, haplotype 2 (T-T) in the adiponectin gene was associated with lumbar spine BMD in postmenopausal women (P=0.019). Conclusion: Our findings suggest that the Thr394Thr SNP in the PPARGC1 gene was associated with peak BMD in the femoral neck in Chinese women. Confirmation of our results is needed in other populations and with more functional markers within and flanking the PPARGC1 or adiponectin gene regions.
Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D
Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array-more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to
The transcription factor NRF2 plays a pivotal role in protecting normal cells from external toxic challenges and oxidative stress, whereas it can also endow cancer cells with resistance to anticancer drugs. At present little information is available about the genetic polymorphisms of the NRF2 gene and their clinical relevance. We aimed to investigate the single nucleotide polymorphisms in the NRF2 gene as a prognostic biomarker in lung cancer. We prepared genomic DNA samples from 387 Japanese patients with primary lung cancer and detected the SNP (c.-617C>A; rs6721961) in the ARE-like loci of the human NRF2 gene by the rapid genetic testing method we developed in this study. We then analyzed the association between the SNP in the NRF2 gene and patients' overall survival. Patients harboring wild-type (WT) homozygous (c.-617C/C), SNP heterozygous (c.-617C/A), and SNP homozygous (c.-617A/A) alleles numbered 216 (55.8%), 147 (38.0%), and 24 (6.2%), respectively. Multivariate logistic regression models revealed that the SNP homozygote (c.-617A/A) was significantly related to gender. Its frequency was four-fold higher in female patients than in males (10.8% female vs 2.7% male) and was associated with female non-smokers with adenocarcinoma. Interestingly, lung cancer patients carrying NRF2 SNP homozygous alleles (c.-617A/A) and the 309T (WT) allele in the MDM2 gene exhibited remarkable survival over 1,700 days after surgical operation (log-rank p = 0.021). SNP homozygous (c.-617A/A) alleles in the NRF2 gene are associated with female non-smokers with adenocarcinoma and regarded as a prognostic biomarker for assessing overall survival of patients with lung adenocarcinoma.
Wattacheril, Julia; Lavine, Joel E; Chalasani, Naga P; Guo, Xiuqing; Kwon, Soonil; Schwimmer, Jeffrey; Molleston, Jean P; Loomba, Rohit; Brunt, Elizabeth M; Chen, Yii-Der Ida; Goodarzi, Mark O; Taylor, Kent D; Yates, Katherine P; Tonascia, James; Rotter, Jerome I
To identify genetic loci associated with features of histologic severity of nonalcoholic fatty liver disease in a cohort of Hispanic boys. There were 234 eligible Hispanic boys age 2-17 years with clinical, laboratory, and histologic data enrolled in the Nonalcoholic Steatohepatitis Clinical Research Network included in the analysis of 624 297 single nucleotide polymorphisms (SNPs). After the elimination of 4 outliers and 22 boys with cryptic relatedness, association analyses were performed on 208 DNA samples with corresponding liver histology. Logistic regression analyses were carried out for qualitative traits and linear regression analyses were applied for quantitative traits. The median age and body mass index z-score were 12.0 years (IQR, 11.0-14.0) and 2.4 (IQR, 2.1-2.6), respectively. The nonalcoholic fatty liver disease activity score (scores 1-4 vs 5-8) was associated with SNP rs11166927 on chromosome 8 in the TRAPPC9 region (P = 8.7 × 10^-7). Fibrosis stage was associated with SNP rs6128907 on chromosome 20, near actin related protein 5 homolog (P = 9.9 × 10^-7). In comparing our results in Hispanic boys with those of previously reported SNPs in adult nonalcoholic steatohepatitis, 2 of 26 susceptibility loci were associated with nonalcoholic fatty liver disease activity score and 2 were associated with fibrosis stage. In this discovery genome-wide association study, we found significant novel gene effects on histologic traits associated with nonalcoholic fatty liver disease activity score and fibrosis that are distinct from those previously recognized by adult nonalcoholic fatty liver disease genome-wide association studies. Copyright © 2017 Elsevier Inc. All rights reserved.
Cui, Peng; Ding, Feng; Lin, Qiang; Zhang, Lingfang; Li, Ang; Zhang, Zhang; Hu, Songnian; Yu, Jun
Here, we evaluate the contribution of two major biological processes—DNA replication and transcription—to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, high-resolution replicating map of Hela cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to locate in late-replicating genomic regions and genes in such regions have a higher SNP density compared to those in early-replication regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.
Kulkarni, Krishnanand P; Patil, Gunvant; Valliyodan, Babu; Vuong, Tri D; Shannon, J Grover; Nguyen, Henry T; Lee, Jeong-Dong
The objective of this study was to determine the genetic relationship between the oleic acid and protein content. The genotypes having high oleic acid and elevated protein (HOEP) content were crossed with five elite lines having normal oleic acid and average protein (NOAP) content. The selected accessions were grown at six environments in three different locations and phenotyped for protein, oil, and fatty acid components. The mean protein content of parents, HOEP, and NOAP lines was 34.6%, 38%, and 34.9%, respectively. The oleic acid concentration of parents, HOEP, and NOAP lines was 21.7%, 80.5%, and 20.8%, respectively. The HOEP plants carried both FAD2-1A (S117N) and FAD2-1B (P137R) mutant alleles contributing to the high oleic acid phenotype. Comparative genome analysis using whole-genome resequencing data identified six genes having single nucleotide polymorphism (SNP) significantly associated with the traits analyzed. A single SNP in the putative gene Glyma.10G275800 was associated with the elevated protein content, and palmitic, oleic, and linoleic acids. The genes from the marker intervals of previously identified QTL did not carry SNPs associated with protein content and fatty acid composition in the lines used in this study, indicating that all the genes except Glyma.10G278000 may be the new genes associated with the respective traits.
Rebekah E Oliver
A physically anchored consensus map is foundational to modern genomics research; however, construction of such a map in oat (Avena sativa L., 2n = 6x = 42) has been hindered by the size and complexity of the genome, the scarcity of robust molecular markers, and the lack of aneuploid stocks. Resources developed in this study include a modified SNP discovery method for complex genomes, a diverse set of oat SNP markers, and a novel chromosome-deficient SNP anchoring strategy. These resources were applied to build the first complete, physically-anchored consensus map of hexaploid oat. Approximately 11,000 high-confidence in silico SNPs were discovered based on nine million inter-varietal sequence reads of genomic and cDNA origin. GoldenGate genotyping of 3,072 SNP assays yielded 1,311 robust markers, of which 985 were mapped in 390 recombinant-inbred lines from six bi-parental mapping populations ranging in size from 49 to 97 progeny. The consensus map included 985 SNPs and 68 previously-published markers, resolving 21 linkage groups with a total map distance of 1,838.8 cM. Consensus linkage groups were assigned to 21 chromosomes using SNP deletion analysis of chromosome-deficient monosomic hybrid stocks. Alignments with sequenced genomes of rice and Brachypodium provide evidence for extensive conservation of genomic regions, and renewed encouragement for orthology-based genomic discovery in this important hexaploid species. These results also provide a framework for high-resolution genetic analysis in oat, and a model for marker development and map construction in other species with complex genomes and limited resources.
Kadri, N.K.; Koks, P.D.; Meuwissen, T.H.E.
Background: A newly recognized type of genetic variation, Copy Number Variation (CNV), is detected in mammalian genomes, e.g. the cattle genome. This form of variation can potentially cause phenotypic variation. Our objective was to determine whether dense SNP (single nucleotide polymorphisms)
Bonnici, Vincenzo; Manca, Vincenzo
In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
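As a rough illustration of the k-mer setup in the abstract above, the following sketch picks k from the genome length and computes the Shannon entropy of the empirical k-mer distribution. It reads lg2(n) as the base-2 logarithm and rounds it to the nearest integer; both choices, the function names, and the toy sequence are assumptions made for illustration, not the authors' definitions of their informational indexes.

```python
import math
from collections import Counter


def kmer_length(n: int) -> int:
    """Pick k close to log2(n); rounding to an integer is an assumption."""
    return max(1, round(math.log2(n)))


def kmer_entropy(genome: str, k: int) -> float:
    """Shannon entropy (in bits) of the overlapping k-mers of `genome`."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy sequence (hypothetical): a highly repetitive 256-base "genome".
genome = "ACGT" * 64
k = kmer_length(len(genome))       # log2(256) = 8
entropy = kmer_entropy(genome, k)  # low, since only four distinct 8-mers occur
max_entropy = math.log2(len(genome) - k + 1)  # upper bound: all k-mers distinct
```

Comparing `entropy` with `max_entropy` gives a crude sense of how far a sequence is from one in which every k-mer is distinct; the comparisons with random genomes described in the abstract are more elaborate than this.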
Gene set analysis is a powerful tool for interpreting a genome-wide association study result and is gaining popularity these days. Comparison of the gene sets obtained for a variety of traits measured from a single genetic epidemiology dataset may give insights into the biological mechanisms underlying these traits. Based on the previously published single nucleotide polymorphism (SNP) genotype data on 8,842 individuals enrolled in the Korea Association Resource project, we performed a series of systematic genome-wide association analyses for 49 quantitative traits of basic epidemiological, anthropometric, or blood chemistry parameters. Each analysis result was subjected to subsequent gene set analyses based on Gene Ontology (GO) terms using gene set analysis software, GSA-SNP, identifying a set of GO terms significantly associated to each trait (pcorr < 0.05). Pairwise comparison of the traits in terms of the semantic similarity in their GO sets revealed surprising cases where phenotypically uncorrelated traits showed high similarity in terms of biological pathways. For example, the pH level was related to 7 other traits that showed low phenotypic correlations with it. A literature survey implies that these traits may be regulated partly by common pathways that involve neuronal or nerve systems.
Single nucleotide polymorphisms (SNPs) are widely used in genetics and genomics research. The Pacific oyster (Crassostrea gigas) is an economically and ecologically important marine bivalve, and it possesses one of the highest levels of genomic DNA variation among animal species. Pacific oyster SNPs have been extensively investigated; however, the mechanisms by which these SNPs may be used in a high-throughput, transferable, and economical manner remain to be elucidated. Here, we constructed an oyster 190K SNP array using Affymetrix Axiom genotyping technology. We designed 190,420 SNPs on the chip; these SNPs were selected from 54 million SNPs identified through re-sequencing of 472 Pacific oysters collected in China, Japan, Korea, and Canada. Our genotyping results indicated that 133,984 (70.4%) SNPs were polymorphic and successfully converted on the chip. The SNPs were distributed evenly throughout the oyster genome, located in 3,595 scaffolds with a length of ~509.4 million; the average interval spacing was 4,210 bp. In addition, 111,158 SNPs were distributed in 21,050 coding genes, with an average of 5.3 SNPs per gene. In comparison with genotypes obtained through re-sequencing, ~69% of the converted SNPs had a concordance rate of >0.971; the mean concordance rate was 0.966. Evaluation based on genotypes of full-sib family individuals revealed that the average genotyping accuracy rate was 0.975. Carrying 133 K polymorphic SNPs, our oyster 190K SNP array is the first commercially available high-density SNP chip for mollusks, with the highest throughput. It represents a valuable tool for oyster genome-wide association studies, fine linkage mapping, and population genetics.
Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés
The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial sequence data can improve upon existing SNP (single nucleotide polymorphism) array-based selection strategies by simulation using a mixed coalescence - gene-dropping approach. We simulated 20 or 100 causal mutations (quantitative trait nucleotides, QTN) within 65 predefined 'gene' regions, each 10 kb long, within a genome composed of ten 3-Mb chromosomes. We compared prediction accuracy by cross-validation using a medium-density chip (7.5 k SNPs), a high-density (HD, 17 k) and sequence data (335 k). Genetic evaluation was based on a GBLUP method. The simulations showed: (1) a law of diminishing returns with increasing number of SNPs; (2) a modest effect of SNP ascertainment bias in arrays; (3) a small advantage of using whole-genome sequence data vs. HD arrays i.e. ~4%; (4) a minor effect of NGS errors except when imputation error rates are high (≥20%); and (5) if QTN were known, prediction accuracy approached 1. Since this is obviously unrealistic, we explored milder assumptions. We showed that, if all SNPs within causal genes were included in the prediction model, accuracy could also dramatically increase by ~40%. However, this criterion was highly sensitive to either misspecification (including wrong genes) or to the use of an incomplete gene list; in these cases, accuracy fell rapidly towards that reached when all SNPs from sequence data were blindly included in the model. Our study shows that, unless an accurate prior estimate on the functionality of SNPs can be included in the predictor, there is a law of diminishing returns with increasing SNP density. As a result, use of whole-genome sequence data may not result in a highly increased selection response over high
Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros
The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their use resulted in execution times an order of magnitude shorter than those of the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, although slower, reduces the programming effort and compensates with a more general architecture suitable for a wider range of problems.
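To make the scale of exhaustive SNP-SNP interaction testing concrete, the sketch below counts the unordered SNP pairs and converts that count into a run-time estimate. The throughput of 100,000 pair tests per second is a purely hypothetical figure chosen for illustration; it is not a measurement from SNPsyn or from the hardware named in the abstract.

```python
import math

SECONDS_PER_DAY = 86_400


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return math.comb(n_snps, 2)


def estimated_days(n_snps: int, pairs_per_second: float) -> float:
    """Run time in days for an exhaustive pairwise scan at a given throughput."""
    return pair_count(n_snps) / pairs_per_second / SECONDS_PER_DAY


n_snps = 500_000                       # "hundreds of thousands" of SNPs
pairs = pair_count(n_snps)             # 124,999,750,000 pairs
days = estimated_days(n_snps, 1e5)     # hypothetical single-threaded throughput
days_10x = estimated_days(n_snps, 1e6) # the same scan with a tenfold speed-up
```

Because the pair count grows quadratically, doubling the number of SNPs roughly quadruples the work, which is why an order-of-magnitude speed-up from parallel hardware matters at this scale.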
Biazzi, Elisa; Nazzicari, Nelson; Pecetti, Luciano; Brummer, E Charles; Palmonari, Alberto; Tava, Aldo; Annicchiarico, Paolo
Genetic progress for forage quality has been poor in alfalfa (Medicago sativa L.), the most-grown forage legume worldwide. This study aimed at exploring opportunities for marker-assisted selection (MAS) and genomic selection of forage quality traits based on breeding values of parent plants. Some 154 genotypes from a broadly-based reference population were genotyped by genotyping-by-sequencing (GBS), and phenotyped for leaf-to-stem ratio, leaf and stem contents of protein, neutral detergent fiber (NDF) and acid detergent lignin (ADL), and leaf and stem NDF digestibility after 24 hours (NDFD), of their dense-planted half-sib progenies in three growing conditions (summer harvest, full irrigation; summer harvest, suspended irrigation; autumn harvest). Trait-marker analyses were performed on progeny values averaged over conditions, owing to modest germplasm × condition interaction. Genomic selection exploited 11,450 polymorphic SNP markers, whereas a subset of 8,494 M. truncatula-aligned markers were used for a genome-wide association study (GWAS). GWAS confirmed the polygenic control of quality traits and, in agreement with phenotypic correlations, indicated substantially different genetic control of a given trait in stems and leaves. It detected several SNPs in different annotated genes that were highly linked to stem protein content. Also, it identified a small genomic region on chromosome 8 with high concentration of annotated genes associated with leaf ADL, including one gene probably involved in the lignin pathway. Three genomic selection models, i.e., Ridge-regression BLUP, Bayes B and Bayesian Lasso, displayed similar prediction accuracy, whereas SVR-lin was less accurate. Accuracy values were moderate (0.3-0.4) for stem NDFD and leaf protein content, modest for leaf ADL and NDFD, and low to very low for the other traits. Along with previous results for the same germplasm set, this study indicates that GBS data can be exploited to improve both quality traits
Bertram, Lars; Lange, Christoph; Mullin, Kristina; Parkinson, Michele; Hsiao, Monica; Hogan, Meghan F; Schjeide, Brit M M; Hooli, Basavaraj; Divito, Jason; Ionita, Iuliana; Jiang, Hongyu; Laird, Nan; Moscarillo, Thomas; Ohlsen, Kari L; Elliott, Kathryn; Wang, Xin; Hu-Lince, Diane; Ryder, Marie; Murphy, Amy; Wagner, Steven L; Blacker, Deborah; Becker, K David; Tanzi, Rudolph E
Alzheimer's disease (AD) is a genetically complex and heterogeneous disorder. To date four genes have been established to either cause early-onset autosomal-dominant AD (APP, PSEN1, and PSEN2 [1-4]) or to increase susceptibility for late-onset AD (APOE [5]). However, the heritability of late-onset AD is as high as 80% [6], and much of the phenotypic variance remains unexplained to date. We performed a genome-wide association (GWA) analysis using 484,522 single-nucleotide polymorphisms (SNPs) on a large (1,376 samples from 410 families) sample of AD families of self-reported European descent. We identified five SNPs showing either significant or marginally significant genome-wide association with a multivariate phenotype combining affection status and onset age. One of these signals (p = 5.7 × 10^-14) was elicited by SNP rs4420638 and probably reflects APOE-epsilon4, which maps 11 kb proximal (r2 = 0.78). The other four signals were tested in three additional independent AD family samples composed of nearly 2700 individuals from almost 900 families. Two of these SNPs showed significant association in the replication samples (combined p values 0.007 and 0.00002). The SNP (rs11159647, on chromosome 14q31) with the strongest association signal also showed evidence of association with the same allele in GWA data generated in an independent sample of approximately 1,400 AD cases and controls (p = 0.04). Although the precise identity of the underlying locus (or loci) remains elusive, our study provides compelling evidence for the existence of at least one previously undescribed AD gene that, like APOE-epsilon4, primarily acts as a modifier of onset age.
The aim of the paper was to identify the SNP rs23472497 associated with canine atopic dermatitis (cAD). cAD is a common inflammatory skin disease that is considered to be a naturally occurring, spontaneous model of human atopic dermatitis (eczema). The material involved 60 dogs from 6 different breeds. Canine genomic DNA was isolated from saliva by a modified method using DNAzol® and linear polyacrylamide (LPA) carrier, and from blood using the commercial kit NucleospinBlood, and was used to estimate rs23472497 SNP genotypes by the ACRS-PCR method. The PCR products were digested with the NlaIII restriction enzyme. In the populations of Czech Pointer and Slovak Wirehaired Pointer we detected all genotypes AA, AG and GG, with frequencies of 0.0732, 0.5122 and 0.4146 for Czech Pointer and 0.1818, 0.5455 and 0.2727 for Slovak Wirehaired Pointer. In Border Collie, the heterozygote genotype AG and the homozygote genotype GG were observed with frequencies of 0.6667 and 0.3333, respectively. In German Wirehaired Pointer, Australian Shepherd dog and American Staffordshire terrier we detected only genotype AG, with a frequency of 1. The A allele was distributed with an allele frequency ranging from 0.3293 to 0.5. The G allele was distributed with an allele frequency ranging from 0.5 to 0.6707.
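The allele frequency ranges quoted in the abstract above follow from the genotype frequencies by simple allele counting: each homozygote contributes its full frequency to one allele, and each heterozygote contributes half to each. A minimal sketch of that arithmetic, using the genotype frequencies reported in the abstract (the function and variable names are illustrative only):

```python
def allele_frequencies(f_aa: float, f_ag: float, f_gg: float) -> tuple[float, float]:
    """Return (freq_A, freq_G) from genotype frequencies AA, AG, GG."""
    freq_a = f_aa + f_ag / 2
    freq_g = f_gg + f_ag / 2
    return freq_a, freq_g


# Genotype frequencies (AA, AG, GG) as reported in the abstract.
genotypes = {
    "Czech Pointer": (0.0732, 0.5122, 0.4146),
    "Slovak Wirehaired Pointer": (0.1818, 0.5455, 0.2727),
    "Border Collie": (0.0, 0.6667, 0.3333),
    "AG-only breeds": (0.0, 1.0, 0.0),
}

alleles = {breed: allele_frequencies(*freqs) for breed, freqs in genotypes.items()}

for breed, (freq_a, freq_g) in alleles.items():
    print(f"{breed}: A = {freq_a:.4f}, G = {freq_g:.4f}")
```

The Czech Pointer values give the lower bound for A (0.3293) and the upper bound for G (0.6707), and the breeds carrying only AG give 0.5 for both alleles, which matches the reported ranges.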
Brant K Peterson
The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via "SNP chip" microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below that of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical results from the application of this method to both genotyping in a laboratory cross and in wild populations. Because of its flexibility, this modified RADseq approach promises to be applicable to a diversity of biological questions in a wide range of organisms.
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the 'Golden Delicious' genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.
Børglum, A D; Demontis, D; Grove, J
Genetic and environmental components as well as their interaction contribute to the risk of schizophrenia, making it highly relevant to include environmental factors in genetic studies of schizophrenia. This study comprises genome-wide association (GWA) and follow-up analyses of all individuals...... born in Denmark since 1981 and diagnosed with schizophrenia as well as controls from the same birth cohort. Furthermore, we present the first genome-wide interaction survey of single nucleotide polymorphisms (SNPs) and maternal cytomegalovirus (CMV) infection. The GWA analysis included 888 cases...... was found for rs7902091 (P(SNP × CMV)=7.3 × 10(-7)) in CTNNA3, a gene not previously implicated in schizophrenia, stressing the importance of including environmental factors in genetic studies....
Matthew C. McClure
A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels provide an increased level of parentage accuracy over microsatellite markers (MS), they can validate the wrong parent at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. With rapidly increasing numbers of cattle being genotyped in Ireland that represent 61 B. taurus breeds from a wide range of farm types (beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd size), the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities to determine that at a minimum ≥500 SNP are needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction ICBF uses 800 SNP (ICBF800) selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require sample and SNP quality control (QC). Most publications only deal with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines to deal with the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise. We divide the pipeline into two parts: a Genotype QC and an Animal QC pipeline. The Genotype QC identifies samples with low call rate, missing or mixed genotype classes (no BB genotype or ABTG alleles present), and low genotype frequencies. The Animal QC handles situations where the genotype might not belong to the listed individual by identifying: >1 non
(polyphen-2, SNAP), as well as by the ESEfinder program, and one nonsense nsSNP was found. For noncoding ... mon type of genetic variation in the human genome that are ...... polymorphisms in type 2 diabetes mellitus and in android type.
Begum, Hasina; Spindel, Jennifer E; Lalusin, Antonio; Borromeo, Teresita; Gregorio, Glenn; Hernandez, Jose; Virk, Parminder; Collard, Bertrand; McCouch, Susan R
Genome-wide association mapping studies (GWAS) are frequently used to detect QTL in diverse collections of crop germplasm, based on historic recombination events and linkage disequilibrium across the genome. Generally, diversity panels genotyped with high density SNP panels are utilized in order to assay a wide range of alleles and haplotypes and to monitor recombination breakpoints across the genome. By contrast, GWAS have not generally been performed in breeding populations. In this study we performed association mapping for 19 agronomic traits including yield and yield components in a breeding population of elite irrigated tropical rice breeding lines so that the results would be more directly applicable to breeding than those from a diversity panel. The population was genotyped with 71,710 SNPs using genotyping-by-sequencing (GBS), and GWAS performed with the explicit goal of expediting selection in the breeding program. Using this breeding panel we identified 52 QTL for 11 agronomic traits, including large effect QTLs for flowering time and grain length/grain width/grain-length-breadth ratio. We also identified haplotypes that can be used to select plants in our population for short stature (plant height), early flowering time, and high yield, and thus demonstrate the utility of association mapping in breeding populations for informing breeding decisions. We conclude by exploring how the newly identified significant SNPs and insights into the genetic architecture of these quantitative traits can be leveraged to build genomic-assisted selection models.
Dijkstra, Akkelies E; Smolonska, Joanna; van den Berge, Maarten
... of smokers develops CMH. A plausible explanation for this phenomenon is a predisposing genetic constitution. Therefore, we performed a genome wide association (GWA) study of CMH in Caucasian populations. METHODS: GWA analysis was performed in the NELSON-study using the Illumina 610 array, followed by replication and meta-analysis in 11 additional cohorts. In total 2,704 subjects with, and 7,624 subjects without CMH were included, all current or former heavy smokers (≥20 pack-years). Additional studies were performed to test the functional relevance of the most significant single nucleotide polymorphism (SNP). RESULTS: A strong association with CMH, consistent across all cohorts, was observed with rs6577641 (p = 4.25 × 10⁻⁶, OR = 1.17), located in intron 9 of the special AT-rich sequence-binding protein 1 locus (SATB1) on chromosome 3. The risk allele (G) was associated with higher mRNA expression ...
Albrechtsen, Anders; Nielsen, Finn Cilius; Nielsen, Rasmus
Chip-based high-throughput genotyping has facilitated genome-wide studies of genetic diversity. Many studies have utilized these large data sets to make inferences about the demographic history of human populations using measures of genetic differentiation such as F(ST) or principal component analyses. However, the single nucleotide polymorphism (SNP) chip data suffer from ascertainment biases caused by the SNP discovery process in which a small number of individuals from selected populations are used as discovery panels. In this study, we investigate the effect of the ascertainment bias ... on direct sequencing. In addition, we also analyze publicly available genome-wide data. We demonstrate that the ascertainment biases will distort measures of human diversity and possibly change conclusions drawn from these measures in sometimes unexpected ways. We also show that details of the genotyping ...
Exploring the genetic architecture and improving genomic prediction accuracy for mastitis and milk production traits in dairy cattle by mapping variants to hepatic transcriptomic regions responsive to intra-mammary infection.
Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter
A better understanding of the genetic architecture of complex traits can contribute to improving genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test was correlated with changes in prediction accuracy of GFBLUP (P layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set
Børglum, A D; Demontis, D; Grove, J; Pallesen, J; Hollegaard, M V; Pedersen, C B; Hedemand, A; Mattheisen, M; Uitterlinden, A; Nyegaard, M; Ørntoft, T; Wiuf, C; Didriksen, M; Nordentoft, M; Nöthen, M M; Rietschel, M; Ophoff, R A; Cichon, S; Yolken, R H; Hougaard, D M; Mortensen, P B; Mors, O
Genetic and environmental components as well as their interaction contribute to the risk of schizophrenia, making it highly relevant to include environmental factors in genetic studies of schizophrenia. This study comprises genome-wide association (GWA) and follow-up analyses of all individuals born in Denmark since 1981 and diagnosed with schizophrenia as well as controls from the same birth cohort. Furthermore, we present the first genome-wide interaction survey of single nucleotide polymorphisms (SNPs) and maternal cytomegalovirus (CMV) infection. The GWA analysis included 888 cases and 882 controls, and the follow-up investigation of the top GWA results was performed in independent Danish (1396 cases and 1803 controls) and German-Dutch (1169 cases, 3714 controls) samples. The SNPs most strongly associated in the single-marker analysis of the combined Danish samples were rs4757144 in ARNTL (P = 3.78 × 10⁻⁶) and rs8057927 in CDH13 (P = 1.39 × 10⁻⁵). Both genes have previously been linked to schizophrenia or other psychiatric disorders. The strongest associated SNP in the combined analysis, including Danish and German-Dutch samples, was rs12922317 in RUNDC2A (P = 9.04 × 10⁻⁷). A region-based analysis summarizing independent signals in segments of 100 kb identified a new region-based genome-wide significant locus overlapping the gene ZEB1 (P = 7.0 × 10⁻⁷). This signal was replicated in the follow-up analysis (P = 2.3 × 10⁻²). Significant interaction with maternal CMV infection was found for rs7902091 (P(SNP × CMV) = 7.3 × 10⁻⁷) in CTNNA3, a gene not previously implicated in schizophrenia, stressing the importance of including environmental factors in genetic studies.
Background: We report an attempt to extend the previously successful approach of combining SNP (single nucleotide polymorphism) microarrays and DNA pooling (SNP-MaP) employing high-density microarrays. Whereas earlier studies employed a range of Affymetrix SNP microarrays comprising from 10 K to 500 K SNPs, this most recent investigation used the 6.0 chip which displays 906,600 SNP probes and 946,000 probes for the interrogation of CNVs (copy number variations). The genotyping assay using the Affymetrix SNP 6.0 array is highly demanding on sample quality due to the small feature size, low redundancy, and lack of mismatch probes. Findings: In the first study published so far using this microarray on pooled DNA, we found that pooled cheek swab DNA could not accurately predict real allele frequencies of the samples that comprised the pools. In contrast, the allele frequency estimates using blood DNA pools were reasonable, although inferior compared to those obtained with previously employed Affymetrix microarrays. However, it might be possible to improve performance by developing improved analysis methods. Conclusions: Despite the decreasing costs of genome-wide individual genotyping, the pooling approach may have applications in very large-scale case-control association studies. In such cases, our study suggests that high-quality DNA preparations and lower density platforms should be preferred.
Schizophrenia is a devastating neuropsychiatric disorder with genetically complex traits. Genetic variants should explain a considerable portion of the risk for schizophrenia, and the genome-wide association study (GWAS) is a potentially powerful tool for identifying the risk variants that underlie the disease. Here, we report the results of a three-stage analysis of three independent cohorts consisting of a total of 2,535 samples from Japanese and Chinese populations, searching for schizophrenia susceptibility genes using a GWAS approach. Firstly, we examined 115,770 single nucleotide polymorphisms (SNPs) in 120 patient-parents trio samples from Japanese schizophrenia pedigrees. In stage II, we evaluated 1,632 SNPs (1,159 SNPs with p<0.01 and 473 SNPs with p<0.05 that were located in previously reported linkage regions). The second sample consisted of 1,012 case-control samples of Japanese origin. The most significant p value was obtained for the SNP in the ELAVL2 [(embryonic lethal, abnormal vision, Drosophila)-like 2] gene located on 9p21.3 (p = 0.00087). In stage III, we scrutinized the ELAVL2 gene by genotyping gene-centric tagSNPs in the third sample set of 293 family samples (1,163 individuals) of Chinese descent, and the SNP in the gene showed a nominal association with schizophrenia in the Chinese population (p = 0.026). The current data in Asian populations would be helpful for deciphering ethnic diversity of schizophrenia etiology.
Willour, Virginia L.; Seifuddin, Fayaz; Mahon, Pamela B.; Jancic, Dubravka; Pirooznia, Mehdi; Steele, Jo; Schweizer, Barbara; Goes, Fernando S.; Mondimore, Francis M.; MacKinnon, Dean F.; Perlis, Roy H.; Lee, Phil Hyoun; Huang, Jie; Kelsoe, John R.; Shilling, Paul D.; Rietschel, Marcella; Nöthen, Markus; Cichon, Sven; Gurling, Hugh; Purcell, Shaun; Smoller, Jordan W.; Craddock, Nicholas; DePaulo, J. Raymond; Schulze, Thomas G.; McMahon, Francis J.; Zandi, Peter P.; Potash, James B.
The heritable component to attempted and completed suicide is partly related to psychiatric disorders and also partly independent of them. While attempted suicide linkage regions have been identified on 2p11–12 and 6q25–26, there are likely many more such loci, the discovery of which will require a much higher resolution approach, such as the genome-wide association study (GWAS). With this in mind, we conducted an attempted suicide GWAS that compared the single nucleotide polymorphism (SNP) genotypes of 1,201 bipolar (BP) subjects with a history of suicide attempts to the genotypes of 1,497 BP subjects without a history of suicide attempts. 2,507 SNPs with evidence for association at p<0.001 were identified. These associated SNPs were subsequently tested for association in a large and independent BP sample set. None of these SNPs were significantly associated in the replication sample after correcting for multiple testing, but the combined analysis of the two sample sets produced an association signal on 2p25 (rs300774) at the threshold of genome-wide significance (p = 5.07 × 10⁻⁸). The associated SNPs on 2p25 fall in a large linkage disequilibrium block containing the ACP1 gene, a gene whose expression is significantly elevated in BP subjects who have completed suicide. Furthermore, the ACP1 protein is a tyrosine phosphatase that influences Wnt signaling, a pathway regulated by lithium, making ACP1 a functional candidate for involvement in the phenotype. Larger GWAS sample sets will be required to confirm the signal on 2p25 and to identify additional genetic risk factors increasing susceptibility for attempted suicide. PMID:21423239
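As a rough illustration of why a p-value of 5.07 × 10⁻⁸ is described as sitting "at the threshold" of genome-wide significance, the sketch below applies a Bonferroni correction. The figure of one million independent tests is a conventional assumption used here for illustration only; it is not taken from the abstract, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float,
                               alpha: float = 0.05,
                               n_tests: int = 1_000_000) -> bool:
    # n_tests = 1,000,000 is an assumed number of independent tests,
    # which gives the commonly quoted threshold of about 5e-8.
    return p_value < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1_000_000))
    print(is_genome_wide_significant(5.07e-8))
```

Under this assumed threshold the reported signal falls marginally short of strict significance, which is consistent with the abstract's call for larger sample sets.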
Sahana, Goutam; Kadlecová, Veronika; Hornshøj, Henrik
Feed conversion ratio (FCR) is an economically important trait in pigs and feed accounts for a significant proportion of the costs involved in pig production. In this study we used a high density SNP chip panel, Porcine SNP60 BeadChip, to identify association between FCR and SNP markers and to st...
Peter M Visscher
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases), in particular when the traits (diseases) are not measured on the same samples.
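The proportionality stated above can be made concrete with a small sketch. With N individuals there are roughly N²/2 pairwise contrasts, so a sampling variance inversely proportional to that count and to the variance of the relationships is approximately 2 / (N² · var(A)). The exact constant and the default value of `var_rel` (2 × 10⁻⁵, a figure often quoted for unrelated human samples) are assumptions made for illustration and are not given in the abstract.

```python
import math


def greml_h2_standard_error(n_individuals: int, var_rel: float = 2e-5) -> float:
    """Approximate standard error of a SNP-heritability estimate.

    Assumes var(h2) ~ 2 / (N^2 * var_rel), where var_rel is the variance
    of the off-diagonal SNP-derived relationships (assumed default 2e-5).
    """
    if n_individuals <= 0 or var_rel <= 0:
        raise ValueError("n_individuals and var_rel must be positive")
    variance = 2.0 / (n_individuals ** 2 * var_rel)
    return math.sqrt(variance)


if __name__ == "__main__":
    for n in (2_000, 5_000, 10_000):
        print(n, round(greml_h2_standard_error(n), 4))
```

Under these assumptions the standard error shrinks in proportion to 1/N, so doubling the sample size halves the uncertainty of the estimate.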
Peanut (Arachis hypogaea) consists of two subspecies, hypogaea and fastigiata, and has been cultivated worldwide for hundreds of years. Here, 158 peanut accessions were selected to dissect the molecular footprint of agronomic traits related to domestication using the specific-locus amplified fragment sequencing (SLAF-seq) method. Then, a total of 17,338 high-quality single nucleotide polymorphisms (SNPs) in the whole peanut genome were revealed. Eleven agronomic traits in 158 peanut accessions were subsequently analyzed using genome-wide association studies (GWAS). Candidate genes responsible for corresponding traits were then analyzed in genomic regions surrounding the peak SNPs, and 1,429 genes were found within 200 kb windows centered on GWAS-identified peak SNPs related to domestication. Highly differentiated genomic regions were observed between hypogaea and fastigiata accessions using FST values and sequence diversity (π) ratios. Among the 1,429 genes, 662 were located on chromosome A3, suggesting the presence of major selective sweeps caused by artificial selection during long domestication. These findings provide a promising insight into the complicated genetic architecture of domestication-related traits in peanut, and reveal whole-genome SNP markers of beneficial candidate genes for marker-assisted selection (MAS) in future breeding programs.
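To show what an FST value between two groups measures, the sketch below computes Wright's FST for a single biallelic SNP from the allele frequency in each of two populations, weighting both equally. This is a textbook simplification assumed for illustration; the abstract does not state which estimator or windowing scheme the authors used, and the function name is hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(wright_fst(0.2, 0.8))
    print(wright_fst(0.5, 0.5))
```

Values near 0 indicate similar allele frequencies in the two groups, and values near 1 indicate nearly fixed differences, which is the pattern expected in regions under strong divergent selection.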
Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, involves difficulty in setting an overall p-value due to the complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A number of SNP-SNP pairs larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) for appropriate handling of the large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods and a subsequent variable selection procedure in the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–Control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
Zhan, Qimin; Hu, Zhibin; He, Zhonghu; Jia, Weihua; Zhou, Yifeng; Yu, Kai; Shu, Xiao-Ou; Yuan, Jian-Min; Zheng, Wei; Zhao, Xue-Ke; Gao, She-Gan; Yuan, Zhi-Qing; Zhou, Fu-You; Fan, Zong-Min; Cui, Ji-Li; Lin, Hong-Li; Han, Xue-Na; Li, Bei; Chen, Xi; Dawsey, Sanford M.; Liao, Linda; Lee, Maxwell P.; Ding, Ti; Qiao, You-Lin; Liu, Zhihua; Liu, Yu; Yu, Dianke; Chang, Jiang; Wei, Lixuan; Gao, Yu-Tang; Koh, Woon-Puay; Xiang, Yong-Bing; Tang, Ze-Zhong; Fan, Jin-Hu; Han, Jing-Jing; Zhou, Sheng-Li; Zhang, Peng; Zhang, Dong-Yun; Yuan, Yuan; Huang, Ying; Liu, Chunling; Zhai, Kan; Qiao, Yan; Jin, Guangfu; Guo, Chuanhai; Fu, Jianhua; Miao, Xiaoping; Lu, Changdong; Yang, Haijun; Wang, Chaoyu; Wheeler, William A.; Gail, Mitchell; Yeager, Meredith; Yuenger, Jeff; Guo, Er-Tao; Li, Ai-Li; Zhang, Wei; Li, Xue-Min; Sun, Liang-Dan; Ma, Bao-Gen; Li, Yan; Tang, Sa; Peng, Xiu-Qing; Liu, Jing; Hutchinson, Amy; Jacobs, Kevin; Giffen, Carol; Burdette, Laurie; Fraumeni, Joseph F.; Shen, Hongbing; Ke, Yang; Zeng, Yixin; Wu, Tangchun; Kraft, Peter; Chung, Charles C.; Tucker, Margaret A.; Hou, Zhi-Chao; Liu, Ya-Li; Hu, Yan-Long; Liu, Yu; Wang, Li; Yuan, Guo; Chen, Li-Sha; Liu, Xiao; Ma, Teng; Meng, Hui; Sun, Li; Li, Xin-Min; Li, Xiu-Min; Ku, Jian-Wei; Zhou, Ying-Fa; Yang, Liu-Qin; Wang, Zhou; Li, Yin; Qige, Qirenwang; Yang, Wen-Jun; Lei, Guang-Yan; Chen, Long-Qi; Li, En-Min; Yuan, Ling; Yue, Wen-Bin; Wang, Ran; Wang, Lu-Wen; Fan, Xue-Ping; Zhu, Fang-Heng; Zhao, Wei-Xing; Mao, Yi-Min; Zhang, Mei; Xing, Guo-Lan; Li, Ji-Lin; Han, Min; Ren, Jing-Li; Liu, Bin; Ren, Shu-Wei; Kong, Qing-Peng; Li, Feng; Sheyhidin, Ilyar; Wei, Wu; Zhang, Yan-Rui; Feng, Chang-Wei; Wang, Jin; Yang, Yu-Hua; Hao, Hong-Zhang; Bao, Qi-De; Liu, Bao-Chi; Wu, Ai-Qun; Xie, Dong; Yang, Wan-Cai; Wang, Liang; Zhao, Xiao-Hang; Chen, Shu-Qing; Hong, Jun-Yan; Zhang, Xue-Jun; Freedman, Neal D; Goldstein, Alisa M.; Lin, Dongxin; Taylor, Philip R.; Wang, Li-Dong; Chanock, Stephen J.
We conducted a joint (pooled) analysis of three genome-wide association studies (GWAS) of esophageal squamous cell carcinoma (ESCC) in ethnic Chinese (5,337 ESCC cases and 5,787 controls) with 9,654 ESCC cases and 10,058 controls for follow-up. In a logistic regression model adjusted for age, sex, study, and two eigenvectors, two new loci achieved genome-wide significance, marked by rs7447927 at 5q31.2 (per-allele odds ratio (OR) = 0.85, 95% CI 0.82-0.88; P = 7.72 × 10⁻²⁰) and rs1642764 at 17p13.1 (per-allele OR = 0.88, 95% CI 0.85-0.91; P = 3.10 × 10⁻¹³). rs7447927 is a synonymous single nucleotide polymorphism (SNP) in TMEM173 and rs1642764 is an intronic SNP in ATP1B2, near TP53. Furthermore, a locus in the HLA class II region at 6p21.32 (rs35597309) achieved genome-wide significance in the two populations at highest risk for ESCC (OR = 1.33, 95% CI 1.22-1.46; P = 1.99 × 10⁻¹⁰). Our joint analysis identified new ESCC susceptibility loci overall as well as a new locus unique to the ESCC high risk Taihang Mountain region. PMID:25129146
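For readers unfamiliar with how a per-allele odds ratio and its 95% confidence interval are formed, here is a minimal unadjusted sketch based on a 2×2 table of allele counts and the Woolf (log) method. It is a simplification: the study above used logistic regression adjusted for covariates, and the counts in the usage example are invented for illustration rather than taken from the paper.

```python
import math


def allelic_odds_ratio(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int,
                       z: float = 1.96):
    """Unadjusted allelic odds ratio with a Wald (Woolf) confidence interval.

    Returns (odds_ratio, lower, upper). z = 1.96 gives roughly a 95% CI.
    """
    counts = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if any(c <= 0 for c in counts):
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    print(allelic_odds_ratio(20, 80, 10, 90))
```

An interval that excludes 1, as in the reported 0.82-0.88 for rs7447927, indicates that the allele is associated with a change in the odds of disease at the chosen confidence level.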
Scherrer, Daniel Zanetti; Zago, Vanessa Helena de Souza; Vieira, Isabela Calanca; Parra, Eliane Soler; Panzoldo, Natália Baratella; Alexandre, Fernanda; Secolin, Rodrigo; Baracat, Jamal; Quintão, Eder Carlos Rocha; de Faria, Eliana Cotta
Background: Evidence suggests that paraoxonase 1 (PON1) confers important antioxidant and anti-inflammatory properties when associated with high-density lipoprotein (HDL). Objective: To investigate the relationships between the p.Q192R SNP of PON1, biochemical parameters and carotid atherosclerosis in an asymptomatic, normolipidemic Brazilian population sample. Methods: We studied 584 volunteers (females n = 326, males n = 258; 19-75 years of age). Total genomic DNA was extracted and the SNP was detected in the TaqMan® SNP OpenArray® genotyping platform (Applied Biosystems, Foster City, CA). Plasma lipoproteins and apolipoproteins were determined and PON1 activity was measured using paraoxon as a substrate. High-resolution B-mode ultrasonography was used to measure cIMT and the presence of carotid atherosclerotic plaques in a subgroup of individuals (n = 317). Results: The presence of p.192Q was associated with a significant increase in PON1 activity (RR = 12.30 (11.38); RQ = 46.96 (22.35); QQ = 85.35 (24.83) μmol/min; p < 0.0001), HDL-C (RR = 45 (37); RQ = 62 (39); QQ = 69 (29) mg/dL; p < 0.001) and apo A-I (RR = 140.76 ± 36.39; RQ = 147.62 ± 36.92; QQ = 147.49 ± 36.65 mg/dL; p = 0.019). Stepwise regression analysis revealed that heterozygous and p.192Q carrier status influenced PON1 activity towards paraoxon by 58%. The univariate linear regression analysis demonstrated that the p.Q192R SNP was not associated with mean cIMT; as a result, in the multiple regression analysis, no variables were selected with 5% significance. In logistic regression analysis, the studied parameters were not associated with the presence of carotid plaques. Conclusion: In low-risk individuals, the presence of the p.192Q variant of PON1 is associated with a beneficial plasma lipid profile but not with carotid atherosclerosis. PMID:26039660
The case rate of Q fever in Europe has increased dramatically in recent years, mainly because of an epidemic in the Netherlands in 2009. Consequently, there is a need for more extensive genetic characterization of the disease agent Coxiella burnetii in order to better understand the epidemiology and spread of this disease. Genome reference data are essential for this purpose, but only thirteen genome sequences are currently available. Current methods for typing C. burnetii are criticized because their results are difficult to compare across laboratories, because they require the use of genomic control DNA, and/or because they rely on markers in highly variable regions. We developed in this work a method for single nucleotide polymorphism (SNP) typing of C. burnetii isolates and tissue samples based on new assays targeting ten phylogenetically stable synonymous canonical SNPs (canSNPs). These canSNPs represent previously known phylogenetic branches and were here identified from sequence comparisons of twenty-one C. burnetii genomes, eight of which were sequenced in this work. Importantly, synthetic control templates were developed, to make the method useful to laboratories lacking genomic control DNA. An analysis of twenty-one C. burnetii genomes confirmed that the species exhibits high sequence identity. Most of its SNPs (7,493/7,559) shared by >1 genome follow a clonal inheritance pattern and are therefore stable phylogenetic typing markers. The assays were validated using twenty-six genetically diverse C. burnetii isolates and three tissue samples from small ruminants infected during the epidemic in the Netherlands. Each sample was assigned to a clade. Synthetic controls (vector and PCR amplified) gave identical results compared to the corresponding genomic controls and are viable alternatives to genomic DNA. The results from the described method indicate that it could be useful for cheap and rapid disease source tracking at non-specialized laboratories, which requires accurate
Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in the application of this method to high-throughput screening. Therefore it is critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has a very low false positive and false negative rate. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in CAST/Ei strain was observed in two out of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebrafish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platform and is available publicly (http://lpg.nci.nih.gov).
Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R
Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.
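Parentage analysis with SNP panels commonly rests on counting loci where a candidate parent and an offspring are homozygous for opposite alleles, which cannot occur under Mendelian inheritance without genotyping error. The sketch below illustrates that general idea only; the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call) and the function name are assumptions, and it is not the procedure used by the authors.

```python
from typing import Optional, Sequence


def opposing_homozygote_rate(parent: Sequence[Optional[int]],
                             offspring: Sequence[Optional[int]]) -> float:
    """Fraction of jointly called loci with opposing homozygous genotypes.

    Genotypes are coded as 0, 1 or 2 copies of a reference allele;
    None marks a missing call and the locus is skipped.
    """
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    conflicts = 0
    for g_parent, g_child in zip(parent, offspring):
        if g_parent is None or g_child is None:
            continue
        compared += 1
        # A 0/2 or 2/0 pair means parent and offspring share no allele.
        if {g_parent, g_child} == {0, 2}:
            conflicts += 1
    if compared == 0:
        raise ValueError("no loci with calls in both individuals")
    return conflicts / compared


if __name__ == "__main__":
    candidate = [0, 2, 1, None, 0]
    child = [2, 2, 0, 1, 0]
    print(opposing_homozygote_rate(candidate, child))
```

A candidate parent would typically be rejected when this rate exceeds a small tolerance chosen to allow for genotyping error; SNPs with high minor allele frequency are more informative because opposing homozygotes arise more often between unrelated individuals.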
Bigdeli, Tim B.; Ripke, Stephan; Bacanu, Silviu-Alin; Lee, Sang Hong; Wray, Naomi R.; Gejman, Pablo V.; Rietschel, Marcella; Cichon, Sven; St Clair, David; Corvin, Aiden; Kirov, George; McQuillin, Andrew; Gurling, Hugh; Rujescu, Dan; Andreassen, Ole A.; Werge, Thomas; Blackwood, Douglas H.R.; Pato, Carlos N.; Pato, Michele T.; Malhotra, Anil K.; O’Donovan, Michael C.; Kendler, Kenneth S.; Fanous, Ayman H.
Genome-wide association studies (GWAS) of schizophrenia have yielded more than 100 common susceptibility variants, and strongly support a substantial polygenic contribution of a large number of small allelic effects. It has been hypothesized that familial schizophrenia is largely a consequence of inherited rather than environmental factors. We investigated the extent to which familiality of schizophrenia is associated with enrichment for common risk variants detectable in a large GWAS. We analyzed single nucleotide polymorphism (SNP) data for cases reporting a family history of psychotic illness (N = 978), cases reporting no such family history (N = 4,503), and unscreened controls (N = 8,285) from the Psychiatric Genomics Consortium (PGC1) study of schizophrenia. We used a multinomial logistic regression approach with model-fitting to detect allelic effects specific to either family history subgroup. We also considered a polygenic model, in which we tested whether family history positive subjects carried more schizophrenia risk alleles than family history negative subjects, on average. Several individual SNPs attained suggestive but not genome-wide significant association with either family history subgroup. Comparison of genome-wide polygenic risk scores based on GWAS summary statistics indicated a significant enrichment for SNP effects among family history positive compared to family history negative cases (Nagelkerke’s R² = 0.0021; P = 0.00331; P-value threshold history positive compared to family history negative cases (0.32 and 0.22, respectively; P = 0.031). We found suggestive evidence of allelic effects detectable in large GWAS of schizophrenia that might be specific to particular family history subgroups. However, consideration of a polygenic risk score indicated a significant enrichment among family history positive cases for common allelic effects. Familial illness might, therefore, represent a more heritable form of schizophrenia, as suggested by
Dutra, Roberta L; Piazzon, Flavia B; Zanardo, Évelin A; Costa, Thais Virginia Moura Machado; Montenegro, Marília M; Novo-Filho, Gil M; Dias, Alexandre T; Nascimento, Amom M; Kim, Chong Ae; Kulikowski, Leslie D
Williams-Beuren syndrome (WBS) is caused by a hemizygous contiguous gene microdeletion of 1.55-1.84 Mb in the 7q11.23 region. Approximately 28 genes have been shown to contribute to the classical phenotype of WBS, with presence of dysmorphic facial features, supravalvular aortic stenosis (SVAS), intellectual disability, and overfriendliness. With the use of microarray-based comparative genomic hybridization and other molecular cytogenetic techniques, it is possible to define partial or atypical deletions with more accuracy and to refine the genotype-phenotype correlation. Here, we report on a rare genomic structural rearrangement in a boy with an atypical deletion in 7q11.23 and XYY syndrome with characteristic clinical signs, but not sufficient for the diagnosis of WBS. Cytogenetic analysis by G-banding showed a karyotype 47,XYY. Analysis of DNA with the technique of MLPA (Multiplex Ligation-dependent Probe Amplification) using a combination of kits (P064, P036, P070, and P029) identified an atypical deletion on 7q11.23. In addition, high resolution SNP Oligonucleotide Microarray Analysis (SNP-array) confirmed the alterations found by MLPA and revealed other pathogenic CNVs in chromosomes 7 and X. The present report demonstrates an association not yet described in the literature, between Williams-Beuren syndrome and 47,XYY. The identification of an atypical deletion in 7q11.23 concomitant with additional pathogenic CNVs in other genomic regions allows a better comprehension of the clinical consequences of atypical genomic rearrangements. © 2015 Wiley Periodicals, Inc.
This study was conducted as an initial assessment of a newly available genotyping assay containing about 34,000 common SNP included on previous SNP chips, and 199,000 sequence variants predicted to affect gene function. Objectives were to identify functional variants associated with birth weight in...
Sorghum [Sorghum bicolor (L.) Moench], an important grain and forage crop, is receiving significant attention as a lignocellulosic feedstock because of its water-use efficiency and high biomass yield potential. Because of the advancement of genotyping and sequencing technologies, genome-wide association study (GWAS) has become a routinely used method to investigate the genetic mechanisms underlying natural phenotypic variation. In this study, we performed a GWAS for nine grain and biomass-related plant architecture traits to determine their overall genetic architecture and the specific association of allelic variants in gibberellin (GA) biosynthesis and signaling genes with these phenotypes. A total of 101 single-nucleotide polymorphism (SNP) representative regions were associated with at least one of the nine traits, and two of the significant markers correspond to GA candidate genes affecting plant height and seed number, respectively. The resolution of a previously reported quantitative trait locus (QTL) for leaf angle on chromosome 7 was increased to a 1.67 Mb region containing seven candidate genes with good prospects for further investigation. This study provides new knowledge of the association of GA genes with plant architecture traits and the genomic regions controlling variation in leaf angle, stem circumference, internode number, tiller number, seed number, panicle exsertion, and panicle length. The GA gene affecting seed number variation and the genomic region on chromosome 7 associated with variation in leaf angle are also important outcomes of this study and represent the foundation of future validation studies needed to apply this knowledge in breeding programs.
Migraine is associated with an increased risk for cardiovascular disease (CVD). Both migraine and CVD are highly heritable. However, the genetic liability for CVD among migraineurs is unclear. We performed a genome-wide association study for incident CVD events during 12 years of follow-up among 5,122 migraineurs participating in the population-based Women's Genome Health Study. Migraine was self-reported and CVD events were confirmed after medical records review. We calculated odds ratios (OR) and 95% confidence intervals (CI) and considered a genome-wide p-value <5×10(-8) as significant. Among the 5,122 women with migraine, 164 incident CVD events occurred during follow-up. No SNP was associated with major CVD, ischemic stroke, myocardial infarction, or CVD death at the genome-wide level; however, five SNPs showed association with p<5×10(-6). Among migraineurs with aura, rs7698623 in MEPE (OR = 6.37; 95% CI 3.15-12.90; p = 2.7×10(-7)) and rs4975709 in IRX4 (OR = 5.06; 95% CI 2.66-9.62; p = 7.7×10(-7)) appeared to be associated with ischemic stroke, rs2143678 located close to MDF1 with major CVD (OR = 3.05; 95% CI 1.98-4.69; p = 4.3×10(-7)), and the intergenic rs1406961 with CVD death (OR = 12.33; 95% CI 4.62-32.87; p = 5.2×10(-7)). Further, rs1047964 in BACE1 appeared to be associated with CVD death among women with any migraine (OR = 4.67; 95% CI 2.53-8.62; p = 8.0×10(-7)). Our results provide some suggestion for an association of five SNPs with CVD events among women with migraine; none of the results was genome-wide significant. Four associations appeared among migraineurs with aura, two of those with ischemic stroke. Although our population is among the largest with migraine and incident CVD information, these results must be treated with caution, given the limited number of CVD events among women with migraine and the low minor allele frequencies for three of the SNPs. Our results await independent replication.
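The odds ratios, confidence intervals and p-values quoted in the migraine abstract can be cross-checked against each other. The following is a minimal sketch, assuming a Wald test on the log odds ratio with a normal approximation; the abstract does not state which test the authors used, and the function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error of log(OR) from a 95% CI, then the
    Wald z statistic and its two-sided p-value (normal approximation).

    Assumption: the CI is symmetric on the log scale, i.e. it was built
    as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return se, z, p

# Values quoted in the abstract for rs7698623 in MEPE (ischemic stroke).
se, z, p = wald_from_ci(6.37, 3.15, 12.90)
genome_wide_significant = p < 5e-8

print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
print("genome-wide significant:", genome_wide_significant)
```

Under these assumptions the recovered p-value is of the same order as the reported 2.7×10(-7) and stays above the 5×10(-8) threshold, consistent with the abstract's statement that no association was genome-wide significant.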
Gimode, Davis; Odeny, Damaris A; de Villiers, Etienne P; Wanyonyi, Solomon; Dida, Mathews M; Mneney, Emmarold E; Muchugi, Alice; Machuka, Jesse; de Villiers, Santie M
Finger millet is an important cereal crop in eastern Africa and southern India with excellent grain storage quality and unique ability to thrive in extreme environmental conditions. Since negligible attention has been paid to improving this crop to date, the current study used Next Generation Sequencing (NGS) technologies to develop both Simple Sequence Repeat (SSR) and Single Nucleotide Polymorphism (SNP) markers. Genomic DNA from cultivated finger millet genotypes KNE755 and KNE796 was sequenced using both Roche 454 and Illumina technologies. Non-organelle sequencing reads were assembled into 207 Mbp representing approximately 13% of the finger millet genome. We identified 10,327 SSRs and 23,285 non-homeologous SNPs and tested 101 of each for polymorphism across a diverse set of wild and cultivated finger millet germplasm. For the 49 polymorphic SSRs, the mean polymorphism information content (PIC) was 0.42, ranging from 0.16 to 0.77. We also validated 92 SNP markers, 80 of which were polymorphic with a mean PIC of 0.29 across 30 wild and 59 cultivated accessions. Seventy-six of the 80 SNPs were polymorphic across 30 wild germplasm with a mean PIC of 0.30 while only 22 of the SNP markers showed polymorphism among the 59 cultivated accessions with an average PIC value of 0.15. Genetic diversity analysis using the polymorphic SNP markers revealed two major clusters; one of wild and another of cultivated accessions. Detailed STRUCTURE analysis confirmed this grouping pattern and further revealed 2 sub-populations within wild E. coracana subsp. africana. Both STRUCTURE and genetic diversity analysis assisted with the correct identification of the new germplasm collections. These polymorphic SSR and SNP markers are a significant addition to the existing 82 published SSRs, especially with regard to the previously reported low polymorphism levels in finger millet. Our results also reveal an unexploited finger millet genetic resource that can be included in the regional
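The polymorphism information content (PIC) values reported for the finger millet markers can be illustrated with a short calculation. This sketch assumes the commonly used Botstein et al. definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not say which formula was applied, and the allele frequencies below are made up for illustration.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content for one marker.

    freqs: allele frequencies at the marker; they must sum to 1.
    Assumed formula (Botstein et al.):
        PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term

# Hypothetical frequencies, for illustration only.
snp_balanced = pic([0.5, 0.5])      # biallelic SNP with equal allele frequencies
snp_skewed = pic([0.9, 0.1])        # biallelic SNP with a rare allele
ssr_four_alleles = pic([0.25] * 4)  # multi-allelic SSR

print(snp_balanced, snp_skewed, ssr_four_alleles)
```

Under this definition a biallelic SNP cannot exceed a PIC of 0.375, whereas a multi-allelic SSR can go higher, which fits the pattern in the abstract: SSR PIC values up to 0.77 and mean SNP PIC values of 0.30 or below.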
Aguiar, Derek; Halldórsson, Bjarni V.; Morrow, Eric M.; Istrail, Sorin
Motivation: The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities. A considerable portion of autism appears to be correlated with copy number variation, which is not directly probed by single nucleotide polymorphism (SNP) array or sequencing technologies. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem partly due to the inability of algorithms to detect them. Results: In this article, we present an algorithmic framework, which we term DELISHUS, that implements three exact algorithms for inferring regions of hemizygosity containing genomic deletions of all sizes and frequencies in SNP genotype data. We implement an efficient backtracking algorithm—that processes a 1 billion entry genome-wide association study SNP matrix in a few minutes—to compute all inherited deletions in a dataset. We further extend our model to give an efficient algorithm for detecting de novo deletions. Finally, given a set of called deletions, we also give a polynomial time algorithm for computing the critical regions of recurrent deletions. DELISHUS achieves significantly lower false-positive rates and higher power than previously published algorithms partly because it considers all individuals in the sample simultaneously. DELISHUS may be applied to SNP array or sequencing data to identify the deletion spectrum for family-based association studies. Availability: DELISHUS is available at http://www.brown.edu/Research/Istrail_Lab/. Contact: Eric_Morrow@brown.edu and Sorin_Istrail@brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22689755
Do, Duy Ngoc; Strathe, Anders Bjerring; Ostersen, Tage
per visit (TPV), mean feed intake per visit (FPV) and mean feed intake rate (FR) were available on 1130 boars. All boars were genotyped using the Illumina Porcine SNP60 BeadChip. The association analyses were performed using the GenABEL package in R. Sixteen SNPs had moderate genome-wide significant (p...... association with feeding behavior traits. Locus M1GA0016584 located close to the MSI2 gene on chromosome (SSC) 14 was very strongly associated with NVD (p = 9.6E-07). Thirty six SNPs were located in genome regions where QTLs have previously been reported......, dephosphorylation and positive regulation of peptide secretion genes were found highly significantly associated with feeding behavior traits by functional annotation. This is the first GWAS to identify genetic variants and biological mechanisms for feeding behavior in pigs and these results are important...
Jones, David B; Jerry, Dean R; Khatkar, Mehar S; Raadsma, Herman W; Zenger, Kyall R
The silver-lipped pearl oyster, Pinctada maxima, is an important tropical aquaculture species extensively farmed for the highly sought "South Sea" pearls. Traditional breeding programs have been initiated for this species in order to select for improved pearl quality, but many economic traits under selection are complex, polygenic and confounded with environmental factors, limiting the accuracy of selection. The incorporation of a marker-assisted selection (MAS) breeding approach would greatly benefit pearl breeding programs by allowing the direct selection of genes responsible for pearl quality. However, before MAS can be incorporated, substantial genomic resources such as genetic linkage maps need to be generated. The construction of a high-density genetic linkage map for P. maxima is not only essential for unravelling the genomic architecture of complex pearl quality traits, but also provides indispensable information on the genome structure of pearl oysters. A total of 1,189 informative genome-wide single nucleotide polymorphisms (SNPs) were incorporated into linkage map construction. The final linkage map consisted of 887 SNPs in 14 linkage groups, spans a total genetic distance of 831.7 centimorgans (cM), and covers an estimated 96% of the P. maxima genome. Assessment of sex-specific recombination across all linkage groups revealed limited overall heterochiasmy between the sexes (i.e. 1.15:1 F/M map length ratio). However, there were pronounced localised differences throughout the linkage groups, whereby male recombination was suppressed near the centromeres compared to female recombination, but inflated towards telomeric regions. Mean values of LD for adjacent SNP pairs suggest that a higher density of markers will be required for powerful genome-wide association studies. Finally, numerous nacre biomineralization genes were localised providing novel positional information for these genes. This high-density SNP genetic map is the first comprehensive linkage
Chasman, Daniel I; Fuchsberger, Christian; Pattaro, Cristian; Teumer, Alexander; Böger, Carsten A; Endlich, Karlhans; Olden, Matthias; Chen, Ming-Huei; Tin, Adrienne; Taliun, Daniel; Li, Man; Gao, Xiaoyi; Gorski, Mathias; Yang, Qiong; Hundertmark, Claudia; Foster, Meredith C; O'Seaghdha, Conall M; Glazer, Nicole; Isaacs, Aaron; Liu, Ching-Ti; Smith, Albert V; O'Connell, Jeffrey R; Struchalin, Maksim; Tanaka, Toshiko; Li, Guo; Johnson, Andrew D; Gierman, Hinco J; Feitosa, Mary F; Hwang, Shih-Jen; Atkinson, Elizabeth J; Lohman, Kurt; Cornelis, Marilyn C; Johansson, Asa; Tönjes, Anke; Dehghan, Abbas; Lambert, Jean-Charles; Holliday, Elizabeth G; Sorice, Rossella; Kutalik, Zoltan; Lehtimäki, Terho; Esko, Tõnu; Deshmukh, Harshal; Ulivi, Sheila; Chu, Audrey Y; Murgia, Federico; Trompet, Stella; Imboden, Medea; Coassin, Stefan; Pistis, Giorgio; Harris, Tamara B; Launer, Lenore J; Aspelund, Thor; Eiriksdottir, Gudny; Mitchell, Braxton D; Boerwinkle, Eric; Schmidt, Helena; Cavalieri, Margherita; Rao, Madhumathi; Hu, Frank; Demirkan, Ayse; Oostra, Ben A; de Andrade, Mariza; Turner, Stephen T; Ding, Jingzhong; Andrews, Jeanette S; Freedman, Barry I; Giulianini, Franco; Koenig, Wolfgang; Illig, Thomas; Meisinger, Christa; Gieger, Christian; Zgaga, Lina; Zemunik, Tatijana; Boban, Mladen; Minelli, Cosetta; Wheeler, Heather E; Igl, Wilmar; Zaboli, Ghazal; Wild, Sarah H; Wright, Alan F; Campbell, Harry; Ellinghaus, David; Nöthlings, Ute; Jacobs, Gunnar; Biffar, Reiner; Ernst, Florian; Homuth, Georg; Kroemer, Heyo K; Nauck, Matthias; Stracke, Sylvia; Völker, Uwe; Völzke, Henry; Kovacs, Peter; Stumvoll, Michael; Mägi, Reedik; Hofman, Albert; Uitterlinden, Andre G; Rivadeneira, Fernando; Aulchenko, Yurii S; Polasek, Ozren; Hastie, Nick; Vitart, Veronique; Helmer, Catherine; Wang, Jie Jin; Stengel, Bénédicte; Ruggiero, Daniela; Bergmann, Sven; Kähönen, Mika; Viikari, Jorma; Nikopensius, Tiit; Province, Michael; Ketkar, Shamika; Colhoun, Helen; Doney, Alex; Robino, Antonietta; Krämer, 
Bernhard K; Portas, Laura; Ford, Ian; Buckley, Brendan M; Adam, Martin; Thun, Gian-Andri; Paulweber, Bernhard; Haun, Margot; Sala, Cinzia; Mitchell, Paul; Ciullo, Marina; Kim, Stuart K; Vollenweider, Peter; Raitakari, Olli; Metspalu, Andres; Palmer, Colin; Gasparini, Paolo; Pirastu, Mario; Jukema, J Wouter; Probst-Hensch, Nicole M; Kronenberg, Florian; Toniolo, Daniela; Gudnason, Vilmundur; Shuldiner, Alan R; Coresh, Josef; Schmidt, Reinhold; Ferrucci, Luigi; Siscovick, David S; van Duijn, Cornelia M; Borecki, Ingrid B; Kardia, Sharon L R; Liu, Yongmei; Curhan, Gary C; Rudan, Igor; Gyllensten, Ulf; Wilson, James F; Franke, Andre; Pramstaller, Peter P; Rettig, Rainer; Prokopenko, Inga; Witteman, Jacqueline; Hayward, Caroline; Ridker, Paul M; Parsa, Afshin; Bochud, Murielle; Heid, Iris M; Kao, W H Linda; Fox, Caroline S; Köttgen, Anna
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphism (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10(-9)) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10(-4)-2.2 × 10(-7). Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
Akkelies E Dijkstra
Chronic mucus hypersecretion (CMH) is associated with an increased frequency of respiratory infections, excess lung function decline, and increased hospitalisation and mortality rates in the general population. It is associated with smoking, but it is unknown why only a minority of smokers develops CMH. A plausible explanation for this phenomenon is a predisposing genetic constitution. Therefore, we performed a genome wide association (GWA) study of CMH in Caucasian populations. GWA analysis was performed in the NELSON-study using the Illumina 610 array, followed by replication and meta-analysis in 11 additional cohorts. In total 2,704 subjects with, and 7,624 subjects without CMH were included, all current or former heavy smokers (≥20 pack-years). Additional studies were performed to test the functional relevance of the most significant single nucleotide polymorphism (SNP). A strong association with CMH, consistent across all cohorts, was observed with rs6577641 (p = 4.25×10(-6), OR = 1.17), located in intron 9 of the special AT-rich sequence-binding protein 1 locus (SATB1) on chromosome 3. The risk allele (G) was associated with higher mRNA expression of SATB1 (4.3×10(-9)) in lung tissue. Presence of CMH was associated with increased SATB1 mRNA expression in bronchial biopsies from COPD patients. SATB1 expression was induced during differentiation of primary human bronchial epithelial cells in culture. Our findings, that SNP rs6577641 is associated with CMH in multiple cohorts and is a cis-eQTL for SATB1, together with our additional observation that SATB1 expression increases during epithelial differentiation, provide suggestive evidence that SATB1 is a gene that affects CMH.
Human height is a highly heritable trait considered as an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aim to identify sequence variants associated with adult height by a genome-wide association study of copy number variants (CNVs) in Chinese. Genome-wide CNV association analyses for height variation were conducted in 1,625 unrelated Chinese adults and in sex-specific subgroups. Height was measured with a stadiometer. The Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs). We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height. We detected 10 significant association signals for height (p<0.05) in the whole population, and 9 and 11 association signals in the Chinese female and male populations, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41 × 10(-4)) was found to be significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048). Confirmatory real time PCR experiments lent further support for CNV validation. Compared to female subjects with two copies of the CNP, carriers of three copies had an average of 8.1% decrease in height. An important candidate gene, ubiquitin-protein ligase NEDD4-like (NEDD4L), was detected at this region, which plays important roles in bone metabolism by binding to bone formation regulators. Our findings suggest the important genetic variants underlying height variation in Chinese.
Matarin, Mar; Simon-Sanchez, Javier; Fung, Hon-Chung; Scholz, Sonja; Gibbs, J. Raphael; Hernandez, Dena G.; Crews, Cynthia; Britton, Angela; Wavrant De Vrieze, Fabienne; Brott, Thomas G.; Brown, Robert D.; Worrall, Bradford B.; Silliman, Scott; Case, L. Douglas; Hardy, John A.; Rich, Stephen S.; Meschia, James F.; Singleton, Andrew B.
Technological advances in molecular genetics allow rapid and sensitive identification of genomic copy number variants (CNVs). This, in turn, has sparked interest in the function such variation may play in disease. While a role for copy number mutations as a cause of Mendelian disorders is well established, it is unclear whether CNVs may affect risk for common complex disorders. We sought to investigate whether CNVs may modulate risk for ischemic stroke (IS) and to provide a catalog of CNVs in patients with this disorder by analyzing copy number metrics produced as a part of our previous genome-wide single-nucleotide polymorphism (SNP)-based association study of ischemic stroke in a North American white population. We examined CNVs in 263 patients with ischemic stroke (IS). Each identified CNV was compared with changes identified in 275 neurologically normal controls. Our analysis identified 247 CNVs, corresponding to 187 insertions (76%; 135 heterozygous; 25 homozygous duplications or triplications; 2 heterosomic) and 60 deletions (24%; 40 heterozygous deletions;3 homozygous deletions; 14 heterosomic deletions). Most alterations (81%) were the same as, or overlapped with, previously reported CNVs. We report here the first genome-wide analysis of CNVs in IS patients. In summary, our study did not detect any common genomic structural variation unequivocally linked to IS, although we cannot exclude that smaller CNVs or CNVs in genomic regions poorly covered by this methodology may confer risk for IS. The application of genome-wide SNP arrays now facilitates the evaluation of structural changes through the entire genome as part of a genome-wide genetic association study. PMID:18288507
Newman, Anne B; Walter, Stefan; Lunetta, Kathryn L; Garcia, Melissa E; Slagboom, P Eline; Christensen, Kaare; Arnold, Alice M; Aspelund, Thor; Aulchenko, Yurii S; Benjamin, Emelia J; Christiansen, Lene; D'Agostino, Ralph B; Fitzpatrick, Annette L; Franceschini, Nora; Glazer, Nicole L; Gudnason, Vilmundur; Hofman, Albert; Kaplan, Robert; Karasik, David; Kelly-Hayes, Margaret; Kiel, Douglas P; Launer, Lenore J; Marciante, Kristin D; Massaro, Joseph M; Miljkovic, Iva; Nalls, Michael A; Hernandez, Dena; Psaty, Bruce M; Rivadeneira, Fernando; Rotter, Jerome; Seshadri, Sudha; Smith, Albert V; Taylor, Kent D; Tiemeier, Henning; Uh, Hae-Won; Uitterlinden, André G; Vaupel, James W; Walston, Jeremy; Westendorp, Rudi G J; Harris, Tamara B; Lumley, Thomas; van Duijn, Cornelia M; Murabito, Joanne M
Genome-wide association studies (GWAS) may yield insights into longevity. We performed a meta-analysis of GWAS in Caucasians from four prospective cohort studies: the Age, Gene/Environment Susceptibility-Reykjavik Study, the Cardiovascular Health Study, the Framingham Heart Study, and the Rotterdam Study participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. Longevity was defined as survival to age 90 years or older (n = 1,836); the comparison group comprised cohort members who died between the ages of 55 and 80 years (n = 1,955). In a second discovery stage, additional genotyping was conducted in the Leiden Longevity Study cohort and the Danish 1905 cohort. There were 273 single-nucleotide polymorphism (SNP) associations with p < .0001, but none reached the prespecified significance level of 5 x 10(-8). Of the most significant SNPs, 24 were independent signals, and 16 of these SNPs were successfully genotyped in the second discovery stage, with one association for rs9664222, reaching 6.77 x 10(-7) for the combined meta-analysis of CHARGE and the stage 2 cohorts. The SNP lies in a region near MINPP1 (chromosome 10), a well-conserved gene involved in regulation of cellular proliferation. The minor allele was associated with lower odds of survival past age 90 (odds ratio = 0.82). Associations of interest in a homologue of the longevity assurance gene (LASS3) and PAPPA2 were not strengthened in the second stage. Survival studies of larger size or more extreme or specific phenotypes may support or refine these initial findings.
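The prespecified significance level of 5 x 10(-8) in the longevity abstract matches the conventional genome-wide threshold, which is often motivated as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent tests. The sketch below works through that arithmetic under this assumption; the abstract itself does not say how the threshold was derived, and the names used are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha across n_tests independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Assumption: about one million independent common-variant tests.
threshold = bonferroni_threshold(0.05, 1_000_000)

# p-value quoted in the abstract for the combined meta-analysis.
reported = {"rs9664222 (CHARGE + stage 2)": 6.77e-7}
passes = {name: p < threshold for name, p in reported.items()}

print(f"threshold = {threshold:.1e}")
print(passes)
```

With this threshold the rs9664222 association falls short of genome-wide significance, in line with the abstract's statement that no association reached the prespecified level.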
Demirkan, A; Lahti, J; Direk, N; Viktorin, A; Lunetta, K L; Terracciano, A; Nalls, M A; Tanaka, T; Hek, K; Fornage, M; Wellmann, J; Cornelis, M C; Ollila, H M; Yu, L; Smith, J A; Pilling, L C; Isaacs, A; Palotie, A; Zhuang, W V; Zonderman, A; Faul, J D; Sutin, A; Meirelles, O; Mulas, A; Hofman, A; Uitterlinden, A; Rivadeneira, F; Perola, M; Zhao, W; Salomaa, V; Yaffe, K; Luik, A I; Liu, Y; Ding, J; Lichtenstein, P; Landén, M; Widen, E; Weir, D R; Llewellyn, D J; Murray, A; Kardia, S L R; Eriksson, J G; Koenen, K; Magnusson, P K E; Ferrucci, L; Mosley, T H; Cucca, F; Oostra, B A; Bennett, D A; Paunio, T; Berger, K; Harris, T B; Pedersen, N L; Murabito, J M; Tiemeier, H; van Duijn, C M; Räikkönen, K
Major depressive disorder (MDD) is moderately heritable; however, genome-wide association studies (GWAS) for MDD, as well as for related continuous outcomes, have not shown consistent results. Attempts to elucidate the genetic basis of MDD may be hindered by heterogeneity in diagnosis. The Center for Epidemiological Studies Depression (CES-D) scale provides a widely used tool for measuring depressive symptoms clustered in four different domains, which can be combined into a total score but can also be analysed as separate symptom domains. We performed a meta-analysis of GWAS of the CES-D symptom clusters. We recruited 12 cohorts with the 20- or 10-item CES-D scale (32 528 persons). One single nucleotide polymorphism (SNP), rs713224, located near the brain-expressed melatonin receptor (MTNR1A) gene, was associated with the somatic complaints domain of depression symptoms, with borderline genome-wide significance (p discovery = 3.82 × 10(-8)). The SNP was analysed in an additional five cohorts comprising the replication sample (6813 persons). However, the association was not consistent among the replication sample (p discovery+replication = 1.10 × 10(-6)), with evidence of heterogeneity. Despite the effort to harmonize the phenotypes across cohorts and participants, our study is still underpowered to detect consistent association for depression, even by means of symptom classification. On the contrary, the SNP-based heritability and co-heritability estimation results suggest that only a very minor part of the variation could be captured by GWAS, which may explain the sparse findings.
Gurwitz, David; Bregman-Eschet, Yael
New companies offering personal whole-genome information services over the internet are dynamic and highly visible players in the personal genomics field. For fees currently ranging from US$399 to US$2500 and a vial of saliva, individuals can now purchase online access to their individual genetic information regarding susceptibility to a range of chronic diseases and phenotypic traits based on a genome-wide SNP scan. Most of the companies offering such services are based in the United States, but their clients may come from nearly anywhere in the world. Although the scientific validity, clinical utility and potential future implications of such services are being hotly debated, several ethical and regulatory questions related to direct-to-consumer (DTC) marketing strategies of genetic tests have not yet received sufficient attention. For example, how can we minimize the risk of unauthorized third parties from submitting other people's DNA for testing? Another pressing question concerns the ownership of (genotypic and phenotypic) information, as well as the unclear legal status of customers regarding their own personal information. Current legislation in the US and Europe falls short of providing clear answers to these questions. Until the regulation of personal genomics services catches up with the technology, we call upon commercial providers to self-regulate and coordinate their activities to minimize potential risks to individual privacy. We also point out some specific steps, along the trustee model, that providers of DTC personal genomics services as well as regulators and policy makers could consider for addressing some of the concerns raised below.
Background Each of the human genes or transcriptional units is likely to contain single nucleotide polymorphisms that may give rise to sequence variation between individuals and tissues on the level of RNA. Based on recent studies, differential expression of the two alleles of heterozygous coding single nucleotide polymorphisms (SNPs) may be frequent for human genes. Methods with high accuracy to be used in a high throughput setting are needed for systematic surveys of expressed sequence variation. In this study we evaluated two formats of multiplexed, microarray based minisequencing for quantitative detection of imbalanced expression of SNP alleles. We used a panel of ten SNPs located in five genes known to be expressed in two endothelial cell lines as our model system. Results The accuracy and sensitivity of quantitative detection of allelic imbalance was assessed for each SNP by constructing regression lines using a dilution series of mixed samples from individuals of different genotype. Accurate quantification of SNP alleles by both assay formats was evidenced by R² values > 0.95 for the majority of the regression lines. According to a two sample t-test, we were able to distinguish 1–9% of a minority SNP allele from a homozygous genotype, with larger variation between SNPs than between assay formats. Six of the SNPs, heterozygous in either of the two cell lines, were genotyped in RNA extracted from the endothelial cells. The coefficient of variation between the fluorescent signals from five parallel reactions was similar for cDNA and genomic DNA. The fluorescence signal intensity ratios measured in the cDNA samples were compared to those in genomic DNA to determine the relative expression levels of the two alleles of each SNP. Four of the six SNPs tested displayed a higher than 1.4-fold difference in allelic ratios between cDNA and genomic DNA. The results were verified by allele-specific oligonucleotide hybridisation and
Zhu, Caiye; Fan, Hongying; Yuan, Zehu; Hu, Shijin; Ma, Xiaomeng; Xuan, Junli; Wang, Hongwei; Zhang, Li; Wei, Caihong; Zhang, Qin; Zhao, Fuping; Du, Lixin
Chinese indigenous sheep can be classified into three types based on tail morphology: fat-tailed, fat-rumped, and thin-tailed sheep, of which the typical breeds are large-tailed Han sheep, Altay sheep, and Tibetan sheep, respectively. To unravel the genetic mechanisms underlying the phenotypic differences among Chinese indigenous sheep with tails of three different types, we used ovine high-density 600K SNP arrays to detect genome-wide copy number variation (CNV). In large-tailed Han sheep, A...
Bauchet, Guillaume; Grenier, Stéphane; Samson, Nicolas; Bonnet, Julien; Grivet, Laurent; Causse, Mathilde
A panel of 300 tomato accessions including breeding materials was built and characterized with >11,000 SNP. A population structure in six subgroups was identified. Strong heterogeneity in linkage disequilibrium and recombination landscape among groups and chromosomes was shown. GWAS identified several associations for fruit weight, earliness and plant growth. Genome-wide association studies (GWAS) have become a method of choice in quantitative trait dissection. First limited to highly polymorphic and outcrossing species, it is now applied in horticultural crops, notably in tomato. Until now GWAS in tomato has been performed on panels of heirloom and wild accessions. Using modern breeding materials would be of direct interest for breeding purpose. To implement GWAS on a large panel of 300 tomato accessions including 168 breeding lines, this study assessed the genetic diversity and linkage disequilibrium decay and revealed the population structure and performed GWA experiment. Genetic diversity and population structure analyses were based on molecular markers (>11,000 SNP) covering the whole genome. Six genetic subgroups were revealed and associated to traits of agronomical interest, such as fruit weight and disease resistance. Estimates of linkage disequilibrium highlighted the heterogeneity of its decay among genetic subgroups. Haplotype definition allowed a fine characterization of the groups and their recombination landscape revealing the patterns of admixture along the genome. Selection footprints showed results in congruence with introgressions. Taken together, all these elements refined our knowledge of the genetic material included in this panel and allowed the identification of several associations for fruit weight, plant growth and earliness, deciphering the genetic architecture of these complex traits and identifying several new loci useful for tomato breeding.
Qin, Sisi; Ingle, James N; Liu, Mohan; Yu, Jia; Wickerham, D Lawrence; Kubo, Michiaki; Weinshilboum, Richard M; Wang, Liewei
We previously performed a case-control genome-wide association study in women treated with selective estrogen receptor modulators (SERMs) for breast cancer prevention and identified single nucleotide polymorphisms (SNPs) in ZNF423 as potential biomarkers for response to SERM therapy. The ZNF423 rs9940645 SNP, which is approximately 200 bp away from the estrogen response elements, resulted in SNP-, estrogen-, and SERM-dependent regulation of ZNF423 expression and, "downstream", that of BRCA1. Electrophoretic mobility shift assay-mass spectrometry was performed to identify proteins binding to the ZNF423 SNP and coordinating with estrogen receptor alpha (ERα). Clustered, regularly interspaced short palindromic repeats (CRISPR)/Cas9 genome editing was applied to generate ZR75-1 breast cancer cells with different ZNF423 SNP genotypes. Both cultured cells and mouse xenograft models with different ZNF423 SNP genotypes were used to study the cellular responses to SERMs and poly(ADP-ribose) polymerase (PARP) inhibitors. We identified calmodulin-like protein 3 (CALML3) as a key sensor of this SNP and a coregulator of ERα, which contributes to differential gene transcription regulation in an estrogen- and SERM-dependent fashion. Furthermore, using CRISPR/Cas9-engineered ZR75-1 breast cancer cells with different ZNF423 SNP genotypes, striking differences in cellular responses to SERMs and PARP inhibitors, alone or in combination, were observed not only in cells but also in a mouse xenograft model. Our results have demonstrated the mechanism by which the ZNF423 rs9940645 SNP might regulate gene expression and drug response as well as its potential role in achieving more highly individualized breast cancer therapy.
Akanno, Everestus C; Plastow, Graham; Fitzsimmons, Carolyn; Miller, Stephen P; Baron, Vern; Ominski, Kimberly; Basarab, John A
The aim of this study was to identify SNP markers that associate with variation in beef heifer reproduction and performance of their calves. A genome-wide association study was performed by means of the generalized quasi-likelihood score (GQLS) method using heifer genotypes from the BovineSNP50 BeadChip and estimated breeding values for pre-breeding body weight (PBW), pregnancy rate (PR), calving difficulty (CD), age at first calving (AFC), calf birth weight (BWT), calf weaning weight (WWT), and calf pre-weaning average daily gain (ADG). Data consisted of 785 replacement heifers from three Canadian research herds, namely Brandon Research Centre, Brandon, Manitoba, University of Alberta Roy Berg Kinsella Ranch, Kinsella, Alberta, and Lacombe Research Centre, Lacombe, Alberta. After applying a false discovery rate correction at a 5% significance level, a total of 4, 3, 3, 9, 6, 2, and 1 SNPs were significantly associated with PBW, PR, CD, AFC, BWT, WWT, and ADG, respectively. These SNPs were located on chromosomes 1, 5-7, 9, 13-16, 19-21, 24, 25, and 27-29. Chromosomes 1, 5, and 24 had SNPs with pleiotropic effects. New significant SNPs that impact functional traits were detected, many of which have not been previously reported. The results of this study support quantitative genetic studies related to the inheritance of these traits, and provides new knowledge regarding beef cattle quantitative trait loci effects. The identification of these SNPs provides a starting point to identify genes affecting heifer reproduction traits and performance of their calves (BWT, WWT, and ADG). They also contribute to a better understanding of the biology underlying these traits and will be potentially useful in marker- and genome-assisted selection and management.
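The false discovery rate correction mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was applied; the Benjamini-Hochberg step-up rule is assumed here only because it is the most common choice, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests as significant under the Benjamini-Hochberg step-up rule.

    Assumption: the study's exact FDR procedure is not stated in the
    abstract; this is the textbook BH rule, shown for illustration only.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every test ranked at or below max_k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical SNP p-values, not taken from the study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value further down the sorted list meets its threshold.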
Brachial circumference (BC), also known as upper arm or mid arm circumference, can be used as an indicator of muscle mass and fat tissue, which are distributed differently in men and women. Analysis of anthropometric measures of peripheral fat distribution such as BC could help in understanding the complex pathophysiology behind overweight and obesity. The purpose of this study is to identify genetic variants associated with BC through a large-scale genome-wide association scan (GWAS) meta-analysis. We used fixed-effects meta-analysis to synthesise summary results across 14 GWAS discovery and 4 replication cohorts comprising overall 22,376 individuals (12,031 women and 10,345 men) of European ancestry. Individual analyses were carried out for men, women, and combined across sexes using linear regression and an additive genetic model, adjusted either for age alone or for age and BMI. We prioritised signals for follow-up in two stages. We did not detect any signals reaching genome-wide significance. The FTO rs9939609 SNP showed nominal evidence for association (p<0.05) in the age-adjusted strata for men and across both sexes. In this first GWAS meta-analysis for BC to date, we have not identified any genome-wide significant signals and do not observe robust association of previously established obesity loci with BC. Large-scale collaborations will be necessary to achieve higher power to detect loci underlying BC.
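As a rough illustration of the fixed-effects meta-analysis named in the abstract above, the sketch below pools per-cohort effect estimates by inverse-variance weighting. This is the standard textbook form; the abstract does not give the software or exact weighting scheme, so the function name and inputs are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects pooling for one SNP.

    Assumption: each cohort reports an effect size (beta) and its standard
    error for the same allele and trait scale. Illustration only.
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Two hypothetical cohorts with identical estimates and precision.
beta, se, z, p = fixed_effects_meta([0.1, 0.1], [0.1, 0.1])
```

With equal weights the pooled estimate equals the common effect, while the pooled standard error shrinks by a factor of √2, which is why adding cohorts increases power.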
Qi, Peng; Gimode, Davis; Saha, Dipnarayan; Schröder, Stephan; Chakraborty, Debkanta; Wang, Xuewen; Dida, Mathews M; Malmberg, Russell L; Devos, Katrien M
Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Nevertheless, most genotyping-by-sequencing (GBS) methods are geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. Furthermore, bioinformatics pipelines often lack the flexibility to deal with paired-end reads or to be applied in polyploid species. UGbS-Flex combines publicly available software with in-house python and perl scripts to efficiently call SNPs from genotyping-by-sequencing reads irrespective of the species' ploidy level, breeding system and availability of a reference genome. Noteworthy features of the UGbS-Flex pipeline are an ability to use paired-end reads as input, an effective approach to cluster reads across samples with enhanced outputs, and maximization of SNP calling. We demonstrate use of the pipeline for the identification of several thousand high-confidence SNPs with high representation across samples in an F 3 -derived F 2 population in the allotetraploid finger millet. Robust high-density genetic maps were constructed using the time-tested mapping program MAPMAKER which we upgraded to run efficiently and in a semi-automated manner in a Windows Command Prompt Environment. We exploited comparative GBS with one of the diploid ancestors of finger millet to assign linkage groups to subgenomes and demonstrate the presence of chromosomal rearrangements. The paper combines GBS protocol modifications, a novel flexible GBS analysis pipeline, UGbS-Flex, recommendations to maximize SNP identification, updated genetic mapping software, and the first high-density maps of finger millet. The modules used in the UGbS-Flex pipeline and for genetic mapping were applied to finger millet, an allotetraploid selfing species
Verweij, K.J.H.; Vinkhuyzen, A.A.E.; Benyamin, B.; Lynskey, M.T.; Quaye, L.; Agrawal, A.; Gordon, S.D.; Montgomery, G.W.; Madden, P.A.F.; Heath, A.C.; Spector, T.D.; Martin, N.G.; Medland, S.E.
While initiation of cannabis use is around 40% heritable, not much is known about the underlying genetic aetiology. Here, we meta-analysed two genome-wide association studies of initiation of cannabis use with >10000 individuals. None of the genetic variants reached genome-wide significance. We also
Santos, Carla; Phillips, Christopher; Fondevila, Manuel; Daniel, Runa; van Oorschot, Roland A H; Burchard, Esteban G; Schanfield, Moses S; Souto, Luis; Uacyisrael, Jolame; Via, Marc; Carracedo, Ángel; Lareu, Maria V
The analysis of human population variation is an area of considerable interest in the forensic, medical genetics and anthropological fields. Several forensic single nucleotide polymorphism (SNP) assays provide ancestry-informative genotypes in sensitive tests designed to work with limited DNA samples, including a 34-SNP multiplex differentiating African, European and East Asian ancestries. Although assays capable of differentiating Oceanian ancestry at a global scale have become available, this study describes markers compiled specifically for differentiation of Oceanian populations. A sensitive multiplex assay, termed Pacifiplex, was developed and optimized in a small-scale test applicable to forensic analyses. The Pacifiplex assay comprises 29 ancestry-informative marker SNPs (AIM-SNPs) selected to complement the 34-plex test, that in a combined set distinguish Africans, Europeans, East Asians and Oceanians. Nine Pacific region study populations were genotyped with both SNP assays, then compared to four reference population groups from the HGDP-CEPH human diversity panel. STRUCTURE analyses estimated population cluster membership proportions that aligned with the patterns of variation suggested for each study population's currently inferred demographic histories. Aboriginal Taiwanese and Philippine samples indicated high East Asian ancestry components, Papua New Guinean and Aboriginal Australians samples were predominantly Oceanian, while other populations displayed cluster patterns explained by the distribution of divergence amongst Melanesians, Polynesians and Micronesians. Genotype data from Pacifiplex and 34-plex tests is particularly well suited to analysis of Australian Aboriginal populations and when combined with Y and mitochondrial DNA variation will provide a powerful set of markers for ancestry inference applied to modern Australian demographic profiles. On a broader geographic scale, Pacifiplex adds highly informative data for inferring the ancestry
Kerns, Sarah L.; Ostrer, Harry; Stock, Richard; Li, William; Moore, Julian; Pearlman, Alexander; Campbell, Christopher; Shao Yongzhao; Stone, Nelson; Kusnetz, Lynda; Rosenstein, Barry S.
Purpose: To identify single nucleotide polymorphisms (SNPs) associated with erectile dysfunction (ED) among African-American prostate cancer patients treated with external beam radiation therapy. Methods and Materials: A cohort of African-American prostate cancer patients treated with external beam radiation therapy was observed for the development of ED by use of the five-item Sexual Health Inventory for Men (SHIM) questionnaire. Final analysis included 27 cases (post-treatment SHIM score ≤7) and 52 control subjects (post-treatment SHIM score ≥16). A genome-wide association study was performed using approximately 909,000 SNPs genotyped on Affymetrix 6.0 arrays (Affymetrix, Santa Clara, CA). Results: We identified SNP rs2268363, located in the follicle-stimulating hormone receptor (FSHR) gene, as significantly associated with ED after correcting for multiple comparisons (unadjusted p = 5.46 × 10⁻⁸, Bonferroni p = 0.028). We identified four additional SNPs that tended toward a significant association with an unadjusted p value < 10⁻⁶. Inference of population substructure showed that cases had a higher proportion of African ancestry than control subjects (77% vs. 60%, p = 0.005). A multivariate logistic regression model that incorporated estimated ancestry and four of the top-ranked SNPs was a more accurate classifier of ED than a model that included only clinical variables. Conclusions: To our knowledge, this is the first genome-wide association study to identify SNPs associated with adverse effects resulting from radiotherapy. It is important to note that the SNP that proved to be significantly associated with ED is located within a gene whose encoded product plays a role in male gonad development and function. Another key finding of this project is that the four SNPs most strongly associated with ED were specific to persons of African ancestry and would therefore not have been identified had a cohort of European ancestry been screened. This study demonstrates
Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has been previously done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL) and RNA sequencing. Recently we have performed high coverage whole genome resequencing with 61 unrelated samples, representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were double-haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide range of geographic distribution and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1), which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth on the locus, and number of genotyped samples. Results from the two variant calling programs were compared and genotypes of the double haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from the phylogenetic analysis that we conducted indicate that the SNP markers contain enough population-specific polymorphisms for recovering population relationships despite the small sample size used. Intra-population polymorphism assessment revealed high level of polymorphism and
MicroRNAs (miRNAs) are a newly discovered type of small non-protein coding RNA that function in the inhibition of effective mRNA translation, and may serve as susceptibility genes for the development of various diseases. The SNP rs12416605, located in the human type 1 diabetes IDDM10 locus, changes the seed sequence (UGU[G/A]CCC) of miRNA miR-938 and potentially alters miR-938 targets, including IL-16 and IL-17A. In an attempt to test whether miR-938 may be a susceptibility gene for IDDM10, we assessed the possible association of the miR-938 SNP with type 1 diabetes (T1D) in an American Caucasian cohort of 622 patients and 723 healthy controls by TaqMan assay. Our current data do not support the association between the SNP in miR-938 and type 1 diabetes.
Goudey, Benjamin; Abedini, Mani; Hopper, John L; Inouye, Michael; Makalic, Enes; Schmidt, Daniel F; Wagner, John; Zhou, Zeyu; Zobel, Justin; Reumann, Matthias
Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate a three-way interaction analysis on 1.1 million SNP GWAS data requiring over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer "Sequoia" at the Lawrence Livermore National Laboratory assuming linear scaling is maintained as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher order interaction studies on large modern GWAS.
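To give a sense of scale for the runtime quoted in the abstract above, the following back-of-the-envelope sketch counts the SNP triples in an exhaustive three-way scan and the throughput that the 5.8-year estimate would imply. It assumes one test per unordered triple of distinct SNPs and treats "over 5.8 years" as exactly 5.8 years; neither assumption comes from the abstract, and the variable names are illustrative.

```python
import math

# Figures quoted in the abstract.
n_snps = 1_100_000
runtime_years = 5.8

# Assumption: an exhaustive scan tests every unordered triple of distinct SNPs.
n_triples = math.comb(n_snps, 3)

# Convert the quoted runtime to seconds (Julian year of 365.25 days).
runtime_seconds = runtime_years * 365.25 * 24 * 3600

# Throughput implied by the quoted runtime under the assumptions above.
implied_rate = n_triples / runtime_seconds

print(f"{n_triples:.3e} triples, about {implied_rate:.2e} triples per second")
```

The count grows with the cube of the number of SNPs, which is why each additional order of interaction pushes exhaustive analysis from workstation scale to supercomputer scale.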
Li, Xiujin; Buitenhuis, Albert Johannes; Lund, Mogens Sandø
is highly consistent between the Chinese and Danish Holstein populations, such that a joint genome-wide association study (GWAS) can be performed. In this study, a joint GWAS was performed for 16 milk FA traits based on data of 784 Chinese and 371 Danish Holstein cows genotyped by a high-density bovine...... different effects in the 2 populations. Ten FA were influenced by a quantitative trait loci (QTL) region including DGAT1. Both C14:1 and the C14 index were influenced by a QTL region including SCD1 in the combined population. Other QTL regions also showed significant associations with the studied FA....... A large region (14.9–24.9 Mbp) on BTA26 significantly influenced C14:1 and the C14 index in both populations, most likely due to the SNP in SCD1. A QTL region (69.97–73.69 Mbp) on BTA9 showed a significantly different effect on C18:0 between the 2 populations. Detection of these important SNP...
Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong
For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods. PMID:23620809
Ramensky Vasily E
Background: The mapping of quantitative trait loci in rat and mouse has been extremely successful in identifying chromosomal regions associated with human disease-related phenotypes. However, identifying the specific phenotype-causing DNA sequence variations within a quantitative trait locus has been much more difficult. The recent availability of genomic sequence from several mouse inbred strains (including C57BL/6J, 129X1/SvJ, 129S1/SvImJ, A/J, and DBA/2J) has made it possible to catalog DNA sequence differences within a quantitative trait locus derived from crosses between these strains. However, even for well-defined quantitative trait loci ( Description: To help identify functional DNA sequence variations within quantitative trait loci we have used the Ensembl annotated genome sequence to compile a database of mouse single nucleotide polymorphisms (SNPs) that are predicted to cause missense, nonsense, frameshift, or splice site mutations (available at http://bioinfo.embl.it/SnpApplet/). For missense mutations we have used the PolyPhen and PANTHER algorithms to predict whether amino acid changes are likely to disrupt protein function. Conclusion: We have developed a database of mouse SNPs predicted to cause missense, nonsense, frameshift, and splice-site mutations. Our analysis revealed that 20% and 14% of missense SNPs are likely to be deleterious according to PolyPhen and PANTHER, respectively, and 6% are considered deleterious by both algorithms. The database also provides gene expression and functional annotations from the Symatlas, Gene Ontology, and OMIM databases to further assess candidate phenotype-causing mutations. To demonstrate its utility, we show that Mouse SNP Miner successfully finds a previously identified candidate SNP in the taste receptor, Tas1r3, that underlies sucrose preference in the C57BL/6J strain. We also use Mouse SNP Miner to derive a list of candidate phenotype-causing mutations within a previously
Marie A. Chattaway
National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates there was concordance between serotype derived from the genome and phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or indels identified in O-antigen synthesis/modification genes rendering them dysfunctional. SNP typing identified 36 clusters of two isolates or more. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real time, thereby informing surveillance strategies and providing opportunities for implementing timely public health interventions.
Zeng, Bing; Yan, Haidong; Liu, Xinchun; Zang, Wenjing; Zhang, Ailing; Zhou, Sifan; Huang, Linkai; Liu, Jinping
Orchardgrass (Dactylis glomerata L.) is a well-known perennial forage species, but rust diseases cause serious reductions in its yield and quality, and the genetic mechanisms of rust resistance are not well understood. In this study, a genome-wide association study (GWAS) was performed using specific-locus amplified fragment sequencing (SLAF-seq) technology in orchardgrass. A total of 2,334,889 SLAF tags were generated to produce 2,309,777 SNPs. ADMIXTURE analysis revealed unstructured subpopulations for 33 accessions, indicating that this orchardgrass population could be used for association analysis. Linkage disequilibrium (LD) analysis revealed an average r² of 0.4 across all SNP pairs, indicating a high extent of LD in these samples. Through GWAS, a total of 4,604 SNPs were found to be significantly ( P rust trait. The bulk analysis identified 5,211 SNPs related to the rust trait. Two candidate genes, cytochrome P450 and prolamin, were implicated in disease resistance through prediction of functional genes surrounding each high-quality SNP ( P rust traits based on GWAS analysis and bulk analysis. The large number of SNPs associated with rust traits and these two candidate genes may provide the basis for further research on rust resistance mechanisms and marker-assisted selection (MAS) for rust-resistant lineages.
Nicholls, Andrew W.; Salek, Reza M.; Marques-Vidal, Pedro; Morya, Edgard; Sameshima, Koichi; Montoliu, Ivan; Da Silva, Laeticia; Collino, Sebastiano; Martin, François-Pierre; Rezzi, Serge; Steinbeck, Christoph; Waterworth, Dawn M.; Waeber, Gérard; Vollenweider, Peter; Beckmann, Jacques S.; Le Coutre, Johannes; Mooser, Vincent; Bergmann, Sven; Genick, Ulrich K.; Kutalik, Zoltán
Metabolic traits are molecular phenotypes that can drive clinical phenotypes and may predict disease progression. Here, we report results from a metabolome- and genome-wide association study on 1H-NMR urine metabolic profiles. The study was conducted within an untargeted approach, employing a novel method for compound identification. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P<5×10−8) and independent associations between single nucleotide polymorphisms (SNP) and metabolome features. Fifty-six of these associations replicated in the TasteSensomics cohort, comprising 601 individuals from São Paulo of vastly diverse ethnic background. They correspond to eleven gene-metabolite associations, six of which had been previously identified in the urine metabolome and three in the serum metabolome. Our key novel findings are the associations of two SNPs with NMR spectral signatures pointing to fucose (rs492602, P = 6.9×10−44) and lysine (rs8101881, P = 1.2×10−33), respectively. Fine-mapping of the first locus pinpointed the FUT2 gene, which encodes a fucosyltransferase enzyme and has previously been associated with Crohn's disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within the SLC7A9 gene, rare mutations of which have been linked to severe kidney damage. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers. PMID:24586186
Martinez, Pierre; Kimberley, Christopher; Birkbak, Nicolai Juul
Intra-tumour genetic heterogeneity (ITH) fosters drug resistance and is a critical hurdle to clinical treatment. ITH can be well-measured using multi-region sampling but this is costly and challenging to implement. There is therefore a need for tools to estimate ITH in individual samples, using...... standard genomic data such as SNP-arrays, that could be implemented routinely. We designed two novel scores S and R, respectively based on the Shannon diversity index and Ripley's L statistic of spatial homogeneity, to quantify ITH in single SNP-array samples. We created in-silico and in-vitro mixtures...... sequencing data but heterogeneity in the fraction of tumour cells present across samples hampered accurate quantification. The prognostic potential of both scores was moderate but significantly predictive of survival in several tumour types (corrected p = 0.03). Our work thus shows how individual SNP...
Núñez-Acuña, Gustavo; Aguilar-Espinoza, Andrea; Chávez-Mardones, Jacqueline; Gallardo-Escárate, Cristian
Ubiquitin-conjugated E2 enzyme (UBE2) is one of the main components of the proteasome degradation cascade. Previous studies have shown increased expression levels in individuals challenged with pathogens such as viruses and bacteria. This study aimed to characterize the immune response of the UBE2 gene in the gastropod Concholepas concholepas through expression analysis and single nucleotide polymorphism (SNP) discovery. UBE2 was identified from a cDNA library by 454 pyrosequencing, while SNP identification and validation were performed using de novo assembly and high resolution melting analysis. Challenge trials with Vibrio anguillarum were carried out to evaluate the relative transcript abundance of the UBE2 gene from two to thirty-three hours post-treatment. The results showed a partial UBE2 sequence of 889 base pairs (bp) with a partial coding region of 291 bp. SNP variation (A/C) was observed at the 546th position. Individuals challenged by V. anguillarum showed overexpression of the UBE2 gene, with expression significantly higher in homozygous (AA) individuals than in homozygous (CC) or heterozygous (A/C) individuals. This study contributes useful information relating to the UBE2 gene and its association with innate immune response in marine invertebrates.
Su, Ying; Long, Yi; Liao, Xinjun; Ai, Huashui; Zhang, Zhiyan; Yang, Bin; Xiao, Shijun; Tang, Jianhong; Xin, Wenshui; Huang, Lusheng; Ren, Jun; Ding, Nengshui
Hair provides thermal regulation for mammals and protects the skin from wounds, bites and ultraviolet (UV) radiation, and is important in adaptation to volatile environments. Pigs in nature are divided into hairy and hairless, which provide a good model for deciphering the molecular mechanisms of hairlessness. We conducted a genomic scan for genetically differentiated regions between hairy and hairless pigs using 60K SNP data, with the aim of better understanding the genetic basis for the hairless phenotype in pigs. A total of 38,405 SNPs in 498 animals from 36 diverse breeds were used to detect genomic signatures for pig hairlessness by estimating between-population (FST) values. Seven diversifying signatures between Yucatan hairless pigs and hairy pigs were identified on pig chromosomes (SSC) 1, 3, 7, 8, 10, 11 and 16, and the biological functions of two notable genes, RGS17 and RB1, were revealed. When Mexican hairless pigs were contrasted with hairy pigs, strong signatures were detected on SSC1 and SSC10, which harbor two functionally plausible genes, REV3L and BAMBI. KEGG pathway analysis showed a subset of overrepresented genes involved in the T cell receptor signaling pathway, MAPK signaling pathway and the tight junction pathways. All of these pathways may be important in local adaptability of hairless pigs. The potential mechanisms underlying the hairless phenotype in pigs are reported for the first time. RB1 and BAMBI are interesting candidate genes for the hairless phenotype in Yucatan hairless and Mexican hairless pigs, respectively. RGS17, REV3L, ICOS and RASGRP1 as well as other genes involved in the MAPK and T cell receptor signaling pathways may be important in environmental adaptation by improved tolerance to UV damage in hairless pigs. These findings improve our understanding of the genetic basis for inherited hairlessness in pigs.
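The between-population FST scan described in the abstract above can be sketched for a single biallelic SNP as follows. The abstract does not state which FST estimator was used; this sketch assumes Wright's classical definition, FST = (HT − HS) / HT, computed from allele frequencies without the sample-size corrections that estimators such as Weir and Cockerham's apply. The function name and frequencies are hypothetical.

```python
def wright_fst(p1, p2, n1=1, n2=1):
    """Wright's FST for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in each population.
    n1, n2: relative population weights (equal by default).
    Assumption: textbook estimator, not necessarily the one in the study.
    """
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    # Expected heterozygosity in the pooled (total) population.
    p_bar = w1 * p1 + w2 * p2
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Weighted mean expected heterozygosity within populations.
    h_within = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    if h_total == 0.0:
        # SNP is monomorphic across both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in hairless vs. hairy pigs.
fst_divergent = wright_fst(0.9, 0.1)
fst_identical = wright_fst(0.5, 0.5)
```

In a genome scan, values like these are computed per SNP (often averaged in sliding windows), and regions with unusually high FST are reported as candidate signatures of differentiation.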
Tluczek, Audrey; Twal, Marie E; Beamer, Laura Curr; Burton, Candace W; Darmofal, Leslie; Kracun, Mary; Zanni, Karen L; Turner, Martha
Members of the Ethics and Public Policy Committee of the International Society of Nurses in Genetics prepared this article to assist nurses in interpreting the American Nurses Association (2015) Code of Ethics for Nurses with Interpretive Statements (Code) within the context of genetics/genomics. The Code explicates the nursing profession's norms and responsibilities in managing ethical issues. The nearly ubiquitous application of genetic/genomic technologies in healthcare poses unique ethical challenges for nursing. Therefore, authors conducted literature searches that drew from various professional resources to elucidate implications of the code in genetic/genomic nursing practice, education, research, and public policy. We contend that the revised Code coupled with the application of genomic technologies to healthcare creates moral obligations for nurses to continually refresh their knowledge and capacities to translate genetic/genomic research into evidence-based practice, assure the ethical conduct of scientific inquiry, and continually develop or revise national/international guidelines that protect the rights of individuals and populations within the context of genetics/genomics. Thus, nurses have an ethical responsibility to remain knowledgeable about advances in genetics/genomics and incorporate emergent evidence into their work.
Of all the meat quality traits, tenderness is considered the most important with regard to eating quality and market value. In this study we used genome-wide association studies (GWAS) for peak shear force (PSF) of loin muscle as a measure of tenderness in 1,976 crossbred commercial pigs, genotyped for 42,721 informative SNPs using the Illumina PorcineSNP60 Beadchip. Four 1 Mb genomic regions, three on SSC2 (at 4 Mb, 5 Mb and 109 Mb) and one on SSC17 (at 20 Mb), were detected, which collectively explained about 15.30% and 3.07% of the total genetic and phenotypic variance for PSF, respectively. Markers ASGA0008566, ASGA0008695, DRGA0003285 and ASGA0075615 in the four regions were strongly associated with the effects. Analysis of the reference genome sequence in the region with the most important SNPs for SSC2_5 identified FRMD8, SLC25A45 and LTBP3 as potential candidate genes for meat tenderness on the basis of functional annotation of these genes. The region SSC2_109 was close to a previously reported candidate gene, CAST; however, the very weak LD between DRGA0003285 (the best marker representing region SSC2_109) and CAST indicated the potential for additional genes which are distinct from, or interact with, CAST to affect meat tenderness. Limited information on known genes in regions SSC2_109 and SSC17_20 restricts further analysis. Re-sequencing of these regions for informative animals may help to resolve the molecular architecture and identify new candidate genes and causative mutations affecting this trait. These findings contribute significantly to our knowledge of the genomic regions affecting pork shear force and will potentially lead to new insights into the molecular mechanisms regulating meat tenderness.
Deciphering the genetic control of flowering and ripening periods in apple is essential for breeding cultivars adapted to their growing environments. We implemented a large Genome-Wide Association Study (GWAS) at the European level using an association panel of 1,168 different apple genotypes distributed over six locations and phenotyped for these phenological traits. The panel was genotyped at a high density of SNPs using the Axiom® Apple 480K SNP array. We ran GWAS with a multi-locus mixed model (MLMM), which handles the putatively confounding effect of significant SNPs elsewhere on the genome. Genomic regions were further investigated to reveal candidate genes responsible for the phenotypic variation. At the whole population level, GWAS retained two SNPs as cofactors on chromosome 9 for flowering period, and six for ripening period (four on chromosome 3, one on chromosome 10 and one on chromosome 16), which together accounted for 8.9 and 17.2% of the phenotypic variance, respectively. For both traits, SNPs in weak linkage disequilibrium were detected nearby, thus suggesting the existence of allelic heterogeneity. The geographic origins and relationships of apple cultivars accounted for large parts of the phenotypic variation. Variation in genotypic frequency of the SNPs associated with the two traits was connected to the geographic origin of the genotypes (grouped as North+East, West and South Europe), and indicated differential selection in different growing environments. Genes encoding transcription factors containing either NAC or MADS domains were identified as major candidates within the small confidence intervals computed for the associated genomic regions. A strong microsynteny between apple and peach was revealed in all four confidence interval regions. This study shows how association genetics can unravel the genetic control of important horticultural traits in apple, as well as reduce the confidence intervals of the associated
The obesity epidemic is responsible for a substantial economic burden in developed countries and is a major risk factor for type 2 diabetes and cardiovascular disease. The disease is the result not only of several environmental risk factors, but also of genetic predisposition. To take advantage of recent advances in gene-mapping technology, we executed a genome-wide association scan to identify genetic variants associated with obesity-related quantitative traits in the genetically isolated population of Sardinia. Initial analysis suggested that several SNPs in the FTO and PFKP genes were associated with increased BMI, hip circumference, and weight. Within the FTO gene, rs9930506 showed the strongest association with BMI (p = 8.6 × 10^-7), hip circumference (p = 3.4 × 10^-8), and weight (p = 9.1 × 10^-7). In Sardinia, homozygotes for the rare "G" allele of this SNP (minor allele frequency = 0.46) were 1.3 BMI units heavier than homozygotes for the common "A" allele. Within the PFKP gene, rs6602024 showed very strong association with BMI (p = 4.9 × 10^-6). Homozygotes for the rare "A" allele of this SNP (minor allele frequency = 0.12) were 1.8 BMI units heavier than homozygotes for the common "G" allele. To replicate our findings, we genotyped these two SNPs in the GenNet study. In European Americans (N = 1,496) and in Hispanic Americans (N = 839), we replicated significant association between rs9930506 in the FTO gene and BMI (p-value for meta-analysis of European American and Hispanic American follow-up samples, p = 0.001), weight (p = 0.001), and hip circumference (p = 0.0005). We did not replicate association between rs6602024 and obesity-related traits in the GenNet sample, although we found that in European Americans, Hispanic Americans, and African Americans, homozygotes for the rare "A" allele were, on average, 1.0-3.0 BMI units heavier than homozygotes for the more common "G" allele. In summary, we have completed a whole genome-association scan for
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
You, Na; Murillo, Gabriel; Su, Xiaoquan; Zeng, Xiaowei; Xu, Jian; Ning, Kang; Zhang, ShouDong; Zhu, Jian-Kang; Cui, Xinping
calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts
Dash, S; Singh, A; Bhatia, A K; Jayakumar, S; Sharma, A; Singh, S; Ganguly, I; Dixit, S P
In total, 52 samples of Sahiwal (19), Tharparkar (17), and Gir (16) cattle were genotyped using the BovineHD SNP chip to analyze minor allele frequency (MAF), genetic diversity, and linkage disequilibrium among these cattle. The SNPs common to the BovineHD and 54K SNP chips were also extracted and evaluated for their performance. Only 40%-50% of the SNPs on these arrays were found to be informative for genetic analysis in these cattle breeds. The overall mean MAF for SNPs of the BovineHD SNP chip was 0.248 ± 0.006, 0.241 ± 0.007, and 0.242 ± 0.009 in Sahiwal, Tharparkar and Gir, respectively, while that for the 54K SNPs was lower. The average Reynolds' genetic distance between breeds ranged from 0.042 to 0.055 based on the BovineHD Beadchip, and from 0.052 to 0.084 based on the 54K SNP chip. The estimates of genetic diversity based on the HD and 54K chips were almost the same and, hence, the low-density chip seems to be good enough to decipher the genetic diversity of these cattle breeds. The linkage disequilibrium started decaying (r² < 0.2) at 140 kb inter-marker distance and, hence, a 20K low-density customized SNP array derived from the HD chip could be designed for genomic selection in these cattle; otherwise, the 54K Bead Chip as such will be useful.
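As an illustration of the MAF statistic reported above, the following is a minimal sketch that computes the minor allele frequency of one SNP from per-animal genotype calls. The 0/1/2 allele-count encoding, the use of `None` for missing calls and the function name are assumptions for the example, not details taken from the study.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for one biallelic SNP.

    genotypes: iterable of 0, 1 or 2 (copies of the alternate allele per animal),
    with None marking a missing call (hypothetical encoding).
    Returns None when no genotype was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))   # two alleles per diploid animal
    return min(alt_freq, 1.0 - alt_freq)         # the rarer of the two alleles


if __name__ == "__main__":
    calls = [0, 0, 0, 1, None]                   # one heterozygote among four called animals
    print(minor_allele_frequency(calls))
```

A SNP with a MAF of zero in a breed is monomorphic there, which is one common reason for a marker to be counted as uninformative.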
Biernacka, Joanna M.; Geske, Jennifer; Jenkins, Gregory D.; Colby, Colin; Rider, David N.; Karpyak, Victor M.; Choi, Doo-Sup; Fridley, Brooke L.
It is believed that multiple genetic variants with small individual effects contribute to the risk of alcohol dependence. Such polygenic effects are difficult to detect in genome-wide association studies that test for association of the phenotype with each single nucleotide polymorphism (SNP) individually. To overcome this challenge, gene set analysis (GSA) methods that jointly test for the effects of pre-defined groups of genes have been proposed. Rather than testing for association between the phenotype and individual SNPs, these analyses evaluate the global evidence of association with a set of related genes enabling the identification of cellular or molecular pathways or biological processes that play a role in development of the disease. It is hoped that by aggregating the evidence of association for all available SNPs in a group of related genes, these approaches will have enhanced power to detect genetic associations with complex traits. We performed GSA using data from a genome-wide study of 1165 alcohol dependent cases and 1379 controls from the Study of Addiction: Genetics and Environment (SAGE), for all 200 pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results demonstrated a potential role of the “Synthesis and Degradation of Ketone Bodies” pathway. Our results also support the potential involvement of the “Neuroactive Ligand Receptor Interaction” pathway, which has previously been implicated in addictive disorders. These findings demonstrate the utility of GSA in the study of complex disease, and suggest specific directions for further research into the genetic architecture of alcohol dependence. PMID:22717047
Andersen, Vibeke; Ernst, Anja; Sventoraityte, Jurgita; Kupcinskas, Limas; Jacobsen, Bent A; Krarup, Henrik B; Vogel, Ulla; Jonaitis, Laimas; Denapiene, Goda; Kiudelis, Gediminas; Balschun, Tobias; Franke, Andre
Differences in the genetic architecture of inflammatory bowel disease between different European countries and ethnicities have previously been reported. In the present study, we wanted to assess the role of 11 newly identified UC risk variants, derived from a recent European UC genome-wide association study (GWAS) (Franke et al., 2010), for 1) association with UC in the Nordic countries, 2) population heterogeneity between the Nordic countries and the rest of Europe, and 3) eventually, to drive some of the previous findings towards overall genome-wide significance. Eleven SNPs were replicated in a Danish sample consisting of 560 UC patients and 796 controls, and nine missing SNPs of the German GWAS study were successfully genotyped in the Baltic sample comprising 441 UC cases and 1156 controls. The independent replication data were then jointly analysed with the original data and systematic comparisons of the findings between ethnicities were made. Pearson's χ², Breslow-Day (BD) and Cochran-Mantel-Haenszel (CMH) tests were used for association analyses and heterogeneity testing. The rs5771069 (IL17REL) SNP was not associated with UC in the Danish panel. The rs5771069 (IL17REL) SNP was significantly associated with UC in the combined Baltic, Danish and Norwegian UC study sample, driven by the Norwegian panel (OR = 0.89, 95% CI: 0.79-0.98, P = 0.02). No association was found between rs7809799 (SMURF1/KPNA7) and UC (OR = 1.20, 95% CI: 0.95-1.52, P = 0.10) or between UC and all other remaining SNPs. We had a 94% chance of detecting an association for rs7809799 (SMURF1/KPNA7) in the combined replication sample, whereas the power was 55% or lower for the remaining SNPs. Statistically significant PBD was found for OR heterogeneity between the combined Baltic, Danish, and Norwegian panel versus the combined German, British, Belgian, and Greek panel (rs7520292 (P = 0.001), rs12518307 (P = 0.007), and rs2395609 (TCP11) (P = 0.01), respectively). No SNP reached genome
Joaquim Manoel da Silva
High-density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest cost and the most consistent solution for generating SNP data. However, regardless of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some discarded data consist of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues but can also be explained by the presence of genomic variations that are in the vicinity of the assayed SNP and that prevent genotyping probes from annealing. The latter case may contain relevant information because these missing genotypes might be used to identify population-specific genomic variants. In order to assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus) samples. We found 3,200 missing genotypes among the whole population. NGS re-sequencing data from 8 sires were used to verify the presence of genomic variations within their flanking regions in 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production.
Blanca E Himes
Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify asthma-related genes by integrating AHR associations in mouse with human genome-wide association study (GWAS) data. We used Efficient Mixed Model Association (EMMA) analysis to conduct a GWAS of baseline AHR measures from males and females of 31 mouse strains. Genes near or containing SNPs with EMMA p-values <0.001 were selected for further study in human GWAS. The results of the previously reported EVE consortium asthma GWAS meta-analysis consisting of 12,958 diverse North American subjects from 9 study centers were used to select a subset of homologous genes with evidence of association with asthma in humans. Following validation attempts in three human asthma GWAS (i.e., Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG) and two human AHR GWAS (i.e., SHARP, DAG), the Kv channel interacting protein 4 (KCNIP4) gene was identified as nominally associated with both asthma and AHR at a gene- and SNP-level. In EVE, the smallest KCNIP4 association was at rs6833065 (P-value 2.9e-04), while the strongest associations for Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG were 1.5e-03, 1.0e-03, 3.1e-03 at rs7664617, rs4697177, rs4696975, respectively. At a SNP level, the strongest association across all asthma GWAS was at rs4697177 (P-value 1.1e-04). The smallest P-values for association with AHR were 2.3e-03 at rs11947661 in SHARP and 2.1e-03 at rs402802 in DAG. Functional studies are required to validate the potential involvement of KCNIP4 in modulating asthma susceptibility and/or AHR. Our results suggest that a useful approach to identify genes associated with human asthma is to leverage mouse AHR association data.
Nelson, George W.; Lautenberger, James A.; Chinn, Leslie; McIntosh, Carl; Johnson, Randall C.; Sezgin, Efe; Kessing, Bailey; Malasky, Michael; Hendrickson, Sher L.; Pontius, Joan; Tang, Minzhong; An, Ping; Winkler, Cheryl A.; Limou, Sophie; Le Clerc, Sigrid; Delaneau, Olivier; Zagury, Jean-François; Schuitemaker, Hanneke; van Manen, Daniëlle; Bream, Jay H.; Gomperts, Edward D.; Buchbinder, Susan; Goedert, James J.; Kirk, Gregory D.; O'Brien, Stephen J.
Background. Host genetic variation influences human immunodeficiency virus (HIV) infection and progression to AIDS. Here we used clinically well-characterized subjects from 5 pretreatment HIV/AIDS cohorts for a genome-wide association study to identify gene associations with rate of AIDS progression. Methods. European American HIV seroconverters (n = 755) were interrogated for single-nucleotide polymorphisms (SNPs) (n = 700,022) associated with progression to AIDS 1987 (Cox proportional hazards regression analysis, co-dominant model). Results. Association with slower progression was observed for SNPs in the gene PARD3B. One of these, rs11884476, reached genome-wide significance (relative hazard = 0.3; P = 3.370 × 10^-9) after statistical correction for 700,022 SNPs and contributes 4.52% of the overall variance in AIDS progression in this study. Nine of the top-ranked SNPs define a PARD3B haplotype that also displays significant association with progression to AIDS (hazard ratio, 0.3; P = 3.220 × 10^-8). One of these SNPs, rs10185378, is a predicted exonic splicing enhancer; significant alteration in the expression profile of PARD3B splicing transcripts was observed in B cell lines with alternate rs10185378 genotypes. This SNP was typed in European cohorts of rapid progressors and was found to be protective for AIDS 1993 definition (odds ratio, 0.43, P = .025). Conclusions. These observations suggest a potential unsuspected pathway of host genetic influence on the dynamics of AIDS progression. PMID:21502085
A recent chordoma cancer genotyping study revealed that rs2305089, a single nucleotide polymorphism (SNP) located in the brachyury gene, a key gene in the development of the notochord, is significantly associated with chordoma risk. The brachyury gene is believed to be one of the key genes involved in the pathogenesis of chordoma, a rare primary bone tumor originating along the spinal column or at the base of the skull. The association between the brachyury Gly177Asp SNP and the risk of skull-base chordoma in Chinese populations is currently unknown. We investigated the genotype distribution of this SNP in 65 skull-base chordoma cases and 120 healthy subjects. Comparisons of the genotype distributions and allele frequencies did not reveal any significant difference between the groups. Our data suggest that the brachyury Gly177Asp SNP is not involved in the risk of skull-base chordoma, at least in the Chinese population.
Meredith Brian K
Background Contemporary dairy breeding goals have broadened to include, along with milk production traits, a number of non-production-related traits in an effort to improve the overall functionality of the dairy cow. Increased indirect selection for resistance to mastitis, one of the most important production-related diseases in the dairy sector, via selection for reduced somatic cell count has been part of these broadened goals. A number of genome-wide association studies have identified genetic variants associated with milk production traits and mastitis resistance; however, the majority of these studies have been based on animals which were predominantly kept in confinement and fed a concentrate-based diet (i.e. high-input production systems). This genome-wide association study aims to detect associations using genotypic and phenotypic data from Irish Holstein-Friesian cattle fed predominantly grazed grass in a pasture-based production system (low-input). Results Significant associations were detected for milk yield, fat yield, protein yield, fat percentage, protein percentage and somatic cell score using separate single-locus, frequentist and multi-locus, Bayesian approaches. These associations were detected using two separate populations of Holstein-Friesian sires and cows. In total, 1,529 and 37 associations were detected in the sires using a single SNP regression and a Bayesian method, respectively. There were 103 associations in common between the sires and cows across all the traits. As well as detecting associations within known QTL regions, a number of novel associations were detected; the most notable of these was a region of chromosome 13 associated with milk yield in the population of Holstein-Friesian sires. Conclusions A total of 276 novel SNPs were detected in the sires using a single SNP regression approach. Although obvious candidate genes may not be initially forthcoming, this study provides a preliminary framework upon which to identify the
Delgado, Dayana A; Zhang, Chenan; Chen, Lin S; Gao, Jianjun; Roy, Shantanu; Shinkle, Justin; Sabarinathan, Mekala; Argos, Maria; Tong, Lin; Ahmed, Alauddin; Islam, Tariqul; Rakibuz-Zaman, Muhammad; Sarwar, Golam; Shahriar, Hasan; Rahman, Mahfuzar; Yunus, Mohammad; Jasmine, Farzana; Kibriya, Muhammad G; Ahsan, Habibul; Pierce, Brandon L
Leucocyte telomere length (TL) is a potential biomarker of ageing and risk for age-related disease. Leucocyte TL is heritable and shows substantial differences by race/ethnicity. Recent genome-wide association studies (GWAS) report ~10 loci harbouring SNPs associated with leucocyte TL, but these studies focus primarily on populations of European ancestry. This study aims to enhance our understanding of genetic determinants of TL across populations. We performed a GWAS of TL using data on 5075 Bangladeshi adults. We measured TL using one of two technologies (qPCR or a Luminex-based method) and used standardised variables as TL phenotypes. Our results replicate previously reported associations in the TERC and TERT regions (P = 2.2 × 10^-8 and P = 6.4 × 10^-6, respectively). We observed a novel association signal in the RTEL1 gene (intronic SNP rs2297439; P = 2.82 × 10^-7) that is independent of previously reported TL-associated SNPs in this region. The minor allele for rs2297439 is common in South Asian populations (≥0.25) but at lower frequencies in other populations (eg, 0.07 in Northern Europeans). Among the eight other previously reported association signals, all were directionally consistent with our study, but only rs8105767 (ZNF208) was nominally significant (P = 0.003). SNP-based heritability estimates were as high as 44% when analysing close relatives but much lower when analysing distant relatives only. In this first GWAS of TL in a South Asian population, we replicate some, but not all, of the loci reported in prior GWAS of individuals of European ancestry, and we identify a novel second association signal at the RTEL1 locus.
Dimitrijevic, Aleksandra; Horn, Renate
In sunflower, molecular markers for simple traits such as fertility restoration, high oleic acid content, herbicide tolerance or resistance to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum, have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower that are being collected and conserved worldwide represent valuable resources for studying complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence have made novel approaches at the whole-genome level possible. Genotyping-by-sequencing and whole genome sequencing based on next generation sequencing technologies facilitated the production of large numbers of SNP markers for high-density maps as well as SNP arrays, and allowed genome-wide association studies and genomic selection in sunflower. Genome-wide or candidate gene based association studies have been performed for traits like branching, flowering time, and resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive with other oil crops, higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. Integrative approaches
Hibar, Derrek P; Stein, Jason L; Ryles, April B; Kohannim, Omid; Jahanshad, Neda; Medland, Sarah E; Hansell, Narelle K; McMahon, Katie L; de Zubicaray, Greig I; Montgomery, Grant W; Martin, Nicholas G; Wright, Margaret J; Saykin, Andrew J; Jack, Clifford R; Weiner, Michael W; Toga, Arthur W; Thompson, Paul M
Deficits in lentiform nucleus volume and morphometry are implicated in a number of genetically influenced disorders, including Parkinson's disease, schizophrenia, and ADHD. Here we performed genome-wide searches to discover common genetic variants associated with differences in lentiform nucleus volume in human populations. We assessed structural MRI scans of the brain in two large genotyped samples: the Alzheimer's Disease Neuroimaging Initiative (ADNI; N = 706) and the Queensland Twin Imaging Study (QTIM; N = 639). Statistics of association from each cohort were combined meta-analytically using a fixed-effects model to boost power and to reduce the prevalence of false positive findings. We identified a number of associations in and around the flavin-containing monooxygenase (FMO) gene cluster. The most highly associated SNP, rs1795240, was located in the FMO3 gene; after meta-analysis, it showed genome-wide significant evidence of association with lentiform nucleus volume (P_MA = 4.79 × 10⁻⁸). This commonly-carried genetic variant accounted for 2.68% and 0.84% of the trait variability in the ADNI and QTIM samples, respectively, even though the QTIM sample was on average 50 years younger. Pathway enrichment analysis revealed significant contributions of this gene to the cytochrome P450 pathway, which is involved in metabolizing numerous therapeutic drugs for pain, seizures, mania, depression, anxiety, and psychosis. The genetic variants we identified provide replicated, genome-wide significant evidence for the FMO gene cluster's involvement in lentiform nucleus volume differences in human populations.
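As an illustration of how per-cohort association statistics can be combined under a fixed-effects model, the sketch below uses inverse-variance weighting, which is the common textbook form of such a model. The abstract does not state the exact weighting scheme used in the study, so this is an assumption, and the function name and the numbers in the usage note are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Combine per-cohort effect estimates with inverse-variance weights.

    betas: per-cohort effect estimates for one SNP (hypothetical inputs)
    ses:   matching standard errors
    Returns the pooled estimate, its standard error, the z score and a
    two-sided p-value from the normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in ses]          # precision of each cohort
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                      # pooled standard error
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value
    return beta, se, z, p
```

For example, `fixed_effects_meta([0.10, 0.20], [0.05, 0.05])` pools two equally precise (made-up) cohort estimates into their simple average, with a smaller standard error than either cohort alone. That reduction in standard error is the sense in which meta-analysis boosts power.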
Marjolein van Gent
To monitor changes in Bordetella pertussis populations, mainly two typing methods are used; Pulsed-Field Gel Electrophoresis (PFGE) and Multiple-Locus Variable-Number Tandem Repeat Analysis (MLVA). In this study, a single nucleotide polymorphism (SNP) typing method, based on 87 SNPs, was developed and compared with PFGE and MLVA. The discriminatory indices of SNP typing, PFGE and MLVA were found to be 0.85, 0.95 and 0.83, respectively. Phylogenetic analysis, using SNP typing as Gold Standard, revealed false homoplasies in the PFGE and MLVA trees. Further, in contrast to the SNP-based tree, the PFGE- and MLVA-based trees did not reveal a positive correlation between root-to-tip distance and the isolation year of strains. Thus PFGE and MLVA do not allow an estimation of the relative age of the selected strains. In conclusion, SNP typing was found to be phylogenetically more informative than PFGE and more discriminative than MLVA. Further, in contrast to PFGE, it is readily standardized allowing interlaboratory comparisons. We applied SNP typing to study strains with a novel allele for the pertussis toxin promoter, ptxP3, which have a worldwide distribution and which have replaced the resident ptxP1 strains in the last 20 years. Previously, we showed that ptxP3 strains showed increased pertussis toxin expression and that their emergence was associated with increased notification in The Netherlands. SNP typing showed that the ptxP3 strains isolated in the Americas, Asia, Australia and Europe formed a monophyletic branch which recently diverged from ptxP1 strains. Two predominant ptxP3 SNP types were identified which spread worldwide. The widespread use of SNP typing will enhance our understanding of the evolution and global epidemiology of B. pertussis.
van der Heide, Han G. J.; Heuvelman, Kees J.; Kallonen, Teemu; He, Qiushui; Mertsola, Jussi; Advani, Abdolreza; Hallander, Hans O.; Janssens, Koen; Hermans, Peter W.; Mooi, Frits R.
To monitor changes in Bordetella pertussis populations, mainly two typing methods are used; Pulsed-Field Gel Electrophoresis (PFGE) and Multiple-Locus Variable-Number Tandem Repeat Analysis (MLVA). In this study, a single nucleotide polymorphism (SNP) typing method, based on 87 SNPs, was developed and compared with PFGE and MLVA. The discriminatory indices of SNP typing, PFGE and MLVA were found to be 0.85, 0.95 and 0.83, respectively. Phylogenetic analysis, using SNP typing as Gold Standard, revealed false homoplasies in the PFGE and MLVA trees. Further, in contrast to the SNP-based tree, the PFGE- and MLVA-based trees did not reveal a positive correlation between root-to-tip distance and the isolation year of strains. Thus PFGE and MLVA do not allow an estimation of the relative age of the selected strains. In conclusion, SNP typing was found to be phylogenetically more informative than PFGE and more discriminative than MLVA. Further, in contrast to PFGE, it is readily standardized allowing interlaboratory comparisons. We applied SNP typing to study strains with a novel allele for the pertussis toxin promoter, ptxP3, which have a worldwide distribution and which have replaced the resident ptxP1 strains in the last 20 years. Previously, we showed that ptxP3 strains showed increased pertussis toxin expression and that their emergence was associated with increased notification in the Netherlands. SNP typing showed that the ptxP3 strains isolated in the Americas, Asia, Australia and Europe formed a monophyletic branch which recently diverged from ptxP1 strains. Two predominant ptxP3 SNP types were identified which spread worldwide. The widespread use of SNP typing will enhance our understanding of the evolution and global epidemiology of B. pertussis. PMID:21647370
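The discriminatory indices quoted above can be illustrated with a short sketch. It assumes the index is Simpson's index of diversity in the Hunter-Gaston form, which is the usual choice for comparing typing methods; the abstract does not name the formula, and the function name and example labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Hunter-Gaston form of Simpson's index of diversity.

    type_labels: one typing result per isolate (e.g. a SNP type or MLVA type).
    Returns the probability that two isolates drawn at random without
    replacement are assigned to different types.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are needed")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))
```

With four isolates typed as `["A", "A", "B", "C"]`, one of the six possible pairs shares a type, so the index is 1 - 2/12. A value near 1 means the method separates almost every pair of isolates, and 0 means all isolates received the same type.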
Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C
The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures, and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicates the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient. Application of this substructure information was critical in examining a real dataset in whole genome association (WGA) analyses for rheumatoid arthritis in European Americans to reduce false positive signals. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used for improving error rates in association testing of candidate genes and in replication studies of WGA scans.
Ba, Hengxing; Jia, Boyin; Wang, Guiwu; Yang, Yifeng; Kedem, Gilead; Li, Chunyi
Sika deer are an economically valuable species owing to their use in traditional Chinese medicine, particularly their velvet antlers. Sika deer in northeast China are mostly farmed in enclosure. Therefore, genetic management of farmed sika deer would benefit from detailed knowledge of their genetic diversity. In this study, we generated over 1.45 billion high-quality paired-end reads (288 Gbp) across 42 unrelated individuals using double-digest restriction site-associated DNA sequencing (ddRAD-seq). A total of 96,188 (29.63%) putative biallelic SNP loci were identified with an average sequencing depth of 23×. Based on the analysis, we found that the majority of the loci had a deficit of heterozygotes (F_IS > 0) and low values of H_obs, which could be due to inbreeding and Wahlund effects. We also developed a collection of high-quality SNP probes that will likely be useful in a variety of applications in genotyping for cervid species in the future. Copyright © 2017 Ba et al.
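The heterozygote deficit reported above can be made concrete with a per-locus sketch. It assumes the standard definition F_IS = 1 - H_obs / H_exp for a biallelic locus, with H_exp = 2pq computed from the sample allele frequencies and no small-sample correction; the abstract does not say which estimator was used, and the function name and genotype counts are hypothetical.

```python
def f_is(n_aa, n_ab, n_bb):
    """Inbreeding coefficient F_IS for one biallelic locus.

    n_aa, n_ab, n_bb: counts of the two homozygous genotypes and the
    heterozygous genotype (n_ab) among the sampled individuals.
    Returns (h_obs, h_exp, f_is).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    h_obs = n_ab / n                     # observed heterozygosity
    h_exp = 2 * p * (1 - p)              # Hardy-Weinberg expectation
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    return h_obs, h_exp, 1 - h_obs / h_exp
```

Genotype counts of (25, 50, 25) match the Hardy-Weinberg expectation and give F_IS = 0, whereas (40, 20, 40) has the same allele frequencies but far fewer heterozygotes and gives a positive F_IS, which is the pattern the abstract describes.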
The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
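The AUC used above as the performance measure can be computed directly from predicted risks and case/control labels. The sketch below uses the rank-based (Mann-Whitney) definition, in which the AUC is the fraction of case-control pairs where the case receives the higher score, with ties counted as one half. The function name and the example values are hypothetical and unrelated to the study's data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cases, 0 for controls
    scores: predicted risk for each individual, in the same order
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # ties count as half
    return wins / (len(cases) * len(controls))
```

For labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four case-control pairs are ordered correctly. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so a sort-based implementation would be preferable for large validation sets.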
Armour, J AL; Davison, A; McManus, I C
Handedness is a human behavioural phenotype that appears to be congenital, and is often assumed to be inherited, but for which the developmental origin and underlying causation(s) have been elusive. Models of the genetic basis of variation in handedness have been proposed that fit different features of the observed resemblance between relatives, but none has been decisively tested or a corresponding causative locus identified. In this study, we applied data from well-characterised individuals studied at the London Twin Research Unit. Analysis of genome-wide SNP data from 3940 twins failed to identify any locus associated with handedness at a genome-wide level of significance. The most straightforward interpretation of our analyses is that they exclude the simplest formulations of the ‘right-shift’ model of Annett and the ‘dextral/chance’ model of McManus, although more complex modifications of those models are still compatible with our observations. For polygenic effects, our study is inadequately powered to reliably detect alleles with effect sizes corresponding to an odds ratio of 1.2, but should have good power to detect effects at an odds ratio of 2 or more. PMID:24065183
Management of insects that cause economic damage to yields of soybean mainly relies on insecticide applications. Sources of resistance in soybean plant introductions (PIs) to different insect pests have been reported, and some of these sources, like those for the soybean aphid (SBA), have been used to develop resistant soybean cultivars. With the availability of SoySNP50K and the statistical power of genome-wide association studies, we integrated phenotypic data for beet armyworm, Mexican bean beetle (MBB), potato leafhopper (PLH), SBA, soybean looper (SBL), velvetbean caterpillar (VBC), and chewing damage caused by unspecified insects for a comprehensive understanding of insect resistance in the United States Department of Agriculture Soybean Germplasm Collection. We identified significant single nucleotide polymorphism (SNP) markers for MBB, PLH, SBL, and VBC, and we highlighted several leucine-rich repeat-containing genes and myeloblastosis transcription factors within the high linkage disequilibrium region surrounding significant SNP markers. Specifically for soybean resistance to PLH, we found the PLH locus is close to but distinct from a locus for soybean pubescence density on chromosome 12. The results provide genetic support that pubescence density may not directly link to PLH resistance. This study offers novel insight into soybean resistance to four insect pests and reviews resistance mapping studies for major soybean insects.
Long, Yi; Su, Ying; Ai, Huashui; Zhang, Zhiyan; Yang, Bin; Ruan, Guorong; Xiao, Shijun; Liao, Xinjun; Ren, Jun; Huang, Lusheng; Ding, Nengshui
Umbilical hernia (UH) is one of the most common congenital defects in pigs, leading to considerable economic loss and serious animal welfare problems. To test whether copy number variations (CNVs) contribute to pig UH, we performed a case-control genome-wide CNV association study on 905 pigs from the Duroc, Landrace and Yorkshire breeds using the Porcine SNP60 BeadChip and the PennCNV algorithm. We first constructed a genomic map comprising 6193 CNVs that pertain to 737 CNV regions. Then, we identified eight CNVs significantly associated with the risk for UH in the three pig breeds. Six of seven significantly associated CNVs were validated using quantitative real-time PCR. Notably, a rare CNV (CNV14:13030843-13059455) encompassing the NUGGC gene was strongly associated with UH (permutation-corrected P = 0.0015) in Duroc pigs. This CNV occurred exclusively in seven Duroc UH-affected individuals. SNPs surrounding the CNV did not show association signals, indicating that rare CNVs may play an important role in complex pig diseases such as UH. The NUGGC gene has been implicated in human omphalocele and inguinal hernia. Our findings support the view that CNVs, including the NUGGC CNV, contribute to the pathogenesis of pig UH. © 2016 Stichting International Foundation for Animal Genetics.