Basinger, Scott A.; Bikkannavar, Siddarayappa; Cohen, David; Green, Joseph J.; Lou, John; Ohara, Catherine; Redding, David; Shi, Fang
Adaptive MGS Phase Retrieval software uses the Modified Gerchberg-Saxton (MGS) algorithm, an image-based sensing method that can turn any focal plane science instrument into a wavefront sensor, avoiding the need to use external metrology equipment. Knowledge of the wavefront enables intelligent control of active optical systems.
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This implementation significantly accelerates MGS phase retrieval by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm for parallelizing applications that apply a single instruction to multiple data (SIMD). Stream processors are designed specifically to support large-scale parallel computing on a single graphics chip, and computationally intensive algorithms such as the Fast Fourier Transform (FFT) are particularly well suited to this environment. This high-speed version of MGS uses commercially available hardware to accomplish the same objective in a fraction of the original time by performing the matrix calculations on NVIDIA graphics cards. The graphics processing unit (GPU) is hardware specialized for computationally intensive, highly parallel computation. On the software side, a parallel programming model called CUDA is used to scale transparently across the multicore parallelism of the hardware. This technology gives computationally intensive applications access to the processing power of NVIDIA GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these technologies to accelerate optical phase error characterization. With a single PC that contains four NVIDIA GTX-280 graphics cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
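The iteration at the heart of this family of methods can be sketched in a few lines. The example below is a minimal sketch of the classic Gerchberg-Saxton loop only, not the modified or accelerated variant described in the abstract: it works in one dimension, uses a naive DFT in place of a GPU FFT, and all function names and the synthetic aperture and phase are assumptions made for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive unitary 1-D DFT, O(n^2); a real implementation would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def gerchberg_saxton(pupil_amp, focal_amp, iterations=30):
    """Classic Gerchberg-Saxton loop.

    pupil_amp: known amplitude in the pupil plane.
    focal_amp: measured amplitude (square root of intensity) in the focal plane.
    Returns the estimated pupil phase and the focal-plane error per iteration.
    """
    field = [complex(a, 0.0) for a in pupil_amp]  # start from a flat phase
    errors = []
    for _ in range(iterations):
        focal = dft(field)
        # Mismatch between modelled and measured focal-plane amplitudes.
        errors.append(math.sqrt(sum((abs(f) - m) ** 2 for f, m in zip(focal, focal_amp))))
        # Keep the modelled focal-plane phase, impose the measured amplitude.
        focal = [m * cmath.exp(1j * cmath.phase(f)) for f, m in zip(focal, focal_amp)]
        back = dft(focal, inverse=True)
        # Keep the back-propagated pupil phase, impose the known pupil amplitude.
        field = [a * cmath.exp(1j * cmath.phase(b)) for a, b in zip(pupil_amp, back)]
    return [cmath.phase(f) for f in field], errors


# Synthetic data (hypothetical): a 16-sample pupil with an 8-sample aperture.
n = 16
pupil_amp = [1.0 if 4 <= i < 12 else 0.0 for i in range(n)]
true_phase = [0.5 * math.sin(2 * math.pi * i / n) + 0.3 * math.cos(4 * math.pi * i / n) for i in range(n)]
true_field = [a * cmath.exp(1j * p) for a, p in zip(pupil_amp, true_phase)]
focal_amp = [abs(f) for f in dft(true_field)]

phase_estimate, errors = gerchberg_saxton(pupil_amp, focal_amp)
```

The error history never increases from one iteration to the next, which is the error-reduction property of this scheme, but the loop can stagnate before reaching the true phase. Almost all of the work is in the two transforms per iteration, which is why moving the FFT to a GPU pays off.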
Hayes, Ben; Goddard, Mike
Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs that use only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of an alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted as the sum of the effects of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
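To make the genomic relationship matrix mentioned above concrete, here is a minimal sketch using one common formulation, G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centred by twice the allele frequency. The abstract does not say which construction the authors use, so the formula choice, the function name and the toy genotypes are assumptions.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from 0/1/2 allele counts.

    genotypes: one row per animal, one column per SNP.
    Allele frequencies are estimated from the sample itself (an assumption;
    base-population frequencies are often preferred when available).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; G is undefined")
    # Centre each allele count by its expectation, 2p.
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
        for zi in centred
    ]


# Hypothetical genotypes: 3 animals, 4 SNPs.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 1, 1, 0],
]
G = genomic_relationship_matrix(toy_genotypes)
```

In a genomic BLUP analysis, a matrix of this kind takes the place of the pedigree-based relationship matrix in the mixed-model equations.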
Esfandyari, Hadi; Sørensen, Anders Christian; Bijma, Piter
Background: In livestock production, many animals are crossbred, with two distinct advantages: heterosis and breed complementarity. Genomic selection (GS) can be used to select purebred parental lines for crossbred performance (CP). Dominance being the likely genetic basis of heterosis, explicitly...
Esfandyari, Hadi; Sørensen, Anders Christian; Bijma, Piter
Genomic selection (GS) can be used to select purebreds for crossbred performance (CP). As dominance is the likely genetic basis of heterosis, explicitly including dominance in the GS model may be beneficial for selection of purebreds for CP, when estimating allelic effects from pure line data...
Daetwyler, H.D.; Villanueva, B.; Bijma, P.; Woolliams, J.A.
Traditional selection methods, such as sib and best linear unbiased prediction (BLUP) selection, which increased genetic gain by increasing accuracy of evaluation, have also led to an increased rate of inbreeding per generation (ΔFG). This is not necessarily the case with genome-wide selection, which
Esfandyari, H.; Sorensen, A.C.; Bijma, P.
Background: In livestock production, many animals are crossbred, with two distinct advantages: heterosis and breed complementarity. Genomic selection (GS) can be used to select purebred parental lines for crossbred performance (CP). Dominance being the likely genetic basis of heterosis, explicitly i
McCauley, Stephen; de Groot, Saskia; Mailund, Thomas
Motivation: Viral genomes tend to code in overlapping reading frames to maximize information content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of the Ka/Ks ratio, and thus calls for an alternative approach when investigating selection strengths. Building on the paper by McCauley & Hein (2006), we develop a method for annotating a viral genome coding in overlapping... may thus achieve an annotation of both coding regions and selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as four Hepatitis B sequences. We...
Thomasen, Jørn Rind
Genomic selection provides more accurate estimation of genetic merit for breeding candidates without own recordings and is now an integrated part of most dairy breeding schemes. However, the method has turned out to be less efficient in the numerically smaller breeds. This thesis focuses on optimization of genomic selection for a small dairy cattle breed such as Danish Jersey. Implementing genetically superior breeding schemes thus requires more accurate genomic predictions. Besides international collaboration, genotyping of cows is an efficient way to obtain more accurate genomic predictions...
Daniel J Gaffney
Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids.
Esfandyari, Hadi; Sørensen, Anders Christian; Bijma, Piter
Background: In livestock production, many animals are crossbred, with two distinct advantages: heterosis and breed complementarity. Genomic selection (GS) can be used to select purebred parental lines for crossbred performance (CP). Dominance being the likely genetic basis of heterosis, explicitly... to select purebred animals for CP, based on purebred phenotypic and genotypic information. A second objective was to compare the use of two separate pure line reference populations to that of a single reference population that combines both pure lines. These objectives were investigated under two conditions, i.e. either a low or a high correlation of linkage disequilibrium (LD) phase between the pure lines. Results: The results demonstrate that the gain in CP was higher when parental lines were selected for CP, rather than purebred performance, both with a low and a high correlation of LD phase...
Ofria, Charles; Adami, Christoph; Collier, Travis C.
We describe the evolution of macromolecules as an information transmission process and apply tools from Shannon information theory to it. This allows us to isolate three independent, competing selective pressures that we term compression, transmission, and neutrality selection. The first two affect genome length: the pressure to conserve resources by compressing the code, and the pressure to acquire additional information that improves the channel, increasing the rate of information transmission into each offspring. Noisy transmission channels (replication with mutations) give rise to a third pressure that acts on the actual encoding of information; it maximizes the fraction of mutations that are neutral with respect to the phenotype. This neutrality selection has important implications for the evolution of evolvability. We demonstrate each selective pressure in experiments with digital organisms.
Vallender, Eric J; Lahn, Bruce T
Positive selection has undoubtedly played a critical role in the evolution of Homo sapiens. Of the many phenotypic traits that define our species--notably the enormous brain, advanced cognitive abilities, complex vocal organs, bipedalism and opposable thumbs--most (if not all) are likely the product of strong positive selection. Many other aspects of human biology not necessarily related to the 'branding' of our species, such as host-pathogen interactions, reproduction, dietary adaptation and physical appearance, have also been the substrate of varying levels of positive selection. Comparative genetics/genomics studies in recent years have uncovered a growing list of genes that might have experienced positive selection during the evolution of humans and/or primates. These genes offer valuable inroads into understanding the biological processes specific to humans, and the evolutionary forces that gave rise to them. Here, we present a comprehensive review of these genes, and their implications for human evolution.
Over the past four decades, the predominant view of molecular evolution saw little connection between natural selection and genome evolution, assuming that the functionally constrained fraction of the genome is relatively small and that adaptation is sufficiently infrequent to play little role in shaping patterns of variation within and even between species. Recent evidence from Drosophila, reviewed here, suggests that this view may be invalid. Analyses of genetic variation within and between species reveal that much of the Drosophila genome is under purifying selection, and thus of functional importance, and that a large fraction of coding and noncoding differences between species are adaptive. The findings further indicate that, in Drosophila, adaptations may be both common and strong enough that the fate of neutral mutations depends on their chance linkage to adaptive mutations as much as on the vagaries of genetic drift. The emerging evidence has implications for a wide variety of fields, from conservation genetics to bioinformatics, and presents challenges to modelers and experimentalists alike.
Gois, I B; Borém, A; Cristofani-Yaly, M; de Resende, M D V; Azevedo, C F; Bastianel, M; Novelli, V M; Machado, M A
Genome wide selection (GWS) is essential for the genetic improvement of perennial species such as Citrus because of its ability to increase gain per unit time and to enable the efficient selection of characteristics with low heritability. This study assessed GWS efficiency in a population of Citrus and compared it with selection based on phenotypic data. A total of 180 individual trees from a cross between Pera sweet orange (Citrus sinensis Osbeck) and Murcott tangor (Citrus sinensis Osbeck x Citrus reticulata Blanco) were evaluated for 10 characteristics related to fruit quality. The hybrids were genotyped using 5287 DArT_seq(TM) (diversity arrays technology) molecular markers and their effects on phenotypes were predicted using the random regression - best linear unbiased predictor (rr-BLUP) method. The predictive ability, prediction bias, and accuracy of GWS were estimated to verify its effectiveness for phenotype prediction. The proportion of genetic variance explained by the markers was also computed. The heritability of the traits, as determined by markers, was 16-28%. The predictive ability of these markers ranged from 0.53 to 0.64, and the regression coefficients between predicted and observed phenotypes were close to unity. Over 35% of the genetic variance was accounted for by the markers. Accuracy estimates with GWS were lower than those obtained by phenotypic analysis; however, GWS was superior in terms of genetic gain per unit time. Thus, GWS may be useful for Citrus breeding as it can predict phenotypes early and accurately, and reduce the length of the selection cycle. This study demonstrates the feasibility of genomic selection in Citrus.
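Two of the validation statistics named in this abstract are simple to compute once predicted and observed phenotypes are in hand. The sketch below assumes the usual definitions: predictive ability as the Pearson correlation between predicted and observed phenotypes, and prediction bias as the slope of the regression of observed on predicted values. The abstract does not give the authors' exact formulas, and the function name and toy values are hypothetical.

```python
import math


def predictive_ability_and_bias(observed, predicted):
    """Return (Pearson correlation, slope of observed regressed on predicted)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sxy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    sxx = sum((p - mean_pred) ** 2 for p in predicted)
    syy = sum((o - mean_obs) ** 2 for o in observed)
    correlation = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # a slope near 1 suggests predictions are not inflated or deflated
    return correlation, slope


# Hypothetical validation-set phenotypes and their genomic predictions.
observed = [1.5, 1.5, 3.5, 3.5]
predicted = [1.0, 2.0, 3.0, 4.0]
ability, slope = predictive_ability_and_bias(observed, predicted)
```

In practice both statistics would be computed on trees left out of the model fit, for example within a cross-validation scheme, so that they reflect prediction rather than goodness of fit.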
Ariöz, Candan; Ye, Weihua; Bakali, Amin; Ge, Changrong; Liebau, Jobst; Götzke, Hansjörg; Barth, Andreas; Wieslander, Ake; Mäler, Lena
Certain membrane proteins involved in lipid synthesis can induce formation of new intracellular membranes in Escherichia coli, i.e., intracellular vesicles. Among those, the foreign monotopic glycosyltransferase MGS from Acholeplasma laidlawii triggers such massive lipid synthesis when overexpressed. To examine the mechanism behind the increased lipid synthesis, we investigated the lipid binding properties of MGS in vivo together with the correlation between lipid synthesis and MGS overexpression levels. A good correlation between produced lipid quantities and overexpressed MGS protein was observed when standard LB medium was supplemented with four different lipid precursors that have significant roles in the lipid biosynthesis pathway. Interestingly, this correlation was highest concerning anionic lipid production and at the same time dependent on the selective binding of anionic lipid molecules by MGS. A selective interaction with anionic lipids was also observed in vitro by ³¹P NMR binding studies using bicelles prepared with E. coli lipids. The results clearly demonstrate that the discriminative withdrawal of anionic lipids, especially phosphatidylglycerol, from the membrane through MGS binding triggers an in vivo signal for cells to create a "feed-forward" stimulation of lipid synthesis in E. coli. By this mechanism, cells can produce more membrane surface in order to accommodate excessively produced MGS molecules, which results in an interdependent cycle of lipid and MGS protein synthesis.
Iwata, Hiroyoshi; Hayashi, Takeshi; Terakami, Shingo; Takada, Norio; Sawamura, Yutaka; Yamamoto, Toshiya
Although the potential of marker-assisted selection (MAS) in fruit tree breeding has been reported, bi-parental QTL mapping before MAS has hindered the introduction of MAS to fruit tree breeding programs. Genome-wide association studies (GWAS) are an alternative to bi-parental QTL mapping in long-lived perennials. Selection based on genomic predictions of breeding values (genomic selection: GS) is another alternative for MAS. This study examined the potential of GWAS and GS in pear breeding w...
Pespeni, Melissa H.; Garfield, David A.; Manier, Mollie K; Palumbi, Stephen R.
Natural selection can act on all the expressed genes of an individual, leaving signatures of genetic differentiation or diversity at many loci across the genome. New power to assay these genome-wide effects of selection comes from associating multi-locus patterns of polymorphism with gene expression and function. Here, we performed one of the first genome-wide surveys in a marine species, comparing purple sea urchins, Strongylocentrotus purpuratus, from two distant locations along the species...
Anna M Johansson
To understand the genetic mechanisms leading to phenotypic differentiation, it is important to identify genomic regions under selection. We scanned the genome of two chicken lines from a single trait selection experiment, where 50 generations of selection have resulted in a 9-fold difference in body weight. Analyses of nearly 60,000 SNP markers showed that the effects of selection on the genome are dramatic. The lines were fixed for alternative alleles in more than 50 regions as a result of selection. Another 10 regions displayed strong evidence for ongoing differentiation during the last 10 generations. Many more regions across the genome showed large differences in allele frequency between the lines, indicating that the phenotypic evolution in the lines in 50 generations is the result of an exploitation of standing genetic variation at hundreds of loci across the genome.
Nishio, Motohide; Satoh, Masahiro
The present study investigated the parameter settings for obtaining a simulated genome at steady state of allele frequency (mutation-drift equilibrium) and linkage disequilibrium (LD), and evaluated the impact of whether or not the simulated genome reached steady state of allele frequency and LD on the accuracy of genomic estimated breeding values (GEBVs). After 500 to 50,000 historical generations, the base population and subsequent seven generations were generated as recent populations. The allele frequency distribution of the last generations of the historical population and LD in the base population were calculated when varying the values of five parameters: initial minor allele frequency, mutation rate, effective population size, number of markers and chromosome length. The accuracies of GEBVs in the last generation of the recent population were calculated by genomic best linear unbiased prediction. The number of historical generations required to reach mutation-drift equilibrium depended on the initial allele frequency and mutation rate. Regardless of the parameters, LD reached a steady state before allele frequency distribution reached mutation-drift equilibrium. The accuracies of GEBVs largely reflect the extent of linkage disequilibrium with the exception of varying chromosome length, although there were no associations between the accuracies of GEBVs and allele frequency distribution. © 2014 Japanese Society of Animal Science.
Anthony T. Slater
Potato (Solanum tuberosum L.) breeders consider a large number of traits during cultivar development and progress in conventional breeding can be slow. There is accumulating evidence that some of these traits, such as yield, are affected by a large number of genes with small individual effects. Recently, significant efforts have been applied to the development of genomic resources to improve potato breeding, culminating in a draft genome sequence and the identification of a large number of single nucleotide polymorphisms (SNPs). The availability of these genome-wide SNPs is a prerequisite for implementing genomic selection for improvement of polygenic traits such as yield. In this review, we investigate opportunities for the application of genomic selection to potato, including novel breeding program designs. We have considered a number of factors that will influence this process, including the autotetraploid and heterozygous genetic nature of potato, the rate of decay of linkage disequilibrium, the number of required markers, the design of a reference population, and trait heritability. Based on estimates of the effective population size derived from a potato breeding program, we have calculated the expected accuracy of genomic selection for four key traits of varying heritability and propose that it will be reasonably accurate. We compared the expected genetic gain from genomic selection with the expected gain from phenotypic and pedigree selection, and found that genetic gain can be substantially improved by using genomic selection.
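The kind of calculation described here can be illustrated with a widely cited deterministic approximation for the expected accuracy of genomic selection, r = sqrt(N h² / (N h² + Me)), where N is the reference population size, h² the trait heritability and Me the number of independent chromosome segments. Whether the authors used exactly this expression is an assumption, and the reference population size, Me and heritabilities below are hypothetical placeholders rather than values from the review.

```python
import math


def expected_accuracy(n_reference, heritability, n_segments):
    """Deterministic approximation r = sqrt(N*h2 / (N*h2 + Me)).

    n_reference: number of phenotyped and genotyped individuals (N).
    heritability: trait heritability (h2), between 0 and 1.
    n_segments: number of independent chromosome segments (Me), which
        depends on effective population size and genome length.
    """
    signal = n_reference * heritability
    return math.sqrt(signal / (signal + n_segments))


# Hypothetical inputs: four traits of varying heritability.
trait_h2 = {"trait_a": 0.2, "trait_b": 0.4, "trait_c": 0.6, "trait_d": 0.8}
accuracies = {
    name: expected_accuracy(n_reference=1000, heritability=h2, n_segments=500)
    for name, h2 in trait_h2.items()
}
```

Under this approximation, accuracy rises with heritability and reference population size and falls as Me grows, so a small effective population size (and hence small Me) makes a given reference population more informative.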
Brain, D.; Luhmann, J.; Halekas, J.; Frahm, R.; Winningham, D.; Barabash, S.
Since late 2003, Mars Express (MEX) and Mars Global Surveyor (MGS) have been making complementary in situ measurements (in terms of both instrument and orbit) of the Martian plasma environment. Study of MGS and MEX data in tandem provides an opportunity to mitigate the shortcomings of each dataset and increase our overall understanding of the Martian solar wind interaction and atmospheric escape. Close passes of spacecraft (conjunctions) are one particularly powerful means of increasing the utility of measurements, as evidenced by the Cluster mission at Earth. At Mars, conjunctions might be used to obtain more complete simultaneous and/or co-located plasma measurements, which can be used to study a variety of phenomena, including measurements of auroral-like particle acceleration near crustal fields and the three-dimensional motion and shape of plasma boundaries. We will present an analysis of approximately forty conjunctions (instances with instantaneous spacecraft separation smaller than 400 km) of MEX and MGS identified between January 2004 and February 2006. The closest pass was ~40 km, near the South Pole. Conjunctions occur both at mid-latitudes (when the surface-projected orbit tracks of the two spacecraft nearly overlap), and at the poles. We will present comparisons of MEX Analyzer of Space Plasmas and Energetic Atoms (ASPERA-3) data with MGS Magnetometer and Electron Reflectometer (MAG/ER) data for these events. Our case studies include intercomparison of MEX and MGS electron data, the addition of MGS magnetic field and MEX ion data, and the inclusion of solar wind proxy information to establish context. In addition to these close conjunctions, we will present the preliminary results of a search for times when MEX and MGS pass through the same region of space separated by a delay (for time evolution of plasma populations in certain regions), and times when they occupy the same flux tube (for spatial evolution of particle distributions). Continued study of
Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua
Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, selection signatures, whether from artificial or natural selection, have been identified at the whole genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10 to 100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia.
Calus, M.P.L.; Meuwissen, T.H.E.; Roos, de S.; Veerkamp, R.F.
Genomic selection uses total breeding values for juvenile animals, predicted from a large number of estimated marker haplotype effects across the whole genome. In this study the accuracy of predicting breeding values is compared for four different models including a large number of markers, at diffe
Bouquet, A; Juga, J
Extensive genetic progress has been achieved in dairy cattle populations on many traits of economic importance because of efficient breeding programmes. Success of these programmes has relied on progeny testing of the best young males to accurately assess their genetic merit and hence their potential for breeding. Over the last few years, the integration of dense genomic information into statistical tools used to make selection decisions, commonly referred to as genomic selection, has enabled gains in predicting accuracy of breeding values for young animals without own performance. The possibility to select animals at an early stage allows defining new breeding strategies aimed at boosting genetic progress while reducing costs. The first objective of this article was to review methods used to model and optimize breeding schemes integrating genomic selection and to discuss their relative advantages and limitations. The second objective was to summarize the main results and perspectives on the use of genomic selection in practical breeding schemes, on the basis of the example of dairy cattle populations. Two main designs of breeding programmes integrating genomic selection were studied in dairy cattle. Genomic selection can be used either for pre-selecting males to be progeny tested or for selecting males to be used as active sires in the population. The first option produces moderate genetic gains without changing the structure of breeding programmes. The second option leads to large genetic gains, up to double those of conventional schemes because of a major reduction in the mean generation interval, but it requires greater changes in breeding programme structure. The literature suggests that genomic selection becomes more attractive when it is coupled with embryo transfer technologies to further increase selection intensity on the dam-to-sire pathway. The use of genomic information also offers new opportunities to improve preservation of genetic variation. However
Nielsen, Rasmus; Hellmann, Ines; Hubisz, Melissa
The recent availability of genome-scale genotyping data has led to the identification of regions of the human genome that seem to have been targeted by selection. These findings have increased our understanding of the evolutionary forces that affect the human genome, have augmented our knowledge of gene function and promise to increase our understanding of the genetic basis of disease. However, inferences of selection are challenged by several confounding factors, especially the complex demographic history of human populations, and concordance between studies is variable. Although such studies...
Serra, François; Arbiza, Leonardo; Dopazo, Joaquín; Dopazo, Hernán
Classically, the functional consequences of natural selection over genomes have been analyzed as the compound effects of individual genes. The current paradigm for large-scale analysis of adaptation is based on the observed significant deviations of rates of individual genes from neutral evolutionary expectation. This approach, which assumed independence among genes, has not been able to identify biological functions significantly enriched in positively selected genes in individual species. Alternatively, pooling related species has enhanced the search for signatures of selection. However, grouping signatures does not allow testing for adaptive differences between species. Here we introduce the Gene-Set Selection Analysis (GSSA), a new genome-wide approach to test for evidence of natural selection on functional modules. GSSA is able to detect lineage-specific evolutionary rate changes in a notable number of functional modules. For example, in nine mammal and Drosophila genomes GSSA identifies hundreds of functional modules with significant associations to high and low rates of evolution. Many of the detected functional modules with high evolutionary rates have been previously identified as biological functions under positive selection. Notably, GSSA identifies conserved functional modules with many positively selected genes, which questions whether they are exclusively selected for fitting genomes to environmental changes. Our results agree with previous studies suggesting that adaptation requires positive selection, but not every mutation under positive selection contributes to the adaptive dynamical process of the evolution of species.
Background: Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. Results: This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. Conclusion: PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on
Gezan, Salvador A; Osorio, Luis F; Verma, Sujeet; Whitaker, Vance M
The primary goal of genomic selection is to increase genetic gains for complex traits by predicting performance of individuals for which phenotypic data are not available. The objective of this study was to experimentally evaluate the potential of genomic selection in strawberry breeding and to define a strategy for its implementation. Four clonally replicated field trials, two in each of 2 years comprised of a total of 1628 individuals, were established in 2013–2014 and 2014–2015. Five complex yield and fruit quality traits with moderate to low heritability were assessed in each trial. High-density genotyping was performed with the Affymetrix Axiom IStraw90 single-nucleotide polymorphism array, and 17 479 polymorphic markers were chosen for analysis. Several methods were compared, including Genomic BLUP, Bayes B, Bayes C, Bayesian LASSO Regression, Bayesian Ridge Regression and Reproducing Kernel Hilbert Spaces. Cross-validation within training populations resulted in higher values than for true validations across trials. For true validations, Bayes B gave the highest predictive abilities on average and also the highest selection efficiencies, particularly for yield traits that were the lowest heritability traits. Selection efficiencies using Bayes B for parent selection ranged from 74% for average fruit weight to 34% for early marketable yield. A breeding strategy is proposed in which advanced selection trials are utilized as training populations and in which genomic selection can reduce the breeding cycle from 3 to 2 years for a subset of untested parents based on their predicted genomic breeding values. PMID:28090334
Genome-wide scanning for signals of recent positive selection is essential for a comprehensive and systematic understanding of human adaptation. Here, we present a genomic survey of recent local selective sweeps, especially aimed at those nearly or recently completed. A novel approach was developed for such signals, based on contrasting the extended haplotype homozygosity (EHH) profiles between populations. We applied this method to the genome single nucleotide polymorphism (SNP) data of both the International HapMap Project and Perlegen Sciences, and detected widespread signals of recent local selection across the genome, consisting of both complete and partial sweeps. A challenging problem of genomic scans of recent positive selection is to clearly distinguish selection from neutral effects, given the high sensitivity of the test statistics to departures from neutral demographic assumptions and the lack of a single, accurate neutral model of human history. We therefore developed a new procedure that is robust across a wide range of demographic and ascertainment models, one that indicates that certain portions of the genome clearly depart from neutrality. Simulations of positive selection showed that our tests have high power towards strong selection sweeps that have undergone fixation. Gene ontology analysis of the candidate regions revealed several new functional groups that might help explain some important interpopulation differences in phenotypic traits.
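To make the EHH quantity in the abstract above concrete, the following is a minimal Python sketch of how extended haplotype homozygosity is commonly defined: the probability that two randomly chosen chromosomes are identical across the interval from a core site to a more distant site. It is not the authors' between-population contrast procedure. The function name `ehh`, the string encoding of haplotypes, and the toy sample are illustrative assumptions.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Extended haplotype homozygosity between site `core` and site `upto`.

    `haplotypes` is a list of equal-length strings, one per chromosome
    (an assumed encoding). The value is the fraction of chromosome pairs
    that are identical at every site in the inclusive interval.
    """
    lo, hi = sorted((core, upto))
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Group chromosomes by their allele string over the interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    identical_pairs = sum(comb(count, 2) for count in groups.values())
    return identical_pairs / comb(n, 2)


# Hypothetical toy sample: four chromosomes sharing the core allele at site 0.
sample = ["1010", "1010", "1011", "1100"]
profile = [ehh(sample, 0, site) for site in range(4)]
```

In this toy sample the profile starts at 1.0 at the core and decays as the interval grows. A between-population comparison in the spirit of the abstract would compute such profiles separately for each population and contrast them; how that contrast is scored is not specified in the text and is left out here.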
Genomic selection (GS) increases genetic gain by reducing the length of the selection cycle, as has been exemplified in maize using rapid cycling recombination of biparental populations. However, no results of GS applied to maize multi-parental populations have been reported so far. This study is th...
Kosiol, Carolin; Vinar, Tomás; da Fonseca, Rute R
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small... several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR... are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.
Jonas, Elisabeth; de Koning, Dirk-Jan
Plant breeding largely depends on phenotypic selection in plots and only for some, often disease-resistance-related traits, uses genetic markers. The more recently developed concept of genomic selection, using a black box approach with no need of prior knowledge about the effect or function of individual markers, has also been proposed as a great opportunity for plant breeding. Several empirical and theoretical studies have focused on the possibility to implement this as a novel molecular method across various species. Although we do not question the potential of genomic selection in general, in this Opinion, we emphasize that genomic selection approaches from dairy cattle breeding cannot be easily applied to complex plant breeding.
Zhang, X.; Misztal, I.; Heidaritabar, M.; Bastiaansen, J.W.M.; Borg, R.; Okimoto, R.
Background: The objective of this study is to investigate whether selection on similar traits in different populations progresses from selection on similar genes. With the aid of high-density genome wide single-nucleotide polymorphism (SNP) genotyping, it is possible to directly assess changes in allelic
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. This study provides additional evidence for
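The abstract above reports genes passing an FDR threshold of 0.05 but does not say which FDR procedure was used. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, one widely used way to control the false discovery rate over per-gene p-values. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Benjamini-Hochberg step-up rule: find the largest rank k such that
    the k-th smallest p-value is <= alpha * k / m, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values from likelihood ratio tests.
calls = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Because the rule is step-up, a gene whose own p-value misses its rank threshold can still be rejected when a larger p-value further down the ranking meets its threshold.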
Wu, Xiao-Lin; Beissinger, Timothy M; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J M; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin-Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized
Mandel, Jennifer R; Nambeesan, Savithri; Bowers, John E; Marek, Laura F; Ebert, Daniel; Rieseberg, Loren H; Knapp, Steven J; Burke, John M
The combination of large-scale population genomic analyses and trait-based mapping approaches has the potential to provide novel insights into the evolutionary history and genome organization of crop plants. Here, we describe the detailed genotypic and phenotypic analysis of a sunflower (Helianthus annuus L.) association mapping population that captures nearly 90% of the allelic diversity present within the cultivated sunflower germplasm collection. We used these data to characterize overall patterns of genomic diversity and to perform association analyses on plant architecture (i.e., branching) and flowering time, successfully identifying numerous associations underlying these agronomically and evolutionarily important traits. Overall, we found variable levels of linkage disequilibrium (LD) across the genome. In general, islands of elevated LD correspond to genomic regions underlying traits that are known to have been targeted by selection during the evolution of cultivated sunflower. In many cases, these regions also showed significantly elevated levels of differentiation between the two major sunflower breeding groups, consistent with the occurrence of divergence due to strong selection. One of these regions, which harbors a major branching locus, spans a surprisingly long genetic interval (ca. 25 cM), indicating the occurrence of an extended selective sweep in an otherwise recombinogenic interval.
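The abstract above describes variable linkage disequilibrium (LD) across the genome without naming the LD statistic used. As a hedged illustration, the sketch below computes the squared allelic correlation r^2 between two biallelic loci from phased haplotypes, a common pairwise LD measure. The 0/1 allele coding and the function name `ld_r2` are assumptions.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation r^2 between two biallelic loci.

    `hap_a` and `hap_b` hold the 0/1 allele carried by each phased
    haplotype at locus A and locus B, in the same haplotype order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal in length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Toy haplotypes: perfectly associated loci, then independent loci.
complete = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
independent = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such pairwise values in windows along a chromosome is one way to expose the kind of elevated-LD islands the abstract refers to; the window size and marker spacing would be analysis choices not given in the text.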
Calus, M.P.L.; Bastiaansen, J.W.M.; Meuwissen, T.H.E.; Veerkamp, R.F.
Ten years ago it was still a futuristic dream. Today, genomic selection is the hot topic in the world of animal breeding. But what precisely does it involve? Dutch researchers outline the background to this new technology.
Mathieson, Iain; Lazaridis, Iosif; Rohland, Nadin; Mallick, Swapan; Patterson, Nick; Roodenberg, Songül Alpaslan; Harney, Eadaoin; Stewardson, Kristin; Fernandes, Daniel; Novak, Mario; Sirak, Kendra; Gamba, Cristina; Jones, Eppie R.; Llamas, Bastien; Dryomov, Stanislav; Pickrel, Joseph; Arsuaga, Juan Luís; de Castro, José María Bermúdez; Carbonell, Eudald; Gerritsen, Fokke; Khokhlov, Aleksandr; Kuznetsov, Pavel; Lozano, Marina; Meller, Harald; Mochalov, Oleg; Moiseyev, Vayacheslav; Rojo Guerra, Manuel A.; Roodenberg, Jacob; Vergès, Josep Maria; Krause, Johannes; Cooper, Alan; Alt, Kurt W.; Brown, Dorcas; Anthony, David; Lalueza-Fox, Carles; Haak, Wolfgang; Pinhasi, Ron; Reich, David
Ancient DNA makes it possible to directly witness natural selection by analyzing samples from populations before, during and after adaptation events. Here we report the first scan for selection using ancient DNA, capitalizing on the largest genome-wide dataset yet assembled: 230 West Eurasians dating to between 6500 and 1000 BCE, including 163 with newly reported data. The new samples include the first genome-wide data from the Anatolian Neolithic culture whose genetic material we extracted from the DNA-rich petrous bone and who we show were members of the population that was the source of Europe’s first farmers. We also report a complete transect of the steppe region in Samara between 5500 and 1200 BCE that allows us to recognize admixture from at least two external sources into steppe populations during this period. We detect selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height. PMID:26595274
Andrés, Aida M; Hubisz, Melissa J; Indap, Amit
to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting...
Taheri, Ali; Robinson, Stephen J; Parkin, Isobel; Gruber, Margaret Y
A new method to improve the efficiency of flanking sequence identification by genome walking was developed based on an expanded, sequential list of criteria for selecting candidate enzymes, plus several other optimization steps. These criteria include: step (1) initially choosing the most appropriate restriction enzyme according to the average fragment size produced by each enzyme determined using in silico digestion of genomic DNA, step (2) evaluating the in silico frequency of fragment size distribution between individual chromosomes, step (3) selecting those enzymes that generate fragments with the majority between 100 bp and 3,000 bp, step (4) weighing the advantages and disadvantages of blunt-end sites vs. cohesive-end sites, step (5) elimination of methylation sensitive enzymes with methylation-insensitive isoschizomers, and step (6) elimination of enzymes with recognition sites within the binary vector sequence (T-DNA and plasmid backbone). Step (7) includes the selection of a second restriction enzyme with highest number of recognition sites within regions not covered by the first restriction enzyme. Step (8) considers primer and adapter sequence optimization, selecting the best adapter-primer pairs according to their hairpin/dimers and secondary structure. In step (9), the efficiency of genomic library development was improved by column-filtration of digested DNA to remove restriction enzyme and phosphatase enzyme, and most important, to remove small genomic fragments (enzymes, NsiI and NdeI, fit these criteria for the Arabidopsis thaliana genome. Their efficiency was assessed using 54 T(3) lines from an Arabidopsis SK enhancer population. Over 70% success rate was achieved in amplifying the flanking sequences of these lines. This strategy was also tested with Brachypodium distachyon to demonstrate its applicability to other larger genomes.
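Steps (1) and (3) of the abstract above rely on in silico digestion of genomic DNA. The sketch below shows one simple way such a digest could be computed for a single linear sequence. It uses the palindromic recognition sites of the two enzymes named in the abstract (NdeI, CA^TATG; NsiI, ATGCA^T), while the function names, the toy sequence, and the restriction to one strand of one sequence are simplifying assumptions rather than the authors' pipeline.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths from cutting a linear sequence at every site.

    `cut_offset` is the number of bases into the recognition site at
    which the top strand is cut (2 for NdeI, 5 for NsiI).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def fraction_in_range(fragments, lo=100, hi=3000):
    """Fraction of fragments whose length lies between lo and hi bp."""
    return sum(lo <= size <= hi for size in fragments) / len(fragments)


# Hypothetical 412 bp toy sequence containing two NdeI sites.
toy = "A" * 150 + "CATATG" + "C" * 200 + "CATATG" + "G" * 50
fragments = digest(toy, "CATATG", cut_offset=2)
usable = fraction_in_range(fragments)
```

Running the same functions over each chromosome and each candidate enzyme would give the per-enzyme average fragment size and the share of fragments in the 100 bp to 3,000 bp window that the selection criteria compare.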
Rubin, C.J.; Megens, H.J.W.C.; Barrio, del J.M.G.; Maqbol, K.; Sayyab, S.; Groenen, M.A.M.
Domestication of wild boar (Sus scrofa) and subsequent selection have resulted in dramatic phenotypic changes in domestic pigs for a number of traits, including behavior, body composition, reproduction, and coat color. Here we have used whole-genome resequencing to reveal some of the loci that under
Schierup, Mikkel Heide; Vekemans, Xavier
Frequency-dependent selection at plant self-incompatibility systems is inherent and well understood theoretically. A self-incompatibility locus leads to a strong peak of diversity in the genome, to a unique distribution of diversity across the species and possibly to increased introgression between...
Genomic selection has revolutionized dairy cattle breeding. Since 2000, assays have been developed to genotype large numbers of single nucleotide polymorphisms (SNP) at relatively low cost. The first commercial SNP genotyping chip was released with a set of 54,001 SNP in December 2007. Over 15,000 ...
Villumsen, Trine Michelle; Janss, Luc
Breeding values for animals with marker data are estimated using a genomic selection approach where data is analyzed using Bayesian multi-marker association models. Fourteen model scenarios with varying haplotype lengths, hyper parameter and prior distributions were compared to find the scenario ...
Garner, J. B.; Douglas, M. L.; Williams, S. R. O; Wales, W. J.; Marett, L. C.; Nguyen, T. T. T.; Reich, C. M.; Hayes, B. J.
Dairy products are a key source of valuable proteins and fats for many millions of people worldwide. Dairy cattle are highly susceptible to heat-stress induced decline in milk production, and as the frequency and duration of heat-stress events increases, the long term security of nutrition from dairy products is threatened. Identification of dairy cattle more tolerant of heat stress conditions would be an important progression towards breeding better adapted dairy herds to future climates. Breeding for heat tolerance could be accelerated with genomic selection, using genome wide DNA markers that predict tolerance to heat stress. Here we demonstrate the value of genomic predictions for heat tolerance in cohorts of Holstein cows predicted to be heat tolerant and heat susceptible using controlled-climate chambers simulating a moderate heatwave event. Not only was the heat challenge stimulated decline in milk production less in cows genomically predicted to be heat-tolerant, physiological indicators such as rectal and intra-vaginal temperatures had reduced increases over the 4 day heat challenge. This demonstrates that genomic selection for heat tolerance in dairy cattle is a step towards securing a valuable source of nutrition and improving animal welfare facing a future with predicted increases in heat stress events. PMID:27682591
Li, Hengde; Wang, Jingwei; Bao, Zhenmin
Genetic prediction of quantitative traits is a critical task in plant and animal breeding. Genomic selection is an accurate and efficient method of estimating genetic merits by using high-density genome-wide single nucleotide polymorphisms (SNP). In the framework of linear mixed models, we extended genomic best linear unbiased prediction (GBLUP) by including additional quantitative trait locus (QTL) information that was extracted from high-throughput SNPs by using the least absolute shrinkage and selection operator (LASSO). GBLUP was combined with three LASSO methods, namely standard LASSO (SLGBLUP), adaptive LASSO (ALGBLUP), and elastic net (ENGBLUP), which were used for detecting QTLs; these QTLs were fitted as fixed effects, and the remaining SNPs were fitted using a realized genetic relationship matrix. Simulations performed under distinct scenarios revealed that (1) the prediction accuracy of SLGBLUP was the lowest; (2) the prediction accuracies of ALGBLUP and ENGBLUP were equivalent to or higher than that of GBLUP, except under scenarios in which the number of QTLs was large; and (3) the persistence of prediction accuracy over generations was strongest in the case of ENGBLUP. Building on the favorable computational characteristics of GBLUP, ENGBLUP enables robust modeling and efficient computation to be performed for genomic selection.
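The abstract above fits the non-QTL SNPs through a realized genetic relationship matrix but does not give its formula. The sketch below builds one common version, G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centered by twice the allele frequency. Choosing this particular formula, estimating allele frequencies from the sample itself, and the 0/1/2 genotype coding are all assumptions made for illustration.

```python
def genomic_relationship(genotypes):
    """Realized relationship matrix from 0/1/2 genotype codes.

    `genotypes` is a list of rows, one per individual, each holding the
    count of a reference allele at every SNP. Allele frequencies are
    estimated from the sample itself (an assumption of this sketch).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("at least one SNP must be polymorphic")
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
        for zi in centered
    ]


# Hypothetical genotypes for three individuals at three SNPs.
G = genomic_relationship([[0, 1, 2], [1, 1, 0], [2, 1, 1]])
```

In a GBLUP-style mixed model this matrix would serve as the covariance structure of the random genomic effects, with any SNPs flagged as QTLs by a LASSO-type method entering as fixed-effect covariates instead.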
Biazzi, Elisa; Nazzicari, Nelson; Pecetti, Luciano; Brummer, E Charles; Palmonari, Alberto; Tava, Aldo; Annicchiarico, Paolo
Genetic progress for forage quality has been poor in alfalfa (Medicago sativa L.), the most-grown forage legume worldwide. This study aimed at exploring opportunities for marker-assisted selection (MAS) and genomic selection of forage quality traits based on breeding values of parent plants. Some 154 genotypes from a broadly-based reference population were genotyped by genotyping-by-sequencing (GBS), and phenotyped for leaf-to-stem ratio, leaf and stem contents of protein, neutral detergent fiber (NDF) and acid detergent lignin (ADL), and leaf and stem NDF digestibility after 24 hours (NDFD), of their dense-planted half-sib progenies in three growing conditions (summer harvest, full irrigation; summer harvest, suspended irrigation; autumn harvest). Trait-marker analyses were performed on progeny values averaged over conditions, owing to modest germplasm × condition interaction. Genomic selection exploited 11,450 polymorphic SNP markers, whereas a subset of 8,494 M. truncatula-aligned markers were used for a genome-wide association study (GWAS). GWAS confirmed the polygenic control of quality traits and, in agreement with phenotypic correlations, indicated substantially different genetic control of a given trait in stems and leaves. It detected several SNPs in different annotated genes that were highly linked to stem protein content. Also, it identified a small genomic region on chromosome 8 with high concentration of annotated genes associated with leaf ADL, including one gene probably involved in the lignin pathway. Three genomic selection models, i.e., Ridge-regression BLUP, Bayes B and Bayesian Lasso, displayed similar prediction accuracy, whereas SVR-lin was less accurate. Accuracy values were moderate (0.3-0.4) for stem NDFD and leaf protein content, modest for leaf ADL and NDFD, and low to very low for the other traits. Along with previous results for the same germplasm set, this study indicates that GBS data can be exploited to improve both quality traits
Hyunsoo Kim (1), Markus Bredel (2). (1) Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA; (2) Department of Radiation Oncology, and Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, AL, USA
Purpose: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes as an input of a feature selection algorithm for survival modeling. Patients and methods: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four different survival prediction methods: (1) 1-nearest neighbor (1-NN) survival prediction method; (2) random patient selection method and a Cox-based regression method with nested cross-validation; (3) least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; or (4) gene expression profiles of cancer pathway genes. Results: The 1-NN method performed better than the random patient selection method in terms of survival predictions, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone. Conclusion: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology. Keywords: brain, feature selection
Valen, Eivind; Sandelin, Albin
A central question in cellular biology is how the cell regulates transcription and discerns when and where to initiate it. Locating transcription start sites (TSSs), the signals that specify them, and ultimately elucidating the mechanisms of regulated initiation has therefore been a recurrent theme. In recent years substantial progress has been made towards this goal, spurred by the possibility of applying genome-wide, sequencing-based analysis. We now have a large collection of high-resolution datasets identifying locations of TSSs, protein-DNA interactions, and chromatin features over whole genomes; the field is now faced with the daunting challenge of translating these descriptive maps into quantitative and predictive models describing the underlying biology. We review here the genomic and chromatin features that underlie TSS selection and usage, focusing on the differences between the major classes of core promoters. Copyright © 2011 Elsevier Ltd. All rights reserved.
Selective signatures across the whole genome can help us understand the mechanisms of selection and target causal variants for breeding programs. In the present study, we performed Extended Haplotype Homozygosity (EHH) tests to identify significant core regions harboring such signals in Chinese Holstein, and then verified the biological significance of these identified regions using commonly used bioinformatics analyses. Results showed a total of 125 significant regions across the entire genome containing some important functional genes, such as LEP, ABCG2, CSN1S1, CSN3 and TNF, based on the Gene Ontology database. Some of the annotated genes in the core regions overlapped with those identified in our previous GWAS as well as those in a recently constructed candidate gene database for cattle, further indicating that these genes under positive selection may underlie milk production traits and other important traits in Chinese Holstein. Furthermore, in the enrichment analyses for the second-level GO terms and pathways, we observed some significant terms overrepresented in these identified regions as compared to the entire bovine genome. This indicates that some functional genes associated with milk production traits, as reflected by GO terms, could be clustered in core regions, which provides promising evidence for the exploitability of the core regions identified by EHH tests. Findings in our study could help detect functional candidate genes under positive selection for further genetic and breeding research in Chinese Holstein.
Barnes, Helen E; Liu, Guohong; Weston, Christopher Q; King, Paula; Pham, Long K; Waltz, Shannon; Helzer, Kimberly T; Day, Laura; Sphar, Dan; Yamamoto, Robert T; Forsyth, R Allyn
To improve the metagenomic analysis of complex microbiomes, we have repurposed restriction endonucleases as methyl specific DNA binding proteins. As an example, we use DpnI immobilized on magnetic beads. The ten minute extraction technique allows specific binding of genomes containing the DpnI Gm6ATC motif common in the genomic DNA of many bacteria including γ-proteobacteria. Using synthetic genome mixtures, we demonstrate 80% recovery of Escherichia coli genomic DNA even when only femtogram quantities are spiked into 10 µg of human DNA background. Binding is very specific with less than 0.5% of human DNA bound. Next Generation Sequencing of input and enriched synthetic mixtures results in over 100-fold enrichment of target genomes relative to human and plant DNA. We also show comparable enrichment when sequencing complex microbiomes such as those from creek water and human saliva. The technique can be broadened to other restriction enzymes allowing for the selective enrichment of trace and unculturable organisms from complex microbiomes and the stratification of organisms according to restriction enzyme enrichment.
Hohenlohe, Paul A.; Phillips, Patrick C.; Cresko, William A.
Natural selection shapes patterns of genetic variation among individuals, populations, and species, and it does so differentially across genomes. The field of population genomics provides a comprehensive genome-scale view of the action of selection, even beyond traditional model organisms. However, even with nearly complete genomic sequence information, our ability to detect the signature of selection on specific genomic regions depends on choosing experimental and analytical tools appropriat...
Nygaard, Sanne; Braunstein, Alexander; Malsen, Gareth;
a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome...
The genomic GC-content of bacteria varies dramatically, from less than 20% to more than 70%. This variation is generally ascribed to differences in the pattern of mutation between bacteria. Here we test this hypothesis by examining patterns of synonymous polymorphism using datasets from 149 bacterial species. We find a large excess of synonymous GC→AT mutations over AT→GC mutations segregating in all but the most AT-rich bacteria, across a broad range of phylogenetically diverse species. We show that the excess of GC→AT mutations is inconsistent with mutation bias, since it would imply that most GC-rich bacteria are declining in GC-content; such a pattern would be unsustainable. We also show that the patterns are probably not due to translational selection or biased gene conversion, because optimal codons tend to be AT-rich, and the excess of GC→AT SNPs is observed in datasets with no evidence of recombination. We therefore conclude that there is selection to increase synonymous GC-content in many species. Since synonymous GC-content is highly correlated to genomic GC-content, we further conclude that there is selection on genomic base composition in many bacteria.
Iwata, Hiroyoshi; Hayashi, Takeshi; Terakami, Shingo; Takada, Norio; Sawamura, Yutaka; Yamamoto, Toshiya
Although the potential of marker-assisted selection (MAS) in fruit tree breeding has been reported, bi-parental QTL mapping before MAS has hindered the introduction of MAS to fruit tree breeding programs. Genome-wide association studies (GWAS) are an alternative to bi-parental QTL mapping in long-lived perennials. Selection based on genomic predictions of breeding values (genomic selection: GS) is another alternative for MAS. This study examined the potential of GWAS and GS in pear breeding with 76 Japanese pear cultivars to detect significant associations of 162 markers with nine agronomic traits. We applied multilocus Bayesian models accounting for ordinal categorical phenotypes for GWAS and GS model training. Significant associations were detected at harvest time, black spot resistance and the number of spurs and two of the associations were closely linked to known loci. Genome-wide predictions for GS were accurate at the highest level (0.75) in harvest time, at medium levels (0.38-0.61) in resistance to black spot, firmness of flesh, fruit shape in longitudinal section, fruit size, acid content and number of spurs and at low levels (pear.
Watson, James D; Todd, Annabel E; Bray, James; Laskowski, Roman A; Edwards, Aled; Joachimiak, Andrzej; Orengo, Christine A; Thornton, Janet M
The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater is the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question.
Okeno, Tobias O; Henryon, Mark; Sørensen, Anders Christian
We used stochastic simulation to test hypotheses that, (i) phenotyping proportion of high ranking selection candidates based on estimated breeding values (EBV) before genotyping could realize as much genetic gains as phenotyping all candidates, and (ii) there is diminishing return to selection...... as more candidates are phenotyped in genomic breeding programs. Three phenotyping criteria, namely, random (RS), EBV and true breeding value (TBV) were investigated under two schemes (across-population and within-litter) using traditional-BLUP and genomic-BLUP models. The EBV ranked above RS and realized...
The correct models for quantitative trait locus mapping are the ones that simultaneously include all significant genetic effects. Such models are difficult to handle for high marker density. Improving statistical methods for high-dimensional data appears to have reached a plateau. Alternative approaches must be explored to break the bottleneck of genomic data analysis. The fact that all markers are located in a few chromosomes of the genome leads to linkage disequilibrium among markers. This suggests that dimension reduction can also be achieved through data manipulation. High-density markers are used to infer recombination breakpoints, which then facilitate construction of bins. The bins are treated as new synthetic markers. The number of bins is always a manageable number, on the order of a few thousand. Using the bin data of a recombinant inbred line population of rice, we demonstrated genetic mapping, using all bins in a simultaneous manner. To facilitate genomic selection, we developed a method to create user-defined (artificial) bins, in which breakpoints are allowed within bins. Using eight traits of rice, we showed that artificial bin data analysis often improves the predictability compared with natural bin data analysis. Of the eight traits, three showed high predictability, two had intermediate predictability, and two had low predictability. A binary trait with a known gene had predictability near perfect. Genetic mapping using bin data points to a new direction of genomic data analysis.
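The bin construction described above can be sketched as follows. This is a minimal illustration, assuming error-free genotype calls with no missing data and markers already ordered along one chromosome: a new bin starts wherever at least one line changes genotype between adjacent markers (an inferred breakpoint), and each bin is then represented by a single synthetic marker. The names `build_bins` and `bin_genotypes` and the toy matrix are hypothetical.

```python
def build_bins(genotypes):
    """Return (start, end) marker index pairs, inclusive, one per bin.

    genotypes: one list of marker codes per line, markers ordered by position.
    """
    n_markers = len(genotypes[0])
    bins = []
    start = 0
    for m in range(1, n_markers):
        # A breakpoint in any line between markers m-1 and m closes the bin.
        if any(line[m] != line[m - 1] for line in genotypes):
            bins.append((start, m - 1))
            start = m
    bins.append((start, n_markers - 1))
    return bins

def bin_genotypes(genotypes, bins):
    """Collapse each bin to one synthetic marker (its first marker's code)."""
    return [[line[s] for s, _ in bins] for line in genotypes]

# Toy data (made up): 3 inbred lines, 6 ordered markers, parents coded 0/1.
lines = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
bins = build_bins(lines)
reduced = bin_genotypes(lines, bins)
```

Because all markers inside a bin carry identical information across the population, the reduced matrix loses nothing for mapping while shrinking the number of model terms.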
Studies using in-situ Auger electron spectroscopy and reflection high energy electron diffraction, and ex-situ high resolution X-ray diffraction and electron backscatter diffraction reveal that a MgS thin film grown directly on a GaAs (100) substrate by molecular beam epitaxy adopts its most stable phase, the rocksalt structure, with a lattice constant of 5.20 Å. A Au/MgS/n+-GaAs (100) Schottky-barrier photodiode was fabricated and its room temperature photoresponse was measured to have a sharp fall-off edge at 235 nm with rejection of more than three orders at 400 nm and higher than five orders at 500 nm, promising for various solar-blind UV detection applications.
Traditional breeding strategies for selecting superior genotypes depending on phenotypic traits have proven to be of limited success, as this direct selection is hindered by low heritability, genetic interactions such as epistasis, environmental-genotype interactions, and polygenic effects. With the advent of new genomic tools, breeders have paved a way for selecting superior breeds. Genomic selection (GS) has emerged as one of the most important approaches for predicting genotype performance. Here, we tested the breeding values of 240 maize subtropical lines phenotyped for drought at different environments using 29,619 cured SNPs. Prediction accuracies of seven genomic selection models (ridge regression, LASSO, elastic net, random forest, reproducing kernel Hilbert space, Bayes A and Bayes B) were tested for their agronomic traits. Though prediction accuracies of Bayes B, Bayes A and RKHS were comparable, Bayes B outperformed the other models by predicting the highest Pearson correlation coefficient in all three environments. From Bayes B, a set of the top 1053 significant SNPs with higher marker effects was selected across all datasets to validate the genes and QTLs. Out of these 1053 SNPs, 77 SNPs were associated with 10 drought-responsive transcription factors. These transcription factors were associated with different physiological and molecular functions (stomatal closure, root development, hormonal signaling and photosynthesis). Of several models, Bayes B has been shown to have the highest level of prediction accuracy for our data sets. Our experiments also highlighted several SNPs based on their performance and relative importance to drought tolerance. The result of our experiments is important for the selection of superior genotypes and candidate genes for breeding drought-tolerant maize hybrids.
Pecetti, Luciano; Brummer, E. Charles; Palmonari, Alberto; Tava, Aldo
Genetic progress for forage quality has been poor in alfalfa (Medicago sativa L.), the most-grown forage legume worldwide. This study aimed at exploring opportunities for marker-assisted selection (MAS) and genomic selection of forage quality traits based on breeding values of parent plants. Some 154 genotypes from a broadly-based reference population were genotyped by genotyping-by-sequencing (GBS), and phenotyped for leaf-to-stem ratio, leaf and stem contents of protein, neutral detergent fiber (NDF) and acid detergent lignin (ADL), and leaf and stem NDF digestibility after 24 hours (NDFD), of their dense-planted half-sib progenies in three growing conditions (summer harvest, full irrigation; summer harvest, suspended irrigation; autumn harvest). Trait-marker analyses were performed on progeny values averaged over conditions, owing to modest germplasm × condition interaction. Genomic selection exploited 11,450 polymorphic SNP markers, whereas a subset of 8,494 M. truncatula-aligned markers were used for a genome-wide association study (GWAS). GWAS confirmed the polygenic control of quality traits and, in agreement with phenotypic correlations, indicated substantially different genetic control of a given trait in stems and leaves. It detected several SNPs in different annotated genes that were highly linked to stem protein content. Also, it identified a small genomic region on chromosome 8 with high concentration of annotated genes associated with leaf ADL, including one gene probably involved in the lignin pathway. Three genomic selection models, i.e., Ridge-regression BLUP, Bayes B and Bayesian Lasso, displayed similar prediction accuracy, whereas SVR-lin was less accurate. Accuracy values were moderate (0.3–0.4) for stem NDFD and leaf protein content, modest for leaf ADL and NDFD, and low to very low for the other traits. Along with previous results for the same germplasm set, this study indicates that GBS data can be exploited to improve both quality traits
Aslan, N.; Canbazoglu, M.; Ulusoy, U. [Cumhuriyet Universitesi, Sivas (Turkey). Maden Muhendisligi Bolumu
Washability and ash removal from Gemerek lignite in a multi gravity separator (MGS) were investigated. Experimental studies were carried out on -0.5 mm coal samples containing 37.75% ash in a laboratory C-900 type MGS. Drum speed, shake amplitude, tilt angle, shake frequency, wash water quantity and feed solid ratio were investigated. Optimum operating conditions were determined. 8 refs., 7 figs.
Benjamin F Voight
The identification of signals of very recent positive selection provides information about the adaptation of modern humans to local conditions. We report here on a genome-wide scan for signals of very recent positive selection in favor of variants that have not yet reached fixation. We describe a new analytical method for scanning single nucleotide polymorphism (SNP) data for signals of recent selection, and apply this to data from the International HapMap Project. In all three continental groups we find widespread signals of recent positive selection. Most signals are region-specific, though a significant excess are shared across groups. Contrary to some earlier low resolution studies that suggested a paucity of recent selection in sub-Saharan Africans, we find that by some measures our strongest signals of selection are from the Yoruba population. Finally, since these signals indicate the existence of genetic variants that have substantially different fitnesses, they must indicate loci that are the source of significant phenotypic variation. Though the relevant phenotypes are generally not known, such loci should be of particular interest in mapping studies of complex traits. For this purpose we have developed a set of SNPs that can be used to tag the strongest approximately 250 signals of recent selection in each population.
Pérez-Rodríguez, Francisco J; D'Andrea, Lucía; de Castellarnau, Montserrat; Costafreda, Maria Isabel; Guix, Susana; Ribes, Enric; Quer, Josep; Gregori, Josep; Bosch, Albert; Pintó, Rosa M
Virus production still is a challenging issue in antigen manufacture, particularly with slow-growing viruses. Deep-sequencing of genomic regions indicative of efficient replication may be used to identify high-fitness minority individuals suppressed by the ensemble of mutants in a virus quasispecies. Molecular breeding of quasispecies containing colonizer individuals, under regimes allowing more than one replicative cycle, is a strategy to select the fittest competitors among the colonizers. A slow-growing cell culture-adapted hepatitis A virus strain was employed as a model for this strategy. Using genomic selection in two regions predictive of efficient translation, the internal ribosome entry site and the VP1-coding region, high-fitness minority colonizer individuals were identified in a population adapted to conditions of artificially-induced cellular transcription shut-off. Molecular breeding of this population with a second one, also adapted to transcription shut-off and showing an overall colonizer phenotype, allowed the selection of a fast-growing population of great biotechnological potential.
Boer, P.K. de; Groot, R.A. de
Electronic structure calculations for MgO, MgS and HfO2 are reported. It is shown that the conduction bands of MgO and MgS have predominantly anion character, contrary to the common picture of the conduction band being derived from cation states. In transition metal oxides, unoccupied anion states a
Usai, M Graziano; Goddard, Mike E; Hayes, Ben J
We used a least absolute shrinkage and selection operator (LASSO) approach to estimate marker effects for genomic selection. The least angle regression (LARS) algorithm and cross-validation were used to define the best subset of markers to include in the model. The LASSO-LARS approach was tested on two data sets: a simulated data set with 5865 individuals and 6000 Single Nucleotide Polymorphisms (SNPs); and a mouse data set with 1885 individuals genotyped for 10 656 SNPs and phenotyped for a number of quantitative traits. In the simulated data, three approaches were used to split the reference population into training and validation subsets for cross-validation: random splitting across the whole population; random sampling of validation set from the last generation only, either within or across families. The highest accuracy was obtained by random splitting across the whole population. The accuracy of genomic estimated breeding values (GEBVs) in the candidate population obtained by LASSO-LARS was 0.89 with 156 explanatory SNPs. This value was higher than those obtained by Best Linear Unbiased Prediction (BLUP) and a Bayesian method (BayesA), which were 0.75 and 0.84, respectively. In the mouse data, 1600 individuals were randomly allocated to the reference population. The GEBVs for the remaining 285 individuals estimated by LASSO-LARS were more accurate than those obtained by BLUP and BayesA for weight at six weeks and slightly lower for growth rate and body length. It was concluded that LASSO-LARS approach is a good alternative method to estimate marker effects for genomic selection, particularly when the cost of genotyping can be reduced by using a limited subset of markers.
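The shrinkage-and-selection behaviour of the LASSO can be illustrated with a short sketch. Note that this is not the LARS algorithm used in the study: it swaps in plain cyclic coordinate descent with soft-thresholding, which minimises the same LASSO objective, (1/(2n))·||y − Xb||² + λ·||b||₁. It assumes centred phenotypes, no intercept, no monomorphic (all-zero) marker columns, and a fixed λ rather than one chosen by cross-validation. All names and the toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Estimate marker effects by cyclic coordinate descent (sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of marker j with the partial residual
            z = 0.0    # sum of squares of marker j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data (made up): marker 1 has a large effect, marker 2 a tiny one.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
effects = lasso_coordinate_descent(X, y, lam=0.5)
```

With this penalty the small effect is set exactly to zero while the large one is shrunk, which is how the LASSO arrives at a limited subset of explanatory SNPs.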
Valen, Eivind; Sandelin, Albin Gustav
A central question in cellular biology is how the cell regulates transcription and discerns when and where to initiate it. Locating transcription start sites (TSSs), the signals that specify them, and ultimately elucidating the mechanisms of regulated initiation has therefore been a recurrent theme......; the field is now faced with the daunting challenge of translating these descriptive maps into quantitative and predictive models describing the underlying biology. We review here the genomic and chromatin features that underlie TSS selection and usage, focusing on the differences between the major classes...
Genomic selection (GS) simultaneously incorporates dense SNP marker genotypes with phenotypic data from related animals to predict animal-specific genomic breeding value (GEBV), which circumvents the need to measure the disease phenotype in potential breeders. Marker assisted selection (MAS) involv...
Thoroughbred horses have been selected for exceptional racing performance resulting in system-wide structural and functional adaptations contributing to elite athletic phenotypes. Because selection has been recent and intense in a closed population that stems from a small number of founder animals, Thoroughbreds represent a unique population within which to identify genomic contributions to exercise-related traits. Employing a population genetics-based hitchhiking mapping approach we performed a genome scan using 394 autosomal and X chromosome microsatellite loci and identified positively selected loci in the extreme tail-ends of the empirical distributions for (1) deviations from expected heterozygosity (Ewens-Watterson test) in Thoroughbred (n = 112) and (2) global differentiation among four geographically diverse horse populations (F(ST)). We found positively selected genomic regions in Thoroughbred enriched for phosphoinositide-mediated signalling (3.2-fold enrichment; P<0.01), insulin receptor signalling (5.0-fold enrichment; P<0.01) and lipid transport (2.2-fold enrichment; P<0.05) genes. We found a significant overrepresentation of sarcoglycan complex (11.1-fold enrichment; P<0.05) and focal adhesion pathway (1.9-fold enrichment; P<0.01) genes highlighting the role for muscle strength and integrity in the Thoroughbred athletic phenotype. We report for the first time candidate athletic-performance genes within regions targeted by selection in Thoroughbred horses that are principally responsible for fatty acid oxidation, increased insulin sensitivity and muscle strength: ACSS1 (acyl-CoA synthetase short-chain family member 1), ACTA1 (actin, alpha 1, skeletal muscle), ACTN2 (actinin, alpha 2), ADHFE1 (alcohol dehydrogenase, iron containing, 1), MTFR1 (mitochondrial fission regulator 1), PDK4 (pyruvate dehydrogenase kinase, isozyme 4) and TNC (tenascin C). Understanding the genetic basis for exercise adaptation will be crucial for the identification of genes
Chandonia, John-Marc; Brenner, Steven E.
The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small
Harrison Paul M
Background: Transcribed pseudogenes are copies of protein-coding genes that have accumulated indicators of coding-sequence decay (such as frameshifts and premature stop codons), but nonetheless remain transcribed. Recent experimental evidence indicates that transcribed pseudogenes may regulate the expression of homologous genes, through antisense interference, or generation of small interfering RNAs (siRNAs). Here, we assessed the genomic evidence for such transcribed pseudogenes of potential functional importance, in the human genome. The most obvious indicators of such functional importance are significant evidence of conservation and selection pressure. Results: A variety of pseudogene annotations from multiple sources were pooled and filtered to obtain a subset of sequences that have significant mid-sequence disablements (frameshifts and premature stop codons), and that have clear evidence of full-length mRNA transcription. We found 1750 such transcribed pseudogene annotations (TPAs) in the human genome (corresponding to ~11.5% of human pseudogene annotations). We checked for syntenic conservation of TPAs in other mammals (rhesus monkey, mouse, rat, dog and cow). About half of the human TPAs are conserved in rhesus monkey, but strikingly, very few in mouse (~3%). The TPAs conserved in rhesus monkey show evidence of selection pressure (relative to surrounding intergenic DNA) on: (i) their GC content, and (ii) their rate of nucleotide substitution. This is in spite of distributions of Ka/Ks (ratios of non-synonymous to synonymous substitution rates), congruent with a lack of protein-coding ability. Furthermore, we have identified 68 human TPAs that are syntenically conserved in at least two other mammals. Interestingly, we observe three TPA sequences conserved in dog that have intermediate character (i.e., evidence of both protein-coding ability and pseudogenicity), and discuss the implications of this. Conclusion: Through evolutionary analysis, we
Ma, Yansong; Reif, Jochen C; Jiang, Yong; Wen, Zixiang; Wang, Dechun; Liu, Zhangxiong; Guo, Yong; Wei, Shuhong; Wang, Shuming; Yang, Chunming; Wang, Huicai; Yang, Chunyan; Lu, Weiguo; Xu, Ran; Zhou, Rong; Wang, Ruizhen; Sun, Zudong; Chen, Huaizhu; Zhang, Wanhai; Wu, Jian; Hu, Guohua; Liu, Chunyan; Luan, Xiaoyan; Fu, Yashu; Guo, Tai; Han, Tianfu; Zhang, Mengchen; Sun, Bincheng; Zhang, Lei; Chen, Weiyuan; Wu, Cunxiang; Sun, Shi; Yuan, Baojun; Zhou, Xinan; Han, Dezhi; Yan, Hongrui; Li, Wenbin; Qiu, Lijuan
Genomic selection is a promising molecular breeding strategy enhancing genetic gain per unit time. The objectives of our study were to (1) explore the prediction accuracy of genomic selection for plant height and yield per plant in soybean [Glycine max (L.) Merr.], (2) discuss the relationship between prediction accuracy and numbers of markers, and (3) evaluate the effect of marker preselection based on different methods on the prediction accuracy. Our study is based on a population of 235 soybean varieties which were evaluated for plant height and yield per plant at multiple locations and genotyped by 5361 single nucleotide polymorphism markers. We applied ridge regression best linear unbiased prediction coupled with fivefold cross-validations and evaluated three strategies of marker preselection. For plant height, marker density and marker preselection procedure impacted prediction accuracy only marginally. In contrast, for grain yield, prediction accuracy based on markers selected with a haplotype block analyses-based approach increased by approximately 4 % compared with random or equidistant marker sampling. Thus, applying marker preselection based on haplotype blocks is an interesting option for a cost-efficient implementation of genomic selection for grain yield in soybean breeding.
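A minimal sketch of ridge-regression prediction with fivefold cross-validation follows, under clearly stated assumptions: marker codes and phenotypes are centred (no intercept), the shrinkage parameter `lam` is a fixed hypothetical value rather than the ratio of variance components that RR-BLUP would estimate, and folds are assigned by index modulo k instead of at random. Prediction accuracy is taken as the Pearson correlation between cross-validated predictions and observed phenotypes. All function names are illustrative.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Marker effects from (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(A, rhs)

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def cross_validated_accuracy(X, y, lam, k=5):
    """Predict each fold from the other k-1 folds; return the correlation."""
    n = len(y)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in range(n):
            if i % k == fold:
                preds[i] = sum(x * b for x, b in zip(X[i], beta))
    return pearson(preds, y)
```

Marker preselection strategies such as those compared in the study could be evaluated with this scaffold by restricting the columns of `X` before calling `cross_validated_accuracy`.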
Vijaykumar Yogesh Muley
BACKGROUND: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand, a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. METHODS: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. CONCLUSIONS: Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling
Mulder, Han A.
Genotype by environment interactions (GxE) are very common in livestock and hamper genetic improvement. On the other hand, GxE is a source of genetic variation: genetic variation in response to environment, e.g., environmental perturbations such as heat stress or disease. In livestock breeding, there is tendency to ignore GxE because of increased complexity of models for genetic evaluations and lack of accuracy in extreme environments. GxE, however, creates opportunities to increase resilience of animals toward environmental perturbations. The main aim of the paper is to investigate to which extent GxE can be exploited with traditional and genomic selection methods. Furthermore, we investigated the benefit of reaction norm (RN) models compared to conventional methods ignoring GxE. The questions were addressed with selection index theory. GxE was modeled according to a linear RN model in which the environmental gradient is the contemporary group mean. Economic values were based on linear and non-linear profit equations. Accuracies of environment-specific (G)EBV were highest in intermediate environments and lowest in extreme environments. RN models had higher accuracies of (G)EBV in extreme environments than conventional models ignoring GxE. Genomic selection always resulted in higher response to selection in all environments than sib or progeny testing schemes. The increase in response was with genomic selection between 9 and 140% compared to sib testing and between 11 and 114% compared to progeny testing when the reference population consisted of 1 million animals across all environments. When the aim was to decrease environmental sensitivity, the response in slope of the RN model with genomic selection was between 1.09 and 319 times larger than with sib or progeny testing and in the right direction in contrast to sib and progeny testing that still increased environmental sensitivity. This shows that genomic selection with large reference populations offers great
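The linear reaction norm model mentioned above can be made concrete with a small sketch. Under a linear RN, an animal's breeding value in environment x is a_int + a_slope · x, so the genetic covariance between environments x1 and x2 is var_int + (x1 + x2) · cov_int_slope + x1 · x2 · var_slope. The variance components and environment values below are hypothetical placeholders, not estimates from the paper, and the function names are illustrative.

```python
import math

def rn_genetic_covariance(x1, x2, var_int, var_slope, cov_int_slope):
    """Genetic covariance between environments x1 and x2 under a linear RN."""
    return var_int + (x1 + x2) * cov_int_slope + x1 * x2 * var_slope

def rn_genetic_correlation(x1, x2, var_int, var_slope, cov_int_slope):
    """Genetic correlation of the same trait expressed in two environments."""
    c = rn_genetic_covariance(x1, x2, var_int, var_slope, cov_int_slope)
    v1 = rn_genetic_covariance(x1, x1, var_int, var_slope, cov_int_slope)
    v2 = rn_genetic_covariance(x2, x2, var_int, var_slope, cov_int_slope)
    return c / math.sqrt(v1 * v2)

# Hypothetical components: intercept variance 1.0, slope variance 0.25,
# no intercept-slope covariance; environments on a standardised scale.
var_extreme = rn_genetic_covariance(2.0, 2.0, 1.0, 0.25, 0.0)
r_extremes = rn_genetic_correlation(-2.0, 2.0, 1.0, 0.25, 0.0)
```

With these placeholder values the genetic variance is larger in extreme environments than at the mean, and the genetic correlation between the two extremes drops to zero, which illustrates why a single breeding value that ignores GxE can be a poor predictor of performance in extreme environments.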
Veerkamp Roel F
Background: Genomic selection has become a very important tool in animal genetics and is rapidly emerging in plant genetics. It holds the promise to be particularly beneficial to select for traits that are difficult or expensive to measure, such as traits that are measured in one environment and selected for in another environment. The objective of this paper was to develop three models that would permit multi-trait genomic selection by combining scarcely recorded traits with genetically correlated indicator traits, and to compare their performance to single-trait models, using simulated datasets. Methods: Three SNP (Single Nucleotide Polymorphism) based models were used. Models G and BCπ0 assumed that contributed (co)variances of all SNP are equal. Model BSSVS sampled SNP effects from a distribution with large (or small) effects to model SNP that are (or are not) associated with a quantitative trait locus. For reasons of comparison, model A including pedigree but not SNP information was fitted as well. Results: In terms of accuracies for animals without phenotypes, the models generally ranked as follows: BSSVS > BCπ0 > G >> A. Using multi-trait SNP-based models, the accuracy for juvenile animals without any phenotypes increased up to 0.10. For animals with phenotypes on an indicator trait only, accuracy increased up to 0.03 and 0.14, for genetic correlations with the evaluated trait of 0.25 and 0.75, respectively. Conclusions: When the indicator trait had a genetic correlation lower than 0.5 with the trait of interest in our simulated data, the accuracy was higher if genotypes rather than phenotypes were obtained for the indicator trait. However, when genetic correlations were higher than 0.5, using an indicator trait led to higher accuracies for selection candidates. For different combinations of traits, the level of genetic correlation below which genotyping selection candidates is more effective than obtaining phenotypes for an indicator
We investigate the dust composition of detached shells around carbon stars, with a focus on understanding the origin of the cool magnesium-sulfide (MgS) material that has been detected around several warm carbon stars. We build a radiative transfer model of a carbon star surrounded by an expanding detached shell of dust. The shell contains amorphous carbon grains and MgS grains. We find that a small fraction of MgS dust (2% of the dust mass) can give a significant contribution to the IRAS 25 micron flux. However, the presence of MgS in the detached shell cannot be inferred from the IRAS broadband photometry alone but requires infrared spectroscopy. We apply the model to the detached-shell sources R Scl and U Cam, both exhibiting a cool MgS feature in their ISO/SWS spectra. We use the shell parameters derived for the molecular shell, using the CO submillimetre maps. The models, with MgS grains located in the detached shell, explain the MgS grain temperature, as derived from their ISO spe...
Nygaard, Sanne; Braunstein, Alexander; Malsen, Gareth
Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ~23 Mb genomes encoding ~5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes have...... a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome...
Like other retroviruses, human immunodeficiency virus type 1 (HIV-1) selectively packages genomic RNA (gRNA) during virus assembly. However, in the absence of the gRNA, cellular messenger RNAs (mRNAs) are packaged. While the gRNA is selected because of its cis-acting packaging signal, the mechanism of this selection is not understood. The affinity of Gag (the viral structural protein) for cellular RNAs at physiological ionic strength is not much higher than that for the gRNA. However, binding to the gRNA is more salt-resistant, implying that it has a higher non-electrostatic component. We have previously studied the spacer 1 (SP1) region of Gag and showed that it can undergo a concentration-dependent conformational transition. We proposed that this transition represents the first step in assembly, i.e., the conversion of Gag to an assembly-ready state. To explain selective packaging of gRNA, we suggest here that binding of Gag to gRNA, with its high non-electrostatic component, triggers this conversion more readily than binding to other RNAs; thus we predict that a Gag–gRNA complex will nucleate particle assembly more efficiently than other Gag–RNA complexes. New data show that among cellular mRNAs, those with long 3′-untranslated regions (UTR) are selectively packaged. It seems plausible that the 3′-UTR, a stretch of RNA not occupied by ribosomes, offers a favorable binding site for Gag.
Background: The endosymbiont Wolbachia pipientis infects a broad range of arthropod and filarial nematode hosts. These diverse associations form an attractive model for understanding host:symbiont coevolution. Wolbachia's ubiquity and ability to dramatically alter host reproductive biology also form the foundation of research strategies aimed at controlling insect pests and vector-borne disease. The Wolbachia strains that infect nematodes are phylogenetically distinct, strictly vertically transmitted, and required by their hosts for growth and reproduction. Insects, in contrast, form more fluid associations with Wolbachia. In these taxa, host populations are most often polymorphic for infection, horizontal transmission occurs between distantly related hosts, and direct fitness effects on hosts are mild. Despite extensive interest in the Wolbachia system for many years, relatively little is known about the molecular mechanisms that mediate its varied interactions with different hosts. We have compared the genomes of the Wolbachia that infect Drosophila melanogaster (wMel) and the nematode Brugia malayi (wBm) to that of an outgroup, Anaplasma marginale, to identify genes that have experienced diversifying selection in the Wolbachia lineages. The goal of the study was to identify likely molecular mechanisms of the symbiosis and to understand the nature of the diverse association across different hosts. Results: The prevalence of selection was far greater in wMel than wBm. Genes contributing to DNA metabolism, cofactor biosynthesis, and secretion were positively selected in both lineages. In wMel there was a greater emphasis on DNA repair, cell division, protein stability, and cell envelope synthesis. Conclusion: Secretion pathways and outer surface protein encoding genes are highly affected by selection in keeping with host:parasite theory. If evidence of selection on various cofactor molecules reflects possible provisioning, then both insect as
Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren
GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often negle
Hohenlohe, Paul A; Phillips, Patrick C; Cresko, William A
Natural selection shapes patterns of genetic variation among individuals, populations, and species, and it does so differentially across genomes. The field of population genomics provides a comprehensive genome-scale view of the action of selection, even beyond traditional model organisms. However, even with nearly complete genomic sequence information, our ability to detect the signature of selection on specific genomic regions depends on choosing experimental and analytical tools appropriate to the biological situation. For example, processes that occur at different timescales, such as sorting of standing genetic variation, mutation-selection balance, or fixed interspecific divergence, have different consequences for genomic patterns of variation. Inappropriate experimental or analytical approaches may fail to detect even strong selection or falsely identify a signature of selection. Here we outline the conceptual framework of population genomics, relate genomic patterns of variation to evolutionary processes, and identify major biological factors to be considered in studies of selection. As data-gathering technology continues to advance, our ability to understand selection in natural populations will be limited more by conceptual and analytical weaknesses than by the amount of molecular data. Our aim is to bring critical biological considerations to the fore in population genomics research and to spur the development and application of analytical tools appropriate to diverse biological systems.
By using one-dimensional genome scanning, it is possible to directly identify the restricted genomic DNA fragment that reflects the site of genetic change. The subsequent strategies to obtain the molecular clones of the corresponding restriction fragment are usually as follows: (i) the restriction of a mass quantity of an appropriate genomic DNA, (ii) the size-fractionation of the restricted DNA on a preparative electrophoresis gel in order to enrich the corresponding restriction fragment, (iii) the construction of the size-selected libraries from the fractionated genomic DNA, and (iv) the screening of the library to obtain an objective clone which is identified on the analytical genome scanning gel. A knowledge of the size distribution pattern of restriction fragments of the genomic DNA makes it possible to calculate the heterogeneity or complexity of the restriction fragments in each size-fraction. This manuscript first describes the distribution of the restriction fragments with respect to their length. Some examples of the practical application of this theory to genome scanning are then discussed using presumptive genome scanning gels. The way to calculate such DNA complexities in the prepared size-fractionated samples is also demonstrated. Such information should greatly facilitate the design of experimental strategies for the cloning of a certain size of genomic DNA after digestion with restriction enzyme(s), as is the case with genome scanning.
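The fragment-length reasoning in this abstract can be illustrated with a short sketch. It assumes the simplest textbook model, which the abstract does not confirm: cut sites fall independently at each base with probability p (for example p = 4^-n for an n-base recognition site with equal base frequencies), so fragment lengths follow a geometric distribution. All function names are hypothetical.

```python
def site_probability(recognition_length):
    """Per-base cut probability for an n-base recognition site,
    assuming equal and independent base frequencies (an assumption)."""
    return 0.25 ** recognition_length


def fraction_in_size_range(p, min_len, max_len):
    """Fraction of fragments whose length L satisfies min_len <= L <= max_len
    when P(L = k) = p * (1 - p) ** (k - 1) for k >= 1 (geometric model)."""
    return (1.0 - p) ** (min_len - 1) - (1.0 - p) ** max_len


def expected_fragments_in_range(genome_size, p, min_len, max_len):
    """Approximate number of distinct fragments in a size fraction:
    about genome_size * p fragments in total, times the fraction in range."""
    return genome_size * p * fraction_in_size_range(p, min_len, max_len)


if __name__ == "__main__":
    p = site_probability(6)  # hypothetical six-base cutter
    # Fragments expected in a 2-3 kb gel slice of a 3e9 bp genome under this model.
    print(expected_fragments_in_range(3e9, p, 2000, 3000))
```

Under this model, the count returned for a gel slice is a rough measure of how many different fragments a size-selected library would contain, which is the kind of complexity estimate the abstract refers to.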
Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries...... affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations....... these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination...
Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc
Cπ method and applied to 1,272 Duroc pigs with both genotypic and phenotypic records including residual (RFI) and daily feed intake (DFI), average daily gain (ADG) and back fat (BF)). Records were split into a training (968 pigs) and a validation dataset (304 pigs). SNPs were annotated by 14 different...... groups. Genomic prediction has accuracy comparable to an own phenotype and use of genomic prediction can be cost effective by replacing feed intake measurement. Use of genomic annotation of SNPs and QTL information had no largely significant impact on predictive accuracy for the current traits but may...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...
Background: In future Best Linear Unbiased Prediction (BLUP) evaluations of dairy cattle, genomic selection of young sires will cause evaluation biases and loss of accuracy once the selected ones get progeny. Methods: To avoid such bias in the estimation of breeding values, we propose to include information on all genotyped bulls, including the culled ones, in BLUP evaluations. Estimated breeding values based on genomic information were converted into genomic pseudo-performances and then analyzed simultaneously with actual performances. Using simulations based on actual data from the French Holstein population, bias and accuracy of BLUP evaluations were computed for young sires undergoing progeny testing or genomic pre-selection. For bulls pre-selected based on their genomic profile, three different types of information can be included in the BLUP evaluations: (1) data from pre-selected genotyped candidate bulls with actual performances on their daughters, (2) data from bulls with both actual and genomic pseudo-performances, or (3) data from all the genotyped candidates with genomic pseudo-performances. The effects of different levels of heritability, genomic pre-selection intensity and accuracy of genomic evaluation were considered. Results: Including information from all the genotyped candidates, i.e. genomic pseudo-performances for both selected and culled candidates, removed bias from genetic evaluation and increased accuracy. This approach was effective regardless of the magnitude of the initial bias and as long as the accuracy of the genomic evaluations was sufficiently high. Conclusions: The proposed method can be easily and quickly implemented in BLUP evaluations at the national level, although some improvement is necessary to more accurately propagate genomic information from genotyped to non-genotyped animals. In addition, it is a convenient method to combine direct genomic, phenotypic and pedigree-based information in a multiple
Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Elferink, M.G.; Megens, H.J.W.C.; Vereijken, A.; Crooijmans, R.P.M.A.; Groenen, M.A.M.
Identifying genomic regions that are affected by selection is important to understand the domestication and selection history of the domesticated chicken, as well as understanding molecular pathways underlying phenotypic traits and breeding goals. While whole-genome approaches, either high-density
Genomic selection is a promising development in agriculture, aiming at improved production by exploiting molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. It opens opportunities for research, as novel algorithms and lab methodologies are developed. Genomic selection can be applied in many breeds and species. Further research on the implementation of genomic selection in breeding programs is highly desirable not only for the common good, but also for the private sector (breeding companies). It has been projected that this approach will improve selection routines, especially in species with long reproduction cycles, late or sex-limited or expensive trait recording and for complex traits. The task of integrating genomic selection into existing breeding programs is, however, not straightforward. Despite successful integration into breeding programs for dairy cattle, it has yet to be shown how much emphasis can be given to the genomic information and how much additional phenotypic information is needed from new selection candidates. Genomic selection is already part of future planning in many breeding companies of pigs and beef cattle among others, but further research is needed to fully estimate how effective the use of genomic information will be for the prediction of the performance of future breeding stock. Genomic prediction of production in crossbreeding and across-breed schemes, costs and choice of individuals for genotyping are reasons for a reluctance to fully rely on genomic information for selection decisions. Breeding objectives are highly dependent on the industry and the additional gain when using genomic information has to be considered carefully. This review synthesizes some of the suggested approaches in selected livestock species including cattle, pig, chicken and fish. It outlines tasks to help understand possible consequences when applying genomic information in
Zhang, Hui; Wang, Shou-Zhi; Wang, Zhi-Peng; Da, Yang; Wang, Ning; Hu, Xiao-Xiang; Zhang, Yuan-Dan; Wang, Yu-Xiang; Leng, Li; Tang, Zhi-Quan; Li, Hui
Genomic regions controlling abdominal fatness (AF) were studied in the Northeast Agricultural University broiler line divergently selected for AF. In this study, the chicken 60K SNP chip and extended haplotype homozygosity (EHH) test were used to detect genome-wide signatures of AF. A total of 5357 and 5593 core regions were detected in the lean and fat lines, and 51 and 57 reached a significant level (Pchickens. We provide a genome-wide map of selection signatures in the chicken genome, and contribute to a better understanding of the mechanisms of selection for AF content in chickens. Using this information in selection for low AF in commercial breeding will accelerate breeding progress.
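The EHH statistic named in this abstract can be sketched as follows. This is a minimal illustration of the usual definition (the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over the extended region), not the authors' pipeline. The input layout and the function name are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, distance):
    """Extended haplotype homozygosity for one core haplotype.

    haplotypes: list of equal-length strings, all carrying the same core
                allele at position core_index (hypothetical input layout).
    distance:   number of markers to extend to the right of the core.
    Returns sum_i C(n_i, 2) / C(n, 2), where n_i counts each distinct
    extended haplotype and n is the number of chromosomes.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    extended = [h[core_index:core_index + distance + 1] for h in haplotypes]
    counts = Counter(extended)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


if __name__ == "__main__":
    haps = ["AAAA", "AAAA", "AABA", "AABB"]
    # EHH decays as the window extends away from the core marker.
    print([ehh(haps, 0, d) for d in range(4)])
```

A slow decay of EHH with distance around a frequent core haplotype is the pattern usually read as a candidate selection signature.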
Qu, Z.; Tamppari, L. K.; Smith, M. D.; Bass, Deborah; Hale, A. S.
Water-ice in the Martian atmosphere was first identified in the Mariner 9 Infrared Interferometer Spectrometer (IRIS) spectra. The Viking Imaging Subsystem (VIS) instruments aboard the Viking orbiter also observed water-ice clouds and hazes in the Martian atmosphere. The MGS TES instrument is an infrared interferometer/spectrometer which covers the spectral range 6-50 micron with a selectable sampling resolution of either 5 or 10 per cm. Using the relatively independent and distinct spectral signatures for dust and water-ice, these two quantities have been retrieved simultaneously. Although the interrelations among the two quantities have been analyzed by Smith et al. and the retrievals are thought to be robust, understanding the impact of each quantity on the other during their retrievals, as well as the impact from the surface, is important for correctly interpreting the science, and therefore requires close examination. An understanding of the correlation or a-correlation between dust and water-ice would aid in understanding the physical processes responsible for the transport of aerosols in the Martian atmosphere. In this presentation, we present an investigation of the correlation between water-ice and dust in the MGS TES data set.
Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ~23 Mb genomes encoding ~5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes have a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome alignments of seven Plasmodium species, we show that protein-coding, intergenic and intronic regions are all subject to purifying selection and we identify 670 conserved non-genic elements. We then use genome-wide polymorphism data from P. falciparum to describe short-term selective processes in this species and identify some candidate genes for balancing (diversifying) selection. Our analyses suggest that there are many functional elements in the non-genic regions of these genomes and that adaptive evolution has occurred more frequently in the protein-coding regions of the genome. © 2010 Nygaard et al.
Albrechtsen, Anders; Moltke, Ida; Nielsen, Rasmus
There has recently been considerable interest in detecting natural selection in the human genome. Selection will usually tend to increase identity-by-descent (IBD) among individuals in a population, and many methods for detecting recent and ongoing positive selection indirectly take advantage......, we use a recently developed method for identifying IBD sharing among individuals from genome-wide data to scan populations from the new HapMap phase 3 project for regions with excess IBD sharing in order to identify regions in the human genome that have been under strong, very recent selection....... The HLA region is by far the region showing the most extreme signal, suggesting that much of the strong recent selection acting on the human genome has been immune related and acting on HLA loci. As equilibrium overdominance does not tend to increase IBD, we argue that this type of selection cannot...
Pavlidis, Pavlos; Jensen, Jeffrey D.; Stephan, Wolfgang
A major goal of population genomics is to reconstruct the history of natural populations and to infer the neutral and selective scenarios that can explain the present-day polymorphism patterns. However, the separation between neutral and selective hypotheses has proven hard, mainly because both may predict similar patterns in the genome. This study focuses on the development of methods that can be used to distinguish neutral from selective hypotheses in equilibrium and nonequilibrium populati...
Wang, Yu; Li, Wei; Xia, Yingying; Wang, Chongzhi; Tang, Y Tom; Guo, Wenying; Li, Jinliang; Zhao, Xia; Sun, Yepeng; Hu, Juan; Zhen, Hefu; Zhang, Xiandong; Chen, Chao; Shi, Yujian; Li, Lin; Cao, Hongzhi; Du, Hongli; Li, Jian
Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information.
Fariello, María Inés; Boitard, Simon; Mercier, Sabine; Robelin, David; Faraut, Thomas; Arnould, Cécile; Recoquillay, Julien; Bouchez, Olivier; Salin, Gérald; Dehais, Patrice; Gourichon, David; Leroux, Sophie; Pitel, Frédérique; Leterrier, Christine; SanCristobal, Magali
Detecting genomic footprints of selection is an important step in the understanding of evolution. Accounting for linkage disequilibrium in genome scans increases detection power, but haplotype-based methods require individual genotypes and are not applicable on pool-sequenced samples. We propose to take advantage of the local score approach to account for linkage disequilibrium in genome scans for selection, cumulating (possibly small) signals from single markers over a genomic segment, to clearly pinpoint a selection signal. Using computer simulations, we demonstrate that this approach detects selection with higher power than several state-of-the-art single marker, windowing or haplotype-based approaches. We illustrate this on two benchmark data sets including individual genotypes, for which we obtain similar results with the local score and one haplotype-based approach. Finally, we apply the local score approach to Pool-Seq data obtained from a divergent selection experiment on behavior in quail, and obtain precise and biologically coherent selection signals: while competing methods fail to highlight any clear selection signature, our method detects several regions involving genes known to act on social responsiveness or autistic traits. Although we focus here on the detection of positive selection from multiple population data, the local score approach is general and can be applied to other genome scans for selection or other genome-wide analyses such as GWAS.
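The idea of cumulating small single-marker signals over a segment can be sketched with a Lindley-type recursion. The per-marker score used here, -log10(p) minus a threshold xi, is an assumption for illustration because the abstract does not give the scoring function. The function names are hypothetical.

```python
import math


def lindley_process(p_values, xi=1.0):
    """Cumulative local score along the genome.

    Each marker contributes -log10(p) - xi (assumed scoring); the running
    sum is reset to zero whenever it would become negative.
    """
    h = 0.0
    path = []
    for p in p_values:
        h = max(0.0, h + (-math.log10(p) - xi))
        path.append(h)
    return path


def best_segment(p_values, xi=1.0):
    """Return (local_score, start_index, end_index) of the highest-scoring
    segment, or (0.0, None, None) if no marker scores above zero."""
    best, best_start, best_end = 0.0, None, None
    h, start = 0.0, 0
    for i, p in enumerate(p_values):
        if h == 0.0:
            start = i  # a new candidate segment begins here
        h = max(0.0, h + (-math.log10(p) - xi))
        if h > best:
            best, best_start, best_end = h, start, i
    return best, best_start, best_end


if __name__ == "__main__":
    print(best_segment([0.5, 0.01, 0.001, 0.5, 0.9], xi=1.0))
```

With this scoring, markers with p-values above 10^-xi pull the running sum down, so only runs of consistently small p-values build up a high local score.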
Mandel, Jennifer R.; Savithri Nambeesan; Bowers, John E; Laura F Marek; Daniel Ebert; Loren H. Rieseberg; Knapp, Steven J.; Burke, John M.
The combination of large-scale population genomic analyses and trait-based mapping approaches has the potential to provide novel insights into the evolutionary history and genome organization of crop plants. Here, we describe the detailed genotypic and phenotypic analysis of a sunflower (Helianthus annuus L.) association mapping population that captures nearly 90% of the allelic diversity present within the cultivated sunflower germplasm collection. We used these data to characterize overall ...
Henryon, Mark; Berg, Peer; Sørensen, Anders Christian
We reasoned that there are diminishing marginal returns from genomic selection as the proportion of genotyped selection candidates is increased and breeding values based on a priori information are used to choose the candidates that are genotyped. We tested this premise by stochastic simulation...... of breeding schemes that resembled those used for pigs. We estimated rates of genetic gain and inbreeding realized by genomic selection in breeding schemes where candidates were phenotyped before genotyping and 0-100% of the candidates were genotyped based on predicted breeding values. Genotypings were...... allocated to male and female candidates at ratios of 100:0, 75:25, 50:50, 25:75, and 0:100. For genotyped candidates, a direct-genomic value (DGV) was sampled with reliabilities 0.10, 0.50, and 0.90. Ten sires and 300 dams with the highest breeding values after genotyping were selected at each generation...
Guosheng, Su; Madsen, Per; Nielsen, Ulrik Sander
This study investigated the accuracy of direct genomic breeding values (DGV) using a genomic BLUP model, genomic enhanced breeding values (GEBV) using a one-step blending approach, and GEBV using a selection index blending approach for 15 traits of Nordic Red Cattle. The data comprised 6,631 bull......-step blending approach is a good alternative to predict GEBV in practical genetic evaluation program....
Genomics offers new opportunities for the effective selection of novel traits. For traits such as mastitis resistance, hoof health, or the prediction of milk composition from mid-infrared (MIR) data, for example, enough records are usually available to carry out genomic evaluations using sire genoty...
Jones, Matthew R; Forester, Brenna R; Teufel, Ashley I; Adams, Rachael V; Anstett, Daniel N; Goodrich, Betsy A; Landguth, Erin L; Joost, Stéphane; Manel, Stéphanie
Uncovering the genetic basis of adaptation hinges on the ability to detect loci under selection. However, population genomics outlier approaches to detect selected loci may be inappropriate for clinal populations or those with unclear population structure because they require that individuals be clustered into populations. An alternate approach, landscape genomics, uses individual-based approaches to detect loci under selection and reveal potential environmental drivers of selection. We tested four landscape genomics methods on a simulated clinal population to determine their effectiveness at identifying a locus under varying selection strengths along an environmental gradient. We found all methods produced very low type I error rates across all selection strengths, but elevated type II error rates under "weak" selection. We then applied these methods to an AFLP genome scan of an alpine plant, Campanula barbata, and identified five highly supported candidate loci associated with precipitation variables. These loci also showed spatial autocorrelation and cline patterns indicative of selection along a precipitation gradient. Our results suggest that landscape genomics in combination with other spatial analyses provides a powerful approach for identifying loci potentially under selection and explaining spatially complex interactions between species and their environment.
Li, Zitong; Sillanpää, Mikko J
Quantitative trait loci (QTL)/association mapping aims at finding genomic loci associated with phenotypes, whereas genomic selection focuses on breeding value prediction based on genomic data. Variable selection is key to both tasks because it allows one to (1) detect clear mapping signals of QTL activity, and (2) predict genome-enhanced breeding values accurately. In this paper, we provide an overview of a statistical method called the least absolute shrinkage and selection operator (LASSO) and two of its generalizations, the elastic net and the adaptive LASSO, in the contexts of QTL mapping and genomic breeding value prediction in plants (or animals). We also briefly summarize the Bayesian interpretation of LASSO and the hierarchical Bayesian models it inspired. We illustrate the implementation and examine the performance of the methods using three public data sets: (1) North American barley data with 127 individuals and 145 markers, (2) simulated QTLMAS XII data with 5,865 individuals and 6,000 markers for both QTL mapping and genomic selection, and (3) wheat data with 599 individuals and 1,279 markers for genomic selection only.
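As an illustration of how LASSO performs variable selection on marker data, the following is a minimal sketch of cyclic coordinate descent with soft-thresholding. It is not the implementation used in the paper above; the function names, the objective scaling (1/(2n)) and the assumption of centred marker columns without an intercept are choices made here for illustration only.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X: list of n rows, each with p marker codes (assumed centred, no intercept).
    y: list of n phenotypes. lam: penalty weight (hypothetical parameter name).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each marker column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic marker carries no information
            # Covariance of marker j with the residual that excludes marker j
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta
```

Markers whose covariance with the residual stays below the penalty receive an effect of exactly zero, which is the property that makes LASSO usable for both QTL detection and prediction.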
Legarra, Andres; Christensen, Ole Fredslund; Aguilar, Ignacio;
Genomic evaluation methods assume that the reference population is genotyped and phenotyped. This is most often false and the generation of pseudo-phenotypes is uncertain and inaccurate. However, markers obey transmission rules and therefore the covariances of marker genotypes across individuals ...
Mar 9, 2013 ... On a technological level the development of polymerase ... Human Genome project most farm animal species have been sequenced .... included in the model to account for genetic variation that is ..... available number of breeding animals with records for maternal, reproduction and growth efficiency traits is.
Shepherd, Ross K; Meuwissen, Theo H E; Woolliams, John A
The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time.
Gompert, Zachariah; Comeault, Aaron A; Farkas, Timothy E; Feder, Jeffrey L; Parchman, Thomas L; Buerkle, C Alex; Nosil, Patrik
Understanding natural selection's effect on genetic variation is a major goal in biology, but the genome-scale consequences of contemporary selection are not well known. In a release and recapture field experiment we transplanted stick insects to native and novel host plants and directly measured allele frequency changes within a generation at 186,576 genetic loci. We observed substantial, genome-wide allele frequency changes during the experiment, most of which could be attributed to random mortality (genetic drift). However, we also documented that selection affected multiple genetic loci distributed across the genome, particularly in transplants to the novel host. Host-associated selection affecting the genome acted on both a known colour-pattern trait as well as other (unmeasured) phenotypes. We also found evidence that selection associated with elevation affected genome variation, although our experiment was not designed to test this. Our results illustrate how genomic data can identify previously underappreciated ecological sources and phenotypic targets of selection. © 2013 The Authors. Ecology Letters published by John Wiley & Sons Ltd and CNRS.
Hannibal, Roberta L; Baker, Julie C
While most cells maintain a diploid state, polyploid cells exist in many organisms and are particularly prevalent within the mammalian placenta, where they can generate more than 900 copies of the genome. Polyploidy is thought to be an efficient method of increasing the content of the genome by avoiding the costly and slow process of cytokinesis [1, 3, 4]. Polyploidy can also affect gene regulation by amplifying a subset of genomic regions required for specific cellular function [1, 3, 4]. This mechanism is found in the fruit fly Drosophila melanogaster, where polyploid ovarian follicle cells amplify genomic regions containing chorion genes, which facilitate secretion of eggshell proteins. Here, we report that genomic amplification also occurs in mammals at selective regions of the genome in parietal trophoblast giant cells (p-TGCs) of the mouse placenta. Using whole-genome sequencing (WGS) and digital droplet PCR (ddPCR) of mouse p-TGCs, we identified five amplified regions, each containing a gene family known to be involved in mammalian placentation: the prolactins (two clusters), serpins, cathepsins, and the natural killer (NK)/C-type lectin (CLEC) complex [6-12]. We report here the first description of amplification at selective genomic regions in mammals and present evidence that this is an important mode of genome regulation in placental TGCs.
Simon-Loriere, Etienne; Rossolillo, Paola; Negroni, Matteo
Recombination is an evolutionary mechanism intrinsic to the evolution of many RNA viruses. In retroviruses and notably in the case of HIV, recombination is so frequent that it can be considered as part of its mode of replication. This process not only plays a central role in shaping HIV genetic diversity worldwide, but has also been involved in immune escape and development of resistance to antiviral treatments. Recombination does not create new mutations in the existing genetic repertoire of the virus, but creates new combinations of pre-existing polymorphisms. The simultaneous insertion of multiple substitutions in a single replication cycle leaves little room for the progressive coevolution of regions of proteins, RNA or, more in general, genomes, to accommodate these drastic sequence changes. Therefore, recombination, while allowing the virus to rapidly explore larger sequence space than the slow accumulation of point mutations, also runs the risk of generating non functional viruses. Recombination is the consequence of a switch in the template used during reverse transcription and is promoted by the presence of structured regions in the genomic RNA template. In this review, we discuss new observations suggesting that the distribution of RNA structures along the HIV genome may enhance recombination rates in regions where the resultant progeny is less likely to be impaired, and could therefore maximize the evolutionary value of this source of genetic diversity.
Lohmueller, Kirk E; Albrechtsen, Anders; Li, Yingrui;
throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has......A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries...... and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations...
DNA methylation at CpG islands (CGIs) is one of the most intensively studied epigenetic mechanisms. It is fundamental for cellular differentiation and control of transcriptional potential. DNA methylation is also involved in several processes that are central to evolutionary biology, including phenotypic plasticity and evolvability. In this study, we explored the relationship between CpG island methylation and signatures of selective pressure in Homo sapiens, using a computational biology approach. By analyzing methylation data of 25 cell lines from the Encyclopedia of DNA Elements (ENCODE) Consortium, we compared the DNA methylation of CpG islands in genomic regions under selective pressure with the methylation of CpG islands in the remaining part of the genome. To define genomic regions under selective pressure, we used three different methods, each oriented to provide distinct information about selective events. Independently of the method and of the cell type used, we found evidence of undermethylation of CGIs in human genomic regions under selective pressure. Additionally, by analyzing SNP frequency in CpG islands, we demonstrated that CpG islands in regions under selective pressure show lower genetic variation. Our findings suggest that CpG islands in regions under selective pressure are somehow more "protected" from methylation than those in other regions of the genome.
Identifying targets of positive selection in farm animals has, until recently, been frustratingly slow, relying on the analysis of individual candidate genes. Genomics, however, has provided the necessary resources to systematically interrogate the entire genome for signatures of selection. This review describes important recent results derived from the application of genome-wide scans to the study of genetic changes in farm animals. These include findings of regions of the genome that show breed differentiation, evidence of selective sweeps within individual genomes and signatures of demographic events. Particular attention is paid to the implications for domestication. To date, sixteen genome-wide scans for recent or ongoing positive selection have been performed in farm animals. A key challenge is to begin synthesizing these newly constructed maps of selection into a coherent narrative of animal breed evolutionary history and to derive a deeper mechanistic understanding of how animal populations improve or evolve. The major insights from the surveyed studies are highlighted and directions for future study are suggested.
Northcutt Sally L
Background: Molecular estimates of breeding value are expected to increase selection response due to improvements in the accuracy of selection and a reduction in generation interval, particularly for traits that are difficult or expensive to record or are measured late in life. Several statistical methods for incorporating molecular data into breeding value estimation have been proposed; however, most studies have utilized simulated data in which the generated linkage disequilibrium may not represent the targeted livestock population. A genomic relationship matrix was developed for 698 Angus steers and 1,707 Angus sires using 41,028 single nucleotide polymorphisms, and breeding values were estimated using feed efficiency phenotypes (average daily feed intake, residual feed intake, and average daily gain) recorded on the steers. The number of SNPs needed to accurately estimate a genomic relationship matrix was evaluated in this population. Results: Results were compared to estimates produced from pedigree-based mixed model analysis of 862 Angus steers with 34,864 identified paternal relatives but no female ancestors. Estimates of additive genetic variance and breeding value accuracies were similar for AFI and RFI using the numerator and genomic relationship matrices despite fewer animals in the genomic analysis. Bootstrap analyses indicated that 2,500-10,000 markers are required for robust estimation of genomic relationship matrices in cattle. Conclusions: This research shows that breeding values and their accuracies may be estimated for commercially important sires for traits recorded in experimental populations without the need for pedigree data to establish identity by descent between members of the commercial and experimental populations when at least 2,500 SNPs are available for the generation of a genomic relationship matrix.
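The abstract above does not state how its genomic relationship matrix was constructed. As a hedged sketch of one widely used construction (allele counts centred by twice the allele frequency and scaled by 2 Σ p(1 - p)), the following pure-Python function shows the idea; the function name and the choice to estimate allele frequencies from the genotyped animals themselves are assumptions made here, not details from the study.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum_j p_j * (1 - p_j)) from 0/1/2 allele counts.

    genotypes: list of n animals, each a list of m SNP allele counts (0, 1, 2).
    Allele frequencies p_j are estimated from the supplied genotypes
    (an assumption of this sketch).
    """
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Centre each allele count by its expected value 2 * p_j
    Z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(Z[a][j] * Z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]
```

For example, with three animals genotyped at two SNPs, `genomic_relationship_matrix([[0, 2], [2, 0], [1, 1]])` gives a relationship of 2.0 for each opposite homozygote with itself, -2.0 between the two opposite homozygotes, and 0.0 for every entry involving the double heterozygote.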
Among primates, genome-wide analysis of recent positive selection is currently limited to the human species because it requires extensive sampling of genotypic data from many individuals. The extent to which genes positively selected in humans also present adaptive changes in other primates therefore remains unknown. This question is important because a gene that has been positively selected independently in the human and in other primate lineages may be less likely to be involved in human-specific phenotypic changes such as dietary habits or cognitive abilities. To answer this question, we analysed heterozygous single nucleotide polymorphisms (SNPs) in the genomes of single human, chimpanzee, orangutan, and macaque individuals using a new method aiming to identify selective sweeps genome-wide. We found an unexpectedly high number of orthologous genes exhibiting signatures of a selective sweep simultaneously in several primate species, suggesting the presence of hotspots of positive selection. A similar significant excess is evident when comparing genes positively selected during recent human evolution with genes subjected to positive selection in their coding sequence in other primate lineages and identified using a different test. These findings are further supported by comparing several published human genome scans for positive selection with our findings in non-human primate genomes. We thus provide extensive evidence that the co-occurrence of positive selection in humans and in other primates at the same genetic loci can be measured with only four species, an indication that it may be a widespread phenomenon. The identification of positive selection in humans alongside other primates is a powerful tool to outline those genes that were selected uniquely during recent human evolution.
We investigated geographic adaptation and human selection using high-density SNP data of five diverse cattle breeds. Based on allele frequency differences, we detected hundreds of candidate regions under positive selection across Holstein, Angus, Charolais, Brahman, and N'Dama. In addition to well-k...
Nielsen, Rasmus; Williamson, Scott; Kim, Yuseob
of the selection coefficient. To illustrate the method, we apply our approach to data from the Seattle SNP project and to Chromosome 2 data from the HapMap project. In Chromosome 2, the most extreme signal is found in the lactase gene, which previously has been shown to be undergoing positive selection. Evidence...
Jalvingh, Kirsten Marjorie
While evolution is generally thought of as a slow process, acting at long time scales, selection can also cause rapid evolutionary changes in a population in just a few generations. This is especially true for the immune system, which is under constant selection by parasites and pathogens. In this t
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M
Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of the annual genetic gains for meat and maternal traits. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, males' breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with the candidates' own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. As a conclusion, our study shows that some forms of GS strategies are more advantageous
The smallest genomes of any photosynthetic organisms are found in a group of free-living marine cyanobacteria, Prochlorococcus. To determine the underlying evolutionary mechanisms, we developed a new method to reconstruct the steps leading to the Prochlorococcus genome reduction using 12 Prochlorococcus and 6 marine Synechococcus genomes. Our results reveal that small genome sizes within Prochlorococcus were largely determined shortly after the split of Prochlorococcus and Synechococcus (an early big shrink) and thus, for the first time, decouple the genome reduction from Prochlorococcus diversification. A maximum likelihood approach was then used to estimate changes of nucleotide substitution rate and selection strength along Prochlorococcus evolution in a phylogenetic framework. Strong genome-wide purifying selection was associated with the loss of many genes in the early evolutionary stage. The deleted genes were distributed around the genome, participated in many different functional categories and in general had been under relaxed selection pressure. We propose that shortly after Prochlorococcus diverged from its common ancestor with marine Synechococcus, its population size increased quickly, thus increasing the efficacy of selection. Due to limited nutrients and a relatively constant environment, selection favored a streamlined genome for maximum economy. Strong genome-wide selection subsequently caused the loss of genes with small functional effect, including the loss of some DNA repair genes. In summary, genome reduction in Prochlorococcus resulted in genome features that are similar to those of symbiotic bacteria and pathogens; however, the small genome sizes resulted from an increase in genome-wide selection rather than from a reduced ecological niche or relaxed selection due to genetic drift.
Jarquín, D.; Crossa, J.; Lacaze, X.; Cheyron, du P.; Daucourt, J.; Lorgeou, J.; Piraux, F.; Guerreiro, L.; Pérez, P.; Calus, M.P.L.; Burgueno, J.; Campos, de los G.
In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of
Background: Actinobacillus pleuropneumoniae is an economically important animal pathogen that causes contagious pleuropneumonia in pigs. The molecular evolutionary trajectories of this pathogenic bacterium still require better elucidation with the help of comparative genomic data. For this reason, we employed a comparative phylogenomic approach to obtain a comprehensive understanding of the roles of natural selective pressure and homologous recombination during adaptation of this pathogen to its swine host. Results: In this study, 12 A. pleuropneumoniae genomes were used to carry out a phylogenomic analysis. We identified 1,587 orthologous core genes as an initial data set for the estimation of genetic recombination and positive selection. Based on the analyses of four recombination tests, 23% of the core genome of A. pleuropneumoniae showed strong signals for intragenic homologous recombination. Furthermore, the selection analyses indicated that 57 genes were undergoing significant positive selection. The functional properties of these positively selected genes indicated that genes coding for products relevant to bacterial surface structures and pathogenesis are prone to natural selective pressure, presumably due to their potential roles in the avoidance of the porcine immune system. Conclusions: Overall, substantial genetic evidence indicates that recombination and positive selection play a crucial role in the adaptive evolution of A. pleuropneumoniae. The genome-wide profile of positively selected genes and/or amino acid residues will provide valuable targets for further research into the mechanisms of immune evasion and host-pathogen interactions for this serious swine pathogen.
Toro Miguel A
Estimation of non-additive genetic effects in animal breeding is important because it increases the accuracy of breeding value prediction and the value of mate allocation procedures. With the advent of genomic selection these ideas should be revisited. The objective of this study was to quantify the efficiency of including dominance effects and practising mating allocation under a whole-genome evaluation scenario. Four strategies of selection, carried out during five generations, were compared by simulation techniques. In the first scenario (MS), individuals were selected based on their own phenotypic information. In the second (GSA), they were selected based on the prediction generated by the Bayes A method of whole-genome evaluation under an additive model. In the third (GSD), the model was expanded to include dominance effects. These three scenarios used random mating to construct future generations, whereas in the fourth one (GSD + MA), matings were optimized by simulated annealing. The advantage of GSD over GSA ranges from 9 to 14% of the expected response and, in addition, using mate allocation (GSD + MA) provides an additional response ranging from 6% to 22%. However, mate selection can improve the expected genetic response over random mating only in the first generation of selection. Furthermore, the efficiency of genomic selection is eroded after a few generations of selection; thus, a continued collection of phenotypic data and re-evaluation will be required.
Liu, Huiming; Sørensen, Anders Christian; Meuwissen, Theo H E
Background Genomic selection makes it possible to reduce pedigree-based inbreeding over best linear unbiased prediction (BLUP) by increasing emphasis on own rather than family information. However, pedigree inbreeding might not accurately reflect the loss of genetic variation and the true level...... of inbreeding due to changes in allele frequencies and hitch-hiking. This study aimed at understanding the impact of using long-term genomic selection on changes in allele frequencies, genetic variation and the level of inbreeding. Methods Selection was performed in simulated scenarios with a population of 400......-BLUP, Genomic BLUP and Bayesian Lasso. Changes in allele frequencies at QTL, markers and linked neutral loci were investigated for the different selection criteria and different scenarios, along with the loss of favourable alleles and the rate of inbreeding measured by pedigree and runs of homozygosity. Results...
Gorjanc, Gregor; Bijma, Piter; Hickey, John M
Reliability is an important parameter in breeding. It measures the precision of estimated breeding values (EBV) and, thus, potential response to selection on those EBV. The precision of EBV is commonly measured by relating the prediction error variance (PEV) of EBV to the base population additive genetic variance (base PEV reliability), while the potential for response to selection is commonly measured by the squared correlation between the EBV and breeding values (BV) on selection candidates (reliability of selection). While these two measures are equivalent for unselected populations, they are not equivalent for selected populations. The aim of this study was to quantify the effect of selection on these two measures of reliability and to show how this affects comparison of breeding programs using pedigree-based or genomic evaluations. Two scenarios with random and best linear unbiased prediction (BLUP) selection were simulated, where the EBV of selection candidates were estimated using only pedigree, pedigree and phenotype, genome-wide marker genotypes and phenotype, or only genome-wide marker genotypes. The base PEV reliabilities of these EBV were compared to the corresponding reliabilities of selection. Realized genetic selection intensity was evaluated to quantify the potential of selection on the different types of EBV and, thus, to validate differences in reliabilities. Finally, the contribution of different underlying processes to changes in additive genetic variance and reliabilities was quantified. The simulations showed that, for selected populations, the base PEV reliability substantially overestimates the reliability of selection of EBV that are mainly based on old information from the parental generation, as is the case with pedigree-based prediction. Selection on such EBV gave very low realized genetic selection intensities, confirming the overestimation and importance of genotyping both male and female selection candidates. The two measures of
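The two reliability measures contrasted in the abstract above can be written out explicitly. This is a sketch from the verbal definitions given there; the symbols (\hat{a} for EBV, a for true breeding value, \sigma^2_{a_0} for the base population additive genetic variance) are notation chosen here and may differ from the paper's.

```latex
% Base PEV reliability: precision of the EBV relative to base-population variance
r^2_{\mathrm{PEV}} \;=\; 1 - \frac{\mathrm{PEV}(\hat{a})}{\sigma^2_{a_0}}

% Reliability of selection: squared correlation between EBV and true BV
% among the selection candidates
r^2_{\mathrm{sel}} \;=\; \left[\operatorname{cor}(\hat{a},\, a)\right]^2
```

According to the abstract, the two quantities coincide in unselected populations but diverge under selection, where the first can substantially overestimate the second.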
Zhao, Fuping; McParland, Sinead; Kearney, Francis; Du, Lixin; Berry, Donagh P
Artificial selection for economically important traits in cattle is expected to have left distinctive selection signatures on the genome. Access to high-density genotypes facilitates the accurate identification of genomic regions that have undergone positive selection. These findings help to better elucidate the mechanisms of selection and to identify candidate genes of interest to breeding programs. Information on 705 243 autosomal single nucleotide polymorphisms (SNPs) in 3122 dairy and beef male animals from seven cattle breeds (Angus, Belgian Blue, Charolais, Hereford, Holstein-Friesian, Limousin and Simmental) was used to detect selection signatures by applying two complementary methods, integrated haplotype score (iHS) and global fixation index (FST). To control for false positive results, we used false discovery rate (FDR) adjustment to calculate adjusted iHS within each breed and the genome-wide significance level was about 0.003. Using the iHS method, 83, 92, 91, 101, 85, 101 and 86 significant genomic regions were detected for Angus, Belgian Blue, Charolais, Hereford, Holstein-Friesian, Limousin and Simmental cattle, respectively. None of these regions was common to all seven breeds. Using the FST approach, 704 individual SNPs were detected across breeds. Annotation of the regions of the genome that showed selection signatures revealed several interesting candidate genes, namely DGAT1, ABCG2, MSTN, CAPN3, FABP3, CHCHD7, PLAG1, JAZF1, PRKG2, ACTC1, TBC1D1, GHR, BMP2, TSG1, LYN, KIT and MC1R, that play a role in milk production, reproduction, body size, muscle formation or coat color. Fifty-seven common candidate genes were found by both the iHS and global FST methods across the seven breeds. Moreover, many novel genomic regions and genes were detected within the regions that showed selection signatures; for some candidate genes, signatures of positive selection exist in the human genome. Multilevel bioinformatic analyses of the detected candidate genes
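The abstract above mentions an FDR adjustment without naming the procedure. As a sketch only, assuming the common Benjamini-Hochberg step-up method and assuming that p-values have already been derived from the per-SNP iHS statistics, the adjustment could look like this (the function name is hypothetical):

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption of this sketch: the study's FDR adjustment is BH; the source
    does not say which procedure was used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A SNP would then be declared significant when its adjusted value falls below the chosen FDR level.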
Nielsen, Rasmus; Bustamente, Carlos; Clark, Andrew G.
of these genes may be driven by genomic conflict due to apoptosis during spermatogenesis. Genes with maximal expression in the brain show little or no evidence for positive selection, while genes with maximal expression in the testis tend to be enriched with positively selected genes. Genes on the X chromosome...... such evolutionary changes to leave a noticeable signature throughout the genome. We here compare 13,731 annotated genes from humans to their chimpanzee orthologs to identify genes that show evidence of positive selection. Many of the genes that present a signature of positive selection tend to be involved...... in sensory perception or immune defenses. However, the group of genes that show the strongest evidence for positive selection also includes a surprising number of genes involved in tumor suppression and apoptosis, and of genes involved in spermatogenesis. We hypothesize that positive selection in some...
McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Ape Genome Project, Great; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F
Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.
Background: Current commercial high-density oligonucleotide microarrays can hold millions of probe spots on a single microscopic glass slide and are ideal for studying the transcriptome of microbial genomes using a tiling probe design. This paper describes a comprehensive computational pipeline implemented specifically for designing tiling probe sets to study microbial transcriptome profiles. Results: The pipeline identifies every possible probe sequence from both forward and reverse-complement strands of all DNA sequences in the target genome, including circular or linear chromosomes and plasmids. Final probe sequence lengths are adjusted based on the maximal oligonucleotide synthesis cycles and best isothermality allowed. Optimal probes are then selected in two stages: sequential and gap-filling. In the sequential stage, probes are selected from sequence windows tiled along the genome. In the gap-filling stage, additional probes are selected from the largest gaps between adjacent probes that have already been selected, until a predefined number of probes is reached. Selection of the highest quality probe within each window and gap is based on five criteria: sequence uniqueness, probe self-annealing, melting temperature, oligonucleotide length, and probe position. Conclusions: The probe selection pipeline evaluates global and local probe sequence properties and selects a set of probes dynamically and evenly distributed along the target genome. Unlike other similar methods, an exact number of non-redundant probes can be designed to utilize all the available probe spots on any chosen microarray platform. The pipeline can be applied to microbial genomes when designing high-density tiling arrays for comparative genomics, ChIP-chip, gene expression and comprehensive transcriptome studies.
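The gap-filling stage described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `fill_gaps`, the use of integer start positions on a linear sequence, and the rule of picking the candidate closest to the gap midpoint (standing in for the five quality criteria) are all assumptions made for illustration.

```python
def fill_gaps(selected, candidates, target):
    """Greedy gap-filling sketch (hypothetical helper, linear genome only).

    selected   -- start positions of probes already chosen
    candidates -- start positions of all remaining candidate probes
    target     -- total number of probes wanted
    """
    chosen = sorted(selected)
    pool = set(candidates) - set(chosen)
    while len(chosen) < target:
        # Gaps between adjacent chosen probes, largest gap first.
        gaps = sorted(
            ((b - a, a, b) for a, b in zip(chosen, chosen[1:])),
            reverse=True,
        )
        for _, a, b in gaps:
            inside = [p for p in pool if a < p < b]
            if inside:
                mid = (a + b) / 2
                # Assumption: "best" probe = the one nearest the gap midpoint.
                best = min(inside, key=lambda p: (abs(p - mid), p))
                chosen.append(best)
                chosen.sort()
                pool.discard(best)
                break
        else:
            break  # no gap contains a candidate; stop early
    return chosen
```

Calling `fill_gaps([0, 100], [10, 25, 50, 75, 90], 4)` adds one probe at a time to the currently largest gap until four probes are selected, or stops early if no candidates remain inside any gap.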
Rubin, Carl-Johan; Megens, Hendrik-Jan; Barrio, Alvaro Martinez;
or white-spotted pigs, carrying the Dominant white, Patch, or Belt alleles. This discovery illustrates how structural changes have contributed to rapid phenotypic evolution in domestic animals and how alleles in domestic animals may evolve by the accumulation of multiple causative mutations as a response....... We found an excess of derived nonsynonymous substitutions in domestic pigs, most likely reflecting both positive selection and relaxed purifying selection after domestication. Our analysis of structural variation revealed four duplications at the KIT locus that were exclusively present in white...
Gertz, M; Edel, C; Ruß, I; Dodenhoff, J; Götz, K-U; Thaller, G
The aim of our study was to compare different validation methods with respect to their impact on validation results and to evaluate the feasibility of genomic selection in the German Landrace population of the Bavarian herdbook. For this purpose, a sample of 337 boars and 1,676 sows was genotyped with the Illumina PorcineSNP60 BeadChip. Conventional BLUP breeding values for fertility, growth, carcass, and quality traits were deregressed and used as phenotypes in genomic BLUP. The resulting genomic breeding values were also blended with information from the full conventional breeding value estimation to include information from nongenotyped parents. Subsequent validation used forward prediction, realized reliabilities, and theoretical reliabilities. The results indicate that the validation methods had a relatively large effect on the displayed reliability levels in our study: forward prediction reliabilities were found to be much lower than the conventional parent-average reliabilities, whereas the corresponding realized and theoretical reliabilities were found to be substantially greater. Theoretical reliabilities appear to be the most consistent validation approach tested in our study, because they avoid the use of proxy variables. Generally, our results suggest a substantial potential for a genomic selection implementation for the Bavarian herdbook by using both sows and boars. Theoretical genomic reliabilities of direct genomic values of selection candidates were, on average, 31 to 36% greater than the conventional parent-average reliabilities. However, the inclusion of residual information from conventional breeding values had only a marginal effect on reliabilities.
Comeron, Josep M
The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future
Piotrowski Jeff S
Background: Interspecific hybridization occurs in every eukaryotic kingdom. While hybrid progeny are frequently at a selective disadvantage, in some instances their increased genome size and complexity may result in greater stress resistance than their ancestors, which can be adaptively advantageous at the edges of their ancestors' ranges. While this phenomenon has been repeatedly documented in the field, the response of hybrid populations to long-term selection has not often been explored in the lab. To fill this knowledge gap we crossed the two most distantly related members of the Saccharomyces sensu stricto group, S. cerevisiae and S. uvarum, and established a mixed population of homoploid and aneuploid hybrids to study how different types of selection impact hybrid genome structure. Results: As temperature was raised incrementally from 31°C to 46.5°C over 500 generations of continuous culture, selection favored loss of the S. uvarum genome, although the kinetics of genome loss differed among independent replicates. Temperature-selected isolates exhibited greater inherent and induced thermal tolerance than parental species and founding hybrids, and also exhibited ethanol resistance. In contrast, as exogenous ethanol was increased from 0% to 14% over 500 generations of continuous culture, selection favored euploid S. cerevisiae x S. uvarum hybrids. Ethanol-selected isolates were more ethanol tolerant than S. uvarum and one of the founding hybrids, but did not exhibit resistance to temperature stress. Relative to parental and founding hybrids, temperature-selected strains showed heritable differences in cell wall structure in the forms of increased resistance to zymolyase digestion and Micafungin, which targets cell wall biosynthesis. Conclusions: This is the first study to show experimentally that the genomic fate of newly-formed interspecific hybrids depends on the type of selection they encounter during the course of evolution
Background: It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information. Methods: The study was performed on milk, fat, and protein yield, using genotype data on 35 706 SNPs and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data. Results: We showed that genome-wide breeding value prediction based only on identity-by-descent genomic relationships within the known pedigree was as reliable as or more reliable than that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding value prediction comes from animals with known common ancestors less than four generations back in the pedigree. Conclusions: Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information. Although, in principle, genomic selection based on identity-by-state does not require
Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren
GRAbB (Genomic Region Assembly by Baiting) is a new program dedicated to assembling specific genomic regions from NGS data. This approach is especially useful when dealing with multi-copy regions, such as the mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled although they contain interesting information from phylogenetic or epidemiologic perspectives; single-copy regions can also be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data based on homology, such as sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species, must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated, such as exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), and extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).
Evans, Luke M [West Virginia University, Morgantown; Slavov, Gancho [West Virginia University, Morgantown; Rodgers-Melnick, Eli [West Virginia University, Morgantown; Martin, Joel [U.S. Department of Energy, Joint Genome Institute; Ranjan, Priya [ORNL; Muchero, Wellington [ORNL; Brunner, Amy M. [Virginia Polytechnic Institute and State University; Schackwitz, Wendy [U.S. Department of Energy, Joint Genome Institute; Gunter, Lee E [ORNL; Chen, Jay [ORNL; Tuskan, Gerald A [ORNL; Difazio, Stephen P. [West Virginia University, Morgantown
Forest trees are dominant components of terrestrial ecosystems that have global ecological and economic importance. Despite distributions that span wide environmental gradients, many tree populations are locally adapted, and mechanisms underlying this adaptation are poorly understood. Here we use a combination of whole-genome selection scans and association analyses of 544 Populus trichocarpa trees to reveal genomic bases of adaptive variation across a wide latitudinal range. Three hundred ninety-seven genomic regions showed evidence of recent positive and/or divergent selection and enrichment for associations with adaptive traits that also displayed patterns consistent with natural selection. These regions also provide unexpected insights into the evolutionary dynamics of duplicated genes and their roles in adaptive trait variation.
Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas;
across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease....... breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary...... to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed...
The detection of signatures of selection is now possible on a genome-wide scale in many plant and animal species, and can be performed in a population-specific manner due to the wealth of per-population genome-wide genotype data that is available. With genomic regions that exhibit evidence of selection having been shown to be enriched for genes associated with biologically important traits, detection of selective pressure is emerging as an additional approach for identifying novel gene-trait associations. While high-density genotype data is now relatively easy to obtain, for many researchers it is not immediately obvious how to go about identifying signatures of selection in these data sets. Here we describe a basic workflow, constructed from open source tools, for detecting and examining evidence of selection in genomic data. Code to install and implement the pipeline components, and instructions to run a basic analysis using the workflow described here, can be downloaded from our public GitHub repository: http://www.github.com/smilefreak/selectionTools/
on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to the consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation with support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
Infectious diseases and epidemics have always accompanied and characterized human history, representing one of the main causes of death. Even today, despite progress in sanitation and medical research, infections are estimated to account for about 15% of deaths. The hypothesis that infectious diseases have been acting as a powerful selective pressure was formulated long ago, but it was not until the availability of large-scale genetic data and the development of novel methods to study molecular evolution that we could assess how pervasively infectious agents have shaped human genetic diversity. Indeed, recent evidence indicates that among the diverse environmental factors that acted as selective pressures during the evolution of our species, pathogen load had the strongest influence. Besides the textbook example of the major histocompatibility complex, selection signatures left by pathogen-exerted pressure can be identified at several human loci, including genes not directly involved in immune response. In the future, high-throughput technologies and the availability of genetic data from different populations are likely to provide novel insights into the evolutionary relationships between the human host and its pathogens. Hopefully, this will help identify the genetic determinants modulating the susceptibility to infectious diseases and will translate into new treatment strategies.
Andersen, Kristian G; Shylakhter, Ilya; Tabrizi, Shervin; Grossman, Sharon R; Happi, Christian T; Sabeti, Pardis C
Rapidly evolving viruses and other pathogens can have an immense impact on human evolution as natural selection acts to increase the prevalence of genetic variants providing resistance to disease. With the emergence of large datasets of human genetic variation, we can search for signatures of natural selection in the human genome driven by such disease-causing microorganisms. Based on this approach, we have previously hypothesized that Lassa virus (LASV) may have been a driver of natural selection in West African populations where Lassa haemorrhagic fever is endemic. In this study, we provide further evidence for this notion. By applying tests for selection to genome-wide data from the International Haplotype Map Consortium and the 1000 Genomes Consortium, we demonstrate evidence for positive selection in LARGE and interleukin 21 (IL21), two genes implicated in LASV infectivity and immunity. We further localized the signals of selection, using the recently developed composite of multiple signals method, to introns and putative regulatory regions of those genes. Our results suggest that natural selection may have targeted variants giving rise to alternative splicing or differential gene expression of LARGE and IL21. Overall, our study supports the hypothesis that selective pressures imposed by LASV may have led to the emergence of particular alleles conferring resistance to Lassa fever, and opens up new avenues of research pursuit.
Ryu, J; Lee, C
Selection signals of Korean cattle might be attributed largely to artificial selection for meat quality. Rapidly increased intragenic markers of newly annotated genes in the bovine genome would help overcome limited findings of genetic markers associated with meat quality at the selection signals in a previous study. The present study examined genetic associations of marbling score (MS) with intragenic nucleotide variants at selection signals of Korean cattle. A total of 39 092 nucleotide variants of 407 Korean cattle were utilized in the association analysis. A total of 129 variants were selected within newly annotated genes in the bovine genome. Their genetic associations were analyzed using the mixed model with random polygenic effects based on identical-by-state genetic relationships among animals in order to control for spurious associations produced by population structure. Genetic associations of MS were found (PCSPG4). In particular, the genetic associations with CDC42BPA and LARGE were confirmed using an independent data set of Korean cattle. The results implied that allele frequencies of functional variants and their proximity variants have been augmented by directional selection for greater MS and remain selection signals in the bovine genome. Further studies of fine mapping would be useful to incorporate favorable alleles in marker-assisted selection for MS of Korean cattle.
Krochko Joan E
Background: Currently, there is little data available regarding the role of gender-specific gene expression on synonymous codon usage (translational selection) in most organisms, and particularly plants. Using gender-specific EST libraries (with > 4000 ESTs) from Zea mays and Triticum aestivum, we assessed whether gender-specific gene expression per se and gender-specific gene expression level are associated with selection on codon usage. Results: We found clear evidence of a greater bias in codon usage for genes expressed in female than in male organs and gametes, based on the variation in GC content at third codon positions and the frequency of species-preferred codons. This finding holds true for both highly and lowly expressed genes. In addition, we found that highly expressed genes have greater codon bias than lowly expressed genes for both female- and male-specific genes. Moreover, in both species, genes with female-specific expression show a greater usage of species-specific preferred codons for each of the 18 amino acids having synonymous codons. A supplemental analysis of Brassica napus suggests that bias in codon usage could also be higher in genes expressed in male gametophytic tissues than in heterogeneous (flower) tissues. Conclusion: This study reports gender-specific bias in codon usage in plants. The findings reported here, based on the analysis of 1 497 876 codons, are not caused either by differences in the biological functions of the genes or by differences in protein lengths, nor are they likely attributable to mutational bias. The data are best explained by gender-specific translational selection. Plausible explanations for these findings and the relevance to these and other organisms are discussed.
The genome sequence of apple (Malus × domestica Borkh.) was published more than a year ago, which helped develop an 8K SNP chip to assist in implementing genomic selection (GS). In apple breeding programmes, GS can be used to obtain genomic breeding values (GEBV) for choosing next-generation parents or selections for further testing as potential commercial cultivars at a very early stage. Thus GS has the potential to accelerate breeding efficiency significantly because of decreased generation interval or increased selection intensity. We evaluated the accuracy of GS in a population of 1120 seedlings generated from a factorial mating design of four female and two male parents. All seedlings were genotyped using an Illumina Infinium chip comprising 8,000 single nucleotide polymorphisms (SNPs), and were phenotyped for various fruit quality traits. Random-regression best linear unbiased prediction (RR-BLUP) and the Bayesian LASSO method were used to obtain GEBV, and compared using a cross-validation approach for their accuracy to predict unobserved BLUP-BV. Accuracies were very similar for both methods, varying from 0.70 to 0.90 for various fruit quality traits. The selection response per unit time using GS compared with traditional BLUP-based selection was very high (>100%), especially for low-heritability traits. Genome-wide average estimated linkage disequilibrium (LD) between adjacent SNPs was 0.32, with a relatively slow decay of LD in the long range (r² = 0.33 and 0.19 at 100 kb and 1,000 kb, respectively), contributing to the higher accuracy of GS. Distribution of estimated SNP effects revealed involvement of large-effect genes with likely pleiotropic effects. These results demonstrated that genomic selection is a credible alternative to conventional selection for fruit quality traits.
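As a rough illustration of the LD statistic quoted in this abstract, the sketch below computes r² between two SNPs from unphased genotypes. The abstract does not say which estimator was used; the 0/1/2 dosage coding, the squared Pearson correlation as the estimator, and the function name `ld_r2` are assumptions for illustration only.

```python
def ld_r2(snp_a, snp_b):
    """Approximate LD (r^2) between two SNPs (hypothetical helper).

    Assumes genotypes coded as allele dosages 0/1/2 for the same
    individuals in the same order, and uses the squared Pearson
    correlation of dosages as the r^2 estimate.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("genotype vectors must be non-empty and equal length")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # r^2 is undefined for a monomorphic SNP
    return cov * cov / (var_a * var_b)
```

Averaging `ld_r2` over all pairs of adjacent SNPs on a chromosome would give a genome-wide summary of the kind reported above, although real analyses typically rely on dedicated software and may use phased haplotypes instead.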
Tsai, Hsin-Yuan; Matika, Oswald; Edwards, Stefan McKinnon; Antolín–Sánchez, Roberto; Hamilton, Alastair; Guy, Derrick R.; Tinch, Alan E.; Gharbi, Karim; Stear, Michael J.; Taggart, John B.; Bron, James E.; Hickey, John M.; Houston, Ross D.
Genomic selection uses genome-wide marker information to predict breeding values for traits of economic interest, and is more accurate than pedigree-based methods. The development of high density SNP arrays for Atlantic salmon has enabled genomic selection in selective breeding programs, alongside high-resolution association mapping of the genetic basis of complex traits. However, in sibling testing schemes typical of salmon breeding programs, trait records are available on many thousands of fish with close relationships to the selection candidates. Therefore, routine high density SNP genotyping may be prohibitively expensive. One means to reducing genotyping cost is the use of genotype imputation, where selected key animals (e.g., breeding program parents) are genotyped at high density, and the majority of individuals (e.g., performance tested fish and selection candidates) are genotyped at much lower density, followed by imputation to high density. The main objectives of the current study were to assess the feasibility and accuracy of genotype imputation in the context of a salmon breeding program. The specific aims were: (i) to measure the accuracy of genotype imputation using medium (25 K) and high (78 K) density mapped SNP panels, by masking varying proportions of the genotypes and assessing the correlation between the imputed genotypes and the true genotypes; and (ii) to assess the efficacy of imputed genotype data in genomic prediction of key performance traits (sea lice resistance and body weight). Imputation accuracies of up to 0.90 were observed using the simple two-generation pedigree dataset, and moderately high accuracy (0.83) was possible even with very low density SNP data (∼250 SNPs). The performance of genomic prediction using imputed genotype data was comparable to using true genotype data, and both were superior to pedigree-based prediction. These results demonstrate that the genotype imputation approach used in this study can provide a cost
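The masking procedure described in aim (i) can be sketched as follows: hide a proportion of known genotypes, impute them, and correlate imputed with true values at the hidden positions only. This is a minimal illustration, not the study's pipeline. The per-SNP mean imputer is a deliberately naive stand-in for the pedigree-based imputation used in the paper, and all function names are hypothetical.

```python
import random
from math import sqrt


def mask_genotypes(genotypes, proportion, seed=0):
    """Return (masked copy, hidden positions); hidden cells are set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(proportion * len(cells))))
    masked = [list(row) for row in genotypes]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_with_snp_mean(masked):
    """Placeholder imputer: fill each gap with the SNP's mean observed dosage."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = mean
    return filled


def imputation_accuracy(true, imputed, hidden):
    """Pearson correlation between true and imputed dosages at hidden cells."""
    x = [true[i][j] for i, j in hidden]
    y = [imputed[i][j] for i, j in hidden]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation among masked genotypes")
    return sxy / sqrt(sxx * syy)
```

Repeating the mask, impute, and score cycle over several proportions and seeds gives an accuracy curve comparable in spirit to the one the abstract reports.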
Genetic regulation is a key component in development, but a clear understanding of the structure and dynamics of genetic networks is not yet at hand. In this work we investigate these properties within an artificial genome model originally introduced by Reil. We analyze statistical properties of randomly generated genomes both on the sequence- and network level, and show that this model correctly predicts the frequency of genes in genomes as found in experimental data. Using an evolutionary algorithm based on stabilizing selection for a phenotype, we show that robustness against single base mutations, as well as against random changes in initial network states that mimic stochastic fluctuations in environmental conditions, can emerge in parallel. Evolved genomes exhibit characteristic patterns on both sequence and network level.
Background: Modern dog breeds display traits that are either breed-specific or shared by a few breeds as a result of genetic bottlenecks during the breed creation process and artificial selection for breed standards. Selective sweeps in the genome result from strong selection and can be detected as a reduction or elimination of polymorphism in a given region of the genome. Results: Extended regions of homozygosity, indicative of selective sweeps, were identified in a genome-wide scan dataset of 25 Boxers from the United Kingdom genotyped at ~20,000 single-nucleotide polymorphisms (SNPs). These regions were further examined in a second dataset of Boxers collected from a different geographical location and genotyped using higher density SNP arrays (~170,000 SNPs). A selective sweep previously associated with canine brachycephaly was detected on chromosome 1. A novel selective sweep of over 8 Mb was observed on chromosome 26 in Boxer and for a shorter region in English and French bulldogs. It was absent in 171 samples from eight other dog breeds and 7 Iberian wolf samples. A region of extended increased heterozygosity on chromosome 9 overlapped with a previously reported copy number variant (CNV), which was polymorphic in multiple dog breeds. Conclusion: A selective sweep of more than 8 Mb on chromosome 26 was identified in the Boxer genome. This sweep is likely caused by strong artificial selection for a trait of interest and could have inadvertently led to undesired health implications for this breed. Furthermore, we provide supporting evidence for two previously described regions: a selective sweep on chromosome 1 associated with canine brachycephaly and a CNV on chromosome 9 polymorphic in multiple dog breeds.
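Detecting extended regions of homozygosity, as described in the Results above, amounts to finding long uninterrupted stretches of homozygous calls along a chromosome. The sketch below assumes one individual's genotypes coded 0/1/2 (1 = heterozygous) with matching base-pair positions. The thresholds and the function name are hypothetical, and production tools typically also tolerate a few heterozygous or missing calls, which this version does not.

```python
def homozygous_runs(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    genotypes: 0/1/2 allele counts for one individual, in map order.
    positions: base-pair coordinate of each SNP, same order and length.
    """
    runs = []
    start = None  # index where the current homozygous run began

    def close_run(first, last):
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[first], positions[last], n_snps))

    for idx, genotype in enumerate(genotypes):
        if genotype != 1:          # homozygous call extends or opens a run
            if start is None:
                start = idx
        elif start is not None:    # heterozygous call ends the current run
            close_run(start, idx - 1)
            start = None
    if start is not None:          # a run may extend to the last SNP
        close_run(start, len(genotypes) - 1)
    return runs
```

Regions where such runs overlap in most or all animals of a breed are the candidates for a selective sweep.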
Llorenc Cabrera-Bosquet; José Crossa; Jarislav von Zitzewitz; Maria Dolors Serret; José Luis Araus
Genomic selection (GS) and high-throughput phenotyping have recently been captivating the interest of the crop breeding community from both the public and private sectors world-wide. Both approaches promise to revolutionize the prediction of complex traits, including growth, yield and adaptation to stress. Whereas high-throughput phenotyping may help to improve understanding of crop physiology, most powerful techniques for high-throughput field phenotyping are empirical rather than analytical and comparable to genomic selection. Despite the fact that the two methodological approaches represent the extremes of what is understood as the breeding process (phenotype versus genome), they both consider the targeted traits (e.g. grain yield, growth, phenology, plant adaptation to stress) as a black box instead of dissecting them as a set of secondary traits (i.e. physiological) putatively related to the target trait. Both GS and high-throughput phenotyping have in common their empirical approach enabling breeders to use genome profile or phenotype without understanding the underlying biology. This short review discusses the main aspects of both approaches and focuses on the case of genomic selection of maize flowering traits and near-infrared spectroscopy (NIRS) and plant spectral reflectance as high-throughput field phenotyping methods for complex traits such as crop growth and yield.
Somavilla, A L; Sonstegard, T S; Higa, R H; Rosa, A N; Siqueira, F; Silva, L O C; Torres Júnior, R A A; Coutinho, L L; Mudadu, M A; Alencar, M M; Regitano, L C A
Brazilian Nellore cattle (Bos indicus) have been selected for growth traits for more than four decades. In recent years, reproductive and meat quality traits have become more important because of increasing consumption, exports and consumer demand. The identification of genome regions altered by artificial selection can potentially permit a better understanding of the biology of specific phenotypes that are useful for the development of tools designed to increase selection efficiency. Therefore, the aims of this study were to detect evidence of recent selection signatures in Nellore cattle using extended haplotype homozygosity methodology and BovineHD marker genotypes (>777,000 single nucleotide polymorphisms) as well as to identify corresponding genes underlying these signals. Thirty-one significant regions (P meat quality, fatty acid profiles and immunity. In addition, 545 genes were identified in regions harboring selection signatures. Within this group, 58 genes were associated with growth, muscle and adipose tissue metabolism, reproductive traits or the immune system. Using relative extended haplotype homozygosity to analyze high-density single nucleotide polymorphism marker data allowed for the identification of regions potentially under artificial selection pressure in the Nellore genome, which might be used to better understand autozygosity and the effects of selection on the Nellore genome.
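Extended haplotype homozygosity (EHH), the quantity underlying the method named above, is the probability that two randomly chosen haplotypes carrying the same core allele are identical over a given stretch of markers. The sketch below computes basic EHH in one direction from the core. It assumes phased haplotypes supplied as equal-length strings that all carry the core allele. The relative EHH used in the study additionally compares this value against other core haplotypes, which is not shown, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, extent):
    """EHH from the core marker out to `extent` markers to its right.

    haplotypes: phased haplotypes (strings or sequences), all carrying the
    same allele at core_index.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    stop = core_index + extent + 1
    if stop > len(haplotypes[0]):
        raise ValueError("extent runs past the end of the haplotypes")
    # Group haplotypes that are identical over the extended segment.
    segments = Counter(tuple(h[core_index:stop]) for h in haplotypes)
    identical_pairs = sum(comb(count, 2) for count in segments.values())
    return identical_pairs / comb(n, 2)
```

EHH starts at 1 at the core and decays with distance; an allele whose EHH stays high unusually far from the core is a candidate for recent selection.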
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Elsen, J M
In conventional small ruminant breeding programs, only pedigree and phenotype records are used to make selection decisions but prospects of including genomic information are now under consideration. The objective of this study was to assess the potential benefits of genomic selection on the genetic gain in French sheep and goat breeding designs of today. Traditional and genomic scenarios were modeled with deterministic methods for 3 breeding programs. The models included decisional variables related to male selection candidates, progeny testing capacity, and economic weights that were optimized to maximize annual genetic gain (AGG) of (i) a meat sheep breeding program that improved a meat trait of heritability (h²) = 0.30 and a maternal trait of h² = 0.09 and (ii) dairy sheep and goat breeding programs that improved a milk trait of h² = 0.30. Values of ±0.20 of genetic correlation between meat and maternal traits were considered to study their effects on AGG. The Bulmer effect was accounted for and the results presented here are the averages of AGG after 10 generations of selection. Results showed that current traditional breeding programs provide an AGG of 0.095 genetic standard deviation (σa) for meat and 0.061 σa for maternal trait in meat breed and 0.147 σa and 0.120 σa in sheep and goat dairy breeds, respectively. By optimizing decisional variables, the AGG with traditional selection methods increased to 0.139 σa for meat and 0.096 σa for maternal traits in meat breeding programs and to 0.174 σa and 0.183 σa in dairy sheep and goat breeding programs, respectively. With a medium-sized reference population (nref) of 2,000 individuals, the best genomic scenarios gave an AGG that was 17.9% greater than with traditional selection methods with optimized values of decisional variables for combined meat and maternal traits in meat sheep, 51.7% in dairy sheep, and 26.2% in dairy goats. The superiority of genomic schemes increased with the size of the
Calus, M.P.L.; Bijma, P.; Veerkamp, R.F.
Our objective was to investigate the economic effect of prioritizing heifers for replacement at the herd level based on genomic estimated breeding values, and to compute break-even genotyping costs across a wide range of scenarios. Specifically, we aimed to determine the optimal proportion of presel
K. Jun Tong
Genomes evolve through a combination of mutation, drift, and selection, all of which act heterogeneously across genes and lineages. This leads to differences in branch-length patterns among gene trees. Genes that yield trees with the same branch-length patterns can be grouped together into clusters. Here, we propose a novel phylogenetic approach to explain the factors that influence the number and distribution of these gene-tree clusters. We apply our method to a genomic dataset from insects, an ancient and diverse group of organisms. We find some evidence that when drift is the dominant evolutionary process, each cluster tends to contain a large number of fast-evolving genes. In contrast, strong negative selection leads to many distinct clusters, each of which contains only a few slow-evolving genes. Our work, although preliminary in nature, illustrates the use of phylogenetic methods to shed light on the factors driving rate variation in genomic evolution.
C4 photosynthesis is nature's response to CO2 limitations, and evolved recurrently in several groups of plants. To identify genes related to C4 photosynthesis, Huang et al. looked for evidence of past episodes of adaptive evolution in the genomes of C4 grasses. They identified a large number of candidate genes that evolved under divergent selection, indicating that, besides alterations to expression patterns, the history of C4 involved strong selection on protein-coding sequences.
Willoughby, Janna R; Ivy, Jamie A; Lacy, Robert C; Doyle, Jacqueline M; DeWoody, J Andrew
Captive breeding programs are often initiated to prevent species extinction until reintroduction into the wild can occur. However, the evolution of captive populations via inbreeding, drift, and selection can impair fitness, compromising reintroduction programs. To better understand the evolutionary response of species bred in captivity, we used nearly 5500 single nucleotide polymorphisms (SNPs) in populations of white-footed mice (Peromyscus leucopus) to measure the impact of breeding regimes on genomic diversity. We bred mice in captivity for 20 generations using two replicates of three protocols: random mating (RAN), selection for docile behaviors (DOC), and minimizing mean kinship (MK). The MK protocol most effectively retained genomic diversity and reduced the effects of selection. Additionally, genomic diversity was significantly related to fitness, as assessed with pedigrees and SNPs supported with genomic sequence data. Because captive-born individuals are often less fit in wild settings compared to wild-born individuals, captive-estimated fitness correlations likely underestimate the effects in wild populations. Therefore, minimizing inbreeding and selection in captive populations is critical to increasing the probability of releasing fit individuals into the wild.
Missing genotypes are a common feature of high density SNP datasets obtained using SNP chip technology and this is likely to decrease the accuracy of genomic selection. This problem can be circumvented by imputing the missing genotypes with estimated genotypes. When implementing imputation, the criteria used for SNP data quality control and whether to perform imputation before or after data quality control need to be considered. In this paper, we compared six strategies of imputation and quality control using different imputation methods, different quality control criteria and by changing the order of imputation and quality control, against a real dataset of milk production traits in Chinese Holstein cattle. The results demonstrated that, no matter what imputation method and quality control criteria were used, strategies with imputation before quality control performed better than strategies with imputation after quality control in terms of accuracy of genomic selection. The different imputation methods and quality control criteria did not significantly influence the accuracy of genomic selection. We concluded that performing imputation before quality control could increase the accuracy of genomic selection, especially when the rate of missing genotypes is high and the reference population is small.
Wang, Lei; Madsen, Per; Sapp, Robyn
H-BLUP uses a variance-covariance structure based on a combined relationship matrix (H), which augments a pedigree-based relationship matrix (A) with a genomic relationship matrix (G) for genotyped individuals. In practice, often only preselected individuals are genotyped and this selective genot...
Beef cattle production in Latin America is very important on a worldwide scale and for several countries in the region. The region accounts for 29% of the world cattle population and beef production. Genomic selection allows the estimation of breeding values for young animals from DNA samp...
The Korean Hanwoo cattle have been intensively selected for production traits, especially high intramuscular fat content. It is believed that ancient crossings between different breeds contributed to forming the Hanwoo, but little is known about the genomic differences and similarities between other...
Bin Wang; Jing Liu; Liang Jin; Xue-Ying Feng; Jian-Qun Chen
Mutation and selection are two major forces causing codon usage biases. How these two forces influence codon usage in green plant mitochondrial genomes has not been well investigated. In the present study, we surveyed five bryophyte mitochondrial genomes to reveal their codon usage patterns as well as the determining forces. Three interesting findings were made. First, compared with Chara vulgaris, an algal species sister to all extant land plants, bryophytes use more G- and C-ending codons in their mitochondrial genes. This is consistent with the generally higher genomic GC content in bryophyte mitochondria, suggesting an increased mutational pressure toward GC. Second, as indicated by Wright's Nc-GC3s plot, mutation, not selection, is the major force affecting codon usage of bryophyte mitochondrial genes. However, the real mutational dynamics seem very complex. Context-dependent analysis indicated that the nucleotide at the 2nd codon position slightly affects synonymous codon choices. Finally, in bryophyte mitochondria, tRNA genes apply a weak selection force that fine-tunes the synonymous codon frequencies, as revealed by data of Ser4-Pro-Thr-Val families. In summary, complex mutation and weak selection together determined the codon usages in bryophyte mitochondrial genomes.
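The Nc-GC3s plot mentioned above compares each gene's effective number of codons (Nc) with the value expected if codon choice were shaped by GC composition at synonymous third positions alone. Wright's expected curve is Nc = 2 + s + 29 / (s² + (1 - s)²), where s is GC3s. The sketch below evaluates that curve. The function names are hypothetical, and the observed Nc is assumed to have been computed elsewhere.

```python
def expected_nc(gc3s):
    """Wright's expected Nc for a gene whose codon bias reflects GC3s only."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def nc_deficit(observed_nc, gc3s):
    """How far a gene falls below the expected curve (positive = below)."""
    return expected_nc(gc3s) - observed_nc
```

Genes lying on or just below the curve are consistent with mutational bias as the main driver, which is the pattern the abstract reports; genes far below it suggest additional forces such as selection on codon usage.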
Atanur, Santosh S.; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R.; Kaisaki, Pamela J.; Otto, Georg W.; Ma, Man Chun John; Keane, Thomas M.; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R.; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J.; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J.
Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and ins
Since the divergence of humans and chimpanzees about 5 million years ago, these species have undergone a remarkable evolution with drastic divergence in anatomy and cognitive abilities. At the molecular level, despite the small overall magnitude of DNA sequence divergence, we might expect such evolutionary changes to leave a noticeable signature throughout the genome. We here compare 13,731 annotated genes from humans to their chimpanzee orthologs to identify genes that show evidence of positive selection. Many of the genes that present a signature of positive selection tend to be involved in sensory perception or immune defenses. However, the group of genes that show the strongest evidence for positive selection also includes a surprising number of genes involved in tumor suppression and apoptosis, and of genes involved in spermatogenesis. We hypothesize that positive selection in some of these genes may be driven by genomic conflict due to apoptosis during spermatogenesis. Genes with maximal expression in the brain show little or no evidence for positive selection, while genes with maximal expression in the testis tend to be enriched with positively selected genes. Genes on the X chromosome also tend to show an elevated tendency for positive selection. We also present polymorphism data from 20 Caucasian Americans and 19 African Americans for the 50 annotated genes showing the strongest evidence for positive selection. The polymorphism analysis further supports the presence of positive selection in these genes by showing an excess of high-frequency derived nonsynonymous mutations.
Elferink, Martin G; Megens, Hendrik-Jan; Vereijken, Addie; Hu, Xiaoxiang; Crooijmans, Richard P M A; Groenen, Martien A M
Identifying genomics regions that are affected by selection is important to understand the domestication and selection history of the domesticated chicken, as well as understanding molecular pathways underlying phenotypic traits and breeding goals. While whole-genome approaches, either high-density SNP chips or massively parallel sequencing, have been successfully applied to identify evidence for selective sweeps in chicken, it has been difficult to distinguish patterns of selection and stochastic and breed specific effects. Here we present a study to identify selective sweeps in a large number of chicken breeds (67 in total) using a high-density (58 K) SNP chip. We analyzed commercial chickens representing all major breeding goals. In addition, we analyzed non-commercial chicken diversity for almost all recognized traditional Dutch breeds and a selection of representative breeds from China. Based on their shared history or breeding goal we in silico grouped the breeds into 14 breed groups. We identified 396 chromosomal regions that show suggestive evidence of selection in at least one breed group with 26 of these regions showing strong evidence of selection. Of these 26 regions, 13 were previously described and 13 yield new candidate genes for performance traits in chicken. Our approach demonstrates the strength of including many different populations with similar, and breed groups with different selection histories to reduce stochastic effects based on single populations.
Mariana F Nery
Cetaceans are unique in being the only mammals completely adapted to an aquatic environment. This adaptation has required complex changes and sometimes a complete restructuring of physiology, behavior and morphology. Identifying genes that have been subjected to selection pressure during cetacean evolution would greatly enhance our knowledge of the ways in which genetic variation in this mammalian order has been shaped by natural selection. Here, we performed a genome-wide scan for positive selection in the dolphin lineage. We employed models of codon substitution that account for variation of selective pressure over branches on the tree and across sites in a sequence. We analyzed 7,859 nuclear-coding ortholog genes and, using a series of likelihood ratio tests (LRTs), we identified 376 genes (4.8%) with molecular signatures of positive selection in the dolphin lineage. We used the cow as the sister group and compared estimates of selection in the cetacean genome to this using the same methods. This allowed us to define which genes have been exclusively under positive selection in the dolphin lineage. The enrichment analysis found that the identified positively selected genes are significantly over-represented for three exclusive functional categories only in the dolphin lineage: segment specification, mesoderm development and system development. Of particular interest for cetacean adaptation to an aquatic life are the following GeneOntology targets under positive selection: genes related to kidney, heart, lung, eye, ear and nervous system development.
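A likelihood ratio test of the kind used in this scan compares the maximised log-likelihood of a null codon model with that of an alternative model allowing positive selection, using the statistic 2(lnL_alt - lnL_null). The sketch below assumes the two models differ by one free parameter and that the statistic is referred to a chi-square distribution with 1 degree of freedom. The abstract does not state the null distribution used, so this is an illustrative choice, and the function names are hypothetical.

```python
from math import erfc, sqrt


def lrt_statistic(lnl_null, lnl_alt):
    """2 * (lnL_alt - lnL_null), floored at 0 for numerical safety."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return erfc(sqrt(x / 2.0))


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of the LRT under the assumed 1-df chi-square null."""
    return chi2_sf_df1(lrt_statistic(lnl_null, lnl_alt))
```

With thousands of genes tested, per-gene p-values like these would normally be corrected for multiple testing before a gene is called positively selected.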
The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain
Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail. PMID:27255426
Deedu (DU) Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related to adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR), neighboring Tibetan populations (including high-altitude candidates EPAS1, PKLR, and CYP2E1), as well as genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG). Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. Whole-genome sequence from a DU Mongolian (Tianjiao1) shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments.
BACKGROUND: Eukaryotic genomes are scattered with retroelements that proliferate through retrotransposition. Although retroelements make up around 40 percent of the human genome, large regions are found to be completely devoid of retroelements. This has been hypothesised to be a result of genomic regions being intolerant to insertions of retroelements. The inadvertent transcriptional activity of retroelements may affect neighbouring genes, which in turn could be detrimental to an organism. We speculate that such retroelement transcription, or transcriptional interference, is a contributing factor in generating and maintaining retroelement-free regions in the human genome. METHODOLOGY/PRINCIPAL FINDINGS: Based on the known transcriptional properties of retroelements, we expect long interspersed elements (LINEs) to be able to display a high degree of transcriptional interference. In contrast, we expect short interspersed elements (SINEs) to display very low levels of transcriptional interference. We find that genomic regions devoid of long interspersed elements (LINEs) are enriched for protein-coding genes, but that this is not the case for regions devoid of short interspersed elements (SINEs). This is expected if genes are subject to selection against transcriptional interference. We do not find microRNAs to be associated with genomic regions devoid of either SINEs or LINEs. We further observe an increased relative activity of genes overlapping LINE-free regions during early embryogenesis, where activity of LINEs has been identified previously. CONCLUSIONS/SIGNIFICANCE: Our observations are consistent with the notion that selection against transcriptional interference has contributed to the maintenance and/or generation of retroelement-free regions in the human genome.
Charles E Chapple
BACKGROUND: Selenoproteins are a diverse family of proteins notable for the presence of the 21st amino acid, selenocysteine. Until very recently, all metazoan genomes investigated encoded selenoproteins, and these proteins had therefore been believed to be essential for animal life. Challenging this assumption, recent comparative analyses of insect genomes have revealed that some insect genomes appear to have lost selenoprotein genes. METHODOLOGY/PRINCIPAL FINDINGS: In this paper we investigate in detail the fate of selenoproteins, and that of selenoprotein factors, in all available arthropod genomes. We use a variety of in silico comparative genomics approaches to look for known selenoprotein genes and factors involved in selenoprotein biosynthesis. We have found that five insect species have completely lost the ability to encode selenoproteins and that selenoprotein loss in these species, although so far confined to the Endopterygota infraclass, cannot be attributed to a single evolutionary event, but rather to multiple, independent events. Loss of selenoproteins and selenoprotein factors is usually coupled to the deletion of the entire no-longer functional genomic region, rather than to sequence degradation and consequent pseudogenisation. Such dynamics of gene extinction are consistent with the high rate of genome rearrangements observed in Drosophila. We have also found that, while many selenoprotein factors are concomitantly lost with the selenoproteins, others are present and conserved in all investigated genomes, irrespective of whether they code for selenoproteins or not, suggesting that they are involved in additional, non-selenoprotein related functions. CONCLUSIONS/SIGNIFICANCE: Selenoproteins have been independently lost in several insect species, possibly as a consequence of the relaxation in insects of the selective constraints acting across metazoans to maintain selenoproteins. The dispensability of selenoproteins in insects may
Mallick, Swapan; Gnerre, Sante; Muller, Paul; Reich, David
Several studies have found evidence for more positive selection on the chimpanzee lineage compared with the human lineage since the two species split. A potential concern, however, is that these findings may simply reflect artifacts of the data: inaccuracies in the underlying chimpanzee genome sequence, which is of lower quality than human. To test this hypothesis, we generated de novo genome assemblies of chimpanzee and macaque and aligned them with human. We also implemented a novel bioinformatic procedure for producing alignments of closely related species that uses synteny information to remove misassembled and misaligned regions, and sequence quality scores to remove nucleotides that are less reliable. We applied this procedure to re-examine 59 genes recently identified as candidates for positive selection in chimpanzees. The great majority of these signals disappear after application of our new bioinformatic procedure. We also carried out laboratory-based resequencing of 10 of the regions in multiple chimpanzees and humans, and found that our alignments were correct wherever there was a conflict with the published results. These findings throw into question previous findings that there has been more positive selection in chimpanzees than in humans since the two species diverged. Our study also highlights the challenges of searching the extreme tails of distributions for signals of natural selection. Inaccuracies in the genome sequence at even a tiny fraction of genes can produce false-positive signals, which make it difficult to identify loci that have genuinely been targets of selection.
Carolina M. Voloch
Lyssavirus is a diverse genus of viruses that infect a variety of mammalian hosts, typically causing encephalitis. The evolution of this lineage, particularly the rabies virus, has been a focus of research because of the extensive occurrence of cross-species transmission, and the distinctive geographical patterns present throughout the diversification of these viruses. Although numerous studies have examined pattern-related questions concerning Lyssavirus evolution, analyses of the evolutionary processes acting on Lyssavirus diversification are scarce. To clarify the relevance of positive natural selection in Lyssavirus diversification, we conducted a comprehensive scan for episodic diversifying selection across all lineages and codon sites of the five coding regions in lyssavirus genomes. Although the genomes of these viruses are generally conserved, the glycoprotein (G), RNA-dependent RNA polymerase (L) and polymerase (P) genes were frequently targets of adaptive evolution during the diversification of the genus. Adaptive evolution is particularly manifest in the glycoprotein gene, which was inferred to have experienced the highest density of positively selected codon sites along branches. Substitutions in the L gene were found to be associated with the early diversification of phylogroups. A comparison between the number of positively selected sites inferred along RABV population branches and Lyssavirus interspecies branches suggested that the occurrence of positive selection was similar on the five coding regions of the genome in both groups.
Castillo-Juárez, Héctor; Campos-Montes, Gabriel R.; Caballero-Zamora, Alejandra; Montaldo, Hugo H.
The uses of breeding programs for the Pacific white shrimp [Penaeus (Litopenaeus) vannamei] based on mixed linear models with pedigreed data are described. The application of these classic breeding methods yielded continuous progress of great value to increase the profitability of the shrimp industry in several countries. Recent advances in such areas as genomics in shrimp will allow for the development of new breeding programs in the near future that will increase genetic progress. In particular, these novel techniques may help increase disease resistance to specific emerging diseases, which is today a very important component of shrimp breeding programs. Thanks to increased selection accuracy, simulated genetic advance using genomic selection for survival to a disease challenge was up to 2.6 times that of phenotypic sib selection. PMID:25852740
Sergey I Nikolaev
Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
Genova, Antonio; Goossens, Sander; Lemoine, Frank G.; Mazarico, Erwan; Smith, David E.; Zuber, Maria T.
The Mars Global Surveyor (MGS), Mars Odyssey (ODY), and Mars Reconnaissance Orbiter (MRO) missions have enabled NASA to conduct reconnaissance and exploration of Mars from orbit for sixteen consecutive years. The radio systems on these spacecraft enabled radio science in orbit around Mars to improve the knowledge of the static structure of the Martian gravitational field. The continuity of the radio tracking data, which cover more than a solar cycle, also provides useful information to characterize the temporal variability of the gravity field, relevant to the planet's internal dynamics and the structure and dynamics of the atmosphere. MGS operated for more than 7 years, between 1999 and 2006, in a frozen sun-synchronous, near-circular, polar orbit with the periapsis at approximately 370 km altitude. ODY and MRO have been orbiting Mars in two separate sun-synchronous orbits at different local times and altitudes. ODY began its mapping phase in 2002 with the periapsis at approximately 390 km altitude and 4-5pm Local Solar Time (LST), whereas the MRO science mission started in November 2006 with the periapsis at approximately 255 km altitude and 3pm LST. The 16 years of radio tracking data provide useful information on the atmospheric density in the Martian upper atmosphere. We used ODY and MRO radio data to recover the long-term periodicity of the major atmospheric constituents -- CO2, O, and He -- at the orbit altitudes of these two spacecraft. The improved atmospheric model provides a better prediction of the annual and semi-annual variability of the dominant species. Therefore, the inclusion of the recovered model leads to improved orbit determination and an improved gravity field model of Mars with MGS, ODY, and MRO radio tracking data.
Emile R Chimusa
We report a study of genome-wide, dense SNP (∼900K) and copy number polymorphism data of indigenous southern Africans. We demonstrate the genetic contribution to southern and eastern African populations, which involved admixture between indigenous San, Niger-Congo-speaking and populations of Eurasian ancestry. This finding illustrates the need to account for stratification in genome-wide association studies, and that admixture mapping would likely be a successful approach in these populations. We developed a strategy to detect the signature of selection prior to and following putative admixture events. Several genomic regions show an unusual excess of Niger-Kordofanian, and unusual deficiency of both San and Eurasian ancestry, which were considered the footprints of selection after population admixture. Several SNPs with strong allele frequency differences were observed predominantly between the admixed indigenous southern African populations, and their ancestral Eurasian populations. Interestingly, many candidate genes, which were identified within the genomic regions showing signals for selection, were associated with southern African-specific high-risk, mostly communicable diseases, such as malaria, influenza, tuberculosis, and human immunodeficiency virus/AIDS. This observation suggests a potentially important role that these genes might have played in adapting to the environment. Additionally, our analyses of haplotype structure, linkage disequilibrium, recombination, copy number variation and genome-wide admixture highlight and support the unique position of San relative to both African and non-African populations. This study contributes to a better understanding of population ancestry and selection in south-eastern African populations; and the data and results obtained will support research into the genetic contributions to infectious as well as non-communicable diseases in the region.
Background: Chlamydia trachomatis is an obligate intracellular bacterial parasite, which causes several severe and debilitating diseases in humans. This study uses comparative genomic analyses of 12 complete published C. trachomatis genomes to assess the contribution of recombination and selection in this pathogen and to understand the major evolutionary forces acting on the genome of this bacterium. Results: The conserved core genes of C. trachomatis are a large proportion of the pan-genome: we identified 836 core genes in C. trachomatis out of a range of 874-927 total genes in each genome. The ratio of recombination events compared to mutation (ρ/θ) was 0.07 based on ancestral reconstructions using the ClonalFrame tool, but recombination had a significant effect on genetic diversification (r/m = 0.71). The distance-dependent decay of linkage disequilibrium also indicated that C. trachomatis populations behaved intermediately between sexual and clonal extremes. Fifty-five genes were identified as having a history of recombination and 92 were under positive selection based on statistical tests. Twenty-three genes showed evidence of being under both positive selection and recombination, which included genes with a known role in virulence and pathogenicity (e.g., ompA, pmps, tarp). Analysis of inter-clade recombination flux indicated non-uniform currents of recombination between clades, which suggests the possibility of spatial population structure in C. trachomatis infections. Conclusions: C. trachomatis is the archetype of a bacterial species where recombination is relatively frequent yet gene gains (by horizontal gene transfer, HGT) and losses (by deletion) are rare. Gene conversion occurs at sites across the whole C. trachomatis genome but may be more often fixed in genes that are under diversifying selection. Furthermore, genome sequencing will reveal patterns of serotype specific gene exchange and selection that will generate important
Morabito, D.; Butman, S.; Shambayati, S.
The Mars Global Surveyor (MGS) spacecraft, launched on November 7, 1996, carries an experimental space-to-ground telecommunications link at Ka-band (32 GHz) along with the primary X-band (8.4-GHz) downlink. The signals are simultaneously transmitted from a 1.5-m-diameter parabolic antenna on MGS and received by a beam-waveguide (BWG) research and development (R&D) 34-meter antenna located in NASA's Goldstone Deep Space Network (DSN) complex near Barstow, California. This Ka-band link experiment (KaBLE-II) allows the performances of the Ka-band and X-band signals to be compared under nearly identical conditions. The two signals have been regularly tracked during the past 2 years. This article presents carrier-signal-level data (P_c/N_o) for both X-band and Ka-band acquired over a wide range of station elevation angles, weather conditions, and solar elongation angles. The cruise phase of the mission covered the period from launch (November 7, 1996) to Mars orbit capture (September 12, 1997). Since September 12, 1997, MGS has been in orbit around Mars. The measurements confirm that Ka-band could increase data capacity by at least a factor of three (5 dB) as compared with X-band. During May 1998, the solar corona experiment, in which the effects of solar plasma on the X-band and Ka-band links were studied, was conducted. In addition, frequency and difference frequency (f_x - f_(Ka)/3.8), ranging, and telemetry data results are presented. MGS/KaBLE-II measured signal strengths (for 54 percent of the experiments conducted) that were in reasonable agreement with predicted values based on preflight knowledge, and frequency residuals that agreed between bands and whose statistics were consistent with expected noise sources. For passes in which measured signal strengths disagreed with predicted values, the problems were traced to known deficiencies, for example, equipment operating under certain conditions, such as a cold Ka-band solid-state power amplifier (SSPA
Gerke, Justin P; Edwards, Jode W; Guill, Katherine E; Ross-Ibarra, Jeffrey; McMullen, Michael D
Although maize is naturally an outcrossing organism, modern breeding utilizes highly inbred lines in controlled crosses to produce hybrids. The U.S. Department of Agriculture's reciprocal recurrent selection experiment between the Iowa Stiff Stalk Synthetic (BSSS) and the Iowa Corn Borer Synthetic No. 1 (BSCB1) populations represents one of the longest running experiments to understand the response to selection for hybrid performance. To investigate the genomic impact of this selection program, we genotyped the progenitor lines and >600 individuals across multiple cycles of selection using a genome-wide panel of ∼40,000 SNPs. We confirmed previous results showing a steady temporal decrease in genetic diversity within populations and a corresponding increase in differentiation between populations. Thanks to detailed historical information on experimental design, we were able to perform extensive simulations using founder haplotypes to replicate the experiment in the absence of selection. These simulations demonstrate that while most of the observed reduction in genetic diversity can be attributed to genetic drift, heterozygosity in each population has fallen more than expected. We then took advantage of our high-density genotype data to identify extensive regions of haplotype fixation and trace haplotype ancestry to single founder inbred lines. The vast majority of regions showing such evidence of selection differ between the two populations, providing evidence for the dominance model of heterosis. We discuss how this pattern is likely to occur during selection for hybrid performance and how it poses challenges for dissecting the impacts of modern breeding and selection on the maize genome.
Background: Mammalian genomes consist of regions differing in GC content, referred to as isochores or GC-content domains. The scientific debate is still open as to whether such compositional heterogeneity is a selected or neutral trait. Results: Here we analyze SNP allele frequencies, retrotransposon insertion polymorphisms (RIPs), as well as fixed substitutions accumulated in the human lineage since its divergence from chimpanzee to indicate that biased gene conversion (BGC) has been playing a role in within-genome GC content variation. Yet, a distinct contribution to GC content evolution is accounted for by a selective process. Accordingly, we searched for independent evidence that GC content distribution does not conform to neutral expectations. Indeed, after correcting for possible biases, we show that intron GC content and size display isochore-specific correlations. Conclusion: We consider that the more parsimonious explanation for our results is that GC content is subjected to the action of both weak selection and BGC in the human genome with features such as nucleosome positioning or chromatin conformation possibly representing the final target of selective processes. This view might reconcile previous contrasting findings and add some theoretical background to recent evidence suggesting that GC content domains display different behaviors with respect to highly regulated biological processes such as developmental-stage-related gene expression and programmed replication timing during neural stem cell differentiation.
Kwong, Qi Bin; Ong, Ai Ling; Teh, Chee Keng; Chew, Fook Tim; Tammi, Martti; Mayes, Sean; Kulaveerasingam, Harikrishna; Yeoh, Suat Hui; Harikrishna, Jennifer Ann; Appleton, David Ross
Genomic selection (GS) uses genome-wide markers to select individuals with the desired overall combination of breeding traits. A total of 1,218 individuals from a commercial population of Ulu Remis x AVROS (UR x AVROS) were genotyped using the OP200K array. The traits of interest included: shell-to-fruit ratio (S/F, %), mesocarp-to-fruit ratio (M/F, %), kernel-to-fruit ratio (K/F, %), fruit per bunch (F/B, %), oil per bunch (O/B, %) and oil per palm (O/P, kg/palm/year). Genomic heritabilities of these traits were estimated to be in the range of 0.40 to 0.80. GS methods assessed were RR-BLUP, Bayes A (BA), Cπ (BC), Lasso (BL) and Ridge Regression (BRR). All methods resulted in almost equal prediction accuracy. The accuracy achieved ranged from 0.40 to 0.70, correlating with the heritability of traits. By selecting the most important markers, RR-BLUP B has the potential to outperform other methods. The marker density for certain traits can be further reduced based on the linkage disequilibrium (LD). Together with in silico breeding, GS is now being used in oil palm breeding programs to hasten parental palm selection.
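As an illustration of the RR-BLUP idea named in the abstract above, the following is a minimal sketch of ridge regression on marker effects in plain Python. It is not the pipeline used in the study: the function names (`solve`, `rr_blup`, `gebv`) are hypothetical, and the shrinkage parameter `lam` (the ratio of residual to marker-effect variance) is assumed to be known, whereas in practice it is usually estimated, for example by REML.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rr_blup(z, y, lam):
    """Estimate marker effects u from (Zc'Zc + lam*I) u = Zc'(y - mean(y)).

    z   : n x p genotype matrix (e.g. 0/1/2 allele counts), list of rows
    y   : n phenotypes
    lam : assumed shrinkage parameter (sigma_e^2 / sigma_u^2), must be > 0
    """
    n, p = len(z), len(z[0])
    mu = sum(y) / n
    col_means = [sum(z[i][j] for i in range(n)) / n for j in range(p)]
    zc = [[z[i][j] - col_means[j] for j in range(p)] for i in range(n)]
    lhs = [[sum(zc[i][j] * zc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(zc[i][j] * (y[i] - mu) for i in range(n)) for j in range(p)]
    return mu, col_means, solve(lhs, rhs)


def gebv(z_row, mu, col_means, u):
    """Genomic estimated breeding value for one genotyped candidate."""
    return mu + sum((g - m) * e for g, m, e in zip(z_row, col_means, u))


# Toy data: 4 individuals, 2 markers, phenotype = 1 + 2*m1 - 1*m2.
Z = [[0, 1], [1, 0], [2, 1], [1, 2]]
Y = [0.0, 3.0, 4.0, 1.0]
mu, means, effects = rr_blup(Z, Y, lam=2.0)
candidate_value = gebv([2, 0], mu, means, effects)
```

With `lam` close to zero the estimates approach the ordinary least-squares effects; larger values shrink all marker effects equally toward zero, which is the assumption that distinguishes RR-BLUP from the Bayesian alternatives listed in the abstract.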
Frank M. You
Flax is an important economic crop for seed oil and stem fiber. Phenotyping of traits such as seed yield, seed quality, stem fiber yield, and quality characteristics is expensive and time consuming. Genomic selection (GS) refers to a breeding approach aimed at selecting preferred individuals based on genomic estimated breeding values predicted by a statistical model based on the relationship between phenotypes and genome-wide genetic markers. We evaluated the prediction accuracy of GS (rMP) and the efficiency of GS relative to phenotypic selection (RE) for three GS models: ridge regression best linear unbiased prediction (RR-BLUP), Bayesian LASSO (BL), and Bayesian ridge regression (BRR), for seed yield, oil content, iodine value, linoleic, and linolenic acid content with a full and a common set of genome-wide simple sequence repeat markers in each of three biparental populations. The three GS models generated similar rMP and RE, while BRR displayed a higher coefficient of determination (R2) of the fitted models than did RR-BLUP or BL. The mean rMP and RE varied for traits with different heritabilities and were affected by the genetic variation of the traits in the populations. GS for seed yield generated a mean RE of 1.52 across populations and marker sets, a value significantly superior to that for direct phenotypic selection. Our empirical results provide the first validation of GS in flax and demonstrate that GS could increase genetic gain per unit time for linseed breeding. Further studies for selection of training populations and markers are warranted.
Nellåker, Christoffer; Keane, Thomas M; Yalcin, Binnaz; Wong, Kim; Agam, Avigail; Belgard, T Grant; Flint, Jonathan; Adams, David J; Frankel, Wayne N; Ponting, Chris P
Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.
Pilot, M; Greco, C; vonHoldt, B M; Jędrzejewska, B; Randi, E; Jędrzejewski, W; Sidorovich, V E; Ostrander, E A; Wayne, R K
Genomic resources developed for domesticated species provide powerful tools for studying the evolutionary history of their wild relatives. Here we use 61K single-nucleotide polymorphisms (SNPs) evenly spaced throughout the canine nuclear genome to analyse evolutionary relationships among the three largest European populations of grey wolves in comparison with other populations worldwide, and investigate genome-wide effects of demographic bottlenecks and signatures of selection. European wolves have a discontinuous range, with large and connected populations in Eastern Europe and relatively smaller, isolated populations in Italy and the Iberian Peninsula. Our results suggest a continuous decline in wolf numbers in Europe since the Late Pleistocene, and long-term isolation and bottlenecks in the Italian and Iberian populations following their divergence from the Eastern European population. The Italian and Iberian populations have low genetic variability and high linkage disequilibrium, but relatively few autozygous segments across the genome. This last characteristic clearly distinguishes them from populations that underwent recent drastic demographic declines or founder events, and implies long-term bottlenecks in these two populations. Although genetic drift due to spatial isolation and bottlenecks seems to be a major evolutionary force diversifying the European populations, we detected 35 loci that are putatively under diversifying selection. Two of these loci flank the canine platelet-derived growth factor gene, which affects bone growth and may influence differences in body size between wolf populations. This study demonstrates the power of population genomics for identifying genetic signals of demographic bottlenecks and detecting signatures of directional selection in bottlenecked populations, despite their low background variability.
Linkeviciute, Viktorija; Rackham, Owen J L; Gough, Julian; Oates, Matt E; Fang, Hai
To help evaluate how protein function impacts on genome evolution, we introduce a new concept of 'architecture plasticity potential' - the capacity to form distinct domain architectures - both for an individual domain, or more generally for a set of domains grouped by shared function. We devise a scoring metric to measure the plasticity potential for these domain sets, and evaluate how function has changed over time for different species. Applying this metric to a phylogenetic tree of eukaryotic genomes, we find that the involvement of each function is not random but highly selective. For certain lineages there is strong bias for evolution to involve domains related to certain functions. In general eukaryotic genomes, particularly animals, expand complex functional activities such as signalling and regulation, but at the cost of reducing metabolic processes. We also observe differential evolution of transcriptional regulation and a unique evolutionary role of channel regulators; crucially this is only observable in terms of the architecture plasticity potential. Our findings provide a new layer of information to understand the significance of function in eukaryotic genome evolution. A web search tool, available at http://supfam.org/Pevo, offers a wide spectrum of options for exploring functional importance in eukaryotic genome evolution.
Yang, Xin; Wang, Lixia; Chen, Hongmei; Feng, Hanli; Shen, Bang; Hu, Min; Fang, Rui
In the present study, we sequenced and analyzed the mitochondrial (mt) genome of Gastrothylax crumenifer and compared it with other selected trematodes. The full mt genome of G. crumenifer was amplified, sequenced, assembled, analyzed and then subjected to phylogenetic analysis. The complete mt genome of G. crumenifer is 14,801 bp in length and contains two rRNA genes, two non-coding regions (LNR and SNR), 12 protein-coding genes, and 22 transfer RNA genes. The gene organization of the G. crumenifer mt genome is the same as that of other trematodes, except for Schistosoma haematobium and Schistosoma spindale. All the genes are transcribed in the same direction and rich in "A + T", which is in accordance with other trematodes, such as Fasciola hepatica, Paramphistomum cervi, and Fischoederius elongatus. Phylogenetic analysis using concatenated amino acid sequences of the 12 protein-coding genes showed that G. crumenifer is closely related to F. elongatus. The availability of mt genome sequence of G. crumenifer can provide useful DNA markers for studying the molecular epidemiology and population genetics of this parasite and other paramphistomes.
Cros, David; Denis, Marie; Sánchez, Leopoldo; Cochard, Benoit; Flori, Albert; Durand-Gasselin, Tristan; Nouy, Bruno; Omoré, Alphonse; Pomiès, Virginie; Riou, Virginie; Suryana, Edyana; Bouvet, Jean-Marc
Genomic selection empirically appeared valuable for reciprocal recurrent selection in oil palm as it could account for family effects and Mendelian sampling terms, despite small populations and low marker density. Genomic selection (GS) can increase the genetic gain in plants. In perennial crops, this is expected mainly through shortened breeding cycles and increased selection intensity, which requires sufficient GS accuracy in selection candidates, despite often small training populations. Our objective was to obtain the first empirical estimate of GS accuracy in oil palm (Elaeis guineensis), the major world oil crop. We used two parental populations involved in conventional reciprocal recurrent selection (Deli and Group B) with 131 individuals each, genotyped with 265 SSR. We estimated within-population GS accuracies when predicting breeding values of non-progeny-tested individuals for eight yield traits. We used three methods to sample training sets and five statistical methods to estimate genomic breeding values. The results showed that GS could account for family effects and Mendelian sampling terms in Group B but only for family effects in Deli. Presumably, this difference between populations originated from their contrasting breeding history. The GS accuracy ranged from -0.41 to 0.94 and was positively correlated with the relationship between training and test sets. Training sets optimized with the so-called CDmean criterion gave the highest accuracies, ranging from 0.49 (pulp to fruit ratio in Group B) to 0.94 (fruit weight in Group B). The statistical methods did not affect the accuracy. Finally, Group B could be preselected for progeny tests by applying GS to key yield traits, therefore increasing the selection intensity. Our results should be valuable for breeding programs with small populations, long breeding cycles, or reduced effective size.
Natural selection, as the driving force of human evolution, has direct impact on population differentiation. However, it is still unclear to what extent the genetic differentiation has been caused by natural selection. To explore this question, we performed a genome-wide scan with single nucleotide polymorphism (SNP) data from the International HapMap Project. Single locus FST analysis was applied to assess the frequency difference among populations in autosomes. Based on the empirical distribution of FST, we identified 12669 SNPs correlating to population differentiation and 1853 candidate genes subjected to geographic restricted natural selection. Further interpretation of gene ontogeny revealed 121 categories of biological process with the enrichments of candidate genes. Our results suggest that natural selection may play an important role in human population differentiation. In addition, our analysis provides new clues as well as research methods for our understanding of population differentiation and natural selection.
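The single-locus FST scan described in the abstract above can be sketched as follows. This is an illustrative sketch only, not the authors' method: it uses the basic Wright definition FST = (HT - HS) / HT for a biallelic SNP with equally weighted populations, and the function names, SNP identifiers, allele frequencies and the `tail` cutoff are all hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic SNP.

    freqs : allele frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)  # within-pop mean
    return (h_t - h_s) / h_t


def empirical_outliers(fst_by_snp, tail=0.01):
    """Return SNP ids in the upper `tail` fraction of the empirical FST distribution."""
    ranked = sorted(fst_by_snp.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, int(len(ranked) * tail))
    return [snp for snp, _ in ranked[:n_keep]]


# Hypothetical allele frequencies in three populations for four SNPs.
frequencies = {
    "snp_a": [0.50, 0.50, 0.50],
    "snp_b": [0.45, 0.55, 0.50],
    "snp_c": [0.05, 0.95, 0.50],
    "snp_d": [0.30, 0.40, 0.35],
}
fst_values = {snp: fst_biallelic(f) for snp, f in frequencies.items()}
candidates = empirical_outliers(fst_values, tail=0.25)
```

Ranking against the empirical distribution avoids assuming a specific demographic model, but, as with any outlier scan, the tail will always contain some loci whether or not selection has acted.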
Kadarmideen, Haja; Do, Duy Ngoc
Global livestock production has increased substantially during the last decades, in both number of animals and productivity. Meanwhile, the human population is projected to reach 9.6 billion by 2050, and most of the projected increase takes place in developing countries. Rapid population growth will increase the demand for food as well as animal products, particularly in emerging economic giants like India. Moreover, urbanization has a considerable impact on patterns of food consumption in general and on demand for livestock products in particular, and increased income growth has led to more expenditure on livestock products. Since livestock production in developed countries has well adopted livestock genomic selection tools to improve both productivity and quality of animal products, opportunities to increase productivity in developing countries via genomic tools/selection have...
Platzer, Alexander; Zhang, Qingrun; Vilhjálmsson, Bjarni J; Korte, Arthur; Nizhynska, Viktoria; Voronin, Viktor; Korte, Pamela; Sedman, Laura; Mandáková, Terezie; Lysak, Martin A; Seren, Ümit; Hellmann, Ines; Nordborg, Magnus
Despite advances in sequencing, the goal of obtaining a comprehensive view of genetic variation in populations is still far from reached. We sequenced 180 lines of A. thaliana from Sweden to obtain as complete a picture as possible of variation in a single region. Whereas simple polymorphisms in the unique portion of the genome are readily identified, other polymorphisms are not. The massive variation in genome size identified by flow cytometry seems largely to be due to 45S rDNA copy number variation, with lines from northern Sweden having particularly large numbers of copies. Strong selection is evident in the form of long-range linkage disequilibrium (LD), as well as in LD between nearby compensatory mutations. Many footprints of selective sweeps were found in lines from northern Sweden, and a massive global sweep was shown to have involved a 700-kb transposition. PMID:23793030
Kober, Kord M; Pogson, Grant H
Comparative genomics studies investigating the signals of positive selection among groups of closely related species are still rare and limited in taxonomic breadth. Such studies show great promise in advancing our knowledge about the proportion and the identity of genes experiencing diversifying selection. However, methodological challenges have led to high levels of false positives in past studies. Here, we use the well-annotated genome of the purple sea urchin, Strongylocentrotus purpuratus, as a reference to investigate the signals of positive selection at 6520 single-copy orthologs from nine sea urchin species belonging to the family Strongylocentrotidae, paying careful attention to minimizing false positives. We identified 1008 (15.5%) candidate positive selection genes (PSGs). Tests for positive selection along the nine terminal branches of the phylogeny identified 824 genes that showed lineage-specific adaptive diversification (1.67% of branch-sites tests performed). Positively selected codons were not enriched at exon borders or near regions containing missing data, suggesting a limited contribution of false positives caused by alignment or annotation errors. Alignments were validated at 10 loci with re-sequencing using Sanger methods. No differences were observed in the rates of synonymous substitution (dS), GC content, and codon bias between the candidate PSGs and those not showing positive selection. However, the candidate PSGs had 68% higher rates of nonsynonymous substitution (dN) and 33% lower levels of heterozygosity, consistent with selective sweeps and opposite to that expected by a relaxation of selective constraint. Although positive selection was identified at reproductive proteins and innate immunity genes, the strongest signals of adaptive diversification were observed at extracellular matrix proteins, cell adhesion molecules, membrane receptors, and ion channels. Many candidate PSGs have been widely implicated as targets of pathogen binding
Miyaoka, Yuichiro; Chan, Amanda H; Judge, Luke M; Yoo, Jennie; Huang, Miller; Nguyen, Trieu D; Lizarraga, Paweena P; So, Po-Lin; Conklin, Bruce R
Precise editing of human genomes in pluripotent stem cells by homology-driven repair of targeted nuclease-induced cleavage has been hindered by the difficulty of isolating rare clones. We developed an efficient method to capture rare mutational events, enabling isolation of mutant lines with single-base substitutions without antibiotic selection. This method facilitates efficient induction or reversion of mutations associated with human disease in isogenic human induced pluripotent stem cells.
Genome-wide microarray analysis (Affymetrix array) was used (i) to determine whether only one gene, the cytochrome P450 enzyme Cyp6g1, is differentially transcribed in dichlorodiphenyltrichloroethane (DDT)-resistant vs. -susceptible Drosophila; and (ii) to profile common genes differentially transcribed across a DDT-resistant field isolate [Rst(2)DDTWisconsin] and a laboratory DDT-selected population [Rst(2)DDT91-R]. Statistical analysis (ANOVA model) identified 158 probe sets that were diffe...
Prabha, Ratna; Singh, Dhananjaya P; Rai, Anil
Genome analysis of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1 revealed the factors governing codon choice in this organism. Multiple parameters, including Nc, GC3s, RSCU, Codon Adaptation Index (CAI), optimal and rare codons, codon-pair context and amino acid usage, were analysed, and compositional constraint was identified as the major factor. The wide range of Nc values for the same GC3 content suggested a role for translational selection. Mutational bias is suggested at synonymous positions. Among the optimal codons for translation, most were GC-ending. Seven codons (AGA, AGG, AUA, UAA, UAG, UCA and UGA) occurred least often in the entire genome; excluding the stop codons, all were A-ending (with the exception of AGG). The most widely used codon pairs in the genome are G-ending or C-ending, and A-ending or U-ending codons pair with G-ending or C-ending codons. Amino acids that are widely distributed in T. elongatus tend to use G-ending or C-ending codons most frequently. The findings showed a cumulative role of translational selection, translational accuracy and gene expression levels, with mutational bias as a key player, in the codon selection pattern of this organism.
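Two of the parameters named above, RSCU and GC3s, are simple enough to illustrate directly. The sketch below is not the authors' pipeline (the abstract does not name the software used); it assumes the standard genetic code, an in-frame coding sequence, and the common definitions: RSCU is a codon's observed count divided by the mean count of its synonymous family, and GC3s is the G+C fraction at third positions of codons that have synonymous alternatives. The function names `rscu` and `gc3s` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order (stop codons marked "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons (RNA 'U' read as 'T')."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(cds):
    """Relative synonymous codon usage for every codon whose amino acid
    occurs at least once in the sequence."""
    counts = Counter(c for c in codons(cds) if c in CODON_TABLE)
    result = {}
    for amino in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == amino]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for this family
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result


def gc3s(cds):
    """G+C fraction at third positions of synonymous codons
    (Met, Trp and stop codons are excluded). Returns None if undefined."""
    third = [c[2] for c in codons(cds)
             if CODON_TABLE.get(c) not in (None, "M", "W", "*")]
    if not third:
        return None
    return sum(b in "GC" for b in third) / len(third)
```

An RSCU value above 1 indicates a codon used more often than expected under equal synonymous usage, and a value below 1 indicates a codon used less often.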
Glebes, Tirzah Y; Sandoval, Nicholas R; Gillis, Jacob H; Gill, Ryan T
Engineering both feedstock and product tolerance is important for transitioning towards next-generation biofuels derived from renewable sources. Tolerance to chemical inhibitors typically results in complex phenotypes, for which multiple genetic changes must often be made to confer tolerance. Here, we performed a genome-wide search for furfural-tolerant alleles using the TRackable Multiplex Recombineering (TRMR) method (Warner et al. (2010), Nature Biotechnology), which uses chromosomally integrated mutations directed towards increased or decreased expression of virtually every gene in Escherichia coli. We employed various growth selection strategies to assess the role of selection design towards growth enrichments. We also compared genes with increased fitness from our TRMR selection to those from a previously reported genome-wide identification study of furfural tolerance genes using a plasmid-based genomic library approach (Glebes et al. (2014) PLOS ONE). In several cases, growth improvements were observed for the chromosomally integrated promoter/RBS mutations but not for the plasmid-based overexpression constructs. Through this assessment, four novel tolerance genes, ahpC, yhjH, rna, and dicA, were identified and confirmed for their effect on improving growth in the presence of furfural.
Jobson Richard W
Background: The C↔U substitution types of RNA editing have been observed frequently in organellar genomes of land plants. Although various attempts have been made to explain why such a seemingly inefficient genetic mechanism would have evolved, no satisfactory explanation exists in our view. In this study, we examined editing patterns in chloroplast genomes of the hornwort Anthoceros formosae and the fern Adiantum capillus-veneris and in mitochondrial genomes of the angiosperms Arabidopsis thaliana, Beta vulgaris and Oryza sativa, to gain an understanding of the question of how RNA editing originated. Results: We found that (1) most editing sites were distributed at the 2nd and 1st codon positions, (2) editing affected codons that resulted in larger hydrophobicity and molecular size changes much more frequently than those with little change involved, (3) editing uniformly increased protein hydrophobicity, (4) editing occurred more frequently in ancestrally T-rich sequences, which were more abundant in genes encoding membrane-bound proteins with many hydrophobic amino acids than in genes encoding soluble proteins, and (5) editing occurred most often in genes found to be under strong selective constraint. Conclusion: These analyses show that editing mostly affects functionally important and evolutionarily conserved codon positions, codons and genes encoding membrane-bound proteins. In particular, abundance of RNA editing in plant organellar genomes may be associated with disproportionately large percentages of genes in these two genomes that encode membrane-bound proteins, which are rich in hydrophobic amino acids and selectively constrained. These data support a hypothesis that natural selection imposed by protein functional constraints has contributed to selective fixation of certain editing sites and maintenance of the editing activity in plant organelles over a period of more than four hundred million years. The retention of genes encoding RNA
Henry S Gibbons
BACKGROUND: Despite the decades-long use of Bacillus atrophaeus var. globigii (BG) as a simulant for biological warfare (BW) agents, knowledge of its genome composition is limited. Furthermore, the ability to differentiate signatures of deliberate adaptation and selection from natural variation is lacking for most bacterial agents. We characterized a lineage of BG with a long history of use as a simulant for BW operations, focusing on classical bacteriological markers, metabolic profiling and whole-genome shotgun sequencing (WGS). RESULTS: Archival strains and two "present day" type strains were compared to simulant strains on different laboratory media. Several of the samples produced multiple colony morphotypes that differed from that of an archival isolate. To trace the microevolutionary history of these isolates, we obtained WGS data for several archival and present-day strains and morphotypes. Bacillus-wide phylogenetic analysis identified B. subtilis as the nearest neighbor to B. atrophaeus. The genome of B. atrophaeus is, on average, 86% identical to B. subtilis on the nucleotide level. WGS of variants revealed that several strains were mixed but highly related populations and uncovered a progressive accumulation of mutations among the "military" isolates. Metabolic profiling and microscopic examination of bacterial cultures revealed enhanced growth of "military" isolates on lactate-containing media, and showed that the "military" strains exhibited a hypersporulating phenotype. CONCLUSIONS: Our analysis revealed the genomic and phenotypic signatures of strain adaptation and deliberate selection for traits that were desirable in a simulant organism. Together, these results demonstrate the power of whole-genome and modern systems-level approaches to characterize microbial lineages to develop and validate forensic markers for strain discrimination and reveal signatures of deliberate adaptation.
Genome-wide molecular markers are often used to evaluate genetic diversity in germplasm collections and to make genomic selections in breeding programs. To accurately predict phenotypes and assay genetic diversity, molecular markers should assay a representative sample of the polymorphisms in the population under study. Ascertainment bias arises when marker data is not obtained from a random sample of the polymorphisms in the population of interest. Genotyping-by-sequencing (GBS) is rapidly emerging as a low-cost genotyping platform, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. With GBS, marker discovery and genotyping occur simultaneously, resulting in minimal ascertainment bias. The previous platform of choice for whole-genome genotyping in many species such as wheat was DArT (Diversity Array Technology), which has formed the basis of most of our knowledge about cereal genetic diversity. This study compared GBS and DArT marker platforms for measuring genetic diversity and genomic selection (GS) accuracy in elite U.S. soft winter wheat. From a set of 365 breeding lines, 38,412 single nucleotide polymorphism GBS markers were discovered and genotyped. The GBS SNPs gave a higher GS accuracy than 1,544 DArT markers on the same lines, despite 43.9% missing data. Using a bootstrap approach, we observed significantly more clustering of markers and ascertainment bias with DArT relative to GBS. The minor allele frequency distribution of GBS markers had a deficit of rare variants compared to DArT markers. Despite the ascertainment bias of the DArT markers, GS accuracy for three traits out of four was not significantly different when an equal number of markers were used for each platform. This suggests that the gain in accuracy observed using GBS compared to DArT markers was mainly due to a large increase in the number of markers available for the analysis.
Petersen, Jessica L.; Mickelson, James R.; Rendahl, Aaron K.; Valberg, Stephanie J.; Andersson, Lisa S.; Axelsson, Jeanette; Bailey, Ernie; Bannasch, Danika; Binns, Matthew M.; Borges, Alexandre S.; Brama, Pieter; da Câmara Machado, Artur; Capomaccio, Stefano; Cappelli, Katia; Cothran, E. Gus; Distl, Ottmar; Fox-Clipsham, Laura; Graves, Kathryn T.; Guérin, Gérard; Haase, Bianca; Hasegawa, Telhisa; Hemmann, Karin; Hill, Emmeline W.; Leeb, Tosso; Lindgren, Gabriella; Lohi, Hannes; Lopes, Maria Susana; McGivney, Beatrice A.; Mikko, Sofia; Orr, Nicholas; Penedo, M. Cecilia T.; Piercy, Richard J.; Raekallio, Marja; Rieder, Stefan; Røed, Knut H.; Swinburne, June; Tozaki, Teruaki; Vaudin, Mark; Wade, Claire M.; McCue, Molly E.
Intense selective pressures applied over short evolutionary time have resulted in homogeneity within, but substantial variation among, horse breeds. Utilizing this population structure, 744 individuals from 33 breeds, and a 54,000 SNP genotyping array, breed-specific targets of selection were identified using an FST-based statistic calculated in 500-kb windows across the genome. A 5.5-Mb region of ECA18, in which the myostatin (MSTN) gene was centered, contained the highest signature of selection in both the Paint and Quarter Horse. Gene sequencing and histological analysis of gluteal muscle biopsies showed a promoter variant and intronic SNP of MSTN were each significantly associated with higher Type 2B and lower Type 1 muscle fiber proportions in the Quarter Horse, demonstrating a functional consequence of selection at this locus. Signatures of selection on ECA23 in all gaited breeds in the sample led to the identification of a shared, 186-kb haplotype including two doublesex related mab transcription factor genes (DMRT2 and 3). The recent identification of a DMRT3 mutation within this haplotype, which appears necessary for the ability to perform alternative gaits, provides further evidence for selection at this locus. Finally, putative loci for the determination of size were identified in the draft breeds and the Miniature horse on ECA11, as well as when signatures of selection surrounding candidate genes at other loci were examined. This work provides further evidence of the importance of MSTN in racing breeds, provides strong evidence for selection upon gait and size, and illustrates the potential for population-based techniques to find genomic regions driving important phenotypes in the modern horse. PMID:23349635
Kovach, Ryan P; Hand, Brian K; Hohenlohe, Paul A; Cosart, Ted F; Boyer, Matthew C; Neville, Helen H; Muhlfeld, Clint C; Amish, Stephen J; Carim, Kellie; Narum, Shawn R; Lowe, Winsor H; Allendorf, Fred W; Luikart, Gordon
Evolutionary and ecological consequences of hybridization between native and invasive species are notoriously complicated because patterns of selection acting on non-native alleles can vary throughout the genome and across environments. Rapid advances in genomics now make it feasible to assess locus-specific and genome-wide patterns of natural selection acting on invasive introgression within and among natural populations occupying diverse environments. We quantified genome-wide patterns of admixture across multiple independent hybrid zones of native westslope cutthroat trout and invasive rainbow trout, the world's most widely introduced fish, by genotyping 339 individuals from 21 populations using 9380 species-diagnostic loci. A significantly greater proportion of the genome appeared to be under selection favouring native cutthroat trout (rather than rainbow trout), and this pattern was pervasive across the genome (detected on most chromosomes). Furthermore, selection against invasive alleles was consistent across populations and environments, even in those where rainbow trout were predicted to have a selective advantage (warm environments). These data corroborate field studies showing that hybrids between these species have lower fitness than the native taxa, and show that these fitness differences are due to selection favouring many native genes distributed widely throughout the genome. © 2016 The Author(s).
He, Xiaoping; Johansson, Mattias L; Heath, Daniel D
The use and importance of reintroduction as a conservation tool to return a species to its historical range from which it has been extirpated will increase as climate change and human development accelerate habitat loss and population extinctions. Although the number of reintroduction attempts has increased rapidly over the past 2 decades, the success rate is generally low. As a result of population differences in fitness-related traits and divergent responses to environmental stresses, population performance upon reintroduction is highly variable, and it is generally agreed that selecting an appropriate source population is a critical component of a successful reintroduction. Conservation genomics is an emerging field that addresses long-standing challenges in conservation, and the potential for using novel molecular genetic approaches to inform and improve conservation efforts is high. Because the successful establishment and persistence of reintroduced populations is highly dependent on the functional genetic variation and environmental stress tolerance of the source population, we propose the application of conservation genomics and transcriptomics to guide reintroduction practices. Specifically, we propose using genome-wide functional loci to estimate genetic variation of source populations. This estimate can then be used to predict the potential for adaptation. We also propose using transcriptional profiling to measure the expression response of fitness-related genes to environmental stresses as a proxy for acclimation (tolerance) capacity. Appropriate application of conservation genomics and transcriptomics has the potential to dramatically enhance reintroduction success in a time of rapidly declining biodiversity and accelerating environmental change. © 2016 Society for Conservation Biology.
Kessner, Darren; Novembre, John
Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTL) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTL provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTL under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50-100%) can be explained by detected QTL in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.
For years evolutionary biologists have been interested in searching for the genetic bases underlying humanness. Recent efforts at a large or a complete genomic scale have been conducted to search for positively selected genes in human and in chimp. However, recently developed methods allowing for a more sensitive and controlled approach in the detection of positive selection can be employed. Here, using 13,198 genes, we have deduced the sets of genes involved in rate acceleration, positive selection, and relaxation of selective constraints in human, in chimp, and in their ancestral lineage since the divergence from murids. Significant deviations from the strict molecular clock were observed in 469 human and in 651 chimp genes. The more stringent branch-site test of positive selection detected 108 human and 577 chimp positively selected genes. An important proportion of the positively selected genes did not show a significant acceleration in rates, and similarly, many of the accelerated genes did not show significant signals of positive selection. Functional differentiation of genes under rate acceleration, positive selection, and relaxation was not statistically significant between human and chimp with the exception of terms related to G-protein coupled receptors and sensory perception. Both of these were over-represented under relaxation in human in relation to chimp. Comparing differences between derived and ancestral lineages, a more conspicuous change in trends seems to have favored positive selection in the human lineage. Since most of the positively selected genes are different under the same functional categories between these species, we suggest that the individual roles of the alternative positively selected genes may be an important factor underlying biological differences between these species.
Liu, Xuanyao; Kanduri, Chakravarthi; Oikkonen, Jaana; Karma, Kai; Raijas, Pirre; Ukkola-Vuoti, Liisa; Teo, Yik-Ying; Järvelä, Irma
Abilities related to musical aptitude appear to have a long history in human evolution. To elucidate the molecular and evolutionary background of musical aptitude, we compared genome-wide genotyping data (641 K SNPs) of 148 Finnish individuals characterized for musical aptitude. We assigned signatures of positive selection in a case-control setting using three selection methods: haploPS, XP-EHH and FST. Gene ontology classification revealed that the positive selection regions contained genes affecting inner-ear development. Additionally, literature survey has shown that several of the identified genes were known to be involved in auditory perception (e.g. GPR98, USH2A), cognition and memory (e.g. GRIN2B, IL1A, IL1B, RAPGEF5), reward mechanisms (RGS9), and song perception and production of songbirds (e.g. FOXP1, RGS9, GPR98, GRIN2B). Interestingly, genes related to inner-ear development and cognition were also detected in a previous genome-wide association study of musical aptitude. However, the candidate genes detected in this study were not reported earlier in studies of musical abilities. Identification of genes related to language development (FOXP1 and VLDLR) support the popular hypothesis that music and language share a common genetic and evolutionary background. The findings are consistent with the evolutionary conservation of genes related to auditory processes in other species and provide first empirical evidence for signatures of positive selection for abilities that contribute to musical aptitude. PMID:26879527
Jobson, Richard W; Nabholz, Benoit; Galtier, Nicolas
Aging is thought to occur through the accumulation of biochemical damage affecting DNA, proteins, and lipids. The major source of cellular damage involves the generation of reactive oxygen species produced during mitochondrial respiratory activity of the electron transport chain. Energetic metabolism, antioxidative processes, genome maintenance, and cell cycle are the cellular functions most commonly associated with aging, from experimental studies of model organisms. The significance of these experiments with respect to longevity-related selective constraints in nature remains unclear. Here we took a phylogenomic approach to identify the genetic targets of natural selection for elongated life span in mammals. By comparing the nonsynonymous and synonymous evolution of approximately 5.7 million codon sites across 25 species, we identify codons and genes showing a stronger level of amino acid conservation in long-lived than in short-lived lineages. We show that genes involved in lipid composition and (collagen associated) vitamin C binding have collectively undergone increased selective pressure in long-lived species, whereas genes involved in DNA replication/repair or antioxidation have not. Most of the candidate genes experimentally associated with aging (e.g., PolG, Sod, Foxo) have played no detectable role in the evolution of longevity in mammals. A large body of current medical research aims at discovering how to increase longevity in human. In this study, we uncovered the way natural selection has completed this task during mammalian evolution. Cellular membrane and extracellular collagen composition, not genome integrity, have apparently been the optimized features.
Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie; Schwientek, Patrick; Tremblay, Julien; Schackwitz, Wendy; Martin, Joel; Pati, Amrita; Bushnell, Brian; Foster, Brian; Kang, Dongwan; Tringe, Susannah G.; Bertilsson, Stefan; Moran, Mary Ann; Shade, Ashley; Newton, Ryan J.; Stevens, Sarah; McMahon, Katherine D.; Malmstrom, Rex R.
Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
Chen, Haixia; Sun, Shichun; Norenburg, Jon L; Sundberg, Per
The phenomenon of codon usage bias is known to exist in many genomes and it is mainly determined by mutation and selection. To understand the patterns of codon usage in nemertean mitochondrial genomes, we used bioinformatic approaches to analyze the protein-coding sequences of eight nemertean species. Neutrality analysis did not find a significant correlation between GC12 and GC3. The ENc-plot showed a few genes on or close to the expected curve, but the majority of points with low ENc values are below it. The ENc-plot suggested that mutational bias plays a major role in shaping codon usage. The Parity Rule 2 (PR2) plot analysis showed that GC and AT were not used proportionally, and we propose that codons containing A or U at the third position are used preferentially in nemertean species, regardless of whether the corresponding tRNAs are encoded in the mitochondrial DNA. Context-dependent analysis indicated that the nucleotide at the second codon position slightly affects synonymous codon choices. These results suggest that mutational and selective forces are probably acting on codon usage bias in nemertean mitochondrial genomes.
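The neutrality and PR2 analyses mentioned above both reduce to per-gene base composition by codon position, which can be sketched as follows. This is a simplified illustration, not the authors' method: PR2 plots are often restricted to fourfold degenerate codon families, whereas this sketch uses all third codon positions, and the function names `position_bases`, `gc12_gc3` and `pr2_point` are hypothetical.

```python
def position_bases(cds):
    """Return the bases at codon positions 1, 2 and 3 of an in-frame
    coding sequence as three strings (RNA 'U' is read as 'T')."""
    cds = cds.upper().replace("U", "T")
    cds = cds[:len(cds) - len(cds) % 3]  # drop any incomplete last codon
    return cds[0::3], cds[1::3], cds[2::3]


def gc_fraction(bases):
    """G+C fraction of a string of bases, or None if it is empty."""
    if not bases:
        return None
    return sum(b in "GC" for b in bases) / len(bases)


def gc12_gc3(cds):
    """One point of a neutrality plot: (GC12, GC3) for a single gene,
    where GC12 pools the first and second codon positions."""
    first, second, third = position_bases(cds)
    return gc_fraction(first + second), gc_fraction(third)


def pr2_point(cds):
    """One point of a PR2 plot: (G3/(G3+C3), A3/(A3+T3)) for a single gene.
    A coordinate is None when its denominator is zero."""
    _, _, third = position_bases(cds)
    g, c = third.count("G"), third.count("C")
    a, t = third.count("A"), third.count("T")
    gc_bias = g / (g + c) if g + c else None
    at_bias = a / (a + t) if a + t else None
    return gc_bias, at_bias
```

A gene lying at (0.5, 0.5) in the PR2 plot uses G and C, and A and T, equally at third positions; displacement from that centre indicates the kind of disproportionate usage the abstract reports.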
Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie; Schwientek, Patrick; Tremblay, Julien; Schackwitz, Wendy; Martin, Joel; Pati, Amrita; Bushnell, Brian; Foster, Brian; Kang, Dongwan; Tringe, Susannah G.; Bertilsson, Stefan; Moran, Mary Ann; Shade, Ashley; Newton, Ryan J.; Stevens, Sarah; McMahon, Katherine D.; Malmstrom, Rex R.
Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
Harris, S; Foord, S M
The completion of the first draft of the human genome presents both a tremendous opportunity and enormous challenge to the pharmaceutical industry since the whole community, with few exceptions, will soon have access to the same pool of candidate gene sequences from which to select future therapeutic targets. The commercial imperative to select and pursue therapeutically relevant genes from within the overall content of the genome will be particularly intense for those gene families that currently represent the chemically tractable or 'drugable' gene targets. As a consequence the emphasis within exploratory research has shifted towards the evaluation and adoption of technology platforms that can add additional value to the gene selection process, either through functional studies or direct/indirect measures of disease alignment e.g., genetics, differential gene expression, proteomics, tissue distribution, comparative species data etc. The selection of biological targets for the development of potential new medicines relies, in part, on the quality of the in vivo biological data that correlates a particular molecular target with the underlying pathophysiology of a disease. Within the pharmaceutical industry, studies employing transgenic animals and, in particular, animals with specific gene deletions are playing an increasingly important role in the therapeutic target gene selection, drug candidate selection and product development phases of the overall drug discovery process. The potential of phenotypic information from gene knock-outs to contribute to a high-throughput target selection/validation strategy has hitherto been limited by the resources required to rapidly generate and characterise a large number of knock-out transgenics in a timely fashion. The offerings of several companies that provide an opportunity to overcome these hurdles, albeit at a cost, are assessed with respect to the strategic business needs of the pharmaceutical industry.
Dimitriadou, Eftychia; Melotte, Cindy; Debrock, Sophie; Esteki, Masoud Zamani; Dierickx, Kris; Voet, Thierry; Devriendt, Koen; de Ravel, Thomy; Legius, Eric; Peeraer, Karen; Meuleman, Christel; Vermeesch, Joris Robert
How to select and prioritize embryos during PGD following genome-wide haplotyping? In addition to genetic disease-specific information, the embryo selected for transfer is based on ranking criteria including the existence of mitotic and/or meiotic aneuploidies, but not carriership of mutations causing recessive disorders. Embryo selection for monogenic diseases has been mainly performed using targeted disease-specific assays. Recently, these targeted approaches are being complemented by generic genome-wide genetic analysis methods such as karyomapping or haplarithmisis, which are based on genomic haplotype reconstruction of cell(s) biopsied from embryos. This provides not only information about the inheritance of Mendelian disease alleles but also about numerical and structural chromosome anomalies and haplotypes genome-wide. Reflections on how to use this information in the diagnostic laboratory are lacking. We present the results of the first 101 PGD cycles (373 embryos) using haplarithmisis, performed in the Centre for Human Genetics, UZ Leuven. The questions raised were addressed by a multidisciplinary team of clinical geneticist, fertility specialists and ethicists. Sixty-three couples enrolled in the genome-wide haplotyping-based PGD program. Families presented with either inherited genetic variants causing known disorders and/or chromosomal rearrangements that could lead to unbalanced translocations in the offspring. Embryos were selected based on the absence or presence of the disease allele, a trisomy or other chromosomal abnormality leading to known developmental disorders. In addition, morphologically normal Day 5 embryos were prioritized for transfer based on the presence of other chromosomal imbalances and/or carrier information. Some of the choices made and principles put forward are specific for cleavage-stage-based genetic testing. The proposed guidelines are subject to continuous update based on the accumulating knowledge from the implementation of
Acquired immunity in vertebrates maintains polymorphisms in endemic pathogens, leading to identifiable signatures of balancing selection. To comprehensively survey for genes under such selection in the human malaria parasite Plasmodium falciparum, we generated paired-end short-read sequences of parasites in clinical isolates from an endemic Gambian population, which were mapped to the 3D7 strain reference genome to yield high-quality genome-wide coding sequence data for 65 isolates. A minority of genes did not map reliably, including the hypervariable var, rifin, and stevor families, but 5,056 genes (90.9% of all in the genome) had >70% sequence coverage with minimum read depth of 5 for at least 50 isolates, of which 2,853 genes contained 3 or more single nucleotide polymorphisms (SNPs) for analysis of polymorphic site frequency spectra. Against an overall background of negatively skewed frequencies, as expected from historical population expansion combined with purifying selection, the outlying minority of genes with signatures indicating exceptionally intermediate frequencies were identified. Comparing genes with different stage-specificity, such signatures were most common in those with peak expression at the merozoite stage that invades erythrocytes. Members of clag, PfMC-2TM, surfin, and msp3-like gene families were highly represented, the strongest signature being in the msp3-like gene PF10_0355. Analysis of msp3-like transcripts in 45 clinical and 11 laboratory adapted isolates grown to merozoite-containing schizont stages revealed surprisingly low expression of PF10_0355. In diverse clonal parasite lines the protein product was expressed in a minority of mature schizonts (<1% in most lines and ∼10% in clone HB3), and eight sub-clones of HB3 cultured separately had an intermediate spectrum of positive frequencies (0.9 to 7.5%), indicating phase variable expression of this polymorphic antigen. This and other identified targets of balancing
Blackmon, M. A.; Murphy, J. R.
Using atmospheric dust abundance and atmospheric temperature observation data from the Thermal Emission Spectrometer (TES) on board the Mars Global Surveyor (MGS), the net flux of dust into and out of the Martian polar regions will be examined. Mars polar regions possess layered terrain, believed to be composed of a mixture of ice and dust, with the different layers possibly representing different past climate regimes. These changes in climate may reflect changes in the deposition of dust and volatiles through impacts, volcanism, changes in resources of ice and dust, and response to Milankovitch-type cycles (changes in eccentricity of orbit, obliquity and precession of axis). Understanding how rapidly such layers can be generated is an important element of understanding Mars climate history. This study uses the observed vertical temperature data and dust content measurements from TES to analyze the sign (gain or loss) of dust at high latitudes.
Hertveldt, Kirsten; Robben, Johan; Volckaert, Guido
Interaction selection by biopanning from a fragmented yeast proteome displayed on filamentous phage particles was successful in identifying proline-rich fragments of Boi1p and Boi2p. These proteins bind to the second "src homology region 3" (SH3) domain of Bem1p, a protein of Saccharomyces cerevisiae involved in bud formation. Target Bem1p was a doubly-tagged recombinant, Bem1([Asn142-Ile551]), which strongly interacts in ELISA with a C-terminal 75-amino-acid polypeptide from Cdc24p exposed on phage. The whole yeast genomic display library contained approximately 7.7 x 10^7 independent clones of sheared S. cerevisiae genomic DNA fused to a truncated M13 gene III. This study corroborates the value of fragmented-proteome display to identify strong and direct interacting protein modules.
Olsen, Kenneth M.; Caicedo, Ana L.; Polato, Nicholas; McClung, Anna; McCouch, Susan; Purugganan, Michael D.
Rice (Oryza sativa) was cultivated by Asian Neolithic farmers >11,000 years ago, and different cultures have selected for divergent starch qualities in the rice grain during and after the domestication process. An intron 1 splice donor site mutation of the Waxy gene is responsible for the absence of amylose in glutinous rice varieties. This mutation appears to have also played an important role in the origin of low amylose, nonglutinous temperate japonica rice varieties, which form a primary component of Northeast Asian cuisines. Waxy DNA sequence analyses indicate that the splice donor mutation is prevalent in temperate japonica rice varieties, but rare or absent in tropical japonica, indica, aus, and aromatic varieties. Sequence analysis across a 500-kb genomic region centered on Waxy reveals patterns consistent with a selective sweep in the temperate japonicas associated with the mutation. The size of the selective sweep (>250 kb) indicates very strong selection in this region, with an inferred selection coefficient that is higher than similar estimates from maize domestication genes or wild species. These findings demonstrate that selection pressures associated with crop domestication regimes can exceed by one to two orders of magnitude those observed for genes under even strong selection in natural systems. PMID:16547098
Moon, Sunjin; Lee, Jin Woo; Shin, Donghyun; Shin, Kwang-Yun; Kim, Jun; Choi, Ik-Young; Kim, Jaemin; Kim, Heebal
Using next-generation sequencing, we conducted a genome-wide scan of selective sweeps associated with selection toward genetic improvement in Thoroughbreds. We investigated potential phenotypic consequence of putative candidate loci by candidate gene association mapping for the finishing time in 240 Thoroughbred horses. We found a significant association with the trait for Ral GApase alpha 2 (RALGAP2) that regulates a variety of cellular processes of signal trafficking. Neighboring genes around RALGAP2, including insulinoma-associated 1 (INSM1), pallid (PLDN), and Ras and Rab interactor 2 (RIN2), have similar roles in signal trafficking, suggesting that a co-evolving gene cluster located on chromosome 22 is under strong artificial selection in racehorses. PMID:26333666
Vamathevan, Jessica J. [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage (United Kingdom); Hall, Matthew D.; Hasan, Samiul; Woollard, Peter M. [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage (United Kingdom); Xu, Meng; Yang, Yulan; Li, Xin; Wang, Xiaoli [BGI-Shenzhen, Shenzhen (China); Kenny, Steve [Safety Assessment, PTS, GlaxoSmithKline, Ware (United Kingdom); Brown, James R. [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Collegeville, PA (United States); Huxley-Jones, Julie [UK Platform Technology Sciences (PTS) Operations and Planning, PTS, GlaxoSmithKline, Stevenage (United Kingdom); Lyon, Jon; Haselden, John [Safety Assessment, PTS, GlaxoSmithKline, Ware (United Kingdom); Min, Jiumeng [BGI-Shenzhen, Shenzhen (China); Sanseau, Philippe [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage (United Kingdom)
Improving drug attrition remains a challenge in pharmaceutical discovery and development. A major cause of early attrition is the demonstration of safety signals which can negate any therapeutic index previously established. Safety attrition needs to be put in context of clinical translation (i.e. human relevance) and is negatively impacted by differences between animal models and human. In order to minimize such an impact, an earlier assessment of pharmacological target homology across animal model species will enhance understanding of the context of animal safety signals and aid species selection during later regulatory toxicology studies. Here we sequenced the genomes of the Sus scrofa Göttingen minipig and the Canis familiaris beagle, two widely used animal species in regulatory safety studies. Comparative analyses of these new genomes with other key model organisms, namely mouse, rat, cynomolgus macaque, rhesus macaque, two related breeds (S. scrofa Duroc and C. familiaris boxer) and human reveal considerable variation in gene content. Key genes in toxicology and metabolism studies, such as the UGT2 family, CYP2D6, and SLCO1A2, displayed unique duplication patterns. Comparisons of 317 known human drug targets revealed surprising variation such as species-specific positive selection, duplication and higher occurrences of pseudogenized targets in beagle (41 genes) relative to minipig (19 genes). These data will facilitate the more effective use of animals in biomedical research. - Highlights: • Genomes of the minipig and beagle dog, two species used in pharmaceutical studies. • First systematic comparative genome analysis of human and six experimental animals. • Key drug toxicology genes display unique duplication patterns across species. • Comparison of 317 drug targets show species-specific evolutionary patterns.
Andreia J Amaral
BACKGROUND: Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can contribute to uncover the genetic basis of phenotypic diversity. METHODOLOGY/MAIN FINDINGS: Genome wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ∼2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain), which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar. CONCLUSIONS/SIGNIFICANCE: These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify
High-altitude hypoxia (reduced inspired oxygen tension due to decreased barometric pressure) exerts severe physiological stress on the human body. Two high-altitude regions where humans have lived for millennia are the Andean Altiplano and the Tibetan Plateau. Populations living in these regions exhibit unique circulatory, respiratory, and hematological adaptations to life at high altitude. Although these responses have been well characterized physiologically, their underlying genetic basis remains unknown. We performed a genome scan to identify genes showing evidence of adaptation to hypoxia. We looked across each chromosome to identify genomic regions with previously unknown function with respect to altitude phenotypes. In addition, groups of genes functioning in oxygen metabolism and sensing were examined to test the hypothesis that particular pathways have been involved in genetic adaptation to altitude. Applying four population genetic statistics commonly used for detecting signatures of natural selection, we identified selection-nominated candidate genes and gene regions in these two populations (Andeans and Tibetans) separately. The Tibetan and Andean patterns of genetic adaptation are largely distinct from one another, with both populations showing evidence of positive natural selection in different genes or gene regions. Interestingly, one gene previously known to be important in cellular oxygen sensing, EGLN1 (also known as PHD2), shows evidence of positive selection in both Tibetans and Andeans. However, the pattern of variation for this gene differs between the two populations. Our results indicate that several key HIF-regulatory and targeted genes are responsible for adaptation to high altitude in Andeans and Tibetans, and several different chromosomal regions are implicated in the putative response to selection. These data suggest a genetic role in high-altitude adaptation and provide a basis for future genotype/phenotype association
Lorenz, Aaron J.; Beissinger, Timothy M.; Silva, Renato Rodrigues; de Leon, Natalia
Maize silage is forage of high quality and yield, and represents the second most important use of maize in the United States. The Wisconsin Quality Synthetic (WQS) maize population has undergone five cycles of recurrent selection for silage yield and composition, resulting in a genetically improved population. The application of high-density molecular markers allows breeders and geneticists to identify important loci through association analysis and selection mapping, as well as to monitor changes in the distribution of genetic diversity across the genome. The objectives of this study were to identify loci controlling variation for maize silage traits through association analysis and the assessment of selection signatures and to describe changes in the genomic distribution of gene diversity through selection and genetic drift in the WQS recurrent selection program. We failed to find any significant marker-trait associations using the historical phenotypic data from WQS breeding trials combined with 17,719 high-quality, informative single nucleotide polymorphisms. Likewise, no strong genomic signatures were left by selection on silage yield and quality in the WQS despite genetic gain for these traits. These results could be due to the genetic complexity underlying these traits, or the role of selection on standing genetic variation. Variation in loss of diversity through drift was observed across the genome. Some large regions experienced much greater loss in diversity than what is expected, suggesting limited recombination combined with small populations in recurrent selection programs could easily lead to fixation of large swaths of the genome. PMID:25645532
Cornelia Di Gaetano
The peculiar position of Sardinia in the Mediterranean sea has rendered its population an interesting biogeographical isolate. The aim of this study was to investigate the genetic population structure, as well as to estimate Runs of Homozygosity and regions under positive selection, using about 1.2 million single nucleotide polymorphisms genotyped in 1077 Sardinian individuals. Using four different methods--fixation index, inflation factor, principal component analysis and ancestry estimation--we were able to highlight, as expected for a genetic isolate, the high internal homogeneity of the island. Sardinians showed a higher percentage of genome covered by RoHs>0.5 Mb (F(RoH%0.5)) when compared to peninsular Italians, with the only exception of the area surrounding Alghero. We furthermore identified 9 genomic regions showing signs of positive selection and we re-captured many previously inferred signals. Other regions harbor novel candidate genes for positive selection, like TMEM252, or regions containing long non-coding RNA. With the present study we confirmed the high genetic homogeneity of Sardinia that may be explained by the shared ancestry combined with the action of evolutionary forces.
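The genome coverage statistic used in this abstract can be sketched as the summed length of runs of homozygosity above a length cutoff divided by the length of the genome surveyed. The function below is a minimal illustration under that assumption; the name `f_roh`, the segment lengths and the genome length are hypothetical, and only the 0.5 Mb cutoff comes from the abstract.

```python
def f_roh(segment_lengths_bp, genome_length_bp, min_length_bp=500_000):
    """Fraction of the genome covered by runs of homozygosity (RoH).

    Only segments strictly longer than `min_length_bp` are counted,
    mirroring the ">0.5 Mb" cutoff quoted in the abstract.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    covered = sum(n for n in segment_lengths_bp if n > min_length_bp)
    return covered / genome_length_bp
```

Multiplying the result by 100 gives the percentage form; the segment calls themselves would come from a separate RoH detection step that is not shown here.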
Heritability of acquired phenotypic traits is an adaptive evolutionary process that appears more complex than the basic allele selection guided by environmental pressure. In insects, the trans-generational transmission of epigenetic marks in clonal and/or sexual species is poorly documented. Aphids were used as a model to explore this feature because their asexual phase generates a stochastic and/or environment-oriented repertoire of variants. The a priori unchanged genome in clonal individuals prompts us to hypothesize whether covalent methyl DNA marks might be associated to the phenotypic variability and fitness selection. The full differential transcriptome between two environmentally selected clonal variants that originated from the same founder mother was mapped on the entire genomic scaffolds, in parallel with the methyl cytosine distribution. Data suggest that the assortments of heavily methylated DNA sites are distinct in these two clonal phenotypes. This might constitute an epigenetic mechanism that confers the robust adaptation of insect species to various environments involving clonal reproduction.
Choi, Bong Hwan; Chai, Han Ha; Cho, Yong Min; Jang, Gul Won; Kim, Tae-Hun; Gondro, Cedric; Lee, Seung Hwan
Korean Hanwoo cattle have been subjected to intensive artificial selection over the past four decades to improve meat production traits. Another three cattle varieties very closely related to Hanwoo reside in Korea (Jeju Black and Brindle) and in China (Yanbian). These breeds have not been part of a breeding scheme to improve production traits. Here, we compare the selected Hanwoo against these similar but presumed to be unselected populations to identify genomic regions that have been under recent selection pressure due to the breeding program. Rsb statistics were used to contrast the genomes of Hanwoo versus a pooled sample of the three unselected populations (UN). We identified 37 significant SNPs (FDR corrected) in the HW/UN comparison and 21 known protein coding genes were within 1 Mb of the identified SNPs. These genes were previously reported to affect traits important for meat production (14 genes), reproduction including mammary gland development (3 genes), coat color (2 genes), and genes affecting behavioral traits in a broader sense (2 genes). We subsequently sequenced (Illumina HiSeq 2000 platform) 10 individuals of the brown Hanwoo and the Chinese Yanbian to identify SNPs within the candidate genomic regions. Based on allele frequency differences, haplotype structures, and literature research, we singled out one non-synonymous SNP in the APP gene (APP: c.569C>T, Ala199Val) and predicted the mutational effect on the protein structure. We found that protein-protein interactions might be impaired due to increased exposed hydrophobic surfaces of the mutated protein. The APP gene has also been reported to affect meat tenderness in pigs and obesity in humans. Meat tenderness has been linked to intramuscular fat content, which is one of the main breeding goals for brown Hanwoo, potentially supporting a causal influence of the herein described nsSNP in the APP gene. PMID:27023061
Xu, Zhichao; Xu, Jiang; Ji, Aijia; Zhu, Yingjie; Zhang, Xin; Hu, Yuanlei; Song, Jingyuan; Chen, Shilin
Quantitative real-time polymerase chain reaction (qRT-PCR) is widely used for the accurate analysis of gene expression. However, high homology among gene families might result in unsuitability of reference genes, which leads to the inaccuracy of qRT-PCR analysis. The release of the Ganoderma lucidum genome has triggered numerous studies to be done on the homology among gene families with the purpose of selecting reliable reference genes. Based on the G. lucidum genome and transcriptome database, 38 candidate reference genes including 28 novel genes were systematically selected and evaluated for qRT-PCR normalization. The result indicated that commonly used polyubiquitin (PUB), beta-actin (BAT), and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) were unsuitable reference genes because of the high sequence similarity and low primer specificity. According to the evaluation of RefFinder, cyclophilin 5 (CYP5) was ranked as the most stable reference gene for 27 tested samples under all experimental conditions and eighteen mycelial samples. Based on sequence analysis and expression analysis, our study suggested that gene characteristic, primer specificity of highly homologous genes, allele-specific expression of candidate genes and under-evaluation of reference genes influenced the accuracy and sensitivity of qRT-PCR analysis. This investigation not only revealed potential factors influencing the unsuitability of reference genes but also selected the superior reference genes from more candidate genes and testing samples than those used in the previous study. Furthermore, our study established a model for reference gene analysis by using the genomic sequence.
We have shown previously that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative enabling exploitation...
Zieba Jennifer T
Background: Genetic interactions within hybrids influence their overall fitness. Understanding the details of these interactions can improve our understanding of speciation. One experimental approach is to investigate deviations from Mendelian expectations (segregation distortion) in the inheritance of mapped genetic markers. In this study, we used the copepod Tigriopus californicus, a species which exhibits high genetic divergence between populations and a general pattern of reduced fitness in F2 interpopulation hybrids. Previous studies have implicated both nuclear-cytoplasmic and nuclear-nuclear interactions in causing this fitness reduction. We identified and mapped population-diagnostic single nucleotide polymorphisms (SNPs) and used these to examine segregation distortion across the genome within F2 hybrids. Results: We generated a linkage map which included 45 newly elucidated SNPs and 8 population-diagnostic microsatellites used in previous studies. The map, the first available for the Copepoda, was estimated to cover 75% of the genome and included markers on all 12 T. californicus chromosomes. We observed little segregation distortion in newly hatched F2 hybrid larvae (fewer than 10% of markers at p ...) ... Conclusion: Adult male F2 hybrids between two populations of T. californicus exhibit dramatic segregation distortion across the genome. Distorted loci are clustered within specific linkage groups, and the direction of distortion differs between chromosomes. This segregation distortion is due to selection acting between hatching and adulthood.
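As a rough illustration of how segregation distortion can be tested at a single codominant marker, the sketch below compares F2 genotype counts with the 1:2:1 Mendelian expectation using a chi-square statistic. The function names, the example counts, and the use of the 5.991 critical value (chi-square, 2 degrees of freedom, alpha = 0.05) are illustrative assumptions, not the data or pipeline of the study.

```python
CRITICAL_2DF_05 = 5.991  # chi-square critical value, 2 df, alpha = 0.05


def chi_square_f2(n_aa, n_ab, n_bb):
    """Chi-square statistic for F2 genotype counts against a 1:2:1 ratio."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    observed = (n_aa, n_ab, n_bb)
    expected = (total * 0.25, total * 0.5, total * 0.25)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_aa, n_ab, n_bb, critical=CRITICAL_2DF_05):
    """True when the marker deviates from 1:2:1 at the chosen critical value."""
    return chi_square_f2(n_aa, n_ab, n_bb) > critical


# Hypothetical counts: an excess of one parental homozygote.
print(chi_square_f2(40, 50, 10), is_distorted(40, 50, 10))
```

In a genome-wide scan the same test would be repeated per marker, so some correction for multiple testing would normally be layered on top of this single-marker check.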
Egan, Scott P; Ragland, Gregory J; Assour, Lauren; Powell, Thomas H Q; Hood, Glen R; Emrich, Scott; Nosil, Patrik; Feder, Jeffrey L
Theory predicts that speciation-with-gene-flow is more likely when the consequences of selection for population divergence transition from mainly direct effects of selection acting on individual genes to a collective property of all selected genes in the genome. Thus, understanding the direct impacts of ecologically based selection, as well as the indirect effects due to correlations among loci, is critical to understanding speciation. Here, we measure the genome-wide impacts of host-associated selection between hawthorn and apple host races of Rhagoletis pomonella (Diptera: Tephritidae), a model for contemporary speciation-with-gene-flow. Allele frequency shifts of 32 455 SNPs induced in a selection experiment based on host phenology were genome wide and highly concordant with genetic divergence between co-occurring apple and hawthorn flies in nature. This striking genome-wide similarity between experimental and natural populations of R. pomonella underscores the importance of ecological selection at early stages of divergence and calls for further integration of studies of eco-evolutionary dynamics and genome divergence. © 2015 The Authors Ecology Letters published by John Wiley & Sons Ltd and CNRS.
Background: Ammonium is one of the major forms in which nitrogen is available for plant growth. OsAMT1;1 is a high-affinity ammonium transporter in rice (Oryza sativa L.), responsible for ammonium uptake at low nitrogen concentration. The expression pattern of the gene has been reported. However, variations in its nucleotides and the evolutionary pathway of its descent from wild progenitors are yet to be elucidated. In this study, nucleotide diversity of the gene OsAMT1;1 and the diversity pattern of seven gene fragments spanning a genomic region approximately 150 kb long surrounding the gene were surveyed by sequencing a panel of 216 rice accessions including both cultivated rice and wild relatives. Results: Nucleotide polymorphism (Pi) of OsAMT1;1 was as low as 0.00004 in cultivated rice (Oryza sativa), only 2.3% of that in the common wild rice (O. rufipogon). A single dominant haplotype was fixed at the locus in O. sativa. The test values for neutrality were significantly negative in the entire region stretching 5' upstream and 3' downstream of the gene in all accessions. The value of linkage disequilibrium remained high across a 100 kb genomic region around OsAMT1;1 in O. sativa, but fell rapidly in O. rufipogon on either side of the promoter of OsAMT1;1, demonstrating strong natural selection within or near the ammonium transporter. Conclusions: The severe reduction in nucleotide variation at OsAMT1;1 in rice was caused by a selective sweep around OsAMT1;1, which may reflect the nitrogen uptake system being under strong selection by the paddy soil during the domestication of rice. Purifying selection also occurred before the wild rice diverged into its two subspecies, namely indica and japonica. These findings provide useful insights into the processes of evolution and domestication of nitrogen uptake genes in rice.
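The nucleotide polymorphism statistic quoted above is commonly computed as the average number of pairwise differences per site among aligned sequences. The sketch below shows that calculation on toy sequences; the function name and the input format (equal-length strings without gaps) are assumptions made for illustration, and the toy data are not from the rice panel.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites over every pair of sequences.
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


# Toy alignment: one variable site among three sequences of length 4.
print(nucleotide_diversity(["ACGT", "ACGT", "ACGA"]))
```

A ratio of two such values, one per population, gives the kind of "only 2.3% of that in wild rice" comparison reported in the abstract.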
Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, makes it difficult to set an overall p-value because of the complicated correlation structure, namely the multiple testing problem, which causes unacceptable false negative results. The number of SNP-SNP pairs far exceeds the sample size, the so-called large p, small n problem, which precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. Results: We adopt a recent method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS), to handle the very large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method provides a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
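The screening idea can be sketched in a few lines: score every SNP-SNP interaction term by a marginal utility and keep only the top-ranked pairs for a later joint model. The sketch below is a simplification under stated assumptions: it uses the absolute Pearson correlation as the marginal score and a product of 0/1/2 genotype codes as the interaction coding, neither of which is claimed to be what EPISIS does, and the names `pearson` and `screen_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_pairs(genotypes, phenotype, keep):
    """Rank SNP pairs by |corr(interaction, phenotype)| and keep the best.

    genotypes: one list per SNP, each holding a 0/1/2 code per individual.
    """
    scores = []
    for i, j in combinations(range(len(genotypes)), 2):
        # Assumed coding: interaction term is the product of the two codes.
        interaction = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        scores.append((abs(pearson(interaction, phenotype)), (i, j)))
    scores.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scores[:keep]]


# Toy data in which the phenotype equals the product of SNP 0 and SNP 1.
snps = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
status = [0, 0, 0, 1, 0, 0, 0, 1]
print(screen_pairs(snps, status, keep=1))
```

Each pair is scored independently of the others, which is what makes this step a natural fit for massively parallel hardware such as the GPGPU setup mentioned in the abstract.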
Gladieux, Pierre; Wilson, Benjamin A; Perraudeau, Fanny; Montoya, Liliam A; Kowbel, David; Hann-Soden, Christopher; Fischer, Monika; Sylvain, Iman; Jacobson, David J; Taylor, John W
Delineating microbial populations, discovering ecologically relevant phenotypes and identifying migrants, hybrids or admixed individuals have long proved notoriously difficult, thereby limiting our understanding of the evolutionary forces at play during the diversification of microbial species. However, recent advances in sequencing and computational methods have enabled an unbiased approach whereby incipient species and the genetic correlates of speciation can be identified by examining patterns of genomic variation within and between lineages. We present here a population genomic study of a phylogenetic species in the Neurospora discreta species complex, based on the resequencing of full genomes (~37 Mb) for 52 fungal isolates from nine sites in three continents. Population structure analyses revealed two distinct lineages in South-East Asia, and three lineages in North America/Europe with a broad longitudinal and latitudinal range and limited admixture between lineages. Genome scans for selective sweeps and comparisons of the genomic landscapes of diversity and recombination provided no support for a role of selection at linked sites on genomic heterogeneity in levels of divergence between lineages. However, demographic inference indicated that the observed genomic heterogeneity in divergence was generated by varying rates of gene flow between lineages following a period of isolation. Many putative cases of exchange of genetic material between phylogenetically divergent fungal lineages have been discovered, and our work highlights the quantitative importance of genetic exchanges between more closely related taxa to the evolution of fungal genomes. Our study also supports the role of allopatric isolation as a driver of diversification in saprobic microbes.
Fernández-Fueyo, Elena; Ruiz-Dueñas, Francisco J; Miki, Yuta; Martínez, María Jesús; Hammel, Kenneth E; Martínez, Angel T
The white-rot fungus Ceriporiopsis subvermispora delignifies lignocellulose with high selectivity, but until now it has appeared to lack the specialized peroxidases, termed lignin peroxidases (LiPs) and versatile peroxidases (VPs), that are generally thought important for ligninolysis. We screened the recently sequenced C. subvermispora genome for genes that encode peroxidases with a potential ligninolytic role. A total of 26 peroxidase genes was apparent after a structural-functional classification based on homology modeling and a search for diagnostic catalytic amino acid residues. In addition to revealing the presence of nine heme-thiolate peroxidase superfamily members and the unexpected absence of the dye-decolorizing peroxidase superfamily, the search showed that the C. subvermispora genome encodes 16 class II enzymes in the plant-fungal-bacterial peroxidase superfamily, where LiPs and VPs are classified. The 16 encoded enzymes include 13 putative manganese peroxidases and one generic peroxidase but most notably two peroxidases containing the catalytic tryptophan characteristic of LiPs and VPs. We expressed these two enzymes in Escherichia coli and determined their substrate specificities on typical LiP/VP substrates, including nonphenolic lignin model monomers and dimers, as well as synthetic lignin. The results show that the two newly discovered C. subvermispora peroxidases are functionally competent LiPs and also suggest that they are phylogenetically and catalytically intermediate between classical LiPs and VPs. These results offer new insight into selective lignin degradation by C. subvermispora.
Jeffrey B. Endelman
Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat grain yield but equivalent for several maize traits.
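To make the contrast between the additive model and a nonadditive kernel concrete, the sketch below builds two similarity matrices from the same marker data: an inner-product (additive) kernel and a Gaussian kernel based on Euclidean distance. It is written in Python rather than R, the marker coding of -1/1, the scaling of the distance by the number of markers, and the bandwidth `theta` are all assumptions for illustration, and the package's own normalization may differ.

```python
from math import exp, sqrt


def linear_kernel(markers):
    """Additive similarity: inner products of marker vectors."""
    return [
        [sum(a * b for a, b in zip(x, y)) for y in markers]
        for x in markers
    ]


def gaussian_kernel(markers, theta=1.0):
    """K[i][j] = exp(-(d_ij / theta)^2), with d_ij a scaled Euclidean distance."""
    n_markers = len(markers[0])

    def distance(x, y):
        # Assumed scaling: divide squared distance by the number of markers.
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n_markers)

    return [
        [exp(-(distance(x, y) / theta) ** 2) for y in markers]
        for x in markers
    ]


# Three hypothetical lines genotyped at three markers, coded -1/1.
lines = [[1, -1, 1], [1, -1, 1], [-1, 1, -1]]
print(linear_kernel(lines))
print(gaussian_kernel(lines, theta=1.0))
```

Either matrix could then play the role of the genetic covariance between lines in a mixed model; the Gaussian form lets similarity decay nonlinearly with genotypic distance.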
Chandonia, John-Marc; Brenner, Steven E.
Structural Genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good financial value, and tractable. In 2003, we presented the ''Pfam5000'' strategy, which involves selecting the 5,000 most important families from the Pfam database as sources for targets. In this update, we show that although both the Pfam database and the number of sequenced genomes have increased in size, the expected benefits of the Pfam5000 strategy have not changed substantially. Solving the structures of proteins from the 5,000 largest Pfam families would allow accurate fold assignment for approximately 65 percent of all prokaryotic proteins (covering 54 percent of residues) and 63 percent of eukaryotic proteins (42 percent of residues). Fewer than 2,300 of the largest families on this list remain to be solved, making the project feasible in the next five years given the expected throughput to be achieved in the production phase of the Protein Structure Initiative.
Nielsen, Karen L.; Godfrey, Paul A.; Stegger, Marc; Andersen, Paal S.; Feldgarden, Michael; Frimodt-Møller, Niels
Identifying and characterizing clonal diversity is important when analysing fecal flora. Using whole genome sequencing, we evaluated random amplified polymorphic DNA (RAPD) PCR as a method for selecting Escherichia coli isolates. RAPD was a fast and reproducible screening method for selecting distinct E. coli clones in fecal swabs. PMID:24912108
Ivanov, A. B.; Byrne, S.; Richardson, M. I.; Vasavada, A. R.; Titus, T. N.; Bell, J. F.; McConnochie, T. H.; Christensen, P. R.
One of the many goals of Martian exploration is to uncover the history of Mars through analysis of the polar layered deposits (PLD). Martian polar ice caps hold most of the exposed water on the surface of Mars, and yet their history and the physical processes involved in their formation are unclear. We will attempt to contribute to our knowledge of the composition and stratigraphy of the polar deposits. In this work we present the latest imaging data acquired by the Mars Odyssey THermal EMission Imaging System (THEMIS) and place it into the context of the Mars Global Surveyor (MGS) data. THEMIS provides capabilities for imaging in both thermal IR and visible color wavelengths. These observations are affected by atmospheric scattering and topography. The Mars Orbiter Laser Altimeter (MOLA) and Thermal Emission Spectrometer (TES) instruments on board the MGS spacecraft can provide context information for THEMIS data. Of particular interest are Mars Orbiter Camera (MOC) images, which provide high resolution data. We are primarily interested in the seasonal evolution of ice cap temperatures during the first northern summer of THEMIS observations. Morphology, stratigraphy and composition of the layered deposits can be addressed by THEMIS VIS color images, along with MOC high resolution data and MOLA Digital Elevation Models (DEM). This work is intentionally descriptive. Based on the knowledge obtained by the orbiting spacecraft and described here, we will attempt to expose major directions for modeling and further understanding of the physical processes involved in the formation of the polar layered terrain. 2 Available data. 2.1 THEMIS IR. The THEMIS IR camera has 10 bands from 6 to 15 µm. Due to signal-to-noise restrictions the most useful band for polar observations is band 9 (12.57 µm). Band 10 (14.88 µm) data can be used for atmospheric calibration. An example of seasonal evolution observed by the THEMIS IR subsystem is shown in Figure 1. We have projected all IR
Nguyen, Thuy T T; Bowman, Phil J; Haile-Mariam, Mekonnen; Pryce, Jennie E; Hayes, Benjamin J
Temperature and humidity levels above a certain threshold decrease milk production in dairy cattle, and genetic variation is associated with the amount of lost production. To enable selection for improved heat tolerance, the aim of this study was to develop genomic estimated breeding values (GEBV) for heat tolerance in dairy cattle. Heat tolerance was defined as the rate of decline in production under heat stress. We combined herd test-day recording data from 366,835 Holstein and 76,852 Jersey cows with daily temperature and humidity measurements from weather stations closest to the tested herds for test days between 2003 and 2013. We used daily mean values of temperature-humidity index averaged for the day of test and the 4 previous days as the measure of heat stress. Tolerance to heat stress was estimated for each cow using a random regression model with a common threshold of temperature-humidity index=60 for all cows. The slope solutions for cows from this model were used to define the daughter trait deviations of their sires. Genomic best linear unbiased prediction was used to calculate GEBV for heat tolerance for milk, fat, and protein yield. Two reference populations were used, the first consisted of genotyped sires only (2,300 Holstein and 575 Jersey sires), and the other included genotyped sires and cows (2,189 Holstein and 1,188 Jersey cows). The remainder of the genotyped sires were used as a validation set. All animals had genotypes for 632,003 single nucleotide polymorphisms. When using only genotyped sires in the reference set and only the first parity data, the accuracy of GEBV for heat tolerance in relation to changes in milk, fat, and protein yield were 0.48, 0.50, and 0.49 in the Holstein validation sires and 0.44, 0.61, and 0.53 in the Jersey validation sires, respectively. Some slight improvement in the accuracy of prediction was achieved when cows were included in the reference population for Holsteins. No clear improvements in the accuracy of
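The heat-stress measure described above can be sketched as three small steps: average the temperature-humidity index (THI) over the test day and the 4 previous days, convert it to a heat load above the threshold of 60, and estimate how yield declines per unit of heat load. The sketch below replaces the study's random regression model with an ordinary least-squares slope for a single cow, which is a deliberate simplification; the function names and all numbers are hypothetical.

```python
def mean_thi(daily_thi, test_day_index, window=5):
    """Mean THI over the test day and the (window - 1) previous days."""
    start = test_day_index - window + 1
    if start < 0:
        raise ValueError("not enough days before the test day")
    days = daily_thi[start:test_day_index + 1]
    return sum(days) / len(days)


def heat_load(thi, threshold=60.0):
    """Units of THI above the threshold; zero below it."""
    return max(0.0, thi - threshold)


def ols_slope(x, y):
    """Least-squares slope of y on x (a stand-in for the random regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


# Hypothetical cow: milk yield (kg) recorded at three heat loads.
loads = [heat_load(t) for t in (60.0, 62.0, 64.0)]
print(ols_slope(loads, [30.0, 29.0, 28.0]))  # decline in kg per THI unit
```

A more negative slope corresponds to a less heat-tolerant animal under this definition.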
Neves Haroldo HR
Background: The availability of high-density panels of SNP markers has opened new perspectives for marker-assisted selection strategies, such that genotypes for these markers are used to predict the genetic merit of selection candidates. Because the number of markers is often much larger than the number of phenotypes, marker effect estimation is not a trivial task. The objective of this research was to compare the predictive performance of ten different statistical methods employed in genomic selection, by analyzing data from a heterogeneous stock mice population. Results: For the five traits analyzed (W6W: weight at six weeks, WGS: growth slope, BL: body length, %CD8+: percentage of CD8+ cells, CD4+/CD8+: ratio between CD4+ and CD8+ cells), within-family predictions were more accurate than across-family predictions, although this superiority in accuracy varied markedly across traits. For within-family prediction, two kernel methods, Reproducing Kernel Hilbert Spaces Regression (RKHS) and Support Vector Regression (SVR), were the most accurate for W6W, while a polygenic model also had comparable performance. A form of ridge regression assuming that all markers contribute to the additive variance (RR_GBLUP) figured among the most accurate for WGS and BL, while two variable selection methods (LASSO and Random Forest, RF) had the greatest predictive abilities for %CD8+ and CD4+/CD8+. RF, RKHS, SVR and RR_GBLUP outperformed the remaining methods in terms of bias and inflation of predictions. Conclusions: Methods with large conceptual differences reached very similar predictive abilities, and a clear re-ranking of methods was observed as a function of the trait analyzed. Variable selection methods were more accurate than the remainder in the case of %CD8+ and CD4+/CD8+, and these traits are likely to be influenced by a smaller number of QTL than the remainder. Judged by their overall performance across traits and computational requirements, RR
Adam H Freedman
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The third top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
José Marcelo Soriano Viana
To date, the quantitative genetics theory for genomic selection has focused mainly on the relationship between marker and additive variances assuming one marker and one quantitative trait locus (QTL). This study extends the quantitative genetics theory to genomic selection in order to prove that prediction of breeding values based on thousands of single nucleotide polymorphisms (SNPs) depends on linkage disequilibrium (LD) between markers and QTLs, assuming dominance. We also assessed the efficiency of genomic selection in relation to phenotypic selection, assuming mass selection in an open-pollinated population, all QTLs of lower effect, and reduced sample size, based on simulated data. We show that the average effect of a SNP substitution is proportional to the LD measure and to the average effect of a gene substitution for each QTL that is in LD with the marker. Weighted (by SNP frequencies) and unweighted breeding value predictors have the same accuracy. Efficiency of genomic selection in relation to phenotypic selection is inversely proportional to heritability. Accuracy of breeding value prediction is not affected by the dominance degree or the method of analysis; however, it is influenced by LD extent and the magnitude of additive variance. Increasing the number of markers asymptotically improved the accuracy of breeding value prediction. Decreasing the sample size from 500 to 200 did not considerably reduce the accuracy of breeding value prediction.
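Because the argument turns on linkage disequilibrium between a marker and a QTL, a small numerical sketch of the standard two-locus LD measures may help. The code below computes the coefficient D (haplotype frequency minus the product of allele frequencies) and the squared correlation r² from phased haplotypes coded 0/1. These are textbook definitions rather than the specific LD measure derived in the study, and the function name and toy haplotypes are hypothetical.

```python
def ld_measures(haplotypes):
    """Return (D, r2) for a marker and a QTL from phased 0/1 haplotypes.

    haplotypes: list of (marker_allele, qtl_allele) tuples.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_marker = sum(m for m, _ in haplotypes) / n
    p_qtl = sum(q for _, q in haplotypes) / n
    p_both = sum(1 for m, q in haplotypes if m == 1 and q == 1) / n
    d = p_both - p_marker * p_qtl
    denominator = p_marker * (1 - p_marker) * p_qtl * (1 - p_qtl)
    # r2 is undefined when either locus is fixed; report 0.0 in that case.
    r2 = d * d / denominator if denominator > 0 else 0.0
    return d, r2


# Complete LD: the marker allele always travels with the same QTL allele.
print(ld_measures([(1, 1), (1, 1), (0, 0), (0, 0)]))
# Linkage equilibrium: all four haplotypes equally frequent.
print(ld_measures([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

When D is zero the marker carries no information about the QTL, which matches the abstract's point that prediction from SNPs depends on LD.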
Abstract: This article proposes a probabilistic framework based on scenario generation to account for the uncertainties in the optimal operation management of microgrids (MGs). The MG contains different renewable energy resources, such as a wind turbine (WT), micro turbine (MT), photovoltaic (PV) unit, and fuel cell (FC), and one battery as the storage device. The proposed framework is based on scenario generation and a roulette wheel mechanism to produce different scenarios for handling the uncertainties of the various factors. It uses the normal distribution function as the probability distribution function of the random variables. The uncertainties considered in this paper are grid bid changes, load demand forecasting error, and PV and WT output power production. It is worth noting that solving the MG problem for the 24 hours of a day while considering diverse uncertainties and different constraints needs a powerful optimization method that converges fast without falling into local optima. Accordingly, a Group Search Optimization (GSO) algorithm is presented to explore the total search space globally. The GSO algorithm is inspired by the group behavior of animals. A modification of the GSO procedure is also proposed for this algorithm. The proposed framework and method are applied to a test grid-connected MG as a typical grid.
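The roulette wheel step can be sketched as follows: a forecast error distribution is discretized into a few levels with associated probabilities, and for each hour a level is drawn with chance proportional to its probability, yielding one scenario. The error levels, their probabilities, the relative-error form of the perturbation, and the function names are assumptions made for illustration; the abstract does not give these details.

```python
import random


def roulette_select(probabilities, rng):
    """Pick an index with chance proportional to its probability."""
    spin = rng.random() * sum(probabilities)
    cumulative = 0.0
    for index, probability in enumerate(probabilities):
        cumulative += probability
        if spin < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point rounding


def generate_scenario(forecast, error_levels, probabilities, rng):
    """One scenario: each hourly forecast perturbed by a sampled relative error."""
    return [
        value * (1.0 + error_levels[roulette_select(probabilities, rng)])
        for value in forecast
    ]


# Hypothetical discretization of a normal forecast error into five levels.
levels = [-0.10, -0.05, 0.0, 0.05, 0.10]
weights = [0.05, 0.25, 0.40, 0.25, 0.05]
rng = random.Random(42)  # seeded so that runs are repeatable
load_forecast_kw = [50.0, 55.0, 60.0]
print(generate_scenario(load_forecast_kw, levels, weights, rng))
```

Repeating the draw many times gives a scenario set over which an optimizer, such as the GSO variant described above, could evaluate expected operating cost.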
Genova, Antonio; Goossens, Sander; Lemoine, Frank G.; Mazarico, Erwan; Neumann, Gregory A.; Smith, David E.; Zuber, Maria T.
We present a spherical harmonic solution of the static gravity field of Mars to degree and order 120, GMM-3, that has been calculated using the Deep Space Network tracking data of the NASA Mars missions, Mars Global Surveyor (MGS), Mars Odyssey (ODY), and the Mars Reconnaissance Orbiter (MRO). We have also jointly determined spherical harmonic solutions for the static and time-variable gravity field of Mars, and the Mars k2 Love numbers, exclusive of the gravity contribution of the atmosphere. Consequently, the retrieved time-varying gravity coefficients and the Love number k2 solely yield seasonal variations in the mass of the polar caps and the solid tides of Mars, respectively. We obtain a Mars Love number k2 of 0.1697 ± 0.0027 (3-σ). The inclusion of MRO tracking data results in improved seasonal gravity field coefficients C30 and, for the first time, C50. Refinements of the atmospheric model in our orbit determination program have allowed us to monitor the odd zonal harmonic C30 for ∼1.5 solar cycles (16 years). This gravity model shows improved correlations with MOLA topography up to 15% larger at higher harmonics (l = 60-80) than previous solutions.
Leplat, Florian Jean Victor
-wide association study (GWAS) and chlorophyll a fluorescence phenotyping allowed the identification of several QTLs involved in the plant response to Mn deficiency. Multiple candidate coding genes were found, among which were the photosystem II PsbP subunit, germin-like proteins, and Mn-superoxide dismutase. It supports the Mn...... functionality in Mn dependent pathways and processes. In the second step, a genuine statistical method to assist breeding programs in selecting new varieties, named Genomic Selection (GS), was applied. It was demonstrated that GS is an effective tool to be used in breeding programs for selecting more...
Fernandez-Fueyo, Elena; Ruiz-Duenas, Francisco J.; Ferreira, Patrica; Floudas, Dimitrios; HIbbett, David S.; Canessa, Paulo; Larrondo, Luis F.; James, Tim Y.; Seelenfreund, Daniela; Lobos, Sergio; Polanco, Ruben; Tello, Mario; Honda, Yoichi; Watanabe, Takahito; Watanabe, Takashi; Ryu, Jae San; Kubicek, Christian P.; Schmoll, Monika; Gaskell, Jill; Hammel, Kenneth E.; John, Franz J.; Vanden Wymelenberg, Amber; Sabat, Grzegorz; Splinter BonDurant, Sandra; Syed, Khajamohiddin; Yadav, Jagjit S.; Doddapaneni, Harshavardhan; Subramanian, Venkataramanan; Lavin, Jose L.; Oguiza, Jose A.; Perez, Gumer; Pisabarro, Antonio G.; Ramirez, Lucia; Santoyo, Francisco; Master, Emma; Coutinho, Pedro M.; Henrissat, Bernard; Lombard, Vincent; Magnuson, Jon Karl; Kues, Ursula; Hori, Chiaki; Igarashi, Kiyohiko; Samejima, Masahiro; Held, Benjamin W.; Barry, Kerrie W.; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lucas, Susan M.; Riley, Robert; Salamov, Asaf A.; Hoffmeister, Dirk; Schwenk, Daniel; Hadar, Yitzhak; Yarden, Oded; de Vries, Ronald P.; Wiebenga, Ad; Stenlid, Jan; Eastwood, Daniel; Grigoriev, Igor V.; Berka, Randy M.; Blanchette, Robert A.; Kersten, Phil; Martinez, Angel T.; Vicuna, Rafael; Cullen, Dan
Efficient lignin depolymerization is unique to the wood decay basidiomycetes, collectively referred to as white rot fungi. Phanerochaete chrysosporium simultaneously degrades lignin and cellulose, whereas the closely related species, Ceriporiopsis subvermispora, also depolymerizes lignin but may do so with relatively little cellulose degradation. To investigate the basis for selective ligninolysis, we conducted comparative genome analysis of C. subvermispora and P. chrysosporium. Genes encoding manganese peroxidase (MnP) numbered 13 and five in C. subvermispora and P. chrysosporium, respectively. In addition, the C. subvermispora genome contains at least seven genes predicted to encode laccases, whereas the P. chrysosporium genome contains none. We also observed expansion of the number of C. subvermispora desaturase-encoding genes putatively involved in lipid metabolism. Microarray-based transcriptome analysis showed substantial up-regulation of several desaturase and MnP genes in wood-containing medium. MS identified MnP proteins in C. subvermispora culture filtrates, but none in P. chrysosporium cultures. These results support the importance of MnP and a lignin degradation mechanism whereby cleavage of the dominant nonphenolic structures is mediated by lipid peroxidation products. Two C. subvermispora genes were predicted to encode peroxidases structurally similar to P. chrysosporium lignin peroxidase and, following heterologous expression in Escherichia coli, the enzymes were shown to oxidize high redox potential substrates, but not Mn2+. Apart from oxidative lignin degradation, we also examined cellulolytic and hemicellulolytic systems in both fungi. In summary, the C. subvermispora genetic inventory and expression patterns exhibit increased oxidoreductase potential and diminished cellulolytic capability relative to P. chrysosporium.
Havird, Justin C; Sloan, Daniel B
Eukaryotes rely on proteins encoded by the nuclear and mitochondrial (mt) genomes, which interact within multisubunit complexes such as oxidative-phosphorylation (OXPHOS) enzymes. Although selection is thought to be less efficient on the asexual mt genome, in bilaterian animals the ratio of nonsynonymous to synonymous substitutions (ω) is lower in mt- compared with nuclear-encoded OXPHOS subunits, suggesting stronger effects of purifying selection in the mt genome. Because high levels of gene expression constrain protein sequence evolution, one proposed resolution to this paradox is that mt genes are expressed more highly than nuclear genes. To test this hypothesis, we investigated expression and sequence evolution of mt and nuclear genes from 84 diverse eukaryotes that vary in mt gene content and mutation rate. We found that the relationship between mt and nuclear ω values varied dramatically across eukaryotes. In contrast, transcript abundance is consistently higher for mt genes than nuclear genes, regardless of which genes happen to be in the mt genome. Consequently, expression levels cannot be responsible for the differences in ω. Rather, 84% of the variance in the ratio of ω values between mt and nuclear genes could be explained by differences in mutation rate between the two genomes. We relate these findings to the hypothesis that high rates of mt mutation select for compensatory changes in the nuclear genome. We also propose an explanation for why mt transcripts consistently outnumber their nuclear counterparts, with implications for mitonuclear protein imbalance and aging.
Larkin, Denis M; Daetwyler, Hans D; Hernandez, Alvaro G; Wright, Chris L; Hetrick, Lorie A; Boucek, Lisa; Bachman, Sharon L; Band, Mark R; Akraiko, Tatsiana V; Cohen-Zinder, Miri; Thimmapuram, Jyothi; Macleod, Iona M; Harkins, Timothy T; McCague, Jennifer E; Goddard, Michael E; Hayes, Ben J; Lewin, Harris A
Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief ("Chief") and his son Walkway Chief Mark ("Mark"), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief's DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor's alleles that have been subjected to artificial selection.
Dedukh, Dmitry; Litvinchuk, Spartak; Rosanov, Juriy; Mazepa, Glib; Saifitdinova, Alsu; Shabanov, Dmitry; Krasikova, Alla
Incompatibilities between parental genomes decrease the viability of interspecific hybrids; however, deviations from canonical gametogenesis such as genome endoreplication and elimination can rescue hybrid organisms. To evaluate the frequency and regularity of genome elimination and endoreplication during gametogenesis in hybrid animals of different ploidy, we examined genome composition in oocytes of di- and triploid hybrid frogs of the Pelophylax esculentus complex. The results suggest that during oogenesis endoreplication involves all genomes and occurs before selective genome elimination. We accepted the hypothesis that only the elimination of one copied genome occurs premeiotically in most triploid hybrid females. At the same time, we rejected the hypothesis that the genome of the parental species with which the hybrid frogs co-exist is always eliminated during oogenesis in diploid hybrids. Diploid hybrid frogs show a higher frequency of deviations in oogenesis than triploid hybrids. The deviations in gametogenesis typical of hybrid frogs increase the variability of the gametes produced and provide a mechanism for the appearance of different hybrid forms.
Liang, Gang; Zhang, Huimin; Lou, Dengji; Yu, Diqiu
The CRISPR/Cas9-sgRNA system has been developed to mediate genome editing and has become a powerful tool for biological research. Employing the CRISPR/Cas9-sgRNA system for genome editing and manipulation has accelerated research and expanded researchers' ability to generate genetic models. However, a method for evaluating the efficiency of sgRNAs in plants is lacking. Based on the nucleotide compositions and secondary structures of sgRNAs that have been experimentally validated in plants, we established criteria for designing efficient sgRNAs. To facilitate the assembly of multiple sgRNA cassettes, we also developed a new strategy to rapidly construct a CRISPR/Cas9-sgRNA system for multiplex editing in plants. In theory, up to ten single guide RNA (sgRNA) cassettes can be simultaneously assembled into the final binary vectors. As a proof of concept, 21 sgRNAs complying with the criteria were designed and the corresponding Cas9/sgRNA expression vectors were constructed. Sequencing analysis of transgenic rice plants suggested that 82% of the desired target sites were edited by deletion, insertion, substitution, or inversion, indicating high editing efficiency. This work provides a convenient approach for selecting efficient sgRNAs for target editing.
Derringer, Jaime; Corley, Robin P.; Haberstick, Brett C.; Young, Susan E.; Demmitt, Brittany; Howrigan, Daniel P.; Kirkpatrick, Robert M.; Iacono, William G.; McGue, Matt; Keller, Matthew; Brown, Sandra; Tapert, Susan; Hopfer, Christian J.; Stallings, Michael C.; Crowley, Thomas J.; Rhee, Soo Hyun; Krauter, Ken; Hewitt, John K.; McQueen, Matthew B.
Behavioral disinhibition (BD) is a quantitative measure designed to capture the heritable variation encompassing risky and impulsive behaviors. As a result, BD represents an ideal target for discovering genetic loci that predispose individuals to a wide range of antisocial behaviors and substance misuse that together represent a large cost to society as a whole. Published genome-wide association studies (GWAS) have examined specific phenotypes that fall under the umbrella of BD (e.g. alcohol dependence, conduct disorder); however no GWAS has specifically examined the overall BD construct. We conducted a GWAS of BD using a sample of 1,901 adolescents over-selected for characteristics that define high BD, such as substance and antisocial behavior problems, finding no individual locus that surpassed genome-wide significance. Although no single SNP was significantly associated with BD, restricted maximum likelihood analysis estimated that 49.3% of the variance in BD within the Caucasian sub-sample was accounted for by the genotyped SNPs (p=0.06). Gene-based tests identified seven genes associated with BD (p≤2.0×10−6). Although the current study was unable to identify specific SNPs or pathways with replicable effects on BD, the substantial sample variance that could be explained by all genotyped SNPs suggests that larger studies could successfully identify common variants associated with BD. PMID:25637581
Liao, Xiaoping; Peng, Fred; Forni, Selma; McLaren, David; Plastow, Graham; Stothard, Paul
Genetic variation in Gir cattle (Bos indicus) has so far not been well characterized. In this study, we used whole genome sequencing of three Gir bulls and a pooled sample from another 11 bulls to identify polymorphisms and loci under selection. A total of 9 990 733 single nucleotide polymorphisms (SNPs) and 604 308 insertion/deletions (indels) were discovered in Gir samples, of which 62.34% and 83.62%, respectively, are previously unknown. Moreover, we detected 79 putative selective sweeps using the sequence data of the pooled sample. One of the most striking sweeps harbours several genes belonging to the cathelicidin gene family, such as CAMP, CATHL1, CATHL2, and CATHL3, which are related to pathogen- and parasite-resistance. Another interesting region harbours genes encoding mitogen-activated protein kinases, which are involved in directing cellular responses to a variety of stimuli, such as osmotic stress and heat shock. These findings are particularly interesting because Gir is resistant to hot temperatures and tropical diseases. This initial selective sweep analysis of Gir cattle has revealed a number of loci that could be important for their adaptation to tropical climates.
Olson, Erik D; Cantara, William A; Musier-Forsyth, Karin
Two copies of unspliced human immunodeficiency virus (HIV)-1 genomic RNA (gRNA) are preferentially selected for packaging by the group-specific antigen (Gag) polyprotein into progeny virions as a dimer during the late stages of the viral lifecycle. Elucidating the RNA features responsible for selective recognition of the full-length gRNA in the presence of an abundance of other cellular RNAs and spliced viral RNAs remains an area of intense research. The recent nuclear magnetic resonance (NMR) structure by Keane et al.  expands upon previous efforts to determine the conformation of the HIV-1 RNA packaging signal. The data support a secondary structure wherein sequences that constitute the major splice donor site are sequestered through base pairing, and a tertiary structure that adopts a tandem 3-way junction motif that exposes the dimerization initiation site and unpaired guanosines for specific recognition by Gag. While it remains to be established whether this structure is conserved in the context of larger RNA constructs or in the dimer, this study serves as the basis for characterizing large RNA structures using novel NMR techniques, and as a major advance toward understanding how the HIV-1 gRNA is selectively packaged.
Background: Identification of causal SNPs in most genome-wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account. Hence, modern, computationally expensive regression methods are increasingly employed for SNP selection; these consider all markers simultaneously and thus incorporate dependencies among SNPs. Results: We develop a novel multivariate algorithm for large-scale SNP selection using CAR score regression, a promising new approach for prioritizing biomarkers. Specifically, we propose a computationally efficient procedure for shrinkage estimation of CAR scores from high-dimensional data. Subsequently, we conduct a comprehensive comparison study including five advanced regression approaches (boosting, lasso, NEG, MCP, and CAR score) and a univariate approach (marginal correlation) to determine the effectiveness in finding true causal SNPs. Conclusions: Simultaneous SNP selection is a challenging task. We demonstrate that our CAR score-based algorithm consistently outperforms all competing approaches, both uni- and multivariate, in terms of correctly recovered causal SNPs and SNP ranking. An R package implementing the approach, as well as R code to reproduce the complete study presented here, is available from http://strimmerlab.org/software/care/.
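To make the contrast between marginal correlations and CAR scores concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition of CAR scores as the marginal correlations multiplied by the inverse square root of the predictor correlation matrix, restricts itself to two predictors so that the matrix square root has a closed form, and uses plain empirical correlations rather than the shrinkage estimator described in the abstract. The function names `pearson` and `car_scores_two_predictors` are hypothetical.

```python
import math


def pearson(x, y):
    """Empirical Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def car_scores_two_predictors(x1, x2, y):
    """CAR scores for two predictors: P^(-1/2) times the marginal correlations.

    For P = [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r, so the
    inverse square root is [[a, b], [b, a]] with a and b as computed below.
    """
    r = pearson(x1, x2)
    if abs(r) >= 1.0:
        raise ValueError("predictors are perfectly correlated")
    r1y, r2y = pearson(x1, y), pearson(x2, y)
    a = 0.5 * (1 / math.sqrt(1 + r) + 1 / math.sqrt(1 - r))
    b = 0.5 * (1 / math.sqrt(1 + r) - 1 / math.sqrt(1 - r))
    return (a * r1y + b * r2y, b * r1y + a * r2y)


# Illustrative toy data (made up for this sketch).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1.0, 2.5, 2.0, 4.5, 4.0, 6.5]
print(car_scores_two_predictors(x1, x2, y))
```

Two properties make the scores useful for ranking: when the predictors are uncorrelated they reduce to the marginal correlations, and their squares sum to the coefficient of determination of the linear model.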
Treppmann, Tabea; Ickstadt, Katja; Zucknick, Manuela
Bayesian variable selection becomes more and more important in statistical analyses, in particular when performing variable selection in high dimensions. For survival time models and in the presence of genomic data, the state of the art is still quite unexploited. One of the more recent approaches suggests a Bayesian semiparametric proportional hazards model for right censored time-to-event data. We extend this model to directly include variable selection, based on a stochastic search procedure within a Markov chain Monte Carlo sampler for inference. This equips us with an intuitive and flexible approach and provides a way for integrating additional data sources and further extensions. We make use of the possibility of implementing parallel tempering to help improve the mixing of the Markov chains. In our examples, we use this Bayesian approach to integrate copy number variation data into a gene-expression-based survival prediction model. This is achieved by formulating an informed prior based on copy number variation. We perform a simulation study to investigate the model's behavior and prediction performance in different situations before applying it to a dataset of glioblastoma patients and evaluating the biological relevance of the findings.
Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents (Np), but little is known about how Np affects genomic selection (GS) in RS, especially the persistency of prediction accuracy ($r_{g,\hat{g}}$) and genetic gain. Synthetics were simulated by intermating Np = 2–32 parent lines from an ancestral population with short- or long-range linkage disequilibrium (LDA) and subjected to multiple cycles of GS. We determined $r_{g,\hat{g}}$ and genetic gain across 30 cycles for different training set (TS) sizes, marker densities, and generations of recombination before model training. Contributions to $r_{g,\hat{g}}$ and genetic gain from pedigree relationships, as well as from cosegregation and LDA between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between TS and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of $r_{g,\hat{g}}$ was high for small Np, where predominantly cosegregation contributed to $r_{g,\hat{g}}$, but also for large Np, where LDA replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing Np > 4, given long-range LDA in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to $r_{g,\hat{g}}$ for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger TS size (NTS) and higher marker density improved persistency of $r_{g,\hat{g}}$ and hence genetic gain, but additional recombinations could not increase genetic gain.
Washietl, Stefan; Pedersen, Jakob Skou; Korbel, Jan O
Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding ... and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions...
Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L
In this study, we investigated at the genome-wide level whether 20 years of artificial directional selection, based on boar genetic evaluation obtained with a classical BLUP animal model, shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars into eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a PBonferroni [...] genome. The largest proportion of the 493 SNPs was on porcine chromosome (SSC) 7, SSC2, SSC8 and SSC18, for a total of 204 haploblocks. Functional annotations of genomic regions, including the 493 shifted SNPs, reported a few Gene Ontology terms that might underlie the biological processes that contributed to the increased performance of the pigs over the 20 years of the selection program. The results indicate that the genome of the Italian Large White pigs was shaped by a directional selection program based on methodologies that assume the infinitesimal model, which captured a continuous trend of allele frequency changes in the boar population.
Lo, Chiao-Ling; Lossie, Amy C; Liang, Tiebing; Liu, Yunlong; Xuei, Xiaoling; Lumeng, Lawrence; Zhou, Feng C; Muir, William M
Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a corresponding high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), was classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169), with the fewest in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits.
Lai, Y. H.; He, Q. L. [Nano Science and Nano Technology Program, The Hong Kong University of Science and Technology, HKSAR, People's Republic of China; Department of Physics and William Mong Institute of Nano Science and Technology, The Hong Kong University of Science and Technology, HKSAR, People's Republic of China]; Cheung, W. Y.; Lok, S. K.; Wong, K. S.; Sou, I. K. [Department of Physics and William Mong Institute of Nano Science and Technology, The Hong Kong University of Science and Technology, HKSAR, People's Republic of China]; Ho, S. K. [Faculty of Science and Technology, University of Macau, Macau, People's Republic of China]; Tam, K. W. [Department of Electrical and Electronics Engineering, University of Macau, Macau, People's Republic of China]
MgS grown by molecular beam epitaxy on a GaAs(111)B substrate adopted the wurtzite phase, as demonstrated by detailed structural characterization. Phenomenological arguments were used to explain why the wurtzite phase is preferred over the zincblende phase or the most stable rocksalt phase. Results of photoresponse and reflectance measurements performed on wurtzite MgS photodiodes suggest a direct bandgap at around 5.1 eV. Their response peaks at 245 nm with a quantum efficiency of 9.9% and shows rejection of more than three orders of magnitude at 320 nm and close to five orders at longer wavelengths, making the photodiodes highly competitive for solar-blind ultraviolet detection.
Giacani, Lorenzo; Chattopadhyay, Sujay; Centurion-Lara, Arturo; Jeffrey, Brendan M; Le, Hoavan T; Molini, Barbara J; Lukehart, Sheila A; Sokurenko, Evgeni V; Rockey, Daniel D
In the rabbit model of syphilis, infection phenotypes associated with the Nichols and Chicago strains of Treponema pallidum (T. pallidum), though similar, are not identical. Between these strains, significant differences are found in expression of, and antibody responses to some candidate virulence factors, suggesting the existence of functional genetic differences between isolates. The Chicago strain genome was therefore sequenced and compared to the Nichols genome, available since 1998. Initial comparative analysis suggested the presence of 44 single nucleotide polymorphisms (SNPs), 103 small (≤3 nucleotides) indels, and 1 large (1204 bp) insertion in the Chicago genome with respect to the Nichols genome. To confirm the above findings, Sanger sequencing was performed on most loci carrying differences using DNA from Chicago and the Nichols strain used in the original T. pallidum genome project. A majority of the previously identified differences were found to be due to errors in the published Nichols genome, while the accuracy of the Chicago genome was confirmed. However, 20 SNPs were confirmed between the two genomes, and 16 (80.0%) were found in coding regions, with all being of non-synonymous nature, strongly indicating action of positive selection. Sequencing of 16 genomic loci harboring SNPs in 12 additional T. pallidum strains, (SS14, Bal 3, Bal 7, Bal 9, Sea 81-3, Sea 81-8, Sea 86-1, Sea 87-1, Mexico A, UW231B, UW236B, and UW249C), was used to identify "Chicago-" or "Nichols -specific" differences. All but one of the 16 SNPs were "Nichols-specific", with Chicago having identical sequences at these positions to almost all of the additional strains examined. These mutations could reflect differential adaptation of the Nichols strain to the rabbit host or pathoadaptive mutations acquired during human infection. Our findings indicate that SNPs among T. pallidum strains emerge under positive selection and, therefore, are likely to be functional in nature.
Mateu, I. [Université de Toulouse, UPS-OMP, IRAP, Toulouse (France); CNRS, IRAP, 9 Av. colonel Roche, BP 44346, F-31028 Toulouse cedex 4 (France)]; Medina, P. [IPHC, IN2P3 – CNRS/Université Louis Pasteur, 23 rue du Loess, PB28, Strasbourg Cedex 2, F67037 (France)]; Roques, J.P. [Université de Toulouse, UPS-OMP, IRAP, Toulouse (France); CNRS, IRAP, 9 Av. colonel Roche, BP 44346, F-31028 Toulouse cedex 4 (France)]; Jourdain, E. [Université de Toulouse, UPS-OMP, IRAP, Toulouse (France); CNRS, IRAP, 9 Av. colonel Roche, BP 44346, F-31028 Toulouse cedex 4 (France)]
This paper presents Multi Geometry Simulation (MGS), software intended for characterizing the signal response of solid state detectors. Its main feature is the calculation of the pulse shapes induced at the electrodes of the detector by a photon–semiconductor interaction occurring at a specific position inside the detector volume. The program uses numerical methods to simulate the drift of the charge carriers generated by the interaction, as the movement of these particles induces the signal used for detection on the electrodes. After a description of the tool fundamentals, an example application is presented in which MGS was used to simulate a High Purity Germanium (HPGe) double sided strip detector conceived for hard X-ray astronomy. Simulated and measured pulse shapes are compared for interactions occurring at different depths in the detector volume. The comparison focuses on the difference in time of arrival between the anode and cathode pulses, as this measure allows, together with the X/Y information retrieved from the strips, a 3D determination of the photon interaction point, which is an important feature of the detector. Good agreement between simulations and measurements is obtained, with a discrepancy of less than 0.5 mm between the measured and the simulated depth of the interaction, for an 11 mm thick detector. Highlights: • Description of MGS, a tool for the synthesis of the signal response of solid state detectors. • Validation of the simulator through comparison with measurements on a DSSD prototype. • Discussion on the advantages, drawbacks and possible evolutions of MGS.
Background: The increasing number of genomic sequences of bacteria makes it possible to select unique SNPs of a particular strain/species at the whole genome level and thus design specific primers based on the SNPs. The high similarity of genomic sequences among phylogenetically related bacteria requires the identification of the few loci in the genome that can serve as unique markers for strain differentiation. PrimerSNP attempts to identify reliable strain-specific markers, on which specific primers are designed for pathogen detection. Results: PrimerSNP is an online tool to design primers based on strain-specific SNPs for multiple strains/species of microorganisms at the whole genome level. The allele-specific primers can distinguish query sequences of one strain from other homologous sequences by standard PCR. Additionally, PrimerSNP provides a feature for designing common primers that can amplify all the homologous sequences of multiple strains/species of microorganisms. PrimerSNP is freely available at http://cropdisease.ars.usda.gov/~primer. Conclusion: PrimerSNP is a high-throughput specific primer generation tool for the differentiation of phylogenetically related strains/species. Experimental validation showed that this software had a successful prediction rate of 80.4–100% for strain-specific primer design.
Fleming, Damarius S; Weigend, Steffen; Simianer, Henner; Weigend, Annett; Rothschild, Max; Schmidt, Carl; Ashwell, Chris; Persia, Mike; Reecy, James; Lamont, Susan J
Global climate change is increasing the magnitude of environmental stressors, such as temperature, pathogens, and drought, that limit the survivability and sustainability of livestock production. Poultry production and its expansion is dependent upon robust animals that are able to cope with stressors in multiple environments. Understanding the genetic strategies that indigenous, noncommercial breeds have evolved to survive in their environment could help to elucidate molecular mechanisms underlying biological traits of environmental adaptation. We examined poultry from diverse breeds and climates of Africa and Northern Europe for selection signatures that have allowed them to adapt to their indigenous environments. Selection signatures were studied using a combination of population genomic methods that employed FST , integrated haplotype score (iHS), and runs of homozygosity (ROH) procedures. All the analyses indicated differences in environment as a driver of selective pressure in both groups of populations. The analyses revealed unique differences in the genomic regions under selection pressure from the environment for each population. The African chickens showed stronger selection toward stress signaling and angiogenesis, while the Northern European chickens showed more selection pressure toward processes related to energy homeostasis. The results suggest that chromosomes 2 and 27 are the most diverged between populations and the most selected upon within the African (chromosome 27) and Northern European (chromosome 2) birds. Examination of the divergent populations has provided new insight into genes under possible selection related to tolerance of a population's indigenous environment that may be baselines for examining the genomic contribution to tolerance adaptions. Copyright © 2017 Fleming et al.
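Of the three statistics named above, FST is the simplest to illustrate. The abstract does not say which estimator was used, so the following is only a sketch of the basic Wright definition for a single biallelic locus with equally weighted populations, FST = (HT − HS) / HT. The function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(freqs):
    """Wright's FST at one biallelic locus from per-population allele frequencies.

    HS is the mean expected heterozygosity within populations and HT is the
    expected heterozygosity of the pooled population (equal weights assumed).
    Returns 0.0 for a locus that is monomorphic overall.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = sum(freqs) / len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations.
print(wright_fst([0.2, 0.8]))
```

Identical frequencies give a value of zero and fixed alternative alleles give a value of one; genome scans typically look for windows where the statistic is unusually high relative to the genome-wide distribution.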
Caron, Nicholas; Tokaryk, Dennis W.; Adam, Allan G.; Linton, Colan
The spectra of some astrophysical sources contain signatures from molecules containing magnesium or sulphur atoms. Therefore, we have extended previous studies of the diatomic molecule MgS, which is a possible candidate for astrophysical detection. Microwave spectra of X^1Σ^+, the ground electronic state, were reported in 1989 and 1997, and the B^1Σ^+-X^1Σ^+ electronic absorption spectrum in the blue was last studied in 1970. We have investigated the B^1Σ^+-X^1Σ^+ 0-0 spectrum of MgS at high resolution under jet-cooled conditions in a laser-ablation molecular source, and have obtained laser-induced fluorescence spectra from four isotopologues. Dispersed fluorescence from this source identified the low-lying A^1Π state near 4520 cm^-1. We also created MgS in a Broida oven, with the help of a stream of activated nitrogen, and took rotationally resolved dispersed fluorescence spectra of the B^1Σ^+-A^1Π transition with a grating spectrometer by laser excitation of individual rotational levels of the B^1Σ^+ state via the B^1Σ^+-X^1Σ^+ transition. These spectra provide a first observation and analysis of the A^1Π state. S. Takano, S. Yamamoto and S. Saito, Chem. Phys. Lett. 159, 563-566 (1989); K. A. Walker and M. C. L. Gerry, J. Mol. Spectrosc. 182, 178-183 (1997); M. Marcano and R. F. Barrow, Trans. Faraday Soc. 66, 2936-2938 (1970).
Cushing, G.E.; Titus, T.N.
Temperatures of the Arsia Mons caldera floor and two nearby control areas were obtained by the Mars Global Surveyor (MGS) Thermal Emission Spectrometer (TES). These observations revealed that the Arsia Mons caldera floor exhibits thermal behavior different from the surrounding Tharsis region when compared with thermal models. Our technique compares modeled and observed data to determine best fit values of thermal inertia, layer depth, and albedo. Best fit modeled values are accurate in the two control regions, but those in the Arsia Mons' caldera are consistently either up to 15 K warmer than afternoon observations, or have albedo values that are more than two standard deviations higher than the observed mean. Models of both homogeneous and layered (such as dust over bedrock) cases were compared, with layered-cases indicating a surface layer at least thick enough to insulate itself from diurnal effects of an underlying substrate material. Because best fit models of the caldera floor poorly match observations, it is likely that the caldera floor experiences some physical process not incorporated into our thermal model. Even on Mars, Arsia Mons is an extreme environment where CO2 condenses upon the caldera floor every night, diurnal temperatures range each day by a factor of nearly 2, and annual average atmospheric pressure is only around one millibar. Here, we explore several possibilities that may explain the poor modeled fits to caldera floor and conclude that temperature dependent thermal conductivity may cause thermal inertia to vary diurnally, and this effect may be exaggerated by presence of water-ice clouds, which occur frequently above Arsia Mons. Copyright 2008 by the American Geophysical Union.
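The closing argument rests on the standard definition of thermal inertia, I = sqrt(k ρ c), where k is thermal conductivity, ρ is bulk density and c is specific heat. A minimal sketch of why a temperature-dependent conductivity would make thermal inertia vary over a day; the material values below are hypothetical placeholders and are not taken from the paper:

```python
import math


def thermal_inertia(k: float, rho: float, c: float) -> float:
    """Thermal inertia sqrt(k * rho * c) in J m^-2 K^-1 s^-1/2.

    k: thermal conductivity (W m^-1 K^-1)
    rho: bulk density (kg m^-3)
    c: specific heat (J kg^-1 K^-1)
    """
    if k < 0 or rho < 0 or c < 0:
        raise ValueError("material properties must be non-negative")
    return math.sqrt(k * rho * c)


# Hypothetical dust-like values, for illustration only.
RHO, C = 1500.0, 800.0
night = thermal_inertia(0.02, RHO, C)  # lower conductivity when cold
day = thermal_inertia(0.04, RHO, C)    # higher conductivity when warm
print(f"night: {night:.0f}, day: {day:.0f}")
```

Because thermal inertia scales with the square root of conductivity, doubling k raises I by a factor of about 1.41, so a fixed-inertia model would misfit observations taken at different times of day.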
Ferchaud, Anne-Laure; Hansen, Michael M
Heterogeneous genomic divergence between populations may reflect selection, but should also be seen in conjunction with gene flow and drift, particularly population bottlenecks. Marine and freshwater three-spine stickleback (Gasterosteus aculeatus) populations often exhibit different lateral armour plate morphs. Moreover, strikingly parallel genomic footprints across different marine-freshwater population pairs are interpreted as parallel evolution and gene reuse. Nevertheless, in some geographic regions like the North Sea and Baltic Sea, different patterns are observed. Freshwater populations in coastal regions are often dominated by marine morphs, suggesting that gene flow overwhelms selection, and genomic parallelism may also be less pronounced. We used RAD sequencing for analysing 28 888 SNPs in two marine and seven freshwater populations in Denmark, Europe. Freshwater populations represented a variety of environments: river populations accessible to gene flow from marine sticklebacks and large and small isolated lakes with and without fish predators. Sticklebacks in an accessible river environment showed minimal morphological and genomewide divergence from marine populations, supporting the hypothesis of gene flow overriding selection. Allele frequency spectra suggested bottlenecks in all freshwater populations, and particularly two small lake populations. However, genomic footprints ascribed to selection could nevertheless be identified. No genomic regions were consistent freshwater-marine outliers, and parallelism was much lower than in other comparable studies. Two genomic regions previously described to be under divergent selection in freshwater and marine populations were outliers between different freshwater populations. We ascribe these patterns to stronger environmental heterogeneity among freshwater populations in our study as compared to most other studies, although the demographic history involving bottlenecks should also be considered in the
Greco, Ermanno; Bono, Sara; Ruberti, Alessandra; Lobascio, Anna Maria; Greco, Pierfrancesco; Biricik, Anil; Spizzichino, Letizia; Greco, Alessia; Tesarik, Jan; Minasi, Maria Giulia; Fiorentino, Francesco
The aim of this study is to determine if the use of preimplantation genetic screening (PGS) by array comparative genomic hybridization (array CGH) and transfer of a single euploid blastocyst in patients with repeated implantation failure (RIF) can improve clinical results. Three patient groups are compared: 43 couples with RIF for whom embryos were selected by array CGH (group RIF-PGS), 33 couples with the same history for whom array CGH was not performed (group RIF NO PGS), and 45 good prognosis infertile couples with array CGH selected embryos (group NO RIF PGS). A single euploid blastocyst was transferred in groups RIF-PGS and NO RIF PGS. Array CGH was not performed in group RIF NO PGS in which 1-2 blastocysts were transferred. One monoembryonic sac with heartbeat was found in 28 patients of group RIF PGS and 31 patients of group NO RIF PGS showing similar clinical pregnancy and implantation rates (68.3% and 70.5%, resp.). In contrast, an embryonic sac with heartbeat was only detected in 7 (21.2%) patients of group RIF NO PGS. In conclusion, PGS by array CGH with single euploid blastocyst transfer appears to be a successful strategy for patients with multiple failed IVF attempts. PMID:24779011
James, P; Halladay, J; Craig, E A
The two-hybrid system is a powerful technique for detecting protein-protein interactions that utilizes the well-developed molecular genetics of the yeast Saccharomyces cerevisiae. However, the full potential of this technique has not been realized due to limitations imposed by the components available for use in the system. These limitations include unwieldy plasmid vectors, incomplete or poorly designed two-hybrid libraries, and host strains that result in the selection of large numbers of false positives. We have used a novel multienzyme approach to generate a set of highly representative genomic libraries from S. cerevisiae. In addition, a unique host strain was created that contains three easily assayed reporter genes, each under the control of a different inducible promoter. This host strain is extremely sensitive to weak interactions and eliminates nearly all false positives using simple plate assays. Improved vectors were also constructed that simplify the construction of the gene fusions necessary for the two-hybrid system. Our analysis indicates that the libraries and host strain provide significant improvements in both the number of interacting clones identified and the efficiency of two-hybrid selections.
Haas, Y de; Windig, J J; Calus, M P L; Dijkstra, J; Haan, M de; Bannink, A; Veerkamp, R F
Mitigation of enteric methane (CH₄) emission in ruminants has become an important area of research because accumulation of CH₄ is linked to global warming. Nutritional and microbial opportunities to reduce CH₄ emissions have been extensively researched, but little is known about using natural variation to breed animals with lower CH₄ yield. Measuring CH₄ emission rates directly from animals is difficult and hinders direct selection on reduced CH₄ emission. However, improvements can be made through selection on associated traits (e.g., residual feed intake, RFI) or through selection on CH₄ predicted from feed intake and diet composition. The objective was to establish phenotypic and genetic variation in predicted CH₄ output, and to determine the potential of genetics to reduce methane emissions in dairy cattle. Experimental data were used and records on daily feed intake, weekly body weights, and weekly milk production were available from 548 heifers. Residual feed intake (MJ/d) is the difference between net energy intake and calculated net energy requirements for maintenance as a function of body weight and for fat- and protein-corrected milk production. Predicted methane emission (PME; g/d) is 6% of gross energy intake (Intergovernmental Panel on Climate Change methodology) corrected for energy content of methane (55.65 kJ/g). The estimated heritabilities for PME and RFI were 0.35 and 0.40, respectively. The positive genetic correlation between RFI and PME indicated that cows with lower RFI have lower PME (estimates ranging from 0.18 to 0.84). Hence, it is possible to decrease the methane production of a cow by selecting more-efficient cows, and the genetic variation suggests that reductions in the order of 11 to 26% in 10 yr are theoretically possible, and could be even higher in a genomic selection program. However, several uncertainties are discussed; for example, the lack of true methane measurements (and the key assumption that methane
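The PME definition above is a one-line calculation, sketched here for illustration. The sketch assumes gross energy intake is supplied in MJ/d (the abstract does not state the unit), and the function name and the 300 MJ/d input are hypothetical.

```python
# Sketch of the predicted methane emission (PME) formula from the abstract:
# PME (g/d) = 6% of gross energy intake, divided by the energy content of methane.
# Assumption: gross energy intake is given in MJ/d and converted to kJ/d here.

METHANE_ENERGY_KJ_PER_G = 55.65   # energy content of methane, from the abstract
IPCC_METHANE_FRACTION = 0.06      # 6% of gross energy intake, from the abstract


def predicted_methane_g_per_day(gross_energy_intake_mj_per_day: float) -> float:
    """Return predicted methane emission in g/d for a given intake in MJ/d."""
    gross_energy_intake_kj = gross_energy_intake_mj_per_day * 1000.0
    return IPCC_METHANE_FRACTION * gross_energy_intake_kj / METHANE_ENERGY_KJ_PER_G


if __name__ == "__main__":
    # Hypothetical intake of 300 MJ/d, chosen only to show the arithmetic.
    print(round(predicted_methane_g_per_day(300.0), 2))
```

Because PME is a fixed proportion of intake under this definition, any variation in PME between animals comes entirely from variation in gross energy intake.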
Mourier, Tobias; Willerslev, Eske
BACKGROUND: Eukaryotic genomes are scattered with retroelements that proliferate through retrotransposition. Although retroelements make up around 40 percent of the human genome, large regions are found to be completely devoid of retroelements. This has been hypothesised to be a result of genomic...
Frantz, Laurent A F; Schraiber, Joshua G; Madsen, Ole; Megens, Hendrik-Jan; Cagan, Alex; Bosse, Mirte; Paudel, Yogesh; Crooijmans, Richard P M A; Larson, Greger; Groenen, Martien A M
Traditionally, the process of domestication is assumed to be initiated by humans, involve few individuals and rely on reproductive isolation between wild and domestic forms. We analyzed pig domestication using over 100 genome sequences and tested whether pig domestication followed a traditional linear model or a more complex, reticulate model. We found that the assumptions of traditional models, such as reproductive isolation and strong domestication bottlenecks, are incompatible with the genetic data. In addition, our results show that, despite gene flow, the genomes of domestic pigs have strong signatures of selection at loci that affect behavior and morphology. We argue that recurrent selection for domestic traits likely counteracted the homogenizing effect of gene flow from wild boars and created 'islands of domestication' in the genome. Our results have major ramifications for the understanding of animal domestication and suggest that future studies should employ models that do not assume reproductive isolation.
General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. This implies that some of the admixture found in inbred strains of mice
Annie N. Cowell
Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens.
Loy, Dorothy E.; Sundararaman, Sesh A.; Valdivia, Hugo; Fisch, Kathleen; Lescano, Andres G.; Baldeviano, G. Christian; Durand, Salomon; Gerbasi, Vince; Sutherland, Colin J.; Nolder, Debbie; Vinetz, Joseph M.; Hahn, Beatrice H.
ABSTRACT Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens. PMID:28174312
Jancek, Séverine; Bézier, Annie; Gayral, Philippe; Paillusson, Corentin; Kaiser, Laure; Dupas, Stéphane; Le Ru, Bruno Pierre; Barbe, Valérie; Periquet, Georges; Drezen, Jean-Michel; Herniou, Elisabeth A
The geographic mosaic of coevolution predicts parasite virulence should be locally adapted to the host community. Cotesia parasitoid wasps adapt to local lepidopteran species possibly through their symbiotic bracovirus. The virus, essential for the parasitism success, is at the heart of the complex coevolutionary relationship linking the wasps and their hosts. The large segmented genome contained in the virus particles encodes virulence genes involved in host immune and developmental suppression. Coevolutionary arms race should result in the positive selection of particular beneficial alleles. To understand the global role of bracoviruses in the local adaptation or specialization of parasitoid wasps to their hosts, we studied the molecular evolution of four bracoviruses associated with wasps of the genus Cotesia, including C. congregata, C. vestalis and new data and annotation on two ecologically differentiated populations of C. sesamiae, Kitale and Mombasa. Paired orthologs analyses revealed more genes under positive selection when comparing the two C. sesamiae bracoviruses belonging to the same species, and more genes under strong evolutionary constraint between species. Furthermore, branch-site evolutionary models showed that 17 genes, out of the 54 currently available shared by the four bracoviruses, harboured sites under positive selection including: the histone H4-like, a C-type lectin, two ep1-like, ep2, a viral ankyrin, CrV1, a ben-domain, a Serine-rich, and eight unknown genes. Lastly, the phylogenetic analyses of the histone, ep2 and CrV1 genes in different African C. sesamiae populations showed that each gene described differently the individual relationships. In particular, we found recombination had happened between the ep2 and CrV1 genes, which are localized 37.5 kb apart on the wasp chromosomes. Involved in multidirectional coevolutionary interactions, C. sesamiae wasps rely on different bracovirus mediated molecular pathways to overcome local host resistance.
Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to intestinal nematodes. In this study, we performed a large sca...
Singh, Nitin K; Blachowicz, Adriana; Romsdahl, Jillian; Wang, Clay; Torok, Tamas; Venkateswaran, Kasthuri
The whole-genome sequences of eight fungal strains that were selected for exposure to microgravity at the International Space Station are presented here. These baseline sequences will help to understand the observed production of novel bioactive compounds. Copyright © 2017 Singh et al.
Prior to implementation of genomic selection, an evaluation of the potential accuracy of prediction can be obtained by cross validation. In this procedure, a population with both phenotypes and genotypes is split into training and validation sets. The prediction model is fitted using the training se...
Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio
The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. In Escherichia coli, we have established a sequence pattern that distinguishes regulatory from nonregulatory regions. The density of promoter-like sequences, that are recognizable by RNA polymerase and may function as potential promoters, is high within regulatory regions, in contrast to coding regions and regions located between convergently-transcribed genes. Moreover, functional promoter sites identified experimentally are often found in the subregions of highest density of promoter-like signals, even when individual sites with higher binding affinity for RNA polymerase exist elsewhere within the regulatory region. In order to investigate the generality of this pattern, we have used position weight matrices describing the -35 and -10 promoter boxes of E. coli to search for these motifs in 43 additional genomes belonging to most established bacterial phyla, after specific calibration of the matrices according to the base composition of the noncoding regions of each genome. We have found that all bacterial species analyzed contain similar promoter-like motifs, and that, in most cases, these motifs follow the same genomic distribution observed in E. coli. Differential densities between regulatory and nonregulatory regions are detectable in most bacterial genomes, with the exception of those that have experienced evolutionary extreme genome reduction. Thus, the phylogenetic distribution of this pattern mirrors that of genes and other genomic features that require weak selection to be effective in order to persist. On this basis, we suggest that the loss of differential densities in the reduced genomes of host-restricted pathogens and symbionts is the outcome of a process of genome degradation resulting from the decreased efficiency of purifying selection in highly structured small populations. This implies that the differential
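The motif search described above (scoring sequence windows against a position weight matrix calibrated to background base composition) can be sketched as a log-odds scan. This is a minimal illustration, not the authors' matrices or calibration: the toy matrix simply favours the well-known -10 consensus TATAAT, and the 0.7/0.1 probabilities, the uniform background, and the test sequence are all made up.

```python
import math


def log_odds_scores(seq, pwm, background):
    """Score every window of seq against a PWM, as log2(p_motif / p_background).

    pwm is a list of dicts (one per motif position) mapping base -> probability.
    background maps base -> genome-specific background frequency.
    """
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(
            math.log2(pwm[i][base] / background[base])
            for i, base in enumerate(window)
        )
        scores.append(score)
    return scores


# Toy matrix (hypothetical values): 0.7 for the consensus base, 0.1 otherwise.
CONSENSUS = "TATAAT"
TOY_PWM = [{b: (0.7 if b == c else 0.1) for b in "ACGT"} for c in CONSENSUS]
UNIFORM_BACKGROUND = {b: 0.25 for b in "ACGT"}

if __name__ == "__main__":
    sequence = "GGCGTATAATGCGC"
    scores = log_odds_scores(sequence, TOY_PWM, UNIFORM_BACKGROUND)
    best_start = max(range(len(scores)), key=scores.__getitem__)
    print(best_start, round(scores[best_start], 2))
```

Swapping `UNIFORM_BACKGROUND` for the noncoding base composition of a given genome is one simple way to mimic the per-genome calibration the abstract mentions; counting windows above a score threshold per region would then give a promoter-like signal density.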
Thomas Merle B
Background: The goal of genome wide analyses of polymorphisms is to achieve a better understanding of the link between genotype and phenotype. Part of that goal is to understand the selective forces that have operated on a population. Results: In this study we compared the signals of selection, identified through population divergence in the Bovine HapMap project, to those found in an independent sample of cattle from Australia. Evidence for population differentiation across the genome, as measured by FST, was highly correlated in the two data sets. Nevertheless, 40% of the variance in FST between the two studies was attributed to the differences in breed composition. Seventy six percent of the variance in FST was attributed to differences in SNP composition and density when the same breeds were compared. The difference between FST of adjacent loci increased rapidly with the increase in distance between SNP, reaching an asymptote after 20 kb. Using 129 SNP that have highly divergent FST values in both data sets, we identified 12 regions that had additive effects on the traits residual feed intake, beef yield or intramuscular fatness measured in the Australian sample. Four of these regions had effects on more than one trait. One of these regions includes the R3HDM1 gene, which is under selection in European humans. Conclusion: Firstly, many different populations will be necessary for a full description of selective signatures across the genome, not just a small set of highly divergent populations. Secondly, it is necessary to use the same SNP when comparing the signatures of selection from one study to another. Thirdly, useful signatures of selection can be obtained where many of the groups have only minor genetic differences and may not be clearly separated in a principal component analysis. Fourthly, combining analyses of genome wide selection signatures and genome wide associations to traits helps to define the trait under selection or
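For readers unfamiliar with FST, the following sketch shows one textbook form of the statistic for a single biallelic SNP in two equally weighted populations, FST = (HT - HS) / HT. The abstract does not say which estimator the study used, so this is an assumption for illustration only; the function name and allele frequencies are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright-style FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    HS is the mean within-population expected heterozygosity and HT is the
    expected heterozygosity of the pooled population.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        # The SNP is fixed for the same allele in both populations.
        return 0.0
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical allele frequencies chosen to show a strongly divergent SNP.
    print(round(fst_two_populations(0.2, 0.8), 2))
```

Because the statistic is computed per SNP, comparing FST between studies that typed different SNP sets mixes marker effects with population effects, which is consistent with the abstract's point about using the same SNP across studies.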
Chandonia, John-Marc; Kim, Sung-Hou; Brenner, Steven E.
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25 percent of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1 percent of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57 percent (391 of 687) at present to around 80 percent (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37 percent (254 of 687) to about 64 percent (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72 percent (348 of 486) to around 76 percent (371 of 486), with the
Hansen, Gary B.
A detailed analysis of data from one revolution of the Mars Global Surveyor (MGS) is presented. Approximately 80% of this revolution observes the mid-winter northern seasonal polar cap, which covers the surface to night. The surface composition and temperature are determined through analysis of 6-50 μm infrared spectra from the Thermal Emission Spectrometer (TES). The infrared radiative balance, which is the entire heat balance in the polar night except for small subsurface and atmospheric advection terms, is calculated for the surface and atmospheric column. The primary constituent, CO2 ice, also dominates the infrared spectral properties by variations in its grain size and by admixtures of dust and water ice, which cause large variations in the 20-50 μm emissivity. This is modified by incomplete areal coverage, and clouds or hazes. This quantitative analysis reveals CO2 grain radii ranging from ˜100 μm in isolated areas, to 1-5 mm in more widespread regions. The water ice content varies from none to about one part per thousand by mass, with a clear increase towards the periphery of the polar cap. The dust content is typically a few parts per thousand by mass, but is as much as an order of magnitude less abundant in "cold spot" regions, where the low emissivity of pure CO2 ice is revealed. This is the first quantitative analysis of thermal spectra of the seasonal polar cap and the first to estimate water ice content. Our models show that the cold spots represent cleaner, dust-free ice rather than finer grained ice than the background. Our guess is that the dust in cold spots is hidden in the center of the CO2 frost particles rather than not present. The fringes of the cap have more dust and water ice, and become patchy, with warmer water snow filling the gaps on the night side, and warmer bare soil on the day side. A low optical depth (night side, and appears with smaller optical depth on the day side. The infrared radiative balance at the surface is typically
Benjelloun, Badr; Alberto, Florian J; Streeter, Ian; Boyer, Frédéric; Coissac, Eric; Stucki, Sylvie; BenBati, Mohammed; Ibnelbachyr, Mustapha; Chentouf, Mouad; Bechchari, Abdelmajid; Leempoel, Kevin; Alberti, Adriana; Engelen, Stefan; Chikhi, Abdelkader; Clarke, Laura; Flicek, Paul; Joost, Stéphane; Taberlet, Pierre; Pompanon, François
Since the time of their domestication, goats (Capra hircus) have evolved in a large variety of locally adapted populations in response to different human and environmental pressures. In the present era, many indigenous populations are threatened with extinction due to their substitution by cosmopolitan breeds, while they might represent highly valuable genomic resources. It is thus crucial to characterize the neutral and adaptive genetic diversity of indigenous populations. A fine characterization of whole genome variation in farm animals is now possible by using new sequencing technologies. We sequenced the complete genome at 12× coverage of 44 goats geographically representative of the three phenotypically distinct indigenous populations in Morocco. The study of mitochondrial genomes showed a high diversity exclusively restricted to the haplogroup A. The 44 nuclear genomes showed a very high diversity (24 million variants) associated with low linkage disequilibrium. The overall genetic diversity was weakly structured according to geography and phenotypes. When looking for signals of positive selection in each population we identified many candidate genes, several of which gave insights into the metabolic pathways or biological processes involved in the adaptation to local conditions (e.g., panting in warm/desert conditions). This study highlights the interest of WGS data to characterize livestock genomic diversity. It illustrates the valuable genetic richness present in indigenous populations that have to be sustainably managed and may represent valuable genetic resources for the long-term preservation of the species.
Zeng, C; Kouprina, N; Zhu, B; Cairo, A; Hoek, M; Cross, G; Osoegawa, K; Larionov, V; de Jong, P
We constructed representative large-insert bacterial artificial chromosome (BAC) libraries of two human pathogens (Trypanosoma brucei and Giardia lamblia) using a new hybrid vector, pTARBAC1, containing a yeast artificial chromosome (YAC) cassette (a yeast selectable marker and a centromere). The cassette allows transferring of BACs into yeast for their further modification. Furthermore, the new hybrid vector provides the opportunity to re-isolate each DNA insert without construction of a new library of random clones. Digestion of a BAC DNA by an endonuclease that has no recognition site in the vector, but which deletes most of the internal insert sequence and leaves the unique flanking sequences, converts a BAC into a TAR vector, thus allowing direct gene isolation. Cotransformation of a TAR vector and genomic DNA into yeast spheroplasts, and subsequent recombination between the TAR vector's flanking ends and a specific genomic fragment, allows rescue of the fragment as a circular YAC/BAC molecule. Here we prove a new cloning strategy by re-isolation of randomly chosen genomic fragments of different size from T. brucei cloned in BACs. We conclude that genomic regions of unicellular eukaryotes can be easily re-isolated using this technique, which provides an opportunity to study evolution of these genomes and the role of genome instability in pathogenicity.
Shields, Alexandra E; Crown, William H
Objective To extend recent conceptual and methodological advances in disparities research to include the incorporation of genomic information in analyses of racial/ethnic disparities in health care and health outcomes. Data Sources Published literature on human genetic variation, the role of genetics in disease and response to treatment, and methodological developments in disparities research. Study Design We present a conceptual framework for incorporating genomic information into the Institute of Medicine definition of racial/ethnic disparities in health care, identify key concepts used in disparities research that can be informed by genomics research, and illustrate the incorporation of genomic information into current methods using the example of HER-2 mutations guiding care for breast cancer. Principal Findings Genomic information has not yet been incorporated into disparities research, though it has direct relevance to concepts of race/ethnicity, health status, appropriate care, and socioeconomic status. The HER-2 example demonstrates how available genetic information can be incorporated into current disparities methods to reduce selection bias and measurement error. Advances in health information infrastructure may soon make standardized genetic information more available to health services researchers. Conclusion Genomic information can refine measurement of racial/ethnic disparities in health care and health outcomes and should be included wherever possible in disparities research. PMID:22515190
Although complex diseases and traits are thought to have a multifactorial genetic basis, the common methods in genome-wide association analyses test each variant for association independent of the others. This computational simplification may lead to reduced power to identify variants with small effect sizes and requires correcting for multiple hypothesis tests with complex relationships. However, advances in computational methods and increase in computational resources are enabling the computation of models that adhere more closely to the theory of multifactorial inheritance. Here, a Bayesian variable selection and model averaging approach is formulated for searching for additive and dominant genetic effects. The approach considers simultaneously all available variants for inclusion as predictors in a linear genotype-phenotype mapping and averages over the uncertainty in the variable selection. This leads to naturally interpretable summary quantities on the significances of the variants and their contribution to the genetic basis of the studied trait. We first characterize the behavior of the approach in simulations. The results indicate a gain in the causal variant identification performance when additive and dominant variation are simulated, with a negligible loss of power in the purely additive case. An application to the analysis of high- and low-density lipoprotein cholesterol levels in a dataset of 3895 Finns is then presented, demonstrating the feasibility of the approach at the current scale of single-nucleotide polymorphism data. We describe a Markov chain Monte Carlo algorithm for the computation and give suggestions on the specification of prior parameters using commonly available prior information. Open-source software implementing the method is available at http://www.lce.hut.fi/research/mm/bmagwa/ and https://github.com/to-mi/.
Ma, Yan-Ping; Ke, Hao; Liang, Zhi-Ling; Liu, Zhen-Xing; Hao, Le; Ma, Jiang-Yao; Li, Yu-Gu
Streptococcus agalactiae is an important human and animal pathogen. To better understand the genetic features and evolution of S. agalactiae, multiple factors influencing synonymous codon usage patterns in S. agalactiae were analyzed in this study. The overall codon usage analysis showed that A- and U-ending codons were preferentially used in S. agalactiae functional genes, indicating that adenine (A)/thymine (T) compositional constraints might play an important role in the synonymous codon usage pattern. The plot of GC3% against the effective number of codons (ENC) suggested that translational selection was an important factor for codon bias in the microorganism. Principal component analysis (PCA) showed that (i) mutational pressure was the most important factor in shaping codon usage of all open reading frames (ORFs) in the S. agalactiae genome; (ii) strand-specific mutational bias was not capable of influencing the codon usage bias in the leading and lagging strands; and (iii) gene length was not an important factor in the synonymous codon usage pattern in this organism. Additionally, the high correlation of the tRNA adaptation index (tAI) with the codon adaptation index (CAI) and the frequency of optimal codons (Fop) reinforced the role of natural selection for efficient translation in S. agalactiae. Comparison of synonymous codon usage patterns between S. agalactiae and susceptible hosts (human and tilapia) showed that synonymous codon usage of S. agalactiae was independent of the synonymous codon usage of susceptible hosts. The study of codon usage in S. agalactiae may provide evidence about the molecular evolution of the bacterium and a greater understanding of evolutionary relationships between S. agalactiae and its hosts.
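As an illustration of the GC3% quantity mentioned in the abstract above, the following minimal Python sketch computes the percentage of codons in a coding sequence whose third position is G or C. It is an assumption-laden simplification, not the authors' pipeline: the function name `gc3_percent` is hypothetical, every codon is counted (including Met, Trp, and stop codons, which some GC3s definitions exclude), and the input is assumed to be an in-frame coding sequence.

```python
def gc3_percent(cds: str) -> float:
    """Return the percentage of codons whose third base is G or C.

    Simplified GC3: all codons are counted, including Met, Trp and stop
    codons, which stricter "GC3s" definitions leave out.
    """
    # Accept RNA input by mapping U to T, and ignore case.
    seq = cds.upper().replace("U", "T")
    # Drop any trailing partial codon so only complete codons are used.
    usable = len(seq) - len(seq) % 3
    third_bases = seq[2:usable:3]
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    gc_count = sum(base in "GC" for base in third_bases)
    return 100.0 * gc_count / len(third_bases)


if __name__ == "__main__":
    # Codons: ATG, GCC, AAA, TAA -> third bases G, C, A, A
    print(gc3_percent("ATGGCCAAATAA"))
```

In an ENC-versus-GC3 analysis, a value like this would be computed per gene and plotted against that gene's ENC; the ENC calculation itself is not shown here.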
Genomic selection (GS) is a breeding tool that estimates breeding values (GEBVs) of individuals based solely on marker data by using a model built using phenotypic and marker data from a training population (TP). The effectiveness of GS increases as the correlation of GEBVs and phenotypes (accuracy) increases. Using phenotypic and genotypic data from a TP of 470 soft winter wheat lines, we assessed the accuracy of GS for grain yield, Fusarium Head Blight (FHB) resistance, softness equivalence (SE), and flour yield (FY). Four TP data sampling schemes were tested: (1) use all TP data, (2) use subsets of TP lines with low genotype-by-environment interaction, (3) use subsets of markers significantly associated with quantitative trait loci (QTL), and (4) a combination of 2 and 3. We also correlated the phenotypes of relatives of the TP to their GEBVs calculated from TP data. The GS accuracy within the TP using all TP data ranged from 0.35 (FHB) to 0.62 (FY). On average, the accuracy of GS from using subsets of data increased by 54% relative to using all TP data. Using subsets of markers selected for significant association with the target trait had the greatest impact on GS accuracy. Between-environment prediction accuracy was also increased by using data subsets. The accuracy of GS when predicting the phenotypes of TP relatives ranged from 0.00 to 0.85. These results suggest that GS could be useful for these traits and GS accuracy can be greatly improved by using subsets of TP data.
The human immune system functions to provide continuous body-wide surveillance to detect and eliminate foreign agents such as bacteria and viruses as well as the body's own cells that undergo malignant transformation. To counteract this surveillance, tumor cells evolve mechanisms to evade elimination by the immune system; this tumor immunoescape leads to continuous tumor expansion, albeit potentially with a different composition of the tumor cell population ("immunoediting"). Tumor immunoescape and immunoediting are products of an evolutionary process and are hence driven by mutation and selection. Higher mutation rates allow cells to more rapidly acquire new phenotypes that help evade the immune system, but also harbor the risk of an inability to maintain essential genome structure and functions, thereby leading to an error catastrophe. In this paper, we designed a novel mathematical framework, based upon the quasispecies model, to study the effects of tumor immunoediting and the evolution of (epi)genetic instability on the abundance of tumor and immune system cells. We found that there exists an optimum number of tumor variants and an optimum magnitude of mutation rates that maximize tumor progression despite an active immune response. Our findings provide insights into the dynamics of tumorigenesis during immune system attacks and help guide the choice of treatment strategies that best inhibit diverse tumor cell populations.
Colombani, C; Legarra, A; Fritz, S; Guillaume, F; Croiseau, P; Ducrocq, V; Robert-Granié, C
Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 bayesian methods, BayesCπ and bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar in BayesCπ (including or not pedigree information) and bayesian LASSO for most of the traits and whatever the breed. In the Holstein breed, bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for the conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performances of the bayesian methods for fat content in Holstein and Montbéliarde breeds are probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome. As the bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, bayesian
Bhatia, Gaurav; Tandon, Arti; Patterson, Nick; Aldrich, Melinda C.; Ambrosone, Christine B.; Amos, Christopher; Bandera, Elisa V.; Berndt, Sonja I.; Bernstein, Leslie; Blot, William J.; Bock, Cathryn H.; Caporaso, Neil; Casey, Graham; Deming, Sandra L.; Diver, W. Ryan; Gapstur, Susan M.; Gillanders, Elizabeth M.; Harris, Curtis C.; Henderson, Brian E.; Ingles, Sue A.; Isaacs, William; De Jager, Phillip L.; John, Esther M.; Kittles, Rick A.; Larkin, Emma; McNeill, Lorna H.; Millikan, Robert C.; Murphy, Adam; Neslund-Dudas, Christine; Nyante, Sarah; Press, Michael F.; Rodriguez-Gil, Jorge L.; Rybicki, Benjamin A.; Schwartz, Ann G.; Signorello, Lisa B.; Spitz, Margaret; Strom, Sara S.; Tucker, Margaret A.; Wiencke, John K.; Witte, John S.; Wu, Xifeng; Yamamura, Yuko; Zanetti, Krista A.; Zheng, Wei; Ziegler, Regina G.; Chanock, Stephen J.; Haiman, Christopher A.; Reich, David; Price, Alkes L.
The extent of recent selection in admixed populations is currently an unresolved question. We scanned the genomes of 29,141 African Americans and failed to find any genome-wide-significant deviations in local ancestry, indicating no evidence of selection influencing ancestry after admixture. A recent analysis of data from 1,890 African Americans reported that there was evidence of selection in African Americans after their ancestors left Africa, both before and after admixture. Selection after admixture was reported on the basis of deviations in local ancestry, and selection before admixture was reported on the basis of allele-frequency differences between African Americans and African populations. The local-ancestry deviations reported by the previous study did not replicate in our very large sample, and we show that such deviations were expected purely by chance, given the number of hypotheses tested. We further show that the previous study’s conclusion of selection in African Americans before admixture is also subject to doubt. This is because the FST statistics they used were inflated and because true signals of unusual allele-frequency differences between African Americans and African populations would be best explained by selection that occurred in Africa prior to migration to the Americas. PMID:25242497
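The point that large local-ancestry deviations can arise "purely by chance, given the number of hypotheses tested" can be made concrete with a small calculation. The Python sketch below is illustrative only: it assumes independent tests with standard-normal null statistics (tests at neighboring loci in a real genome scan are correlated), and the test count of 1,000 is a hypothetical number chosen for the example, not a figure from the study. The function names are likewise made up for this sketch.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prob_any_exceeds(z: float, n_tests: int) -> float:
    """P(at least one of n independent null z-scores has |Z| >= z)."""
    p_single = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    return 1.0 - (1.0 - p_single) ** n_tests


def bonferroni_z(alpha: float, n_tests: int) -> float:
    """Two-sided z threshold that keeps the family-wise error rate <= alpha."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha / (2.0 * n_tests))


if __name__ == "__main__":
    n = 1000  # hypothetical number of independent tests
    # A 3-sigma deviation is rare for one test but likely somewhere among many.
    print(prob_any_exceeds(3.0, 1))
    print(prob_any_exceeds(3.0, n))
    # Threshold a deviation must pass to be significant after correction.
    print(bonferroni_z(0.05, n))
```

Under these assumptions a deviation that looks striking in isolation is unremarkable once the number of loci examined is taken into account, which is the logic behind requiring genome-wide significance.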
Yang, Xin; Wang, Lixia; Feng, Hanli; Qi, Mingwei; Zhang, Zongze; Gao, Chong; Wang, Chunqun; Hu, Min; Fang, Rui; Li, Chengye
Gastrodiscidae species are neglected but significant paramphistomes in small ruminants, which can lead to considerable economic losses to the livestock breeding industry. However, knowledge about their molecular ecology, population genetics, and phylogenetics is still limited. In the present study, we sequenced and analyzed for the first time the full mitochondrial (mt) genome of Homalogaster paloniae (14,490 bp). The gene content and organization of the H. paloniae mt genome are the same as those of other digeneans, such as Fasciola hepatica and Paramphistomum cervi. Interestingly, unlike other paramphistomes, H. paloniae is flat in shape, similar to Fasciola species such as F. hepatica. Phylogenetic analysis of H. paloniae and 17 other selected digeneans using concatenated amino acid sequences of the 12 protein-coding genes showed that Gastrodiscidae is closely related to Paramphistomidae and Gastrothylacidae. The availability of the mt genome sequence of H. paloniae should provide an important foundation for further molecular study of Gastrodiscidae and other digeneans.
Tellier Laurent C
Background Mitochondria are a valuable resource for studying the evolutionary process and deducing phylogeny. A few mitochondrial genomes have been sequenced, but a comprehensive picture of the domestication event for silkworm mitochondria remains to be established. In this study, we integrate the extant data and perform whole-genome resequencing of the Japanese wild silkworm to obtain breakthrough results on the silkworm mitochondrial (mt) population, and finally use these to deduce a more comprehensive phylogeny of the Bombycidae. Results We identified 347 single nucleotide polymorphisms (SNPs) in the mt genome, but found no past recombination event to have occurred in the silkworm progenitor. A phylogeny inferred from these whole-genome SNPs resulted in a well-classified tree, confirming that the domesticated silkworm, Bombyx mori, most recently diverged from the Chinese wild silkworm, rather than from the Japanese wild silkworm. We showed that the population sizes of the domesticated and Chinese wild silkworms have experienced neither expansion nor contraction. We also discovered that one mt gene, cytochrome b, shows a strong signal of positive selection in the domesticated clade. This gene is related to energy metabolism, and may have played an important role during silkworm domestication. Conclusions We present a comparative analysis of 41 mt genomes of B. mori and B. mandarina from China and Japan. With these, we obtain a much clearer picture of the evolutionary history of the silkworm. The data and analyses presented here aid our understanding of the silkworm in general, and provide a crucial insight into silkworm phylogeny.
Wang, X; Tolstonog, G; Shoeman, R L; Traub, P
Mouse vimentin intermediate filaments (IFs) reconstituted in vitro were analyzed for their capacity to select certain DNA sequences from a mixture of about 500-bp-long fragments of total mouse genomic DNA. The fragments preferentially bound by the IFs and enriched by several cycles of affinity binding and polymerase chain reaction (PCR) amplification were cloned and sequenced. In general, they were G-rich and highly repetitive in that they often contained Gn, (GT)n, and (GA)n repeat elements. Other, more complex repeat sequences were identified as well. Apart from the capacity to adopt a Z-DNA and triple helix configuration under superhelical tension, many fragments were potentially able to form cruciform structures and contained consensus binding sites for various transcription factors. All of these sequence elements are known to occur in introns and 5'/3'-flanking regions of genes and to play roles in DNA transcription, recombination and replication. A FASTA search of the EMBL data bank indeed revealed that sequences homologous to the mouse repetitive DNA fragments are commonly associated with gene-regulatory elements. Unexpectedly, vimentin IFs also bound a large number of apparently overlapping, AT-rich DNA fragments that could be aligned into a composite sequence highly homologous to the 234-bp consensus centromere repeat sequence of gamma-satellite DNA. Previous experiments have shown a high affinity of vimentin for G-rich, repetitive telomere DNA sequences, superhelical DNA, and core histones. Taken together, these data support the hypothesis that, after penetration of the double nuclear membrane via an as yet unidentified mechanism, vimentin IFs cooperatively fix repetitive DNA sequence elements in a differentiation-specific manner in the nuclear periphery subjacent to the nuclear lamina and thus participate in the organization of chromatin and in the control of transcription, replication, and recombination processes. This includes aspects of global
The combination therapy of the Artemisinin derivative Artemether (ART) with Lumefantrine (LM) (Coartem®) is an important malaria treatment regimen in many endemic countries. Resistance to Artemisinin has already been reported, and it is feared that LM resistance (LMR) could also evolve quickly. Therefore molecular markers which can be used to track Coartem® efficacy are urgently needed. Often, stable resistance arises from initial, unstable phenotypes that can be identified in vitro. Here we have used the Plasmodium falciparum multidrug resistant reference strain V1S to induce LMR in vitro by culturing the parasite under continuous drug pressure for 16 months. The initial IC50 (inhibitory concentration that kills 50% of the parasite population) was 24 nM. The resulting resistant strain V1S(LM), obtained after culture for an estimated 166 cycles under LM pressure, grew steadily in 378 nM of LM, corresponding to 15 times the IC50 of the parental strain. However, after two weeks of culturing V1S(LM) in drug-free medium, the IC50 returned to that of the initial, parental strain V1S. This transient drug tolerance was associated with major changes in gene expression profiles: using the PFSANGER Affymetrix custom array, we identified 184 differentially expressed genes in V1S(LM). Among those are 18 known and putative transporters including the multidrug resistance gene 1 (pfmdr1), the multidrug resistance associated protein and the V-type H+ pumping pyrophosphatase 2 (pfvp2), as well as genes associated with fatty acid metabolism. In addition we detected a clear selective advantage provided by two genomic loci in parasites grown under LM drug pressure, suggesting that all or some of those genes contribute to development of LM tolerance; they may prove useful as molecular markers to monitor P. falciparum LM susceptibility.
Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines.
Spindel, Jennifer; Begum, Hasina; Akdemir, Deniz; Virk, Parminder; Collard, Bertrand; Redoña, Edilberto; Atlin, Gary; Jannink, Jean-Luc; McCouch, Susan R
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program and herein report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, statistical method used to build the GS model, number of markers, and trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best performing statistical method for grain yield where no large effect QTL were detected by GWAS, while for flowering time, where a single very large effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline.
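The five-fold cross-validation described above, with prediction accuracy reported as a correlation between predicted and observed values, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: `pearson`, `kfold_accuracy`, and `marker_sum` are hypothetical names, and `marker_sum` is a deliberately trivial stand-in for a real GS model such as RR-BLUP or random forest, which would be passed in through the same `fit_predict` interface.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[List[Row], List[float], List[Row]], List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def kfold_accuracy(genotypes: Sequence[Row], phenotypes: Sequence[float],
                   fit_predict: Model, k: int = 5, seed: int = 0) -> float:
    """Mean correlation between predictions and held-out phenotypes."""
    idx = list(range(len(phenotypes)))
    random.Random(seed).shuffle(idx)           # reproducible random folds
    folds = [idx[i::k] for i in range(k)]      # k disjoint index sets
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predictions = fit_predict(
            [genotypes[i] for i in train],
            [phenotypes[i] for i in train],
            [genotypes[i] for i in test],
        )
        accuracies.append(pearson(predictions, [phenotypes[i] for i in test]))
    return mean(accuracies)


def marker_sum(train_x: List[Row], train_y: List[float],
               test_x: List[Row]) -> List[float]:
    """Placeholder model: ignores training data, scores lines by marker sum."""
    return [float(sum(row)) for row in test_x]
```

A call such as `kfold_accuracy(markers, trait_values, marker_sum)` returns one accuracy figure per trait; varying the model, the marker subset, or the training lines while holding the folds fixed (same `seed`) gives comparisons of the kind reported in the abstract.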
LI BaoSheng; YE JianPing; GUO YunHai; CHEN DeNiu; David Dian ZHANG; WEN XiaoHao; QIU ShiFan; OU XianJiao; DU ShuHuan; NIU DongFeng; YANG Yi
Contemporaneous with MIS3, the MGS3 segment of the Milanggouwan stratigraphic section in the Salawusu River Valley, Mu Us Desert, China contains fossil gastropods (terrestrial and freshwater snails) in strata 33LS, 35LS, 37FL and 39LS. Examination of these fossils revealed 11 species belonging to 8 families and 10 genera. They can be classified as: (1) assemblage of Gyraulus and Galba mainly consisting of Gyraulus convexiusculus, Gyraulus sibiricus, Galba pervia and Galba superegra Gredler, etc. (2) assemblage of Vallonia mainly consisting of terrestrial snails, such as Vallonia patens, Pupilla muscorum and Discus paupe, etc. Based on the dating results, and the living habits, living conditions, and geographic distribution of their extant species, we suggest that: the ages of 33LS, 35LS, 37FL, and 39LS are 26000, 29000, 33000 and 38000 a, respectively, corresponding well to the interstadial period in GRIP 4, 5, 6 and 10 in terms of chronology and climatic characters; 33LS, 35LS and 39LS represent very warm-humid periods, while 37FL represents a less warm-humid period; the four periods of climatic fluctuations recorded in MGS3 were related to the strong impact of the summer monsoon in East Asia in Mu Us Desert of China during the interstadial of MIS3 on a global climatic background.
Tom Den Abt
Isolation of mutants in populations of microorganisms has been a valuable tool in experimental genetics for decades. The main disadvantage, however, is the inability to isolate mutants in non-selectable polygenic traits. Most traits of organisms, however, are non-selectable and polygenic, including industrially important properties of microorganisms. The advent of powerful technologies for polygenic analysis of complex traits has allowed simultaneous identification of multiple causative mutations among many thousands of irrelevant mutations. We now show that this also applies to haploid strains of which the genome has been loaded with induced mutations so as to affect as many non-selectable, polygenic traits as possible. We have introduced about 900 mutations into single haploid yeast strains using multiple rounds of EMS mutagenesis, while maintaining the mating capacity required for genetic mapping. We screened the strains for defects in flavor production, an important non-selectable, polygenic trait in yeast alcoholic beverage production. A haploid strain with multiple induced mutations showing reduced ethyl acetate production in semi-anaerobic fermentation was selected, and the underlying quantitative trait loci (QTLs) were mapped using pooled-segregant whole-genome sequence analysis after crossing with an unrelated haploid strain. Reciprocal hemizygosity analysis and allele exchange identified PMA1 and CEM1 as causative mutant alleles and TPS1 as a causative genetic background allele. The case of CEM1 revealed that relevant mutations without observable effect in the haploid strain with multiple induced mutations (in this case due to defective mitochondria) can be identified by polygenic analysis as long as the mutations have an effect in part of the segregants (in this case those that regained fully functional mitochondria). Our results show that genomic saturation mutagenesis combined with complex trait polygenic analysis could be used
Taras K Oleksyk
When a selective sweep occurs in the chromosomal region around a target gene in two populations that have recently separated, it produces three dramatic genomic consequences: (1) decreased multi-locus heterozygosity in the region; (2) elevated or diminished genetic divergence (FST) of multiple polymorphic variants adjacent to the selected locus between the divergent populations, due to the alternative fixation of alleles; and (3) a consequent regional increase in the variance of FST (S²FST) for the same clustered variants, due to the increased alternative fixation of alleles in the loci surrounding the selection target. In the first part of our study, to search for potential targets of directional selection, we developed and validated a resampling-based computational approach; we then scanned an array of 31 different-sized moving windows of SNP variants (5-65 SNPs) across the human genome in a set of European and African American population samples with 183,997 SNP loci after correcting for the recombination rate variation. The analysis revealed 180 regions of recent selection with very strong evidence in either population or both. In the second part of our study, we compared the newly discovered putative regions to those sites previously postulated in the literature, using methods based on inspecting patterns of linkage disequilibrium, population divergence and other methodologies. The newly found regions were cross-validated with those found in nine other studies that have searched for selection signals. Our study was replicated especially well in those regions confirmed by three or more studies. These validated regions were independently verified, using a combination of different methods and different databases in other studies, and should include fewer false positives. The main strength of our analysis method compared to others is that it does not require dense genotyping and therefore can be used with data from population-based genome SNP scans
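The moving-window idea in the abstract above, summarizing FST and its variance over runs of adjacent SNPs, can be sketched in a few lines of Python. This is only an illustration of the windowing step under simplifying assumptions: per-SNP FST values are taken as already computed and ordered along the chromosome, the function name `window_stats` is hypothetical, and the study's resampling-based significance assessment and recombination-rate correction are not reproduced.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def window_stats(fst: Sequence[float],
                 window: int) -> List[Tuple[int, float, float]]:
    """Slide a fixed-size window over per-SNP FST values.

    Returns one (start_index, mean_FST, sample_variance_of_FST) tuple
    per window position, moving one SNP at a time.
    """
    if window < 2 or window > len(fst):
        raise ValueError("window must be between 2 and len(fst)")
    results = []
    for start in range(len(fst) - window + 1):
        chunk = fst[start:start + window]
        results.append((start, mean(chunk), variance(chunk)))
    return results


if __name__ == "__main__":
    # Made-up values: a run of similar FST followed by alternating extremes.
    per_snp_fst = [0.1, 0.1, 0.1, 0.9, 0.0, 0.1]
    for start, avg, var in window_stats(per_snp_fst, window=3):
        print(start, round(avg, 3), round(var, 3))
```

Repeating the call over a range of window sizes (the abstract mentions 31 sizes spanning 5 to 65 SNPs) and comparing each window against a resampled null distribution would be the natural next steps, but how that was done in the study is not specified here.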
Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming
Identification of loci or genes under natural selection is important both for understanding the genetic basis of local adaptation and for practical applications, and genome scans provide a powerful means for such identification. In this study, genome-wide simple sequence repeat (SSR) markers were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400
Plantae (Archaeplastida) are a natural group of organisms with plastids of primary endosymbiotic origin. Within this group, members of the red algae show evidence of a reduction of their genomic content. In this work, we designed a bioinformatics approach to investigate the few, sometimes incomplete, genomic datasets available for red algae, with the purpose of pointing out possible gene family losses and expansions. Our pipeline first populates a relational database with precomputed ortholog...
Yang Mary Qu; Chen Zhongxue; Yang Jack; Liu Qingzhong; Sung Andrew H; Huang Xudong
Background Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information...
Feng, Xiao-Jing; Jiang, Guo-Fang; Fan, Zhou
Identification of loci under divergent selection is a key step in understanding the evolutionary process because those loci are responsible for the genetic variations that affect fitness in different environments. Understanding how environmental forces give rise to adaptive genetic variation is a challenge in pest control. Here, we performed an amplified fragment length polymorphism (AFLP) genome scan in populations of the bamboo locust, Ceracris kiangsu, to search for candidate loci that are influenced by selection along an environmental gradient in southern China. In outlier locus detection, loci that demonstrate significantly higher or lower among-population genetic differentiation than expected under neutrality are identified as outliers. We used several outlier detection methods to study the features of C. kiangsu, including DFDIST, BayeScan, and logistic regression. A total of 97 outlier loci were detected in the C. kiangsu genome with very high statistical support. Moreover, the results suggested that divergent selection arising from environmental variation has been driven by differences in temperature, precipitation, humidity and sunshine. These findings illustrate that divergent selection and potential local adaptation are prevalent in locusts despite seemingly high levels of gene flow. Thus, we propose that native environments in each population may induce divergent natural selection.
Abstract Background The genus Listeria includes two closely related pathogenic and non-pathogenic species, L. monocytogenes and L. innocua. L. monocytogenes is an opportunistic human foodborne and animal pathogen that includes two common lineages. While lineage I is more commonly found among human listeriosis cases, lineage II appears to be overrepresented among isolates from foods and environmental sources. This study used the genome sequences for one L. innocua strain and four L. monocytogenes strains representing lineages I and II, to characterize the contributions of positive selection and recombination to the evolution of the L. innocua/L. monocytogenes core genome. Results Among the 2267 genes in the L. monocytogenes/L. innocua core genome, 1097 genes showed evidence for recombination and 36 genes showed evidence for positive selection. Positive selection was strongly associated with recombination. Specifically, 29 of the 36 genes under positive selection also showed evidence for recombination. Recombination was more common among isolates in lineage II than lineage I; this trend was confirmed by sequencing five genes in a larger isolate set. Positive selection was more abundant in the ancestral branch of lineage II (20 genes) as compared to the ancestral branch of lineage I (9 genes). Additional genes under positive selection were identified in the branch separating the two species; for this branch, genes in the role category "Cell wall and membrane biogenesis" were significantly more likely to have evidence for positive selection. Positive selection of three genes was confirmed in a larger isolate set, which also revealed occurrence of multiple premature stop codons in one positively selected gene involved in flagellar motility (flaR). Conclusion While recombination and positive selection both contribute to evolution of L. monocytogenes, the relative contributions of these evolutionary forces seem to differ by L. monocytogenes lineages and
Zhang, Lifan; Mousel, Michelle R; Wu, Xiaolin; Michal, Jennifer J; Zhou, Xiang; Ding, Bo; Dodson, Michael V; El-Halawany, Nermin K; Lewis, Gregory S; Jiang, Zhihua
Sheep are among the major economically important livestock species worldwide because the animals produce milk, wool, skin, and meat. In the present study, the Illumina OvineSNP50 BeadChip was used to investigate genetic diversity and genome selection among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds from the United States. After quality-control filtering of SNPs (single nucleotide polymorphisms), we used 48,026 SNPs, including 46,850 SNPs on autosomes that were in Hardy-Weinberg equilibrium and 1,176 SNPs on chromosome X for analysis. Phylogenetic analysis based on all 46,850 SNPs clearly separated Suffolk from Rambouillet, Columbia, Polypay, and Targhee, which was not surprising as Rambouillet contributed to the synthesis of the latter three breeds. Based on pair-wise estimates of F(ST), significant genetic differentiation appeared between Suffolk and Rambouillet (F(ST) = 0.1621), while Rambouillet and Targhee had the closest relationship (F(ST) = 0.0681). A scan of the genome revealed 45 and 41 differentially selected regions (DSRs) between Suffolk and Rambouillet and among Rambouillet-related breed populations, respectively. Our data indicated that regions 13 and 24 between Suffolk and Rambouillet might be good candidates for evaluating breed differences. Furthermore, ovine genome v3.1 assembly was used as reference to link functionally known homologous genes to economically important traits covered by these differentially selected regions. In brief, our present study provides a comprehensive genome-wide view on within- and between-breed genetic differentiation, biodiversity, and evolution among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds. These results may provide new guidance for the synthesis of new breeds with different breeding objectives.
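The pair-wise F(ST) estimates mentioned in this abstract can be illustrated with a minimal sketch. It assumes biallelic SNPs, two equally weighted populations, and Wright's heterozygosity-based definition, F(ST) = (H_T - H_S) / H_T; the abstract does not say which estimator the authors used, and the function names `pairwise_fst` and `genome_fst` are hypothetical.

```python
def pairwise_fst(p1, p2):
    """Wright's F(ST) at one biallelic locus for two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # locus is fixed for the same allele in both populations
    return (h_total - h_within) / h_total


def genome_fst(freqs1, freqs2):
    """Multi-locus F(ST) as a ratio of sums over loci (not a mean of per-locus ratios)."""
    if len(freqs1) != len(freqs2):
        raise ValueError("both populations need the same number of loci")
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        p_bar = (p1 + p2) / 2.0
        h_total = 2.0 * p_bar * (1.0 - p_bar)
        h_within = p1 * (1.0 - p1) + p2 * (1.0 - p2)
        num += h_total - h_within
        den += h_total
    return num / den if den > 0.0 else 0.0
```

Identical allele frequencies give a value of 0, and populations fixed for alternative alleles give 1. Sample-size-corrected estimators, such as that of Weir and Cockerham, are generally preferred for real genotype data.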
Abstract Background At least three species of Borrelia burgdorferi sensu lato (Bbsl) cause tick-borne Lyme disease. Previous work including the genome analysis of B. burgdorferi B31 and B. garinii PBi suggested a highly variable plasmid part. The frequent occurrence of duplicated sequence stretches, the observed plasmid redundancy, as well as the mainly unknown function and variability of plasmid encoded genes rendered the relationships between plasmids within and between species largely unresolvable. Results To gain further insight into Borreliae genome properties we completed the plasmid sequences of B. garinii PBi, added the genome of a further species, B. afzelii PKo, to our analysis, and compared for both species the genomes of pathogenic and apathogenic strains. The core of all Bbsl genomes consists of the chromosome and two plasmids collinear between all species. We also found additional groups of plasmids, which share large parts of their sequences. This makes it very likely that these plasmids are relatively stable and share common ancestors before the diversification of Borrelia species. The analysis of the differences between B. garinii PBi and B. afzelii PKo genomes of low and high passages revealed that the loss of infectivity is accompanied in both species by a loss of similar genetic material. Whereas B. garinii PBi suffered only from the break-off of a plasmid end, B. afzelii PKo lost more material, probably an entire plasmid. In both cases the vls gene locus encoding for variable surface proteins is affected. Conclusion The complete genome sequences of a B. garinii and a B. afzelii strain facilitate further comparative studies within the genus Borrelia. Our study shows that loss of infectivity can be traced back to only one single event in B. garinii PBi: the loss of the vls cassettes possibly due to error prone gene conversion. Similar albeit extended losses in B. afzelii PKo support the hypothesis that infectivity of Borrelia
Large genome-wide association studies (GWAS) have identified many genetic loci associated with risk for myocardial infarction (MI) and coronary artery disease (CAD). Concurrently, efforts such as the National Institutes of Health (NIH) Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) Consortium have provided unprecedented data on functional elements of the human genome. In the present study, we systematically investigate the biological link between genetic variants associated with this complex disease and their impacts on gene function. First, we examined the heritability of MI/CAD according to genomic compartments. We observed that single nucleotide polymorphisms (SNPs) residing within nearby regulatory regions show significant polygenicity and contribute 59-71% of the heritability for MI/CAD. Second, we showed that the polygenicity and heritability explained by these SNPs are enriched in histone modification marks in specific cell types. Third, we found that a statistically higher number of 45 MI/CAD-associated SNPs that have been identified from large-scale GWAS reside within certain functional elements of the genome, particularly in active enhancer and promoter regions. Finally, we observed significant heterogeneity of this signal across cell types, with strong signals observed within adipose nuclei, as well as brain and spleen cell types. These results suggest that the genetic etiology of MI/CAD is largely explained by tissue-specific regulatory perturbation within the human genome.
Jesse M Engreitz
Chromosomal translocations are frequent features of cancer genomes that contribute to disease progression. These rearrangements result from formation and illegitimate repair of DNA double-strand breaks (DSBs), a process that requires spatial colocalization of chromosomal breakpoints. The "contact first" hypothesis suggests that translocation partners colocalize in the nuclei of normal cells, prior to rearrangement. It is unclear, however, to what extent spatial interactions based on three-dimensional genome architecture contribute to chromosomal rearrangements in human disease. Here we intersect Hi-C maps of three-dimensional chromosome conformation with collections of 1,533 chromosomal translocations from cancer and germline genomes. We show that many translocation-prone pairs of regions genome-wide, including the cancer translocation partners BCR-ABL and MYC-IGH, display elevated Hi-C contact frequencies in normal human cells. Considering tissue specificity, we find that translocation breakpoints reported in human hematologic malignancies have higher Hi-C contact frequencies in lymphoid cells than those reported in sarcomas and epithelial tumors. However, translocations from multiple tissue types show significant correlation with Hi-C contact frequencies, suggesting that both tissue-specific and universal features of chromatin structure contribute to chromosomal alterations. Our results demonstrate that three-dimensional genome architecture shapes the landscape of rearrangements directly observed in human disease and establish Hi-C as a key method for dissecting these effects.
Vitezica, Zulma G; Varona, Luis; Legarra, Andres
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the "genotypic" value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
Meira, C T; Curi, R A; Farah, M M; de Oliveira, H N; Béltran, N A R; Silva, J A V; Mota, M D S da
Selection of Quarter Horses for different purposes has led to the formation of lines, including racing and cutting horses. The objective of this study was to identify genomic regions divergently selected in the racing line of Quarter Horses relative to the cutting line by applying relative extended haplotype homozygosity (REHH) analysis, an extension of extended haplotype homozygosity (EHH) analysis, and the fixation index (FST) statistic. A total of 188 horses of both sexes, born between 1985 and 2009 and registered at the Brazilian Association of Quarter Horse Breeders, including 120 of the racing line and 68 of the cutting line, were genotyped using single nucleotide polymorphism arrays. On the basis of 27 genomic regions identified as selection signatures by REHH and FST statistics, functional annotations of genes were made in order to identify those that could have been important during formation of the racing line and that could be used subsequently for the development of selection tools. Genes involved in muscle growth (n=8), skeletal growth (n=10), muscle energy metabolism (n=15), cardiovascular system (n=14) and nervous system (n=23) were identified, including FKTN, INSR, GYS1, CLCN1, MYLK, SYK, ANG, CNTFR and HTR2B.
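As background for the EHH statistic named in this abstract, here is a minimal sketch of the standard definition: among chromosomes carrying a given core allele, EHH at a marker is the probability that two randomly chosen chromosomes are identical over the whole interval from the core to that marker. The function name `ehh`, the string encoding of haplotypes, and the toy data are illustrative assumptions; the sketch does not reproduce the REHH normalization or the software used in the study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, target_index):
    """Extended haplotype homozygosity from a core marker out to a target marker.

    haplotypes: sequences of alleles (for example strings of '0'/'1'),
                all of the same length and phased.
    Only haplotypes carrying core_allele at core_index are considered.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core_index, target_index))
    # Group carriers by their full allele string over the interval [lo, hi]
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    # Fraction of carrier pairs that are identical across the interval
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Toy phased data: three haplotypes carry allele '1' at the core marker (index 0)
toy_haplotypes = ["1100", "1100", "1101", "0110"]
decay = [ehh(toy_haplotypes, 0, "1", j) for j in range(4)]
```

In the toy data, EHH stays at 1 until the last marker, where one carrier differs and only one of the three carrier pairs remains identical. REHH then compares such a decay curve for one core haplotype against the other core haplotypes at the same locus.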
Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai
Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6,359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near-zero bias for the predictions. Therefore, when GS was applied in an L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.
Ionizing radiation has long been known to induce heritable mutagenic change in DNA sequence. However, the genome-wide effect of radiation is not well understood. Here we report the molecular properties and frequency of mutations in phenotypically selected mutant lines isolated following exposure of the genetic model flowering plant Arabidopsis thaliana to fast neutrons (FNs). Previous studies suggested that FNs predominantly induce deletions longer than a kilobase in A. thaliana. However, we found a higher frequency of single base substitution than deletion mutations. While the overall frequency and molecular spectrum of fast-neutron (FN)-induced single base substitutions differed substantially from those of "background" mutations arising spontaneously in laboratory-grown plants, G:C>A:T transitions were favored in both. We found that FN-induced G:C>A:T transitions were concentrated at pyrimidine dinucleotide sites, suggesting that FNs promote the formation of mutational covalent linkages between adjacent pyrimidine residues. In addition, we found that FNs induced more single base than large deletions, and that these single base deletions were possibly caused by replication slippage. Our observations provide an initial picture of the genome-wide molecular profile of mutations induced in A. thaliana by FN irradiation and are particularly informative of the nature and extent of genome-wide mutation in lines selected on the basis of mutant phenotypes from FN-mutagenized A. thaliana populations.
Certain environmental microorganisms can cause severe human infections, even in the absence of an obvious requirement for transition through an animal host for replication ("accidental virulence"). To understand this process, we compared eleven isolate genomes of Burkholderia pseudomallei (Bp), a tropical soil microbe and causative agent of the human and animal disease melioidosis. We found evidence for the existence of several new genes in the Bp reference genome, identifying 282 novel genes supported by at least two independent lines of supporting evidence (mRNA transcripts, database homologs, and presence of ribosomal binding sites) and 81 novel genes supported by all three lines. Within the Bp core genome, 211 genes exhibited significant levels of positive selection (4.5%), distributed across many cellular pathways including carbohydrate and secondary metabolism. Functional experiments revealed that certain positively selected genes might enhance mammalian virulence by interacting with host cellular pathways or utilizing host nutrients. Evolutionary modifications improving Bp environmental fitness may thus have indirectly facilitated the ability of Bp to colonize and survive in mammalian hosts. These findings improve our understanding of the pathogenesis of melioidosis, and establish Bp as a model system for studying the genetics of accidental virulence.
Schneeberger, Valentina E; Allaj, Viola; Gardner, Eric E; Poirier, J T; Rudin, Charles M
Patient-derived xenograft (PDX) mouse models are increasingly used for preclinical therapeutic testing of human cancer. A limitation in molecular and genetic characterization of PDX tumors is the presence of integral murine stroma. This is particularly problematic for genomic sequencing of PDX models. Rapid and dependable approaches for quantitating stromal content and purifying the malignant human component of these tumors are needed. We used a recently developed technique exploiting species-specific polymerase chain reaction (PCR) amplicon length (ssPAL) differences to define the fractional composition of murine and human DNA, which was proportional to the fractional composition of cells in a series of lung cancer PDX lines. We compared four methods of human cancer cell isolation: fluorescence-activated cell sorting (FACS), an immunomagnetic mouse cell depletion (MCD) approach, and two distinct EpCAM-based immunomagnetic positive selection methods. We further analyzed DNA extracted from the resulting enriched human cancer cells by targeted sequencing using a clinically validated multi-gene panel. Stromal content varied widely among tumors of similar histology, but appeared stable over multiple serial tumor passages of an individual model. FACS and MCD were superior to either positive selection approach, especially in cases of high stromal content, and consistently allowed high quality human-specific genomic profiling. ssPAL is a dependable approach to quantitation of murine stromal content, and MCD is a simple, efficient, and high yield approach to human cancer cell isolation for genomic analysis of PDX tumors.
Selective sweeps can cause genetic differentiation across populations, which allows for the identification of possible causative regions/genes underlying important traits. The pig has experienced a long history of allele frequency changes through artificial selection in the domestication process. We obtained an average of 329,482,871 sequence reads for 24 pigs from three pig breeds: Yorkshire (n = 5), Landrace (n = 13), and Duroc (n = 6). An average read depth of 11.7 was obtained using whole-genome resequencing on an Illumina HiSeq2000 platform. In this study, cross-population extended haplotype homozygosity and cross-population composite likelihood ratio tests were implemented to detect genes experiencing positive selection for the genome-wide resequencing data generated from three commercial pig breeds. In our results, 26, 7, and 14 genes from Yorkshire, Landrace, and Duroc, respectively, were detected by two kinds of statistical tests. Significant evidence for positive selection was identified on genes ST6GALNAC2 and EPHX1 in Yorkshire, PARK2 in Landrace, and BMP6, SLA-DQA1, and PRKG1 in Duroc. These genes are reportedly relevant to lactation, reproduction, meat quality, and growth traits. To understand how these single nucleotide polymorphisms (SNPs) related to positive selection affect protein function, we analyzed the effect of non-synonymous SNPs. Three SNPs (rs324509622, rs80931851, and rs80937718) in the SLA-DQA1 gene were significant in the enrichment tests, indicating strong evidence for positive selection in Duroc. Our analyses identified genes under positive selection for lactation, reproduction, and meat-quality and growth traits in Yorkshire, Landrace, and Duroc, respectively.
Thomasen, Jørn Rind; Egger-Danner, C; Willam, A
progeny testing. Strong positive interaction effects between increased reliability of genomic predictions and more intensive use of young bulls exist. From an economic perspective a juvenile scheme is always advantageous. The main future focus area for the smaller dairy cattle breeds is to join forces...
Current advances in sequencing technologies and bioinformatics make it possible to determine a nearly complete genomic background of rice, a staple food for the poor people. Consequently, comprehensive databases of variation among thousands of varieties are currently being assembled and released. Proper analysi...
Villumsen, T M; Janss, L; Lund, M S
Reliabilities for genomic estimated breeding values (GEBV) were investigated by simulation for a typical dairy cattle breeding setting. Scenarios were simulated with different heritabilities (h2) and for different haplotype sizes, and seven generations with only genotypes were generated to investi...
Sarah D. Battenfield
Wheat (Triticum aestivum L.) cultivars must possess suitable end-use quality for release and consumer acceptability. However, breeding for quality traits is often considered a secondary target relative to yield, largely because of the amount of seed needed and the expense. Without testing and selection, many undesirable materials are advanced, expending additional resources. Here, we develop and validate whole-genome prediction models for end-use quality phenotypes in the CIMMYT bread wheat breeding program. Model accuracy was tested using forward prediction on breeding lines (n = 5520) tested in unbalanced yield trials from 2009 to 2015 at Ciudad Obregon, Sonora, Mexico. Quality parameters included test weight, 1000-kernel weight, hardness, grain and flour protein, flour yield, sodium dodecyl sulfate sedimentation, Mixograph and Alveograph performance, and loaf volume. In general, prediction accuracy substantially increased over time as more data became available to train the model. Reflecting practical implementation of genomic selection (GS) in the breeding program, forward prediction accuracies for quality parameters were assessed in 2015 and ranged from 0.32 (grain hardness) to 0.62 (mixing time). Increased selection intensity was possible with GS since more entries can be genotyped than phenotyped, and expected genetic gain was 1.4 to 2.7 times higher across all traits than phenotypic selection. Given the limitations in measuring many lines for quality, we conclude that GS is a powerful tool to facilitate early generation selection for end-use quality in wheat, leaving larger populations for selection on yield during advanced testing and leading to better gain for both quality and yield in bread wheat breeding programs.
Yuri Tani Utsunomiya
As the methodologies available for the detection of positive selection from genomic data vary in terms of assumptions and execution, weak correlations are expected among them. However, if there is any given signal that is consistently supported across different methodologies, it is strong evidence that the locus has been under past selection. In this paper, a straightforward frequentist approach based on the Stouffer Method to combine P-values across different tests for evidence of recent positive selection in common variations, as well as strategies for extracting biological information from the detected signals, were described and applied to high density single nucleotide polymorphism (SNP) data generated from dairy and beef cattle (taurine and indicine). The ancestral Bovinae allele state of over 440,000 SNP is also reported. Using this combination of methods, highly significant (P < 3.17×10^-7) population-specific sweeps pointing to candidate genes and pathways that may be involved in beef and dairy production were identified. The most significant signal was found in the Cornichon homolog 3 gene (CNIH3) in Brown Swiss (P = 3.82×10^-12), and may be involved in the regulation of pre-ovulatory luteinizing hormone surge. Other putative pathways under selection are the glycolysis/gluconeogenesis, transcription machinery and chemokine/cytokine activity in Angus; calpain-calpastatin system and ribosome biogenesis in Brown Swiss; and gangliosides deposition in milk fat globules in Gyr. The composite method, combined with the strategies applied to retrieve functional information, may be a useful tool for surveying genome-wide selective sweeps and providing insights into the source of selection.
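The Stouffer method referred to in this abstract can be sketched in a few lines. The version below is the textbook form: each one-sided P-value is converted to a standard normal Z-score, the (optionally weighted) Z-scores are summed and rescaled, and the result is converted back to a P-value. The function name `stouffer_combine`, the optional weights, and the assumption that the input tests are independent are illustrative choices, not details taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values, weights=None):
    """Combine one-sided P-values with Stouffer's Z-score method.

    Returns (combined_z, combined_p). Assumes the tests are independent.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A small P-value maps to a large positive Z-score
    z_scores = [-std_normal.inv_cdf(p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Upper-tail probability of the combined Z-score
    combined_p = std_normal.cdf(-combined_z)
    return combined_z, combined_p
```

Two tests that each give P = 0.05 combine to roughly P = 0.01, which illustrates how a signal supported consistently by several methods becomes stronger than any single test. When the tests are correlated, as selection statistics computed on the same SNP data typically are, the independence assumption is violated and the combined P-value will tend to be too small.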
Feofilov, A. G.; Kutepov, A. A.; Rezac, L.; Smith, M. D.
This paper describes a methodology for performing a temperature retrieval in the Martian atmosphere in the 50-90 km altitude range using spectrally integrated 15 micrometer CO2 limb emissions measured by the Thermal Emission Spectrometer (TES), the thermal infrared spectrometer on board the Mars Global Surveyor (MGS). We demonstrate that temperature retrievals from limb observations in the 75-90 km altitude range require accounting for the non-local thermodynamic equilibrium (non-LTE) populations of the CO2(v2) vibrational levels. Using the methodology described in the paper, we have retrieved approximately 1200 individual temperature profiles from MGS TES limb observations in the altitude range between 60 and 90 km. Our dataset of retrieved temperature profiles is available for download in supplemental materials of this paper. The temperature retrieval uncertainties are mainly caused by radiance noise, and are estimated to be about 2 K at 60 km and below, 4 K at 70 km, 7 K at 80 km, 10 K at 85 km, and 20 K at 90 km. We compare the retrieved profiles to Mars Climate Database temperature profiles and find good qualitative agreement. Quantitatively, our retrieved profiles are in general warmer and demonstrate strong variability with the following values for bias and standard deviations (in brackets) compared to the Martian Year 24 dataset of the Mars Climate Database: 6 (+/-20) K at 60 km, 7.5 (+/-25) K at 65 km, 9 (+/-27) K at 70 km, 9.5 (+/-27) K at 75 km, 10 (+/-28) K at 80 km, 11 (+/-29) K at 85 km, and 11.5 (+/-31) K at 90 km. Possible reasons for the positive temperature bias are discussed.
Peixoto, L A; Bhering, L L; Cruz, C D
Genomic selection is a useful technique to assist breeders in selecting the best genotypes accurately. Phenotypic selection in the F2 generation has low accuracy because each genotype is represented by one individual; thus, genomic selection can increase selection accuracy at this stage of the breeding program. This study aimed to establish the optimal number of individuals required to compose the training population and to establish the number of markers necessary to obtain the maximum accuracy by genomic selection methods in F2 populations. F2 populations with 1000 individuals were simulated, and six traits were simulated with different heritability values (5, 20, 40, 60, 80 and 99%). Ridge regression best linear unbiased prediction was used in all analyses. Genomic selection models were set by varying the number of individuals in the training population (2 to 1000 individuals) and markers (2 to 3060 markers). Phenotypic accuracy, genotypic accuracy, genetic variance, residual variance, and heritability were evaluated. The greater the number of individuals in the training population, the higher the accuracy, and the values of genotypic and residual variances and heritability were close to the optimum values. The higher the heritability of the trait, the greater the number of markers necessary to obtain maximum accuracy, ranging from 200 for the trait with 5% heritability to 900 for the trait with 99% heritability. Therefore, genomic selection models for prediction in F2 populations must consist of 200 to 900 markers of major effect on the trait and more than 600 individuals in the training population.
Pujolar, J. M.; Jacobsen, M. W.; Als, Thomas Damm;
with single-generation signatures of spatially varying selection acting on glass eels. After screening 50 354 SNPs, a total of 754 potentially locally selected SNPs were identified. Candidate genes for local selection constituted a wide array of functions, including calcium signalling, neuroactive ligand...
Alm, Eric; Shapiro, B. Jesse
Different microbial species are thought to occupy distinct ecological niches, subjecting each species to unique selective constraints, which may leave a recognizable signal in their genomes. Thus, it may be possible to extract insight into the genetic basis of ecological differences among lineages by identifying unusual patterns of substitutions in orthologous gene or protein sequences. We use the ratio of substitutions in slow versus fast-evolving sites (nucleotides in DNA, or amino acids in protein sequence) to quantify deviations from the typical pattern of selective constraint observed across bacterial lineages. We propose that elevated S:F in one branch (an excess of slow-site substitutions) can indicate a functionally-relevant change, due to either positive selection or relaxed evolutionary constraint. In a genome-wide comparative study of gamma-proteobacterial proteins, we find that cell-surface proteins involved with motility and secretion functions often have high S:F ratios, while information-processing genes do not. Change in evolutionary constraints in some species is evidenced by increased S:F ratios within functionally-related sets of genes (e.g., energy production in Pseudomonas fluorescens), while other species apparently evolve mostly by drift (e.g., uniformly elevated S:F across most genes in Buchnera spp.). Overall, S:F reveals several species-specific, protein-level changes with potential functional/ecological importance. As microbial genome projects yield more species-rich gene-trees, the S:F ratio will become an increasingly powerful tool for uncovering functional genetic differences among species.
Cericola, Fabio; Janss, Luc; Byrne, Stephen
the diagonal elements by estimating the amount of genetic variance caused by the reduction of the coverage depth. Secondly we developed a method to scale the relationship matrix by taking into account the overall amount of pairwise non-missing loci between all families. Rust resistance and heading date were...... investigated how this reduction of the coverage depth affects the genomic relationship matrices used to estimated breeding value of F2 family pools in perennial ryegrass. A total of 995 families were genotyped via GBS providing more than 1.8M allele frequency estimates for each family with an average coverage...... depth of 12.6 per marker. Simulated datasets with a progressively reduced depth showed an increasing level of missing values together with an overestimated genetic variance caused by inflated diagonals in the genomic relationship matrix. In order to address these drawbacks we first showed how to correct...
Porteous, D J
This paper describes an efficient protocol for the screening of lambda genomic libraries, plaque and DNA purification, and probe characterization by a combination of new and recently described techniques. The protocol has allowed large numbers of human subchromosome-specific probes to be rapidly generated from an EMBL3 library of human-mouse somatic cell hybrid DNA. The protocol affords considerable savings in time and effort over previous procedures.
Cloned genes have been purified from recombinant DNA bacteriophage libraries by a method exploiting homologous reciprocal recombination in vivo. In this method 'probe' sequences are inserted in a very small plasmid vector and introduced into recombination-proficient bacterial cells. Genomic bacteriophage libraries are propagated on the cells, and phage bearing sequences homologous to the probe acquire an integrated copy of the plasmid by reciprocal recombination. Phage bearing integrated plas...
Dimauro, C; Cellesi, M; Pintus, M A; Macciotta, N P P
In genomic selection (GS) programmes, direct genomic values (DGV) are evaluated using information provided by high-density SNP chips. Because DGV accuracy depends strictly on SNP density, an increase in the number of markers per chip is likely to have severe computational consequences. The aim of the present work was to test the effectiveness of principal component analysis (PCA) carried out by chromosome in reducing the marker dimensionality for GS purposes. A simulated data set of 5700 individuals with an equal number of SNPs distributed over six chromosomes was used. PCs were extracted both genome-wide (ALL) and separately by chromosome (CHR) and used to predict DGVs. In the ALL scenario, the SNP variance-covariance matrix (S) was singular, positive semi-definite and contained null information, which introduces 'spuriousness' in the derived results. In contrast, the S matrix for each chromosome (CHR scenario) had full rank. The DGV accuracies obtained were always better for CHR than for ALL. Moreover, in the ALL scenario, DGV accuracies soon became unstable as the number of animals decreased, whereas in CHR they remained stable down to 900-1000 individuals. In real applications where a 54k SNP chip is used, the largest number of markers per chromosome is approximately 2500. Thus, around 3000 genotyped animals could lead to reliable results when the original SNP variables are replaced by a reduced number of PCs. © 2011 Blackwell Verlag GmbH.
Liu, Tianfei; Qu, Hao; Luo, Chenglong; Li, Xuewei; Shu, Dingming; Lund, Mogens Sandø; Su, Guosheng
Newcastle disease (ND) and avian influenza (AI) are the most feared diseases in the poultry industry worldwide. They can cause flock mortality up to 100%, resulting in a catastrophic economic loss. This is the first study to investigate the feasibility of genomic selection for antibody response to Newcastle disease virus (Ab-NDV) and antibody response to Avian Influenza virus (Ab-AIV) in chickens. The data were collected from a crossbred population. Breeding values for Ab-NDV and Ab-AIV were estimated using a pedigree-based best linear unbiased prediction model (BLUP) and a genomic best linear unbiased prediction model (GBLUP). Single-trait and multiple-trait analyses were implemented. According to the analysis using the pedigree-based model, the heritability for Ab-NDV estimated from the single-trait and multiple-trait models was 0.478 and 0.487, respectively. The heritability for Ab-AIV estimated from the two models was 0.301 and 0.291, respectively. The estimated genetic correlation between the two traits was 0.438. A four-fold cross-validation was used to assess the accuracy of the estimated breeding values (EBV) in the two validation scenarios. In the family sample scenario each half-sib family is randomly allocated to one of four subsets and in the random sample scenario the individuals are randomly divided into four subsets. In the family sample scenario, compared with the pedigree-based model, the accuracy of the genomic prediction increased from 0.086 to 0.237 for Ab-NDV and from 0.080 to 0.347 for Ab-AIV. In the random sample scenario, the accuracy was improved from 0.389 to 0.427 for Ab-NDV and from 0.281 to 0.367 for Ab-AIV. The multiple-trait GBLUP model led to a slightly higher accuracy of genomic prediction for both traits. These results indicate that genomic selection for antibody response to ND and AI in chickens is promising.
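The two validation scenarios described above differ only in how individuals are assigned to the four subsets. The sketch below illustrates that difference under stated assumptions: family labels are available as a plain list, and folds are balanced by dealing shuffled items round-robin, which is a choice made here for illustration and not something the abstract specifies. The names `family_folds` and `random_folds` are hypothetical.

```python
import random

def family_folds(family_ids, k=4, seed=0):
    """Family sample scenario: every member of a family shares one fold."""
    rng = random.Random(seed)
    families = sorted(set(family_ids))
    rng.shuffle(families)
    # Deal shuffled families round-robin into k folds (assumed balancing rule).
    fold_of_family = {fam: i % k for i, fam in enumerate(families)}
    return [fold_of_family[fam] for fam in family_ids]

def random_folds(n_individuals, k=4, seed=0):
    """Random sample scenario: individuals are split regardless of family."""
    rng = random.Random(seed)
    folds = [i % k for i in range(n_individuals)]
    rng.shuffle(folds)
    return folds
```

Because relatives never straddle training and validation sets in the family scenario, it is the harder test of prediction, which is consistent with the lower accuracies the abstract reports for it.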
Rhys A Farrer
Pathogenic fungi constitute a growing threat to both plant and animal species on a global scale. Despite a clonal mode of reproduction dominating the population genetic structure of many fungi, putatively asexual species are known to adapt rapidly when confronted by efforts to control their growth and transmission. However, the mechanisms by which adaptive diversity is generated across a clonal background are often poorly understood. We sequenced a global panel of the emergent amphibian pathogen, Batrachochytrium dendrobatidis (Bd), to high depth and characterized rapidly changing features of its genome that we believe hold the key to the worldwide success of this organism. Our analyses show three processes that contribute to the generation of de novo diversity. Firstly, we show that the majority of wild isolates manifest chromosomal copy number variation that changes over short timescales. Secondly, we show that cryptic recombination occurs within all lineages of Bd, leading to large regions of the genome being in linkage equilibrium, and is preferentially associated with classes of genes of known importance for virulence in other pathosystems. Finally, we show that these classes of genes are under directional selection, and that this has predominantly targeted the Global Panzootic Lineage (BdGPL). Our analyses show that Bd manifests an unusually dynamic genome that may have been shaped by its association with the amphibian host. The rates of variation that we document likely explain the high levels of phenotypic variability that have been reported for Bd, and suggest that the dynamic genome of this pathogen has contributed to its success across multiple biomes and host-species.
Pahikkala, Tapio; Okser, Sebastian; Airola, Antti; Salakoski, Tapio; Aittokallio, Tero
Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a
Background: Marine fishes have been shown to display low levels of genetic structuring and associated high levels of gene flow, suggesting shallow evolutionary trajectories and, possibly, limited or lacking adaptive divergence among local populations. We investigated variation in 98 gene-associated single nucleotide polymorphisms (SNPs) for evidence of selection in local populations of Atlantic cod (Gadus morhua L.) across the species distribution. Results: Our global genome scan analysis identified eight outlier gene loci with very high statistical support, likely to be subject to directional selection in local demes, or closely linked to loci under selection. Likewise, on a regional south/north transect of central and eastern Atlantic populations, seven loci displayed strongly elevated levels of genetic differentiation. Selection patterns among populations appeared to be relatively widespread and complex, i.e. outlier loci were generally not only associated with one of a few divergent local populations. Even on a limited geographical scale between the proximate North Sea and Baltic Sea populations four loci displayed evidence of adaptive evolution. Temporal genome scan analysis applied to DNA from archived otoliths from a Faeroese population demonstrated stability of the intra-population variation over 24 years. An exploratory landscape genetic analysis was used to elucidate potential effects of the most likely environmental factors responsible for the signatures of local adaptation. We found that genetic variation at several of the outlier loci was better correlated with temperature and/or salinity conditions at spawning grounds at spawning time than with geographic distance per se. Conclusion: These findings illustrate that adaptive population divergence may indeed be prevalent despite seemingly high levels of gene flow, as found in most marine fishes. Thus, results have important implications for our understanding of the interplay of
Amos Christopher I
Motivation: Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans. However, the factors that affect SNP density are poorly understood. The goal of this study was to estimate the relative effects of mutability and selection on SNP density in transcribed regions of human genes. It is important for prediction of the regions that harbor functional polymorphisms. Results: We used frequency-validated SNPs resulting from single-nucleotide substitutions. SNPs were subdivided into five functional categories: (i) 5' untranslated region (UTR) SNPs, (ii) 3' UTR SNPs, (iii) synonymous SNPs, (iv) SNPs producing conservative missense mutations, and (v) SNPs producing radical missense mutations. Each of these categories was further subdivided into nine mutational categories on the basis of the single-nucleotide substitution type. Thus, 45 functional/mutational categories were analyzed. The relative mutation rate in each mutational category was estimated on the basis of published data. The proportion of segregating sites (PSS) for each functional/mutational category was estimated by dividing the observed number of SNPs by the number of potential sites in the genome for a given functional/mutational category. By analyzing each functional group separately, we found significant positive correlations between PSSs and relative mutation rates (Spearman's correlation coefficient, at least r = 0.96, df = 9, P P = 0.001, suggesting that selection affects SNP density in transcribed regions of the genome. We used analyses of variance and covariance to estimate the relative effects of selection (functional category) and mutability (relative mutation rate) on the PSSs and found that approximately 87% of variation in PSS was due to variation in the mutation rate and approximately 13% was due to selection, suggesting that the probability that a site located in a transcribed region of a gene is polymorphic mostly depends on the mutability
Genetic differences both between individuals and populations are studied for their evolutionary relevance and for their potential medical applications. Most of the genetic differentiation among populations is caused by random drift, which should affect all loci across the genome in a similar manner. When a locus shows extraordinarily high or low levels of population differentiation, this may be interpreted as evidence for natural selection. The most used measure of population differentiation was devised by Wright and is known as the fixation index, or F(ST). We performed a genome-wide estimation of F(ST) on about 4 million SNPs from HapMap project data. We demonstrated a heterogeneous distribution of F(ST) values between autosomes and heterochromosomes. When we compared the F(ST) values obtained in this study with another evolutionary measure obtained by a comparative interspecific approach, we found that genes under positive selection appeared to show low levels of population differentiation. We applied a gene set approach, widely used for microarray data analysis, to detect functional pathways under selection. We found that one pathway related to antigen processing and presentation showed low levels of F(ST), while several pathways related to cell signalling, growth and morphogenesis showed high F(ST) values. Finally, we detected a signature of selection within genes associated with human complex diseases. These results can help to identify which processes occurred during human evolution and adaptation to different environments. They also support the hypothesis that common diseases could have a genetic background shaped by human evolution.
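For readers unfamiliar with the fixation index, a minimal sketch of Wright's definition for one biallelic SNP follows, computed as (H_T - H_S) / H_T from per-population allele frequencies. It assumes equally weighted populations and ignores sampling corrections; the estimator actually used on the HapMap data is not stated in the abstract and may differ. The function name `wright_fst` is hypothetical.

```python
def wright_fst(allele_freqs):
    """F_ST for one biallelic SNP from per-population allele frequencies.

    allele_freqs: frequency of the same allele in each population (0..1).
    Assumes equal population weights and no sample-size correction.
    """
    n_pops = len(allele_freqs)
    p_bar = sum(allele_freqs) / n_pops
    # Expected heterozygosity of the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n_pops
    if h_total == 0.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    return (h_total - h_within) / h_total
```

Identical frequencies in every population give 0, and populations fixed for opposite alleles give 1, matching the interpretation of low and high differentiation used in the abstract.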
Moon, S.; Kim, T.H.; Lee, K.T.; Kwak, W.; Lee, T.; Lee, S.W.; Kim, M.J.; Cho, K.; Kim, N.; Chung, W.H.; Sung, S.; Park, T.; Cho, S.; Groenen, M.A.M.; Nielsen, R.; Kim, Y.; Kim, H.
Background: Animal domestication involved drastic phenotypic changes driven by strong artificial selection and also resulted in new populations of breeds, established by humans. This study aims to identify genes that show evidence of recent artificial selection during pig domestication. Results: Who
Cericola, Fabio; Janss, Luc; Byrne, Stephen
Genotyping by sequencing (GBS) allows generating up to millions of molecular markers with a cost per sample which is proportional to the level of multiplexing. Increasing the sample multiplexing decreases the genotyping price but also reduces the numbers of reads per marker. In this work we...... depth of 12.6 per marker. Simulated datasets with a progressively reduced depth showed an increasing level of missing values together with an overestimated genetic variance caused by inflated diagonals in the genomic relationship matrix. In order to address these drawbacks we first showed how to correct...
Sun, Xinli; Jia, Qi; Guo, Yuchun; Zheng, Xiujuan; Liang, Kangjing
To investigate the selective pressures acting on the protein-coding genes during the differentiation of indica and japonica, all of the possible orthologous genes between the Nipponbare and 93–11 genomes were identified and compared with each other. Among these genes, 8,530 pairs had identical sequences, and 27,384 pairs shared more than 90% sequence identity. Only 2,678 pairs of genes displaying a Ka/Ks ratio significantly greater than one were revealed, and most of these genes contained only nonsynonymous sites. The genes without synonymous sites were further analyzed with the SNP data of 1529 O. sativa and O. rufipogon accessions, and 1068 genes were identified as being under positive selection during the differentiation of indica and temperate japonica. The positively selected genes (PSGs) are unevenly distributed on the 12 chromosomes, and the proteins encoded by the PSGs are dominated by binding, transferase and hydrolase activities, and are especially enriched in plant responses to stimuli, biological regulation, and transport processes. Meanwhile, most PSGs of known function and/or expression were involved in the regulation of biotic/abiotic stresses. The evidence of pervasive positive selection suggested that many factors drove the differentiation of indica and japonica, which had already started in wild rice but is much lower than in cultivated rice. Lower differentiation and fewer PSGs between the Or-It and Or-IIIt wild rice groups implied that artificial selection contributes more to the differentiation than natural selection. In addition, the phylogenetic tree constructed with positively selected sites showed that the japonica varieties exhibited more diversity than indica on differentiation, and Or-III of O. rufipogon exhibited more than Or-I. PMID:25774680
Patillon, Blandine; Luisi, Pierre; Blanché, Hélène; Patin, Etienne; Cann, Howard M; Génin, Emmanuelle; Sabbagh, Audrey
VKORC1 (vitamin K epoxide reductase complex subunit 1, 16p11.2) is the main genetic determinant of human response to oral anticoagulants of antivitamin K type (AVK). This gene was recently suggested to be a putative target of positive selection in East Asian populations. In this study, we genotyped the HGDP-CEPH Panel for six VKORC1 SNPs and downloaded chromosome 16 genotypes from the HGDP-CEPH database in order to characterize the geographic distribution of footprints of positive selection within and around this locus. A unique VKORC1 haplotype carrying the promoter mutation associated with AVK sensitivity showed especially high frequencies in all the 17 HGDP-CEPH East Asian population samples. VKORC1 and 24 neighboring genes were found to lie in a 505 kb region of strong linkage disequilibrium in these populations. Patterns of allele frequency differentiation and haplotype structure suggest that this genomic region has been submitted to a near complete selective sweep in all East Asian populations and only in this geographic area. The most extreme scores of the different selection tests are found within a smaller 45 kb region that contains VKORC1 and three other genes (BCKDK, MYST1 (KAT8), and PRSS8) with different functions. Because of the strong linkage disequilibrium, it is not possible to determine if VKORC1 or one of the three other genes is the target of this strong positive selection that could explain present-day differences among human populations in AVK dose requirement. Our results show that the extended region surrounding a presumable single target of positive selection should be analyzed for genetic variation in a wide range of genetically diverse populations in order to account for other neighboring and confounding selective events and the hitchhiking effect.
Previously we have shown that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative enabling exploitation...
Williamson, Scott H.; Hernandez, Ryan; Fledel-Alon, Adi
for patterns of polymorphism in the presence of both population size change and natural selection. If data are available from different functional classes of variation, and a priori information suggests that mutations in one of those classes are selectively neutral, then the putatively neutral class can...... this method to a large polymorphism data set from 301 human genes and find (i) widespread negative selection acting on standing nonsynonymous variation, (ii) that the fitness effects of nonsynonymous mutations are well predicted by several measures of amino acid exchangeability, especially site......-specific methods, and (iii) strong evidence for very recent population growth....
Polyadenylation of pre-mRNAs, a critical step in eukaryotic gene expression, is mediated by cis elements collectively called the polyadenylation signal. Genome-wide analysis of such polyadenylation signals was missing in fission yeast, even though it is an important model organism. We demonstrate that the canonical AATAAA motif is the most frequent and functional polyadenylation signal in Schizosaccharomyces pombe. Using analysis of RNA-Seq data sets from cells grown under various physiological conditions, we identify 3' UTRs for nearly 90% of the yeast genes. Heterogeneity of cleavage sites is common, as is alternative polyadenylation within and between conditions. We validated the computationally identified sequence elements likely to promote polyadenylation by functional assays, including qRT-PCR and 3' RACE analysis. The biological importance of the AATAAA motif is underlined by functional analysis of the genes containing it. Furthermore, it has been shown that convergent genes require trans elements, like cohesin, for efficient transcription termination. Here we show that convergent genes lacking cohesin (on chromosome 2) are generally associated with longer overlapping mRNA transcripts. Our bioinformatic and experimental genome-wide results are summarized and can be accessed and customized in a user-friendly database Pomb(A).
Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088
Kijas, James W; Lenstra, Johannes A; Hayes, Ben; Boitard, Simon; Porto Neto, Laercio R; San Cristobal, Magali; Servin, Bertrand; McCulloch, Russell; Whan, Vicki; Gietzen, Kimberly; Paiva, Samuel; Barendse, William; Ciani, Elena; Raadsma, Herman; McEwan, John; Dalrymple, Brian
Through their domestication and subsequent selection, sheep have been adapted to thrive in a diverse range of environments. To characterise the genetic consequence of both domestication and selection, we genotyped 49,034 SNP in 2,819 animals from a diverse collection of 74 sheep breeds. We find the majority of sheep populations contain high SNP diversity and have retained an effective population size much higher than most cattle or dog breeds, suggesting domestication occurred from a broad genetic base. Extensive haplotype sharing and generally low divergence time between breeds reveal frequent genetic exchange has occurred during the development of modern breeds. A scan of the genome for selection signals revealed 31 regions containing genes for coat pigmentation, skeletal morphology, body size, growth, and reproduction. We demonstrate the strongest selection signal has occurred in response to breeding for the absence of horns. The high density map of genetic variability provides an in-depth view of the genetic history for this important livestock species.
Westenberg, Marcel; Soedling, Helen M; Mann, Derek A; Nicholson, Linda J; Dolphin, Colin T
Recombineering is employed to modify large DNA clones such as fosmids, BACs and PACs. Subtle and seamless modifications can be achieved using counter-selection strategies in which a donor cassette carrying both positive and negative markers inserted in the target clone is replaced by the desired sequence change. We are applying counter-selection recombineering to modify bacmid bMON14272, a recombinant baculoviral genome, as we wish to engineer the virus into a therapeutically useful gene delivery vector with cell targeting characteristics. Initial attempts to replace gp64 with Fusion (F) genes from other baculoviruses resulted in many rearranged clones in which the counter-selection cassette had been deleted. Bacmid bMON14272 contains nine highly homologous regions (hrs) and deletions were mapped to recombination between hr pairs. Recombineering modifications were attempted to decrease intramolecular recombination and/or increase recombineering efficiency. Of these only the use of longer homology arms on the donor molecule proved effective permitting seamless modification. bMON14272, because of the presence of the hr sequences, can be considered equivalent to a highly repetitive BAC and, as such, the optimized method detailed here should prove useful to others applying counter-selection recombineering to modify BACs or PACs containing similar regions of significant repeating homologies.
Garlapow, Megan E; Everett, Logan J; Zhou, Shanshan; Gearhart, Alexander W; Fay, Kairsten A; Huang, Wen; Morozova, Tatiana V; Arya, Gunjan H; Turlapati, Lavanya; St Armour, Genevieve; Hussain, Yasmeen N; McAdams, Sarah E; Fochler, Sophia; Mackay, Trudy F C
Food consumption is an essential component of animal fitness; however, excessive food intake in humans increases risk for many diseases. The roles of neuroendocrine feedback loops, food sensing modalities, and physiological state in regulating food intake are well understood, but not the genetic basis underlying variation in food consumption. Here, we applied ten generations of artificial selection for high and low food consumption in replicate populations of Drosophila melanogaster. The phenotypic response to selection was highly asymmetric, with significant responses only for increased food consumption and minimal correlated responses in body mass and composition. We assessed the molecular correlates of selection responses by DNA and RNA sequencing of the selection lines. The high and low selection lines had variants with significantly divergent allele frequencies within or near 2081 genes and 3526 differentially expressed genes in one or both sexes. A total of 519 genes were both genetically divergent and differentially expressed between the divergent selection lines. We performed functional analyses of the effects of RNAi suppression of gene expression and induced mutations for 27 of these candidate genes that have human orthologs and the strongest statistical support, and confirmed that 25 (93 %) affected the mean and/or variance of food consumption.
Nery, Mariana F; Arroyo, José Ignacio; Opazo, Juan C
The hemoglobin of jawed vertebrates is a heterotetramer protein that contains two α- and two β-chains, which are encoded by members of α- and β-globin gene families. Given the hemoglobin role in mediating an adaptive response to chronic hypoxia, it is likely that this molecule may have experienced a selective pressure during the evolution of cetaceans, which have to deal with hypoxia tolerance during prolonged diving. This selective pressure could have generated a complex history of gene turnover in these clusters and/or changes in protein structure themselves. Accordingly, we aimed to characterize the genomic organization of α- and β-globin gene clusters in two cetacean species and to detect a possible role of positive selection on them using a phylogenetic framework. Maximum likelihood and Bayesian phylogeny reconstructions revealed that both cetacean species had retained a similar complement of putatively functional genes. For the α-globin gene cluster, the killer whale presents a complement of genes composed of HBZ, HBK, and two functional copies of HBA and HBQ genes, whereas the dolphin possesses HBZ, HBK, HBA and HBQ genes, and one HBA pseudogene. For the β-globin gene cluster, both species retained a complement of four genes, two early expressed genes-HBE and HBH-and two adult expressed genes-HBD and HBB. Our natural selection analysis detected two positively selected sites in the HBB gene (56 and 62) and four in HBA (15, 21, 49, 120). Interestingly, only the genes that are expressed during the adulthood showed the signature of positive selection.
Kadarmideen, Haja; Do, Duy Ngoc
not been fully explored. The future of livestock breeding focuses on both product quality and productivity, animal welfare, disease resistance and reducing environmental pollution. Among the breeding tools, molecular genetics and genomics and modern reproductive techniques such as ovum pick-up and in vitro... Global livestock production has increased substantially during the last decades, in both number of animals and productivity. Meanwhile, the human population is projected to reach 9.6 billion by 2050 and most of the increase in the projection takes place in developing countries. Rapid population... growth will increase the demand for food as well as animal products, particularly in emerging economic giants like India. Moreover, urbanization has a considerable impact on patterns of food consumption in general and on demand for livestock products in particular, and the increased income growth led...
Some mammals breed throughout the year, while others breed only at certain times of year. These differences in reproductive behavior can be explained by evolution. We identified positively-selected genes in two sets of species with different degrees of relatedness including seasonal and non-seasonal breeding species, using branch-site models. After stringent filtering by sum of pairs scoring, we revealed that more genes underwent positive selection in seasonal compared with non-seasonal breeding species. Positively-selected genes were verified by cDNA mapping of the positive sites with the corresponding cDNA sequences. The design of the evolutionary analysis can effectively lower the false-positive rate and thus identify valid positive genes. Validated, positively-selected genes, including CGA, DNAH1, INVS, and CD151, were related to reproductive behaviors such as spermatogenesis and cell proliferation in non-seasonal breeding species. Genes in seasonal breeding species, including THRAP3, TH1L, and CMTM6, may be related to the evolution of sperm and the circadian rhythm system. Identification of these positively-selected genes might help to identify the molecular mechanisms underlying seasonal and non-seasonal reproductive behaviors.
Steven E Schutzer
BACKGROUND: Burkholderia mallei is an understudied biothreat agent responsible for glanders which can be lethal in humans and animals. Research with this pathogen has been hampered in part by constraints of Select Agent regulations for safety reasons. Whole genomic sequencing (WGS) is an apt approach to characterize newly discovered or poorly understood microbial pathogens. METHODOLOGY/PRINCIPAL FINDINGS: We performed WGS on a strain of B. mallei, SAVP1, previously pathogenic, that was experimentally infected in 6 equids (4 ponies, 1 mule, 1 donkey), natural hosts, for purposes of producing antibodies. Multiple high inocula were used in some cases. Unexpectedly SAVP1 appeared to be avirulent in the ponies and mule, and attenuated in the donkey, but induced antibodies. We determined the genome sequence of SAVP1 and compared it to a strain that was virulent in horses and a human. In comparison, this phenotypic avirulent SAVP1 strain was missing multiple genes including all the animal type III secretory system (T3SS) complex of genes demonstrated to be essential for virulence in mice and hamster models. The loss of these genes in the SAVP1 strain appears to be the consequence of a multiple gene deletion across insertion sequence (IS) elements in the B. mallei genome. Therefore, the strain by itself is unlikely to revert naturally to its virulent phenotype. There were other genes present in one strain and not the other and vice-versa. CONCLUSION/SIGNIFICANCE: The discovery that this strain of B. mallei was both avirulent in the natural host ponies, and did not possess T3SS associated genes may be fortuitous to advance biodefense research. The deleted virulence-essential T3SS is not likely to be re-acquired naturally. These findings may provide a basis for exclusion of SAVP1 from the Select Agent regulation or at least discussion of what else would be required for exclusion. This exclusion could accelerate research by investigators not possessing BSL-3
Esfandyari, Hadi; Sørensen, Anders Christian; Bijma, Piter
. Optimization of the GS model raises the question of whether marker effects should be estimated from data on the pure lines or crossbreds. Therefore, the first objective of this study was to compare response to selection of crossbreds by simulating a two-way crossbreeding program with either a purebred...... between quantitative trait loci (QTL) and single nucleotide polymorphisms (SNPs) can differ between breeds, which causes apparent effects of SNPs to be line-dependent. Thus, our second objective was to compare response to GS based on crossbred phenotypes when the line origin of alleles was taken...... into account or not in the estimation of breeding values. Results Training on crossbred animals yielded a larger response to selection in crossbred offspring compared to training on both pure lines separately or on both pure lines combined into a single reference population. Response to selection in crossbreds...
Calvin, W. M.; Titus, T. N.; Mahoney, S. A.
There is a long history of telescopic and spacecraft observations of the polar regions of Mars. The finely laminated ice deposits and surrounding layered terrains are commonly thought to contain a record of past climate conditions and change. Understanding the basic nature of the deposits and their mineral and ice constituents is a continued focus of current and future orbited missions. Unresolved issues in Martian polar science include a) the unusual nature of the CO2 ice deposits ("Swiss Cheese", "slab ice" etc.), b) the relationship of the ice deposits to underlying layered units (which differs from the north to the south), c) understanding the seasonal variations and their connections to the finely laminated units observed in high-resolution images, and d) the relationship of dark materials in the wind-swept lanes and reentrant valleys to the surrounding dark dune and surface materials. Our work focuses on understanding these issues in relationship to the north residual ice cap. Recent work using Mars Global Surveyor (MGS) data sets has described the evolution of the seasonal CO2 frost deposits. In addition, the north polar residual ice cap exhibits albedo variations between Mars years and within the summer season. The Thermal Emission Spectrometer (TES) data set can augment these observations, providing additional constraints such as temperature evolution and spectral properties associated with ice and rocky materials. Exploration of these properties is the subject of our current study.
Jonci N Wolff
Numts are an integral component of many eukaryote genomes offering a snapshot of the evolutionary process that led from the incorporation of an α-proteobacterium into a larger eukaryotic cell some 1.8 billion years ago. Although numt sequence can be harnessed as molecular marker, these sequences often remain unidentified and are mistaken for genuine mtDNA leading to erroneous interpretation of mtDNA data sets. It is therefore indispensable that during the process of amplifying and sequencing mitochondrial genes, preventive measures are taken to ensure the exclusion of numts to guarantee the recovery of genuine mtDNA. This applies to mtDNA analyses in general but especially to studies where mtDNAs are sequenced de novo as the launch pad for subsequent mtDNA-based research. By using a combination of dilution series and nested rolling circle amplification (RCA), we present a novel strategy to selectively amplify mtDNA and exclude the amplification of numt sequence. We have successfully applied this strategy to de novo sequence the mtDNA of the Black Field Cricket Teleogryllus commodus, a species known to contain numts. Aligning our assembled sequence to the reference genome of Teleogryllus emma (GenBank EU557269.1) led to the identification of a numt sequence in the reference sequence. This unexpected result further highlights the need of a reliable and accessible strategy to eliminate this source of error.
Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas
of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing...
Lin, Michael F; Kheradpour, Pouya; Washietl, Stefan
synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian...
Robertson, David S; Prevost, A Toby; Bowden, Jack
The problem of selection bias has long been recognized in the analysis of two-stage trials, where promising candidates are selected in stage 1 for confirmatory analysis in stage 2. To efficiently correct for bias, uniformly minimum variance conditionally unbiased estimators (UMVCUEs) have been proposed for a wide variety of trial settings, but where the population parameter estimates are assumed to be independent. We relax this assumption and derive the UMVCUE in the multivariate normal setting with an arbitrary known covariance structure. One area of application is the estimation of odds ratios (ORs) when combining a genome-wide scan with a replication study. Our framework explicitly accounts for correlated single nucleotide polymorphisms, as might occur due to linkage disequilibrium. We illustrate our approach on the measurement of the association between 11 genetic variants and the risk of Crohn's disease, as reported in Parkes and others (2007. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat. Gen. 39: (7), 830-832.), and show that the estimated ORs can vary substantially if both selection and correlation are taken into account.
Wendt, Toni; Holm, Preben Bach; Starker, Colby G; Christian, Michelle; Voytas, Daniel F; Brinch-Pedersen, Henrik; Holme, Inger Bæksted
Transcription activator-like effector nucleases (TALENs) enable targeted mutagenesis in a variety of organisms. The primary advantage of TALENs over other sequence-specific nucleases, namely zinc finger nucleases and meganucleases, lies in their ease of assembly, reliability of function, and their broad targeting range. Here we report the assembly of several TALENs for a specific genomic locus in barley. The cleavage activity of individual TALENs was first tested in vivo using a yeast-based, single-strand annealing assay. The most efficient TALEN was then selected for barley transformation. Analysis of the resulting transformants showed that TALEN-induced double strand breaks led to the introduction of short deletions at the target site. Additional analysis revealed that each barley transformant contained a range of different mutations, indicating that mutations occurred independently in different cells.
López-Wilchis, Ricardo; Del Río-Portilla, Miguel Ángel; Guevara-Chumacero, Luis Manuel
We described the complete mitochondrial genome (mitogenome) of the Wagner's mustached bat, Pteronotus personatus, a species belonging to the family Mormoopidae, and compared it with other published mitogenomes of bats (Chiroptera). The mitogenome of P. personatus was 16,570 bp long and contained a typically conserved structure including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (D-loop). Most of the genes were encoded on the H-strand, except for eight tRNA and the ND6 genes. The order of protein-coding and rRNA genes was highly conserved in all mitogenomes. All protein-coding genes started with an ATG codon, except for ND2, ND3, and ND5, which initiated with ATA, and terminated with the typical stop codon TAA/TAG or the codon AGA. Phylogenetic trees constructed using Maximum Parsimony, Maximum Likelihood, and Bayesian inference methods showed an identical topology and indicated the monophyly of different families of bats (Mormoopidae, Phyllostomidae, Vespertilionidae, Rhinolophidae, and Pteropopidae) and the existence of two major clades corresponding to the suborders Yangochiroptera and Yinpterochiroptera. The mitogenome sequence provided here will be useful for further phylogenetic analyses and population genetic studies in mormoopid bats.
The objective of this study was to evaluate the usefulness of comprehensive chromosome screening (CCS) using array comparative genomic hybridization (aCGH). The study included 1420 CCS cycles for recurrent miscarriage (n=203); repetitive implantation failure (n=188); severe male factor (n=116); previous trisomic pregnancy (n=33); and advanced maternal age (n=880). CCS was performed in cycles with fresh oocytes and embryos (n=774); mixed cycles with fresh and vitrified oocytes (n=320); mixed cycles with fresh and vitrified day-2 embryos (n=235); and mixed cycles with fresh and vitrified day-3 embryos (n=91). Day-3 embryo biopsy was performed and analyzed by aCGH followed by day-5 embryo transfer. Consistent implantation (range: 40.5–54.2%) and pregnancy rates per transfer (range: 46.0–62.9%) were obtained for all the indications and independently of the origin of the oocytes or embryos. However, a lower delivery rate per cycle was achieved in women aged over 40 years (18.1%) due to the higher percentage of aneuploid embryos (85.3%) and lower number of cycles with at least one euploid embryo available per transfer (40.3%). We concluded that aneuploidy is one of the major factors which affect embryo implantation.
Beaulieu, J; Doerksen, T; Clément, S; MacKay, J; Bousquet, J
Genomic selection (GS) is of interest in breeding because of its potential for predicting the genetic value of individuals and increasing genetic gains per unit of time. To date, very few studies have reported empirical results of GS potential in the context of large population sizes and long breeding cycles such as for boreal trees. In this study, we assessed the effectiveness of marker-aided selection in an undomesticated white spruce (Picea glauca (Moench) Voss) population of large effective size using a GS approach. A discovery population of 1694 trees representative of 214 open-pollinated families from 43 natural populations was phenotyped for 12 wood and growth traits and genotyped for 6385 single-nucleotide polymorphisms (SNPs) mined in 2660 gene sequences. GS models were built to predict estimated breeding values using all the available SNPs or SNP subsets of the largest absolute effects, and they were validated using various cross-validation schemes. The accuracy of genomic estimated breeding values (GEBVs) varied from 0.327 to 0.435 when the training and the validation data sets shared half-sibs that were on average 90% of the accuracies achieved through traditionally estimated breeding values. The trend was also the same for validation across sites. As expected, the accuracy of GEBVs obtained after cross-validation with individuals of unknown relatedness was lower with about half of the accuracy achieved when half-sibs were present. We showed that with the marker densities used in the current study, predictions with low to moderate accuracy could be obtained within a large undomesticated population of related individuals, potentially resulting in larger gains per unit of time with GS than with the traditional approach.
Bustamente, Carlos D.; Fledel-Alon, Adi; Williamson, Scott
Comparisons of DNA polymorphism within species to divergence between species enables the discovery of molecular adaptation in evolutionarily constrained genes as well as the differentiation of weak from strong purifying selection. The extent to which weak negative and positive darwini......, show an excess of rapidly evolving genes, whereas others, such as cytoskeletal proteins, show an excess of genes with extensive amino acid polymorphism within humans and yet little amino acid divergence between humans and chimpanzees....
Teng, Shaolei; Yang, Jack Y; Wang, Liangjiang
Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression.
Moura, Andre E; Kenny, John G; Chaudhuri, Roy; Hughes, Margaret A; J Welch, Andreanna; Reisinger, Ryan R; de Bruyn, P J Nico; Dahlheim, Marilyn E; Hall, Neil; Hoelzel, A Rus
The evolution of diversity in the marine ecosystem is poorly understood, given the relatively high potential for connectivity, especially for highly mobile species such as whales and dolphins. The killer whale (Orcinus orca) has a worldwide distribution, and individual social groups travel over a wide geographic range. Even so, regional populations have been shown to be genetically differentiated, including among different foraging specialists (ecotypes) in sympatry. Given the strong matrifocal social structure of this species together with strong resource specializations, understanding the process of differentiation will require an understanding of the relative importance of both genetic drift and local adaptation. Here we provide a high-resolution analysis based on nuclear single-nucleotide polymorphic markers and inference about differentiation at both neutral loci and those potentially under selection. We find that all population comparisons, within or among foraging ecotypes, show significant differentiation, including populations in parapatry and sympatry. Loci putatively under selection show a different pattern of structure compared to neutral loci and are associated with gene ontology terms reflecting physiologically relevant functions (e.g. related to digestion). The pattern of differentiation for one ecotype in the North Pacific suggests local adaptation and shows some fixed differences among sympatric ecotypes. We suggest that differential habitat use and resource specializations have promoted sufficient isolation to allow differential evolution at neutral and functional loci, but that the process is recent and dependent on both selection and drift.
Dahl, Fredrik; Gullberg, Mats; Stenberg, Johan; Landegren, Ulf; Nilsson, Mats
We present a method to specifically select large sets of DNA sequences for parallel amplification by PCR using target-specific oligonucleotide constructs, so-called selectors. The selectors are oligonucleotide duplexes with single-stranded target-complementary end-sequences that are linked by a general sequence motif. In the selection process, a pool of selectors is combined with denatured restriction digested DNA. Each selector hybridizes to its respective target, forming individual circular complexes that are covalently closed by enzymatic ligation. Non-circularized fragments are removed by exonucleolysis, enriching for the selected fragments. The general sequence that is introduced into the circularized fragments allows them to be amplified in parallel using a universal primer pair. The procedure avoids amplification artifacts associated with conventional multiplex PCR where two primers are used for each target, thereby reducing the number of amplification reactions needed for investigating large sets of DNA sequences. We demonstrate the specificity, reproducibility and flexibility of this process by performing a 96-plex amplification of an arbitrary set of specific DNA sequences, followed by hybridization to a cDNA microarray. Eighty-nine percent of the selectors generated PCR products that hybridized to the expected positions on the array, while little or no amplification artifacts were observed. PMID:15860768
韩晓光; 赵志军; 蔡郁知
Using the MGS and VR-Forces platforms together in the same simulation system raises the problem of converting entity information between formats. To display VR-Forces entity information on the MGS platform, the way entity information is organized in VR-Forces was analyzed, an XML-based mapping of entity ID codes between the VR-Forces and MGS platforms was built, and methods were implemented for obtaining entity attributes from VR-Forces such as latitude and longitude, heading, and friend-or-foe relationship. This solves the problem of running simulation training programs developed on MGS alongside those developed on VR-Forces, and provides technical support for the later development of simulation systems with similar functional requirements.
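An XML mapping table of entity IDs between two platforms, as this abstract describes, might look like the following Python sketch. Everything concrete here is hypothetical: the element and attribute names, the ID values, and the record fields are invented for illustration and do not reflect the actual VR-Forces or MGS data formats or APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; element and attribute names are illustrative only.
MAPPING_XML = """
<entityIdMap>
  <entity vrforces="tank_01" mgs="1001"/>
  <entity vrforces="ship_07" mgs="2007"/>
</entityIdMap>
"""

def load_id_map(xml_text):
    """Parse the mapping table into a dict: VR-Forces ID -> MGS ID."""
    root = ET.fromstring(xml_text.strip())
    return {e.get("vrforces"): e.get("mgs") for e in root.iter("entity")}

def to_mgs_record(entity, id_map):
    """Re-key one VR-Forces entity record (a plain dict here) for the MGS side."""
    return {
        "mgs_id": id_map[entity["id"]],
        "lat": entity["lat"],
        "lon": entity["lon"],
        "heading": entity["heading"],
        "force": entity["force"],
    }

id_map = load_id_map(MAPPING_XML)
record = to_mgs_record(
    {"id": "tank_01", "lat": 39.9, "lon": 116.4, "heading": 90.0, "force": "friendly"},
    id_map,
)
print(record)
```

Keeping the ID correspondence in an external XML file, rather than in code, means new entity types can be added without rebuilding either program.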
Caballero, Susana; Duchêne, Sebastian; Garavito, Manuel F.; Slikas, Beth; Baker, C. Scott
A small number of cetaceans have adapted to an entirely freshwater environment, having colonized rivers in Asia and South America from an ancestral origin in the marine environment. This includes the ‘river dolphins’, early divergence from the odontocete lineage, and two species of true dolphins (Family Delphinidae). Successful adaptation to the freshwater environment may have required increased demands in energy involved in processes such as the mitochondrial osmotic balance. For this reason, riverine odontocetes provide a compelling natural experiment in adaptation of mammals from marine to freshwater habitats. Here we present initial evidence of positive selection in the NADH dehydrogenase subunit 2 of riverine odontocetes by analyses of full mitochondrial genomes, using tests of selection and protein structure modeling. The codon model with highest statistical support corresponds to three discrete categories for amino acid sites, those under positive, neutral, and purifying selection. With this model we found positive selection at site 297 of the NADH dehydrogenase subunit 2 (dN/dS > 1.0), leading to a substitution of an Ala or Val from the ancestral state of Thr. A phylogenetic reconstruction of 27 cetacean mitogenomes showed that an Ala substitution has evolved at least four times in cetaceans, once or more in the three ‘river dolphins’ (Families Pontoporidae, Lipotidae and Inidae), once in the riverine Sotalia fluviatilis (but not in its marine sister taxa), once in the riverine Orcaella brevirostris from the Mekong River (but not in its marine sister taxa) and once in two other related marine dolphins. We located the position of this amino acid substitution in an alpha-helix channel in the trans-membrane domain in both the E. coli structure and Sotalia fluviatilis model. In E. coli this position is located in a helix implicated in a proton translocation channel of respiratory complex 1 and may have a similar role in the NADH dehydrogenases of
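The three site categories named in this abstract follow the usual reading of the dN/dS ratio: above 1 suggests positive selection, near 1 neutrality, and below 1 purifying selection. The Python sketch below only illustrates that reading. The function name and the tolerance band around 1 are assumptions, and real codon models assign sites to classes by likelihood rather than by a fixed cutoff.

```python
def classify_site(dn_ds, tol=0.05):
    """Label a site from its dN/dS ratio (omega).

    `tol` is an illustrative tolerance around 1.0, not a value from the paper.
    """
    if dn_ds < 0:
        raise ValueError("dN/dS cannot be negative")
    if dn_ds > 1.0 + tol:
        return "positive"
    if dn_ds < 1.0 - tol:
        return "purifying"
    return "neutral"

# Made-up omega values for three hypothetical sites.
for omega in (0.12, 1.0, 2.4):
    print(omega, classify_site(omega))
```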
Yang, Dong; Zhu, Xiangcheng; Wu, Xueyun; Feng, Zhiyang; Huang, Lei; Shen, Ben; Xu, Zhinan
iso-Migrastatin (iso-MGS) has been actively pursued recently as an outstanding candidate of antimetastasis agents. Having characterized the iso-MGS biosynthetic gene cluster from its native producer Streptomyces platensis NRRL 18993, we have recently succeeded in producing iso-MGS in five selected heterologous Streptomyces hosts, albeit the low titers failed to meet expectations and cast doubt on the utility of this novel technique for large-scale production. To further explore and capitalize on the production capacity of these hosts, a thorough investigation of these five engineered strains with three fermentation media for iso-MGS production was undertaken. Streptomyces albus J1074 and Streptomyces lividans K4-114 were found to be preferred heterologous hosts, and subsequent analysis of carbon and nitrogen sources revealed that sucrose and yeast extract were ideal for iso-MGS production. After the initial optimization, the titers of iso-MGS in all five hosts were considerably improved by 3–18-fold in the optimized R2YE medium. Furthermore, the iso-MGS titer of S. albus J1074 (pBS11001) was significantly improved to 186.7 mg/L by a hybrid medium strategy. Addition of NaHCO3 to the latter finally afforded an optimized iso-MGS titer of 213.8 mg/L, about 5-fold higher than the originally reported system. With S. albus J1074 (pBS11001) as a model host, the expression of iso-MGS gene cluster in four different media was systematically studied via the quantitative RT–PCR technology. The resultant comparison revealed the correlation of gene expression and iso-MGS production for the first time; synchronous expression of the whole gene cluster was crucial for optimal iso-MGS production. These results reveal new insights into the iso-MGS biosynthetic machinery in heterologous hosts and provide the primary data to realize large-scale production of iso-MGS for further preclinical studies. PMID:21132287
Rasmussen, L H; Dargis, R; Højholt, K; Christensen, J J; Skovgaard, O; Justesen, U S; Rosenvinge, F S; Moser, C; Lukjancenko, O; Rasmussen, S; Nielsen, X C
Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were observed in single gene analyses. Species identification based on single gene analysis showed its limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed the most distinct clustering.
The Lohmann Selected Leghorn (LSL) and Lohmann Brown (LB) layer lines have been selected for high egg production for more than 50 years and are among the leading commercial layer lines worldwide. The objectives of the present study were to characterize the molecular processes that differ between these two layer lines using whole genome RNA expression profiles. The hens were kept in the newly developed small group housing system Eurovent German with two different group sizes. Differential expression was observed for 6,276 microarray probes (FDR-adjusted P-value < 0.05) between the two layer lines LSL and LB. A 2-fold or greater change in gene expression was identified on 151 probe sets. In LSL, 72 of the 151 probe sets were up-regulated and 79 were down-regulated. Gene ontology (GO) enrichment analysis accounting for biological processes revealed 18 GO terms for the 72 probe sets with higher expression in LSL, especially those taking part in immune system processes and membrane organization. A total of 32 enriched GO terms were determined among the 79 down-regulated probe sets of LSL. In particular, these terms included phosphorus metabolic processes and signaling pathways. In conclusion, the phenotypic differences between the two layer lines LSL and LB are clearly reflected in their gene expression profiles of the cerebrum. These novel findings provide clues for genes involved in economically important line characteristics of commercial laying hens.
Tollenaere, C; Duplantier, J-M; Rahalison, L; Ranjalahy, M; Brouat, C
The black rat (Rattus rattus) is the main reservoir of plague (Yersinia pestis infection) in Madagascar's rural zones. Black rats are highly resistant to plague within the plague focus (central highland), whereas they are susceptible where the disease is absent (low altitude zone). To better understand plague wildlife circulation and host evolution in response to a highly virulent pathogen, we attempted to determine genetic markers associated with plague resistance in this species. To this purpose, we combined a population genomics approach and an association study, both performed on 249 AFLP markers, in Malagasy R. rattus. Simulated distributions of genetic differentiation were compared to observed data in four independent pairs, each consisting of one population from the plague focus and one from the plague-free zone. We found 22 loci (9% of 249) with higher differentiation in at least two independent population pairs or with combining P-values over the four pairs significant. Among the 22 outlier loci, 16 presented significant association with plague zone (plague focus vs. plague-free zone). Population genetic structure inferred from outlier loci was structured by plague zone, whereas the neutral loci dataset revealed structure by geography (eastern vs. western populations). A phenotype association study revealed that two of the 22 loci were significantly associated with differentiation between dying and surviving rats following experimental plague challenge. The 22 outlier loci identified in this study may undergo plague selective pressure either directly or more probably indirectly due to hitchhiking with selected loci. © 2010 Blackwell Publishing Ltd.
Background: Human genetic variation produces the wide range of phenotypic differences that make us individual. However, little is known about the distribution of variation in the most conserved functional regions of the human genome. We examined whether different subsets of the conserved human genome have been subjected to similar levels of selective constraint within the human population. We used set theory and high performance computing to carry out an analysis of the density of Single Nucleotide Polymorphisms (SNPs) within the evolutionary conserved human genome, at three different selective stringencies, intersected with exonic, intronic and intergenic coordinates. Results: We demonstrate that SNP density across the genome is significantly reduced in conserved human sequences. Unexpectedly, we further demonstrate that, despite being conserved to the same degree, SNP density differs significantly between conserved subsets. Thus, both the conserved exonic and intronic genomes contain a significantly reduced density of SNPs compared to the conserved intergenic component. Furthermore, the intronic and exonic subsets contain almost identical densities of SNPs, indicating that they have been constrained to the same degree. Conclusion: Our findings suggest the presence of a selective linkage between the exonic and intronic subsets and ascribe increased significance to the role of introns in human health. In addition, the identification of increased plasticity within the conserved intergenic subset suggests an important role for this subset in the adaptation and diversification of the human population.
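The interval-intersection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration. It intersects a set of conserved regions with a set of exons and then computes SNPs per base pair within the overlap.

```python
from bisect import bisect_left

def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals (start, end)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def snp_density(intervals, snp_positions):
    """SNPs per base pair inside the intervals; snp_positions must be sorted."""
    total_bp = sum(end - start for start, end in intervals)
    hits = sum(
        bisect_left(snp_positions, end) - bisect_left(snp_positions, start)
        for start, end in intervals
    )
    return hits / total_bp if total_bp else 0.0

# Hypothetical toy coordinates on a single chromosome (not data from the study).
conserved = [(100, 200), (300, 400), (500, 600)]
exons = [(150, 350)]
snps = [120, 160, 310, 390, 550]

conserved_exonic = intersect(conserved, exons)
print(conserved_exonic)                      # overlap of conserved and exonic regions
print(snp_density(conserved_exonic, snps))   # SNPs per bp within that overlap
```

The same two functions could be reused for the intronic and intergenic subsets by swapping the second interval list, which is what makes the densities directly comparable.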
Buch, Line Hjortø; Kargo, Morten; Berg, Peer;
Today, almost all reference populations consist of progeny tested bulls. However, older progeny tested bulls do not have reliable estimated breeding values (EBV) for new traits. Thus, to be able to select for these new traits, it is necessary to build a reference population. We used a deterministic...... prediction model to test the hypothesis that the value of cows in reference populations depends on the availability of phenotypic records. To test the hypothesis, we investigated different strategies of building a reference population for a new functional trait over a 10-year period. The trait was either...... recorded on a large scale (30 000 cows per year) or on a small scale (2000 cows per year). For large-scale recording, we compared four scenarios where the reference population consisted of 30 sires; 30 sires and 170 test bulls; 30 sires and 2000 cows; or 30 sires, 2000 cows and 170 test bulls in the first...
Background: Rapid response to selection was previously observed in mice selected for high levels of inter-male aggression based on number of attacks displayed in a novel social interaction test after isolation housing. Attack levels in this high aggression line (NC900) increased significantly within just four generations of selective breeding, suggesting the presence of a locus with large effect. We conducted an experiment using a small (n ≈ 100) F2 cross between the ICR-derived, non-inbred NC900 strain and the low aggression inbred strain C57BL/6J, genotyped for 154 fully informative SNPs, to determine if a locus with large effect controls the high-aggression selection trait. A second goal was to use high density SNP genotyping (n = 549,000) in the parental strains to characterize residual patterns of heterozygosity within NC900, and evaluate regions that are identical by descent (IBD) between NC900 and C57BL/6J, to determine what impacts these may have on accuracy and resolution of quantitative trait locus (QTL) mapping in the F2 cross. Results: No evidence for a locus with major effect on aggressive behavior in mice was identified. However, several QTL with genomewide significance were mapped for aggression on chromosomes 7 and 19 and other social behavior traits on chromosomes 4, 7, 14, and 19. High density genotyping revealed that 28% of the genome is still segregating among the six NC900 females used to originate the F2 cross, and that segregating regions are present on every chromosome but are of widely different sizes. Regions of IBD between NC900 and C57BL/6J are found on every chromosome but are most prominent on chromosomes 10, 16 and X. No significant differences were found for amounts of heterozygosity or prevalence of IBD in QTL regions relative to global analysis. Conclusions: While no major gene was identified to explain the rapid selection response in the NC900 line, transgressive variation (i.e. where the allele from the C57
Mycoplasma bovis is a major pathogen causing arthritis, respiratory disease and mastitis in cattle. A better understanding of its genetic features and evolution might provide evidence of how it survives in host environments. In this study, multiple factors influencing synonymous codon usage patterns in the genomes of three M. bovis strains were analyzed. The overall nucleotide content of genes in the M. bovis genome is AT-rich. Although the G and C contents at the third codon position of genes in the leading strand differ from those in the lagging strand (p<0.05), the 59 synonymous codon usage patterns of genes in the leading strand are highly similar to those in the lagging strand. The over-represented codons and the under-represented codons were identified. A comparison of the synonymous codon usage pattern of M. bovis and cattle (susceptible host) indicated the independent formation of synonymous codon usage of M. bovis. Principal component analysis revealed that (i) strand-specific mutational bias fails to affect the synonymous codon usage pattern in the leading and lagging strands, (ii) mutation pressure from nucleotide content plays a role in shaping the overall codon usage, and (iii) the major trend of synonymous codon usage has a significant correlation with the gene expression level that is estimated by the codon adaptation index. The plot of the effective number of codons against the G+C content at the third codon position also reveals that mutation pressure undoubtedly contributes to the synonymous codon usage pattern of M. bovis. Additionally, the formation of the overall codon usage is determined by certain evolutionary selections for gene function classification (30S protein, 50S protein, transposase, membrane protein, and lipoprotein) and translation elongation region of genes in M. bovis. The information could be helpful in further investigations of evolutionary mechanisms of the Mycoplasma family and heterologous expression of its functionally
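The abstract does not say how over- and under-represented codons were identified. One widely used measure is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch assumes RSCU is the measure of interest; the partial codon table and the toy sequence are hypothetical and chosen only to keep the example short.

```python
from collections import Counter

# Partial codon table (assumption: only three amino acids, for brevity).
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Ile": ["ATT", "ATC", "ATA"],
}

def rscu(cds):
    """Return {codon: RSCU} for the codon families in SYNONYMS present in cds."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                          # amino acid absent; RSCU undefined
        expected = total / len(family)        # equal usage of all synonyms
        for c in family:
            values[c] = counts[c] / expected
    return values

# Hypothetical AT-rich toy coding sequence (not M. bovis data).
toy_cds = "AAA" * 3 + "AAG" + "ATT" * 2 + "ATA"
for codon, value in sorted(rscu(toy_cds).items()):
    print(codon, round(value, 2))
```

An RSCU above 1 means a codon is used more often than expected under equal usage and below 1 means less often; any cut-offs for calling a codon over- or under-represented would be a further assumption.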
We have performed a metabolite quantitative trait locus (mQTL) study of the ¹H nuclear magnetic resonance spectroscopy (¹H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by ¹H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10⁻¹¹
genomic regions. Two of the three hit regions lie within haplotype blocks (at 2p13.1 and 10q24.2) that carry the genetic signature of strong, recent, positive selection in European populations. Genes NAT8 and PYROXD2, both with relatively uncharacterized functional roles, are good candidates for mediating the corresponding mQTL associations. The study's longitudinal twin design allowed detailed variance-components analysis of the sources of population variation in metabolite levels. The mQTLs explained 40%-64% of biological population variation in the corresponding metabolites' concentrations. These effect sizes are stronger than those reported in a recent, targeted mQTL study of metabolites in serum using the targeted-metabolomics Biocrates platform. By re-analysing our plasma samples using the Biocrates platform, we replicated the mQTL findings of the previous study and discovered a previously uncharacterized yet substantial familial component of variation in metabolite levels in addition to the heritability contribution from
Sun, Luyang; Liu, Shikai; Wang, Ruijia; Jiang, Yanliang; Zhang, Yu; Zhang, Jiaren; Bao, Lisui; Kaltenboeck, Ludmilla; Dunham, Rex; Waldbieser, Geoff; Liu, Zhanjiang
Domestication and selection for important performance traits can impact the genome, which is most often reflected by reduced heterozygosity in and surrounding genes related to traits affected by selection. In this study, analysis of the genomic impact caused by domestication and artificial selection was conducted by investigating the signatures of selection using single nucleotide polymorphisms (SNPs) in channel catfish (Ictalurus punctatus). A total of 8.4 million candidate SNPs were identified by using next generation sequencing. On average, the channel catfish genome harbors one SNP per 116 bp. Approximately 6.6 million, 5.3 million, 4.9 million, 7.1 million and 6.7 million SNPs were detected in the Marion, Thompson, USDA103, Hatchery strain, and wild population, respectively. The allele frequencies of 407,861 SNPs differed significantly between the domestic and wild populations. With these SNPs, 23 genomic regions with putative selective sweeps were identified that included 11 genes. Although the function for the majority of the genes remains unknown in catfish, several genes with known function related to aquaculture performance traits were included in the regions with selective sweeps. These included hypoxia-inducible factor 1β (HIF1β) and the transporter gene ATP-binding cassette sub-family B member 5 (ABCB5). HIF1β is important for response to hypoxia, and tolerance to low oxygen levels is a critical aquaculture trait. The large numbers of SNPs identified from this study are valuable for the development of high-density SNP arrays for genetic and genomic studies of performance traits in catfish. PMID:25313648
Cohn Zachary A
Background: Cartilage plays a fundamental role in the development of the human skeleton. Early in embryogenesis, mesenchymal cells condense and differentiate into chondrocytes to shape the early skeleton. Subsequently, the cartilage anlagen differentiate to form the growth plates, which are responsible for linear bone growth, and the articular chondrocytes, which facilitate joint function. However, despite the multiplicity of roles of cartilage during human fetal life, surprisingly little is known about its transcriptome. To address this, a whole genome microarray expression profile was generated using RNA isolated from 18–22 week human distal femur fetal cartilage and compared with a database of control normal human tissues aggregated at UCLA, termed Celsius. Results: 161 cartilage-selective genes were identified, defined as genes significantly expressed in cartilage with low expression and little variation across a panel of 34 non-cartilage tissues. Among these 161 genes were cartilage-specific genes such as cartilage collagen genes and 25 genes which have been associated with skeletal phenotypes in humans and/or mice. Many of the other cartilage-selective genes do not have established roles in cartilage or are novel, unannotated genes. Quantitative RT-PCR confirmed the unique pattern of gene expression observed by microarray analysis. Conclusion: Defining the gene expression pattern for cartilage has identified new genes that may contribute to human skeletogenesis as well as provided further candidate genes for skeletal dysplasias. The data suggest that fetal cartilage is a complex and transcriptionally active tissue and demonstrate that the set of genes selectively expressed in the tissue has been greatly underestimated.
Minelli, Cosetta; De Grandi, Alessandro; Weichenberger, Christian X; Gögele, Martin; Modenese, Mirko; Attia, John; Barrett, Jennifer H; Boehnke, Michael; Borsani, Giuseppe; Casari, Giorgio; Fox, Caroline S; Freina, Thomas; Hicks, Andrew A; Marroni, Fabio; Parmigiani, Giovanni; Pastore, Andrea; Pattaro, Cristian; Pfeufer, Arne; Ruggeri, Fabrizio; Schwienbacher, Christine; Taliun, Daniel; Pramstaller, Peter P; Domingues, Francisco S; Thompson, John R
Biological plausibility and other prior information could help select genome-wide association (GWA) findings for further follow-up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts' opinions and empirical evidence to estimate the relative importance of 15 types of information at the single-nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from 10 experts using a two-round Delphi survey. Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta-analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta-analysis or more than one study as conferring the highest relative probability of true association, whereas previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, although location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research.
Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.
Oliveira, Deodoro C S G; Raychoudhury, Rhitoban; Lavrov, Dennis V; Werren, John H
We sequenced the nearly complete mtDNA of 3 species of parasitic wasps, Nasonia vitripennis (2 strains), Nasonia giraulti, and Nasonia longicornis, including all 13 protein-coding genes and the 2 rRNAs, and found unusual patterns of mitochondrial evolution. The Nasonia mtDNA has a unique gene order compared with other insect mtDNAs due to multiple rearrangements. The mtDNAs of these wasps also show nucleotide substitution rates over 30 times faster than nuclear protein-coding genes, indicating among the highest substitution rates found in animal mitochondria (normally <10 times faster). A McDonald and Kreitman test shows that the between-species frequency of fixed replacement sites relative to silent sites is significantly higher compared with within-species polymorphisms in 2 mitochondrial genes of Nasonia, atp6 and atp8, indicating directional selection. Consistent with this interpretation, the Ka/Ks (nonsynonymous/synonymous substitution rates) ratios are higher between species than within species. In contrast, cox1 shows a signature of purifying selection for amino acid sequence conservation, although rates of amino acid substitutions are still higher than for comparable insects. The mitochondrial-encoded polypeptides atp6 and atp8 both occur in F0F1ATP synthase of the electron transport chain. Because malfunction in this fundamental protein severely affects fitness, we suggest that the accelerated accumulation of replacements is due to beneficial mutations necessary to compensate mild-deleterious mutations fixed by random genetic drift or Wolbachia sweeps in the fast evolving mitochondria of Nasonia. We further propose that relatively high rates of amino acid substitution in some mitochondrial genes can be driven by a "Compensation-Draft Feedback"; increased fixation of mildly deleterious mutations results in selection for compensatory mutations, which lead to fixation of additional deleterious mutations in nonrecombining mitochondrial genomes, thus accelerating the process of amino acid substitutions.
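The McDonald and Kreitman test mentioned in this abstract compares a 2×2 table of fixed versus polymorphic differences at replacement versus silent sites. The Python sketch below shows one common way to evaluate such a table, using a two-sided Fisher exact test and the neutrality index; the abstract does not state which statistical test the authors applied, and the counts used here are hypothetical, not values from the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def mk_test(dn, ds, pn, ps):
    """dn, ds: fixed replacement/silent differences between species;
    pn, ps: polymorphic replacement/silent sites within species.
    Assumes dn, ds and ps are non-zero."""
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)

# Hypothetical counts for one gene (illustration only).
ni, p_value = mk_test(dn=20, ds=10, pn=5, ps=20)
print(f"neutrality index = {ni:.3f}, p = {p_value:.4g}")
```

A neutrality index below 1 together with a small p-value corresponds to the pattern the abstract reports for atp6 and atp8: an excess of fixed replacement differences between species relative to within-species polymorphism.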
Rasmussen-Torvik, Laura J.; Alonso, Alvaro; Li, Man; Kao, Wen; Köttgen, Anna; Yan, Yuer; Couper, David; Boerwinkle, Eric; Bielinski, Suzette J.; Pankow, James S.
Although GWAS have been performed in longitudinal studies, most used only a single trait measure. GWAS of fasting glucose have generally included only normoglycemic individuals. We examined the impact of both repeated measures and sample selection on GWAS in ARIC, a study which obtained four longitudinal measures of fasting glucose and included both individuals with and without prevalent diabetes. The sample included Caucasians and the Affymetrix 6.0 chip was used for genotyping. Sample sizes for GWAS analyses ranged from 8372 (first study visit) to 5782 (average fasting glucose). Candidate SNP analyses with SNPs identified through fasting glucose or diabetes GWAS were conducted in 9133 individuals, including 761 with prevalent diabetes. For a constant sample size, smaller p-values were obtained for the average measure of fasting glucose compared to values at any single visit, and two additional significant GWAS signals were detected. For four candidate SNPs (rs780094, rs10830963, rs7903146, and rs4607517), the strength of association between genotype and glucose was significantly (p-interaction fasting glucose candidate SNPs (rs780094, rs10830963, rs560887, rs4607517, rs13266634) the association with measured fasting glucose was more significant in the smaller sample without prevalent diabetes than in the larger combined sample of those with and without diabetes. This analysis demonstrates the potential utility of averaging trait values in GWAS studies and explores the advantage of using only individuals without prevalent diabetes in GWAS of fasting glucose. PMID:20839289
Vandermies, Marie; Denies, Olivia; Nicaud, Jean-Marc; Fickers, Patrick
We report here on EYK1, encoding erythrulose kinase, as an efficient catabolic selectable marker for genome editing in Y. lipolytica. Compared to auxotrophic markers, EYK1 increases the growth rate of transformants and allows improved efficiency of transformation. The utility of the marker EYK1 in a replicative vector was also demonstrated. Copyright © 2017 Elsevier B.V. All rights reserved.
Jacobsen, Lars Magnus W.; Rodrigues da Fonseca, Rute Andreia; Bernatchez, Louis
Several studies have recently reported evidence for positive selection acting on the mitochondrial genome (mitogenome), emphasizing its potential role in adaptive divergence and speciation. In this study we searched 107 full mitogenomes of recently diverged species and lineages of whitefish (Core...
McConnochie, T. H.; Smith, M. D.
Mars Global Surveyor Thermal Emission Spectrometer (MGS-TES) nadir soundings have been used to derive atmospheric temperatures up to roughly 40 km [Conrath et al., JGR 105, 2000; Smith et al., JGR 106, 2001], and MGS-TES limb soundings have been used to extend the atmospheric temperature data set to > 60 km in altitude [Smith et al., JGR 106, 2001]. The ~40 - ~65 km altitude range probed by the MGS-TES limb sounding is particularly important for capturing key dynamical features such as the warm winter polar mesosphere [e.g., Smith et al., JGR 106, 2001; McCleese et al., Nature Geoscience 1, 2008], and the response of thermal tides to dust opacity [e.g., Wilson and Hamilton, J. Atmos. Sci. 53, 1996]. Thus accurate and precise temperature profiles at these altitudes are particularly important for constraining global circulation models. They are also critical for interpreting observations of mesospheric condensate aerosols [e.g., Määttänen et al., Icarus 209, 2010; McConnochie et al., Icarus 210, 2010]. We have identified correlated noise components in the MGS-TES limb sounding radiances that propagate into very large uncertainties in the retrieved temperatures. We have also identified a slowly varying radiance bias in the limb sounding radiances. Note that the nadir-sounding-based MGS-TES atmospheric temperatures currently available from the Planetary Data System are not affected by either of these issues. These two issues affect the existing MGS-TES limb sounding temperature data set as follows: Considering, for example, the 1.5 Pascal pressure level (which typically falls between 50 and 60 km altitude), correlated-noise induced standard errors for individual limb-sounding temperature retrievals were 3 - 5 K in Mars Year 24, rising to 5 - 15 K in Mars Year 25 and 10 - 15 K in Mars Year 26 and 27. The radiance bias, although consistent on ~10-sol time scales, is highly variable over the course of the MGS-TES mission. It results in temperatures (at the 1
Wang, Yi; Zhang, Zhong-Tian; Seo, Seung-Oh; Lynn, Patrick; Lu, Ting; Jin, Yong-Su; Blaschek, Hans P
CRISPR-Cas9 has been demonstrated as a transformative genome engineering tool for many eukaryotic organisms; however, its utilization in bacteria remains limited and ineffective. Here we explored Streptococcus pyogenes CRISPR-Cas9 for genome editing in Clostridium beijerinckii (industrially significant but notorious for being difficult to metabolically engineer) as a representative attempt to explore CRISPR-Cas9 for genome editing in microorganisms that previously lacked sufficient genetic tools. By combining inducible expression of Cas9 and plasmid-borne editing templates, we successfully achieved gene deletion and integration with high efficiency in single steps. We further achieved single nucleotide modification by applying innovative two-step approaches, which do not rely on availability of Protospacer Adjacent Motif sequences. Severe vector integration events were observed during the genome engineering process, which is likely difficult to avoid but has never been reported by other researchers for the bacterial genome engineering based on homologous recombination with plasmid-borne editing templates. We then further successfully employed CRISPR-Cas9 as an efficient tool for selecting desirable "clean" mutants in this study. The approaches we developed are broadly applicable and will open the way for precise genome editing in diverse microorganisms.
Pierella Karlusich, Juan J; Ceccoli, Romina D; Graña, Martín; Romero, Héctor; Carrillo, Néstor
Oxidative stress and iron limitation represent the grim side of life in an oxygen-rich atmosphere. The versatile electron transfer shuttle ferredoxin, an iron-sulfur protein, is particularly sensitive to these hardships, and its downregulation under adverse conditions severely compromises survival of phototrophs. Replacement of ferredoxin by a stress-resistant isofunctional carrier, flavin-containing flavodoxin, is a widespread strategy employed by photosynthetic microorganisms to overcome environmental adversities. The flavodoxin gene was lost in the course of plant evolution, but its reintroduction in transgenic plants confers increased tolerance to environmental stress and iron starvation, raising the question as to why a genetic asset with obvious adaptive value was not kept by natural selection. Phylogenetic analyses reveal that the evolutionary history of flavodoxin is intricate, with several horizontal gene transfer events between distant organisms, including Eukarya, Bacteria, and Archaea. The flavodoxin gene is unevenly distributed in most algal lineages, with flavodoxin-containing species being overrepresented in iron-limited regions and scarce or absent in iron-rich environments. Evaluation of cyanobacterial genomic and metagenomic data yielded essentially the same results, indicating that there was little selection pressure to retain flavodoxin in iron-rich coastal/freshwater phototrophs. Our results show a highly dynamic evolution pattern of flavodoxin tightly connected to the bioavailability of iron. Evidence presented here also indicates that the high concentration of iron in coastal and freshwater habitats may have facilitated the loss of flavodoxin in the freshwater ancestor of modern plants during the transition of photosynthetic organisms from the open oceans to the firm land.
Emma L Duncan
Osteoporotic fracture is a major cause of morbidity and mortality worldwide. Low bone mineral density (BMD) is a major predisposing factor to fracture and is known to be highly heritable. Site-, gender-, and age-specific genetic effects on BMD are thought to be significant, but have largely not been considered in the design of genome-wide association studies (GWAS) of BMD to date. We report here a GWAS using a novel study design focusing on women of a specific age (postmenopausal women, age 55-85 years), with either extreme high or low hip BMD (age- and gender-adjusted BMD z-scores of +1.5 to +4.0, n = 1055, or -4.0 to -1.5, n = 900), with replication in cohorts of women drawn from the general population (n = 20,898). The study replicates 21 of 26 known BMD-associated genes. Additionally, we report six new suggestive genetic associations in or around the genes CLCN7, GALNT3, IBSP, LTBP3, RSPO3, and SOX4, with replication in two independent datasets. A novel mouse model with a loss-of-function mutation in GALNT3 is also reported, which has high bone mass, supporting the involvement of this gene in BMD determination. In addition to identifying further genes associated with BMD, this study confirms the efficiency of extreme-truncate selection designs for quantitative trait association studies.
Bagos, Pantelis G
In genetic association studies (GAS) as well as in genome-wide association studies (GWAS), the mode of inheritance (dominant, additive and recessive) is usually not known a priori. Assuming an incorrect mode of inheritance may lead to substantial loss of power, whereas on the other hand, testing all possible models may result in an increased type I error rate. The situation is even more complicated in the meta-analysis of GAS or GWAS, in which individual studies are synthesized to derive an overall estimate. Meta-analysis increases the power to detect weak genotype effects, but heterogeneity and incompatibility between the included studies complicate things further. In this review, we present a comprehensive summary of the statistical methods used for robust analysis and genetic model selection in GAS and GWAS. We then discuss the application of such methods in the context of meta-analysis. We describe the theoretical properties of the various methods and the foundations on which they are based. We also present the available software implementations of the described methods. Finally, since only few of the available robust methods have been applied in the meta-analysis setting, we present some simple extensions that allow robust meta-analysis of GAS and GWAS. Possible extensions and proposals for future work are also discussed.
Two methods of SNP pre-selection based on single-marker regression for the estimation of genomic breeding values (G-EBVs) were compared using simulated data provided by the XII QTL-MAS workshop: (i) Bonferroni correction of the significance threshold and (ii) a permutation test to obtain the reference distribution of the null hypothesis and identify significant markers at the P<0.01 and P<0.001 significance thresholds. From the set of markers significant at P<0.001, random subsets of 50% and 25% of the markers were extracted to evaluate the effect of further reducing the number of significant SNPs on G-EBV predictions. The Bonferroni correction method allowed the identification of 595 significant SNPs that gave the best G-EBV accuracies in the prediction generations (82.80%). The permutation methods gave slightly lower G-EBV accuracies even though a larger number of SNPs were significant (2,053 and 1,352 for the 0.01 and 0.001 significance thresholds, respectively). Interestingly, halving or quartering the number of SNPs significant at P<0.001 resulted in only a slight decrease in G-EBV accuracies. The genetic structure of the simulated population, with few QTL carrying large effects, might have favoured the Bonferroni method.
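The two pre-selection rules compared in this abstract can be sketched in a few lines. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, the Bonferroni cutoff uses a large-sample normal approximation to the regression t statistic, and the permutation cutoff is taken from a pooled per-marker null distribution, which is just one possible reading of "reference distribution of the null hypothesis".

```python
import math
import random
from statistics import NormalDist


def t_stat(x, y):
    """t statistic for the slope of a simple regression of phenotype y on marker x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # guard against division by zero
    return r * math.sqrt((n - 2) / (1 - r * r))


def bonferroni_select(markers, y, alpha=0.05):
    """Markers passing a two-sided test at alpha / (number of markers).

    Assumption: the t statistic is compared with a normal quantile.
    """
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * len(markers)))
    return [i for i, x in enumerate(markers) if abs(t_stat(x, y)) > cutoff]


def permutation_select(markers, y, alpha=0.01, n_perm=100, seed=1):
    """Markers whose |t| exceeds the (1 - alpha) quantile of a permutation null.

    Assumption: null statistics are pooled over markers and shuffles.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the marker-phenotype link
        null.extend(abs(t_stat(x, shuffled)) for x in markers)
    null.sort()
    cutoff = null[min(len(null) - 1, int((1 - alpha) * len(null)))]
    return [i for i, x in enumerate(markers) if abs(t_stat(x, y)) > cutoff]
```

Here `markers` is a list of per-marker genotype lists (coded 0/1/2) and `y` is the phenotype list. Because the pooled permutation cutoff is not corrected for the number of markers, it is generally less stringent than the Bonferroni cutoff, which is consistent with the larger SNP counts the abstract reports for the permutation approach.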
Liu, Hui-Yun; Zhao, Qi; Zhang, Tian-Peng; Wu, Yue; Xiong, Yun-Xia; Wang, Shi-Ke; Ge, Yuan-Long; He, Jin-Hui; Lv, Peng; Ou, Tian-Miao; Tan, Jia-Heng; Li, Ding; Gu, Lian-Quan; Ren, Jian; Zhao, Yong; Huang, Zhi-Shu
G-quadruplexes are specialized secondary structures in nucleic acids that possess significant conformational polymorphisms. The precise G-quadruplex conformations in vivo and their relevance to biological functions remain controversial and unclear, especially for telomeric G-quadruplexes. Here, we report a novel single-chain variable fragment (scFv) antibody, D1, with high binding selectivity for parallel G-quadruplexes in vitro and in vivo. Genome-wide chromatin immunoprecipitation using D1 and deep-sequencing revealed the consensus sequence for parallel G-quadruplex formation, which is characterized by G-rich sequence with a short loop size (G-quadruplex was identified and its formation was regulated by small molecular ligands targeting and telomere replication. Together, parallel G-quadruplex specific antibody D1 was found to be a valuable tool for determination of G-quadruplex and its conformation, which will prompt further studies on the structure of G-quadruplex and its biological implication in vivo.
Guillot, Gilles; Vitalis, Renaud; Rouzic, Arnaud le;
Genomic regions (or loci) displaying outstanding correlation with some environmental variables are likely to be under selection and this is the rationale of recent methods of identifying selected loci and retrieving functional information about them. To be efficient, such methods need to be able...... to disentangle the potential effect of environmental variables from the confounding effect of population history. For the routine analysis of genome-wide datasets, one also needs fast inference and model selection algorithms. We propose a method based on an explicit spatial model which is an instance of spatial...... generalized linear mixed model (SGLMM). For inference, we make use of the INLA–SPDE theoretical and computational framework developed by Rue et al. (2009) and Lindgren et al. (2011). The method we propose allows one to quantify the correlation between genotypes and environmental variables. It works...
Moradi Mohammad Hossein
Background: Identification of genomic regions that have been targets of selection for phenotypic traits is one of the most important and challenging areas of research in animal genetics. However, currently there are relatively few genomic regions identified that have been subject to positive selection. In this study, a genome-wide scan using ~50,000 Single Nucleotide Polymorphisms (SNPs) was performed in an attempt to identify genomic regions associated with fat deposition in fat-tail breeds. This trait and its modification are very important in those countries grazing these breeds. Results: Two independent experiments using either Iranian or Ovine HapMap genotyping data contrasted thin and fat tail breeds. Population differentiation using FST in Iranian thin and fat tail breeds revealed seven genomic regions. Almost all of these regions overlapped with QTLs that had previously been identified as affecting fat and carcass yield traits in beef and dairy cattle. Study of selection sweep signatures using FST in thin and fat tail breeds sampled from the Ovine HapMap project confirmed three of these regions located on Chromosomes 5, 7 and X. We found increased homozygosity in these regions in favour of fat tail breeds on chromosome 5 and X and in favour of thin tail breeds on chromosome 7. Conclusions: In this study, we were able to identify three novel regions associated with fat deposition in thin and fat tail sheep breeds. Two of these were associated with an increase of homozygosity in the fat tail breeds which would be consistent with selection for mutations affecting fat tail size several thousand years after domestication.
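For readers unfamiliar with the FST statistic used in this abstract, a minimal per-SNP sketch follows. The abstract does not say which FST estimator was used, so the version below is the textbook Wright definition, FST = (HT - HS) / HT, for a biallelic SNP in two populations; the function name and the sample-size weights are assumptions made for illustration.

```python
def wright_fst(p1, p2, n1=1, n2=1):
    """Wright's F_ST for one biallelic SNP.

    p1, p2: frequency of the same allele in population 1 and 2.
    n1, n2: relative population weights (hypothetical; equal by default).
    """
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    p_bar = w1 * p1 + w2 * p2
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # SNP fixed for the same allele in both populations
    # mean expected heterozygosity within populations
    h_within = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    return (h_total - h_within) / h_total
```

For example, allele frequencies of 0.2 in one breed group and 0.8 in the other give HT = 0.5 and HS = 0.32, so FST = 0.36; identical frequencies give 0 and opposite fixation gives 1. A genome scan of this kind would typically average such values over neighbouring SNPs and look for regions with unusually high differentiation.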
In the learning process, students are expected to be active through activities that build group work and that quickly get them thinking about the subject matter. Active student involvement in biology learning is essential so that what is learned becomes more firmly embedded in students' minds. Applying Monopoly Games Smart in biology learning is one alternative for increasing student motivation and creativity and for reducing learning fatigue, so that satisfactory learning outcomes are achieved. The objectives of this study were: (1) to examine the application of the MGS (Monopoly Games Smart) medium to the Ecosystem topic at MTs Al-Wahdah Sumber; (2) to examine student learning outcomes in biology on the Ecosystem topic using the MGS (Monopoly Games Smart) medium at MTs Al-Wahdah Sumber; (3) to examine student responses to the use of the MGS (Monopoly Games Smart) medium in biology learning, particularly on the Ecosystem topic, at MTs Al-Wahdah Sumber. The study was conducted at MTs Al-Wahdah. Data were collected with test instruments (pre-test and post-test) to measure student learning outcomes, observation to assess student activity, and a questionnaire to gauge student responses to the learning medium. The results show that (1) based on the observation analysis, student activity increased after the Monopoly Games Smart medium was applied. (2) For the experimental class using the Monopoly Games Smart learning medium, the mean pretest score was 39, the mean posttest score was 78, and the n-gain was 0.62. A t-test computed with SPSS 16 gave Sig. 0.000 < 0.05, which indicates an improvement in students' biology learning outcomes. (3) Based on the questionnaire analysis of student responses to the application of the medium, most students (82%, rated as very strong) liked the application of the learning strategy Monopoly
Lenz, Patrick R N; Beaulieu, Jean; Mansfield, Shawn D; Clément, Sébastien; Desponts, Mireille; Bousquet, Jean
Genomic selection (GS) uses information from genomic signatures consisting of thousands of genetic markers to predict complex traits. As such, GS represents a promising approach to accelerate tree breeding, which is especially relevant for the genetic improvement of boreal conifers characterized by long breeding cycles. In the present study, we tested GS in an advanced-breeding population of the boreal black spruce (Picea mariana [Mill.] BSP) for growth and wood quality traits, and concurrently examined factors affecting GS model accuracy. The study relied on 734 25-year-old trees belonging to 34 full-sib families derived from 27 parents and that were established on two contrasting sites. Genomic profiles were obtained from 4993 Single Nucleotide Polymorphisms (SNPs) representative of as many gene loci distributed among the 12 linkage groups common to spruce. GS models were obtained for four growth and wood traits. Validation using independent sets of trees showed that GS model accuracy was high, related to trait heritability and equivalent to that of conventional pedigree-based models. In forward selection, gains per unit of time were three times higher with the GS approach than with conventional selection. In addition, models were also accurate across sites, indicating little genotype-by-environment interaction in the area investigated. Using information from half-sibs instead of full-sibs led to a significant reduction in model accuracy, indicating that the inclusion of relatedness in the model contributed to its higher accuracies. About 500 to 1000 markers were sufficient to obtain GS model accuracy almost equivalent to that obtained with all markers, whether they were well spread across the genome or from a single linkage group, further confirming the implication of relatedness and potential long-range linkage disequilibrium (LD) in the high accuracy estimates obtained. Only slightly higher model accuracy was obtained when using marker subsets that were
McLaughlin, Richard N; Young, Janet M; Yang, Lei; Neme, Rafik; Wichman, Holly A; Malik, Harmit S
Mammalian genomes comprise many active and fossilized retroelements. The obligate requirement for retroelement integration affords host genomes an opportunity to 'domesticate' retroelement genes for their own purpose, leading to important innovations in genome defense and placentation. While many such exaptations involve retroviruses, the L1TD1 gene is the only known domesticated gene whose protein-coding sequence is almost entirely derived from a LINE-1 (L1) retroelement. Human L1TD1 has been shown to play an important role in pluripotency maintenance. To investigate how this role was acquired, we traced the origin and evolution of L1TD1. We find that L1TD1 originated in the common ancestor of eutherian mammals, but was lost or pseudogenized multiple times during mammalian evolution. We also find that L1TD1 has evolved under positive selection during primate and mouse evolution, and that one prosimian L1TD1 has 'replenished' itself with a more recent L1 ORF1 from the prosimian genome. These data suggest that L1TD1 has been recurrently selected for functional novelty, perhaps for a role in genome defense. L1TD1 loss is associated with L1 extinction in several megabat lineages, but not in sigmodontine rodents. We hypothesize that L1TD1 could have originally evolved for genome defense against L1 elements. Later, L1TD1 may have become incorporated into pluripotency maintenance in some lineages. Our study highlights the role of retroelement gene domestication in fundamental aspects of mammalian biology, and that such domesticated genes can adopt different functions in different lineages.
Aleza, Pablo; Juárez, José; Hernández, María; Pina, José A; Ollitrault, Patrick; Navarro, Luis
In recent years, the development of structural genomics has generated a growing interest in obtaining haploid plants. The use of homozygous lines presents a significant advantage for the accomplishment of sequencing projects. Commercial citrus species are characterized by high heterozygosity, making it difficult to assemble large genome sequences. Thus, the International Citrus Genomic Consortium (ICGC) decided to establish a reference whole citrus genome sequence from a homozygous plant. Due to the existence of important molecular resources and previous success in obtaining haploid clementine plants, haploid clementine was selected as the target for the implementation of the reference whole genome citrus sequence. To obtain haploid clementine lines we used the technique of in situ gynogenesis induced by irradiated pollen. Flow cytometry, chromosome counts and SSR marker (Simple Sequence Repeats) analysis facilitated the identification of six different haploid lines (2n = x = 9), one aneuploid line (2n = 2x+4 = 22) and one doubled haploid plant (2n = 2x = 18) of 'Clemenules' clementine. One of the haploids, obtained directly from an original haploid embryo, grew vigorously and produced flowers after four years. This is the first haploid plant of clementine that has bloomed and we have, for the first time, characterized the histology of haploid and diploid flowers of clementine. Additionally a double haploid plant was obtained spontaneously from this haploid line. The first haploid plant of 'Clemenules' clementine produced directly by germination of a haploid embryo, which grew vigorously and produced flowers, has been obtained in this work. This haploid line has been selected and it is being used by the ICGC to establish the reference sequence of the nuclear genome of citrus.
Sargent, D J; Passey, T; Surbanovski, N; Lopez Girona, E; Kuchta, P; Davik, J; Harrison, R; Passey, A; Whitehouse, A B; Simpson, D W
The linkage maps of the cultivated strawberry, Fragaria × ananassa (2n = 8x = 56) that have been reported to date have been developed predominantly from AFLPs, along with supplementation with transferrable microsatellite (SSR) markers. For the investigation of the inheritance of morphological characters in the cultivated strawberry and for the development of tools for marker-assisted breeding and selection, it is desirable to populate maps of the genome with an abundance of transferrable molecular markers such as microsatellites (SSRs) and gene-specific markers. Exploiting the recent release of the genome sequence of the diploid F. vesca, and the publication of an extensive number of polymorphic SSR markers for the genus Fragaria, we have extended the linkage map of the 'Redgauntlet' × 'Hapil' (RG × H) mapping population to include a further 330 loci, generated from 160 primer pairs, to create a linkage map for F. × ananassa containing 549 loci, 490 of which are transferrable SSR or gene-specific markers. The map covers 2140.3 cM in the expected 28 linkage groups for an integrated map (where one group is composed of two separate male and female maps), which represents an estimated 91% of the cultivated strawberry genome. Despite the relative saturation of the linkage map on the majority of linkage groups, regions of apparent extensive homozygosity were identified in the genomes of 'Redgauntlet' and 'Hapil' which may be indicative of allele fixation during the breeding and selection of modern F. × ananassa cultivars. The genomes of the octoploid and diploid Fragaria are largely collinear, but through comparison of mapped markers on the RG × H linkage map to their positions on the genome sequence of F. vesca, a number of inversions were identified that may have occurred before the polyploidisation event that led to the evolution of the modern octoploid strawberry species.
Leplat, Florian Jean Victor
Manganese (Mn) deficiency remains an unsolved nutritional problem affecting crop production worldwide. The tolerance to Mn limiting conditions, known as Mn efficiency, is a quantitative abiotic stress trait, generally controlled by several genes. However the underlying genetic background of Mn...... efficiency remained elusive. This PhD study aimed to understand better the genetic determination of the trait and propose new insights for plant breeding purposes. Two genome-wide approaches were used in a winter barley collection to characterize the genetic control of the trait. First, a Genome......-wide association study (GWAS) and chlorophyll a fluorescence phenotyping allowed the identification of several QTLs involved in the plant response to Mn deficiency. Multiple candidate coding genes were found, among them photosystem II PsbP subunit, germin-like proteins and Mn-Superoxide Dismutase. It supports the Mn...
Trinh, Hien; Nguyen, Khoa Truong; Nguyen, Lam Van; Pham, Huy Quang; Huong, Can Thu; Xuan, Tran Dang; Anh, La Hoang; Caccamo, Mario; Ayling, Sarah; Diep, Nguyen Thuy; Trung, Khuat Huu
Next generation sequencing technologies have provided numerous opportunities for application in the study of whole plant genomes. In this study, we present the sequencing and bioinformatic analyses of five typical rice landraces including three indica and two japonica with potential blast resistance. A total of 688.4 million 100 bp paired-end reads have yielded approximately 30-fold coverage to compare with the Nipponbare reference genome. Among them, a small number of reads were mapped to both chromosomes and organellar genomes. Over two million and eight hundred thousand single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) in indica and japonica lines have been determined, which potentially have significant impacts on multiple transcripts of genes. SNP deserts, contiguous SNP-low regions, were found on chromosomes 1, 4, and 5 of all genomes of rice examined. Based on the distribution of SNPs per 100 kilobase pairs, the phylogenetic relationships among the landraces have been constructed. This is the first step towards revealing several salient features of rice genomes in Vietnam and providing significant information resources to further marker-assisted selection (MAS) in rice breeding programs. PMID:28265566
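The "SNPs per 100 kilobase pairs" summary and the notion of SNP deserts mentioned in this abstract can be illustrated with a short sketch. The abstract does not state how a desert was defined, so the thresholds below (`max_snps`, `min_windows`) and both function names are hypothetical choices for illustration, not the authors' criteria.

```python
def snp_density(positions, chrom_length, window=100_000):
    """Count SNPs in consecutive fixed-size windows along one chromosome.

    positions: 1-based SNP coordinates on the chromosome.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // window] += 1
    return counts


def snp_deserts(counts, max_snps=0, min_windows=2):
    """Runs of contiguous SNP-poor windows.

    Returns (first_window, last_window) index pairs, inclusive, for every run
    of at least min_windows windows holding at most max_snps SNPs each.
    """
    runs, start = [], None
    # a sentinel above the threshold closes a run that reaches the last window
    for i, c in enumerate(list(counts) + [max_snps + 1]):
        if c <= max_snps:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_windows:
                runs.append((start, i - 1))
            start = None
    return runs
```

With SNPs at positions 1, 50,000, 100,000, 100,001 and 450,000 on a 500 kb chromosome, the window counts are `[3, 1, 0, 0, 1]`, and the only run of two or more empty windows is windows 2 to 3. The same per-window counts could serve as the input to a distance-based comparison of lines, although the abstract does not describe how the phylogenetic relationships were derived from them.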
Genetic recombination is a major contributor to the ongoing diversification of HIV. It is clearly apparent that across the HIV-genome there are defined recombination hot and cold spots which tend to co-localise both with genomic secondary structures and with either inter-gene boundaries or intra-gene domain boundaries. There is also good evidence that most recombination breakpoints that are detectable within the genes of natural HIV recombinants are likely to be minimally disruptive of intra-protein amino acid contacts and that these breakpoints should therefore have little impact on protein folding. Here we further investigate the impact on patterns of genetic recombination in HIV of selection favouring the maintenance of functional RNA and protein structures. We confirm that chimaeric Gag p24, reverse transcriptase, integrase, gp120 and Nef proteins that are expressed by natural HIV-1 recombinants have significantly lower degrees of predicted folding disruption than randomly generated recombinants. Similarly, we use a novel single-stranded RNA folding disruption test to show that there is significant, albeit weak, evidence that natural HIV recombinants tend to have genomic secondary structures that more closely resemble parental structures than do randomly generated recombinants. These results are consistent with the hypothesis that natural selection has acted both in the short term to purge recombinants with disrupted RNA and protein folds, and in the longer term to modify the genome architecture of HIV to ensure that recombination prone sites correspond with those where recombination will be minimally deleterious.
African trypanosomes are mammalian pathogens that must regularly change their protein coat to survive in the host bloodstream. Chronic trypanosome infections are potentiated by their ability to access a deep genomic repertoire of Variant Surface Glycoprotein (VSG) genes and switch from the expression of one VSG to another. Switching VSG expression is largely based on DNA recombination events that result in chromosome translocations between an acceptor site, which houses the actively transcribed VSG, and a donor gene, drawn from an archive of more than 2,000 silent VSGs. One element implicated in these duplicative gene conversion events is a DNA repeat of approximately 70 bp that is found in long regions within each BES and in short iterations proximal to VSGs within the silent archive. Early observations showing that 70-bp repeats can be recombination boundaries during VSG switching led to the prediction that VSG-proximal 70-bp repeats provide recombinatorial homology. Yet, this long-held assumption had not been tested and no specific function for the conserved 70-bp repeats had been demonstrated. In the present study, the 70-bp repeats were genetically manipulated under conditions that induce gene conversion. In this manner, we demonstrated that 70-bp repeats promote access to archival VSGs. Synthetic repeat DNA sequences were then employed to identify the length, sequence, and directionality of repeat regions required for this activity. In addition, manipulation of the 70-bp repeats allowed us to observe a link between VSG switching and the cell cycle that had not been appreciated. Together these data provide definitive support for the long-standing hypothesis that 70-bp repeats provide recombinatorial homology during switching. Yet, the fact that silent archival VSGs are selected under these conditions suggests the 70-bp repeats also direct DNA pairing and recombination machinery away from the closest homologs (silent BESs and toward the rest of
Background: For most organisms, developing hundreds of genetic markers spanning the whole genome still requires excessive if not unrealistic efforts. In this context, there is an obvious need for methodologies allowing the low-cost, fast and high-throughput genotyping of virtually any species, such as the Diversity Arrays Technology (DArT). One of the crucial steps of the DArT technique is the genome complexity reduction, which yields a genomic representation that is characteristic of the studied DNA sample and necessary for subsequent genotyping. In this article, using the mosquito Aedes aegypti as a study model, we describe a new genome complexity reduction method taking advantage of the abundance of miniature inverted repeat transposable elements (MITEs) in the genome of this species. Results: Ae. aegypti genomic representations were produced following a two-step procedure: (1) restriction digestion of the genomic DNA and simultaneous ligation of a specific adaptor to compatible ends, and (2) amplification of restriction fragments containing a particular MITE element called Pony using two primers, one annealing to the adaptor sequence and one annealing to a conserved sequence motif of the Pony element. Using this protocol, we constructed a library comprising more than 6,000 DArT clones, of which at least 5.70% were highly reliable polymorphic markers for two closely related mosquito strains separated by only a few generations of artificial selection. Within this dataset, linkage disequilibrium was low, and marker redundancy was evaluated at only 2.86%. Most of the detected genetic variability was observed between the two studied mosquito strains, but individuals of the same strain could still be clearly distinguished. Conclusion: The new complexity reduction method was particularly efficient at revealing genetic polymorphisms in Ae. aegypti. Overall, our results testify to the flexibility of the DArT genotyping technique and open new
Watermelon (Citrullus lanatus var. lanatus) contains 88% water, sugars, and several important health-related compounds, including lycopene, citrulline, arginine, and glutathione. The current genetic diversity study uses microsatellites with known map positions to identify genomic regions that under...
Holocene millennium-scale climatic variations as recorded by Rb and Sr concentrations for the MGS1 stratigraphical segment of Milanggouwan section in the Salawusu River Valley of Southeast Mu Us Desert
牛东风; 李保生; 魏建国; 温小浩; 舒培仙; 司月君
The Holocene MGS1 stratigraphic segment of the Milanggouwan section is located in the Salawusu River Valley of the southeastern Mu Us Desert. The segment documents 11 sedimentary cycles, each consisting of aeolian sand together with fluvial or lacustrine-swamp facies. A total of 63 samples were analyzed for rubidium (Rb) and strontium (Sr) concentrations. The results show that Rb and Sr concentrations increase from the dune sands to the overlying fluvial and lacustrine facies, whereas the Rb/Sr ratio shows the opposite trend. The correlation coefficients of Rb concentration, Sr concentration, and the Rb/Sr ratio with mean grain size (Mz) are all above 0.43. These results indicate that the MGS1 segment records at least 11 cold-dry and 11 warm-humid climatic oscillations. The millennium-scale climatic fluctuations recorded by MGS1 reflect the evolutionary history of the East Asian monsoon circulation and also correspond to global climatic and environmental changes.
Pedersen, Louise Dybdahl; Kargo, Morten; Berg, Peer
The aim of this study was to test whether the use of X-semen in a dairy cattle population using genomic selection (GS) and multiple ovulation and embryo transfer (MOET) increases the selection intensity on cow dams and thereby the genetic gain in the entire population. Also, the dynamics of using different types of sexed semen (X, Y or conventional) in the nucleus were investigated. The stochastic simulation study partly supported the hypothesis as the genetic gain in the entire population was elevated when X-semen was used in the production population as GS exploited the higher selection intensity... However, when all young bull candidates were born following MOET, the results showed that the use of Y-semen in the breeding nucleus tended to decrease the rate of inbreeding as it enabled GS to increase within-family selection. This implies that the benefit from using sexed semen in a modern dairy cattle...
Chuanxiao Xie; Jianfeng Weng; Wenguo Liu; Cheng Zou; Zhuanfang Hao; Wenxue Li; Minshun Li; Xiaosen Guo; Gengyun Zhang; Yunbi Xu; Xinhai Li; Shihuang Zhang
Artificial selection during domestication and post-domestication improvement results in loss of genetic diversity near target loci. However, the genetic locus associated with cob glume color and the genomic pattern surrounding it have been elusive, and the selection effect in that region was not clear. An association mapping panel consisting of 283 diverse modern temperate maize elite lines was genotyped with a chip containing over 55,000 evenly distributed SNPs. Ten-fold resequencing of the target region in 40 of the panel lines and 47 tropical lines was also undertaken. A genome-wide association study (GWAS) for cob glume color confirmed the P1 locus, which is located on the short arm of chromosome 1, with -log10(P) values for surrounding SNPs higher than the Bonferroni threshold (α/n, α < 0.001) when a mixed linear model (MLM) was implemented. A total of 26 markers were identified in a 0.78 Mb region surrounding the P1 locus, including 0.73 Mb upstream and 0.05 Mb downstream of the P1 gene. A clear linkage disequilibrium (LD) block was found, and LD decayed very rapidly with increasing physical distance surrounding the P1 locus. The estimates of π and Tajima's D were significantly (P < 0.001) lower at both ends compared to the locus. Upon comparison of temperate and tropical lines at much finer resolution by resequencing (180-fold finer than chip SNPs), a more structured LD block pattern was found among the 40 resequenced temperate lines. All evidence indicates that the P1 locus in temperate maize has not undergone neutral evolution but has been subjected to artificial selection during post-domestication selection or improvement. The information and analytical results generated in this study provide insights into how breeding efforts have affected genome evolution in crop plants.
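As a rough illustration of the Bonferroni threshold (α/n) mentioned in the abstract above, the following minimal sketch converts it to the -log10 scale used for GWAS P values. It assumes α = 0.001 and n = 55,000 tests; the abstract only states "α < 0.001" and "over 55,000" SNPs, so both numbers and the function name are illustrative assumptions rather than the authors' exact settings.

```python
import math

def bonferroni_neg_log10(alpha: float, n_tests: int) -> float:
    """Return the Bonferroni-corrected significance threshold on the -log10 scale."""
    return -math.log10(alpha / n_tests)

# Assumed values for illustration only: alpha = 0.001, n = 55,000 SNPs.
threshold = bonferroni_neg_log10(0.001, 55_000)
print(round(threshold, 2))  # about 7.74
```

Under these assumptions, a SNP would need a -log10(P) value above roughly 7.74 to pass the corrected threshold.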
Detecting loci under selection is an important task in evolutionary biology. In conservation genetics, detecting selection is key to investigating adaptation to the spread of infectious disease. Loci under selection can be detected on a spatial scale, accounting for differences in demographic history among populations, or on a temporal scale, tracing changes in allele frequencies over time. Here we use these two approaches to investigate selective responses to the spread of an infectious cancer, devil facial tumor disease (DFTD), that since 1996 has ravaged the Tasmanian devil (Sarcophilus harrisii). Using time-series 'restriction site associated DNA' (RAD) markers from populations pre- and post-DFTD arrival, and DFTD-free populations, we infer loci under selection due to DFTD and investigate signatures of selection that are incongruent among methods, populations, and times. The lack of congruence among populations influenced by DFTD with respect to inferred loci under selection, and the direction of that selection, fails to implicate a consistent selective role for DFTD. Instead, genetic drift is more likely driving the observed allele frequency changes over time. Our study illustrates the importance of applying methods with different performance optima, e.g. accounting for population structure and background selection, and assessing congruence of the results.
Why some species become successful invaders is an important issue in invasion biology. However, limited genomic resources make it very difficult to identify candidate genes involved in invasiveness. Mikania micrantha H.B.K. (Asteraceae), one of the world's most invasive weeds, has adapted rapidly in response to novel environments since its introduction to southern China. In its genome, we expect to find outlier loci under selection for local adaptation, critical to dissecting the molecular mechanisms of invasiveness. An explorative amplified fragment length polymorphism (AFLP) genome scan was used to detect candidate loci under selection in 28 M. micrantha populations across its entire introduced range in southern China. We also estimated population genetic parameters, bottleneck signatures, and linkage disequilibrium. For a pair of binary characters, such as presence or absence of AFLP bands, if all four character combinations are present, this is referred to as a character incompatibility. Since character incompatibility is deemed to be rare in populations with extensive asexual reproduction, a character incompatibility analysis was also performed in order to infer the predominant mating system in the introduced M. micrantha populations. Out of 483 AFLP loci examined using stringent significance criteria, 14 highly credible outlier loci were identified by Dfdist and Bayescan. Moreover, remarkable genetic variation, multiple introductions, substantial bottlenecks and character compatibility were found to occur in M. micrantha. Thus local adaptation at the genome level indeed exists in M. micrantha, and may represent a major evolutionary mechanism of successful invasion. Interactions between genetic diversity, multiple introductions, and reproductive modes contribute to increasing the capacity for adaptive evolution.
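The definition of character incompatibility given in the abstract above (all four combinations of two binary characters are present) can be sketched as a simple pairwise count. This is only a minimal illustration: the function names and the toy presence/absence matrix are hypothetical, and it is not the analysis pipeline used in the study.

```python
from itertools import combinations

def is_incompatible(locus_a, locus_b):
    """True if all four combinations (0,0), (0,1), (1,0), (1,1) occur
    across individuals for this pair of binary (0/1) characters."""
    return len(set(zip(locus_a, locus_b))) == 4

def count_incompatibilities(matrix):
    """Count incompatible locus pairs.

    matrix: list of rows, one per individual, each a list of 0/1 band scores.
    """
    loci = list(zip(*matrix))  # transpose: one tuple of scores per locus
    return sum(is_incompatible(a, b) for a, b in combinations(loci, 2))

# Toy data (hypothetical): 4 individuals scored at 3 AFLP loci.
toy = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
n_incompatible = count_incompatibilities(toy)
print(n_incompatible)  # 1: only the first two loci show all four combinations
```

A low count relative to the number of locus pairs is what the abstract associates with extensive asexual reproduction.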
Plomion, C.; Chancerel, E.; Endelman, J.; Lamy, J.B.; Mandrou, E.; Lesur, I.; Ehrenmann, F.; Isik, F.; Bink, M.C.A.M.; Heerwaarden, van J.; Bouffier, L.
BACKGROUND: The accessibility of high-throughput genotyping technologies has contributed greatly to the development of genomic resources in non-model organisms. High-density genotyping arrays have only recently been developed for some economically important species such as conifers. The potential
Frantz, L.A.F.; Schraiber, J.G.; Madsen, O.; Megens, H.J.W.C.; Cagan, A.; Bosse, M.; Paudel, Y.; Crooijmans, R.P.M.A.; Larson, G.; Groenen, M.A.M.
Traditionally, the process of domestication is assumed to be initiated by humans, involve few individuals and rely on reproductive isolation between wild and domestic forms. We analyzed pig domestication using over 100 genome sequences and tested whether pig domestication followed a traditional line
Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...
Bakker, Freek T.; Lei, Di; Yu, Jiaying
Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...
We studied genomic variation in a previously selected collection of isogenic Mycobacterium tuberculosis laboratory strains subjected to one or two rounds of antibiotic selection. Whole genome sequencing analysis identified eleven single, unique mutations (four synonymous, six non-synonymous, one intergenic), in addition to drug resistance-conferring mutations, that were fixed in the genomes of six monoresistant strains. Eight loci, present as minority variants (five non-synonymous, three synonymous) in the genome of the susceptible parent strain, became fixed in the genomes of multiple daughter strains. None of these mutations are known to be involved in drug resistance. Our results confirm previously observed genomic stability for M. tuberculosis, although the parent strain had accumulated allelic variants at multiple locations in an antibiotic-free in vitro environment. It is therefore reasonable to assume that these so-called hitchhiking mutations were co-selected and fixed in multiple daughter strains during antibiotic selection. The presence of multiple allelic variations, accumulated under non-selective conditions, which become fixed during subsequent selective steps, deserves attention. The wider availability of 'deep' sequencing methods could help to detect multiple bacterial (sub)populations within patients with high resolution and would therefore be useful in the detailed investigation of transmission chains.