Roche; Cong; Chen; Hanna; Gustine; Sherwood; Ozias-Akins
Twelve molecular markers linked to pseudogamous apospory, a form of gametophytic apomixis, were previously isolated from Pennisetum squamulatum Fresen. No recombination between these markers was found in a segregating population of 397 individuals (Ozias-Akins et al. 1998, Proc. Natl Acad. Sci. USA, 95, 5127-5132). The objective of the present study was to test if these markers were also linked to the aposporous mode of reproduction in two small segregating populations of Cenchrus ciliaris (= Pennisetum ciliare (L.)Link), another apomictic grass species. Among 12 markers (sequence characterized amplified regions, SCARs), six were scored as dominant markers between aposporous and sexual C. ciliaris genotypes (presence/absence, respectively). Five were always linked to apospory and one showed a low level of recombination in 84 progenies. Restriction fragment length polymorphisms (RFLPs) were observed between sexual and apomictic phenotypes for three of the six remaining SCARs from P. squamulatum when used as probes. No recombination was observed in the F1 progenies. Preliminary data from megabase DNA analysis and sequencing in both species indicate that an apospory-specific genomic region (ASGR) is highly conserved between the two species. Although C. ciliaris has a smaller genome size to P. squamulatum, a higher copy number for markers linked to apospory found in the former may impair the progress of positional cloning of gene(s) for apomixis in this species.
Conner, Joann A; Gunawan, Gunawati; Ozias-Akins, Peggy
Apomixis enables the clonal propagation of maternal genotypes through seed. If apomixis could be harnessed via genetic engineering or introgression, it would have a major economic impact for agricultural crops. In the grass species Pennisetum squamulatum and Cenchrus ciliaris (syn. P. ciliare), apomixis is controlled by a single dominant "locus", the apospory-specific genomic region (ASGR). For P. squamulatum, 18 published sequenced characterized amplified region (SCAR) markers have been identified which always co-segregate with apospory. Six of these markers are conserved SCARs in the closely related species, C. ciliaris and co-segregate with the trait. A screen of progeny from a cross of sexual × apomictic C. ciliaris genotypes identified a plant, A8, retaining two of the six ASGR-linked SCAR markers. Additional and newly identified ASGR-linked markers were generated to help identify the extent of recombination within the ASGR. Based on analysis of missing markers, the A8 recombinant plant has lost a significant portion of the ASGR but continues to form aposporous embryo sacs. Seedlings produced from aposporous embryo sacs are 6× in ploidy level and hence the A8 recombinant does not express parthenogenesis. The recombinant A8 plant represents a step forward in reducing the complexity of the ASGR locus to determine the factor(s) required for aposporous embryo sac formation and documents the separation of expression of the two components of apomixis in C. ciliaris.
A segment of the apospory-specific genomic region is highly microsyntenic not only between the apomicts Pennisetum squamulatum and buffelgrass, but also with a rice chromosome 11 centromeric-proximal genomic region.
Gualtieri, Gustavo; Conner, Joann A; Morishige, Daryl T; Moore, L David; Mullet, John E; Ozias-Akins, Peggy
Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory.
A Segment of the Apospory-Specific Genomic Region Is Highly Microsyntenic Not Only between the Apomicts Pennisetum squamulatum and Buffelgrass, But Also with a Rice Chromosome 11 Centromeric-Proximal Genomic Region1[W
Gualtieri, Gustavo; Conner, Joann A.; Morishige, Daryl T.; Moore, L. David; Mullet, John E.; Ozias-Akins, Peggy
Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory. PMID:16415213
Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren
GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often
Full Text Available Abstract Background Apomixis is an intriguing trait in plants that results in maternal clones through seed reproduction. Apomixis is an elusive, but potentially revolutionary, trait for plant breeding and hybrid seed production. Recent studies arguing that apomicts are not evolutionary dead ends have generated further interest in the evolution of asexual flowering plants. Results In the present study, we investigate karyotypic variation in a single chromosome responsible for transmitting apomixis, the Apospory-Specific Genomic Region carrier chromosome, in relation to species phylogeny in the genera Pennisetum and Cenchrus. A 1 kb region from the 3' end of the ndhF gene and a 900 bp region from trnL-F were sequenced from 12 apomictic and eight sexual species in the genus Pennisetum and allied genus Cenchrus. An 800 bp region from the Apospory-Specific Genomic Region also was sequenced from the 12 apomicts. Molecular cytological analysis was conducted in sixteen Pennisetum and two Cenchrus species. Our results indicate that the Apospory-Specific Genomic Region is shared by all apomictic species while it is absent from all sexual species or cytotypes. Contrary to our previous observations in Pennisetum squamulatum and Cenchrus ciliaris, retrotransposon sequences of the Opie-2-like family were not closely associated with the Apospory-Specific Genomic Region in all apomictic species, suggesting that they may have been accumulated after the Apospory-Specific Genomic Region originated. Conclusions Given that phylogenetic analysis merged Cenchrus and newly investigated Pennisetum species into a single clade containing a terminal cluster of Cenchrus apomicts, the presumed monophyletic origin of Cenchrus is supported. The Apospory-Specific Genomic Region likely preceded speciation in Cenchrus and its lateral transfer through hybridization and subsequent chromosome repatterning may have contributed to further speciation in the two genera.
Alexander, Roger P; Fang, Gang; Rozowsky, Joel; Snyder, Michael; Gerstein, Mark B
Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
Rasmussen, Simon; Nielsen, Henrik Bjørn; Jarmer, Hanne Østergaard
The majority of all genes have so far been identified and annotated systematically through in silico gene finding. Here we report the finding of 3662 strand-specific transcriptionally active regions (TARs) in the genome of Bacillus subtilis by the use of tiling arrays. We have measured the genome...
Seo, Beomseok; Kim, Chuna; Hills, Mark; Sung, Sanghyun; Kim, Hyesook; Kim, Eunkyeong; Lim, Daisy S; Oh, Hyun-Seok; Choi, Rachael Mi Jung; Chun, Jongsik; Shim, Jaegal; Lee, Junho
Cells surviving crisis are often tumorigenic and their telomeres are commonly maintained through the reactivation of telomerase. However, surviving cells occasionally activate a recombination-based mechanism called alternative lengthening of telomeres (ALT). Here we establish stably maintained survivors in telomerase-deleted Caenorhabditis elegans that escape from sterility by activating ALT. ALT survivors trans-duplicate an internal genomic region, which is already cis-duplicated to chromosome ends, across the telomeres of all chromosomes. These 'Template for ALT' (TALT) regions consist of a block of genomic DNA flanked by telomere-like sequences, and are different between two genetic background. We establish a model that an ancestral duplication of a donor TALT region to a proximal telomere region forms a genomic reservoir ready to be incorporated into telomeres on ALT activation.
The genome of Saccharomyces cerevisiae contains several duplicated regions. The recent sequencing results of several yeast species suggest that the duplicated regions found in the modern Saccharomyces species are probably the result of a single gross duplication, as well as a series of sporadic...
Lopera, Juan G.; Falendysz, Elizabeth A.; Rocke, Tonie E.; Osorio, Jorge E.
Monkeypox virus (MPXV) is an emerging pathogen from Africa that causes disease similar to smallpox. Two clades with different geographic distributions and virulence have been described. Here, we utilized bioinformatic tools to identify genomic regions in MPXV containing multiple virulence genes and explored their roles in pathogenicity; two selected regions were then deleted singularly or in combination. In vitro and in vivostudies indicated that these regions play a significant role in MPXV replication, tissue spread, and mortality in mice. Interestingly, while deletion of either region led to decreased virulence in mice, one region had no effect on in vitro replication. Deletion of both regions simultaneously also reduced cell culture replication and significantly increased the attenuation in vivo over either single deletion. Attenuated MPXV with genomic deletions present a safe and efficacious tool in the study of MPX pathogenesis and in the identification of genetic factors associated with virulence.
Johnson Todd A
Full Text Available Abstract Background The strong linkage disequilibrium (LD recently found in genic or exonic regions of the human genome demonstrated that LD can be increased by evolutionary mechanisms that select for functionally important loci. This suggests that LD might be stronger in regions conserved among species than in non-conserved regions, since regions exposed to natural selection tend to be conserved. To assess this hypothesis, we used genome-wide polymorphism data from the HapMap project and investigated LD within DNA sequences conserved between the human and mouse genomes. Results Unexpectedly, we observed that LD was significantly weaker in conserved regions than in non-conserved regions. To investigate why, we examined sequence features that may distort the relationship between LD and conserved regions. We found that interspersed repeats, and not other sequence features, were associated with the weak LD tendency in conserved regions. To appropriately understand the relationship between LD and conserved regions, we removed the effect of repetitive elements and found that the high degree of sequence conservation was strongly associated with strong LD in coding regions but not with that in non-coding regions. Conclusion Our work demonstrates that the degree of sequence conservation does not simply increase LD as predicted by the hypothesis. Rather, it implies that purifying selection changes the polymorphic patterns of coding sequences but has little influence on the patterns of functional units such as regulatory elements present in non-coding regions, since the former are generally restricted by the constraint of maintaining a functional protein product across multiple exons while the latter may exist more as individually isolated units.
Gregersen, Vivi Raundahl; Bertelsen, Henriette Pasgaard; Poulsen, Nina Aagaard
The cheese renneting process is affected by a number of factors associated to milk composition and a number of Danish Holsteins has previously been identified to have poor milk coagulation ability. Therefore, the aim of this study was to identify genomic regions affecting the technological...
Full Text Available GRAbB (Genomic Region Assembly by Baiting is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives, but also single copy regions can be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated such as: exhaustive assembly (rDNA region and mitochondrial genome, extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a, as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04, Fedora (23, CentOS (7.1.1503 and Mac OS X (10.7. Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/.
Katherine S Pollard
Full Text Available Comparative genomics allow us to search the human genome for segments that were extensively changed in the last approximately 5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.
Eliot C Bush
Full Text Available An important challenge for human evolutionary biology is to understand the genetic basis of human-chimpanzee differences. One influential idea holds that such differences depend, to a large extent, on adaptive changes in gene expression. An important step in assessing this hypothesis involves gaining a better understanding of selective constraint on noncoding regions of hominid genomes. In noncoding sequence, functional elements are frequently small and can be separated by large nonfunctional regions. For this reason, constraint in hominid genomes is likely to be patchy. Here we use conservation in more distantly related mammals and amniotes as a way of identifying small sequence windows that are likely to be functional. We find that putatively functional noncoding elements defined in this manner are subject to significant selective constraint in hominids.
Full Text Available An important challenge for human evolutionary biology is to understand the genetic basis of human-chimpanzee differences. One influential idea holds that such differences depend, to a large extent, on adaptive changes in gene expression. An important step in assessing this hypothesis involves gaining a better understanding of selective constraint on noncoding regions of hominid genomes. In noncoding sequence, functional elements are frequently small and can be separated by large nonfunctional regions. For this reason, constraint in hominid genomes is likely to be patchy. Here we use conservation in more distantly related mammals and amniotes as a way of identifying small sequence windows that are likely to be functional. We find that putatively functional noncoding elements defined in this manner are subject to significant selective constraint in hominids.
Sørensen, Lars P; Janss, Luc; Madsen, Per
was used. There was a clear difference in the region-wise patterns of genomic correlation among combinations of traits, with distinctive peaks indicating the presence of pleiotropic QTL. CONCLUSIONS: The results show that it is possible to estimate, genome-wide and region-wise genomic (co)variances......BACKGROUND: Multi-trait genomic models in a Bayesian context can be used to estimate genomic (co)variances, either for a complete genome or for genomic regions (e.g. per chromosome) for the purpose of multi-trait genomic selection or to gain further insight into the genomic architecture of related...... with a common prior distribution for the marker allele substitution effects and estimation of the hyperparameters in this prior distribution from the progeny means data. From the Markov chain Monte Carlo samples of the allele substitution effects, genomic (co)variances were calculated on a whole-genome level...
Höglund, Johanna; Guldbrandtsen, Bernt; Lund, Mogens Sandø
6 QTL were detected for FTI: one QTL on each of BTA7, BTA20, BTA23, BTA25, and two QTL on BTA9 (QTL9–1 and QTL9–2). In the second step, ICF showed association with the QTL regions on BTA7, QTL9–2 QTL2 on BTA9, and BTA25, AIS for cows on BTA20 and BTA23, AIS for heifers on QTL9–2 on BTA9, IFL...... for cows on BTA20, BTA23 and BTA25, IFL for heifers on BTA7 and QTL9-2 on BTA9, NRR for heifers on BTA7 and BTA23, and NRR for cows on BTA23. Conclusion: The genome wide association study presented here revealed 6 genomic regions associated with FTI. Screening these 6 QTL regions for the underlying female...... quantitative trait locus regions were re-analyzed using a linear mixed model (animal model) for both FTI and its component traits AIS, NRR, IFL and ICF. The underlying traits were analyzed separately for heifers (first parity cows) and cows (later parity cows) for AIS, NRR, and IFL. Results: In the first step...
Full Text Available Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce development of a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq procedure can be used to capture both coding and non-coding regions of a gene, and resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.
Jin, Eun-Heui; Zhang, Enji; Ko, Youngkwon; Sim, Woo Seog; Moon, Dong Eon; Yoon, Keon Jung; Hong, Jang Hee; Lee, Won Hyung
Complex regional pain syndrome (CRPS) is a chronic, progressive, and devastating pain syndrome characterized by spontaneous pain, hyperalgesia, allodynia, altered skin temperature, and motor dysfunction. Although previous gene expression profiling studies have been conducted in animal pain models, there genome-wide expression profiling in the whole blood of CRPS patients has not been reported yet. Here, we successfully identified certain pain-related genes through genome-wide expression profiling in the blood from CRPS patients. We found that 80 genes were differentially expressed between 4 CRPS patients (2 CRPS I and 2 CRPS II) and 5 controls (cut-off value: 1.5-fold change and pCRPS patients and 18 controls by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). We focused on the MMP9 gene that, by qRT-PCR, showed a statistically significant difference in expression in CRPS patients compared to controls with the highest relative fold change (4.0±1.23 times and p = 1.4×10−4). The up-regulation of MMP9 gene in the blood may be related to the pain progression in CRPS patients. Our findings, which offer a valuable contribution to the understanding of the differential gene expression in CRPS may help in the understanding of the pathophysiology of CRPS pain progression. PMID:24244504
Sapkota, Sirjan; Conner, Joann A.; Hanna, Wayne W.; Simon, Bindu; Fengler, Kevin; Deschamps, St?phane; Cigan, Mark; Ozias-Akins, Peggy
Apomixis, or clonal propagation through seed, is a trait identified within multiple species of the grass family (Poaceae). The genetic locus controlling apomixis in Pennisetum squamulatum (syn Cenchrus squamulatus) and Cenchrus ciliaris (syn Pennisetum ciliare, buffelgrass) is the apospory-specific genomic region (ASGR). Previously, the ASGR was shown to be highly conserved but inverted in marker order between P. squamulatum and C. ciliaris based on fluorescence in situ hybridization (FISH) a...
Full Text Available Methanopyrus spp. are usually isolated from harsh niches, such as high osmotic pressure and extreme temperature. However, the molecular mechanisms for their environmental adaption are poorly understood. Archaeal species is commonly considered as primitive organism. The evolutional placement of archaea is a fundamental and intriguing scientific question. We sequenced the genomes of Methanopyrus strains SNP6 and KOL6 isolated from the Atlantic and Iceland, respectively. Comparative genomic analysis revealed genetic diversity and instability implicated in niche adaption, including a number of transporter- and integrase/transposase-related genes. Pan-genome analysis also defined the gene pool of Methanopyrus spp., in addition of ~120-Kb genomic region of plasticity impacting cognate genomic architecture. We believe that Methanopyrus genomics could facilitate efficient investigation/recognition of archaeal phylogenetic diverse patterns, as well as improve understanding of biological roles and significance of these versatile microbes.
Haanes, E J; Tomlinson, C C
Canine herpesvirus (CHV) is an alpha-herpesvirus of limited pathogenicity in healthy adult dogs and infectivity of the virus appears to be largely limited to cells of canine origin. CHV's low virulence and species specificity make it an attractive candidate for a recombinant vaccine vector to protect dogs against a variety of pathogens. As part of the analysis of the CHV genome, the authors determined the complete nucleotide sequence of the CHV US region as well as portions of the flanking inverted repeats. Seven full open reading frames (ORFs) encoding proteins larger than 100 amino acids were identified within, or partially within the CHV US: cUS2, cUS3, cUS4, cUS6, cUS7, cUS8 and cUS9; which are homologs of the herpes simplex virus type-1 US2; protein kinase; gG, gD, gI, gE; and US9 genes, respectively. An eighth ORF was identified in the inverted repeat region, cIR6, a homolog of the equine herpesvirus type-1 IR6 gene. The authors identified and mapped most of the major transcripts for the predicted CHV US ORFs by Northern analysis.
Kantor, G.J.; Deiss-Tolbert, D.M.
Size separation after UV-endonuclease digestion of DNA from UV-irradiated human cells using denaturing conditions fractionates the genome based on cyclobutane pyrimidine dimer content. We have examined the largest molecules available (50-80 kb; about 5% of the DNA) after fractionation and those of average size (5-15 kb) for content of some specific genes. We find that the largest molecules are not a representative sampling of the genome. Three contiguous genes located in a G+C-rich isochore (tyrosine hydroxylase, insulin, insulin-like growth factor II) have concentrations two to three times greater in the largest molecules. This shows that this genomic region has fewer pyrimidine dimers than most other genomic regions. In contrast, the β-actin genomic region, which has a similar G+C content, has an equal concentration in both fractions as do the p53 and β-globin genomic regions, which are A+T-rich. These data show that DNA damage in the form of cyclobutane pyrimidine dimers occurs with different probabilities in specific isochores. Part of the reason may be the relative G-C content, but other factors must play a significant role. We also report that the transcriptionally inactive insulin region is repaired at the genome-overall rate in normal cells and is not repaired in xeroderma pigmentosum complementation group C cells. (author)
Full Text Available Complex regional pain syndrome (CRPS is a chronic, progressive, and devastating pain syndrome characterized by spontaneous pain, hyperalgesia, allodynia, altered skin temperature, and motor dysfunction. Although previous gene expression profiling studies have been conducted in animal pain models, there genome-wide expression profiling in the whole blood of CRPS patients has not been reported yet. Here, we successfully identified certain pain-related genes through genome-wide expression profiling in the blood from CRPS patients. We found that 80 genes were differentially expressed between 4 CRPS patients (2 CRPS I and 2 CRPS II and 5 controls (cut-off value: 1.5-fold change and p<0.05. Most of those genes were associated with signal transduction, developmental processes, cell structure and motility, and immunity and defense. The expression levels of major histocompatibility complex class I A subtype (HLA-A29.1, matrix metalloproteinase 9 (MMP9, alanine aminopeptidase N (ANPEP, l-histidine decarboxylase (HDC, granulocyte colony-stimulating factor 3 receptor (G-CSF3R, and signal transducer and activator of transcription 3 (STAT3 genes selected from the microarray were confirmed in 24 CRPS patients and 18 controls by quantitative reverse transcription-polymerase chain reaction (qRT-PCR. We focused on the MMP9 gene that, by qRT-PCR, showed a statistically significant difference in expression in CRPS patients compared to controls with the highest relative fold change (4.0±1.23 times and p = 1.4×10(-4. The up-regulation of MMP9 gene in the blood may be related to the pain progression in CRPS patients. Our findings, which offer a valuable contribution to the understanding of the differential gene expression in CRPS may help in the understanding of the pathophysiology of CRPS pain progression.
Zhang, Yan-Cong; Lin, Kui
Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828
Feltus Frank A
Full Text Available Abstract Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18 to duodecaploid (12X = 108. Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective. Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions The construction of the first switchgrass BAC library and comparative analysis of
Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W
In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Pollard, Katherine S; Salama, Sofie R; King, Bryan
Comparative genomics allow us to search the human genome for segments that were extensively changed in the last approximately 5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202...... genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements...... contributed to accelerated evolution of the fastest evolving elements in the human genome....
Full Text Available Speciation is a continuous process and analysis of species pairs at different stages of divergence provides insight into how it unfolds. Previous genomic studies on young species pairs have revealed peaks of divergence and heterogeneous genomic differentiation. Yet less known is how localised peaks of differentiation progress to genome-wide divergence during the later stages of speciation in the presence of persistent gene flow. Spanning the speciation continuum, stickleback species pairs are ideal for investigating how genomic divergence builds up during speciation. However, attention has largely focused on young postglacial species pairs, with little knowledge of the genomic signatures of divergence and introgression in older stickleback systems. The Japanese stickleback species pair, composed of the Pacific Ocean three-spined stickleback (Gasterosteus aculeatus and the Japan Sea stickleback (G. nipponicus, which co-occur in the Japanese islands, is at a late stage of speciation. Divergence likely started well before the end of the last glacial period and crosses between Japan Sea females and Pacific Ocean males result in hybrid male sterility. Here we use coalescent analyses and Approximate Bayesian Computation to show that the two species split approximately 0.68-1 million years ago but that they have continued to exchange genes at a low rate throughout divergence. Population genomic data revealed that, despite gene flow, a high level of genomic differentiation is maintained across the majority of the genome. However, we identified multiple, small regions of introgression, occurring mainly in areas of low recombination rate. Our results demonstrate that a high level of genome-wide divergence can establish in the face of persistent introgression and that gene flow can be localized to small genomic regions at the later stages of speciation with gene flow.
Ravinet, Mark; Yoshida, Kohta; Shigenobu, Shuji; Toyoda, Atsushi; Fujiyama, Asao; Kitano, Jun
Speciation is a continuous process and analysis of species pairs at different stages of divergence provides insight into how it unfolds. Previous genomic studies on young species pairs have revealed peaks of divergence and heterogeneous genomic differentiation. Yet less known is how localised peaks of differentiation progress to genome-wide divergence during the later stages of speciation in the presence of persistent gene flow. Spanning the speciation continuum, stickleback species pairs are ideal for investigating how genomic divergence builds up during speciation. However, attention has largely focused on young postglacial species pairs, with little knowledge of the genomic signatures of divergence and introgression in older stickleback systems. The Japanese stickleback species pair, composed of the Pacific Ocean three-spined stickleback (Gasterosteus aculeatus) and the Japan Sea stickleback (G. nipponicus), which co-occur in the Japanese islands, is at a late stage of speciation. Divergence likely started well before the end of the last glacial period and crosses between Japan Sea females and Pacific Ocean males result in hybrid male sterility. Here we use coalescent analyses and Approximate Bayesian Computation to show that the two species split approximately 0.68-1 million years ago but that they have continued to exchange genes at a low rate throughout divergence. Population genomic data revealed that, despite gene flow, a high level of genomic differentiation is maintained across the majority of the genome. However, we identified multiple, small regions of introgression, occurring mainly in areas of low recombination rate. Our results demonstrate that a high level of genome-wide divergence can establish in the face of persistent introgression and that gene flow can be localized to small genomic regions at the later stages of speciation with gene flow.
Singer Peter A
Full Text Available Abstract Background While innovations in medicine, science and technology have resulted in improved health and quality of life for many people, the benefits of modern medicine continue to elude millions of people in many parts of the world. To assess the potential of genomics to address health needs in EMR, the World Health Organization's Eastern Mediterranean Regional Office and the University of Toronto Joint Centre for Bioethics jointly organized a Genomics and Public Health Policy Executive Course, held September 20th–23rd, 2003, in Muscat, Oman. The 4-day course was sponsored by WHO-EMRO with additional support from the Canadian Program in Genomics and Global Health. The overall objective of the course was to collectively explore how to best harness genomics to improve health in the region. This article presents the course findings and recommendations for genomics policy in EMR. Methods The course brought together senior representatives from academia, biotechnology companies, regulatory bodies, media, voluntary, and legal organizations to engage in discussion. Topics covered included scientific advances in genomics, followed by innovations in business models, public sector perspectives, ethics, legal issues and national innovation systems. Results A set of recommendations, summarized below, was formulated for the Regional Office, the Member States and for individuals. • Advocacy for genomics and biotechnology for political leadership; • Networking between member states to share information, expertise, training, and regional cooperation in biotechnology; coordination of national surveys for assessment of health biotechnology innovation systems, science capacity, government policies, legislation and regulations, intellectual property policies, private sector activity; • Creation in each member country of an effective National Body on genomics, biotechnology and health to: - formulate national biotechnology strategies - raise
Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.
Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...
Liu, Yawen; Gao, Hui; Marstrand, Troels Torben
, but there are also regions that are bound by ERalpha only in the presence of ERbeta, as well as regions that are selectively bound by either receptor. Analysis of bound regions shows that regions bound by ERalpha have distinct properties in terms of genome landscape, sequence features, and conservation compared...
Full Text Available Many applications of high throughput sequencing rely on the availability of an accurate reference genome. Variant calling often produces large data sets that cannot be realistically validated and which may contain large numbers of false-positives. Errors in the reference assembly increase the number of false-positives. While resources are available to aid in the filtering of variants from human data, for other species these do not yet exist and strict filtering techniques must be employed which are more likely to exclude true-positives. This work assesses the accuracy of the pig reference genome (Sscrofa10.2 using whole genome sequencing reads from the Duroc sow whose genome the assembly was based on. Indicators of structural variation including high regional coverage, unexpected insert sizes, improper pairing and homozygous variants were used to identify low quality (LQ regions of the assembly. Low coverage (LC regions were also identified and analyzed separately. The LQ regions covered 13.85% of the genome, the LC regions covered 26.6% of the genome and combined (LQLC they covered 33.07% of the genome. Over half of dbSNP variants were located in the LQLC regions. Of CNVRs identified in a previous study, 86.3% were located in the LQLC regions. The regions were also enriched for gene predictions from RNA-seq data with 42.98% falling in the LQLC regions. Excluding variants in the LQ, LC or LQLC from future analyses will help reduce the number of false-positive variant calls. Researchers using WGS data should be aware that the current pig reference genome does not give an accurate representation of the copy number of alleles in the original Duroc sow’s genome.
Maeda, Toyoki; Chijiiwa, Yoshiharu; Tsuji, Hideo; Sakoda, Saburo; Tani, Kenzaburo; Suzuki, Tomokazu
In this study, a mouse genomic region is identified that undergoes DNA rearrangement and yields circular DNA in brain during embryogenesis. External region-directed inverse polymerase chain reaction on circular DNA extracted from late embryonic brain tissue repeatedly detected DNA of this region containing recombination joints. Wide-range genomic PCR and digestion-circularization PCR analysis showed this region underwent recombination accompanied with deletion of intervening sequences, including the circularized regions. This region was mapped by fluorescence in situ hybridization to C1 on mouse chromosome 16, where no gene and no physiological DNA rearrangement had been identified. DNA sequence in the region has segmental homology to an orthologous region on human chromosome 3q.13. These observations demonstrated somatic DNA recombination yielding genomic deletions in brain during embryogenesis
Mourier, Tobias; Willerslev, Eske
in generating and maintaining retroelement-free regions in the human genome. METHODOLOGY/PRINCIPAL FINDINGS: Based on the known transcriptional properties of retroelements, we expect long interspersed elements (LINEs) to be able to display a high degree of transcriptional interference. In contrast, we expect......BACKGROUND: Eukaryotic genomes are scattered with retroelements that proliferate through retrotransposition. Although retroelements make up around 40 percent of the human genome, large regions are found to be completely devoid of retroelements. This has been hypothesised to be a result of genomic...... activity of LINEs has been identified previously. CONCLUSIONS/SIGNIFICANCE: Our observations are consistent with the notion that selection against transcriptional interference has contributed to the maintenance and/or generation of retroelement-free regions in the human genome....
Anthon, Christian; Tafer, Hakim; Havgaard, Jakob H
BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However......, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure...... lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the here fore mentioned classes. When running the pipeline on a local shuffled version of the genome...
The objective of this research was to identify genomic regions associated with clinical mastitis (MAST) in US Holsteins using producer-reported data. Genome-wide association studies (GWAS) were performed on deregressed PTA using GEMMA v. 0.94. Genotypes included 60,671 SNP for all predictor bulls (n...
In the first study of its kind, an international team of genomics researchers has identified new regions of the human genome that are associated with skin color variation in some African populations, opening new avenues for research on skin diseases and cancer in all populations.
Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.
Background: The DHFR negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line to be used for large scale production of heterologous proteins and is still used for production of a number of complex proteins. Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall a significant genomic drift was seen favoring GC -> AT point mutations in line with the chemical mutagenesis strategy used for generation of the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid in CHO DXB11; transcriptionally active and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find similar to 2...
Greally, John M
To test whether regions undergoing genomic imprinting have unique genomic characteristics, imprinted and nonimprinted human loci were compared for nucleotide and retroelement composition. Maternally and paternally expressed subgroups of imprinted genes were found to differ in terms of guanine and cytosine, CpG, and retroelement content, indicating a segregation into distinct genomic compartments. Imprinted regions have been normally permissive to L1 long interspersed transposable element retroposition during mammalian evolution but universally and significantly lack short interspersed transposable elements (SINEs). The primate-specific Alu SINEs, as well as the more ancient mammalian-wide interspersed repeat SINEs, are found at significantly low densities in imprinted regions. The latter paleogenomic signature indicates that the sequence characteristics of currently imprinted regions existed before the mammalian radiation. Transitions from imprinted to nonimprinted genomic regions in cis are characterized by a sharp inflection in SINE content, demonstrating that this genomic characteristic can help predict the presence and extent of regions undergoing imprinting. During primate evolution, SINE accumulation in imprinted regions occurred at a decreased rate compared with control loci. The constraint on SINE accumulation in imprinted regions may be mediated by an active selection process. This selection could be because of SINEs attracting and spreading methylation, as has been found at other loci. Methylation-induced silencing could lead to deleterious consequences at imprinted loci, where inactivation of one allele is already established, and expression is often essential for embryonic growth and survival.
Full Text Available Abstract Background The scope of our understanding of the evolutionary history between viruses and animals is limited. The fact that the recent availability of many complete insect virus genomes and vertebrate genomes as well as the ability to screen these sequences makes it possible to gain a new perspective insight into the evolutionary interaction between insect viruses and vertebrates. This study is to determine the possibility of existence of sequence identity between the genomes of insect viruses and vertebrates, attempt to explain this phenomenon in term of genetic mobile element, and try to investigate the evolutionary relationship between these short regions of identity among these species. Results Some of studied insect viruses contain variable numbers of short regions of sequence identity to the genomes of vertebrate with nucleotide sequence length from 28 bp to 124 bp. They are found to locate in multiple sites of the vertebrate genomes. The ontology of animal genes with identical regions involves in several processes including chromatin remodeling, regulation of apoptosis, signaling pathway, nerve system development and some enzyme-like catalysis. Phylogenetic analysis reveals that at least some short regions of sequence identity in the genomes of vertebrate are derived the ancestral of insect viruses. Conclusion Short regions of sequence identity were found in the vertebrates and insect viruses. These sequences played an important role not only in the long-term evolution of vertebrates, but also in promotion of insect virus. This typical win-win strategy may come from natural selection.
Fan, Gaowei; Li, Jinming
The scope of our understanding of the evolutionary history between viruses and animals is limited. The fact that the recent availability of many complete insect virus genomes and vertebrate genomes as well as the ability to screen these sequences makes it possible to gain a new perspective insight into the evolutionary interaction between insect viruses and vertebrates. This study is to determine the possibility of existence of sequence identity between the genomes of insect viruses and vertebrates, attempt to explain this phenomenon in term of genetic mobile element, and try to investigate the evolutionary relationship between these short regions of identity among these species. Some of studied insect viruses contain variable numbers of short regions of sequence identity to the genomes of vertebrate with nucleotide sequence length from 28 bp to 124 bp. They are found to locate in multiple sites of the vertebrate genomes. The ontology of animal genes with identical regions involves in several processes including chromatin remodeling, regulation of apoptosis, signaling pathway, nerve system development and some enzyme-like catalysis. Phylogenetic analysis reveals that at least some short regions of sequence identity in the genomes of vertebrate are derived the ancestral of insect viruses. Short regions of sequence identity were found in the vertebrates and insect viruses. These sequences played an important role not only in the long-term evolution of vertebrates, but also in promotion of insect virus. This typical win-win strategy may come from natural selection.
Full Text Available BACKGROUND: Eukaryotic genomes are scattered with retroelements that proliferate through retrotransposition. Although retroelements make up around 40 percent of the human genome, large regions are found to be completely devoid of retroelements. This has been hypothesised to be a result of genomic regions being intolerant to insertions of retroelements. The inadvertent transcriptional activity of retroelements may affect neighbouring genes, which in turn could be detrimental to an organism. We speculate that such retroelement transcription, or transcriptional interference, is a contributing factor in generating and maintaining retroelement-free regions in the human genome. METHODOLOGY/PRINCIPAL FINDINGS: Based on the known transcriptional properties of retroelements, we expect long interspersed elements (LINEs to be able to display a high degree of transcriptional interference. In contrast, we expect short interspersed elements (SINEs to display very low levels of transcriptional interference. We find that genomic regions devoid of long interspersed elements (LINEs are enriched for protein-coding genes, but that this is not the case for regions devoid of short interspersed elements (SINEs. This is expected if genes are subject to selection against transcriptional interference. We do not find microRNAs to be associated with genomic regions devoid of either SINEs or LINEs. We further observe an increased relative activity of genes overlapping LINE-free regions during early embryogenesis, where activity of LINEs has been identified previously. CONCLUSIONS/SIGNIFICANCE: Our observations are consistent with the notion that selection against transcriptional interference has contributed to the maintenance and/or generation of retroelement-free regions in the human genome.
Full Text Available Abstract Background The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
Full Text Available DNA methylation at CpG islands (CGIs is one of the most intensively studied epigenetic mechanisms. It is fundamental for cellular differentiation and control of transcriptional potential. DNA methylation is involved also in several processes that are central to evolutionary biology, including phenotypic plasticity and evolvability. In this study, we explored the relationship between CpG islands methylation and signatures of selective pressure in Homo Sapiens, using a computational biology approach. By analyzing methylation data of 25 cell lines from the Encyclopedia of DNA Elements (ENCODE Consortium, we compared the DNA methylation of CpG islands in genomic regions under selective pressure with the methylation of CpG islands in the remaining part of the genome. To define genomic regions under selective pressure, we used three different methods, each oriented to provide distinct information about selective events. Independently of the method and of the cell type used, we found evidences of undermethylation of CGIs in human genomic regions under selective pressure. Additionally, by analyzing SNP frequency in CpG islands, we demonstrated that CpG islands in regions under selective pressure show lower genetic variation. Our findings suggest that the CpG islands in regions under selective pressure seem to be somehow more "protected" from methylation when compared with other regions of the genome.
Full Text Available Abstract Background Temperature adaptation is one of the most important determinants of distribution and population size of organisms in nature. Recently, quantitative trait loci (QTL mapping and gene expression profiling approaches have been used for detecting candidate genes for heat resistance. However, the resolution of QTL mapping is not high enough to examine the individual effects of various genes in each QTL. Heat stress-responsive genes, characterized by gene expression profiling studies, are not necessarily responsible for heat resistance. Some of these genes may be regulated in association with the heat stress response of other genes. Results To evaluate which heat-responsive genes are potential candidates for heat resistance with higher resolution than previous QTL mapping studies, we performed genome-wide deficiency screen for QTL for heat resistance. We screened 439 isogenic deficiency strains from the DrosDel project, covering 65.6% of the Drosophila melanogaster genome in order to map QTL for thermal resistance. As a result, we found 19 QTL for heat resistance, including 3 novel QTL outside the QTL found in previous studies. Conclusion The QTL found in this study encompassed 19 heat-responsive genes found in the previous gene expression profiling studies, suggesting that they were strong candidates for heat resistance. This result provides new insights into the genetic architecture of heat resistance. It also emphasizes the advantages of genome-wide deficiency screen using isogenic deficiency libraries.
Full Text Available Inbreeding has long been recognized as a primary cause of fitness reduction in both wild and domesticated populations. Consanguineous matings cause inheritance of haplotypes that are identical by descent (IBD and result in homozygous stretches along the genome of the offspring. Size and position of regions of homozygosity (ROHs are expected to correlate with genomic features such as GC content and recombination rate, but also direction of selection. Thus, ROHs should be non-randomly distributed across the genome. Therefore, demographic history may not fully predict the effects of inbreeding. The porcine genome has a relatively heterogeneous distribution of recombination rate, making Sus scrofa an excellent model to study the influence of both recombination landscape and demography on genomic variation. This study utilizes next-generation sequencing data for the analysis of genomic ROH patterns, using a comparative sliding window approach. We present an in-depth study of genomic variation based on three different parameters: nucleotide diversity outside ROHs, the number of ROHs in the genome, and the average ROH size. We identified an abundance of ROHs in all genomes of multiple pigs from commercial breeds and wild populations from Eurasia. Size and number of ROHs are in agreement with known demography of the populations, with population bottlenecks highly increasing ROH occurrence. Nucleotide diversity outside ROHs is high in populations derived from a large ancient population, regardless of current population size. In addition, we show an unequal genomic ROH distribution, with strong correlations of ROH size and abundance with recombination rate and GC content. Global gene content does not correlate with ROH frequency, but some ROH hotspots do contain positive selected genes in commercial lines and wild populations. This study highlights the importance of the influence of demography and recombination on homozygosity in the genome to understand
Christina L. Zheng
Full Text Available Somatic mutations in cancer are more frequent in heterochromatic and late-replicating regions of the genome. We report that regional disparities in mutation density are virtually abolished within transcriptionally silent genomic regions of cutaneous squamous cell carcinomas (cSCCs arising in an XPC−/− background. XPC−/− cells lack global genome nucleotide excision repair (GG-NER, thus establishing differential access of DNA repair machinery within chromatin-rich regions of the genome as the primary cause for the regional disparity. Strikingly, we find that increasing levels of transcription reduce mutation prevalence on both strands of gene bodies embedded within H3K9me3-dense regions, and only to those levels observed in H3K9me3-sparse regions, also in an XPC-dependent manner. Therefore, transcription appears to reduce mutation prevalence specifically by relieving the constraints imposed by chromatin structure on DNA repair. We model this relationship among transcription, chromatin state, and DNA repair, revealing a new, personalized determinant of cancer risk.
Khan, Aziz; Mathelier, Anthony
A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene , with the web application available at https://asntech.shinyapps.io/intervene .
Kommadath, A.; Nie, H.; Groenen, M.A.M.; Pas, te M.F.W.; Veerkamp, R.F.; Smits, M.A.
Eukaryotic genes are distributed along chromosomes as clusters of highly expressed genes termed RIDGEs (Regions of IncreaseD Gene Expression) and lowly expressed genes termed anti-RIDGEs, interspersed among genes expressed at intermediate levels or not expressed. Previous studies based on this
Sergey I Nikolaev
Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb and 7 (1.1 Mb from an individual from the International HapMap Project (NA12872. We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage > or = 4-fold, and 97.9% concordant in regions with coverage > or = 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
Roe Bruce A
Full Text Available Abstract Background Recent genome sequencing enables mega-base scale comparisons between related genomes. Comparisons between animals, plants, fungi, and bacteria demonstrate extensive synteny tempered by rearrangements. Within the legume plant family, glimpses of synteny have also been observed. Characterizing syntenic relationships in legumes is important in transferring knowledge from model legumes to crops that are important sources of protein, fixed nitrogen, and health-promoting compounds. Results We have uncovered two large soybean regions exhibiting synteny with M. truncatula and with a network of segmentally duplicated regions in Arabidopsis. In all, syntenic regions comprise over 500 predicted genes spanning 3 Mb. Up to 75% of soybean genes are colinear with M. truncatula, including one region in which 33 of 35 soybean predicted genes with database support are colinear to M. truncatula. In some regions, 60% of soybean genes share colinearity with a network of A. thaliana duplications. One region is especially interesting because this 500 kbp segment of soybean is syntenic to two paralogous regions in M. truncatula on different chromosomes. Phylogenetic analysis of individual genes within these regions demonstrates that one is orthologous to the soybean region, with which it also shows substantially denser synteny and significantly lower levels of synonymous nucleotide substitutions. The other M. truncatula region is inferred to be paralogous, presumably resulting from a duplication event preceding speciation. Conclusion The presence of well-defined M. truncatula segments showing orthologous and paralogous relationships with soybean allows us to explore the evolution of contiguous genomic regions in the context of ancient genome duplication and speciation events.
Avila Cobos, Francisco; Anckaert, Jasper; Volders, Pieter-Jan; Everaert, Celine; Rombaut, Dries; Vandesompele, Jo; De Preter, Katleen; Mestdagh, Pieter
Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs. To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot. Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5'-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.
Background: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (shortly histone genes), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have similar structure as promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii/ identify in this way genes that have high likelihood of being coregulated with the histone genes.Results: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with similar structure as histone promoters.Conclusions: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that
Bryant Susan V
Full Text Available Abstract Background The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: c-value = 32 × 109 bp were isolated and sequenced to characterize the structure of genic regions. Results Annotation of genes within BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage specific genes in the axolotl genome than the human genome, but the great majority (86% of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be incredibly large, approximately 2.8 gigabases! Conclusion This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have incredibly long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes that are unique to salamanders.
Full Text Available The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
Sahu Binod B
Full Text Available Abstract Background Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs for any genomic regions. Here a sequence based polymorphic (SBP marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype, Niederzenz (Nd-0 was obtained by applying Illumina's sequencing by synthesis (Solexa technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0 genome sequence identified putative single nucleotide polymorphisms (SNPs throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNPs containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for
Washietl, Stefan; Pedersen, Jakob Skou; Korbel, Jan O
Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack...... with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz...
Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y
Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10-49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n=2), Pichia (Candida) norvegensis (n=2), Candida tropicalis (n=1) and Saccharomyces cerevisiae (n=1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study.
Crucello, Aline; Sforça, Danilo Augusto; Horta, Maria Augusta Crivelente; dos Santos, Clelton Aparecido; Viana, Américo José Carvalho; Beloti, Lilian Luzia; de Toledo, Marcelo Augusto Szymanski; Vincentz, Michel; Kuroshu, Reginaldo Massanobu; de Souza, Anete Pereira
Trichoderma harzianum IOC-3844 secretes high levels of cellulolytic-active enzymes and is therefore a promising strain for use in biotechnological applications in second-generation bioethanol production. However, the T. harzianum biomass degradation mechanism has not been well explored at the genetic level. The present work investigates six genomic regions (~150 kbp each) in this fungus that are enriched with genes related to biomass conversion. A BAC library consisting of 5,760 clones was constructed, with an average insert length of 90 kbp. The assembled BAC sequences revealed 232 predicted genes, 31.5% of which were related to catabolic pathways, including those involved in biomass degradation. An expression profile analysis based on RNA-Seq data demonstrated that putative regulatory elements, such as membrane transport proteins and transcription factors, are located in the same genomic regions as genes related to carbohydrate metabolism and exhibit similar expression profiles. Thus, we demonstrate a rapid and efficient tool that focuses on specific genomic regions by combining a BAC library with transcriptomic data. This is the first BAC-based structural genomic study of the cellulolytic fungus T. harzianum, and its findings provide new perspectives regarding the use of this species in biomass degradation processes.
Ming TIAN,Rui HAO,Suyun FANG,Yanqiang WANG,Xiaorong GU,Chungang FENG,Xiaoxiang HU,Ning LI
Full Text Available A unique characteristic of the Silkie chicken is its fibromelanosis phenotype. The dermal layer of its skin, its connective tissue and shank dermis are hyperpigmented. This dermal hyperpigmentation phenotype is controlled by the sex-linked inhibitor of dermal melanin gene (ID and the dominant fibromelanosis allele. This study attempted to confirm the genomic region associated with ID. By genotyping, ID was found to be closely linked to the region between GGA_rs16127903 and GGA_rs14685542 (8406919 bp on chromosome Z, which contains ten functional genes. The expression of these genes was characterized in the embryo and 4 days after hatching and it was concluded that MTAP, encoding methylthioadenosinephosphorylase, would be the most likely candidate gene. Finally, target DNA capture and sequence analysis was performed, but no specific SNP(s was found in the targeted region of the Silkie genome. Further work is necessary to identify the causal ID mutation located on chromosome Z.
Bull, Carolee T.; Ishimaru, Carol A.; Loper, Joyce E.
Two regions involved in catechol biosynthesis (cbs) of Erwinia carotovora W3C105 were cloned by functional complementation of Escherichia coli mutants that were deficient in the biosynthesis of the catechol siderophore enterobactin (ent). A 4.3-kb region of genomic DNA of E. carotovora complemented the entB402 mutation of E. coli. A second genomic region of 12.8 kb complemented entD, entC147, entE405, and entA403 mutations of E. coli. Although functions encoded by catechol biosynthesis genes (cbsA, cbsB, cbsC, cbsD, and cbsE) of E. carotovora were interchangeable with those encoded by corresponding enterobactin biosynthesis genes (entA, entB, entC, entD, and entE), only cbsE hybridized to its functional counterpart (entE) in E. coli. The cbsEA region of E. carotovora W3C105 hybridized to genomic DNA of 21 diverse strains of E. carotovora but did not hybridize to that of a chrysobactin-producing strain of Erwinia chrysanthemi. Strains of E. carotovora fell into nine groups on the basis of sizes of restriction fragments that hybridized to the cbsEA region, indicating that catechol biosynthesis genes were highly polymorphic among strains of E. carotovora. PMID:16349193
Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu
Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian
Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed on the single-cell level. We found that major SCNAs were early events in cancer development and inherited steadily. Single-cell sequencing revealed mutations and SCNAs which were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of two patients with treatment naïve from the same molecular subtype are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.
Stephen N White
Full Text Available BACKGROUND: Like human immunodeficiency virus (HIV, ovine lentivirus (OvLV is macrophage-tropic and causes lifelong infection. OvLV infects one quarter of U.S. sheep and induces pneumonia and body condition wasting. There is no vaccine to prevent OvLV infection and no cost-effective treatment for infected animals. However, breed differences in prevalence and proviral concentration have indicated a genetic basis for susceptibility to OvLV. A recent study identified TMEM154 variants in OvLV susceptibility. The objective here was to identify additional loci associated with odds and/or control of OvLV infection. METHODOLOGY/PRINCIPAL FINDINGS: This genome-wide association study (GWAS included 964 sheep from Rambouillet, Polypay, and Columbia breeds with serological status and proviral concentration phenotypes. Analytic models accounted for breed and age, as well as genotype. This approach identified TMEM154 (nominal P=9.2×10(-7; empirical P=0.13, provided 12 additional genomic regions associated with odds of infection, and provided 13 regions associated with control of infection (all nominal P<1 × 10(-5. Rapid decline of linkage disequilibrium with distance suggested many regions included few genes each. Genes in regions associated with odds of infection included DPPA2/DPPA4 (empirical P=0.006, and SYTL3 (P=0.051. Genes in regions associated with control of infection included a zinc finger cluster (ZNF192, ZSCAN16, ZNF389, and ZNF165; P=0.001, C19orf42/TMEM38A (P=0.047, and DLGAP1 (P=0.092. CONCLUSIONS/SIGNIFICANCE: These associations provide targets for mutation discovery in sheep susceptibility to OvLV. Aside from TMEM154, these genes have not been associated previously with lentiviral infection in any species, to our knowledge. Further, data from other species suggest functional hypotheses for future testing of these genes in OvLV and other lentiviral infections. Specifically, SYTL3 binds and may regulate RAB27A, which is required for enveloped
Wy Ching Ng
Full Text Available Flaviviruses are enveloped arthropod-borne viruses with a single-stranded, positive-sense RNA genome that can cause serious illness in humans and animals. The 11 kb 5′ capped RNA genome consists of a single open reading frame (ORF, and is flanked by 5′ and 3′ untranslated regions (UTR. The ORF is a polyprotein that is processed into three structural and seven non-structural proteins. The UTRs have been shown to be important for viral replication and immune modulation. Both of these regions consist of elements that are essential for genome cyclization, resulting in initiation of RNA synthesis. Genome mutation studies have been employed to investigate each component of the essential elements to show the necessity of each component and its role in viral RNA replication and growth. Furthermore, the highly structured 3′UTR is responsible for the generation of subgenomic flavivirus RNA (sfRNA that helps the virus evade host immune response, thereby affecting viral pathogenesis. In addition, changes within the 3′UTR have been shown to affect transmissibility between vector and host, which can influence the development of vaccines.
Full Text Available BACKGROUND: The isochore, a large DNA sequence with relatively small GC variance, is one of the most important structures in eukaryotic genomes. Although the isochore has been widely studied in humans and other species, little is known about its distribution in pigs. PRINCIPAL FINDINGS: In this paper, we construct a map of long homogeneous genome regions (LHGRs, i.e., isochores and isochore-like regions, in pigs to provide an intuitive version of GC heterogeneity in each chromosome. The LHGR pattern study not only quantifies heterogeneities, but also reveals some primary characteristics of the chromatin organization, including the followings: (1 the majority of LHGRs belong to GC-poor families and are in long length; (2 a high gene density tends to occur with the appearance of GC-rich LHGRs; and (3 the density of LINE repeats decreases with an increase in the GC content of LHGRs. Furthermore, a portion of LHGRs with particular GC ranges (50%-51% and 54%-55% tend to have abnormally high gene densities, suggesting that biased gene conversion (BGC, as well as time- and energy-saving principles, could be of importance to the formation of genome organization. CONCLUSION: This study significantly improves our knowledge of chromatin organization in the pig genome. Correlations between the different biological features (e.g., gene density and repeat density and GC content of LHGRs provide a unique glimpse of in silico gene and repeats prediction.
Wan, Zi Yi; Xia, Jun Hong; Lin, Grace; Wang, Le; Lin, Valerie C. L.; Yue, Gen Hua
Sexual dimorphism is an interesting biological phenomenon. Previous studies showed that DNA methylation might play a role in sexual dimorphism. However, the overall picture of the genome-wide methylation landscape in sexually dimorphic species remains unclear. We analyzed the DNA methylation landscape and transcriptome in hybrid tilapia (Oreochromis spp.) using whole genome bisulfite sequencing (WGBS) and RNA-sequencing (RNA-seq). We found 4,757 sexually dimorphic differentially methylated regions (DMRs), with significant clusters of DMRs located on chromosomal regions associated with sex determination. CpG methylation in promoter regions was negatively correlated with the gene expression level. MAPK/ERK pathway was upregulated in male tilapia. We also inferred active cis-regulatory regions (ACRs) in skeletal muscle tissues from WGBS datasets, revealing sexually dimorphic cis-regulatory regions. These results suggest that DNA methylation contribute to sex-specific phenotypes and serve as resources for further investigation to analyze the functions of these regions and their contributions towards sexual dimorphisms. PMID:27782217
Laffin, Jennifer J S; Raca, Gordana; Jackson, Craig A; Strand, Edythe A; Jakielski, Kathy J; Shriberg, Lawrence D
The goal of this study was to identify new candidate genes and genomic copy-number variations associated with a rare, severe, and persistent speech disorder termed childhood apraxia of speech. Childhood apraxia of speech is the speech disorder segregating with a mutation in FOXP2 in a multigenerational London pedigree widely studied for its role in the development of speech-language in humans. A total of 24 participants who were suspected to have childhood apraxia of speech were assessed using a comprehensive protocol that samples speech in challenging contexts. All participants met clinical-research criteria for childhood apraxia of speech. Array comparative genomic hybridization analyses were completed using a customized 385K Nimblegen array (Roche Nimblegen, Madison, WI) with increased coverage of genes and regions previously associated with childhood apraxia of speech. A total of 16 copy-number variations with potential consequences for speech-language development were detected in 12 or half of the 24 participants. The copy-number variations occurred on 10 chromosomes, 3 of which had two to four candidate regions. Several participants were identified with copy-number variations in two to three regions. In addition, one participant had a heterozygous FOXP2 mutation and a copy-number variation on chromosome 2, and one participant had a 16p11.2 microdeletion and copy-number variations on chromosomes 13 and 14. Findings support the likelihood of heterogeneous genomic pathways associated with childhood apraxia of speech.
Full Text Available The "heterozygote instability" (HI hypothesis suggests that gene conversion events focused on heterozygous sites during meiosis locally increase the mutation rate, but this hypothesis remains largely untested. As humans left Africa they lost variability, which, if HI operates, should have reduced the mutation rate in non-Africans. Relative substitution rates were quantified in diverse humans using aligned whole genome sequences from the 1,000 genomes project. Substitution rate is consistently greater in Africans than in non-Africans, but only in diploid regions of the genome, consistent with a role for heterozygosity. Analysing the same data partitioned into a series of non-overlapping 2 Mb windows reveals a strong, non-linear correlation between the amount of heterozygosity lost "out of Africa" and the difference in substitution rate between Africans and non-Africans. Putative recent mutations, derived variants that occur only once among the 80 human chromosomes sampled, occur preferentially at the centre of 2 Kb windows that have elevated heterozygosity compared both with the same region in a closely related population and with an immediately adjacent region in the same population. More than half of all substitutions appear attributable to variation in heterozygosity. This observation provides strong support for HI with implications for many branches of evolutionary biology.
Peter D Keightley
Full Text Available Although sequences containing regulatory elements located close to protein-coding genes are often only weakly conserved during evolution, comparisons of rodent genomes have implied that these sequences are subject to some selective constraints. Evolutionary conservation is particularly apparent upstream of coding sequences and in first introns, regions that are enriched for regulatory elements. By comparing the human and chimpanzee genomes, we show here that there is almost no evidence for conservation in these regions in hominids. Furthermore, we show that gene expression is diverging more rapidly in hominids than in murids per unit of neutral sequence divergence. By combining data on polymorphism levels in human noncoding DNA and the corresponding human-chimpanzee divergence, we show that the proportion of adaptive substitutions in these regions in hominids is very low. It therefore seems likely that the lack of conservation and increased rate of gene expression divergence are caused by a reduction in the effectiveness of natural selection against deleterious mutations because of the low effective population sizes of hominids. This has resulted in the accumulation of a large number of deleterious mutations in sequences containing gene control elements and hence a widespread degradation of the genome during the evolution of humans and chimpanzees.
Full Text Available Abstract Background Linkage analyses strongly suggest a number of QTL for production, health and conformation traits in the middle part of bovine chromosome 6 (BTA6. The identification of the molecular background underlying the genetic variation at the QTL and subsequent functional studies require a well-annotated gene sequence map of the critical QTL intervals. To complete the sequence map of the defined subchromosomal regions on BTA6 poorly covered with comparative gene information, we focused on targeted isolation of transcribed sequences from bovine bacterial artificial chromosome (BAC clones mapped to the QTL intervals. Results Using the method of exon trapping, 92 unique exon trapping sequences (ETS were discovered in a chromosomal region of poor gene coverage. Sequence identity to the current NCBI sequence assembly for BTA6 was detected for 91% of unique ETS. Comparative sequence similarity search revealed that 11% of the isolated ETS displayed high similarity to genomic sequences located on the syntenic chromosomes of the human and mouse reference genome assemblies. Nearly a third of the ETS identified similar equivalent sequences in genomic sequence scaffolds from the alternative Celera-based sequence assembly of the human genome. Screening gene, EST, and protein databases detected 17% of ETS with identity to known transcribed sequences. Expression analysis of a subset of the ETS showed that most ETS (84% displayed a distinctive expression pattern in a multi-tissue panel of a lactating cow verifying their existence in the bovine transcriptome. Conclusion The results of our study demonstrate that the exon trapping method based on region-specific BAC clones is very useful for targeted screening for novel transcripts located within a defined chromosomal region being deficiently endowed with annotated gene information. The majority of identified ETS represents unknown noncoding sequences in intergenic regions on BTA6 displaying a
Harvey Steven P
Full Text Available Abstract Background The facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We identified and categorized tandem repeat arrays and their distribution throughout the genome of B. pseudomallei strain K96243 in order to develop a genetic typing method for B. pseudomallei. We then screened 104 of the potentially polymorphic loci across a diverse panel of 31 isolates including B. pseudomallei, B. mallei and B. thailandensis in order to identify loci with varying degrees of polymorphism. A subset of these tandem repeat arrays were subsequently developed into a multiple-locus VNTR analysis to examine 66 B. pseudomallei and 21 B. mallei isolates from around the world, as well as 95 lineages from a serial transfer experiment encompassing ~18,000 generations. Results B. pseudomallei contains a preponderance of tandem repeat loci throughout its genome, many of which are duplicated elsewhere in the genome. The majority of these loci are composed of repeat motif lengths of 6 to 9 bp with 4 to 10 repeat units and are predominately located in intergenic regions of the genome. Across geographically diverse B. pseudomallei and B.mallei isolates, the 32 VNTR loci displayed between 7 and 28 alleles, with Nei's diversity values ranging from 0.47 and 0.94. Mutation rates for these loci are comparable (>10-5 per locus per generation to that of the most diverse tandemly repeated regions found in other less diverse bacteria. Conclusion The frequency, location and duplicate nature of tandemly repeated regions within the B. pseudomallei genome indicate that these tandem repeat regions may play a role in generating and maintaining adaptive genomic variation. Multiple-locus VNTR analysis revealed extensive diversity within the global isolate set containing B. pseudomallei and B. mallei, and it detected genotypic differences within clonal lineages of both species that were
Gussow, Ayal B; Copeland, Brett R; Dhindsa, Ryan S; Wang, Quanli; Petrovski, Slavé; Majoros, William H; Allen, Andrew S; Goldstein, David B
There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case control studies using whole-genome sequences.
Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas
Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations
Full Text Available Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae-a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC-trnV, trnR-atpA, ndhF-rpl32, psbM-trnD, and trnQ-rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters. Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid, Olea (asterids and Cymbidium (monocots showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF-rpl32 and trnK-rps16 were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations. Sequencing
Kuleshov, K V; Markelov, M L; Dedkov, V G; Vodop'ianov, A S; Kermanov, A V; Pisanov, R V; Kruglikov, V D; Mazrukho, A B; Maleev, V V; Shipulin, G A
Determination of origin of 2 Vibrio cholerae strains isolated on the territory of Rostov region by using full genome sequencing data. Toxigenic strain 2011 EL- 301 V. cholerae 01 El Tor Inaba No. 301 (ctxAB+, tcpA+) and nontoxigenic strain V. cholerae O1 Ogawa P- 18785 (ctxAB-, tcpA+) were studied. Sequencing was carried out on the MiSeq platform. Phylogenetic analysis of the genomes obtained was carried out based on comparison of conservative part of the studied and 54 previously sequenced genomes. 2011EL-301 strain genome was presented by 164 contigs with an average coverage of 100, N50 parameter was 132 kb, for strain P- 18785 - 159 contigs with a coverage of69, N50 - 83 kb. The contigs obtained for strain 2011 EL-301 were deposited in DDBJ/EMBL/GenBank databases with access code AJFN02000000, for strain P-18785 - ANHS00000000. 716 protein-coding orthologous genes were detected. Based on phylogenetic analysis strain P- 18785 belongs to PG-1 subgroup (a group of predecessor strains of the 7th pandemic). Strain 2011EL-301 belongs to groups of strains of the 7th pandemic and is included into the cluster with later isolates that are associated with cases of cholera in South Africa and cases of import of cholera to the USA from Pakistan. The data obtained allows to establish phylogenetic connections with V cholerae strains isolated earlier.
Allison David B
Full Text Available Abstract Background HIV susceptibility and pathogenicity exhibit both interindividual and intergroup variability. The etiology of intergroup variability is still poorly understood, and could be partly linked to genetic differences among racial/ethnic groups. These genetic differences may be traceable to different regimes of natural selection in the 60,000 years since the human radiation out of Africa. Here, we examine population differentiation and haplotype patterns at several loci identified through genome-wide association studies on HIV-1 control, as determined by viral-load setpoint, in European and African-American populations. We use genome-wide data from the Human Genome Diversity Project, consisting of 53 world-wide populations, to compare measures of FST and relative extended haplotype homozygosity (REHH at these candidate loci to the rest of the respective chromosome. Results We find that the Europe-Middle East and Europe-South Asia pairwise FST in the most strongly associated region are elevated compared to most pairwise comparisons with the sub-Saharan African group, which exhibit very low FST. We also find genetic signatures of recent positive selection (higher REHH at these associated regions among all groups except for sub-Saharan Africans and Native Americans. This pattern is consistent with one in which genetic differentiation, possibly due to diversifying/positive selection, occurred at these loci among Eurasians. Conclusions These findings are concordant with those from earlier studies suggesting recent evolutionary change at immunity-related genomic regions among Europeans, and shed light on the potential genetic and evolutionary origin of population differences in HIV-1 control.
The results of this thesis show that the probability of introgression of a putative transgene to wild relatives indeed depends strongly on the insertion location of the transgene. The study of genomic selection patterns can identify crop genomic regions under negative selection in multiple
Brown, T. A. (Terence A.)
... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...
Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa
Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from >http://csbi.ltdk.helsinki.fi/grok/.
Cornelia Di Gaetano
Full Text Available The peculiar position of Sardinia in the Mediterranean sea has rendered its population an interesting biogeographical isolate. The aim of this study was to investigate the genetic population structure, as well as to estimate Runs of Homozygosity and regions under positive selection, using about 1.2 million single nucleotide polymorphisms genotyped in 1077 Sardinian individuals. Using four different methods--fixation index, inflation factor, principal component analysis and ancestry estimation--we were able to highlight, as expected for a genetic isolate, the high internal homogeneity of the island. Sardinians showed a higher percentage of genome covered by RoHs>0.5 Mb (F(RoH%0.5 when compared to peninsular Italians, with the only exception of the area surrounding Alghero. We furthermore identified 9 genomic regions showing signs of positive selection and, we re-captured many previously inferred signals. Other regions harbor novel candidate genes for positive selection, like TMEM252, or regions containing long non coding RNA. With the present study we confirmed the high genetic homogeneity of Sardinia that may be explained by the shared ancestry combined with the action of evolutionary forces.
Full Text Available Duplications play a significant role in both extremes of the phenotypic spectrum of newly arising mutations: they can have severe deleterious effects (e.g. duplications underlie a variety of diseases but can also be highly advantageous. The phenotypic potential of newly arisen duplications has stimulated wide interest in both the mutational and selective processes shaping these variants in the genome. Here we take advantage of the Drosophila simulans-Drosophila melanogaster genetic system to further our understanding of both processes. Regarding mutational processes, the study of two closely related species allows investigation of the potential existence of shared duplication hotspots, and the similarities and differences between the two genomes can be used to dissect its underlying causes. Regarding selection, the difference in the effective population size between the two species can be leveraged to ask questions about the strength of selection acting on different classes of duplications. In this study, we conducted a survey of duplication polymorphisms in 14 different lines of D. simulans using tiling microarrays and combined it with an analogous survey for the D. melanogaster genome. By integrating the two datasets, we identified duplication hotspots conserved between the two species. However, unlike the duplication hotspots identified in mammalian genomes, Drosophila duplication hotspots are not associated with sequences of high sequence identity capable of mediating non-allelic homologous recombination. Instead, Drosophila duplication hotspots are associated with late-replicating regions of the genome, suggesting a link between DNA replication and duplication rates. We also found evidence supporting a higher effectiveness of selection on duplications in D. simulans than in D. melanogaster. This is also true for duplications segregating at high frequency, where we find evidence in D. simulans that a sizeable fraction of these mutations is
Alessandro de Mello Varani
Full Text Available Xylella fastidiosa is a Gram negative plant pathogen causing many economically important diseases, and analyses of completely sequenced X. fastidiosa genome strains allowed the identification of many prophage-like elements and possibly phage remnants, accounting for up to 15% of the genome composition. To better evaluate the recent evolution of the X. fastidiosa chromosome backbone among distinct pathovars, the number and location of prophage-like regions on two finished genomes (9a5c and Temecula1, and in two candidate molecules (Ann1 and Dixon were assessed. Based on comparative best bidirectional hit analyses, the majority (51% of the predicted genes in the X. fastidiosa prophage-like regions are related to structural phage genes belonging to the Siphoviridae family. Electron micrograph reveals the existence of putative viral particles with similar morphology to lambda phages in the bacterial cell in planta. Moreover, analysis of microarray data indicates that 9a5c strain cultivated under stress conditions presents enhanced expression of phage anti-repressor genes, suggesting switches from lysogenic to lytic cycle of phages under stress-induced situations. Furthermore, virulence-associated proteins and toxins are found within these prophage-like elements, thus suggesting an important role in host adaptation. Finally, clustering analyses of phage integrase genes based on multiple alignment patterns reveal they group in five lineages, all possessing a tyrosine recombinase catalytic domain, and phylogenetically close to other integrases found in phages that are genetic mosaics and able to perform generalized and specialized transduction. Integration sites and tRNA association is also evidenced. In summary, we present comparative and experimental evidence supporting the association and contribution of phage activity on the differentiation of Xylella genomes.
Luis E. Rojas
Full Text Available The extraction methods of genomic DNA are usually laborious and hazardous to human health and the environment by the use of organic solvents (chloroform and phenol. In this work a protocol for in situ extraction of genomic DNA by alkaline lysis is validated. It was used in order to amplify regions of DNA in four species of plants and fungi by polymerase chain reaction (PCR. From plant material of Saccharum officinarum L., Carica papaya L. and Digitalis purpurea L. it was possible to extend different regions of the genome through PCR. Furthermore, it was possible to amplify a fragment of avr-4 gene DNA purified from lyophilized mycelium of Mycosphaerella fijiensis. Additionally, it was possible to amplify the region ap24 transgene inserted into the genome of banana cv. `Grande naine' (Musa AAA. Key words: alkaline lysis, Carica papaya L., Digitalis purpurea L., Musa, Saccharum officinarum L.
Full Text Available Human height is a highly heritable trait considered as an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aim to identify sequence variants associated with adult height by a genome-wide association study of copy number variants (CNVs in Chinese.Genome-wide CNV association analyses were conducted in 1,625 unrelated Chinese adults and sex specific subgroup for height variation, respectively. Height was measured with a stadiometer. Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs. We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height.We detected 10 significant association signals for height (p<0.05 in the whole population, 9 and 11 association signals for Chinese female and male population, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41 × 10(-4 was found to be significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048. Confirmatory real time PCR experiments lent further support for CNV validation. Compared to female subjects with two copies of the CNP, carriers of three copies had an average of 8.1% decrease in height. An important candidate gene, ubiquitin-protein ligase NEDD4-like (NEDD4L, was detected at this region, which plays important roles in bone metabolism by binding to bone formation regulators.Our findings suggest the important genetic variants underlying height variation in Chinese.
Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio
The evolutionary processes operating in the DNA regions that participate in the regulation of gene expression are poorly understood. In Escherichia coli, we have established a sequence pattern that distinguishes regulatory from nonregulatory regions. The density of promoter-like sequences, that are recognizable by RNA polymerase and may function as potential promoters, is high within regulatory regions, in contrast to coding regions and regions located between convergently-transcribed genes. Moreover, functional promoter sites identified experimentally are often found in the subregions of highest density of promoter-like signals, even when individual sites with higher binding affinity for RNA polymerase exist elsewhere within the regulatory region. In order to investigate the generality of this pattern, we have used position weight matrices describing the -35 and -10 promoter boxes of E. coli to search for these motifs in 43 additional genomes belonging to most established bacterial phyla, after specific calibration of the matrices according to the base composition of the noncoding regions of each genome. We have found that all bacterial species analyzed contain similar promoter-like motifs, and that, in most cases, these motifs follow the same genomic distribution observed in E. coli. Differential densities between regulatory and nonregulatory regions are detectable in most bacterial genomes, with the exception of those that have experienced evolutionary extreme genome reduction. Thus, the phylogenetic distribution of this pattern mirrors that of genes and other genomic features that require weak selection to be effective in order to persist. On this basis, we suggest that the loss of differential densities in the reduced genomes of host-restricted pathogens and symbionts is the outcome of a process of genome degradation resulting from the decreased efficiency of purifying selection in highly structured small populations. This implies that the differential
Full Text Available Bacillus cereus is a bacterial pathogen that is responsible for many recurrent disease outbreaks due to food contamination. Spores and biofilms are considered the most important reservoirs of B. cereus in contaminated fresh vegetables and fruits. Biofilms are bacterial communities that are difficult to eradicate from biotic and abiotic surfaces because of their stable and extremely strong extracellular matrix. These extracellular matrixes contain exopolysaccharides, proteins, extracellular DNA, and other minor components. Although B. cereus can form biofilms, the bacterial features governing assembly of the protective extracellular matrix are not known. Using the well-studied bacterium B. subtilis as a model, we identified two genomic loci in B. cereus, which encodes two orthologs of the amyloid-like protein TasA of B. subtilis and a SipW signal peptidase. Deletion of this genomic region in B. cereus inhibited biofilm assembly; notably, mutation of the putative signal peptidase SipW caused the same phenotype. However, mutations in tasA or calY did not completely prevent biofilm formation; strains that were mutated for either of these genes formed phenotypically different surface attached biofilms. Electron microscopy studies revealed that TasA polymerizes to form long and abundant fibers on cell surfaces, whereas CalY does not aggregate similarly. Heterologous expression of this amyloid-like cassette in a B. subtilis strain lacking the factors required for the assembly of TasA amyloid-like fibers revealed i the involvement of this B. cereus genomic region in formation of the air-liquid interphase pellicles and ii the intrinsic ability of TasA to form fibers similar to the amyloid-like fibers produced by its B. subtilis ortholog.
Liu, Jiaxuan; Zhao, Wei; Ware, Erin B; Turner, Stephen T; Mosley, Thomas H; Smith, Jennifer A
Genetic variations in apolipoprotein E (APOE) and proximal genes (PVRL2, TOMM40, and APOC1) are associated with cognitive function and dementia, particularly Alzheimer's disease. Epigenetic mechanisms such as DNA methylation play a central role in the regulation of gene expression. Recent studies have found evidence that DNA methylation may contribute to the pathogenesis of dementia, but its association with cognitive function in populations without dementia remains unclear. We assessed DNA methylation levels of 48 CpG sites in the APOE genomic region in peripheral blood leukocytes collected from 289 African Americans (mean age = 67 years) from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. Using linear regression, we examined the relationship between methylation in the APOE genomic region and multiple cognitive measures including learning, memory, processing speed, concentration, language and global cognitive function. We identified eight CpG sites in three genes (PVRL2, TOMM40, and APOE) that showed an inverse association between methylation level and delayed recall, a measure of memory, after adjusting for age and sex (False Discovery Rate q-value accounting for known genetic predictors for cognition. Our findings highlight the important role of epigenetic mechanisms in influencing cognitive performance, and suggest that changes in blood methylation may be an early indicator of individuals at risk for dementia as well as potential targets for intervention in asymptomatic populations.
Balding David J
Full Text Available Abstract Background The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21; Down's syndrome, and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV, arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. Results In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. Conclusion With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
Wang, Yajie; Wu, Fengyi; Pan, Haining; Zheng, Wenzhong; Feng, Chi; Wang, Yunfu; Deng, Zixin; Wang, Lianrong; Luo, Jie; Chen, Shi
Alzheimer's disease (AD) is characterized by amyloid-β (Aβ) deposition in the brain. Aβ plaques are produced through sequential β/γ cleavage of amyloid precursor protein (APP), of which there are three main APP isoforms: APP695, APP751 and APP770. KPI-APPs (APP751 and APP770) are known to be elevated in AD, but the reason remains unclear. Transcription activator-like (TAL) effector nucleases (TALENs) induce mutations with high efficiency at specific genomic loci, and it is thus possible to knock out specific regions using TALENs. In this study, we designed and expressed TALENs specific for the C-terminus of APP in HeLa cells, in which KPI-APPs are predominantly expressed. The KPI-APP mutants lack a 12-aa region that encompasses a 5-aa trans-membrane (TM) region and 7-aa juxta-membrane (JM) region. The mutated KPI-APPs exhibited decreased mitochondrial localization. In addition, mitochondrial morphology was altered, resulting in an increase in spherical mitochondria in the mutant cells through the disruption of the balance between fission and fusion. Mitochondrial dysfunction, including decreased ATP levels, disrupted mitochondrial membrane potential, increased ROS generation and impaired mitochondrial dehydrogenase activity, was also found. These results suggest that specific regions of KPI-APPs are important for mitochondrial localization and function.
William A. MacDonald
Full Text Available Genomic imprinting is a form of epigenetic inheritance whereby the regulation of a gene or chromosomal region is dependent on the sex of the transmitting parent. During gametogenesis, imprinted regions of DNA are differentially marked in accordance to the sex of the parent, resulting in parent-specific expression. While mice are the primary research model used to study genomic imprinting, imprinted regions have been described in a broad variety of organisms, including other mammals, plants, and insects. Each of these organisms employs multiple, interrelated, epigenetic mechanisms to maintain parent-specific expression. While imprinted genes and imprint control regions are often species and locus-specific, the same suites of epigenetic mechanisms are often used to achieve imprinted expression. This review examines some examples of the epigenetic mechanisms responsible for genomic imprinting in mammals, plants, and insects.
Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin
Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
Cormier, Fabien; Le Gouis, Jacques; Dubreuil, Pierre; Lafarge, Stéphane; Praud, Sébastien
This study identified 333 genomic regions associated to 28 traits related to nitrogen use efficiency in European winter wheat using genome-wide association in a 214-varieties panel experimented in eight environments. Improving nitrogen use efficiency is a key factor to sustainably ensure global production increase. However, while high-throughput screening methods remain at a developmental stage, genetic progress may be mainly driven by marker-assisted selection. The objective of this study was to identify chromosomal regions associated with nitrogen use efficiency-related traits in bread wheat (Triticum aestivum L.) using a genome-wide association approach. Two hundred and fourteen European elite varieties were characterised for 28 traits related to nitrogen use efficiency in eight environments in which two different nitrogen fertilisation levels were tested. The genome-wide association study was carried out using 23,603 SNP with a mixed model for taking into account parentage relationships among varieties. We identified 1,010 significantly associated SNP which defined 333 chromosomal regions associated with at least one trait and found colocalisations for 39 % of these chromosomal regions. A method based on linkage disequilibrium to define the associated region was suggested and discussed with reference to false positive rate. Through a network approach, colocalisations were analysed and highlighted the impact of genomic regions controlling nitrogen status at flowering, precocity, and nitrogen utilisation on global agronomic performance. We were able to explain 40 ± 10 % of the total genetic variation. Numerous colocalisations with previously published genomic regions were observed with such candidate genes as Ppd-D1, Rht-D1, NADH-Gogat, and GSe. We highlighted selection pressure on yield and nitrogen utilisation discussing allele frequencies in associated regions.
Full Text Available Abstract In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed, testing single loci or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS. For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (ie it achieves the specified size of the test and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.
Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred
The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known
Reyes-Solis, Guadalupe Del Carmen; Saavedra-Rodriguez, Karla; Suarez, Adriana Flores; Black, William C
The mosquito Aedes aegypti is the principal vector of dengue and yellow fever flaviviruses. Temephos is an organophosphate insecticide used globally to suppress Ae. aegypti larval populations but resistance has evolved in many locations. Quantitative Trait Loci (QTL) controlling temephos survival in Ae. aegypti larvae were mapped in a pair of F3 advanced intercross lines arising from temephos resistant parents from Solidaridad, México and temephos susceptible parents from Iquitos, Peru. Two sets of 200 F3 larvae were exposed to a discriminating dose of temephos and then dead larvae were collected and preserved for DNA isolation every two hours up to 16 hours. Larvae surviving longer than 16 hours were considered resistant. For QTL mapping, single nucleotide polymorphisms (SNPs) were identified at 23 single copy genes and 26 microsatellite loci of known physical positions in the Ae. aegypti genome. In both reciprocal crosses, Multiple Interval Mapping identified eleven QTL associated with time until death. In the Solidaridad×Iquitos (SLD×Iq) cross twelve were associated with survival but in the reciprocal IqxSLD cross, only six QTL were survival associated. Polymorphisms at acetylcholine esterase (AchE) loci 1 and 2 were not associated with either resistance phenotype suggesting that target site insensitivity is not an organophosphate resistance mechanism in this region of México. Temephos resistance is under the control of many metabolic genes of small effect and dispersed throughout the Ae. aegypti genome.
Mangan, Hazel; Gailín, Michael Ó; McStay, Brian
Nucleoli are the sites of ribosome biogenesis and the largest membraneless subnuclear structures. They are intimately linked with growth and proliferation control and function as sensors of cellular stress. Nucleoli form around arrays of ribosomal gene (rDNA) repeats also called nucleolar organizer regions (NORs). In humans, NORs are located on the short arms of all five human acrocentric chromosomes. Multiple NORs contribute to the formation of large heterochromatin-surrounded nucleoli observed in most human cells. Here we will review recent findings about their genomic architecture. The dynamic nature of nucleoli began to be appreciated with the advent of photodynamic experiments using fluorescent protein fusions. We review more recent data on nucleoli in Xenopus germinal vesicles (GVs) which has revealed a liquid droplet-like behavior that facilitates nucleolar fusion. Further analysis in both XenopusGVs and Drosophila embryos indicates that the internal organization of nucleoli is generated by a combination of liquid-liquid phase separation and active processes involving rDNA. We will attempt to integrate these recent findings with the genomic architecture of human NORs to advance our understanding of how nucleoli form and respond to stress in human cells. © 2017 Federation of European Biochemical Societies.
The investigations described in this dissertation were designed to determine the transcriptionally active DNA sequences of IIR region and to identify the viral mRNA transcribed from the transcriptionally most active DNA sequences of that region during late phase of HCMV Towne infection. Preliminary transcriptional studies which included the hybridization of a southern blot of XbaI digested entire HCMV genome to 32 P-labelled late phase infected cell A + RNA, indicated that late viral transcripts homologous to XbaI Q fragment of IIR region were very highly abundant while XbaI Q fragment showed a very low transcriptional activity. To facilitate further analysis of late transcription of IIR region, the entire DNA sequences of IIR region were molecularly cloned as U, S, and H BamHI fragments in pACYC-184 plasmid vector. In addition, to be used in future studies on other regions of the genome, except for y and c' smaller fragments the entire 240 kb HCMV genome was cloned as BamHI fragments in the same vector. Furthermore, the U, S, and H BamHI fragments were mapped with six other restriction enzymes in order to use that mapping data in subsequent transcriptional analysis of the IIR region. Further localization of transcriptionally active DNA sequences within IIR region was achieved by hybridization of southern blots of restricted U, S, and H BamHI fragments with 3' 32 P-labelled infected cell late A + RNA. The 1.5 kb EcooRI subfragments of S BamHI fragment and the adjoining 0.72 kb XhoI subfragment of H BamHI fragment revealed the highest level of transcription, although the remainder of the S fragment was also transcribed at a substantial level. The U fragment and the remainder of the H fragment was transcribed at a very low level
Guillermo Raúl Pratta
Full Text Available Incorporating wild germplasm such as S. pimpinellifolium is an alternative strategy to prolong tomato fruit shelf life(SL without reducing fruit quality. A set of recombinant inbred lines with discrepant values of SL and weight (FW were derived byantagonistic-divergent selection from an interspecific cross. The general objective of this research was to evaluate Genotype x Year(GY and Marker x Year (MY interaction in these new genetic materials for both traits. Genotype and year principal effects and GYinteraction were statistically significant for SL. Genotype and year principal effects were significant for FW but GY interaction wasnot. The marker principal effect was significant for SL and FW but both year principal effect and MY interaction were not significant.Though SL was highly influenced by year conditions, some genome regions appeared to maintain a stable effect across years ofevaluation. Fruit weight, instead, was more independent of year effect.
Background The Brassica B genome is known to carry several important traits, yet there has been limited analyses of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. Results Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. Conclusions Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage. PMID:23586706
Hertel, Robert; Rodríguez, David Pintor; Hollensteiner, Jacqueline; Dietrich, Sascha; Leimbach, Andreas; Hoppert, Michael; Liesegang, Heiko; Volland, Sonja
Prophages are viruses, which have integrated their genomes into the genome of a bacterial host. The status of the prophage genome can vary from fully intact with the potential to form infective particles to a remnant state where only a few phage genes persist. Prophages have impact on the properties of their host and are therefore of great interest for genomic research and strain design. Here we present a genome- and next generation sequencing (NGS)-based approach for identification and activity evaluation of prophage regions. Seven prophage or prophage-like regions were identified in the genome of Bacillus licheniformis DSM13. Six of these regions show similarity to members of the Siphoviridae phage family. The remaining region encodes the B. licheniformis orthologue of the PBSX prophage from Bacillus subtilis. Analysis of isolated phage particles (induced by mitomycin C) from the wild-type strain and prophage deletion mutant strains revealed activity of the prophage regions BLi_Pp2 (PBSX-like), BLi_Pp3 and BLi_Pp6. In contrast to BLi_Pp2 and BLi_Pp3, neither phage DNA nor phage particles of BLi_Pp6 could be visualized. However, the ability of prophage BLi_Pp6 to generate particles could be confirmed by sequencing of particle-protected DNA mapping to prophage locus BLi_Pp6. The introduced NGS-based approach allows the investigation of prophage regions and their ability to form particles. Our results show that this approach increases the sensitivity of prophage activity analysis and can complement more conventional approaches such as transmission electron microscopy (TEM). PMID:25811873
to varying degrees of dyspnea (respiratory distress), cachexia (body condition wasting), mastitis , arthritis, and/or encephalitis [5,6]. One of the...General Transcription Factor IIH, polypeptide 5), the gene order does not agree with other mammal genomes including cow , human, dog, and mouse, and it may
Tsai, Kevin J; Lu, Mei-Yeh Jade; Yang, Kai-Jung; Li, Mengyun; Teng, Yuchuan; Chen, Shihmay; Ku, Maurice S B; Li, Wen-Hsiung
The diploid C 4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains.
Batta, Kiran; Zhang, Zhenhai; Yen, Kuangyu; Goffman, David B; Pugh, B Franklin
Nucleosomal organization in and around genes may contribute substantially to transcriptional regulation. The contribution of histone modifications to genome-wide nucleosomal organization has not been systematically evaluated. In the present study, we examine the role of H2BK123 ubiquitylation, a key regulator of several histone modifications, on nucleosomal organization at promoter, genic, and transcription termination regions in Saccharomyces cerevisiae. Using high-resolution MNase chromatin immunoprecipitation and sequencing (ChIP-seq), we map nucleosome positioning and occupancy in mutants of the H2BK123 ubiquitylation pathway. We found that H2B ubiquitylation-mediated nucleosome formation and/or stability inhibits the assembly of the transcription machinery at normally quiescent promoters, whereas ubiquitylation within highly active gene bodies promotes transcription elongation. This regulation does not proceed through ubiquitylation-regulated histone marks at H3K4, K36, and K79. Our findings suggest that mechanistically similar functions of H2B ubiquitylation (nucleosome assembly) elicit different functional outcomes on genes depending on its positional context in promoters (repressive) versus transcribed regions (activating).
Kumar, Nitin; Cai, Haoyang; von Mering, Christian; Baudis, Michael
Regional genomic copy number alterations (CNA) are observed in the vast majority of cancers. Besides specifically targeting well-known, canonical oncogenes, CNAs may also play more subtle roles in terms of modulating genetic potential and broad gene expression patterns of developing tumors. Any significant differences in the overall CNA patterns between different cancer types may thus point towards specific biological mechanisms acting in those cancers. In addition, differences among CNA profiles may prove valuable for cancer classifications beyond existing annotation systems. We have analyzed molecular-cytogenetic data from 25579 tumors samples, which were classified into 160 cancer types according to the International Classification of Disease (ICD) coding system. When correcting for differences in the overall CNA frequencies between cancer types, related cancers were often found to cluster together according to similarities in their CNA profiles. Based on a randomization approach, distance measures from the cluster dendrograms were used to identify those specific genomic regions that contributed significantly to this signal. This approach identified 43 non-neutral genomic regions whose propensity for the occurrence of copy number alterations varied with the type of cancer at hand. Only a subset of these identified loci overlapped with previously implied, highly recurrent (hot-spot) cytogenetic imbalance regions. Thus, for many genomic regions, a simple null-hypothesis of independence between cancer type and relative copy number alteration frequency can be rejected. Since a subset of these regions display relatively low overall CNA frequencies, they may point towards second-tier genomic targets that are adaptively relevant but not necessarily essential for cancer development.
Duarte, Vinícius da Silva; Giaretta, Sabrina; Treu, Laura
Streptococcus thermophilus, a very important dairy species, is constantly threatened by phage infection. We report the genome sequences of three S. thermophilus bacteriophages isolated from a dairy environment in the Veneto region of Italy. These sequences will be used for the development of new ...
Full Text Available We performed a genome wide analysis of 164 urothelial carcinoma samples and 27 bladder cancer cell lines to identify copy number changes associated with disease characteristics, and examined the association of amplification events with stage and grade of disease. Multiplex inversion probe (MIP analysis, a recently developed genomic technique, was used to study 80 urothelial carcinomas to identify mutations and copy number changes. Selected amplification events were then analyzed in a validation cohort of 84 bladder cancers by multiplex ligation-dependent probe assay (MLPA. In the MIP analysis, 44 regions of significant copy number change were identified using GISTIC. Nine gene-containing regions of amplification were selected for validation in the second cohort by MLPA. Amplification events at these 9 genomic regions were found to correlate strongly with stage, being seen in only 2 of 23 (9% Ta grade 1 or 1-2 cancers, in contrast to 31 of 61 (51% Ta grade 3 and T2 grade 2 cancers, p<0.001. These observations suggest that analysis of genomic amplification of these 9 regions might help distinguish non-invasive from invasive urothelial carcinoma, although further study is required. Both MIP and MLPA methods perform well on formalin-fixed paraffin-embedded DNA, enhancing their potential clinical use. Furthermore several of the amplified genes identified here (ERBB2, MDM2, CCND1 are potential therapeutic targets.
Adam H Freedman
Full Text Available Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in the domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, that is supported by additional metabolism related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower-rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz
Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster compared to the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs with majority of proteomes having between 25 and 40%, where higher abundance is characteristic to proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo
Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Reifenberger, Jeffrey; Dorfman, Kevin; Cao, Han
Human DNA is a not a polymer consisting of a uniform distribution of all 4 nucleic acids, but rather contains regions of high AT and high GC content. When confined, these regions could have different stretch due to the extra hydrogen bond present in the GC basepair. To measure this potential difference, human genomic DNA was nicked with NtBspQI, labeled with a cy3 like fluorophore at the nick site, stained with YOYO, loaded into a device containing an array of nanochannels, and imaged. Over 473,000 individual molecules of DNA, corresponding to roughly 30x coverage of a human genome, were collected and aligned to the human reference. Based on the known AT/GC content between aligned pairs of labels, the stretch was measured for regions of similar size but different AT/GC content. We found that regions of high GC content were consistently more stretched than regions of high AT content between pairs of labels varying in size between 2.5 kbp and 500 kbp. We measured that for every 1% increase in GC content there was roughly a 0.06% increase in stretch. While this effect is small, it is important to take into account differences in stretch between AT and GC rich regions to improve the sensitivity of detection of structural variations from genomic variations. NIH Grant: R01-HG006851.
Ceapa, Corina; Davids, Mark; Ritari, Jarmo; Lambert, Jolanda; Wels, Michiel; Douillard, François P; Smokvina, Tamara; de Vos, Willem M; Knol, Jan; Kleerebezem, Michiel
Lactobacillus rhamnosus is a diverse Gram-positive species with strains isolated from different ecological niches. Here, we report the genome sequence analysis of 40 diverse strains of L. rhamnosus and their genomic comparison, with a focus on the variable genome. Genomic comparison of 40 L. rhamnosus strains discriminated the conserved genes (core genome) and regions of plasticity involving frequent rearrangements and horizontal transfer (variome). The L. rhamnosus core genome encompasses 2,164 genes, out of 4,711 genes in total (the pan-genome). The accessory genome is dominated by genes encoding carbohydrate transport and metabolism, extracellular polysaccharides (EPS) biosynthesis, bacteriocin production, pili production, the cas system, and the associated clustered regularly interspaced short palindromic repeat (CRISPR) loci, and more than 100 transporter functions and mobile genetic elements like phages, plasmid genes, and transposons. A clade distribution based on amino acid differences between core (shared) proteins matched with the clade distribution obtained from the presence-absence of variable genes. The phylogenetic and variome tree overlap indicated that frequent events of gene acquisition and loss dominated the evolutionary segregation of the strains within this species, which is paralleled by evolutionary diversification of core gene functions. The CRISPR-Cas system could have contributed to this evolutionary segregation. Lactobacillus rhamnosus strains contain the genetic and metabolic machinery with strain-specific gene functions required to adapt to a large range of environments. A remarkable congruency of the evolutionary relatedness of the strains' core and variome functions, possibly favoring interspecies genetic exchanges, underlines the importance of gene-acquisition and loss within the L. rhamnosus strain diversification. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Pearson, Hillary; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Thibault, Pierre
MHC class I–associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
Full Text Available When testing association between rare variants and diseases, an efficient analytical approach involves considering a set of variants in a genomic region as the unit of analysis. One factor complicating this approach is that the vast majority of rare variants in practical applications are believed to represent background neutral variation. As a result, analyzing a single set with all variants may not represent a powerful approach. Here, we propose two alternative strategies. In the first, we analyze the subsets of rare variants exhaustively. In the second, we categorize variants selectively into two subsets: one in which variants are overrepresented in cases, and the other in which variants are overrepresented in controls. When the proportion of neutral variants is moderate to large we show, by simulations, that the both proposed strategies improve the statistical power over methods analyzing a single set with total variants. When applied to a real sequencing association study, the proposed methods consistently produce smaller p-values than their competitors. When applied to another real sequencing dataset to study the difference of rare allele distributions between ethnic populations, the proposed methods detect the overrepresentation of variants between the CHB (Chinese Han in Beijing and YRI (Yoruba people of Ibadan populations with small p-values. Additional analyses suggest that there is no difference between the CHB and CHD (Chinese Han in Denver datasets, as expected. Finally, when applied to the CHB and JPT (Japanese people in Tokyo populations, existing methods fail to detect any difference, while it is detected by the proposed methods in several regions.
Doran, Anthony G; Berry, Donagh P; Creevey, Christopher J
Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits, however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation. Following adjustment for false discovery (q-value carcass traits using a single SNP regression approach. Using a Bayesian approach, 46 QTL were associated (posterior probability > 0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500kbs of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway. A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth such as glucagon and leptin. Several biological pathways, including PPAR signaling, were
Full Text Available An unusual feature of many eukaryotic genomes is the presence of regions that appear intrinsically difficult to copy during the process of DNA replication. Curiously, the location of these difficult-to-replicate regions is often conserved between species, implying a valuable role in some aspect of genome organization or maintenance. The most prominent class of these regions in mammalian cells is defined as chromosome fragile sites, which acquired their name because of a propensity to form visible gaps/breaks on otherwise-condensed chromosomes in mitosis. This fragility is particularly apparent following perturbation of DNA replication—a phenomenon often referred to as “replication stress”. Here, we review recent data on the molecular basis for chromosome fragility and the role of fragile sites in the etiology of cancer. In particular, we highlight how studies on fragile sites have provided unexpected insights into how the DNA repair machinery assists in the completion of DNA replication.
Gualtieri, G.; Bisseling, T.
A synteny based positional cloning approach was started to clone the pea SYM2 gene by using locally conserved genome structure with the model plant Medicago truncatula. We reported that a pea marker tightly linked to SYM2 was used to screen a M. truncatula BAC library, and two contigs named C1/C2
Chen, Jie-Yin; Liu, Chun; Gui, Yue-Jing; Si, Kai-Wei; Zhang, Dan-Dan; Wang, Jie; Short, Dylan P G; Huang, Jin-Qun; Li, Nan-Yang; Liang, Yong; Zhang, Wen-Qi; Yang, Lin; Ma, Xue-Feng; Li, Ting-Gang; Zhou, Lei; Wang, Bao-Li; Bao, Yu-Ming; Subbarao, Krishna V; Zhang, Geng-Yun; Dai, Xiao-Feng
Verticillium dahliae isolates are most virulent on the host from which they were originally isolated. Mechanisms underlying these dominant host adaptations are currently unknown. We sequenced the genome of V. dahliae Vd991, which is highly virulent on its original host, cotton, and performed comparisons with the reference genomes of JR2 (from tomato) and VdLs.17 (from lettuce). Pathogenicity-related factor prediction, orthology and multigene family classification, transcriptome analyses, phylogenetic analyses, and pathogenicity experiments were performed. The Vd991 genome harbored several exclusive, lineage-specific (LS) genes within LS regions (LSRs). Deletion mutants of the seven genes within one LSR (G-LSR2) in Vd991 were less virulent only on cotton. Integration of G-LSR2 genes individually into JR2 and VdLs.17 resulted in significantly enhanced virulence on cotton but did not affect virulence on tomato or lettuce. Transcription levels of the seven LS genes in Vd991 were higher during the early stages of cotton infection, as compared with other hosts. Phylogenetic analyses suggested that G-LSR2 was acquired from Fusarium oxysporum f. sp. vasinfectum through horizontal gene transfer. Our results provide evidence that horizontal gene transfer from Fusarium to Vd991 contributed significantly to its adaptation to cotton and may represent a significant mechanism in the evolution of an asexual plant pathogen. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Li, Zhengcao; Chen, Jiucheng; Wang, Zhen; Pan, Yuchun; Wang, Qishan; Xu, Ningying; Wang, Zhengguang
Chinese pigs have been undergoing both natural and artificial selection for thousands of years. Jinhua pigs are of great importance, as they can be a valuable model for exploring the genetic mechanisms linked to meat quality and other traits such as disease resistance, reproduction and production. The purpose of this study was to identify distinctive footprints of selection between Jinhua pigs and other breeds utilizing genome-wide SNP data. Genotyping by genome reducing and sequencing was implemented in order to perform cross-population extended haplotype homozygosity to reveal strong signatures of selection for those economically important traits. This work was performed at a 2% genome level, which comprised 152 006 SNPs genotyped in a total of 517 individuals. Population-specific footprints of selective sweeps were searched for in the genome of Jinhua pigs using six native breeds and three European breeds as reference groups. Several candidate genes associated with meat quality, health and reproduction, such as GH1, CRHR2, TRAF4 and CCK, were found to be overlapping with the significantly positive outliers. Additionally, the results revealed that some genomic regions associated with meat quality, immune response and reproduction in Jinhua pigs have evolved directionally under domestication and subsequent selections. The identified genes and biological pathways in Jinhua pigs showed different selection patterns in comparison with the Chinese and European breeds. © 2016 Stichting International Foundation for Animal Genetics.
Taras K Oleksyk
Full Text Available When a selective sweep occurs in the chromosomal region around a target gene in two populations that have recently separated, it produces three dramatic genomic consequences: 1 decreased multi-locus heterozygosity in the region; 2 elevated or diminished genetic divergence (F(ST of multiple polymorphic variants adjacent to the selected locus between the divergent populations, due to the alternative fixation of alleles; and 3 a consequent regional increase in the variance of F(ST (S(2F(ST for the same clustered variants, due to the increased alternative fixation of alleles in the loci surrounding the selection target. In the first part of our study, to search for potential targets of directional selection, we developed and validated a resampling-based computational approach; we then scanned an array of 31 different-sized moving windows of SNP variants (5-65 SNPs across the human genome in a set of European and African American population samples with 183,997 SNP loci after correcting for the recombination rate variation. The analysis revealed 180 regions of recent selection with very strong evidence in either population or both. In the second part of our study, we compared the newly discovered putative regions to those sites previously postulated in the literature, using methods based on inspecting patterns of linkage disequilibrium, population divergence and other methodologies. The newly found regions were cross-validated with those found in nine other studies that have searched for selection signals. Our study was replicated especially well in those regions confirmed by three or more studies. These validated regions were independently verified, using a combination of different methods and different databases in other studies, and should include fewer false positives. The main strength of our analysis method compared to others is that it does not require dense genotyping and therefore can be used with data from population-based genome SNP scans
Rossin, Elizabeth J.; Hansen, Kasper Lage; Raychaudhuri, Soumya
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these r......Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed...... in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more...... that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non...
Full Text Available Plant fibers can be used to produce composite materials for automobile parts, thus reducing plastic used in their manufacture, overall vehicle weight and fuel consumption when they replace mineral fillers and glass fibers. Soybean stem residues are, potentially, significant sources of inexpensive, renewable and biodegradable natural fibers, but are not curretly used for biocomposite production due to the functional properties of their fibers in composites being unknown. The current study was initiated to investigate the effects of plant genotype on the performance characteristics of soybean stem fibers when incorporated into a polypropylene (PP matrix using a selective phenotyping approach. Fibers from 50 lines of a recombinant inbred line population (169 RILs grown in different environments were incorporated into PP at 20% (wt/wt by extrusion. Test samples were injection molded and characterized for their mechanical properties. The performance of stem fibers in the composites was significantly affected by genotype and environment. Fibers from different genotypes had significantly different chemical compositions, thus composites prepared with these fibers displayed different physical properties. This study demonstrates that thermoplastic composites with soybean stem-derived fibers have mechanical properties that are equivalent or better than wheat straw fiber composites currently being used for manufacturing interior automotive parts. The addition of soybean stem residues improved flexural, tensile and impact properties of the composites. Furthermore, by linkage and in silico mapping we identified genomic regions to which quantitative trait loci (QTL for compositional and functional properties of soybean stem fibers in thermoplastic composites, as well as genes for cell wall synthesis, were co-localized. These results may lead to the development of high value uses for soybean stem residue.
Full Text Available Abstract Background Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has increased interest in studying regulatory mechanisms of gene expression aimed at elucidating the differences in phenotypes. In particular, a proximal promoter region contains a large number of regulatory elements that control the expression of its downstream gene. Although many studies have focused on identification of these elements, a broader picture on the complexity of transcriptional regulation of different biological processes has not been addressed in mammals. The regulatory complexity may strongly correlate with gene function, as different evolutionary forces must act on the regulatory systems under different biological conditions. We investigate this hypothesis by comparing the conservation of promoters upstream of genes classified in different functional categories. Results By conducting a rank correlation analysis between functional annotation and upstream sequence alignment scores obtained by human-mouse and human-dog comparison, we found a significantly greater conservation of the upstream sequence of genes involved in development, cell communication, neural functions and signaling processes than those involved in more basic processes shared with unicellular organisms such as metabolism and ribosomal function. This observation persists after controlling for G+C content. Considering conservation as a functional signature, we hypothesize a higher density of cis-regulatory elements upstream of genes participating in complex and adaptive processes. Conclusion We identified a class of functions that are associated with either high or low promoter conservation in mammals. We detected a significant tendency that points to complex and adaptive processes were associated with higher promoter conservation, despite the fact that they have emerged
Marcos De Donato
Full Text Available KCNQ1OT1 is located in the region with the highest number of genes showing genomic imprinting, but the mechanisms controlling the genes under its influence have not been fully elucidated. Therefore, we conducted a comparative analysis of the KCNQ1/KCNQ1OT1-CDKN1C region to study its conservation across the best assembled eutherian mammalian genomes sequenced to date and analyzed potential elements that may be implicated in the control of genomic imprinting in this region. The genomic features in these regions from human, mouse, cattle, and dog show a higher number of genes and CpG islands (detected using cpgplot from EMBOSS, but lower number of repetitive elements (including short interspersed nuclear elements and long interspersed nuclear elements, compared with their whole chromosomes (detected by RepeatMasker. The KCNQ1OT1-CDKN1C region contains the highest number of conserved noncoding sequences (CNS among mammals, where we found 16 regions containing about 38 different highly conserved repetitive elements (using mVista, such as LINE1 elements: L1M4, L1MB7, HAL1, L1M4a, L1Med, and an LTR element: MLT1H. From these elements, we found 74 CNS showing high sequence identity (>70% between human, cattle, and mouse, from which we identified 13 motifs (using Multiple Em for Motif Elicitation/Motif Alignment and Search Tool with a significant probability of occurrence, 3 of which were the most frequent and were used to find transcription factor–binding sites. We detected several transcription factors (using JASPAR suite from the families SOX, FOX, and GATA. A phylogenetic analysis of these CNS from human, marmoset, mouse, rat, cattle, dog, horse, and elephant shows branches with high levels of support and very similar phylogenetic relationships among these groups, confirming previous reports. Our results suggest that functional DNA elements identified by comparative genomics in a region densely populated with imprinted mammalian genes may be
Forrester, H.B.; Radford, I.R.
Full text: DNA rearrangement events leading to chromosomal aberrations are central to ionizing radiation-induced cell death. Although DNA double-strand breaks are probably the lesion that initiates formation of chromosomal aberrations, little is understood about the molecular mechanisms that generate and modulate DNA rearrangement. Examination of the sequences that flank sites of DNA rearrangement may provide information regarding the processes and enzymes involved in rearrangement events. Accordingly, we developed a method using inverse PCR that allows the detection and sequencing of putative radiation-induced DNA rearrangements in defined regions of the human genome. The method can detect single copies of a rearrangement event that has occurred in a particular region of the genome and, therefore, DNA rearrangement detection does not require survival and continued multiplication of the affected cell. Ionizing radiation-induced DNA rearrangements were detected in several different regions of the genome of human fibroblast cells that were exposed to 30 Gy of γ-irradiation and then incubated for 24 hours at 37 deg C. There was a 3- to 5-fold increase in the number of products amplified from irradiated as compared with control cells in the target regions 5' to the C-MYC, CDKN1A, RB1, and FGFR2 genes. Sequences were examined from 121 DNA rearrangements. Approximately half of the PCR products were derived from possible inter-chromosomal rearrangements and the remainder were from intra-chromosomal events. A high proportion of the sequences that rearranged with target regions were located in genes, suggesting that rearrangements may occur preferentially in transcribed regions. Eighty-four percent of the sequences examined by reverse transcriptase PCR were from transcribed sequences in IMR-90 cells. The distribution of DNA rearrangements within the target regions is non-random and homology occurs between the sequences involved in rearrangements in some cases but is not
Chowdhary, Rajesh; Bajic, Vladimir B.; Dong, Difeng; Wong, Limsoon; Liu, Jun S
of histone and histone-coregulated gene transcription initiation. While these hypotheses still remain to be verified, we believe that these form a useful resource for researchers to further explore regulation of human histone genes and human genome
The objectives of this grant proposal include (1) development of a chromosome microdissection and PCR-mediated microcloning technology, (2) application of this microtechnology to the construction of region-specific libraries for human genome analysis. During this grant period, the authors have successfully developed this microtechnology and have applied it to the construction of microdissection libraries for the following chromosome regions: a whole chromosome 21 (21E), 2 region-specific libraries for the long arm of chromosome 2, 2q35-q37 (2Q1) and 2q33-q35 (2Q2), and 4 region-specific libraries for the entire short arm of chromosome 2, 2p23-p25 (2P1), 2p21-p23 (2P2), 2p14-p16 (wP3) and 2p11-p13 (2P4). In addition, 20--40 unique sequence microclones have been isolated and characterized for genomic studies. These region-specific libraries and the single-copy microclones from the library have been used as valuable resources for (1) isolating microsatellite probes in linkage analysis to further refine the disease locus; (2) isolating corresponding clones with large inserts, e.g. YAC, BAC, P1, cosmid and phage, to facilitate construction of contigs for high resolution physical mapping; and (3) isolating region-specific cDNA clones for use as candidate genes. These libraries are being deposited in the American Type Culture Collection (ATCC) for general distribution.
Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P
We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
Full Text Available Sheep are among the major economically important livestock species worldwide because the animals produce milk, wool, skin, and meat. In the present study, the Illumina OvineSNP50 BeadChip was used to investigate genetic diversity and genome selection among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds from the United States. After quality-control filtering of SNPs (single nucleotide polymorphisms, we used 48,026 SNPs, including 46,850 SNPs on autosomes that were in Hardy-Weinberg equilibrium and 1,176 SNPs on chromosome × for analysis. Phylogenetic analysis based on all 46,850 SNPs clearly separated Suffolk from Rambouillet, Columbia, Polypay, and Targhee, which was not surprising as Rambouillet contributed to the synthesis of the later three breeds. Based on pair-wise estimates of F(ST, significant genetic differentiation appeared between Suffolk and Rambouillet (F(ST = 0.1621, while Rambouillet and Targhee had the closest relationship (F(ST = 0.0681. A scan of the genome revealed 45 and 41 differentially selected regions (DSRs between Suffolk and Rambouillet and among Rambouillet-related breed populations, respectively. Our data indicated that regions 13 and 24 between Suffolk and Rambouillet might be good candidates for evaluating breed differences. Furthermore, ovine genome v3.1 assembly was used as reference to link functionally known homologous genes to economically important traits covered by these differentially selected regions. In brief, our present study provides a comprehensive genome-wide view on within- and between-breed genetic differentiation, biodiversity, and evolution among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds. These results may provide new guidance for the synthesis of new breeds with different breeding objectives.
Rahmatalla, Siham A; Arends, Danny; Reissmann, Monika; Said Ahmed, Ammar; Wimmers, Klaus; Reyer, Henry; Brockmann, Gudrun A
Sudan is endowed with a variety of indigenous goat breeds which are used for meat and milk production and which are well adapted to the local environment. The aim of the present study was to determine the genetic diversity and relationship within and between the four main Sudanese breeds of Nubian, Desert, Taggar and Nilotic goats. Using the 50 K SNP chip, 24 animals of each breed were genotyped. More than 96% of high quality SNPs were polymorphic with an average minor allele frequency of 0.3. In all breeds, no significant difference between observed (0.4) and expected (0.4) heterozygosity was found and the inbreeding coefficients (F IS ) did not differ from zero. F st coefficients for the genetic distance between breeds also did not significantly deviate from zero. In addition, the analysis of molecular variance revealed that 93% of the total variance in the examined population can be explained by differences among individuals, while only 7% result from differences between the breeds. These findings provide evidence for high genetic diversity and little inbreeding within breeds on one hand, and low diversity between breeds on the other hand. Further examinations using Nei's genetic distance and STRUCTURE analysis clustered Taggar goats distinct from the other breeds. In a principal component (PC) analysis, PC1 could separate Taggar, Nilotic and a mix of Nubian and Desert goats into three groups. The SNPs that contributed strongly to PC1 showed high F st values in Taggar goat versus the other goat breeds. PCA allowed us to identify target genomic regions which contain genes known to influence growth, development, bone formation and the immune system. The information on the genetic variability and diversity in this study confirmed that Taggar goat is genetically different from the other goat breeds in Sudan. The SNPs identified by the first principal components show high F st values in Taggar goat and allowed to identify candidate genes which can be used in the
Damgaard, Christian Kroun; Andersen, Ebbe Sloth; Knudsen, Bjarne
The untranslated leader of the dimeric HIV-1 RNA genome is folded into a complex structure that plays multiple and essential roles in the viral replication cycle. Here, we have investigated secondary and tertiary structural elements within the 5' 744 nucleotides of the HIV-1 genome using...... a combination of bioinformatics, enzymatic probing, native gel electrophoresis, and UV-crosslinking experiments. We used a recently developed RNA folding algorithm (Pfold) to predict the common secondary structure of an alignment of 20 divergent HIV-1 sequences. Combining this analysis with biochemical data, we...
Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián
Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.
Kim, Sangkyu; Welsh, David A; Myers, Leann; Cherry, Katie E; Wyckoff, Jennifer; Jazwinski, S Michal
We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13-14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity.
Silar, Philippe; Barreau, Christian; Debuchy, Robert; Kicka, Sébastien; Turcq, Béatrice; Sainsard-Chanet, Annie; Sellem, Carole H; Billault, Alain; Cattolico, Laurence; Duprat, Simone; Weissenbach, Jean
A Podospora anserina BAC library of 4800 clones has been constructed in the vector pBHYG allowing direct selection in fungi. Screening of the BAC collection for centromeric sequences of chromosome V allowed the recovery of clones localized on either sides of the centromere, but no BAC clone was found to contain the centromere. Seven BAC clones containing 322,195 and 156,244bp from either sides of the centromeric region were sequenced and annotated. One 5S rRNA gene, 5 tRNA genes, and 163 putative coding sequences (CDS) were identified. Among these, only six CDS seem specific to P. anserina. The gene density in the centromeric region is approximately one gene every 2.8kb. Extrapolation of this gene density to the whole genome of P. anserina suggests that the genome contains about 11,000 genes. Synteny analyses between P. anserina and Neurospora crassa show that co-linearity extends at the most to a few genes, suggesting rapid genome rearrangements between these two species.
Javed Iqbal Wattoo
Full Text Available Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matk+rbcl of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families; Solanaceae, Euphorbiaceae and Fabaceae respectively. Biological sequence homology and divergence of amplified sequences was studied using Basic Local Alignment Tool (BLAST. Results: Both primers (matk+rbcl showed good amplification in three species. The sequenced regions reveled conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in medicinal industry to establish DNA based identification of different medicinal plant species to monitor adulteration.
Limited number of simple sequence repeat (SSR) markers is available for the genome of garlic (Allium sativum L.) although SSR markers have become one of the most preferred marker systems because they are typically co-dominant, reproducible, cross species transferable and highly polymorphic. In this ...
Gilbert, Maarten J.; Miller, William G.; Yee, Emma; Kik, Marja; Zomer, Aldert L.; Wagenaar, Jaap A.; Duim, Birgitta
Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, andC. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorumstrain 1485E, isolated
Skibola, Christine F.; Berndt, Sonja I.; Vijai, Joseph; Conde, Lucia; Wang, Zhaoming; Yeager, Meredith; de Bakker, Paul I. W.; Birmann, Brenda M.; Vajdic, Claire M.; Foo, Jia-Nee; Bracci, Paige M.; Vermeulen, Roel C. H.; Slager, Susan L.; de Sanjose, Silvia; Wang, Sophia S.; Linet, Martha S.; Salles, Gilles; Lan, Qing; Severi, Gianluca; Hjalgrim, Henrik; Lightfoot, Tracy; Melbye, Mads; Gu, Jian; Ghesquieres, Herve; Link, Brian K.; Morton, Lindsay M.; Holly, Elizabeth A.; Smith, Alex; Tinker, Lesley F.; Teras, Lauren R.; Kricker, Anne; Becker, Nikolaus; Purdue, Mark P.; Spinelli, John J.; Zhang, Yawei; Giles, Graham G.; Vineis, Paolo; Monnereau, Alain; Bertrand, Kimberly A.; Albanes, Demetrius; Zeleniuch-Jacquotte, Anne; Gabbas, Attilio; Chung, Charles C.; Burdett, Laurie; Hutchinson, Amy; Lawrence, Charles; Montalvan, Rebecca; Liang, Liming; Huang, Jinyan; Ma, Baoshan; Liu, Jianjun; Adami, Hans-Olov; Glimelius, Bengt; Ye, Yuanqing; Nowakowski, Grzegorz S.; Dogan, Ahmet; Thompson, Carrie A.; Habermann, Thomas M.; Novak, Anne J.; Liebow, Mark; Witzig, Thomas E.; Weiner, George J.; Schenk, Maryjean; Hartge, Patricia; De Roos, Anneclaire J.; Cozen, Wendy; Zhi, Degui; Akers, Nicholas K.; Riby, Jacques; Smith, Martyn T.; Lacher, Mortimer; Villano, Danylo J.; Maria, Ann; Roman, Eve; Kane, Eleanor; Jackson, Rebecca D.; North, Kari E.; Diver, W. Ryan; Turner, Jenny; Armstrong, Bruce K.; Benavente, Yolanda; Boffetta, Paolo; Brennan, Paul; Foretova, Lenka; Maynadie, Marc; Staines, Anthony; McKay, James; Brooks-Wilson, Angela R.; Zheng, Tongzhang; Holford, Theodore R.; Chamosa, Saioa; Kaaks, Rudolph; Kelly, Rachel S.; Ohlsson, Bodil; Travis, Ruth C.; Weiderpass, Elisabete; Clave, Jacqueline; Giovannucci, Edward; Kraft, Peter; Virtamo, Jarmo; Mazza, Patrizio; Cocco, Pierluigi; Ennas, Maria Grazia; Chiu, Brian C. H.; Fraumeni, Joseph R.; Nieters, Alexandra; Offit, Kenneth; Wu, Xifeng; Cerhan, James R.; Smedby, Karin E.; Chanock, Stephen J.; Rothman, Nathaniel
Genome-wide association studies (GWASs) of follicular lymphoma (FL) have previously identified human leukocyte antigen (HLA) gene variants. To identify additional FL susceptibility loci, we conducted a large-scale two-stage GWAS in 4,523 case subjects and 13,344 control subjects of European
Oppentocht, JE; van Delden, W; van de Zande, L
The nucleotide sequences of the Adh and Adhr genes of Drosophila kuntzei were derived from combined overlapping sequences of clones isolated from a genomic library and from cloned PCR and inverse-PCR fragments. Only a proximal promoter was detected upstream of the Adh gene, indicating that D.
Bouwman, A.C.; Visker, M.H.P.W.; Arendonk, van J.A.M.; Bovenhuis, H.
Background - In this study we perform a genome-wide association study (GWAS) for bovine milk fatty acids from summer milk samples. This study replicates a previous study where we performed a GWAS for bovine milk fatty acids based on winter milk samples from the same population. Fatty acids from
Liu, Xiaohua; Kelsoe, John R; Greenwood, Tiffany A
Bipolar disorder is a heterogeneous mood disorder associated with several important clinical comorbidities, such as eating disorders. This clinical heterogeneity complicates the identification of genetic variants contributing to bipolar susceptibility. Here we investigate comorbidity of eating disorders as a subphenotype of bipolar disorder to identify genetic variation that is common and unique to both disorders. We performed a genome-wide association analysis contrasting 184 bipolar subjects with eating disorder comorbidity against both 1370 controls and 2006 subjects with bipolar disorder only from the Bipolar Genome Study (BiGS). The most significant genome-wide finding was observed bipolar with comorbid eating disorder vs. controls within SOX2-OT (p=8.9×10(-8) for rs4854912) with a secondary peak in the adjacent FXR1 gene (p=1.2×10(-6) for rs1805576) on chromosome 3q26.33. This region was also the most prominent finding in the case-only analysis (p=3.5×10(-7) and 4.3×10(-6), respectively). Several regions of interest containing genes involved in neurodevelopment and neuroprotection processes were also identified. While our primary finding did not quite reach genome-wide significance, likely due to the relatively limited sample size, these results can be viewed as a replication of a recent study of eating disorders in a large cohort. These findings replicate the prior association of SOX2-OT with eating disorders and broadly support the involvement of neurodevelopmental/neuroprotective mechanisms in the pathophysiology of both disorders. They further suggest that different clinical manifestations of bipolar disorder may reflect differential genetic contributions and argue for the utility of clinical subphenotypes in identifying additional molecular pathways leading to illness. Copyright © 2015 Elsevier B.V. All rights reserved.
Boschiero, Clarissa; Moreira, Gabriel Costa Monteiro; Gheyas, Almas Ara; Godoy, Thaís Fernanda; Gasparin, Gustavo; Mariani, Pilar Drummond Sampaio Corrêa; Paduan, Marcela; Cesar, Aline Silva Mello; Ledur, Mônica Corrêa; Coutinho, Luiz Lehmann
Meat and egg-type chickens have been selected for several generations for different traits. Artificial and natural selection for different phenotypes can change frequency of genetic variants, leaving particular genomic footprints throghtout the genome. Thus, the aims of this study were to sequence 28 chickens from two Brazilian lines (meat and white egg-type) and use this information to characterize genome-wide genetic variations, identify putative regions under selection using Fst method, and find putative pathways under selection. A total of 13.93 million SNPs and 1.36 million INDELs were identified, with more variants detected from the broiler (meat-type) line. Although most were located in non-coding regions, we identified 7255 intolerant non-synonymous SNPs, 512 stopgain/loss SNPs, 1381 frameshift and 1094 non-frameshift INDELs that may alter protein functions. Genes harboring intolerant non-synonymous SNPs affected metabolic pathways related mainly to reproduction and endocrine systems in the white-egg layer line, and lipid metabolism and metabolic diseases in the broiler line. Fst analysis in sliding windows, using SNPs and INDELs separately, identified over 300 putative regions of selection overlapping with more than 250 genes. For the first time in chicken, INDEL variants were considered for selection signature analysis, showing high level of correlation in results between SNP and INDEL data. The putative regions of selection signatures revealed interesting candidate genes and pathways related to important phenotypic traits in chicken, such as lipid metabolism, growth, reproduction, and cardiac development. In this study, Fst method was applied to identify high confidence putative regions under selection, providing novel insights into selection footprints that can help elucidate the functional mechanisms underlying different phenotypic traits relevant to meat and egg-type chicken lines. In addition, we generated a large catalog of line-specific and common
Blackmon Barbara P
Full Text Available Abstract Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
Full Text Available Metabolic syndrome is a highly prevalent human disease with substantial genomic and environmental components. Previous studies indicate the presence of significant genetic determinants of several features of metabolic syndrome on rat chromosome 16 (RNO16 and the syntenic regions of human genome. We derived the SHR.BN16 congenic strain by introgression of a limited RNO16 region from the Brown Norway congenic strain (BN-Lx into the genomic background of the spontaneously hypertensive rat (SHR strain. We compared the morphometric, metabolic, and hemodynamic profiles of adult male SHR and SHR.BN16 rats. We also compared in silico the DNA sequences for the differential segment in the BN-Lx and SHR parental strains. SHR.BN16 congenic rats had significantly lower weight, decreased concentrations of total triglycerides and cholesterol, and improved glucose tolerance compared with SHR rats. The concentrations of insulin, free fatty acids, and adiponectin were comparable between the two strains. SHR.BN16 rats had significantly lower systolic (18-28 mmHg difference and diastolic (10-15 mmHg difference blood pressure throughout the experiment (repeated-measures ANOVA, P < 0.001. The differential segment spans approximately 22 Mb of the telomeric part of the short arm of RNO16. The in silico analyses revealed over 1200 DNA variants between the BN-Lx and SHR genomes in the SHR.BN16 differential segment, 44 of which lead to missense mutations, and only eight of which (in Asb14, Il17rd, Itih1, Syt15, Ercc6, RGD1564958, Tmem161a, and Gatad2a genes are predicted to be damaging to the protein product. Furthermore, a number of genes within the RNO16 differential segment associated with metabolic syndrome components in human studies showed polymorphisms between SHR and BN-Lx (including Lpl, Nrg3, Pbx4, Cilp2, and Stab1. Our novel congenic rat model demonstrates that a limited genomic region on RNO16 in the SHR significantly affects many of the features of metabolic
Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves
While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti--called Wallaby--present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes.
Kahsai, Lily; Cook, Kevin R
Hundreds of Drosophila melanogaster stocks are currently maintained at the Bloomington Drosophila Stock Center with mutations that have not been associated with sequence-defined genes. They have been preserved because they have interesting loss-of-function phenotypes. The experimental value of these mutations would be increased by tying them to specific genomic intervals so that geneticists can more easily associate them with annotated genes. Here, we report the mapping of 85 second chromosome complementation groups in the Bloomington collection to specific, small clusters of contiguous genes or individual genes in the sequenced genome. This information should prove valuable to Drosophila geneticists interested in processes associated with particular phenotypes and those searching for mutations affecting specific sequence-defined genes. Copyright © 2018 Kahsai,Cook.
Full Text Available Hundreds of Drosophila melanogaster stocks are currently maintained at the Bloomington Drosophila Stock Center with mutations that have not been associated with sequence-defined genes. They have been preserved because they have interesting loss-of-function phenotypes. The experimental value of these mutations would be increased by tying them to specific genomic intervals so that geneticists can more easily associate them with annotated genes. Here, we report the mapping of 85 second chromosome complementation groups in the Bloomington collection to specific, small clusters of contiguous genes or individual genes in the sequenced genome. This information should prove valuable to Drosophila geneticists interested in processes associated with particular phenotypes and those searching for mutations affecting specific sequence-defined genes.
Strange, Amy; Bellenguez, Céline; Sim, Xueling; Luben, Robert; Hysi, Pirro G.; Ramdas, Wishal D.; van Koolwijk, Leonieke M.E.; Freeman, Colin; Pirinen, Matti; Su, Zhan; Band, Gavin; Pearson, Richard; Vukcevic, Damjan; Langford, Cordelia; Deloukas, Panos
To discover quantitative trait loci for intraocular pressure, a major risk factor for glaucoma and the only modifiable one, we performed a genome-wide association study on a discovery cohort of 2175 individuals from Sydney, Australia. We found a novel association between intraocular pressure and a common variant at 7p21 near to GLCCI1 and ICA1. The findings in this region were confirmed through two UK replication cohorts totalling 4866 individuals (rs59072263, P(combined) = 1.10 × 10(-8)). A ...
Flather, Dylan; Cathcart, Andrea L.; Cruz, Casey; Baggs, Eric; Ngo, Tuan; Gershon, Paul D.; Semler, Bert L.
Despite being intensely studied for more than 50 years, a complete understanding of the enterovirus replication cycle remains elusive. Specifically, only a handful of cellular proteins have been shown to be involved in the RNA replication cycle of these viruses. In an effort to isolate and identify additional cellular proteins that function in enteroviral RNA replication, we have generated multiple recombinant polioviruses containing RNA affinity tags within the 3′ or 5′ noncoding region of the genome. These recombinant viruses retained RNA affinity sequences within the genome while remaining viable and infectious over multiple passages in cell culture. Further characterization of these viruses demonstrated that viral protein production and growth kinetics were unchanged or only slightly altered relative to wild type poliovirus. However, attempts to isolate these genetically-tagged viral genomes from infected cells have been hindered by high levels of co-purification of nonspecific proteins and the limited matrix-binding efficiency of RNA affinity sequences. Regardless, these recombinant viruses represent a step toward more thorough characterization of enterovirus ribonucleoprotein complexes involved in RNA replication. PMID:26861382
Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets
Ali, Amjad; Naz, Anam; Soares, Siomar C.
-genome approach; the predicted conserved gene families (1,193) constitute similar to 77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. Total of 28 nonhost....... Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan...
Full Text Available Barrett's esophagus (BE and esophageal adenocarcinoma (EAC are far more prevalent in European Americans than in African Americans. Hypothesizing that this racial disparity in prevalence might represent a genetic susceptibility, we used an admixture mapping approach to interrogate disease association with genomic differences between European and African ancestry.Formalin fixed paraffin embedded samples were identified from 54 African Americans with BE or EAC through review of surgical pathology databases at participating Barrett's Esophagus Translational Research Network (BETRNet institutions. DNA was extracted from normal tissue, and genotyped on the Illumina OmniQuad SNP chip. Case-only admixture mapping analysis was performed on the data from both all 54 cases and also on a subset of 28 cases with high genotyping quality. Haplotype phases were inferred with Beagle 3.3.2, and local African and European ancestries were inferred with SABER plus. Disease association was tested by estimating and testing excess European ancestry and contrasting it to excess African ancestry.Both datasets, the 54 cases and the 28 cases, identified two admixture regions. An association of excess European ancestry on chromosome 11p reached a 5% genome-wide significance threshold, corresponding to -log10(P = 4.28. A second peak on chromosome 8q reached -log10(P = 2.73. The converse analysis examining excess African ancestry found no genetic regions with significant excess African ancestry associated with BE and EAC. On average, the regions on chromosomes 8q and 11p showed excess European ancestry of 15% and 20%, respectively.Chromosomal regions on 11p15 and 8q22-24 are associated with excess European ancestry in African Americans with BE and EAC. Because GWAS have not reported any variants in these two regions, low frequency and/or rare disease associated variants that confer susceptibility to developing BE and EAC may be driving the observed European ancestry
Full Text Available Preeclampsia (PE is a leading cause of perinatal morbidity and mortality. However, as a common form of PE, the etiology of late-onset PE is elusive. We analyzed 5-methylcytosine (5mC and 5-hydroxymethylcytosine (5hmC levels in the placentas of late-onset severe PE patients (n = 4 and normal controls (n = 4 using a (hydroxymethylated DNA immunoprecipitation approach combined with deep sequencing ([h]MeDIP-seq, and the results were verified by (hMeDIP-qPCR. The most significant differentially methylated regions (DMRs were verified by MassARRAY EppiTYPER in an enlarged sample size (n = 20. Bioinformatics analysis identified 714 peaks of 5mC that were associated with 403 genes and 119 peaks of 5hmC that were associated with 61 genes, thus showing significant differences between the PE patients and the controls (>2-fold, p<0.05. Further, only one gene, PTPRN2, had both 5mC and 5hmC changes in patients. The ErbB signaling pathway was enriched in those 403 genes that had significantly different 5mC level between the groups. This genome-wide mapping of 5mC and 5hmC in late-onset severe PE and normal controls demonstrates that both 5mC and 5hmC play epigenetic roles in the regulation of the disease, but work independently. We reveal the genome-wide mapping of DNA methylation and DNA hydroxymethylation in late-onset PE placentas for the first time, and the identified ErbB signaling pathway and the gene PTPRN2 may be relevant to the epigenetic pathogenesis of late-onset PE.
Full Text Available Abstract Background To have an insight into the Mayetiola destructor (Hessian fly genome, we performed an in silico comparative genomic analysis utilizing genetic mapping, genomic sequence and EST sequence data along with data available from public databases. Results Chromosome walking and FISH were utilized to identify a contig of 50 BAC clones near the telomere of the short arm of Hessian fly chromosome X2 and near the avirulence gene vH13. These clones enabled us to correlate physical and genetic distance in this region of the Hessian fly genome. Sequence data from these BAC ends encompassing a 760 kb region, and a fully sequenced and assembled 42.6 kb BAC clone, was utilized to perform a comparative genomic study. In silico gene prediction combined with BLAST analyses was used to determine putative orthology to the sequenced dipteran genomes of the fruit fly, Drosophila melanogaster, and the malaria mosquito, Anopheles gambiae, and to infer evolutionary relationships. Conclusion This initial effort enables us to advance our understanding of the structure, composition and evolution of the genome of this important agricultural pest and is an invaluable tool for a whole genome sequencing effort.
Shiina, Takashi; Ando, Asako; Suto, Yumiko; Kasai, Fumio; Shigenari, Atsuko; Takishima, Nobusada; Kikkawa, Eri; Iwata, Kyoko; Kuwano, Yuko; Kitamura, Yuka; Matsuzawa, Yumiko; Sano, Kazumi; Nogami, Masahiro; Kawata, Hisako; Li, Suyun; Fukuzumi, Yasuhito; Yamazaki, Masaaki; Tashiro, Hiroyuki; Tamiya, Gen; Kohda, Atsushi; Okumura, Katsuzumi; Ikemura, Toshimichi; Soeda, Eiichi; Mizuki, Nobuhisa; Kimura, Minoru; Bahram, Seiamak; Inoko, Hidetoshi
Human chromosomes 1q21–q25, 6p21.3–22.2, 9q33–q34, and 19p13.1–p13.4 carry clusters of paralogous loci, to date best defined by the flagship 6p MHC region. They have presumably been created by two rounds of large-scale genomic duplications around the time of vertebrate emergence. Phylogenetically, the 1q21–25 region seems most closely related to the 6p21.3 MHC region, as it is only the MHC paralogous region that includes bona fide MHC class I genes, the CD1 and MR1 loci. Here, to clarify the genomic structure of this model MHC paralogous region as well as to gain insight into the evolutionary dynamics of the entire quadriplication process, a detailed analysis of a critical 1.7 megabase (Mb) region was performed. To this end, a composite, deep, YAC, BAC, and PAC contig encompassing all five CD1 genes and linking the centromeric +P5 locus to the telomeric KRTC7 locus was constructed. Within this contig a 1.1-Mb BAC and PAC core segment joining CD1D to FCER1A was fully sequenced and thoroughly analyzed. This led to the mapping of a total of 41 genes (12 expressed genes, 12 possibly expressed genes, and 17 pseudogenes), among which 31 were novel. The latter include 20 olfactory receptor (OR) genes, 9 of which are potentially expressed. Importantly, CD1, SPTA1, OR, and FCERIA belong to multigene families, which have paralogues in the other three regions. Furthermore, it is noteworthy that 12 of the 13 expressed genes in the 1q21–q22 region around the CD1 loci are immunologically relevant. In addition to CD1A-E, these include SPTA1, MNDA, IFI-16, AIM2, BL1A, FY and FCERIA. This functional convergence of structurally unrelated genes is reminiscent of the 6p MHC region, and perhaps represents the emergence of yet another antigen presentation gene cluster, in this case dedicated to lipid/glycolipid antigens rather than antigen-derived peptides. [The nucleotide sequence data reported in this paper have been submitted to the DDBJ, EMBL, and GenBank databases under
Full Text Available Only a limited number of simple sequence repeat (SSR markers is available for the genome of garlic (Allium sativum L. despite the fact that SSR markers have become one of the most preferred DNA marker systems. To develop new SSR markers for the garlic genome, garlic expressed sequence tags (ESTs at the publicly available GarlicEST database were screened for SSR motifs and a total of 132 SSR motifs were identified. Primer pairs were designed for 50 SSR motifs and 24 of these primer pairs were selected as SSR markers based on their consistent amplification patterns and polymorphisms. In addition, two SSR markers were developed from the sequences of garlic cDNA-AFLP fragments. The use of 26 EST-SSR markers for the assessment of genetic relationship was tested using 31 garlic genotypes. Twenty six EST-SSR markers amplified 130 polymorphic DNA fragments and the number of polymorphic alleles per SSR marker ranged from 2 to 13 with an average of 5 alleles. Observed heterozygosity and polymorphism information content (PIC of the SSR markers were between 0.23 and 0.88, and 0.20 and 0.87, respectively. Twenty one out of the 31 garlic genotypes were analyzed in a previous study using AFLP markers and the garlic genotypes clustered together with AFLP markers were also grouped together with EST-SSR markers demonstrating high concordance between AFLP and EST-SSR marker systems and possible immediate application of EST-SSR markers for fingerprinting of garlic clones. EST-SSR markers could be used in genetic studies such as genetic mapping, association mapping, genetic diversity and comparison of the genomes of Allium species.
Schrooten, C.; Bink, M.C.A.M.; Bovenhuis, H.
Chromosomal regions affecting multiple traits ( multiple trait quantitative trait regions or MQR) in dairy cattle were detected using a method based on results from single trait analyses to detect quantitative trait loci (QTL). The covariance between contrasts for different traits in single trait
Sérgio D J Pena
Full Text Available Based on pre-DNA racial/color methodology, clinical and pharmacological trials have traditionally considered the different geographical regions of Brazil as being very heterogeneous. We wished to ascertain how such diversity of regional color categories correlated with ancestry. Using a panel of 40 validated ancestry-informative insertion-deletion DNA polymorphisms we estimated individually the European, African and Amerindian ancestry components of 934 self-categorized White, Brown or Black Brazilians from the four most populous regions of the Country. We unraveled great ancestral diversity between and within the different regions. Especially, color categories in the northern part of Brazil diverged significantly in their ancestry proportions from their counterparts in the southern part of the Country, indicating that diverse regional semantics were being used in the self-classification as White, Brown or Black. To circumvent these regional subjective differences in color perception, we estimated the general ancestry proportions of each of the four regions in a form independent of color considerations. For that, we multiplied the proportions of a given ancestry in a given color category by the official census information about the proportion of that color category in the specific region, to arrive at a "total ancestry" estimate. Once such a calculation was performed, there emerged a much higher level of uniformity than previously expected. In all regions studied, the European ancestry was predominant, with proportions ranging from 60.6% in the Northeast to 77.7% in the South. We propose that the immigration of six million Europeans to Brazil in the 19th and 20th centuries--a phenomenon described and intended as the "whitening of Brazil"--is in large part responsible for dissipating previous ancestry dissimilarities that reflected region-specific population histories. These findings, of both clinical and sociological importance for Brazil
Background The genomic architecture of bud phenology and height growth remains poorly known in most forest trees. In non model species, QTL studies have shown limited application because most often QTL data could not be validated from one experiment to another. The aim of our study was to overcome this limitation by basing QTL detection on the construction of genetic maps highly-enriched in gene markers, and by assessing QTLs across pedigrees, years, and environments. Results Four saturated individual linkage maps representing two unrelated mapping populations of 260 and 500 clonally replicated progeny were assembled from 471 to 570 markers, including from 283 to 451 gene SNPs obtained using a multiplexed genotyping assay. Thence, a composite linkage map was assembled with 836 gene markers. For individual linkage maps, a total of 33 distinct quantitative trait loci (QTLs) were observed for bud flush, 52 for bud set, and 52 for height growth. For the composite map, the corresponding numbers of QTL clusters were 11, 13, and 10. About 20% of QTLs were replicated between the two mapping populations and nearly 50% revealed spatial and/or temporal stability. Three to four occurrences of overlapping QTLs between characters were noted, indicating regions with potential pleiotropic effects. Moreover, some of the genes involved in the QTLs were also underlined by recent genome scans or expression profile studies. Overall, the proportion of phenotypic variance explained by each QTL ranged from 3.0 to 16.4% for bud flush, from 2.7 to 22.2% for bud set, and from 2.5 to 10.5% for height growth. Up to 70% of the total character variance could be accounted for by QTLs for bud flush or bud set, and up to 59% for height growth. Conclusions This study provides a basic understanding of the genomic architecture related to bud flush, bud set, and height growth in a conifer species, and a useful indicator to compare with Angiosperms. It will serve as a basic reference to functional and
Larsen, Jesper; Kuhnert, Peter; Frey, Joachim
subclades, thus reaffirming the hypothesis of vertical inheritance of the leukotoxin operon. The presence of individual 5' flanking regions in M. haemolytica + M. glucosida and M. granulomatis reflects later genome rearrangements within each subclade. The evolution of the novel 5' flanking region in M...
Full Text Available Here we report the draft genome sequence of the acid-tolerant Desulfovibrio sp. DV isolated from the sediments of a Pb-Zn mine tailings dam in the Chita region, Russia. The draft genome has a size of 4.9 Mb and encodes multiple K+-transporters and proton-consuming decarboxylases. The phylogenetic analysis based on concatenated ribosomal proteins revealed that strain DV clusters together with the acid-tolerant Desulfovibrio sp. TomC and Desulfovibrio magneticus. The draft genome sequence and annotation have been deposited at GenBank under the accession number MLBG00000000.
You Frank M
Full Text Available Abstract Background Physical maps employing libraries of bacterial artificial chromosome (BAC clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of the D-genome of hexaploid wheat (Triticum aestivum, Aegilops tauschii, is used as a resource for wheat genomics. The barley diploid genome also provides a good model for the Triticeae and T. aestivum since it is only slightly larger than the ancestor wheat D genome. Gene co-linearity between the grasses can be exploited by extrapolating from rice and Brachypodium distachyon to Ae. tauschii or barley, and then to wheat. Results We report the use of Ae. tauschii for the construction of the physical map of a large distal region of chromosome arm 3DS. A physical map of 25.4 Mb was constructed by anchoring BAC clones of Ae. tauschii with 85 EST on the Ae. tauschii and barley genetic maps. The 24 contigs were aligned to the rice and B. distachyon genomic sequences and a high density SNP genetic map of barley. As expected, the mapped region is highly collinear to the orthologous chromosome 1 in rice, chromosome 2 in B. distachyon and chromosome 3H in barley. However, the chromosome scale of the comparative maps presented provides new insights into grass genome organization. The disruptions of the Ae. tauschii-rice and Ae. tauschii-Brachypodium syntenies were identical. We observed chromosomal rearrangements between Ae. tauschii and barley. The comparison of Ae. tauschii physical and genetic maps showed that the recombination rate across the region dropped from 2.19 cM/Mb in the distal region to 0.09 cM/Mb in the proximal region. The size of the gaps between contigs was evaluated by comparing the recombination rate along the map with the local recombination rates calculated on single contigs. Conclusions The physical map reported here is the first physical map using fingerprinting of a complete
Xu, Pei; Wu, Xinyi; Muñoz-Amatriaín, María; Wang, Baogen; Wu, Xiaohua; Hu, Yaowen; Huynh, Bao-Lam; Close, Timothy J; Roberts, Philip A; Zhou, Wen; Lu, Zhongfu; Li, Guojing
Cowpea (V. unguiculata L. Walp) is a climate resilient legume crop important for food security. Cultivated cowpea (V. unguiculata L) generally comprises the bushy, short-podded grain cowpea dominant in Africa and the climbing, long-podded vegetable cowpea popular in Asia. How selection has contributed to the diversification of the two types of cowpea remains largely unknown. In the current study, a novel genotyping assay for over 50 000 SNPs was employed to delineate genomic regions governing pod length. Major, minor and epistatic QTLs were identified through QTL mapping. Seventy-two SNPs associated with pod length were detected by genome-wide association studies (GWAS). Population stratification analysis revealed subdivision among a cowpea germplasm collection consisting of 299 accessions, which is consistent with pod length groups. Genomic scan for selective signals suggested that domestication of vegetable cowpea was accompanied by selection of multiple traits including pod length, while the further improvement process was featured by selection of pod length primarily. Pod growth kinetics assay demonstrated that more durable cell proliferation rather than cell elongation or enlargement was the main reason for longer pods. Transcriptomic analysis suggested the involvement of sugar, gibberellin and nutritional signalling in regulation of pod length. This study establishes the basis for map-based cloning of pod length genes in cowpea and for marker-assisted selection of this trait in breeding programmes. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Argov, Tal; Rabinovich, Lev; Sigal, Nadejda; Herskovits, Anat A
Construction of Listeria monocytogenes mutants by allelic exchange has been laborious and time-consuming due to lack of proficient selection markers for the final recombination event, that is, a marker conveying substance sensitivity to the bacteria bearing it, enabling the exclusion of merodiploids and selection for plasmid loss. In order to address this issue, we engineered a counterselection marker based on a mutated phenylalanyl-tRNA synthetase gene ( pheS* ). This mutation renders the phenylalanine-binding site of the enzyme more promiscuous and allows the binding of the toxic p -chloro-phenylalanine analog ( p -Cl-phe) as a substrate. When pheS* is introduced into L. monocytogenes and highly expressed under control of a constitutively active promoter, the bacteria become sensitive to p -Cl-phe supplemented in the medium. This enabled us to utilize pheS* as a negative selection marker and generate a novel, efficient suicide vector for allelic exchange in L. monocytogenes We used this vector to investigate the monocin genomic region in L. monocytogenes strain 10403S by constructing deletion mutants of the region. We have found this region to be active and to cause bacterial lysis upon mitomycin C treatment. The future applications of such an effective counterselection system, which does not require any background genomic alterations, are vast, as it can be modularly used in various selection systems (e.g., genetic screens). We expect this counterselection marker to be a valuable genetic tool in research on L. monocytogenes IMPORTANCE L. monocytogenes is an opportunistic intracellular pathogen and a widely studied model organism. An efficient counterselection marker is a long-standing need in Listeria research for improving the ability to design and perform various genetic manipulations and screening systems for different purposes. We report the construction and utilization of an efficient suicide vector for allelic exchange which can be conjugated, leaves no
Full Text Available This study evaluated Mn concentration in the seeds of 120 RILs of lentil developed from the cross “CDC Redberry” × “ILL7502”. Micronutrient analysis using atomic absorption spectrometry indicated mean seed manganese (Mn concentrations ranging from 8.5 to 26.8 mg/kg, based on replicated field trials grown at three locations in Turkey in 2012 and 2013. A linkage map of lentil was constructed and consisted of seven linkage groups with 5,385 DNA markers. The total map length was 973.1 cM, with an average distance between markers of 0.18 cM. A total of 6 QTL for Mn concentration were identified using composite interval mapping (CIM. All QTL were statistically significant and explained 15.3–24.1% of the phenotypic variation, with LOD scores ranging from 3.00 to 4.42. The high-density genetic map reported in this study will increase fundamental knowledge of the genome structure of lentil, and will be the basis for the development of micronutrient-enriched lentil genotypes to support biofortification efforts.
Ates, Duygu; Aldemir, Secil; Yagmur, Bulent; Kahraman, Abdullah; Ozkan, Hakan; Vandenberg, Albert; Tanyolac, Muhammed Bahattin
This study evaluated Mn concentration in the seeds of 120 RILs of lentil developed from the cross "CDC Redberry" × "ILL7502". Micronutrient analysis using atomic absorption spectrometry indicated mean seed manganese (Mn) concentrations ranging from 8.5 to 26.8 mg/kg, based on replicated field trials grown at three locations in Turkey in 2012 and 2013. A linkage map of lentil was constructed and consisted of seven linkage groups with 5,385 DNA markers. The total map length was 973.1 cM, with an average distance between markers of 0.18 cM. A total of 6 QTL for Mn concentration were identified using composite interval mapping (CIM). All QTL were statistically significant and explained 15.3-24.1% of the phenotypic variation, with LOD scores ranging from 3.00 to 4.42. The high-density genetic map reported in this study will increase fundamental knowledge of the genome structure of lentil, and will be the basis for the development of micronutrient-enriched lentil genotypes to support biofortification efforts. Copyright © 2018 Ates et al.
Zhu, Xuefeng; Wirén, Marianna; Sinha, Indranil
Mediator exists in a free form containing the Med12, Med13, CDK8, and CycC subunits (the Srb8-11 module) and a smaller form, which lacks these four subunits and associates with RNA polymerase II (Pol II), forming a holoenzyme. We use chromatin immunoprecipitation (ChIP) and DNA microarrays...... to investigate genome-wide localization of Mediator and the Srb8-11 module in fission yeast. Mediator and the Srb8-11 module display similar binding patterns, and interactions with promoters and upstream activating sequences correlate with increased transcription activity. Unexpectedly, Mediator also interacts...... with the downstream coding region of many genes. These interactions display a negative bias for positions closer to the 5' ends of open reading frames (ORFs) and appear functionally important, because downregulation of transcription in a temperature-sensitive med17 mutant strain correlates with increased Mediator...
Full Text Available Abstract Background Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions,, the genome has been sequenced by whole genome shotgun sequence (WGS. Here the aim was to use BAC end sequences (BES derived from three minimum tile paths (MTP to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results Results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs were single nucleotide polymorphisms (SNPs; 89% and single nucleotide indels (SNIs 10%. Larger indels were rare but present (1%. Simulations that had predicted fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion The use of HSVs like SNPs and SNIs to characterize BACs wil improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de
Lidsky, A.S.; Law, M.L.; Morse, H.G.; Kao, F.T.; Rabin, M.; Ruddle, F.H.; Woo, S.L.C.
Phenylketonuria (PKU) is an autosomal recessive disorder of amino acid metabolism caused by a deficiency of the hepatic enzyme phenylalanine hydroxylase. To define the regional map position of the disease locus and the PAH gene on human chromosome 12, DNA was isolated from human-hamster somatic cell hybrids with various deletions of human chromosome 12 and was analyzed by Southern blot analysis using the human cDNA PAH clone as a hybridization probe. From these results, together with detailed biochemical and cytogenetic characterization of the hybrid cells, the region on chromosome 12 containing the human PAH gene has been defined as 12q14.3..-->..qter. The PAH map position on chromosome 12 was further localized by in situ hybridization of /sup 125/I-labeled human PAH cDNA to chromosomes prepared from a human lymphoblastoid cell line. Results of these experiments demonstrated that the region on chromosome 12 containing the PAH gene and the PKU locus in man is 12q22..-->..12q24.1. These results not only provide a regionalized map position for a major human disease locus but also can serve as a reference point for linkage analysis with other DNA markers on human chromosome 12.
Lidsky, A.S.; Law, M.L.; Morse, H.G.; Kao, F.T.; Rabin, M.; Ruddle, F.H.; Woo, S.L.C.
Phenylketonuria (PKU) is an autosomal recessive disorder of amino acid metabolism caused by a deficiency of the hepatic enzyme phenylalanine hydroxylase. To define the regional map position of the disease locus and the PAH gene on human chromosome 12, DNA was isolated from human-hamster somatic cell hybrids with various deletions of human chromosome 12 and was analyzed by Southern blot analysis using the human cDNA PAH clone as a hybridization probe. From these results, together with detailed biochemical and cytogenetic characterization of the hybrid cells, the region on chromosome 12 containing the human PAH gene has been defined as 12q14.3→qter. The PAH map position on chromosome 12 was further localized by in situ hybridization of 125 I-labeled human PAH cDNA to chromosomes prepared from a human lymphoblastoid cell line. Results of these experiments demonstrated that the region on chromosome 12 containing the PAH gene and the PKU locus in man is 12q22→12q24.1. These results not only provide a regionalized map position for a major human disease locus but also can serve as a reference point for linkage analysis with other DNA markers on human chromosome 12
Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas
The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse br...
Bonnema, G.; Berg, van den P.; Lindhout, P.
A set of three tomato chromosome 7 introgression lines (ILs) containing overlapping segments of Lycopersicon pennellii DNA was screened with a set of 10 EcoRI–MseI and 10 PstI–MseI AFLP primer combinations. A large number of markers were identified that mapped to one of the four regions of
Chen, Zhi-Teng; Du, Yu-Zhou
The complete mitochondrial genome (mitogenome) of Nemoura nankinensis (Plecoptera: Nemouridae) was sequenced as the first reported mitogenome from the family Nemouridae. The N. nankinensis mitogenome was the longest (16,602 bp) among reported plecopteran mitogenomes, and it contains 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. Most PCGs used standard ATN as start codons, and TAN as termination codons. All tRNA genes of N. nankinensis could fold into the cloverleaf secondary structures except for trnSer ( AGN ), whose dihydrouridine (DHU) arm was reduced to a small loop. There was also a large non-coding region (control region, CR) in the N. nankinensis mitogenome. The 1751 bp CR was the longest and had the highest A+T content (81.8%) among stoneflies. A large tandem repeat region, five potential stem-loop (SL) structures, four tRNA-like structures and four conserved sequence blocks (CSBs) were detected in the elongated CR. The presence of these tRNA-like structures in the CR has never been reported in other plecopteran mitogenomes. These novel features of the elongated CR in N. nankinensis may have functions associated with the process of replication and transcription. Finally, phylogenetic reconstruction suggested that Nemouridae was the sister-group of Capniidae.
Rossin, Elizabeth J.; Lage, Kasper; Raychaudhuri, Soumya; Xavier, Ramnik J.; Tatar, Diana; Benita, Yair
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in
Oshima, Masao; Kikuchi, Rie; Imamura, Jun; Handa, Hirokazu
CMS (cytoplasmic male sterile) rapeseed is produced by asymmetrical somatic cell fusion between the Brassica napus cv. Westar and the Raphanus sativus Kosena CMS line (Kosena radish). The CMS rapeseed contains a CMS gene, orf125, which is derived from Kosena radish. Our sequence analyses revealed that the orf125 region in CMS rapeseed originated from recombination between the orf125/orfB region and the nad1C/ccmFN1 region by way of a 63 bp repeat. A precise sequence comparison among the related sequences in CMS rapeseed, Kosena radish and normal rapeseed showed that the orf125 region in CMS rapeseed consisted of the Kosena orf125/orfB region and the rapeseed nad1C/ccmFN1 region, even though Kosena radish had both the orf125/orfB region and the nad1C/ccmFN1 region in its mitochondrial genome. We also identified three tandem repeat sequences in the regions surrounding orf125, including a 63 bp repeat, which were involved in several recombination events. Interestingly, differences in the recombination activity for each repeat sequence were observed, even though these sequences were located adjacent to each other in the mitochondrial genome. We report results indicating that recombination events within the mitochondrial genomes are regulated at the level of specific repeat sequences depending on the cellular environment.
Lee, K-E; Lee, E-J; Park, H-S
Recent advances in computational epigenetics have provided new opportunities to evaluate n-gram probabilistic language models. In this paper, we describe a systematic genome-wide approach for predicting functional roles in inactive chromatin regions by using a sequence-based Markovian chromatin map of the human genome. We demonstrate that Markov chains of sequences can be used as a precursor to predict functional roles in heterochromatin regions and provide an example comparing two publicly available chromatin annotations of large-scale epigenomics projects: ENCODE project consortium and Roadmap Epigenomics consortium.
Avian papillomaviruses: the parrot Psittacus erithacus papillomavirus (PePV genome has a unique organization of the early protein region and is phylogenetically related to the chaffinch papillomavirus
Jenson A Bennett
Full Text Available Abstract Background An avian papillomavirus genome has been cloned from a cutaneous exophytic papilloma from an African grey parrot (Psittacus erithacus. The nucleotide sequence, genome organization, and phylogenetic position of the Psittacus erithacus papillomavirus (PePV were determined. This PePV sequence represents the first complete avian papillomavirus genome defined. Results The PePV genome (7304 basepairs differs from other papillomaviruses, in that it has a unique organization of the early protein region lacking classical E6 and E7 open reading frames. Phylogenetic comparison of the PePV sequence with partial E1 and L1 sequences of the chaffinch (Fringilla coelebs papillomavirus (FPV reveals that these two avian papillomaviruses form a monophyletic cluster with a common branch that originates near the unresolved center of the papillomavirus evolutionary tree. Conclusions The PePV genome has a unique layout of the early protein region which represents a novel prototypic genomic organization for avian papillomaviruses. The close relationship between PePV and FPV, and between their Psittaciformes and Passeriformes hosts, supports the hypothesis that papillomaviruses have co-evolved and speciated together with their host species throughout evolution.
Background Discerning the traits evolving under neutral conditions from those traits evolving rapidly because of various selection pressures is a great challenge. We propose a new method, composite selection signals (CSS), which unifies the multiple pieces of selection evidence from the rank distribution of its diverse constituent tests. The extreme CSS scores capture highly differentiated loci and underlying common variants hauling excess haplotype homozygosity in the samples of a target population. Results The data on high-density genotypes were analyzed for evidence of an association with either polledness or double muscling in various cohorts of cattle and sheep. In cattle, extreme CSS scores were found in the candidate regions on autosome BTA-1 and BTA-2, flanking the POLL locus and MSTN gene, for polledness and double muscling, respectively. In sheep, the regions with extreme scores were localized on autosome OAR-2 harbouring the MSTN gene for double muscling and on OAR-10 harbouring the RXFP2 gene for polledness. In comparison to the constituent tests, there was a partial agreement between the signals at the four candidate loci; however, they consistently identified additional genomic regions harbouring no known genes. Persuasively, our list of all the additional significant CSS regions contains genes that have been successfully implicated to secondary phenotypic diversity among several subpopulations in our data. For example, the method identified a strong selection signature for stature in cattle capturing selective sweeps harbouring UQCC-GDF5 and PLAG1-CHCHD7 gene regions on BTA-13 and BTA-14, respectively. Both gene pairs have been previously associated with height in humans, while PLAG1-CHCHD7 has also been reported for stature in cattle. In the additional analysis, CSS identified significant regions harbouring multiple genes for various traits under selection in European cattle including polledness, adaptation, metabolism, growth rate, stature
Wanzek, Katharina; Schwindt, Eike; Capra, John A.; Paeschke, Katrin
The regulation of replication is essential to preserve genome integrity. Mms1 is part of the E3 ubiquitin ligase complex that is linked to replication fork progression. By identifying Mms1 binding sites genome-wide in Saccharomyces cerevisiae we connected Mms1 function to genome integrity and
Full Text Available VKORC1 (vitamin K epoxide reductase complex subunit 1, 16p11.2 is the main genetic determinant of human response to oral anticoagulants of antivitamin K type (AVK. This gene was recently suggested to be a putative target of positive selection in East Asian populations. In this study, we genotyped the HGDP-CEPH Panel for six VKORC1 SNPs and downloaded chromosome 16 genotypes from the HGDP-CEPH database in order to characterize the geographic distribution of footprints of positive selection within and around this locus. A unique VKORC1 haplotype carrying the promoter mutation associated with AVK sensitivity showed especially high frequencies in all the 17 HGDP-CEPH East Asian population samples. VKORC1 and 24 neighboring genes were found to lie in a 505 kb region of strong linkage disequilibrium in these populations. Patterns of allele frequency differentiation and haplotype structure suggest that this genomic region has been submitted to a near complete selective sweep in all East Asian populations and only in this geographic area. The most extreme scores of the different selection tests are found within a smaller 45 kb region that contains VKORC1 and three other genes (BCKDK, MYST1 (KAT8, and PRSS8 with different functions. Because of the strong linkage disequilibrium, it is not possible to determine if VKORC1 or one of the three other genes is the target of this strong positive selection that could explain present-day differences among human populations in AVK dose requirement. Our results show that the extended region surrounding a presumable single target of positive selection should be analyzed for genetic variation in a wide range of genetically diverse populations in order to account for other neighboring and confounding selective events and the hitchhiking effect.
Stuhlmüller, M.; Schwarz-Finsterle, J.; Fey, E.; Lux, J.; Bach, M.; Cremer, C.; Hinderhofer, K.; Hausmann, M.; Hildenbrand, G.
Trinucleotide repeat expansions (like (CGG)n) of chromatin in the genome of cell nuclei can cause neurological disorders such as for example the Fragile-X syndrome. Until now the mechanisms are not clearly understood as to how these expansions develop during cell proliferation. Therefore in situ investigations of chromatin structures on the nanoscale are required to better understand supra-molecular mechanisms on the single cell level. By super-resolution localization microscopy (Spectral Position Determination Microscopy; SPDM) in combination with nano-probing using COMBO-FISH (COMBinatorial Oligonucleotide FISH), novel insights into the nano-architecture of the genome will become possible. The native spatial structure of trinucleotide repeat expansion genome regions was analysed and optical sequencing of repetitive units was performed within 3D-conserved nuclei using SPDM after COMBO-FISH. We analysed a (CGG)n-expansion region inside the 5' untranslated region of the FMR1 gene. The number of CGG repeats for a full mutation causing the Fragile-X syndrome was found and also verified by Southern blot. The FMR1 promotor region was similarly condensed like a centromeric region whereas the arrangement of the probes labelling the expansion region seemed to indicate a loop-like nano-structure. These results for the first time demonstrate that in situ chromatin structure measurements on the nanoscale are feasible. Due to further methodological progress it will become possible to estimate the state of trinucleotide repeat mutations in detail and to determine the associated chromatin strand structural changes on the single cell level. In general, the application of the described approach to any genome region will lead to new insights into genome nano-architecture and open new avenues for understanding mechanisms and their relevance in the development of heredity diseases.
Kato, N; Ootsuyama, Y; Sekiya, H; Ohkoshi, S; Nakazawa, T; Hijikata, M; Shimotohno, K
The hypervariable region 1 (HVR1) of the putative second envelope glycoprotein (gp70) of hepatitis C virus (HCV) contains a sequence-specific immunological B-cell epitope that induces the production of antibodies restricted to the specific viral isolate, and anti-HVR1 antibodies are involved in the genetic drift of HVR1 driven by immunoselection (N. Kato, H. Sekiya, Y. Ootsuyama, T. Nakazawa, M. Hijikata, S. Ohkoshi, and K. Shimotohno, J. Virol. 67:3923-3930, 1993). We further investigated th...
Su, He-Ling; Huang, Ri-Dong; He, Song-Qing; Xu, Qing; Zhu, Hua; Mo, Zhi-Jing; Liu, Qing-Bo; Liu, Yong-Ming
Brown ducks carrying DHBV were widely used as hepatitis B animal model in the research of the activity and toxicity of anti-HBV dugs. Studies showed that the ratio of DHBV carriers in the brown ducks in Guilin region was relatively high. Nevertheless, the characters of the DHBV genome of Guilin brown duck remain unknown. Here we report the cloning of the genome of Guilin brown duck DHBV and the sequence analysis of the genome. The full length of the DHBV genome of Guilin brown duck was 3 027bp. Analysis using ORF finder found that there was an ORF for an unknown peptide other than S-ORF, PORF and C-ORF in the genome of the DHBV. Vector NTI 8. 0 analysis revealed that the unknown peptide contained a motif which binded to HLA * 0201. Aligning with the DHBV sequences from different countries and regions indicated that there were no obvious differences of regional distribution among the sequences. A fluorescence quantitative PCR for detecting DHBV was establishment based on the recombinant plasmid pGEM-DHBV-S constructed. This study laid the groundwork for using Guilin brown duck as a hepatitis B animal model.
Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L
Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct ® , a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Lagisz, M; Wen, S-Y; Routtu, J; Klappert, K; Mazzi, D; Morales-Hojas, R; Schäfer, M A; Vieira, J; Hoikkala, A; Ritchie, M G; Butlin, R K
Acoustic signals often have a significant role in pair formation and in species recognition. Determining the genetic basis of signal divergence will help to understand signal evolution by sexual selection and its role in the speciation process. An earlier study investigated quantitative trait locus for male courtship song carrier frequency (FRE) in Drosophila montana using microsatellite markers. We refined this study by adding to the linkage map markers for 10 candidate genes known to affect song production in Drosophila melanogaster. We also extended the analyses to additional song characters (pulse train length (PTL), pulse number (PN), interpulse interval, pulse length (PL) and cycle number (CN)). Our results indicate that loci in two different regions of the genome control distinct features of the courtship song. Pulse train traits (PTL and PN) mapped to the X chromosome, showing significant linkage with the period gene. In contrast, characters related to song pulse properties (PL, CN and carrier FRE) mapped to the region of chromosome 2 near the candidate gene fruitless, identifying these genes as suitable loci for further investigations. In previous studies, the pulse train traits have been found to vary substantially between Drosophila species, and so are potential species recognition signals, while the pulse traits may be more important in intra-specific mate choice.
Ishida-Yamamoto, Akemi; Furio, Laetitia; Igawa, Satomi; Honma, Masaru; Tron, Elodie; Malan, Valerie; Murakami, Masamoto; Hovnanian, Alain
Peeling skin syndrome (PSS) type B is a rare recessive genodermatosis characterized by lifelong widespread, reddish peeling of the skin with pruritus. The disease is caused by small-scale mutations in the Corneodesmosin gene (CDSN) leading to premature termination codons. We report for the first time a Japanese case resulting from complete deletion of CDSN. Corneodesmosin was undetectable in the epidermis, and CDSN was unamplifiable by PCR. QMPSF analysis demonstrated deletion of CDSN exons inherited from each parent. Deletion mapping using microsatellite haplotyping, CGH array and PCR analysis established that the genomic deletion spanned 49-72 kb between HCG22 and TCF19, removing CDSN as well as five other genes within the psoriasis susceptibility region 1 (PSORS1) on 6p21.33. This observation widens the spectrum of molecular defects underlying PSS type B and shows that loss of these five genes from the PSORS1 region does not result in an additional cutaneous phenotype. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Hawash, Mohamed B. F.; Andersen, Lee O.; Gasser, Robin B.
Trichuris from françois' leaf monkey, suggesting multiple whipworm species circulating among non-human primates. The genetic and protein distances between pig Trichuris from Denmark and other regions were roughly 9% and 6%, respectively, while Chinese and Ugandan whipworms were more closely related......) suggesting that they represented different species. Trichuris from the olive baboon in US was genetically related to human Trichuris in China, while the other from the hamadryas baboon in Denmark was nearly identical to human Trichuris from Uganda. Baboon-derived Trichuris was genetically distinct from......BACKGROUND: The whipworms Trichuris trichiura and Trichuris suis are two parasitic nematodes of humans and pigs, respectively. Although whipworms in human and non-human primates historically have been referred to as T. trichiura, recent reports suggest that several Trichuris spp. are found...
Snitkin, Evan S; Won, Sarah; Pirani, Ali; Lapp, Zena; Weinstein, Robert A; Lolans, Karen; Hayden, Mary K
Development of effective strategies to limit the proliferation of multidrug-resistant organisms requires a thorough understanding of how such organisms spread among health care facilities. We sought to uncover the chains of transmission underlying a 2008 U.S. regional outbreak of carbapenem-resistant Klebsiella pneumoniae by performing an integrated analysis of genomic and interfacility patient-transfer data. Genomic analysis yielded a high-resolution transmission network that assigned directionality to regional transmission events and discriminated between intra- and interfacility transmission when epidemiologic data were ambiguous or misleading. Examining the genomic transmission network in the context of interfacility patient transfers (patient-sharing networks) supported the role of patient transfers in driving the outbreak, with genomic analysis revealing that a small subset of patient-transfer events was sufficient to explain regional spread. Further integration of the genomic and patient-sharing networks identified one nursing home as an important bridge facility early in the outbreak-a role that was not apparent from analysis of genomic or patient-transfer data alone. Last, we found that when simulating a real-time regional outbreak, our methodology was able to accurately infer the facility at which patients acquired their infections. This approach has the potential to identify facilities with high rates of intra- or interfacility transmission, data that will be useful for triggering targeted interventions to prevent further spread of multidrug-resistant organisms. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Muino, J.M.; Kaufmann, K.; Ham, van R.C.H.J.; Angenent, G.C.; Krajewski, P.
Background In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally
Tran Thi Tuyet, H.; Zwart, M.P.; Phuong, N.T.; Oanh, D.T.H.; Jong, de M.C.M.; Vlak, J.M.
Sequence comparisons of the genomes of white spot syndrome virus (WSSV) strains have identified regions containing variable-length insertions/deletions (i.e. indels). Indel-I and Indel-II, positioned between open reading frames (ORFs) 14/15 and 23/24, respectively, are the largest and the most
Full Text Available Legionella pneumophila is an opportunistic pathogen that can cause a severe pneumonia called Legionnaires' disease. In the environment, L. pneumophila is found in fresh water reservoirs in a large spectrum of environmental conditions, where the bacteria are able to replicate within a variety of protozoan hosts. To survive within eukaryotic cells, L. pneumophila require a type IV secretion system, designated Dot/Icm, that delivers bacterial effector proteins into the host cell cytoplasm. In recent years, a number of Dot/Icm substrate proteins have been identified; however, the function of most of these proteins remains unknown, and it is unclear why the bacterium maintains such a large repertoire of effectors to promote its survival. Here we investigate a region of the L. pneumophila chromosome that displays a high degree of plasticity among four sequenced L. pneumophila strains. Analysis of GC content suggests that several genes encoded in this region were acquired through horizontal gene transfer. Protein translocation studies establish that this region of genomic plasticity encodes for multiple Dot/Icm effectors. Ectopic expression studies in mammalian cells indicate that one of these substrates, a protein called PieA, has unique effector activities. PieA is an effector that can alter lysosome morphology and associates specifically with vacuoles that support L. pneumophila replication. It was determined that the association of PieA with vacuoles containing L. pneumophila requires modifications to the vacuole mediated by other Dot/Icm effectors. Thus, the localization properties of PieA reveal that the Dot/Icm system has the ability to spatially and temporally control the association of an effector with vacuoles containing L. pneumophila through activities mediated by other effector proteins.
Sergeeva, Ekaterina M; Shcherban, Andrey B; Adonina, Irina G; Nesterov, Michail A; Beletsky, Alexey V; Rakitin, Andrey L; Mardanov, Andrey V; Ravin, Nikolai V; Salina, Elena A
The multigene family encoding the 5S rRNA, one of the most important structurally-functional part of the large ribosomal subunit, is an obligate component of all eukaryotic genomes. 5S rDNA has long been a favored target for cytological and phylogenetic studies due to the inherent peculiarities of its structural organization, such as the tandem arrays of repetitive units and their high interspecific divergence. The complex polyploid nature of the genome of bread wheat, Triticum aestivum, and the technically difficult task of sequencing clusters of tandem repeats mean that the detailed organization of extended genomic regions containing 5S rRNA genes remains unclear. This is despite the recent progress made in wheat genomic sequencing. Using pyrosequencing of BAC clones, in this work we studied the organization of two distinct 5S rDNA-tagged regions of the 5BS chromosome of bread wheat. Three BAC-clones containing 5S rDNA were identified in the 5BS chromosome-specific BAC-library of Triticum aestivum. Using the results of pyrosequencing and assembling, we obtained six 5S rDNA- containing contigs with a total length of 140,417 bp, and two sets (pools) of individual 5S rDNA sequences belonging to separate, but closely located genomic regions on the 5BS chromosome. Both regions are characterized by the presence of approximately 70-80 copies of 5S rDNA, however, they are completely different in their structural organization. The first region contained highly diverged short-type 5S rDNA units that were disrupted by multiple insertions of transposable elements. The second region contained the more conserved long-type 5S rDNA, organized as a single tandem array. FISH using probes specific to both 5S rDNA unit types showed differences in the distribution and intensity of signals on the chromosomes of polyploid wheat species and their diploid progenitors. A detailed structural organization of two closely located 5S rDNA-tagged genomic regions on the 5BS chromosome of bread
Full Text Available Legumes contain a variety of phytochemicals derived from the phenylpropanoid pathway that have important effects on human health as well as seed coat color, plant disease resistance and nodulation. However, the information about the genes involved in this important pathway is fragmentary in common bean (Phaseolus vulgaris L.. The objectives of this research were to isolate genes that function in and control the phenylpropanoid pathway in common bean, determine their genomic locations in silico in common bean and soybean, and analyze sequences of the 4CL gene family in two common bean genotypes. Sequences of phenylpropanoid pathway genes available for common bean or other plant species were aligned, and the conserved regions were used to design sequence-specific primers. The PCR products were cloned and sequenced and the gene sequences along with common bean gene-based (g markers were BLASTed against the Glycine max v.1.0 genome and the P. vulgaris v.1.0 (Andean early release genome. In addition, gene sequences were BLASTed against the OAC Rex (Mesoamerican genome sequence assembly. In total, fragments of 46 structural and regulatory phenylpropanoid pathway genes were characterized in this way and placed in silico on common bean and soybean sequence maps. The maps contain over 250 common bean g and SSR (simple sequence repeat markers and identify the positions of more than 60 additional phenylpropanoid pathway gene sequences, plus the putative locations of seed coat color genes. The majority of cloned phenylpropanoid pathway gene sequences were mapped to one location in the common bean genome but had two positions in soybean. The comparison of the genomic maps confirmed previous studies, which show that common bean and soybean share genomic regions, including those containing phenylpropanoid pathway gene sequences, with conserved synteny. Indels identified in the comparison of Andean and Mesoamerican common bean sequences might be used to develop
Doekes, Harmen P; Veerkamp, Roel F; Bijma, Piter; Hiemstra, Sipke J; Windig, Jack J
In recent decades, Holstein-Friesian (HF) selection schemes have undergone profound changes, including the introduction of optimal contribution selection (OCS; around 2000), a major shift in breeding goal composition (around 2000) and the implementation of genomic selection (GS; around 2010). These changes are expected to have influenced genetic diversity trends. Our aim was to evaluate genome-wide and region-specific diversity in HF artificial insemination (AI) bulls in the Dutch-Flemish breeding program from 1986 to 2015. Pedigree and genotype data (~ 75.5 k) of 6280 AI-bulls were used to estimate rates of genome-wide inbreeding and kinship and corresponding effective population sizes. Region-specific inbreeding trends were evaluated using regions of homozygosity (ROH). Changes in observed allele frequencies were compared to those expected under pure drift to identify putative regions under selection. We also investigated the direction of changes in allele frequency over time. Effective population size estimates for the 1986-2015 period ranged from 69 to 102. Two major breakpoints were observed in genome-wide inbreeding and kinship trends. Around 2000, inbreeding and kinship levels temporarily dropped. From 2010 onwards, they steeply increased, with pedigree-based, ROH-based and marker-based inbreeding rates as high as 1.8, 2.1 and 2.8% per generation, respectively. Accumulation of inbreeding varied substantially across the genome. A considerable fraction of markers showed changes in allele frequency that were greater than expected under pure drift. Putative selected regions harboured many quantitative trait loci (QTL) associated to a wide range of traits. In consecutive 5-year periods, allele frequencies changed more often in the same direction than in opposite directions, except when comparing the 1996-2000 and 2001-2005 periods. Genome-wide and region-specific diversity trends reflect major changes in the Dutch-Flemish HF breeding program. Introduction of
Luis E. Rojas; Maritza Reyes; Naivy Pérez-Alonso; María I. Olóriz; Laisyn Posada-Pérez; Bárbara Ocaña; Orelvis Portal; Borys Chong-Pérez; Jorge L. Pérez Pérez
The extraction methods of genomic DNA are usually laborious and hazardous to human health and the environment by the use of organic solvents (chloroform and phenol). In this work a protocol for in situ extraction of genomic DNA by alkaline lysis is validated. It was used in order to amplify regions of DNA in four species of plants and fungi by polymerase chain reaction (PCR). From plant material of Saccharum officinarum L., Carica papaya L. and Digitalis purpurea L. it was possible to extend ...
Bienz, K.; Egger, D.; Troxler, M.; Pasamontes, L.
Transcriptionally active replication complexes bound to smooth membrane vesicles were isolated from poliovirus-infected cells. In electron microscopic, negatively stained preparations, the replication complex appeared as an irregularly shaped, oblong structure attached to several virus-induced vesicles of a rosettelike arrangement. Electron microscopic immunocytochemistry of such preparations demonstrated that the poliovirus replication complex contains the proteins coded by the P2 genomic region (P2 proteins) in a membrane-associated form. In addition, the P2 proteins are also associated with viral RNA, and they can be cross-linked to viral RNA by UV irradiation. Guanidine hydrochloride prevented the P2 proteins from becoming membrane bound but did not change their association with viral RNA. The findings allow the conclusion that the protein 2C or 2C-containing precursor(s) is responsible for the attachment of the viral RNA to the vesicular membrane and for the spatial organization of the replication complex necessary for its proper functioning in viral transcription. A model for the structure of the viral replication complex and for the function of the 2C-containing P2 protein(s) and the vesicular membranes is proposed
Liu, Shenglin; Hansen, Michael M; Jacobsen, Magnus W
We analysed 81 whole genome sequences of threespine sticklebacks from Pacific North America, Greenland and Northern Europe, representing 16 populations. Principal component analysis of nuclear SNPs grouped populations according to geographical location, with Pacific populations being more divergent from each other relative to European and Greenlandic populations. Analysis of mitogenome sequences showed Northern European populations to represent a single phylogeographical lineage, whereas Greenlandic and particularly Pacific populations showed admixture between lineages. We estimated demographic history using a genomewide coalescence with recombination approach. The Pacific populations showed gradual population expansion starting >100 Kya, possibly reflecting persistence in cryptic refuges near the present distributional range, although we do not rule out possible influence of ancient admixture. Sharp population declines ca. 14-15 Kya were suggested to reflect founding of freshwater populations by marine ancestors. In Greenland and Northern Europe, demographic expansion started ca. 20-25 Kya coinciding with the end of the Last Glacial Maximum. In both regions, marine and freshwater populations started to show different demographic trajectories ca. 8-9 Kya, suggesting that this was the time of recolonization. In Northern Europe, this estimate was surprisingly late, but found support in subfossil evidence for presence of several freshwater fish species but not sticklebacks 12 Kya. The results demonstrate distinctly different demographic histories across geographical regions with potential consequences for adaptive processes. They also provide empirical support for previous assumptions about freshwater populations being founded independently from large, coherent marine populations, a key element in the Transporter Hypothesis invoked to explain the widespread occurrence of parallel evolution across freshwater stickleback populations. © 2016 John Wiley & Sons Ltd.
Len J Wade
Full Text Available The rapid progress in rice genotyping must be matched by advances in phenotyping. A better understanding of genetic variation in rice for drought response, root traits, and practical methods for studying them are needed. In this study, the OryzaSNP set (20 diverse genotypes that have been genotyped for SNP markers was phenotyped in a range of field and container studies to study the diversity of rice root growth and response to drought. Of the root traits measured across more than 20 root experiments, root dry weight showed the most stable genotypic performance across studies. The environment (E component had the strongest effect on yield and root traits. We identified genomic regions correlated with root dry weight, percent deep roots, maximum root depth, and grain yield based on a correlation analysis with the phenotypes and aus, indica, or japonica introgression regions using the SNP data. Two genomic regions were identified as hot spots in which root traits and grain yield were co-located; on chromosome 1 (39.7-40.7 Mb and on chromosome 8 (20.3-21.9 Mb. Across experiments, the soil type/ growth medium showed more correlations with plant growth than the container dimensions. Although the correlations among studies and genetic co-location of root traits from a range of study systems points to their potential utility to represent responses in field studies, the best correlations were observed when the two setups had some similar properties. Due to the co-location of the identified genomic regions (from introgression block analysis with QTL for a number of previously reported root and drought traits, these regions are good candidates for detailed characterization to contribute to understanding rice improvement for response to drought. This study also highlights the utility of characterizing a small set of 20 genotypes for root growth, drought response, and related genomic regions.
Full Text Available Abstract Background Mouse limb bud is a prime model to study the regulatory interactions that control vertebrate organogenesis. Major aspects of limb bud development are controlled by feedback loops that define a self-regulatory signalling system. The SHH/GREM1/AER-FGF feedback loop forms the core of this signalling system that operates between the posterior mesenchymal organiser and the ectodermal signalling centre. The BMP antagonist Gremlin1 (GREM1 is a critical node in this system, whose dynamic expression is controlled by BMP, SHH, and FGF signalling and key to normal progression of limb bud development. Previous analysis identified a distant cis-regulatory landscape within the neighbouring Formin1 (Fmn1 locus that is required for Grem1 expression, reminiscent of the genomic landscapes controlling HoxD and Shh expression in limb buds. Results Three highly conserved regions (HMCO1-3 were identified within the previously defined critical genomic region and tested for their ability to regulate Grem1 expression in mouse limb buds. Using a combination of BAC and conventional transgenic approaches, a 9 kb region located ~70 kb downstream of the Grem1 transcription unit was identified. This region, termed Grem1 Regulatory Sequence 1 (GRS1, is able to recapitulate major aspects of Grem1 expression, as it drives expression of a LacZ reporter into the posterior and, to a lesser extent, in the distal-anterior mesenchyme. Crossing the GRS1 transgene into embryos with alterations in the SHH and BMP pathways established that GRS1 depends on SHH and is modulated by BMP signalling, i.e. integrates inputs from these pathways. Chromatin immunoprecipitation revealed interaction of endogenous GLI3 proteins with the core cis-regulatory elements in the GRS1 region. As GLI3 is a mediator of SHH signal transduction, these results indicated that SHH directly controls Grem1 expression through the GRS1 region. Finally, all cis-regulatory regions within the Grem1
Tabas-Madrid, Daniel; Méndez-Vigo, Belén; Arteaga, Noelia; Marcer, Arnald; Pascual-Montano, Alberto; Weigel, Detlef; Xavier Picó, F; Alonso-Blanco, Carlos
Current global change is fueling an interest to understand the genetic and molecular mechanisms of plant adaptation to climate. In particular, altered flowering time is a common strategy for escape from unfavourable climate temperature. In order to determine the genomic bases underlying flowering time adaptation to this climatic factor, we have systematically analysed a collection of 174 highly diverse Arabidopsis thaliana accessions from the Iberian Peninsula. Analyses of 1.88 million single nucleotide polymorphisms provide evidence for a spatially heterogeneous contribution of demographic and adaptive processes to geographic patterns of genetic variation. Mountains appear to be allele dispersal barriers, whereas the relationship between flowering time and temperature depended on the precise temperature range. Environmental genome-wide associations supported an overall genome adaptation to temperature, with 9.4% of the genes showing significant associations. Furthermore, phenotypic genome-wide associations provided a catalogue of candidate genes underlying flowering time variation. Finally, comparison of environmental and phenotypic genome-wide associations identified known (Twin Sister of FT, FRIGIDA-like 1, and Casein Kinase II Beta chain 1) and new (Epithiospecifer Modifier 1 and Voltage-Dependent Anion Channel 5) genes as candidates for adaptation to climate temperature by altered flowering time. Thus, this regional collection provides an excellent resource to address the spatial complexity of climate adaptation in annual plants. © 2018 John Wiley & Sons Ltd.
Full Text Available Considerable progress has been made in the HCV evolutionary analysis, since the software BEAST was released. However, prior information, especially the prior evolutionary rate, which plays a critical role in BEAST analysis, is always difficult to ascertain due to various uncertainties. Providing a proper prior HCV evolutionary rate is thus of great importance.176 full-length sequences of HCV subtype 1a and 144 of 1b were assembled by taking into consideration the balance of the sampling dates and the even dispersion in phylogenetic trees. According to the HCV genomic organization and biological functions, each dataset was partitioned into nine genomic regions and two routinely amplified regions. A uniform prior rate was applied to the BEAST analysis for each region and also the entire ORF. All the obtained posterior rates for 1a are of a magnitude of 10(-3 substitutions/site/year and in a bell-shaped distribution. Significantly lower rates were estimated for 1b and some of the rate distribution curves resulted in a one-sided truncation, particularly under the exponential model. This indicates that some of the rates for subtype 1b are less accurate, so they were adjusted by including more sequences to improve the temporal structure.Among the various HCV subtypes and genomic regions, the evolutionary patterns are dissimilar. Therefore, an applied estimation of the HCV epidemic history requires the proper selection of the rate priors, which should match the actual dataset so that they can fit for the subtype, the genomic region and even the length. By referencing the findings here, future evolutionary analysis of the HCV subtype 1a and 1b datasets may become more accurate and hence prove useful for tracing their patterns.
Gregory E. Perry
Full Text Available Resistance to common bacterial blight, caused by Xanthomonas axonopodis pv. phaseoli, in Phaseolus vulgaris is conditioned by several loci on different chromosomes. Previous studies with OAC-Rex, a CBB-resistant, white bean variety of Mesoamerican origin, identified two resistance loci associated with the molecular markers Pv-CTT001 and SU91, on chromosome 4 and 8, respectively. Resistance to CBB is assumed to be derived from an interspecific cross with Phaseolus acutifolius in the pedigree of OAC-Rex. Our current whole genome sequencing effort with OAC-Rex provided the opportunity to compare its genome in the regions associated with CBB resistance with the v1.0 release of the P. vulgaris line G19833, which is a large seeded bean of Andean origin, and (assumed to be CBB susceptible.. In addition, the genomic regions containing SAP6, a marker associated with P. vulgaris-derived CBB-resistance on chromosome 10, were compared. These analyses indicated that gene content was highly conserved between G19833 and OAC-Rex across the regions examined (>80%. However, fifty-nine genes unique to OAC Rex were identified, with resistance gene homologues making up the largest category (10 genes identified. Two unique genes in OAC-Rex located within the SU91 resistance QTL have homology to P. acutifolius ESTs and may be potential sources of CBB resistance. As the genomic sequence assembly of OAC-Rex is completed, we expect that further comparisons between it and the G19833 genome will lead to a greater understanding of CBB resistance in bean.
Full Text Available The androgen receptor (AR is a steroid-activated transcription factor that binds at specific DNA locations and plays a key role in the etiology of prostate cancer. While numerous studies have identified a clear connection between AR binding and expression of target genes for a limited number of loci, high-throughput elucidation of these sites allows for a deeper understanding of the complexities of this process.We have mapped 189 AR occupied regions (ARORs and 1,388 histone H3 acetylation (AcH3 loci to a 3% continuous stretch of human genomic DNA using chromatin immunoprecipitation (ChIP microarray analysis. Of 62 highly reproducible ARORs, 32 (52% were also marked by AcH3. While the number of ARORs detected in prostate cancer cells exceeded the number of nearby DHT-responsive genes, the AcH3 mark defined a subclass of ARORs much more highly associated with such genes -- 12% of the genes flanking AcH3+ARORs were DHT-responsive, compared to only 1% of genes flanking AcH3-ARORs. Most ARORs contained enhancer activities as detected in luciferase reporter assays. Analysis of the AROR sequences, followed by site-directed ChIP, identified binding sites for AR transcriptional coregulators FoxA1, CEBPbeta, NFI and GATA2, which had diverse effects on endogenous AR target gene expression levels in siRNA knockout experiments.We suggest that only some ARORs function under the given physiological conditions, utilizing diverse mechanisms. This diversity points to differential regulation of gene expression by the same transcription factor related to the chromatin structure.
Sahoo, Dipak K; Abeysekara, Nilwala S; Cianzio, Silvia R; Robertson, Alison E; Bhattacharyya, Madan K
Phytophthora sojae Kaufmann and Gerdemann, which causes Phytophthora root rot, is a widespread pathogen that limits soybean production worldwide. Development of Phytophthora resistant cultivars carrying Phytophthora resistance Rps genes is a cost-effective approach in controlling this disease. For this mapping study of a novel Rps gene, 290 recombinant inbred lines (RILs) (F7 families) were developed by crossing the P. sojae resistant cultivar PI399036 with the P. sojae susceptible AR2 line, and were phenotyped for responses to a mixture of three P. sojae isolates that overcome most of the known Rps genes. Of these 290 RILs, 130 were homozygous resistant, 12 heterzygous and segregating for Phytophthora resistance, and 148 were recessive homozygous and susceptible. From this population, 59 RILs homozygous for Phytophthora sojae resistance and 61 susceptible to a mixture of P. sojae isolates R17 and Val12-11 or P7074 that overcome resistance encoded by known Rps genes mapped to Chromosome 18 were selected for mapping novel Rps gene. A single gene accounted for the 1:1 segregation of resistance and susceptibility among the RILs. The gene encoding the Phytophthora resistance mapped to a 5.8 cM interval between the SSR markers BARCSOYSSR_18_1840 and Sat_064 located in the lower arm of Chromosome 18. The gene is mapped 2.2 cM proximal to the NBSRps4/6-like sequence that was reported to co-segregate with the Phytophthora resistance genes Rps4 and Rps6. The gene is mapped to a highly recombinogenic, gene-rich genomic region carrying several nucleotide binding site-leucine rich repeat (NBS-LRR)-like genes. We named this novel gene as Rps12, which is expected to be an invaluable resource in breeding soybeans for Phytophthora resistance.
Dipak K Sahoo
Full Text Available Phytophthora sojae Kaufmann and Gerdemann, which causes Phytophthora root rot, is a widespread pathogen that limits soybean production worldwide. Development of Phytophthora resistant cultivars carrying Phytophthora resistance Rps genes is a cost-effective approach in controlling this disease. For this mapping study of a novel Rps gene, 290 recombinant inbred lines (RILs (F7 families were developed by crossing the P. sojae resistant cultivar PI399036 with the P. sojae susceptible AR2 line, and were phenotyped for responses to a mixture of three P. sojae isolates that overcome most of the known Rps genes. Of these 290 RILs, 130 were homozygous resistant, 12 heterzygous and segregating for Phytophthora resistance, and 148 were recessive homozygous and susceptible. From this population, 59 RILs homozygous for Phytophthora sojae resistance and 61 susceptible to a mixture of P. sojae isolates R17 and Val12-11 or P7074 that overcome resistance encoded by known Rps genes mapped to Chromosome 18 were selected for mapping novel Rps gene. A single gene accounted for the 1:1 segregation of resistance and susceptibility among the RILs. The gene encoding the Phytophthora resistance mapped to a 5.8 cM interval between the SSR markers BARCSOYSSR_18_1840 and Sat_064 located in the lower arm of Chromosome 18. The gene is mapped 2.2 cM proximal to the NBSRps4/6-like sequence that was reported to co-segregate with the Phytophthora resistance genes Rps4 and Rps6. The gene is mapped to a highly recombinogenic, gene-rich genomic region carrying several nucleotide binding site-leucine rich repeat (NBS-LRR-like genes. We named this novel gene as Rps12, which is expected to be an invaluable resource in breeding soybeans for Phytophthora resistance.
Full Text Available Complete mitochondrial (mt genome sequences with duplicate control regions (CRs have been detected in various animal species. In Testudines, duplicate mtCRs have been reported in the mtDNA of the Asian big-headed turtle, Platysternon megacephalum, which has three living subspecies. However, the evolutionary pattern of these CRs remains unclear. In this study, we report the completed sequences of duplicate CRs from 20 individuals belonging to three subspecies of this turtle and discuss the micro-evolutionary analysis of the evolution of duplicate CRs. Genetic distances calculated with MEGA 4.1 using the complete duplicate CR sequences revealed that within turtle subspecies, genetic distances between orthologous copies from different individuals were 0.63% for CR1 and 1.2% for CR2app:addword:respectively, and the average distance between paralogous copies of CR1 and CR2 was 4.8%. Phylogenetic relationships were reconstructed from the CR sequences, excluding the variable number of tandem repeats (VNTRs at the 3' end using three methods: neighbor-joining, maximum likelihood algorithm, and Bayesian inference. These data show that any two CRs within individuals were more genetically distant from orthologous genes in different individuals within the same subspecies. This suggests independent evolution of the two mtCRs within each P. megacephalum subspecies. Reconstruction of separate phylogenetic trees using different CR components (TAS, CD, CSB, and VNTRs suggested the role of recombination in the evolution of duplicate CRs. Consequently, recombination events were detected using RDP software with break points at ≈290 bp and ≈1,080 bp. Based on these results, we hypothesize that duplicate CRs in P. megacephalum originated from heterological ancestral recombination of mtDNA. Subsequent recombination could have resulted in homogenization during independent evolutionary events, thus maintaining the functions of duplicate CRs in the mtDNA of P
Hejnová, Jana; Pages, Delphine; Rusniok, Ch.; Glaser, P.; Šebo, Peter; Buchrieser, C.
Roč. 296, - (2006), s. 541-546 ISSN 1438-4221 Institutional research plan: CEZ:AV0Z50200510 Keywords : escherichia coli * commensal * genome comparison Subject RIV: EE - Microbiology, Virology Impact factor: 2.760, year: 2006
Urantowka, Adam Dawid; Hajduk, Kacper; Kosowska, Barbara
Amazona barbadensis is an endangered species of parrot living in northern coastal Venezuela and in several Caribbean islands. In this study, we sequenced full mitochondrial genome of the considered species. The total length of the mitogenome was 18,983 bp and contained 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, duplicated control region, and degenerate copies of ND6 and tRNA (Glu) genes. High degree of identity between two copies of control region suggests their coincident evolution and functionality. Comparative analysis of both the control region sequences from four Amazona species revealed their 89.1% identity over a region of 1300 bp and indicates the presence of distinctive parts of two control region copies.
Eoche-Bosy, D; Gautier, M; Esquibet, M; Legeai, F; Bretaudeau, A; Bouchez, O; Fournet, S; Grenier, E; Montarry, J
Improving resistance durability involves to be able to predict the adaptation speed of pathogen populations. Identifying the genetic bases of pathogen adaptation to plant resistances is a useful step to better understand and anticipate this phenomenon. Globodera pallida is a major pest of potato crop for which a resistance QTL, GpaV vrn , has been identified in Solanum vernei. However, its durability is threatened as G. pallida populations are able to adapt to the resistance in few generations. The aim of this study was to investigate the genomic regions involved in the resistance breakdown by coupling experimental evolution and high-density genome scan. We performed a whole-genome resequencing of pools of individuals (Pool-Seq) belonging to G. pallida lineages derived from two independent populations having experimentally evolved on susceptible and resistant potato cultivars. About 1.6 million SNPs were used to perform the genome scan using a recent model testing for adaptive differentiation and association to population-specific covariables. We identified 275 outliers and 31 of them, which also showed a significant reduction in diversity in adapted lineages, were investigated for their genic environment. Some candidate genomic regions contained genes putatively encoding effectors and were enriched in SPRYSECs, known in cyst nematodes to be involved in pathogenicity and in (a)virulence. Validated candidate SNPs will provide a useful molecular tool to follow frequencies of virulence alleles in natural G. pallida populations and define efficient strategies of use of potato resistances maximizing their durability. © 2017 John Wiley & Sons Ltd.
Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
Full Text Available Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads. This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads. Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
Full Text Available The migration of maize from tropical to temperate climates was accompanied by a dramatic evolution in flowering time. To gain insight into the genetic architecture of this adaptive trait, we conducted a 50K SNP-based genome-wide association and diversity investigation on a panel of tropical and temperate American and European representatives. Eighteen genomic regions were associated with flowering time. The number of early alleles cumulated along these regions was highly correlated with flowering time. Polymorphism in the vicinity of the ZCN8 gene, which is the closest maize homologue to Arabidopsis major flowering time (FT gene, had the strongest effect. This polymorphism is in the vicinity of the causal factor of Vgt2 QTL. Diversity was lower, whereas differentiation and LD were higher for associated loci compared to the rest of the genome, which is consistent with selection acting on flowering time during maize migration. Selection tests also revealed supplementary loci that were highly differentiated among groups and not associated with flowering time in our panel, whereas they were in other linkage-based studies. This suggests that allele fixation led to a lack of statistical power when structure and relatedness were taken into account in a linear mixed model. Complementary designs and analysis methods are necessary to unravel the architecture of complex traits. Based on linkage disequilibrium (LD estimates corrected for population structure, we concluded that the number of SNPs genotyped should be at least doubled to capture all QTLs contributing to the genetic architecture of polygenic traits in this panel. These results show that maize flowering time is controlled by numerous QTLs of small additive effect and that strong polygenic selection occurred under cool climatic conditions. They should contribute to more efficient genomic predictions of flowering time and facilitate the dissemination of diverse maize genetic resources under a wide
Pandey, Manish K; Khan, Aamir W; Singh, Vikas K; Vishwakarma, Manish K; Shasidhar, Yaduru; Kumar, Vinay; Garg, Vanika; Bhat, Ramesh S; Chitikineni, Annapurna; Janila, Pasupuleti; Guo, Baozhu; Varshney, Rajeev K
Rust and late leaf spot (LLS) are the two major foliar fungal diseases in groundnut, and their co-occurrence leads to significant yield loss in addition to the deterioration of fodder quality. To identify candidate genomic regions controlling resistance to rust and LLS, whole-genome resequencing (WGRS)-based approach referred as 'QTL-seq' was deployed. A total of 231.67 Gb raw and 192.10 Gb of clean sequence data were generated through WGRS of resistant parent and the resistant and susceptible bulks for rust and LLS. Sequence analysis of bulks for rust and LLS with reference-guided resistant parent assembly identified 3136 single-nucleotide polymorphisms (SNPs) for rust and 66 SNPs for LLS with the read depth of ≥7 in the identified genomic region on pseudomolecule A03. Detailed analysis identified 30 nonsynonymous SNPs affecting 25 candidate genes for rust resistance, while 14 intronic and three synonymous SNPs affecting nine candidate genes for LLS resistance. Subsequently, allele-specific diagnostic markers were identified for three SNPs for rust resistance and one SNP for LLS resistance. Genotyping of one RIL population (TAG 24 × GPBD 4) with these four diagnostic markers revealed higher phenotypic variation for these two diseases. These results suggest usefulness of QTL-seq approach in precise and rapid identification of candidate genomic regions and development of diagnostic markers for breeding applications. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
TOZAKI, Teruaki; KIKUCHI, Mio; KAKOI, Hironaga; HIROTA, Kei-ichi; NAGATA, Shun-ichi
ABSTRACT Body weight is an important trait to confirm growth and development in humans and animals. In Thoroughbred racehorses, it is measured in the postnatal, training, and racing periods to evaluate growth and training degrees. The body weight of mature Thoroughbred racehorses generally ranges from 400 to 600 kg, and this broad range is likely influenced by environmental and genetic factors. Therefore, a genome-wide association study (GWAS) using the Equine SNP70 BeadChip was performed to identify the genomic regions associated with body weight in Japanese Thoroughbred racehorses using 851 individuals. The average body weight of these horses was 473.9 kg (standard deviation: 28.0) at the age of 3, and GWAS identified statistically significant SNPs on chromosomes 3 (BIEC2_808466, P=2.32E-14), 9 (BIEC2_1105503, P=1.03E-7), 15 (BIEC2_322669, P=9.50E-6), and 18 (BIEC2_417274, P=1.44E-14), which were associated with body weight as a quantitative trait. The genomic regions on chromosomes 3, 9, 15, and 18 included ligand-dependent nuclear receptor compressor-like protein (LCORL), zinc finger and AT hook domain containing (ZFAT), tribbles pseudokinase 2 (TRIB2), and myostatin (MSTN), respectively, as candidate genes. LCORL and ZFAT are associated with withers height in horses, whereas MSTN affects muscle mass. Thus, the genomic regions identified in this study seem to affect the body weight of Thoroughbred racehorses. Although this information is useful for breeding and growth management of the horses, the production of genetically modified animals and gene doping (abuse/misuse of gene therapy) should be prohibited to maintain horse racing integrity. PMID:29270069
Vrljicak, Pavle; Tao, Shijie; Varshney, Gaurav K; Quach, Helen Ngoc Bao; Joshi, Adita; LaFave, Matthew C; Burgess, Shawn M; Sampath, Karuna
DNA transposons and retroviruses are important transgenic tools for genome engineering. An important consideration affecting the choice of transgenic vector is their insertion site preferences. Previous large-scale analyses of Ds transposon integration sites in plants were done on the basis of reporter gene expression or germ-line transmission, making it difficult to discern vertebrate integration preferences. Here, we compare over 1300 Ds transposon integration sites in zebrafish with Tol2 transposon and retroviral integration sites. Genome-wide analysis shows that Ds integration sites in the presence or absence of marker selection are remarkably similar and distributed throughout the genome. No strict motif was found, but a preference for structural features in the target DNA associated with DNA flexibility (Twist, Tilt, Rise, Roll, Shift, and Slide) was observed. Remarkably, this feature is also found in transposon and retroviral integrations in maize and mouse cells. Our findings show that structural features influence the integration of heterologous DNA in genomes, and have implications for targeted genome engineering. Copyright © 2016 Vrljicak et al.
J Stephen Dumler
Full Text Available Anaplasma phagocytophilum, an obligate intracellular prokaryote, infects neutrophils and alters cardinal functions via reprogrammed transcription. Large contiguous regions of neutrophil chromosomes are differentially expressed during infection. Secreted A. phagocytophilum effector AnkA transits into the neutrophil or granulocyte nucleus to complex with DNA in heterochromatin across all chromosomes. AnkA binds to gene promoters to dampen cis-transcription and also has features of matrix attachment region (MAR-binding proteins that regulate three-dimensional chromatin architecture and coordinate transcriptional programs encoded in topologically-associated chromatin domains. We hypothesize that identification of additional AnkA binding sites will better delineate how A. phagocytophilum infection results in reprogramming of the neutrophil genome. Using AnkA-binding ChIP-seq, we showed that AnkA binds broadly throughout all chromosomes in a reproducible pattern, especially at: i intergenic regions predicted to be matrix attachment regions (MARs; ii within predicted lamina-associated domains; and iii at promoters ≤3,000 bp upstream of transcriptional start sites. These findings provide genome-wide support for AnkA as a regulator of cis-gene transcription. Moreover, the dominant mark of AnkA in distal intergenic regions known to be AT-enriched, coupled with frequent enrichment in the nuclear lamina, provides strong support for its role as a MAR-binding protein and genome re-organizer. AnkA must be considered a prime candidate to promote neutrophil reprogramming and subsequent functional changes that belie improved microbial fitness and pathogenicity.
Full Text Available The genus Helicobacter is a group of Gram-negative, helical-shaped pathogens consisting of at least 36 bacterial species. Helicobacter pylori (H. pylori, infecting more than 50% of the human population, is considered as the major cause of gastritis, peptic ulcer, and gastric cancer. However, the genetic underpinnings of H. pylori that are responsible for its large scale epidemic and gastrointestinal environment adaption within human beings remain unclear. Core-pan genome analysis was performed among 75 representative H. pylori and 24 non-pylori Helicobacter genomes. There were 1173 conserved protein families of H. pylori and 673 of all 99 Helicobacter genus strains. We found 79 genome unique regions, a total of 202,359bp, shared by at least 80% of the H. pylori but lacked in non-pylori Helicobacter species. The operons, genes, and sRNAs within the H. pylori unique regions were considered as potential ones associated with its pathogenicity and adaptability, and the relativity among them has been partially confirmed by functional annotation analysis. However, functions of at least 54 genes and 10 sRNAs were still unclear. Our analysis of protein-protein interaction showed that 30 genes within them may have the cooperation relationship.
Reddy, Kondreddy Eswar; Thu, Ha Thi; Yoo, Mi Sun; Ramya, Mummadireddy; Reddy, Bheemireddy Anjana; Lien, Nguyen Thi Kim; Trang, Nguyen Thi Phuong; Duong, Bui Thi Thuy; Lee, Hyun-Jeong; Kang, Seung-Won; Quyen, Dong Van
Sacbrood virus (SBV) is one of the most common viral infections of honeybees. The entire genome sequence for nine SBV infecting honeybees, Apis cerana and Apis mellifera, in Vietnam, namely AcSBV-Viet1, AcSBV-Viet2, AcSBV-Viet3, AmSBV-Viet4, AcSBV-Viet5, AmSBV-Viet6, AcSBV-Viet7, AcSBV-Viet8, and AcSBV-Viet9, was determined. These sequences were aligned with seven previously reported complete genome sequences of SBV from other countries, and various genomic regions were compared. The Vietnamese SBVs (VN-SBVs) shared 91-99% identity with each other, and shared 89-94% identity with strains from other countries. The open reading frames (ORFs) of the VN-SBV genomes differed greatly from those of SBVs from other countries, especially in their VP1 sequences. The AmSBV-Viet6 and AcSBV-Viet9 genome encodes 17 more amino acids within this region than the other VN-SBVs. In a phylogenetic analysis, the strains AmSBV-Viet4, AcSBV-Viet2, and AcSBV-Viet3 were clustered in group with AmSBV-UK, AmSBV-Kor21, and AmSBV-Kor19 strains. Whereas, the strains AmSBV-Viet6 and AcSBV-Viet7 clustered separately with the AcSBV strains from Korea and AcSBV-VietSBM2. And the strains AcSBV-Viet8, AcSBV-Viet1, AcSBV-Viet5, and AcSBV-Viet9 clustered with the AcSBV-India, AcSBV-Kor and AcSBV-VietSBM2. In a Simplot graph, the VN-SBVs diverged stronger in their ORF regions than in their 5' or 3' untranslated regions. The VN-SBVs possess genetic characteristics which are more similar to the Asian AcSBV strains than to AmSBV-UK strain. Taken together, our data indicate that host specificity, geographic distance, and viral cross-infections between different bee species may explain the genetic diversity among the VN-SBVs in A. cerana and A. mellifera and other SBV strains. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
Delfin, Frederick; Min-Shan Ko, Albert; Li, Mingkun; Gunnarsdóttir, Ellen D; Tabbada, Kristina A; Salvador, Jazelyn M; Calacal, Gayvelline C; Sagum, Minerva S; Datar, Francisco A; Padilla, Sabino G; De Ungria, Maria Corazon A; Stoneking, Mark
The Philippines is a strategic point in the Asia-Pacific region for the study of human diversity, history and origins, as it is a cross-road for human migrations and consequently exhibits enormous ethnolinguistic diversity. Following on a previous in-depth study of Y-chromosome variation, here we provide new insights into the maternal genetic history of Filipino ethnolinguistic groups by surveying complete mitochondrial DNA (mtDNA) genomes from a total of 14 groups (11 groups in this study and 3 groups previously published) including previously published mtDNA hypervariable segment (HVS) data from Filipino regional center groups. Comparison of HVS data indicate genetic differences between ethnolinguistic and regional center groups. The complete mtDNA genomes of 14 ethnolinguistic groups reveal genetic aspects consistent with the Y-chromosome, namely: diversity and heterogeneity of groups, no support for a simple dichotomy between Negrito and non-Negrito groups, and different genetic affinities with Asia-Pacific groups that are both ancient and recent. Although some mtDNA haplogroups can be associated with the Austronesian expansion, there are others that associate with South Asia, Near Oceania and Australia that are consistent with a southern migration route for ethnolinguistic group ancestors into the Asia-Pacific, with a timeline that overlaps with the initial colonization of the Asia-Pacific region, the initial colonization of the Philippines and a possible separate post-colonization migration into the Philippine archipelago. PMID:23756438
Nakajima, Masahiro; Takahashi, Atsushi; Kou, Ikuyo; Rodriguez-Fontenla, Cristina; Gomez-Reino, Juan J.; Furuichi, Tatsuya; Dai, Jin; Sudo, Akihiro; Uchida, Atsumasa; Fukui, Naoshi; Kubo, Michiaki; Kamatani, Naoyuki; Tsunoda, Tatsuhiko; Malizos, Konstantinos N.; Tsezou, Aspasia; Gonzalez, Antonio; Nakamura, Yusuke; Ikegawa, Shiro
Osteoarthritis (OA) is a common disease that has a definite genetic component. Only a few OA susceptibility genes that have definite functional evidence and replication of association have been reported, however. Through a genome-wide association study and a replication using a total of ∼4,800 Japanese subjects, we identified two single nucleotide polymorphisms (SNPs) (rs7775228 and rs10947262) associated with susceptibility to knee OA. The two SNPs were in a region containing HLA class II/III genes and their association reached genome-wide significance (combined P = 2.43×10−8 for rs7775228 and 6.73×10−8 for rs10947262). Our results suggest that immunologic mechanism is implicated in the etiology of OA. PMID:20305777
The aim of the present study was to examine the genomic relationship among 112 Actinobacillus pleuropneumoniae serotype 2 strains obtained throughout Europe and North America. HindIII ribotyping of the strains resulted in five ribotypes of high similarity (87-98%). Sequence analysis of the riboso......The aim of the present study was to examine the genomic relationship among 112 Actinobacillus pleuropneumoniae serotype 2 strains obtained throughout Europe and North America. HindIII ribotyping of the strains resulted in five ribotypes of high similarity (87-98%). Sequence analysis...... of the ribosomal intergenic region of strains representing each ribotype and each country showed no differences. A common ribotype was further characterized by PFGE of 12 strains representing all countries. The resultant five PFGE patterns of European strains showed a similarity of more than 91%, to which the two...
Exploring the genetic architecture and improving genomic prediction accuracy for mastitis and milk production traits in dairy cattle by mapping variants to hepatic transcriptomic regions responsive to intra-mammary infection.
Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter
A better understanding of the genetic architecture of complex traits can contribute to improve genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test were correlated with changes in prediction accuracy of GFBLUP (P layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set
Commercial and experimental genetic resources were used to investigate genetic pleiotropic factors that influence age at puberty, litter-size and reproductive longevity. The phenotypes were complemented by high-density genotyping and whole genome and RNA sequencing. The SNPs from Porcine SNP60 BeadA...
Full Text Available Abstract Background Sequence data and other characters from mitochondrial genomes (gene translocations, secondary structure of RNA molecules are useful in phylogenetic studies among metazoan animals from population to phylum level. Moreover, the comparison of complete mitochondrial sequences gives valuable information about the evolution of small genomes, e.g. about different mechanisms of gene translocation, gene duplication and gene loss, or concerning nucleotide frequency biases. The Peracarida (gammarids, isopods, etc. comprise about 21,000 species of crustaceans, living in many environments from deep sea floor to arid terrestrial habitats. Ligia oceanica is a terrestrial isopod living at rocky seashores of the european North Sea and Atlantic coastlines. Results The study reveals the first complete mitochondrial DNA sequence from a peracarid crustacean. The mitochondrial genome of Ligia oceanica is a circular double-stranded DNA molecule, with a size of 15,289 bp. It shows several changes in mitochondrial gene order compared to other crustacean species. An overview about mitochondrial gene order of all crustacean taxa yet sequenced is also presented. The largest non-coding part (the putative mitochondrial control region of the mitochondrial genome of Ligia oceanica is unexpectedly not AT-rich compared to the remainder of the genome. It bears two repeat regions (4× 10 bp and 3× 64 bp, and a GC-rich hairpin-like secondary structure. Some of the transfer RNAs show secondary structures which derive from the usual cloverleaf pattern. While some tRNA genes are putative targets for RNA editing, trnR could not be localized at all. Conclusion Gene order is not conserved among Peracarida, not even among isopods. The two isopod species Ligia oceanica and Idotea baltica show a similarly derived gene order, compared to the arthropod ground pattern and to the amphipod Parhyale hawaiiensis, suggesting that most of the translocation events were already
MacArthur, Stewart; Li, Xiao-Yong; Li, Jingyi; Brown, James B.; Chu, Hou Cheng; Zeng, Lucy; Grondona, Brandi P.; Hechmer, Aaron; Simirenko, Lisa; Keranen, Soile V.E.; Knowles, David W.; Stapleton, Mark; Bickel, Peter; Biggin, Mark D.; Eisen, Michael B.
BACKGROUND: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. RESULTS: Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. CONCLUSIONS: It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.
Ignatieva, Elena V; Levitsky, Victor G; Yudin, Nikolay S; Moshkin, Mikhail P; Kolchanov, Nikolay A
The molecular mechanism of olfactory cognition is very complicated. Olfactory cognition is initiated by olfactory receptor proteins (odorant receptors), which are activated by olfactory stimuli (ligands). Olfactory receptors are the initial player in the signal transduction cascade producing a nerve impulse, which is transmitted to the brain. The sensitivity to a particular ligand depends on the expression level of multiple proteins involved in the process of olfactory cognition: olfactory receptor proteins, proteins that participate in signal transduction cascade, etc. The expression level of each gene is controlled by its regulatory regions, and especially, by the promoter [a region of DNA about 100-1000 base pairs long located upstream of the transcription start site (TSS)]. We analyzed single nucleotide polymorphisms using human whole-genome data from the 1000 Genomes Project and revealed an extremely high level of single nucleotide polymorphisms in promoter regions of olfactory receptor genes and HLA genes. We hypothesized that the high level of polymorphisms in olfactory receptor promoters was responsible for the diversity in regulatory mechanisms controlling the expression levels of olfactory receptor proteins. Such diversity of regulatory mechanisms may cause the great variability of olfactory cognition of numerous environmental olfactory stimuli perceived by human beings (air pollutants, human body odors, odors in culinary etc.). In turn, this variability may provide a wide range of emotional and behavioral reactions related to the vast variety of olfactory stimuli.
Elena V. Ignatieva
Full Text Available The molecular mechanism of olfactory cognition is very complicated. Olfactory cognition is initiated by olfactory receptor proteins (odorant receptors, which are activated by olfactory stimuli (ligands. Olfactory receptors are the initial player in the signal transduction cascade producing a nerve impulse, which is transmitted to the brain. The sensitivity to a particular ligand depends on the expression level of multiple proteins involved in the process of olfactory cognition: olfactory receptor proteins, proteins that participate in signal transduction cascade, etc. The expression level of each gene is controlled by its regulatory regions, and especially, by the promoter (a region of DNA about 100–1000 base pairs long located upstream of the transcription start site. We analyzed single nucleotide polymorphisms using human whole-genome data from the 1000 Genomes Project and revealed an extremely high level of single nucleotide polymorphisms in promoter regions of olfactory receptor genes and HLA genes. We hypothesized that the high level of polymorphisms in olfactory receptor promoters was responsible for the diversity in regulatory mechanisms controlling the expression levels of olfactory receptor proteins. Such diversity of regulatory mechanisms may cause the great variability of olfactory cognition of numerous environmental olfactory stimuli perceived by human beings (air pollutants, human body odors, odors in culinary etc.. In turn, this variability may provide a wide range of emotional and behavioral reactions related to the vast variety of olfactory stimuli.
Lutz, Vanessa; Stratz, Patrick; Preuß, Siegfried; Tetens, Jens; Grashorn, Michael A; Bessei, Werner; Bennewitz, Jörn
Feather pecking and aggressive pecking in laying hens are serious economic and welfare issues. In spite of extensive research on feather pecking during the last decades, the motivation for this behavior is still not clear. A small to moderate heritability has frequently been reported for these traits. Recently, we identified several single-nucleotide polymorphisms (SNPs) associated with feather pecking by mapping selection signatures in two divergent feather pecking lines. Here, we performed a genome-wide association analysis (GWAS) for feather pecking and aggressive pecking behavior, then combined the results with those from the recent selection signature experiment, and linked them to those obtained from a differential gene expression study. A large F2 cross of 960 F2 hens was generated using the divergent lines as founders. Hens were phenotyped for feather pecks delivered (FPD), aggressive pecks delivered (APD), and aggressive pecks received (APR). Individuals were genotyped with the Illumina 60K chicken Infinium iSelect chip. After data filtering, 29,376 SNPs remained for analyses. Single-marker GWAS was performed using a Poisson model. The results were combined with those from the selection signature experiment using Fisher's combined probability test. Numerous significant SNPs were identified for all traits but with low false discovery rates. Nearly all significant SNPs were located in clusters that spanned a maximum of 3 Mb and included at least two significant SNPs. For FPD, four clusters were identified, which increased to 13 based on the meta-analysis (FPD meta ). Seven clusters were identified for APD and three for APR. Eight genes (of the 750 investigated genes located in the FPD meta clusters) were significantly differentially-expressed in the brain of hens from both lines. One gene, SLC12A9, and the positional candidate gene for APD, GNG2, may be linked to the monomanine signaling pathway, which is involved in feather pecking and aggressive behavior
Dinsmore, P K; Klaenhammer, T R
A spontaneous mutant of the lactococcal phage phi31 that is insensitive to the phage defense mechanism AbiA was characterized in an effort to identify the phage factor(s) involved in sensitivity of phi31 to AbiA. A point mutation was localized in the genome of the AbiA-insensitive phage (phi31A) by heteroduplex analysis of a 9-kb region. The mutation (G to T) was within a 738-bp open reading frame (ORF245) and resulted in an arginine-to-leucine change in the predicted amino acid sequence of t...
Kakisaka, Michinori; Yamada, Kazunori; Yamaji-Hasegawa, Akiko; Kobayashi, Toshihide; Aida, Yoko
To be incorporated into progeny virions, the viral genome must be transported to the inner leaflet of the plasma membrane (PM) and accumulate there. Some viruses utilize lipid components to assemble at the PM. For example, simian virus 40 (SV40) targets the ganglioside GM1 and human immunodeficiency virus type 1 (HIV-1) utilizes phosphatidylinositol (4,5) bisphosphate [PI(4,5)P2]. Recent studies clearly indicate that Rab11-mediated recycling endosomes are required for influenza A virus (IAV) trafficking of vRNPs to the PM but it remains unclear how IAV vRNP localized or accumulate underneath the PM for viral genome incorporation into progeny virions. In this study, we found that the second intrinsically disordered region (IDR2) of NP regulates two binding steps involved in viral genome packaging. First, IDR2 facilitates NP oligomer binding to viral RNA to form vRNP. Secondly, vRNP assemble by interacting with PI(4,5)P2 at the PM via IDR2. These findings suggest that PI(4,5)P2 functions as the determinant of vRNP accumulation at the PM. Copyright © 2016 Elsevier Inc. All rights reserved.
Sharp, Andrew J.; Hansen, Sierra; Selzer, Rebecca R.; Cheng, Ze; Regan, Regina; Hurst, Jane A.; Stewart, Helen; Price, Sue M.; Blair, Edward; Hennekam, Raoul C.; Fitzpatrick, Carrie A.; Segraves, Rick; Richmond, Todd A.; Guiver, Cheryl; Albertson, Donna G.; Pinkel, Daniel; Eis, Peggy S.; Schwartz, Stuart; Knight, Samantha J. L.; Eichler, Evan E.
Genomic disorders are characterized by the presence of flanking segmental duplications that predispose these regions to recurrent rearrangement. Based on the duplication architecture of the genome, we investigated 130 regions that we hypothesized as candidates for previously undescribed genomic
Bai, Yongsheng; Kinne, Jeff; Donham, Brandon; Jiang, Feng; Ding, Lizhong; Hassler, Justin R; Kaufman, Randal J
Most existing tools for detecting next-generation sequencing-based splicing events focus on generic splicing events. Consequently, special types of non-canonical splicing events of short mRNA regions (IRE1α targeted) have not yet been thoroughly addressed at a genome-wide level using bioinformatics approaches in conjunction with next-generation technologies. During endoplasmic reticulum (ER) stress, the gene encoding the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1 non-canonically within the cytosol. This causes an open reading frame-shift that induces expression of many downstream genes in reaction to ER stress as part of the unfolded protein response (UPR). We previously published an algorithm termed "Read-Split-Walk" (RSW) to identify non-canonical splicing regions using RNA-Seq data and applied it to ER stress-induced Ire1α heterozygote and knockout mouse embryonic fibroblast cell lines. In this study, we have developed an improved algorithm "Read-Split-Run" (RSR) for detecting genome-wide Ire1α-targeted genes with non-canonical spliced regions at a faster speed. We applied the RSR algorithm using different combinations of several parameters to the previously RSW tested mouse embryonic fibroblast cells (MEF) and the human Encyclopedia of DNA Elements (ENCODE) RNA-Seq data. We also compared the performance of RSR with two other alternative splicing events identification tools (TopHat (Trapnell et al., Bioinformatics 25:1105-1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012)) utilizing the context of the spliced Xbp1 mRNA as a positive control in the data sets we identified it to be the top cleavage target present in Ire1α (+/-) but absent in Ire1α (-/-) MEF samples and this comparison was also extended to human ENCODE RNA-Seq data. Proof of principle came in our results by the fact that the 26 nt non-conventional splice site in Xbp1 was detected as the top hit by our new RSR
The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications.
Lagman, David; Ocampo Daza, Daniel; Widmark, Jenny; Abalo, Xesús M; Sundström, Görel; Larhammar, Dan
Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. The combined data imply that the early vertebrate WGD events contributed to the evolution of vision and the
Gundogdu, Ozan; da Silva, Daiani T; Mohammad, Banaz; Elmi, Abdi; Wren, Brendan W; van Vliet, Arnoud H M; Dorrell, Nick
Campylobacter jejuni is the leading cause of bacterial foodborne diarrhoeal disease worldwide. Despite the microaerophilic nature of the bacterium, C. jejuni can survive the atmospheric oxygen conditions in the environment. Bacteria that can survive either within a host or in the environment like C. jejuni require variable responses to survive the stresses associated with exposure to different levels of reactive oxygen species. The MarR-type transcriptional regulators RrpA and RrpB have recently been shown to play a role in controlling both the C. jejuni oxidative and aerobic stress responses. Analysis of 3,746 C. jejuni and 486 C. coli genome sequences showed that whilst rrpA is present in over 99% of C. jejuni strains, the presence of rrpB is restricted and appears to correlate with specific MLST clonal complexes (predominantly ST-21 and ST-61). C. coli strains in contrast lack both rrpA and rrpB . In C. jejuni rrpB + strains, the rrpB gene is located within a variable genomic region containing the IF subtype of the type I Restriction-Modification ( hsd ) system, whilst this variable genomic region in C. jejuni rrpB - strains contains the IAB subtype hsd system and not the rrpB gene. C. jejuni rrpB - strains exhibit greater resistance to peroxide and aerobic stress than C. jejuni rrpB + strains. Inactivation of rrpA resulted in increased sensitivity to peroxide stress in rrpB + strains, but not in rrpB - strains. Mutation of rrpA resulted in reduced killing of Galleria mellonella larvae and enhanced biofilm formation independent of rrpB status. The oxidative and aerobic stress responses of rrpB - and rrpB + strains suggest adaptation of C. jejuni within different hosts and niches that can be linked to specific MLST clonal complexes.
Qian, Chaoju; Wang, Yuanxiu; Guo, Zhichun; Yang, Jianke; Kan, Xianzhao
The circular mitochondrial genome of Alauda arvensis is 17,018 bp in length, containing 13 protein-coding genes (PCGs), 2 ribosomal RNA genes, 22 transfer RNA (tRNA) genes, and 2 extensive heteroplasmic control regions. All of the genes encoded on the H-strand, with the exceptions of one PCG (nad6) and eight tRNA genes (tRNA(Gln), tRNA(Ala), tRNA(Asn), tRNA(Cys), tRNA(Tyr), tRNA(Ser(UCN)), tRNA(Pro), and tRNA(Glu)), as found in other birds' mitochondrial genomes. All of these PCGs are initiated with ATG, while stopped by six types of stop codons. All tRNA genes have the potential to fold into typical clover-leaf structure. Two extensive heteroplasmic control regions were found, and more interestingly, a minisatellite of 37 nucleotides (5'-TCAATCCCATTGATTTCATTATATTAGTATAAAGAAA-3') with 6 tandem repeats was detected at the end of CR2.
Wolters, Anne-Marie A; Caro, Myluska; Dong, Shufang; Finkers, Richard; Gao, Jianchang; Visser, Richard G F; Wang, Xiaoxuan; Du, Yongchen; Bai, Yuling
A chromosomal inversion associated with the tomato Ty - 2 gene for TYLCV resistance is the cause of severe suppression of recombination in a tomato Ty - 2 introgression line. Among tomato and its wild relatives inversions are often observed, which result in suppression of recombination. Such inversions hamper the transfer of important traits from a related species to the crop by introgression breeding. Suppression of recombination was reported for the TYLCV resistance gene, Ty-2, which has been introgressed in cultivated tomato (Solanum lycopersicum) from the wild relative S. habrochaites accession B6013. Ty-2 was mapped to a 300-kb region on the long arm of chromosome 11. The suppression of recombination in the Ty-2 region could be caused by chromosomal rearrangements in S. habrochaites compared with S. lycopersicum. With the aim of visualizing the genome structure of the Ty-2 region, we compared the draft de novo assembly of S. habrochaites accession LYC4 with the sequence of cultivated tomato ('Heinz'). Furthermore, using populations derived from intraspecific crosses of S. habrochaites accessions, the order of markers in the Ty-2 region was studied. Results showed the presence of an inversion of approximately 200 kb in the Ty-2 region when comparing S. lycopersicum and S. habrochaites. By sequencing a BAC clone from the Ty-2 introgression line, one inversion breakpoint was identified. Finally, the obtained results are discussed with respect to introgression breeding and the importance of a priori de novo sequencing of the species involved.
Full Text Available Abstract Background O-glycosylation of secretory proteins has been found to be an important factor in fungal biology and virulence. It consists in the addition of short glycosidic chains to Ser or Thr residues in the protein backbone via O-glycosidic bonds. Secretory proteins in fungi frequently display Ser/Thr rich regions that could be sites of extensive O-glycosylation. We have analyzed in silico the complete sets of putatively secretory proteins coded by eight fungal genomes (Botrytis cinerea, Magnaporthe grisea, Sclerotinia sclerotiorum, Ustilago maydis, Aspergillus nidulans, Neurospora crassa, Trichoderma reesei, and Saccharomyces cerevisiae in search of Ser/Thr-rich regions as well as regions predicted to be highly O-glycosylated by NetOGlyc (http://www.cbs.dtu.dk. Results By comparison with experimental data, NetOGlyc was found to overestimate the number of O-glycosylation sites in fungi by a factor of 1.5, but to be quite reliable in the prediction of highly O-glycosylated regions. About half of secretory proteins have at least one Ser/Thr-rich region, with a Ser/Thr content of at least 40% over an average length of 40 amino acids. Most secretory proteins in filamentous fungi were predicted to be O-glycosylated, sometimes in dozens or even hundreds of sites. Residues predicted to be O-glycosylated have a tendency to be grouped together forming hyper-O-glycosylated regions of varying length. Conclusions About one fourth of secretory fungal proteins were predicted to have at least one hyper-O-glycosylated region, which consists of 45 amino acids on average and displays at least one O-glycosylated Ser or Thr every four residues. These putative highly O-glycosylated regions can be found anywhere along the proteins but have a slight tendency to be at either one of the two ends.
Amine M Boukerb
Full Text Available LecA and LecB tetrameric lectins take part in oligosaccharide-mediated adhesion-processes of Pseudomonas aeruginosa. Glycomimetics have been designed to block these interactions. The great versatility of P. aeruginosa suggests that the range of application of these glycomimetics could be restricted to genotypes with particular lectin types. The likelihood of having genomic and genetic changes impacting LecA and LecB interactions with glycomimetics such as galactosylated and fucosylated calixarene was investigated over a collection of strains from the main clades of P. aeruginosa. Lectin types were defined, and their ligand specificities were inferred. These analyses showed a loss of lecA among the PA7 clade. Genomic changes impacting lec loci were thus assessed using strains of this clade, and by making comparisons with the PAO1 genome. The lecA regions were found challenged by phage attacks and PAGI-2 (genomic island integrations. A prophage was linked to the loss of lecA. The lecB regions were found less impacted by such rearrangements but greater lecB than lecA genetic divergences were recorded. Sixteen combinations of LecA and LecB types were observed. Amino acid variations were mapped on PAO1 crystal structures. Most significant changes were observed on LecBPA7, and found close to the fucose binding site. Glycan array analyses were performed with purified LecBPA7. LecBPA7 was found less specific for fucosylated oligosaccharides than LecBPAO1, with a preference for H type 2 rather than type 1, and Lewisa rather than Lewisx. Comparison of the crystal structures of LecBPA7 and LecBPAO1 in complex with Lewisa showed these changes in specificity to have resulted from a modification of the water network between the lectin, galactose and GlcNAc residues. Incidence of these modifications on the interactions with calixarene glycomimetics at the cell level was investigated. An aggregation test was used to establish the efficacy of these ligands
Chen, Wei-Min; Erdos, Michael R; Jackson, Anne U
Identifying the genetic variants that regulate fasting glucose concentrations may further our understanding of the pathogenesis of diabetes. We therefore investigated the association of fasting glucose levels with SNPs in 2 genome-wide scans including a total of 5,088 nondiabetic individuals from...... Finland and Sardinia. We found a significant association between the SNP rs563694 and fasting glucose concentrations (P = 3.5 x 10(-7)). This association was further investigated in an additional 18,436 nondiabetic individuals of mixed European descent from 7 different studies. The combined P value...... for association in these follow-up samples was 6.9 x 10(-26), and combining results from all studies resulted in an overall P value for association of 6.4 x 10(-33). Across these studies, fasting glucose concentrations increased 0.01-0.16 mM with each copy of the major allele, accounting for approximately 1...
Jennifer L Anderson
Full Text Available Within vertebrates, major sex determining genes can differ among taxa and even within species. In zebrafish (Danio rerio, neither heteromorphic sex chromosomes nor single sex determination genes of large effect, like Sry in mammals, have yet been identified. Furthermore, environmental factors can influence zebrafish sex determination. Although progress has been made in understanding zebrafish gonad differentiation (e.g. the influence of germ cells on gonad fate, the primary genetic basis of zebrafish sex determination remains poorly understood. To identify genetic loci associated with sex, we analyzed F(2 offspring of reciprocal crosses between Oregon *AB and Nadia (NA wild-type zebrafish stocks. Genome-wide linkage analysis, using more than 5,000 sequence-based polymorphic restriction site associated (RAD-tag markers and population genomic analysis of more than 30,000 single nucleotide polymorphisms in our *ABxNA crosses revealed a sex-associated locus on the end of the long arm of chr-4 for both cross families, and an additional locus in the middle of chr-3 in one cross family. Additional sequencing showed that two SNPs in dmrt1 previously suggested to be functional candidates for sex determination in a cross of ABxIndia wild-type zebrafish, are not associated with sex in our AB fish. Our data show that sex determination in zebrafish is polygenic and that different genes may influence sex determination in different strains or that different genes become more important under different environmental conditions. The association of the end of chr-4 with sex is remarkable because, unique in the karyotype, this chromosome arm shares features with known sex chromosomes: it is highly heterochromatic, repetitive, late replicating, and has reduced recombination. Our results reveal that chr-4 has functional and structural properties expected of a sex chromosome.
Wyrwicz, Lucjan S; Koczyk, Grzegorz; Rychlewski, Leszek; Plewczynski, Dariusz
The annotation of protein folds within newly sequenced genomes is the main target for semi-automated protein structure prediction (virtual structural genomics). A large number of automated methods have been developed recently with very good results in the case of single-domain proteins. Unfortunately, most of these automated methods often fail to properly predict the distant homology between a given multi-domain protein query and structural templates. Therefore a multi-domain protein should be split into domains in order to overcome this limitation. ProteinSplit is designed to identify protein domain boundaries using a novel algorithm that predicts disordered regions in protein sequences. The software utilizes various sequence characteristics to assess the local propensity of a protein to be disordered or ordered in terms of local structure stability. These disordered parts of a protein are likely to create interdomain spacers. Because of its speed and portability, the method was successfully applied to several genome-wide fold annotation experiments. The user can run an automated analysis of sets of proteins or perform semi-automated multiple user projects (saving the results on the server). Additionally the sequences of predicted domains can be sent to the Bioinfo.PL Protein Structure Prediction Meta-Server for further protein three-dimensional structure and function prediction. The program is freely accessible as a web service at http://lucjan.bioinfo.pl/proteinsplit together with detailed benchmark results on the critical assessment of a fully automated structure prediction (CAFASP) set of sequences. The source code of the local version of protein domain boundary prediction is available upon request from the authors
Background Copepods are highly diverse and abundant, resulting in extensive ecological radiation in marine ecosystems. Calanus sinicus dominates continental shelf waters in the northwest Pacific Ocean and plays an important role in the local ecosystem by linking primary production to higher trophic levels. A lack of effective molecular markers has hindered phylogenetic and population genetic studies concerning copepods. As they are genome-level informative, mitochondrial DNA sequences can be used as markers for population genetic studies and phylogenetic studies. Results The mitochondrial genome of C. sinicus is distinct from other arthropods owing to the concurrence of multiple non-coding regions and a reshuffled gene arrangement. Further particularities in the mitogenome of C. sinicus include low A + T-content, symmetrical nucleotide composition between strands, abbreviated stop codons for several PCGs and extended lengths of the genes atp6 and atp8 relative to other copepods. The monophyletic Copepoda should be placed within the Vericrustacea. The close affinity between Cyclopoida and Poecilostomatoida suggests reassigning the latter as subordinate to the former. Monophyly of Maxillopoda is rejected. Within the alignment of 11 C. sinicus mitogenomes, there are 397 variable sites harbouring three 'hotspot' variable sites and three microsatellite loci. Conclusion The occurrence of the circular subgenomic fragment during laboratory assays suggests that special caution should be taken when sequencing mitogenomes using long PCR. Such a phenomenon may provide additional evidence of mitochondrial DNA recombination, which appears to have been a prerequisite for shaping the present mitochondrial profile of C. sinicus during its evolution. The lack of synapomorphic gene arrangements among copepods has cast doubt on the utility of gene order as a useful molecular marker for deep phylogenetic analysis. However, mitochondrial genomic sequences have been valuable markers for
Osipova, Svetlana; Permyakov, Alexey; Permyakova, Marina; Pshenichnikova, Tatyana; Verkhoturov, Vasiliy; Rudikovsky, Alexandr; Rudikovskaya, Elena; Shishparenok, Alexandr; Doroshkov, Alexey; Börner, Andreas
A quantitative trait locus (QTL) approach was taken to reveal the genetic basis in wheat of traits associated with photosynthesis during a period of exposure to water deficit stress. The performance, with respect to shoot biomass, gas exchange and chlorophyll fluorescence, leaf pigment content and the activity of various ascorbate-glutathione cycle enzymes and catalase, of a set of 80 wheat lines, each containing a single chromosomal segment introgressed from the bread wheat D genome progenitor Aegilops tauschii, was monitored in plants exposed to various water regimes. Four of the seven D genome chromosomes (1D, 2D, 5D, and 7D) carried clusters of both major (LOD >3.0) and minor (LOD between 2.0 and 3.0) QTL. A major QTL underlying the activity of glutathione reductase was located on chromosome 2D, and another, controlling the activity of ascorbate peroxidase, on chromosome 7D. A region of chromosome 2D defined by the microsatellite locus Xgwm539 and a second on chromosome 7D flanked by the marker loci Xgwm1242 and Xgwm44 harbored a number of QTL associated with the water deficit stress response.
Huang, Chen; Leung, Ross Ka-Kit; Guo, Min; Tuo, Li; Guo, Lin; Yew, Wing Wai; Lou, Inchio; Lee, Simon Ming Yuen; Sun, Chenghang
Microbial secondary metabolites are valuable resources for novel drug discovery. In particular, actinomycetes expressed a range of antibiotics against a spectrum of bacteria. In genus level, strain Allosalinactinospora lopnorensis CA15-2T is the first new actinomycete isolated from the Lop Nor region, China. Antimicrobial assays revealed that the strain could inhibit the growth of certain types of bacteria, including Acinetobacter baumannii and Staphylococcus aureus, highlighting its clinical significance. Here we report the 5,894,259 base pairs genome of the strain, containing 5,662 predicted genes, and 832 of them cannot be detected by sequence similarity-based methods, suggesting the new species may carry a novel gene pool. Furthermore, our genome-mining investigation reveals that A. lopnorensis CA15-2T contains 17 gene clusters coding for known or novel secondary metabolites. Meanwhile, at least six secondary metabolites were disclosed from ethyl acetate (EA) extract of the fermentation broth of the strain by high-resolution UPLC-MS. Compared with reported clusters of other species, many new genes were found in clusters, and the physical chromosomal location and order of genes in the clusters are distinct. This study presents evidence in support of A. lopnorensis CA15-2T as a potent natural products source for drug discovery. PMID:26864220
Ma, Xueqing; Li, Pinghua; Sun, Pu; Lu, Zengjun; Bao, Huifang; Bai, Xingwen; Fu, Yuanfang; Cao, Yimei; Li, Dong; Chen, Yingli; Qiao, Zilin; Liu, Zaixin
The deletion of residues 93-102 in non-structure protein 3A of foot-and-mouth disease virus (FMDV) is associated with the inability of FMDV to grow in bovine cells and attenuated virulence in cattle.Whereas, a previously reported FMDV strain O/HKN/21/70 harboring 93-102 deletion in 3A protein grew equally well in bovine and swine cells. This suggests that changes inFMDV genome sequence, in addition to 93-102 deletion in 3A, may also affectthe viral growth phenotype in bovine cellsduring infection and replication.However, it is nuclear that changes in which region (inside or outside of 3A region) influences FMDV growth phenotype in bovine cells.In this study, to determine the region in FMDV genomeaffecting viral growth phenotype in bovine cells, we constructed chimeric FMDVs, rvGZSB-HKN3A and rvHN-HKN3A, by introducing the 3A coding region of O/HKN/21/70 into the context of O/SEA/Mya-98 strain O/GZSB/2011 and O Cathay topotype strain O/HN/CHA/93, respectively, since O/GZSB/2011 containing full-length 3A protein replicated well in bovine and swine cells, and O/HN/CHA/93 harboring 93-102 deletion in 3A protein grew poorly in bovine cells.The chimeric virusesrvGZSB-HKN3A and rvHN-HKN3A displayed growth properties and plaque phenotypes similar to those of the parental virus rvGZSB and rv-HN in BHK-21 and primary fetal porcine kidney (FPK) cells. However, rvHN-HKN3A and rv-HN replicated poorly in primary fetal bovine kidney (FBK) cells with no visible plaques, and rvGZSB-HKN3A exhibited lower growth rate and smaller plaque size phenotypes than those of the parental virus in FBK cells, but similar growth properties and plaque phenotypes to those of the recombinant viruses harboring 93-102 deletion in 3A. These results demonstrate that the difference present in FMDV genome sequence outside the 3A coding region also have influence on FMDV replication ability in bovine cells. Copyright © 2016 Elsevier B.V. All rights reserved.
Tarzami, S T; Kringstein, A M; Conte, R A; Verma, R S
The Wolf-Hirschhorn syndrome (WHS) is caused by a partial deletion in the short arm of chromosome 4 band 16.3 (4p 16.3). A unique-sequence human DNA probe (39 kb) localized within this region has been used to search for sequence homology in the apes' equivalent chromosome 3 by FISH-technique. The WHS loci are conserved in higher primates at the expected position. Nevertheless, a control probe, which detects alphoid sequences of the pericentromeric region of humans, is diverged in chimpanzee, gorilla, and orangutan. The conservation of WHS loci and divergence of DNA alphoid sequences have further added to the controversy concerning human descent.
Prado, Santiago Alvarez; Cabrera-Bosquet, Llorenç; Grau, Antonin; Coupel-Ledru, Aude; Millet, Emilie J; Welcker, Claude; Tardieu, François
Stomatal conductance is central for the trades-off between hydraulics and photosynthesis. We aimed at deciphering its genetic control and that of its responses to evaporative demand and water deficit, a nearly impossible task with gas exchanges measurements. Whole-plant stomatal conductance was estimated via inversion of the Penman-Monteith equation from data of transpiration and plant architecture collected in a phenotyping platform. We have analysed jointly 4 experiments with contrasting environmental conditions imposed to a panel of 254 maize hybrids. Estimated whole-plant stomatal conductance closely correlated with gas-exchange measurements and biomass accumulation rate. Sixteen robust quantitative trait loci (QTLs) were identified by genome wide association studies and co-located with QTLs of transpiration and biomass. Light, vapour pressure deficit, or soil water potential largely accounted for the differences in allelic effects between experiments, thereby providing strong hypotheses for mechanisms of stomatal control and a way to select relevant candidate genes among the 1-19 genes harboured by QTLs. The combination of allelic effects, as affected by environmental conditions, accounted for the variability of stomatal conductance across a range of hybrids and environmental conditions. This approach may therefore contribute to genetic analysis and prediction of stomatal control in diverse environments. © 2017 John Wiley & Sons Ltd.
Gao, Lei; Zhou, Yuan; Wang, Zhi-Wei; Su, Ying-Juan; Wang, Ting
The rpoB-psbZ (BZ) region of some fern plastid genomes (plastomes) has been noted to go through considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will lead to clarify the fundamental process shaping fern plastome structure and organization. A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. We found that: (i) a tree fern Plagiogyria japonica contained a novel gene order that can be generated from either the ancestral Angiopteris type or the derived Adiantum type via a single inversion; (ii) the trnY-trnE intergenic spacer (IGS) of the filmy fern Vandenboschia radicans was expanded 3-fold due to the tandem 27-bp repeats which showed strong sequence similarity with the anticodon domain of trnY; (iii) the trnY-trnE IGSs of two horsetail ferns Equisetum ramosissimum and E. arvense underwent an unprecedented 5-kb long expansion, more than a quarter of which was consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv) ycf66 has independently lost at least four times in ferns. Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ gene order was not detected, supporting that the Adiantum type was generated by two inversions occurring in pairs. The occurrence of Vandenboschia 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS.
Full Text Available Abstract Background The rpoB-psbZ (BZ region of some fern plastid genomes (plastomes has been noted to go through considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will lead to clarify the fundamental process shaping fern plastome structure and organization. Results A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. We found that: (i a tree fern Plagiogyria japonica contained a novel gene order that can be generated from either the ancestral Angiopteris type or the derived Adiantum type via a single inversion; (ii the trnY-trnE intergenic spacer (IGS of the filmy fern Vandenboschia radicans was expanded 3-fold due to the tandem 27-bp repeats which showed strong sequence similarity with the anticodon domain of trnY; (iii the trnY-trnE IGSs of two horsetail ferns Equisetum ramosissimum and E. arvense underwent an unprecedented 5-kb long expansion, more than a quarter of which was consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv ycf66 has independently lost at least four times in ferns. Conclusions Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ gene order was not detected, supporting that the Adiantum type was generated by two inversions occurring in pairs. The occurrence of Vandenboschia 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS.
Full Text Available Chiari-like malformation (CM is a developmental abnormality of the craniocervical junction that is common in the Griffon Bruxellois (GB breed with an estimated prevalence of 65%. This disease is characterized by overcrowding of the neural parenchyma at the craniocervical junction and disturbance of cerebrospinal fluid (CSF flow. The most common clinical sign is pain either as a direct consequence of CM or neuropathic pain as a consequence of secondary syringomyelia. The etiology of CM remains unknown but genetic factors play an important role. To investigate the genetic complexity of the disease, a quantitative trait locus (QTL approach was adopted. A total of 14 quantitative skull and atlas measurements were taken and were tested for association to CM. Six traits were found to be associated to CM and were subjected to a whole-genome association study using the Illumina canine high density bead chip in 74 GB dogs (50 affected and 24 controls. Linear and mixed regression analyses identified associated single nucleotide polymorphisms (SNPs on 5 Canis Familiaris Autosomes (CFAs: CFA2, CFA9, CFA12, CFA14 and CFA24. A reconstructed haplotype of 0.53 Mb on CFA2 strongly associated to the height of the cranial fossa (diameter F and an haplotype of 2.5 Mb on CFA14 associated to both the height of the rostral part of the caudal cranial fossa (AE and the height of the brain (FG were significantly associated to CM after 10 000 permutations strengthening their candidacy for this disease (P = 0.0421, P = 0.0094 respectively. The CFA2 QTL harbours the Sall-1 gene which is an excellent candidate since its orthologue in humans is mutated in Townes-Brocks syndrome which has previously been associated to Chiari malformation I. Our study demonstrates the implication of multiple traits in the etiology of CM and has successfully identified two new QTL associated to CM and a potential candidate gene.
Guldbrandtsen, Bernt; Cai, Zexi; Sahana, Goutam
The American Mink’s (Neovison vison) genome has recently been sequenced. This opens numerous avenues of research both for studying the basic genetics and physiology of the mink as well as genetic improvement in mink. Using genotyping-by-sequencing (GBS) generated marker data for 2,352 Danish farm...... mink runs of homozygosity (ROH) were detect in mink genomes. Detectable ROH made up on average 1.7% of the genome indicating the presence of at most a moderate level of genomic inbreeding. The fraction of genome regions found in ROH varied. Ten percent of the included regions were never found in ROH....... The ability to detect ROH in the mink genome also demonstrates the general reliability of the new mink genome assembly. Keywords: american mink, run of homozygosity, genome, selection, genomic inbreeding...
Igor V Makunin
Full Text Available BACKGROUND: MicroRNAs (miRNAs are short non-coding RNAs that regulate differentiation and development in many organisms and play an important role in cancer. METHODOLOGY/PRINCIPAL FINDINGS: Using a public database of mapped retroviral insertion sites from various mouse models of cancer we demonstrate that MLV-derived retroviral inserts are enriched in close proximity to mouse miRNA loci. Clustered inserts from cancer-associated regions (Common Integration Sites, CIS have a higher association with miRNAs than non-clustered inserts. Ten CIS-associated miRNA loci containing 22 miRNAs are located within 10 kb of known CIS insertions. Only one CIS-associated miRNA locus overlaps a RefSeq protein-coding gene and six loci are located more than 10 kb from any RefSeq gene. CIS-associated miRNAs on average are more conserved in vertebrates than miRNAs associated with non-CIS inserts and their human homologs are also located in regions perturbed in cancer. In addition we show that miRNA genes are enriched around promoter and/or terminator regions of RefSeq genes in both mouse and human. CONCLUSIONS/SIGNIFICANCE: We provide a list of ten miRNA loci potentially involved in the development of blood cancer or brain tumors. There is independent experimental support from other studies for the involvement of miRNAs from at least three CIS-associated miRNA loci in cancer development.
Dinsmore, P K; Klaenhammer, T R
A spontaneous mutant of the lactococcal phage phi31 that is insensitive to the phage defense mechanism AbiA was characterized in an effort to identify the phage factor(s) involved in sensitivity of phi31 to AbiA. A point mutation was localized in the genome of the AbiA-insensitive phage (phi31A) by heteroduplex analysis of a 9-kb region. The mutation (G to T) was within a 738-bp open reading frame (ORF245) and resulted in an arginine-to-leucine change in the predicted amino acid sequence of the protein. The mutant phi31A-ORF245 reduced the sensitivity of phi31 to AbiA when present in trans, indicating that the mutation in ORF245 is responsible for the AbiA insensitivity of phi31A. Transcription of ORF245 occurs early in the phage infection cycles of phi31 and phi31A and is unaffected by AbiA. Expansion of the phi31 sequence revealed ORF169 (immediately upstream of ORF245) and ORF71 (which ends 84 bp upstream of ORF169). Two inverted repeats lie within the 84-bp region between ORF71 and ORF169. Sequence analysis of an independently isolated AbiA-insensitive phage, phi31B, identified a mutation (G to A) in one of the inverted repeats. A 118-bp fragment from phi31, encompassing the 84-bp region between ORF71 and ORF169, eliminates AbiA activity against phi31 when present in trans, establishing a relationship between AbiA and this fragment. The study of this region of phage phi31 has identified an open reading frame (ORF245) and a 118-bp DNA fragment that interact with AbiA and are likely to be involved in the sensitivity of this phage to AbiA.
Ambigapathy, Ganesh; Zheng, Zhaoqing; Keifer, Joyce
Brain-derived neurotrophic factor (BDNF) is an important regulator of neuronal development and synaptic function. The BDNF gene undergoes significant activity-dependent regulation during learning. Here, we identified the BDNF promoter regions, transcription start sites, and potential regulatory sequences for BDNF exons I-III that may contribute to activity-dependent gene and protein expression in the pond turtle Trachemys scripta elegans (tBDNF). By using transfection of BDNF promoter/luciferase plasmid constructs into human neuroblastoma SHSY5Y cells and mouse embryonic fibroblast NIH3T3 cells, we identified the basal regulatory activity of promoter sequences located upstream of each tBDNF exon, designated as pBDNFI-III. Further, through chromatin immunoprecipitation (ChIP) assays, we detected CREB binding directly to exon I and exon III promoters, while BHLHB2, but not CREB, binds within the exon II promoter. Elucidation of the promoter regions and regulatory protein binding sites in the tBDNF gene is essential for understanding the regulatory mechanisms that control tBDNF gene expression.
Duggan, Ana T; Whitten, Mark; Wiebe, Victor; Crawford, Michael; Butthof, Anne; Spitsyn, Victor; Makarov, Sergey; Novgorodov, Innokentiy; Osakovsky, Vladimir; Pakendorf, Brigitte
Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north.
Duggan, Ana T.; Whitten, Mark; Wiebe, Victor; Crawford, Michael; Butthof, Anne; Spitsyn, Victor; Makarov, Sergey; Novgorodov, Innokentiy; Osakovsky, Vladimir; Pakendorf, Brigitte
Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north. PMID:24349531
Ana T Duggan
Full Text Available Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north.
An international conference, Human Genome I, was held Oct. 2-4, 1989 in San Diego, Calif. Selected speakers discussed: Current Status of the Genome Project; Technique Innovations; Interesting regions; Applications; and Organization - Different Views of Current and Future Science and Procedures. Posters, consisting of 119 presentations, were displayed during the sessions. 119 were indexed for inclusion to the Energy Data Base
Full Text Available Genomic RNA of HIV-1 contains localized structures critical for viral replication. Its structural analysis has demonstrated a stem-loop structure, SLSA1, in a nearby region of HIV-1 genomic splicing acceptor 1 (SA1. We have previously shown that the expression level of vif mRNA is considerably altered by some natural single-nucleotide variations (nSNVs clustering in SLSA1 structure. In this study, besides eleven nSNVs previously identified by us, we totally found nine new nSNVs in the SLSA1-containing sequence from SA1, splicing donor 2, and through to the start codon of Vif that significantly affect the vif mRNA level, and designated the sequence SA1D2prox (142 nucleotides for HIV-1 NL4-3. We then examined by extensive variant and mutagenesis analyses how SA1D2prox sequence and SLSA1 secondary structure are related to vif mRNA level. While the secondary structure and stability of SLSA1 was largely changed by nSNVs and artificial mutations introduced to restore the original NL4-3 form from altered ones by nSNVs, no clear association of the two SLSA1 properties with vif mRNA level was observed. In contrast, when naturally occurring SA1D2prox sequences that contain multiple nSNVs were examined, we attained significant inverse correlation between the vif level and SLSA1 stability. These results may suggest that SA1D2prox sequence adapts over time, and also that the altered SA1D2prox sequence, SLSA1 stability, and vif level are mutually related. In total, we show here that the entire SA1D2prox sequence and SLSA1 stability critically contribute to the modulation of vif mRNA level.
Full Text Available Bovine viral diarrhoea virus 2 (BVDV-2 strains demonstrated in cattle, sheep and adventitious contaminants of biological products were evaluated by the palindromic nucleotide substitutions (PNS method at the three variable loci (V1, V2 and V3 in the 5’ untranslated region (UTR, to determine their taxonomic status. Variation in conserved genomic sequences was used as a parameter for the epidemiological evaluation of the species in relation to geographic distribution, animal host and virulence. Four genotypes were identified within the species. Taxonomic segregation corresponded to geographic distribution of genotype variants. Genotype 2a was distributed worldwide and was also the only genotype that was circulating in sheep and cattle. Genotypes 2b, 2c and 2d were restricted to South America. Genotypes 2a and 2d were related to the contamination of biological products. Genetic variation could be related to the spread of BVDV-2 species variants in different geographic areas. Chronologically, the species emerged in North America in 1978 and spread to the United Kingdom and Japan, continental Europe, South America and New Zealand. Correlation between clinical features related with isolation of BVDV-2 strains and genetic variation indicated that subgenotype 1, variant 4 of genotype 2a, was related to a haemorrhagic syndrome. These observations suggest that the evaluation of genomic secondary structures, by identifying markers for expression of virus biological activities and species evolutionary history, may be a useful tool for the epidemiological evaluation of BVDV-2 species and possibly of other species of the genus Pestivirus.
Full Text Available Abstract Background The highly improved cognitive function is the most significant change in human evolutionary history. Recently, several large-scale studies reported the evolutionary roles of DNA methylation; however, the role of DNA methylation on brain evolution is largely unknown. Results To test if DNA methylation has contributed to the evolution of human brain, with the use of MeDIP-Chip and SEQUENOM MassARRAY, we conducted a genome-wide analysis to identify differentially methylated regions (DMRs in the brain between humans and rhesus macaques. We first identified a total of 150 candidate DMRs by the MeDIP-Chip method, among which 4 DMRs were confirmed by the MassARRAY analysis. All 4 DMRs are within or close to the CpG islands, and a MIR3 repeat element was identified in one DMR, but no repeat sequence was observed in the other 3 DMRs. For the 4 DMR genes, their proteins tend to be conserved and two genes have neural related functions. Bisulfite sequencing and phylogenetic comparison among human, chimpanzee, rhesus macaque and rat suggested several regions of lineage specific DNA methylation, including a human specific hypomethylated region in the promoter of K6IRS2 gene. Conclusions Our study provides a new angle of studying human brain evolution and understanding the evolutionary role of DNA methylation in the central nervous system. The results suggest that the patterns of DNA methylation in the brain are in general similar between humans and non-human primates, and only a few DMRs were identified.
Anton S. Sulima
Full Text Available During the initial step of the symbiosis between legumes (Fabaceae and nitrogen-fixing bacteria (rhizobia, the bacterial signal molecule known as the Nod factor (nodulation factor is recognized by plant LysM motif-containing receptor-like kinases (LysM-RLKs. The fifth chromosome of barrel medic (Medicago truncatula Gaertn. contains a cluster of paralogous LysM-RLK genes, one of which is known to participate in symbiosis. In the syntenic region of the pea (Pisum sativum L. genome, three genes have been identified: PsK1 and PsSym37, two symbiosis-related LysM-RLK genes with known sequences, and the unsequenced PsSym2 gene which presumably encodes a LysM-RLK and is associated with increased selectivity to certain Nod factors. In this work, we identified a new gene encoding a LysM-RLK, designated as PsLykX, within the Sym2 genomic region. We sequenced the first exons (corresponding to the protein receptor domain of PsSym37, PsK1, and PsLykX from a large set of pea genotypes of diverse origin. The nucleotide diversity of these fragments was estimated and groups of haplotypes for each gene were revealed. Footprints of selection pressure were detected via comparative analyses of SNP distribution across the first exons of these genes and their homologs MtLYK2, MtLYK3, and MtLYK4 from M. truncatula retrieved from the Medicago Hapmap project. Despite the remarkable similarity among all the studied genes, they exhibited contrasting selection signatures, possibly pointing to diversification of their functions. Signatures of balancing selection were found in LysM1-encoding parts of PsSym37 and PsK1, suggesting that the diversity of these parts may be important for pea LysM-RLKs. The first exons of PsSym37 and PsK1 displayed signatures of purifying selection, as well as MtLYK2 of M. truncatula. Evidence of positive selection affecting primarily LysM domains was found in all three investigated M. truncatula genes, as well as in the pea gene PsLykX. The data
Wang, Ningning; Zhang, Di; Wang, Zhenhui; Xun, Hongwei; Ma, Jian; Wang, Hui; Huang, Wei; Liu, Ying; Lin, Xiuyun; Li, Ning; Ou, Xiufang; Zhang, Chunyu; Wang, Ming-Bo; Liu, Bao
Endogenous small (sm) RNAs (primarily si- and miRNAs) are important trans/cis-acting regulators involved in diverse cellular functions. In plants, the RNA-dependent RNA polymerases (RDRs) are essential for smRNA biogenesis. It has been established that RDR2 is involved in the 24 nt siRNA-dependent RNA-directed DNA methylation (RdDM) pathway. Recent studies have suggested that RDR1 is involved in a second RdDM pathway that relies mostly on 21 nt smRNAs and functions to silence a subset of genomic loci that are usually refractory to the normal RdDM pathway in Arabidopsis. Whether and to what extent the homologs of RDR1 may have similar functions in other plants remained unknown. We characterized a loss-of-function mutant (Osrdr1) of the OsRDR1 gene in rice (Oryza sativa L.) derived from a retrotransposon Tos17 insertion. Microarray analysis identified 1,175 differentially expressed genes (5.2% of all expressed genes in the shoot-tip tissue of rice) between Osrdr1 and WT, of which 896 and 279 genes were up- and down-regulated, respectively, in Osrdr1. smRNA sequencing revealed regional alterations in smRNA clusters across the rice genome. Some of the regions with altered smRNA clusters were associated with changes in DNA methylation. In addition, altered expression of several miRNAs was detected in Osrdr1, and at least some of which were associated with altered expression of predicted miRNA target genes. Despite these changes, no phenotypic difference was identified in Osrdr1 relative to WT under normal condition; however, ephemeral phenotypic fluctuations occurred under some abiotic stress conditions. Our results showed that OsRDR1 plays a role in regulating a substantial number of endogenous genes with diverse functions in rice through smRNA-mediated pathways involving DNA methylation, and which participates in abiotic stress response.
Bai, Yongsheng; Kinne, Jeff; Ding, Lizhong; Rath, Ethan C; Cox, Aaron; Naidu, Siva Dharman
It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, the question of whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years next-generation sequencing technologies have revolutionized the field. The "Read-Split-Walk" (RSW) and "Read-Split-Run" (RSR) methods were developed to identify genome-wide non-canonical spliced regions including special events occurring in cytoplasm. As the significant amount of genome/transcriptome data such as, Encyclopedia of DNA Elements (ENCODE) project, have been generated, we have advanced a newer more memory-efficient version of the algorithm, "Read-Split-Fly" (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for further downstream biological function analysis. We used open access ENCODE project RNA-Seq data to search spliced intron sequences against the U12-type spliced intron sequence database to examine whether some events could occur as potential signatures of U12-type splicing. The check was performed by searching spliced sequences against 5'ss and 3'ss sequences from the well-known orthologous U12-type spliceosomal intron database U12DB. Preliminary results of searching 70 ENCODE samples indicated that the presence of 5'ss with U12-type signature is more frequent than U2-type and prevalent in non-canonical junctions reported by RSF. The selected spliced sequences have also been further studied using miRBase to elucidate their functionality. Preliminary results from 70 samples of ENCODE datasets show that several miRNAs are prevalent in studied ENCODE samples. Two of these are associated with many diseases as suggested in the literature. Specifically, hsa-miR-1273 and hsa-miR-548 are associated with many diseases and cancers. Our RSF pipeline is able to detect many possible junctions
Full Text Available Abstract Background Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Methods Array-comparative genomic hybridization (array-CGH was analyzed for DNA copy number profile in 40 Asian and 20 Caucasian lung cancer patients. Three methods including MetaCore analysis for disease and pathway correlations, concordance analysis between array-CGH database and the expression array database, and literature search for copy number variation genes were performed to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR, chromogenic in situ hybridization (CISH, reverse transcriptase-qPCR (RT-qPCR, and immunohistochemistry (IHC in more patients. Results We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, included gain on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3; and loss on 3p22.2-22.3 and 13q13.3 were found both in Asian and Caucasian patients. Gene validation for four genes including ARHGAP19 (10q24.1 functioning in Rho activity control, FRAT2 (10q24.1 involved in Wnt signaling, PAFAH1B1 (17p13.3 functioning in motility control, and ZNF322A (6p22.1 involved in MAPK signaling was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than the corresponding normal tissues (PP=0.06. In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian Caucasian patients demonstrated that the frequency of
Lo, Fang-Yi; Nandi, Suvobroto; Salgia, Ravi; Wang, Yi-Ching; Chang, Jer-Wei; Chang, I-Shou; Chen, Yann-Jang; Hsu, Han-Shui; Huang, Shiu-Feng Kathy; Tsai, Fang-Yu; Jiang, Shih Sheng; Kanteti, Rajani
Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Array-comparative genomic hybridization (array-CGH) was analyzed for DNA copy number profile in 40 Asian and 20 Caucasian lung cancer patients. Three methods including MetaCore analysis for disease and pathway correlations, concordance analysis between array-CGH database and the expression array database, and literature search for copy number variation genes were performed to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, included gain on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3; and loss on 3p22.2-22.3 and 13q13.3 were found both in Asian and Caucasian patients. Gene validation for four genes including ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than the corresponding normal tissues (P<0.001~P=0.06). In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian Caucasian patients demonstrated that the frequency of PAFAH1B1 protein overexpression was 68
Roshandel, Delnaz; Gubitosi-Klug, Rose; Bull, Shelley B; Canty, Angelo J; Pezzolesi, Marcus G; King, George L; Keenan, Hillary A; Snell-Bergeon, Janet K; Maahs, David M; Klein, Ronald; Klein, Barbara E K; Orchard, Trevor J; Costacou, Tina; Weedon, Michael N; Oram, Richard A; Paterson, Andrew D
The aim of this study was to identify genetic variants associated with beta cell function in type 1 diabetes, as measured by serum C-peptide levels, through meta-genome-wide association studies (meta-GWAS). We performed a meta-GWAS to combine the results from five studies in type 1 diabetes with cross-sectionally measured stimulated, fasting or random C-peptide levels, including 3479 European participants. The p values across studies were combined, taking into account sample size and direction of effect. We also performed separate meta-GWAS for stimulated (n = 1303), fasting (n = 2019) and random (n = 1497) C-peptide levels. In the meta-GWAS for stimulated/fasting/random C-peptide levels, a SNP on chromosome 1, rs559047 (Chr1:238753916, T>A, minor allele frequency [MAF] 0.24-0.26), was associated with C-peptide (p = 4.13 × 10 -8 ), meeting the genome-wide significance threshold (p C>T, MAF 0.07-0.10, p = 8.43 × 10 -8 ). In the stimulated C-peptide meta-GWAS, rs61211515 (Chr6:30100975, T/-, MAF 0.17-0.19) in the MHC region was associated with stimulated C-peptide (β [SE] = - 0.39 [0.07], p = 9.72 × 10 -8 ). rs61211515 was also associated with the rate of stimulated C-peptide decline over time in a subset of individuals (n = 258) with annual repeated measures for up to 6 years (p = 0.02). In the meta-GWAS of random C-peptide, another MHC region, SNP rs3135002 (Chr6:32668439, C>A, MAF 0.02-0.06), was associated with C-peptide (p = 3.49 × 10 -8 ). Conditional analyses suggested that the three identified variants in the MHC region were independent of each other. rs9260151 and rs3135002 have been associated with type 1 diabetes, whereas rs559047 and rs61211515 have not been associated with a risk of developing type 1 diabetes. We identified a locus on chromosome 1 and multiple variants in the MHC region, at least some of which were distinct from type 1 diabetes risk loci, that were associated with C
Full Text Available Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR to map a genome region linked to Alternaria brown spot (ABS resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.
Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis
Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.
The aneupolyploidy genome of sugarcane (Saccharum hybrids spp.) and lack of a classical genetic linkage map make genetics research most difficult for sugarcane. Whole genome sequencing and genetic characterization of sugarcane and related taxa are far behind other crops. In this study, universal PCR...
Solomon, N.M.; Ross, S.; Morgan, T.; Belsky, J.L.; Hol, F.A.; Karnes, P.; Hopwood, N.J.; Myers, S.E.; Tan, A.; Warne, G.L.; Forrest, S.M.; Thomas, P.Q.
INTRODUCTION: Array comparative genomic hybridisation (array CGH) is a powerful method that detects alteration of gene copy number with greater resolution and efficiency than traditional methods. However, its ability to detect disease causing duplications in constitutional genomic DNA has not been
Karn, Robert C; Chung, Amanda G; Laukaitis, Christina M
The Androgen-binding protein (Abp) region of the mouse genome contains 30 Abpa genes encoding alpha subunits and 34 Abpbg genes encoding betagamma subunits, their products forming dimers composed of an alpha and a betagamma subunit. We endeavored to determine how many Abp genes are expressed as proteins in tears and saliva, and as transcripts in the exocrine glands producing them. Using standard PCR, we amplified Abp transcripts from cDNA libraries of C57BL/6 mice and found fifteen Abp gene transcripts in the lacrimal gland and five in the submandibular gland. Proteomic analyses identified proteins corresponding to eleven of the lacrimal gland transcripts, all of them different from the three salivary ABPs reported previously. Our qPCR results showed that five of the six transcripts that lacked corresponding proteins are expressed at very low levels compared to those transcripts with proteins. We found 1) no overlap in the repertoires of expressed Abp paralogs in lacrimal gland/tears and salivary glands/saliva; 2) substantial sex-limited expression of lacrimal gland/tear expressed-paralogs in males but no sex-limited expression in females; and 3) that the lacrimal gland/tear expressed-paralogs are found exclusively in ancestral clades 1, 2 and 3 of the five clades described previously while the salivary glands/saliva expressed-paralogs are found only in clade 5. The number of instances of extremely low levels of transcription without corresponding protein production in paralogs specific to tears and saliva suggested the role of subfunctionalization, a derived condition wherein genes that may have been expressed highly in both glands ancestrally were down-regulated subsequent to duplication. Thus, evidence for subfunctionalization can be seen in our data and we argue that the partitioning of paralog expression between lacrimal and salivary glands that we report here occurred as the result of adaptive evolution.
Suitability of the molecular subtyping methods intergenic spacer region, direct genome restriction analysis, and pulsed-field gel electrophoresis for clinical and environmental Vibrio parahaemolyticus isolates.
Lüdeke, Catharina H M; Fischer, Markus; LaFon, Patti; Cooper, Kara; Jones, Jessica L
Vibrio parahaemolyticus is the leading cause of infectious illness associated with seafood consumption in the United States. Molecular fingerprinting of strains has become a valuable research tool for understanding this pathogen. However, there are many subtyping methods available and little information on how they compare to one another. For this study, a collection of 67 oyster and 77 clinical V. parahaemolyticus isolates were analyzed by three subtyping methods--intergenic spacer region (ISR-1), direct genome restriction analysis (DGREA), and pulsed-field gel electrophoresis (PFGE)--to determine the utility of these methods for discriminatory subtyping. ISR-1 analysis, run as previously described, provided the lowest discrimination of all the methods (discriminatory index [DI]=0.8665). However, using a broader analytical range than previously reported, ISR-1 clustered isolates based on origin (oyster versus clinical) and had a DI=0.9986. DGREA provided a DI=0.9993-0.9995, but did not consistently cluster the isolates by any identifiable characteristics (origin, serotype, or virulence genotype) and ∼ 15% of isolates were untypeable by this method. PFGE provided a DI=0.9998 when using the combined pattern analysis of both restriction enzymes, SfiI and NotI. This analysis was more discriminatory than using either enzyme pattern alone and primarily grouped isolates by serotype, regardless of strain origin (clinical or oyster) or presence of currently accepted virulence markers. These results indicate that PFGE and ISR-1 are more reliable methods for subtyping V. parahemolyticus, rather than DGREA. Additionally, ISR-1 may provide an indication of pathogenic potential; however, more detailed studies are needed. These data highlight the diversity within V. parahaemolyticus and the need for appropriate selection of subtyping methods depending on the study objectives.
Kalcheva, I D; Matsuda, Y; Plass, C; Chapman, V M
Representational difference analysis was used to identify strain-specific differences in the pseudoautosomal region (PAR) of mouse X and Y chromosomes. One second generation (C57BL/6 x Mus spretus) x Mus spretus interspecific backcross male carrying the C57BL/6 (B6) PAR was used for tester DNA. DNA from five backcross males from the same generation that were M. spretus-type for the PAR was pooled for the driver. A cloned probe designated B6-38 was recovered that is B6-specific in Southern analysis. Analysis of genomic DNA from several inbred strains of laboratory mice and diverse Mus species and subspecies identified a characteristic Pst I pattern of fragment sizes that is present only in the C57BL family of strains. Hybridization was observed with sequences in DBA/2J and to a limited extent with Mus musculus (PWK strain) and Mus castaneus DNA. No hybridization was observed in DNA of different Mus species, M. spretus, M. hortulanus, and M. caroli. Genetic analyses of B6-38 was conducted using C57BL congenic males that carry M. spretus alleles for distal X chromosome loci and the PAR and outcrosses of heterozygous congenic females with M. spretus. These analyses demonstrated that the B6-38 sequences were inherited with both the X and Y chromosome. B6-38 sequences were genetically mapped as a locus within the PAR using two interspecific backcrosses. The locus defined by B6-38 is designated DXYRp1. Preliminary analyses of recombination between the distal X chromosome gene amelogenin (Amg) and the PAR loci for either TelXY or sex chromosome association (Sxa) suggest that the locus DXYRp1 maps to the distal portion of the PAR.
Hamzah, Haider Mousa
In the microbial fuel cell (MFC) project, power generation from Shewanella oneidensis MR-1 was analyzed looking for a novel system for both energy generation and sustainability. The results suggest the possibility of generating electricity from different organic substances, which include agricultural and industrial by-products. Shewanella oneidensis MR-1 generates usable electrons at 30°C using both submerged and solid state cultures. In the MFC biocathode experiment, most of the CO2 generated at the anodic chamber was converted into bicarbonate due the activity of carbonic anhydrase (CA) of the Gluconobacter sp.33 strain. These findings demonstrate the possibility of generation of electricity while at the same time allowing the biomimetic sequestration of CO2 using bacterial CA. In the mitochondrial genomes project, the filamentous fungal species Fusarium oxysporum was used as a model. This species causes wilt of several important agricultural crops. A previous study revealed that a highly variable region (HVR) in the mitochondrial DNA (mtDNA) of three species of Fusarium contained a large, variable unidentified open reading frame (LV-uORF). Using specific primers for two regions of the LV-uORF, six strains were found to contain the ORF by PCR and database searches identified 18 other strains outside of the Fusarium oxysporum species complex. The LV-uORF was also identified in three isolates of the F. oxysporum species complex. Interestingly, several F. oxysporum isolates lack the LV-uORF and instead contain 13 ORFs in the HVR, nine of which are unidentified. The high GC content and codon usage of the LV-uORF indicate that it did not co-evolve with other mt genes and was horizontally acquired and was introduced to the Fusarium lineage prior to speciation. The nonsynonymous/synonymous (dN/dS) ratio of the LV-uORFs (0.43) suggests it is under purifying selection and the putative polypeptide is predicted to be located in the mitochondrial membrane. Growth assays
National Aeronautics and Space Administration — The whole-genome sequences of eight fungal strains that were selected for exposure to microgravity at the International Space Station are presented here. These...
Nair, Nidhi; Shoaib, Muhammad; Sørensen, Claus Storgaard
Genomic DNA is compacted into chromatin through packaging with histone and non-histone proteins. Importantly, DNA accessibility is dynamically regulated to ensure genome stability. This is exemplified in the response to DNA damage where chromatin relaxation near genomic lesions serves to promote...... access of relevant enzymes to specific DNA regions for signaling and repair. Furthermore, recent data highlight genome maintenance roles of chromatin through the regulation of endogenous DNA-templated processes including transcription and replication. Here, we review research that shows the importance...... of chromatin structure regulation in maintaining genome integrity by multiple mechanisms including facilitating DNA repair and directly suppressing endogenous DNA damage....
Earl, Ashlee M; Losick, Richard; Kolter, Roberto
Microarray-based comparative genomic hybridization (M-CGH) is a powerful method for rapidly identifying regions of genome diversity among closely related organisms. We used M-CGH to examine the genome diversity of 17 strains belonging to the nonpathogenic species Bacillus subtilis. Our M-CGH results indicate that there is considerable genetic heterogeneity among members of this species; nearly one-third of Bsu168-specific genes exhibited variability, as measured by the microarray hybridization intensities. The variable loci include those encoding proteins involved in antibiotic production, cell wall synthesis, sporulation, and germination. The diversity in these genes may reflect this organism's ability to survive in diverse natural settings.
Molineris, I.; Sales, G.
The amount of information about genomes, both in the form of complete sequences and annotations, has been exponentially increasing in the last few years. As a result there is the need for tools providing a graphical representation of such information that should be comprehensive and intuitive. Visual representation is especially important in the comparative genomics field since it should provide a combined view of data belonging to different genomes. We believe that existing tools are limited in this respect as they focus on a single genome at a time (conservation histograms) or compress alignment representation to a single dimension. We have therefore developed a web-based tool called Comparative Genome Viewer (Cgv): it integrates a bidimensional representation of alignments between two regions, both at small and big scales, with the richness of annotations present in other genome browsers. We give access to our system through a web-based interface that provides the user with an interactive representation that can be updated in real time using the mouse to move from region to region and to zoom in on interesting details.
DeLong, Edward F
The complete genome sequence of Thermoplasma acidophilum, an acid- and heat-loving archaeon, has recently been reported. Comparative genomic analysis of this 'extremophile' is providing new insights into the metabolic machinery, ecology and evolution of thermophilic archaea.
A gene-based high-resolution comparative radiation hybrid map as a framework for genome sequence assembly of a bovine chromosome 6 region associated with QTL for growth, body composition, and milk performance traits
Full Text Available Abstract Background A number of different quantitative trait loci (QTL for various phenotypic traits, including milk production, functional, and conformation traits in dairy cattle as well as growth and body composition traits in meat cattle, have been mapped consistently in the middle region of bovine chromosome 6 (BTA6. Dense genetic and physical maps and, ultimately, a fully annotated genome sequence as well as their mutual connections are required to efficiently identify genes and gene variants responsible for genetic variation of phenotypic traits. A comprehensive high-resolution gene-rich map linking densely spaced bovine markers and genes to the annotated human genome sequence is required as a framework to facilitate this approach for the region on BTA6 carrying the QTL. Results Therefore, we constructed a high-resolution radiation hybrid (RH map for the QTL containing chromosomal region of BTA6. This new RH map with a total of 234 loci including 115 genes and ESTs displays a substantial increase in loci density compared to existing physical BTA6 maps. Screening the available bovine genome sequence resources, a total of 73 loci could be assigned to sequence contigs, which were already identified as specific for BTA6. For 43 loci, corresponding sequence contigs, which were not yet placed on the bovine genome assembly, were identified. In addition, the improved potential of this high-resolution RH map for BTA6 with respect to comparative mapping was demonstrated. Mapping a large number of genes on BTA6 and cross-referencing them with map locations in corresponding syntenic multi-species chromosome segments (human, mouse, rat, dog, chicken achieved a refined accurate alignment of conserved segments and evolutionary breakpoints across the species included. Conclusion The gene-anchored high-resolution RH map (1 locus/300 kb for the targeted region of BTA6 presented here will provide a valuable platform to guide high-quality assembling and
Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou
In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.
Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya
For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...
Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun
The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV. Moreover, recombination was not the distinct cause of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.
Stephanie A Bien
Full Text Available The evaluation of less frequent genetic variants and their effect on complex disease pose new challenges for genomic research. To investigate whether epigenetic data can be used to inform aggregate rare-variant association methods (RVAM, we assessed whether variants more significantly associated with colorectal cancer (CRC were preferentially located in non-coding regulatory regions, and whether enrichment was specific to colorectal tissues.Active regulatory elements (ARE were mapped using data from 127 tissues and cell-types from NIH Roadmap Epigenomics and Encyclopedia of DNA Elements (ENCODE projects. We investigated whether CRC association p-values were more significant for common variants inside versus outside AREs, or 2 inside colorectal (CR AREs versus AREs of other tissues and cell-types. We employed an integrative epigenomic RVAM for variants with allele frequency <1%. Gene sets were defined as ARE variants within 200 kilobases of a transcription start site (TSS using either CR ARE or ARE from non-digestive tissues. CRC-set association p-values were used to evaluate enrichment of less frequent variant associations in CR ARE versus non-digestive ARE.ARE from 126/127 tissues and cell-types were significantly enriched for stronger CRC-variant associations. Strongest enrichment was observed for digestive tissues and immune cell types. CR-specific ARE were also enriched for stronger CRC-variant associations compared to ARE combined across non-digestive tissues (p-value = 9.6 × 10-4. Additionally, we found enrichment of stronger CRC association p-values for rare variant sets of CR ARE compared to non-digestive ARE (p-value = 0.029.Integrative epigenomic RVAM may enable discovery of less frequent variants associated with CRC, and ARE of digestive and immune tissues are most informative. Although distance-based aggregation of less frequent variants in CR ARE surrounding TSS showed modest enrichment, future association studies would likely
Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner
Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...
Valles, Iliana Espinoza; Vora, Gary J; Lin, Baochuan
Vibrio harveyi CAIM 1792 is a marine bacterial strain that causes mortality in farmed shrimp in north-west Mexico, and the identification of virulence genes in this strain is important for understanding its pathogenicity. The aim of this work was to compare the V. harveyi CAIM 1792 genome....... The proteome of CAIM 1792 had higher similarity to those of other V. harveyi strains (78 %) than to those of the other closely related species Vibrio owensii (67 %), Vibrio rotiferianus (63 %) and Vibrio campbellii (59 %). Pan-genome ORFans trees showed the best fit with the accepted phylogeny based on DNA......-DNA hybridization and multi-locus sequence analysis of 11 concatenated housekeeping genes. SNP analysis clustered 34/38 genomes within their accepted species. The pangenomic and SNP trees showed that V. harveyi is the most conserved of the four species studied and V. campbellii may be divided into at least three...
Swithers, Kristen S.; DiPippo, Jonathan L.; Bruce, David C.; Detter, Christopher; Tapia, Roxanne; Han, Shunsheng; Saunders, Elizabeth; Goodwin, Lynne A.; Han, James; Woyke, Tanja; Pitluck, Sam; Pennacchio, Len; Nolan, Matthew; Mikhailova, Natalia; Lykidis, Athanasios; Land, Miriam L.; Brettin, Thomas; Stetter, Karl O.; Nelson, Karen E.; Gogarten, J. Peter; Noll, Kenneth M.
Thermotoga sp. strain RQ2 is probably a strain of Thermotoga maritima. Its complete genome sequence allows for an examination of the extent and consequences of gene flow within Thermotoga species and strains. Thermotoga sp. RQ2 differs from T. maritima in its genes involved in myo-inositol metabolism. Its genome also encodes an apparent fructose phosphotransferase system (PTS) sugar transporter. This operon is also found in Thermotoga naphthophila strain RKU-10 but no other Thermotogales. These are the first reported PTS transporters in the Thermotogales. PMID:21952543
Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.
Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R
We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glyco- proteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.
Benferhat, Rima; Josse, Thibaut; Albaud, Benoit; Gentien, David; Mansuroglu, Zeyni; Marcato, Vasco; Souès, Sylvie; Le Bonniec, Bernard; Bouloy, Michèle; Bonnefoy, Eliette
Rift Valley fever virus (RVFV) is a highly pathogenic Phlebovirus that infects humans and ruminants. Initially confined to Africa, RVFV has spread outside Africa and presently represents a high risk to other geographic regions. It is responsible for high fatality rates in sheep and cattle. In humans, RVFV can induce hepatitis, encephalitis, retinitis, or fatal hemorrhagic fever. The nonstructural NSs protein that is the major virulence factor is found in the nuclei of infected cells where it associates with cellular transcription factors and cofactors. In previous work, we have shown that NSs interacts with the promoter region of the beta interferon gene abnormally maintaining the promoter in a repressed state. In this work, we performed a genome-wide analysis of the interactions between NSs and the host genome using a genome-wide chromatin immunoprecipitation combined with promoter sequence microarray, the ChIP-on-chip technique. Several cellular promoter regions were identified as significantly interacting with NSs, and the establishment of NSs interactions with these regions was often found linked to deregulation of expression of the corresponding genes. Among annotated NSs-interacting genes were present not only genes regulating innate immunity and inflammation but also genes regulating cellular pathways that have not yet been identified as targeted by RVFV. Several of these pathways, such as cell adhesion, axonal guidance, development, and coagulation were closely related to RVFV-induced disorders. In particular, we show in this work that NSs targeted and modified the expression of genes coding for coagulation factors, demonstrating for the first time that this hemorrhagic virus impairs the host coagulation cascade at the transcriptional level.
Doekes, Harmen P.; Veerkamp, Roel F.; Bijma, Piter; Hiemstra, Sipke J.; Windig, Jack J.
Background: In recent decades, Holstein-Friesian (HF) selection schemes have undergone profound changes, including the introduction of optimal contribution selection (OCS; around 2000), a major shift in breeding goal composition (around 2000) and the implementation of genomic selection (GS;
Shiga toxin-producing Escherichia coli is a foodborne and waterborne pathogen and is responsible for outbreaks of human gastroenteritis. This report documents the draft genome sequences of seven O113:H21 strains recovered from livestock, wildlife, and soil samples collected in a major agricultural r...
Zhang, Yanzhen; Ma, Ji; Yang, Bingxian; Li, Ruyi; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Zhang, Lin
Taxus chinensis var. mairei (Taxaceae) is a domestic variety of yew species in local China. This plant is one of the sources for paclitaxel, which is a promising antineoplastic chemotherapy drugs during the last decade. We have sequenced the complete nucleotide sequence of the chloroplast (cp) genome of T. chinensis var. mairei. The T. chinensis var. mairei cp genome is 129,513 bp in length, with 113 single copy genes and two duplicated genes (trnI-CAU, trnQ-UUG). Among the 113 single copy genes, 9 are intron-containing. Compared to other land plant cp genomes, the T. chinensis var. mairei cp genome has lost one of the large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperm such as Cycas revoluta and Ginkgo biloba L. Compared to related species, the gene order of T. chinensis var. mairei has a large inversion of ~110kb including 91 genes (from rps18 to accD) with gene contents unarranged. Repeat analysis identified 48 direct and 2 inverted repeats 30 bp long or longer with a sequence identity greater than 90%. Repeated short segments were found in genes rps18, rps19 and clpP. Analysis also revealed 22 simple sequence repeat (SSR) loci and almost all are composed of A or T. Copyright © 2014 Elsevier B.V. All rights reserved.
Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.; Machida, Masayuki; Hirano, Takashi
Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome. PMID:22912434
Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E; Machida, Masayuki; Hirano, Takashi
Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome.
Gandar, Frédéric; Wilkie, Gavin S.; Gatherer, Derek; Kerr, Karen; Marlier, Didier; Diez, Marianne; Marschang, Rachel E.; Mast, Jan; Dewals, Benjamin G.
vitro, and investigated the pathogenesis of strain 4295, which consists of three deletion mutants. The major findings are that (i) TeHV-3 has a novel genome structure, (ii) its closest relative is a turtle herpesvirus, (iii) it contains interleukin-10 and semaphorin genes (the first time these have been reported in an alphaherpesvirus), (iv) a sizeable region of the genome is not required for viral replication in vitro or virulence in vivo, and (v) one of the components of strain 4295, which has a deletion of 22.4 kb, exhibits properties indicating that it may serve as the starting point for an attenuated vaccine. PMID:26339050
Full Text Available The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, makes their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. 97.5% of the annotated genes had detectable homologues in C. elegans of which 60% had putative orthologues, significantly higher than previous analyses based on EST analysis. Gene density appears to be less in H. contortus than in C. elegans, with annotated H. contortus genes being an average of two-to-three times larger than their putative C. elegans orthologues due to a greater intron number and size. Synteny appears high but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid
Piskur, Jure; Langkjær, Rikke Breinhold
For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...
Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E
The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently-published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Arabidopsis thaliana population analysis reveals high plasticity of the genomic region spanning MSH2, AT3G18530 and AT3G18535 genes and provides evidence for NAHR-driven recurrent CNV events occurring in this location.
Zmienko, Agnieszka; Samelak-Czajka, Anna; Kozlowski, Piotr; Szymanska, Maja; Figlerowicz, Marek
Intraspecies copy number variations (CNVs), defined as unbalanced structural variations of specific genomic loci, ≥1 kb in size, are present in the genomes of animals and plants. A growing number of examples indicate that CNVs may have functional significance and contribute to phenotypic diversity. In the model plant Arabidopsis thaliana at least several hundred protein-coding genes might display CNV; however, locus-specific genotyping studies in this plant have not been conducted. We analyzed the natural CNVs in the region overlapping MSH2 gene that encodes the DNA mismatch repair protein, and AT3G18530 and AT3G18535 genes that encode poorly characterized proteins. By applying multiplex ligation-dependent probe amplification and droplet digital PCR we genotyped those genes in 189 A. thaliana accessions. We found that AT3G18530 and AT3G18535 were duplicated (2-14 times) in 20 and deleted in 101 accessions. MSH2 was duplicated in 12 accessions (up to 12-14 copies) but never deleted. In all but one case, the MSH2 duplications were associated with those of AT3G18530 and AT3G18535. Considering the structure of the CNVs, we distinguished 5 genotypes for this region, determined their frequency and geographical distribution. We defined the CNV breakpoints in 35 accessions with AT3G18530 and AT3G18535 deletions and tandem duplications and showed that they were reciprocal events, resulting from non-allelic homologous recombination between 99 %-identical sequences flanking these genes. The widespread geographical distribution of the deletions supported by the SNP and linkage disequilibrium analyses of the genomic sequence confirmed the recurrent nature of this CNV. We characterized in detail for the first time the complex multiallelic CNV in Arabidopsis genome. The region encoding MSH2, AT3G18530 and AT3G18535 genes shows enormous variation of copy numbers among natural ecotypes, being a remarkable example of high Arabidopsis genome plasticity. We provided the molecular
Richard A Hurt
Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL
Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
Marilyn C Cornelis
Full Text Available We report the first genome-wide association study of habitual caffeine intake. We included 47,341 individuals of European descent based on five population-based studies within the United States. In a meta-analysis adjusted for age, sex, smoking, and eigenvectors of population variation, two loci achieved genome-wide significance: 7p21 (P = 2.4 × 10(-19, near AHR, and 15q24 (P = 5.2 × 10(-14, between CYP1A1 and CYP1A2. Both the AHR and CYP1A2 genes are biologically plausible candidates as CYP1A2 metabolizes caffeine and AHR regulates CYP1A2.
Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.
Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and ...
Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham
Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
Full Text Available Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
the cell nucleus (mitochondrial and chloroplast genomes), and. (3) traits governed ... tively good embryonic development but very poor development of membranes and ... Human homologies for the type of situation described above are naturally ..... imprint; (b) New modifications of the paternal genome in germ cells of each ...
Oers, van M.M.; Vlak, J.M.
Baculovirus genomes are covalently closed circles of double stranded-DNA varying in size between 80 and 180 kilobase-pair. The genomes of more than fourty-one baculoviruses have been sequenced to date. The majority of these (37) are pathogenic to lepidopteran hosts; three infect sawflies
... this database. Top of Page Evaluation of Genomic Applications in Practice and Prevention (EGAPP™) In 2004, the Centers for Disease Control and Prevention launched the EGAPP initiative to establish and test a ... and other applications of genomic technology that are in transition from ...
Hoelzel, A Rus
Ever since its invention, the polymerase chain reaction has been the method of choice for work with ancient DNA. In an application of modern genomic methods to material from the Pleistocene, a recent study has instead undertaken to clone and sequence a portion of the ancient genome of the cave bear.
After the recent successes of genome-wide association studies (GWAS), one key challenge is to identify genetic variants that might have a significant joint effect on complex diseases but have failed to be identified individually due to weak to moderate marginal effect. One popular and effective approach is gene set based analysis, which investigates the joint effect of multiple functionally related genes (eg, pathways). However, a typical gene set analysis method is biased towards long genes, a problem that is especially severe in psychiatric diseases.
Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035
Wang, Long; Shi, Qinghua; Su, Handong; Wang, Yi; Sha, Lina; Fan, Xing; Kang, Houyang; Zhang, Haiqin; Zhou, Yonghong
The St genome is one of the most fundamental genomes in Triticeae. Repetitive sequences are widely used to distinguish different genomes or species. The primary objectives of this study were to (i) screen a new sequence that could easily distinguish the chromosome of the St genome from those of other genomes by fluorescence in situ hybridization (FISH) and (ii) investigate the genome constitution of some species that remain uncertain and controversial. We used degenerated oligonucleotide primer PCR (Dop-PCR), Dot-blot, and FISH to screen for a new marker of the St genome and to test the efficiency of this marker in the detection of the St chromosome at different ploidy levels. Signals produced by a new FISH marker (denoted St 2 -80) were present on the entire arm of chromosomes of the St genome, except in the centromeric region. On the contrary, St 2 -80 signals were present in the terminal region of chromosomes of the E, H, P, and Y genomes. No signal was detected in the A and B genomes, and only weak signals were detected in the terminal region of chromosomes of the D genome. St 2 -80 signals were obvious and stable in chromosomes of different genomes, whether diploid or polyploid. Therefore, St 2 -80 is a potential and useful FISH marker that can be used to distinguish the St genome from those of other genomes in Triticeae.
Full Text Available Apomixis, or clonal propagation through seed, is a trait identified within multiple species of the grass family (Poaceae. The genetic locus controlling apomixis in Pennisetum squamulatum (syn Cenchrus squamulatus and Cenchrus ciliaris (syn Pennisetum ciliare, buffelgrass is the apospory-specific genomic region (ASGR. Previously, the ASGR was shown to be highly conserved but inverted in marker order between P. squamulatum and C. ciliaris based on fluorescence in situ hybridization (FISH and varied in both karyotype and position of the ASGR on the ASGR-carrier chromosome among other apomictic Cenchrus/Pennisetum species. Using in silico transcript mapping and verification of physical positions of some of the transcripts via FISH, we discovered that the ASGR-carrier chromosome from P. squamulatum is collinear with chromosome 2 of foxtail millet and sorghum outside of the ASGR. The in silico ordering of the ASGR-carrier chromosome markers, previously unmapped in P. squamulatum, allowed for the identification of a backcross line with structural changes to the P. squamulatum ASGR-carrier chromosome derived from gamma irradiated pollen.
Sapkota, Sirjan; Conner, Joann A; Hanna, Wayne W; Simon, Bindu; Fengler, Kevin; Deschamps, Stéphane; Cigan, Mark; Ozias-Akins, Peggy
Apomixis, or clonal propagation through seed, is a trait identified within multiple species of the grass family (Poaceae). The genetic locus controlling apomixis in Pennisetum squamulatum (syn Cenchrus squamulatus) and Cenchrus ciliaris (syn Pennisetum ciliare, buffelgrass) is the apospory-specific genomic region (ASGR). Previously, the ASGR was shown to be highly conserved but inverted in marker order between P. squamulatum and C. ciliaris based on fluorescence in situ hybridization (FISH) and varied in both karyotype and position of the ASGR on the ASGR-carrier chromosome among other apomictic Cenchrus/Pennisetum species. Using in silico transcript mapping and verification of physical positions of some of the transcripts via FISH, we discovered that the ASGR-carrier chromosome from P. squamulatum is collinear with chromosome 2 of foxtail millet and sorghum outside of the ASGR. The in silico ordering of the ASGR-carrier chromosome markers, previously unmapped in P. squamulatum, allowed for the identification of a backcross line with structural changes to the P. squamulatum ASGR-carrier chromosome derived from gamma irradiated pollen.
Bakker, Freek T.; Lei, Di; Yu, Jiaying
Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...
Full Text Available Abstract Background The superfamily of serine proteinase inhibitors (serpins is involved in numerous fundamental biological processes as inflammation, blood coagulation and apoptosis. Our interest is focused on the SERPINA3 sub-family. The major human plasma protease inhibitor, α1-antichymotrypsin, encoded by the SERPINA3 gene, is homologous to genes organized in clusters in several mammalian species. However, although there is a similar genic organization with a high degree of sequence conservation, the reactive-centre-loop domains, which are responsible for the protease specificity, show significant divergences. Results We provide additional information by analyzing the situation of SERPINA3 in the bovine genome. A cluster of eight genes and one pseudogene sharing a high degree of identity and the same structural organization was characterized. Bovine SERPINA3 genes were localized by radiation hybrid mapping on 21q24 and only spanned over 235 Kilobases. For all these genes, we propose a new nomenclature from SERPINA3-1 to SERPINA3-8. They share approximately 70% of identity with the human SERPINA3 homologue. In the cluster, we described an original sub-group of six members with an unexpected high degree of conservation for the reactive-centre-loop domain, suggesting a similar peptidase inhibitory pattern. Preliminary expression analyses of these bovSERPINA3s showed different tissue-specific patterns and diverse states of glycosylation and phosphorylation. Finally, in the context of phylogenetic analyses, we improved our knowledge on mammalian SERPINAs evolution. Conclusion Our experimental results update data of the bovine genome sequencing, substantially increase the bovSERPINA3 sub-family and enrich the phylogenetic tree of serpins. We provide new opportunities for future investigations to approach the biological functions of this unusual subset of serine proteinase inhibitors.
Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan
An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To
Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...
The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...
Role of heteroplasmic mutations in the mitochondrial genome and the ID4 gene promoter methylation region in the pathogenesis of chronic aplastic anemia in patients suffering from Kidney yin deficiency.
Cui, Xing; Wang, Jing-Yi; Liu, Kui; Cui, Si-Yuan; Zhang, Jie; Luo, Ya-Qin; Wang, Xin
To analyze changes in gene amplification in the mitochondrial genome and in the ID4 gene promoter methylation region in patients with chronic aplastic anemia (CAA) suffering from Kidney (Shen) yin deficiency or Kidney yang deficiency. Bone marrow and oral epithelium samples were collected from CAA patients with Kidney yin deficiency or Kidney yang deficiency (20 cases). Bone marrow samples were collected from 20 healthy volunteers. The mitochondrial genome was amplified by polymerase chain reaction (PCR), and PCR products were used for sequencing and analysis. Higher mutational rates were observed in the ND1-2, ND4-6, and CYTB genes in CAA patients suffering from Kidney yin deficiency. Moreover, the ID4 gene was unmethylated in bone marrow samples from healthy individuals, but was methylated in some CAA patients suffering from Kidney yin deficiency (positive rate, 60%) and Kidney yang deficiency (positive rate, 55%). These data supported that gene mutations can alter the expression of respiratory chain enzyme complexes in CAA patients, resulting in energy metabolism impairment and promoting the physiological and pathological processes of hematopoietic failure. Functional impairment of the mitochondrial respiration chain induced by gene mutation may be an important reason for hematopoietic failure in patients with CAA. This change is closely related to maternal inheritance and Kidney yin deficiency. Finally, these data supported the assertion that it is easy to treat disease in patients suffering from yang deficiency and difficult to treat disease in patients suffering from yin deficiency.
Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R.; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R.; Mullikin, James C.; Meader, Stephen J.; Ponting, Chris P.; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E.; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M.; Fischer, Anne; Ptak, Susan E.; Lachmann, Michael; Symer, David E.; Mailund, Thomas; Schierup, Mikkel H.; Andrés, Aida M.; Kelso, Janet; Pääbo, Svante
Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours1–4, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other. PMID:22722832
Full Text Available Abstract Background Carcass fatness is an important trait in most pig breeding programs. Following market requests, breeding plans for fresh pork consumption are usually designed to reduce carcass fat content and increase lean meat deposition. However, the Italian pig industry is mainly devoted to the production of Protected Designation of Origin dry cured hams: pigs are slaughtered at around 160 kg of live weight and the breeding goal aims at maintaining fat coverage, measured as backfat thickness to avoid excessive desiccation of the hams. This objective has shaped the genetic pool of Italian heavy pig breeds for a few decades. In this study we applied a selective genotyping approach within a population of ~ 12,000 performance tested Italian Large White pigs. Within this population, we selectively genotyped 304 pigs with extreme and divergent backfat thickness estimated breeding value by the Illumina PorcineSNP60 BeadChip and performed a genome wide association study to identify loci associated to this trait. Results We identified 4 single nucleotide polymorphisms with P≤5.0E-07 and additional 119 ones with 5.0E-07 Conclusions Further investigations are needed to evaluate the effects of the identified single nucleotide polymorphisms associated with backfat thickness on other traits as a pre-requisite for practical applications in breeding programs. Reported results could improve our understanding of the biology of fat metabolism and deposition that could also be relevant for other mammalian species including humans, confirming the role of neuronal genes on obesity.
Albertin, Caroline B.; Bonnaud, Laure; Brown, C. Titus
The Cephalopod Sequencing Consortium (CephSeq Consortium) was established at a NESCent Catalysis Group Meeting, ``Paths to Cephalopod Genomics-Strategies, Choices, Organization,'' held in Durham, North Carolina, USA on May 24-27, 2012. Twenty-eight participants representing nine countries (Austria......, Australia, China, Denmark, France, Italy, Japan, Spain and the USA) met to address the pressing need for genome sequencing of cephalopod mollusks. This group, drawn from cephalopod biologists, neuroscientists, developmental and evolutionary biologists, materials scientists, bioinformaticians and researchers...... active in sequencing, assembling and annotating genomes, agreed on a set of cephalopod species of particular importance for initial sequencing and developed strategies and an organization (CephSeq Consortium) to promote this sequencing. The conclusions and recommendations of this meeting are described...
Biological and immunological characterization of recombinant Yellow Fever 17D Viruses expressing a Trypanosoma cruzi Amastigote Surface Protein-2 CD8+ T cell epitope at two distinct regions of the genome
Bonaldo Myrna C
Full Text Available Abstract Background The attenuated Yellow fever (YF 17D vaccine virus is one of the safest and most effective viral vaccines administered to humans, in which it elicits a polyvalent immune response. Herein, we used the YF 17D backbone to express a Trypanosoma cruzi CD8+ T cell epitope from the Amastigote Surface Protein 2 (ASP-2 to provide further evidence for the potential of this virus to express foreign epitopes. The TEWETGQI CD8+ T cell epitope was cloned and expressed based on two different genomic insertion sites: in the fg loop of the viral Envelope protein and the protease cleavage site between the NS2B and NS3. We investigated whether the site of expression had any influence on immunogenicity of this model epitope. Results Recombinant viruses replicated similarly to vaccine virus YF 17D in cell culture and remained genetically stable after several serial passages in Vero cells. Immunogenicity studies revealed that both recombinant viruses elicited neutralizing antibodies to the YF virus as well as generated an antigen-specific gamma interferon mediated T-cell response in immunized mice. The recombinant viruses displayed a more attenuated phenotype than the YF 17DD vaccine counterpart in mice. Vaccination of a mouse lineage highly susceptible to infection by T. cruzi with a homologous prime-boost regimen of recombinant YF viruses elicited TEWETGQI specific CD8+ T cells which might be correlated with a delay in mouse mortality after a challenge with a lethal dose of T. cruzi. Conclusions We conclude that the YF 17D platform is useful to express T. cruzi (Protozoan antigens at different functional regions of its genome with minimal reduction of vector fitness. In addition, the model T. cruzi epitope expressed at different regions of the YF 17D genome elicited a similar T cell-based immune response, suggesting that both expression sites are useful. However, the epitope as such is not protective and it remains to be seen whether expression
Sánchez-Betancourt, J I; Cervantes-Torres, J B; Saavedra-Montañez, M; Segura-Velázquez, R A
The aim of this study was to perform the complete genome sequence of a swine influenza A H1N2 virus strain isolated from a pig in Guanajuato, México (A/swine/Mexico/GtoDMZC01/2014) and to report its seroprevalence in 86 counties at the Central Bajio zone. To understand the evolutionary dynamics of the isolate, we undertook a phylogenetic analysis of the eight gene segments. These data revealed that the isolated virus is a reassortant H1N2 subtype, as its genes are derived from human (HA, NP, PA) and swine (M, NA, PB1, PB2 and NS) influenza viruses. Pig serum samples were analysed by the hemagglutination inhibition test, using wild H1N2 and H3N2 strains (A/swine/México/Mex51/2010 [H3N2]) as antigen sources. Positive samples to the H1N2 subtype were processed using the field-isolated H1N1 subtype (A/swine/México/Ver37/2010 [H1N1]). Seroprevalence to the H1N2 subtype was 26.74% in the sampled counties, being Jalisco the state with highest seroprevalence to this subtype (35.30%). The results herein reported demonstrate that this new, previously unregistered influenza virus subtype in México that shows internal genes from other swine viral subtypes isolated in the past 5 years, along with human virus-originated genes, is widely distributed in this area of the country. © 2017 Blackwell Verlag GmbH.
Sato, Shusei; Andersen, Stig Uggerhøj
The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...
Home; Journals; Resonance – Journal of Science Education; Volume 11; Issue 8. Comparative Genomics - A Powerful New Tool in Biology. Anand K Bachhawat. General Article Volume 11 Issue 8 August 2006 pp 22-40. Fulltext. Click here to view fulltext PDF. Permanent link:
Oct 19, 2009 ... sativa) and Chinese cabbage (Brassica rapa) genomes. The genome ... tant staple food for a large part of the world's human population. .... some banding region for selection and the overview panel shows the location of ...
Rodríguez-Blanco, Arturo; Lemos, Manuel L; Osorio, Carlos R
Integrating conjugative elements (ICEs) of the SXT/R391 family have been identified in fish-isolated bacterial strains collected from marine aquaculture environments of the northwestern Iberian Peninsula. Here we analysed the variable regions of two ICEs, one preliminarily characterised in a previous study (ICEVscSpa3) and one newly identified (ICEPspSpa1). Bacterial strains harboring these ICEs were phylogenetically assigned to Vibrio scophthalmi and Pseudoalteromonas sp., thus constituting the first evidence of SXT/R391-like ICEs in the genus Pseudoalteromonas to date. Variable DNA regions, which confer element-specific properties to ICEs of this family, were characterised. Interestingly, the two ICEs contained 29 genes not found in variable DNA insertions of previously described ICEs. Most notably, variable gene content for ICEVscSpa3 showed similarity to genes potentially involved in housekeeping functions of replication, nucleotide metabolism and transcription. For these genes, closest homologues were found clustered in the genome of Pseudomonas psychrotolerans L19, suggesting a transfer as a block to ICEVscSpa3. Genes encoding antibiotic resistance, restriction modification systems and toxin/antitoxin systems were absent from hotspots of ICEVscSpa3. In contrast, the variable gene content of ICEPspSpa1 included genes involved in restriction/modification functions in two different hotspots and genes related to ICE maintenance. The present study unveils a relatively large number of novel genes in SXT/R391-ICEs, and demonstrates the major role of ICE elements as contributors to horizontal gene transfer.
Cardone Maria Francesca
Full Text Available Grapevine is one of the most important crop plants in the world. Recently there was great expansion of genomics resources about grapevine genome, thus providing increasing efforts for molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs on plant genomes, few data are available on copy number variation (CNV. Furthermore association between structural variations and phenotypes has been described in only a few cases. We combined high throughput biotechnologies and bioinformatics tools, to reveal the first inter-varietal atlas of structural variation (SV for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We detected roughly 8% of the grapevine genome affected by genomic variations. Taken into account phenotypic differences existing among the studied varieties we performed comparison of SVs among them and the reference and next we performed an in-depth analysis of gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.
Construction of a high-density DArTseq SNP-based genetic map and identification of genomic regions with segregation distortion in a genetic population derived from a cross between feral and cultivated-type watermelon.
Ren, Runsheng; Ray, Rumiana; Li, Pingfang; Xu, Jinhua; Zhang, Man; Liu, Guang; Yao, Xiefeng; Kilian, Andrzej; Yang, Xingping
Watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai] is an economically important vegetable crop grown extensively worldwide. To facilitate the identification of agronomically important traits and provide new information for genetic and genomic research on this species, a high-density genetic linkage map of watermelon was constructed using an F2 population derived from a cross between elite watermelon cultivar K3 and wild watermelon germplasm PI 189225. Based on a sliding window approach, a total of 1,161 bin markers representing 3,465 SNP markers were mapped onto 11 linkage groups corresponding to the chromosome pair number of watermelon. The total length of the genetic map is 1,099.2 cM, with an average distance between bins of 1.0 cM. The number of markers in each chromosome varies from 62 in chromosome 07 to 160 in chromosome 05. The length of individual chromosomes ranged between 61.8 cM for chromosome 07 and 140.2 cM for chromosome 05. A total of 616 SNP bin markers showed significant (P watermelon cultivar K3 allele and 103 were skewed toward PI 189225. The number of SNPs and InDels per Mb varied considerably across the segregation distorted regions (SDRs) on each chromosome, and a mixture of dense and sparse SNPs and InDel SDRs coexisted on some chromosomes suggesting that SDRs were randomly distributed throughout the genome. Recombination rates varied greatly among each chromosome, from 2.0 to 4.2 centimorgans per megabase (cM/Mb). An inconsistency was found between the genetic and physical positions on the map for a segment on chromosome 11. The high-density genetic map described in the present study will facilitate fine mapping of quantitative trait loci, the identification of candidate genes, map-based cloning, as well as marker-assisted selection (MAS) in watermelon breeding programs.
Full Text Available Incoming papillomaviruses (PVs depend on mitotic nuclear envelope breakdown to gain initial access to the nucleus for viral transcription and replication. In our previous work, we hypothesized that the minor capsid protein L2 of PVs tethers the incoming vDNA to mitotic chromosomes to direct them into the nascent nuclei. To re-evaluate how dynamic L2 recruitment to cellular chromosomes occurs specifically during prometaphase, we developed a quantitative, microscopy-based assay for measuring the degree of chromosome recruitment of L2-EGFP. Analyzing various HPV16 L2 truncation-mutants revealed a central chromosome-binding region (CBR of 147 amino acids that confers binding to mitotic chromosomes. Specific mutations of conserved motifs (IVAL286AAAA, RR302/5AA, and RTR313EEE within the CBR interfered with chromosomal binding. Moreover, assembly-competent HPV16 containing the chromosome-binding deficient L2(RTR313EEE or L2(IVAL286AAAA were inhibited for infection despite their ability to be transported to intracellular compartments. Since vDNA and L2 were not associated with mitotic chromosomes either, the infectivity was likely impaired by a defect in tethering of the vDNA to mitotic chromosomes. However, L2 mutations that abrogated chromatin association also compromised translocation of L2 across membranes of intracellular organelles. Thus, chromatin recruitment of L2 may in itself be a requirement for successful penetration of the limiting membrane thereby linking both processes mechanistically. Furthermore, we demonstrate that the association of L2 with mitotic chromosomes is conserved among the alpha, beta, gamma, and iota genera of Papillomaviridae. However, different binding patterns point to a certain variance amongst the different genera. Overall, our data suggest a common strategy among various PVs, in which a central region of L2 mediates tethering of vDNA to mitotic chromosomes during cell division thereby coordinating membrane
Full Text Available A previous study provided an in-depth understanding of molecular population genetics of European and Asian wheat gene pools using a sequenced 3.1-Mb contig (ctg954 on chromosome 3BS. This region is believed to carry the Fhb1 gene for response to Fusarium head blight. In this study, 266 wheat accessions were evaluated in three environments for Type II FHB response based on the single floret inoculation method. Hierarchical clustering (UPGMA based on a Manhattan dissimilarity matrix divided the accessions into eight groups according to five FHB-related traits which have a high correlation between them; Group VIII comprised six accessions with FHB response levels similar to variety Sumai 3. Based on the compressed mixed linear model (MLM, association analysis between five FHB-related traits and 42 molecular markers along the 3.1-Mb region revealed 12 significant association signals at a threshold of P0.1 and P0.05 within each HapB at r(2>0.1 and P<0.001 showed significant differences between the Hap carried by FHB resistant resources, such as Sumai 3 and Wangshuibai, and susceptible genotypes in HapB3 and HapB6. These results suggest that Fhb1 is located within HapB6, with the possibility that another gene is located at or near HapB3. SSR markers and Haps detected in this study will be helpful in further understanding the genetic basis of FHB resistance, and provide useful information for marker-assisted selection of Fhb1 in wheat breeding.
Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun
Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712
Gurwitz, David; Bregman-Eschet, Yael
New companies offering personal whole-genome information services over the internet are dynamic and highly visible players in the personal genomics field. For fees currently ranging from US$399 to US$2500 and a vial of saliva, individuals can now purchase online access to their individual genetic information regarding susceptibility to a range of chronic diseases and phenotypic traits based on a genome-wide SNP scan. Most of the companies offering such services are based in the United States, but their clients may come from nearly anywhere in the world. Although the scientific validity, clinical utility and potential future implications of such services are being hotly debated, several ethical and regulatory questions related to direct-to-consumer (DTC) marketing strategies of genetic tests have not yet received sufficient attention. For example, how can we minimize the risk of unauthorized third parties from submitting other people's DNA for testing? Another pressing question concerns the ownership of (genotypic and phenotypic) information, as well as the unclear legal status of customers regarding their own personal information. Current legislation in the US and Europe falls short of providing clear answers to these questions. Until the regulation of personal genomics services catches up with the technology, we call upon commercial providers to self-regulate and coordinate their activities to minimize potential risks to individual privacy. We also point out some specific steps, along the trustee model, that providers of DTC personal genomics services as well as regulators and policy makers could consider for addressing some of the concerns raised below.
Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan
mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...
Myburg, Alexander A.; Grattapaglia, Dario; Tuskan, Gerald A.; Hellsten, Uffe; Hayes, Richard D.; Grimwood, Jane; Jenkins, Jerry; Lindquist, Erika; Tice, Hope; Bauer, Diane; Goodstein, David M.; Dubchak, Inna; Poliakov, Alexandre; Mizrachi, Eshchar; Kullan, Anand R. K.; Hussey, Steven G.; Pinard, Desre; van der Merwe, Karen; Singh, Pooja; van Jaarsveld, Ida; Silva-Junior, Orzenil B.; Togawa, Roberto C.; Pappas, Marilia R.; Faria, Danielle A.; Sansaloni, Carolina P.; Petroli, Cesar D.; Yang, Xiaohan; Ranjan, Priya; Tschaplinski, Timothy J.; Ye, Chu-Yu; Li, Ting; Sterck, Lieven; Vanneste, Kevin; Murat, Florent; Soler, Marçal; Clemente, Hélène San; Saidi, Naijib; Cassan-Wang, Hua; Dunand, Christophe; Hefer, Charles A.; Bornberg-Bauer, Erich; Kersting, Anna R.; Vining, Kelly; Amarasinghe, Vindhya; Ranik, Martin; Naithani, Sushma; Elser, Justin; Boyd, Alexander E.; Liston, Aaron; Spatafora, Joseph W.; Dharmwardhana, Palitha; Raja, Rajani; Sullivan, Christopher; Romanel, Elisson; Alves-Ferreira, Marcio; Külheim, Carsten; Foley, William; Carocha, Victor; Paiva, Jorge; Kudrna, David; Brommonschenkel, Sergio H.; Pasquali, Giancarlo; Byrne, Margaret; Rigault, Philippe; Tibbits, Josquin; Spokevicius, Antanas; Jones, Rebecca C.; Steane, Dorothy A.; Vaillancourt, René E.; Potts, Brad M.; Joubert, Fourie; Barry, Kerrie; Pappas, Georgios J.; Strauss, Steven H.; Jaiswal, Pankaj; Grima-Pettenati, Jacqueline; Salse, Jérôme; Van de Peer, Yves; Rokhsar, Daniel S.; Schmutz, Jeremy
Eucalypts are the world s most widely planted hardwood trees. Their broad adaptability, rich species diversity, fast growth and superior multipurpose wood, have made them a global renewable resource of fiber and energy that mitigates human pressures on natural forests. We sequenced and assembled >94% of the 640 Mbp genome of Eucalyptus grandis into its 11 chromosomes. A set of 36,376 protein coding genes were predicted revealing that 34% occur in tandem duplications, the largest proportion found thus far in any plant genome. Eucalypts also show the highest diversity of genes for plant specialized metabolism that act as chemical defence against biotic agents and provide unique pharmaceutical oils. Resequencing of a set of inbred tree genomes revealed regions of strongly conserved heterozygosity, likely hotspots of inbreeding depression. The resequenced genome of the sister species E. globulus underscored the high inter-specific genome colinearity despite substantial genome size variation in the genus. The genome of E. grandis is the first reference for the early diverging Rosid order Myrtales and is placed here basal to the Eurosids. This resource expands knowledge on the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.
Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L
A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycoplasma tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427
Kerkhoven, R.; Enckevort, F.H.J. van; Boekhorst, J.; Molenaar, D; Siezen, R.J.
SUMMARY: A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a
Full Text Available The completion and release of the Brassica rapa genome is of great benefit to researchers of the Brassicas, Arabidopsis, and genome evolution. While its lineage is closely related to the model organism Arabidopsis thaliana, the Brassicas experienced a whole genome triplication subsequent to their divergence. This event contemporaneously created three copies of its ancestral genome, which had diploidized through the process of homeologous gene loss known as fractionation. By the fractionation of homeologous gene content and genetic regulatory binding sites, Brassica’s genome is well placed to use comparative genomic techniques to identify syntenic regions, homeologous gene duplications, and putative regulatory sequences. Here, we use the comparative genomics platform CoGe to perform several different genomic analyses with which to study structural changes of its genome and dynamics of various genetic elements. Starting with whole genome comparisons, the Brassica paleohexaploidy is characterized, syntenic regions with Arabidopsis thaliana are identified, and the TOC1 gene in the circadian rhythm pathway from Arabidopsis thaliana is used to find duplicated orthologs in Brassica rapa. These TOC1 genes are further analyzed to identify conserved noncoding sequences that contain cis-acting regulatory elements and promoter sequences previously implicated in circadian rhythmicity. Each 'cookbook style' analysis includes a step-by-step walkthrough with links to CoGe to quickly reproduce each step of the analytical process.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000
Zhang, Wei; Zhang, Mingyi; Zhu, Xianwen; Cao, Yaping; Sun, Qing; Ma, Guojia; Chao, Shiaoman; Yan, Changhui; Xu, Steven S; Cai, Xiwen
This work pinpointed the goatgrass chromosomal segment in the wheat B genome using modern cytogenetic and genomic technologies, and provided novel insights into the origin of the wheat B genome. Wheat is a typical allopolyploid with three homoeologous subgenomes (A, B, and D). The donors of the subgenomes A and D had been identified, but not for the subgenome B. The goatgrass Aegilops speltoides (genome SS) has been controversially considered a possible candidate for the donor of the wheat B genome. However, the relationship of the Ae. speltoides S genome with the wheat B genome remains largely obscure. The present study assessed the homology of the B and S genomes using an integrative cytogenetic and genomic approach, and revealed the contribution of Ae. speltoides to the origin of the wheat B genome. We discovered noticeable homology between wheat chromosome 1B and Ae. speltoides chromosome 1S, but not between other chromosomes in the B and S genomes. An Ae. speltoides-originated segment spanning a genomic region of approximately 10.46 Mb was detected on the long arm of wheat chromosome 1B (1BL). The Ae. speltoides-originated segment on 1BL was found to co-evolve with the rest of the B genome. Evidently, Ae. speltoides had been involved in the origin of the wheat B genome, but should not be considered an exclusive donor of this genome. The wheat B genome might have a polyphyletic origin with multiple ancestors involved, including Ae. speltoides. These novel findings will facilitate genome studies in wheat and other polyploids.
Jan 28, 2015 ... The mutational bias hypothesis predicted that genes in the GC-rich regions of the genome ... observed codon divided by its expected frequency at equilibrium. An RSCU value close to 1 indicates lack of bias, ..... study our results points to preferred usage of both C or G and A or T at the synonyms sites as ...
Der Sarkissian, Clio; Allentoft, Morten Erik; Avila Arcos, Maria del Carmen
throughput of next generation sequencing platforms and the ability to target short and degraded DNA molecules. Many ancient specimens previously unsuitable for DNA analyses because of extensive degradation can now successfully be used as source materials. Additionally, the analytical power obtained...... by increasing the number of sequence reads to billions effectively means that contamination issues that have haunted aDNA research for decades, particularly in human studies, can now be efficiently and confidently quantified. At present, whole genomes have been sequenced from ancient anatomically modern humans...
Oliveira Ribeiro, Ângela Maria; Foote, Andrew David; Kupczok, Anne
Marine ecosystems occupy 71% of the surface of our planet, yet we know little about their diversity. Although the inventory of species is continually increasing, as registered by the Census of Marine Life program, only about 10% of the estimated two million marine species are known. This lag......-throughput sequencing approaches have been helping to improve our knowledge of marine biodiversity, from the rich microbial biota that forms the base of the tree of life to a wealth of plant and animal species. In this review, we present an overview of the applications of genomics to the study of marine life, from...
Full Text Available Major advances in wheat production are needed to address global food insecurity under future climate conditions, such as high temperatures. The grain yield of bread wheat (Triticum aestivum L. is a quantitatively inherited complex trait that is strongly influenced by interacting genetic and environmental factors. Here, we conducted global QTL analysis for five yield-related traits, including spike yield, yield components and plant height (PH, in the Nongda3338/Jingdong6 doubled haploid (DH population using a high-density SNP and SSR-based genetic map. A total of 12 major genomic regions with stable QTL controlling yield-related traits were detected on chromosomes 1B, 2A, 2B, 2D, 3A, 4A, 4B, 4D, 5A, 6A, and 7A across 12 different field trials with timely sown (normal and late sown (heat stress conditions. Co-location of yield components revealed significant tradeoffs between thousand grain weight (TGW and grain number per spike (GNS on chromosome 4A. Dissection of a “QTL-hotspot” region for grain weight on chromosome 4B was helpful in marker-assisted selection (MAS breeding. Moreover, this study identified a novel QTL for heat susceptibility index of thousand grain weight (HSITGW on chromosome 4BL that explains approximately 10% of phenotypic variation. QPh.cau-4B.2, QPh.cau-4D.1 and QPh.cau-2D.3 were coincident with the dwarfing genes Rht1, Rht2, and Rht8, and haplotype analysis revealed their pleiotropic architecture with yield components. Overall, our findings will be useful for elucidating the genetic architecture of yield-related traits and developing new wheat varieties with high and stable yield.
Sørensen, Peter; Edwards, Stefan McKinnon; Rohde, Palle Duun
-additive genetic mechanisms. These modeling approaches have proven to be highly useful to determine population genetic parameters as well as prediction of genetic risk or value. We present a series of statistical modelling approaches that use prior biological information for evaluating the collective action......Whole-genome sequences and multiple trait phenotypes from large numbers of individuals will soon be available in many populations. Well established statistical modeling approaches enable the genetic analyses of complex trait phenotypes while accounting for a variety of additive and non...... regions and gene ontologies) that provide better model fit and increase predictive ability of the statistical model for this trait....
Liviu C. Andrei
Full Text Available Sustained development is a concept associating other concepts, in its turn, in the EU practice, e.g. regionalism, regionalizing and afferent policies, here including structural policies. This below text, dedicated to integration concepts, will limit on the other hand to regionalizing, otherwise an aspect typical to Europe and to the EU. On the other hand, two aspects come up to strengthen this field of ideas, i.e. the region (al-regionalism-(regional development triplet has either its own history or precise individual outline of terms.
Lynch, M; Blanchard, J L
It is well established on theoretical grounds that the accumulation of mildly deleterious mutations in nonrecombining genomes is a major extinction risk in obligately asexual populations. Sexual populations can also incur mutational deterioration in genomic regions that experience little or no recombination, i.e., autosomal regions near centromeres, Y chromosomes, and organelle genomes. Our results suggest, for a wide array of genes (transfer RNAs, ribosomal RNAs, and proteins) in a diverse collection of species (animals, plants, and fungi), an almost universal increase in the fixation probabilities of mildly deleterious mutations arising in mitochondrial and chloroplast genomes relative to those arising in the recombining nuclear genome. This enhanced width of the selective sieve in organelle genomes does not appear to be a consequence of relaxed selection, but can be explained by the decline in the efficiency of selection that results from the reduction of effective population size induced by uniparental inheritance. Because of the very low mutation rates of organelle genomes (on the order of 10(-4) per genome per year), the reduction in fitness resulting from mutation accumulation in such genomes is a very long-term process, not likely to imperil many species on time scales of less than a million years, but perhaps playing some role in phylogenetic lineage sorting on time scales of 10 to 100 million years.
Choi, Jae Young; Bubnell, Jaclyn E.; Aquadro, Charles F.
Coevolution between Drosophila and its endosymbiont Wolbachia pipientis has many intriguing aspects. For example, Drosophila ananassae hosts two forms of W. pipientis genomes: One being the infectious bacterial genome and the other integrated into the host nuclear genome. Here, we characterize the infectious and integrated genomes of W. pipientis infecting D. ananassae (wAna), by genome sequencing 15 strains of D. ananassae that have either the infectious or integrated wAna genomes. Results indicate evolutionarily stable maternal transmission for the infectious wAna genome suggesting a relatively long-term coevolution with its host. In contrast, the integrated wAna genome showed pseudogene-like characteristics accumulating many variants that are predicted to have deleterious effects if present in an infectious bacterial genome. Phylogenomic analysis of sequence variation together with genotyping by polymerase chain reaction of large structural variations indicated several wAna variants among the eight infectious wAna genomes. In contrast, only a single wAna variant was found among the seven integrated wAna genomes examined in lines from Africa, south Asia, and south Pacific islands suggesting that the integration occurred once from a single infectious wAna genome and then spread geographically. Further analysis revealed that for all D. ananassae we examined with the integrated wAna genomes, the majority of the integrated wAna genomic regions is represented in at least two copies suggesting a double integration or single integration followed by an integrated genome duplication. The possible evolutionary mechanism underlying the widespread geographical presence of the duplicate integration of the wAna genome is an intriguing question remaining to be answered. PMID:26254486
Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long
In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies etc. etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect centromeres, intergenic regions, transposable elements, gene family number is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.
Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael
Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: firstname.lastname@example.org.
Fernandez-Lozano, Carlos; Lopez-Campos, Guillermo; Seoane, Jose A; Lopez-Alonso, Victoria; Dorado, Julian; Martín-Sanchez, Fernando; Pazos, Alejandro
The development of personalized medicine is tightly linked with the correct exploitation of molecular data, especially those associated with the genome sequence along with these use of genomic data there is an increasing demand to share these data for research purposes. Transition of clinical data to research is based in the anonymization of these data so the patient cannot be identified, the use of genomic data poses a great challenge because its nature of identifying data. In this work we have analyzed current methods for genome anonymization and propose a one way encryption method that may enable the process of genomic data sharing accessing only to certain regions of genomes for research purposes.
Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc
Genomic selection is widely used in both animal and plant species, however, it is performed with no input from known genomic or biological role of genetic variants and therefore is a black box approach in a genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...
Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Kooij, Taco W.A.
The aim of the studies described in this thesis was to investigate the genome organization of rodent malaria parasites (RMPs) and compare the organization and gene content of the genomes of RMPs and the human malaria parasite P. falciparum. The release of the complete genome sequence of P.
Funding Opportunity CCG, Funding Opportunity Center for Cancer Genomics, CCG, Center for Cancer Genomics, CCG RFA, Center for cancer genomics rfa, genomic data analysis network, genomic data analysis network centers,
Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai
Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economical caridean shrimp, is a potential ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large complex genome (5.73 Gb), with size twice higher than those of many decapod shrimps. As such, comparative genomic analyses were implied to investigate factors affecting genome size evolution of decapods. However, clues associated with genome duplication were not identified, and few horizontally transferred sequences were detected. Ultimately, the burst of transposable elements, especially retrotransposons, was determined as the major factor influencing genome expansion. A total of 2 Gb repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that significantly expanded. Both recent (Jockey and Gypsy) and ancestral (DIRS) originated retrotransposons responsible for the genome evolution. The E. carinicauda genome also exhibited potential for the genomic and experimental research of shrimps.
Vilella Albert J
Full Text Available Abstract Background DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. Results We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i exhaustive population-genetic analyses including those based on the coalescent theory; ii analysis adapted to the shallow data generated by the high-throughput genome projects; iii use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v visualization of the results integrated with current genome annotations in commonly available genome browsers. Conclusion VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Flores, Margarita; Morales, Lucía; Gonzaga-Jauregui, Claudia
Several lines of evidence suggest that reiterated sequences in the human genome are targets for nonallelic homologous recombination (NAHR), which facilitates genomic rearrangements. We have used a PCR-based approach to identify breakpoint regions of rearranged structures in the human genome...... to human genomic variation is discussed........ In particular, we have identified intrachromosomal identical repeats that are located in reverse orientation, which may lead to chromosomal inversions. A bioinformatic workflow pathway to select appropriate regions for analysis was developed. Three such regions overlapping with known human genes, located...
Flannery, Maura C.
Points out the importance of genomes other than the human genome project and provides information on the identified bacterial genomes Pseudomonas aeuroginosa, Leprosy, Cholera, Meningitis, Tuberculosis, Bubonic Plague, and plant pathogens. Considers the computer's use in genome studies. (Contains 14 references.) (YDS)
Evans, Ben J; Upham, Nathan S; Golding, Goeffrey B; Ojeda, Ricardo A; Ojeda, Agustina A
The genome of the red vizcacha rat (Rodentia, Octodontidae, Tympanoctomys barrerae) is the largest of all mammals, and about double the size of their close relative, the mountain vizcacha rat Octomys mimax, even though the lineages that gave rise to these species diverged from each other only about 5 Ma. The mechanism for this rapid genome expansion is controversial, and hypothesized to be a consequence of whole genome duplication or accumulation of repetitive elements. To test these alternative but nonexclusive hypotheses, we gathered and evaluated evidence from whole transcriptome and whole genome sequences of T. barrerae and O. mimax. We recovered support for genome expansion due to accumulation of a diverse assemblage of repetitive elements, which represent about one half and one fifth of the genomes of T. barrerae and O. mimax, respectively, but we found no strong signal of whole genome duplication. In both species, repetitive sequences were rare in transcribed regions as compared with the rest of the genome, and mostly had no close match to annotated repetitive sequences from other rodents. These findings raise new questions about the genomic dynamics of these repetitive elements, their connection to widespread chromosomal fissions that occurred in the T. barrerae ancestor, and their fitness effects-including during the evolution of hypersaline dietary tolerance in T. barrerae. ©The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Sukhamrit Kaur; Sandeep Kaur
Abstract Genomics is study of genome which provides large amount of data for which large storage and computation power is needed. These issues are solved by cloud computing that provides various cloud platforms for genomics. These platforms provides many services to user like easy access to data easy sharing and transfer providing storage in hundreds of terabytes more computational power. Some cloud platforms are Google genomics DNAnexus and Globus genomics. Various features of cloud computin...
The nucleolar organizing regions (NORs), a few telomeres, most centromeric regions and numerous interstitial sites were detected. The signals in small genomes were relatively sparse and unevenly distributed along chromosomes, whereas those in large genomes were dense and basically evenly distributed.
Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín
Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org.
Grigoriev, Igor V.
Genomes of energy and environment fungi are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 50 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such 'parts' suggested by comparative genomics and functional analysis in these areas are presented here
Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 150 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.
Full Text Available Endometriosis is a complex and multifactorial disease. Chromosomal imbalance screening in endometriotic tissue can be used to detect hot-spot regions in the search for a possible genetic marker for endometriosis. The objective of the present study was to detect chromosomal imbalances by comparative genomic hybridization (CGH in ectopic tissue samples from ovarian endometriomas and eutopic tissue from the same patients. We evaluated 10 ovarian endometriotic tissues and 10 eutopic endometrial tissues by metaphase CGH. CGH was prepared with normal and test DNA enzymatically digested, ligated to adaptors and amplified by PCR. A second PCR was performed for DNA labeling. Equal amounts of both normal and test-labeled DNA were hybridized in human normal metaphases. The Isis FISH Imaging System V 5.0 software was used for chromosome analysis. In both eutopic and ectopic groups, 4/10 samples presented chromosomal alterations, mainly chromosomal gains. CGH identified 11q12.3-q13.1, 17p11.1-p12, 17q25.3-qter, and 19p as critical regions. Genomic imbalances in 11q, 17p, 17q, and 19p were detected in normal eutopic and/or ectopic endometrium from women with ovarian endometriosis. These regions contain genes such as POLR2G, MXRA7 and UBA52 involved in biological processes that may lead to the establishment and maintenance of endometriotic implants. This genomic imbalance may affect genes in which dysregulation impacts both eutopic and ectopic endometrium.
Full Text Available Abstract Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any
John C Chambers
Full Text Available The genetic sequence variation of people from the Indian subcontinent who comprise one-quarter of the world's population, is not well described. We carried out whole genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease which are highly prevalent amongst South Asians.
Full Text Available Abstract Genomics is study of genome which provides large amount of data for which large storage and computation power is needed. These issues are solved by cloud computing that provides various cloud platforms for genomics. These platforms provides many services to user like easy access to data easy sharing and transfer providing storage in hundreds of terabytes more computational power. Some cloud platforms are Google genomics DNAnexus and Globus genomics. Various features of cloud computing to genomics are like easy access and sharing of data security of data less cost to pay for resources but still there are some demerits like large time needed to transfer data less network bandwidth.
Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael
Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and sixDrosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families?perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.
Labbe, Jessy L [ORNL; Uehling, Jessie K [ORNL; Payen, Thibaut [INRA; Plett, Jonathan [University of Western Sydney, Australia
The last 10 years have seen the cost of sequencing complete genomes decrease at an incredible speed. This has led to an increase in the number of genomes sequenced in all the fungal tree of life as well as a wide variety of plant genomes. The increase in sequencing has permitted us to study the evolution of organisms on a genomic scale. A number of talks during the conference discussed the importance of transposable elements (TEs) that are present in almost all species of fungi. These TEs represent an especially large percentage of genomic space in fungi that interact with plants. Thierry Rouxel (INRA, Nancy, France) showed the link between speciation in the Leptosphaeria complex and the expansion of TE families. For example in the Leptosphaeria complex, one species associated with oilseed rape has experienced a recent and massive burst of movement by a few TE families. The alterations caused by these TEs took place in discrete regions of the genome leading to shuffling of the genomic landscape and the appearance of genes specific to the species, such as effectors useful for the interactions with a particular plant (Rouxel et al., 2011). Other presentations showed the importance of TEs in affecting genome organization. For example, in Amanita different species appear to have been invaded by different TE families (Veneault-Fourrey & Martin, 2011).
Raftis, Emma J.; Salvetti, Elisa; Torriani, Sandra; Felis, Giovanna E.; O'Toole, Paul W.
Strains of Lactobacillus salivarius are increasingly employed as probiotic agents for humans or animals. Despite the diversity of environmental sources from which they have been isolated, the genomic diversity of L. salivarius has been poorly characterized, and the implications of this diversity for strain selection have not been examined. To tackle this, we applied comparative genomic hybridization (CGH) and multilocus sequence typing (MLST) to 33 strains derived from humans, animals, or food. The CGH, based on total genome content, including small plasmids, identified 18 major regions of genomic variation, or hot spots for variation. Three major divisions were thus identified, with only a subset of the human isolates constituting an ecologically discernible group. Omission of the small plasmids from the CGH or analysis by MLST provided broadly concordant fine divisions and separated human-derived and animal-derived strains more clearly. The two gene clusters for exopolysaccharide (EPS) biosynthesis corresponded to regions of significant genomic diversity. The CGH-based groupings of these regions did not correlate with levels of production of bound or released EPS. Furthermore, EPS production was significantly modulated by available carbohydrate. In addition to proving difficult to predict from the gene content, EPS production levels correlated inversely with production of biofilms, a trait considered desirable in probiotic commensals. L. salivarius displays a high level of genomic diversity, and while selection of L. salivarius strains for probiotic use can be informed by CGH or MLST, it also requires pragmatic experimental validation of desired phenotypic traits. PMID:21131523
This thesis described a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene trasnfer, even to the extent that we can use the gene content to make genome phylogenies.
The Genomic Data Commons (GDC), a unified data system that promotes sharing of genomic and clinical data between researchers, launched today with a visit from Vice President Joe Biden to the operations center at the University of Chicago.
U.S. Department of Health & Human Services — The Rat Genome Database (RGD) is a collaborative effort between leading research institutions involved in rat genetic and genomic research to collect, consolidate,...
Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J
A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV
Xavier, Alencar; Xu, Shizhong; Muir, William; Rainey, Katy Martin
Background Genome-wide assisted selection is a critical tool for the?genetic improvement of plants and animals. Whole-genome regression models in Bayesian framework represent the main family of prediction methods. Fitting such models with a large number of observations involves a prohibitive computational burden. We propose the use of subsampling bootstrap Markov chain in genomic prediction. Such method consists of fitting whole-genome regression models by subsampling observations in each rou...
Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav
Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos).
Lee, Hayan; Schatz, Michael C
Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including of known clinically relevant variations in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
Hybridization and chromosome doubling entailed by allopolyploidization requires genetic and epigenetic modifications, resulting in the adjustment of different genomes to the same nuclear environment. Recently, the main role of retrotransposon/microsatellite-rich regions of the genome in DNA sequenc...
Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms...
Full Text Available The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC, only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.
Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T
Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal
Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng
Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.
Kirkegaard, Rasmus Hansen; Karst, Søren Michael; Nielsen, Per Halkjær
The advantages of “next generation sequencing” has come at the cost of genome finishing. The dominant sequencing technology provides short reads of 150-300 bp, which has made genome assembly very difficult as the reads do not span important repeat regions. Genomes have thus been added...... to the databases as fragmented assemblies and not as finished contigs that resemble the chromosomes in which the DNA is organised within the cells. This is especially troublesome for genomes derived from complex metagenome sequencing. Databases with incomplete genomes can lead to false conclusions about...... the absence of genes and functional predictions of the organisms. Furthermore, it is common that repetitive elements and marker genes such as the 16S rRNA gene are missing completely from these genome bins. Using nanopore long reads, we demonstrate that it is possible to span these regions and make complete...
Pfeifer, Matthias; Kugler, Karl G; Sandve, Simen R; Zhan, Bujie; Rudi, Heidi; Hvidsten, Torgeir R; Mayer, Klaus F X; Olsen, Odd-Arne
Allohexaploid bread wheat (Triticum aestivum L.) provides approximately 20% of calories consumed by humans. Lack of genome sequence for the three homeologous and highly similar bread wheat genomes (A, B, and D) has impeded expression analysis of the grain transcriptome. We used previously unknown genome information to analyze the cell type-specific expression of homeologous genes in the developing wheat grain and identified distinct co-expression clusters reflecting the spatiotemporal progression during endosperm development. We observed no global but cell type- and stage-dependent genome dominance, organization of the wheat genome into transcriptionally active chromosomal regions, and asymmetric expression in gene families related to baking quality. Our findings give insight into the transcriptional dynamics and genome interplay among individual grain cell types in a polyploid cereal genome. Copyright © 2014, American Association for the Advancement of Science.
Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun
The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined with the lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome have less genes (65) with a coding faction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979
Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M
The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
Raghuraman, M K; Winzeler, E A; Collingwood, D; Hunt, S; Wodicka, L; Conway, A; Lockhart, D J; Davis, R W; Brewer, B J; Fangman, W L
Oligonucleotide microarrays were used to map the detailed topography of chromosome replication in the budding yeast Saccharomyces cerevisiae. The times of replication of thousands of sites across the genome were determined by hybridizing replicated and unreplicated DNAs, isolated at different times in S phase, to the microarrays. Origin activations take place continuously throughout S phase but with most firings near mid-S phase. Rates of replication fork movement vary greatly from region to region in the genome. The two ends of each of the 16 chromosomes are highly correlated in their times of replication. This microarray approach is readily applicable to other organisms, including humans.
Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops
Kurobe, Tomofumi; Gatherer, Derek; Cunningham, Charles; Korf, Ian; Fukuda, Hideo; Hedrick, Ronald P.; Waltzek, Thomas B.
Three alloherpesviruses are known to cause disease in cyprinid fish: cyprinid herpesviruses 1 and 3 (CyHV1 and CyHV3) in common carp and koi and cyprinid herpesvirus 2 (CyHV2) in goldfish. We have determined the genome sequences of CyHV1 and CyHV2 and compared them with the published CyHV3 sequence. The CyHV1 and CyHV2 genomes are 291,144 and 290,304 bp, respectively, in size, and thus the CyHV3 genome, at 295,146 bp, remains the largest recorded among the herpesviruses. Each of the three genomes consists of a unique region flanked at each terminus by a sizeable direct repeat. The CyHV1, CyHV2, and CyHV3 genomes are predicted to contain 137, 150, and 155 unique, functional protein-coding genes, respectively, of which six, four, and eight, respectively, are duplicated in the terminal repeat. The three viruses share 120 orthologous genes in a largely colinear arrangement, of which up to 55 are also conserved in the other member of the genus Cyprinivirus, anguillid herpesvirus 1. Twelve genes are conserved convincingly in all sequenced alloherpesviruses, and two others are conserved marginally. The reference CyHV3 strain has been reported to contain five fragmented genes that are presumably nonfunctional. The CyHV2 strain has two fragmented genes, and the CyHV1 strain has none. CyHV1, CyHV2, and CyHV3 have five, six, and five families of paralogous genes, respectively. One family unique to CyHV1 is related to cellular JUNB, which encodes a transcription factor involved in oncogenesis. To our knowledge, this is the first time that JUNB-related sequences have been reported in a herpesvirus. PMID:23269803
Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching
The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314
Sun, Yujiao; Lu, Taofeng; Sun, Zhaohui; Guan, Weijun; Liu, Zhensheng; Teng, Liwei; Wang, Shuo; Ma, Yuehui
In this study, the complete mitochondrial genome of Siberian tiger (Panthera tigris altaica) was sequenced, using muscle tissue obtained from a male wild tiger. The total length of the mitochondrial genome is 16,996 bp. The genome structure of this tiger is in accordance with other Siberian tigers and it contains 12S rRNA gene, 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes, and 1 control region.
Full Text Available We report draft genome sequence of Proteus sp. strain SAS71, isolated from water spring in Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequence (single copies of 5S, 16S & 23S rRNA. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000.
Fadista, João; Thomsen, Bo; Holm, Lars-Erik
to genetic variation in cattle. Results We designed and used a set of NimbleGen CGH arrays that tile across the assayable portion of the cattle genome with approximately 6.3 million probes, at a median probe spacing of 301 bp. This study reports the highest resolution map of copy number variation...... in the cattle genome, with 304 CNV regions (CNVRs) being identified among the genomes of 20 bovine samples from 4 dairy and beef breeds. The CNVRs identified covered 0.68% (22 Mb) of the genome, and ranged in size from 1.7 to 2,031 kb (median size 16.7 kb). About 20% of the CNVs co-localized with segmental...... duplications, while 30% encompass genes, of which the majority is involved in environmental response. About 10% of the human orthologous of these genes are associated with human disease susceptibility and, hence, may have important phenotypic consequences. Conclusions Together, this analysis provides a useful...
Civaň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J
Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.
Thomma, Bart P H J; Seidl, Michael F; Shi-Kunne, Xiaoqian; Cook, David E; Bolton, Melvin D; van Kan, Jan A L; Faino, Luigi
Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including those that study non-model organisms. Thus, hundreds of fungal genomes have been sequenced and are publically available today, although these initiatives have typically yielded considerably fragmented genome assemblies that often lack large contiguous genomic regions. Many important genomic features are contained in intergenic DNA that is often missing in current genome assemblies, and recent studies underscore the significance of non-coding regions and repetitive elements for the life style, adaptability and evolution of many organisms. The study of particular types of genetic elements, such as telomeres, centromeres, repetitive elements, effectors, and clusters of co-regulated genes, but also of phenomena such as structural rearrangements, genome compartmentalization and epigenetics, greatly benefits from having a contiguous and high-quality, preferably even complete and gapless, genome assembly. Here we discuss a number of important reasons to produce gapless, finished, genome assemblies to help answer important biological questions. Copyright © 2015 Elsevier Inc. All rights reserved.
Li, Jing; Chen, Chen; Wang, Zhe-Zhi
Complete chloroplast genome sequence is very useful for studying the phylogenetic and evolution of species. In this study, the complete chloroplast genome of Dendrobium strongylanthum was constructed from whole-genome Illumina sequencing data. The chloroplast genome is 153 058 bp in length with 37.6% GC content and consists of two inverted repeats (IRs) of 26 316 bp. The IR regions are separated by large single-copy region (LSC, 85 836 bp) and small single-copy (SSC, 14 590 bp) region. A total of 130 chloroplast genes were successfully annotated, including 84 protein coding genes, 38 tRNA genes, and eight rRNA genes. Phylogenetic analyses showed that the chloroplast genome of Dendrobium strongylanthum is related to that of the Dendrobium officinal.
Fu, Songzhe; Octavia, Sophie; Tanaka, Mark M.; Sintchenko, Vitali
Salmonella enterica serovar Typhimurium is the most common Salmonella serovar causing foodborne infections in Australia and many other countries. Twenty-one S. Typhimurium strains from Salmonella reference collection A (SARA) were analyzed using Illumina high-throughput genome sequencing. Single nucleotide polymorphisms (SNPs) in 21 SARA strains ranged from 46 to 11,916 SNPs, with an average of 1,577 SNPs per strain. Together with 47 strains selected from publicly available S. Typhimurium genomes, the S. Typhimurium core genes (STCG) were determined. The STCG consist of 3,846 genes, a set that is much larger than that of the 2,882 Salmonella core genes (SCG) found previously. The STCG together with 1,576 core intergenic regions (IGRs) were defined as the S. Typhimurium core genome. Using 93 S. Typhimurium genomes from 13 epidemiologically confirmed community outbreaks, we demonstrated that typing based on the S. Typhimurium core genome (STCG plus core IGRs) provides superior resolution and higher discriminatory power than that based on SCG for outbreak investigation and molecular epidemiology of S. Typhimurium. STCG and STCG plus core IGR typing achieved 100% separation of all outbreaks compared to that of SCG typing, which failed to separate isolates from two outbreaks from background isolates. Defining the S. Typhimurium core genome allows standardization of genes/regions to be used for high-resolution epidemiological typing and genomic surveillance of S. Typhimurium. PMID:26019201
Full Text Available Qat (Catha edulis, Celastraceae is a woody evergreen species with great economic and cultural importance. It is cultivated for its stimulant alkaloids cathine and cathinone in East Africa and southwest Arabia. However, genome information, especially DNA sequence resources, for C. edulis are limited, hindering studies regarding interspecific and intraspecific relationships. Herein, the complete chloroplast (cp genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy areas. The size of the small single-copy and the large single-copy regions were 18,491 bp and 86,315 bp, respectively. The C. edulis cp genome consists of 129 coding genes including 37 transfer RNA (tRNA genes, 8 ribosomal RNA (rRNA genes, and 84 protein coding genes. For those genes, 112 are single copy genes and 17 genes are duplicated in two inverted regions with seven tRNAs, four rRNAs, and six protein coding genes. The phylogenetic relationships resolved from the cp genome of qat and 32 other species confirms the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates an intron loss took place among an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae.
Tembrock, Luke R.; Zheng, Shaoyu; Wu, Zhiqiang
Qat (Catha edulis, Celastraceae) is a woody evergreen species with great economic and cultural importance. It is cultivated for its stimulant alkaloids cathine and cathinone in East Africa and southwest Arabia. However, genome information, especially DNA sequence resources, for C. edulis are limited, hindering studies regarding interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy areas. The size of the small single-copy and the large single-copy regions were 18,491 bp and 86,315 bp, respectively. The C. edulis cp genome consists of 129 coding genes including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein coding genes. For those genes, 112 are single copy genes and 17 genes are duplicated in two inverted regions with seven tRNAs, four rRNAs, and six protein coding genes. The phylogenetic relationships resolved from the cp genome of qat and 32 other species confirms the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates an intron loss took place among an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae. PMID:29425128
Watson, J D; Cook-Deegan, R M
The Human Genome Project has become a reality. Building on a debate that dates back to 1985, several genome projects are now in full stride around the world, and more are likely to form in the next several years. Italy began its genome program in 1987, and the United Kingdom and U.S.S.R. in 1988. The European communities mounted several genome projects on yeast, bacteria, Drosophila, and Arabidospis thaliana (a rapidly growing plant with a small genome) in 1988, and in 1990 commenced a new 2-year program on the human genome. In the United States, we have completed the first year of operation of the National Center for Human Genome Research at the National Institutes of Health (NIH), now the largest single funding source for genome research in the world. There have been dedicated budgets focused on genome-scale research at NIH, the U.S. Department of Energy, and the Howard Hughes Medical Institute for several years, and results are beginning to accumulate. There were three annual meetings on genome mapping and sequencing at Cold Spring Harbor, New York, in the spring of 1988, 1989, and 1990; the talks have shifted from a discussion about how to approach problems to presenting results from experiments already performed. We have finally begun to work rather than merely talk. The purpose of genome projects is to assemble data on the structure of DNA in human chromosomes and those of other organisms. A second goal is to develop new technologies to perform mapping and sequencing. There have been impressive technical advances in the past 5 years since the debate about the human genome project began. We are on the verge of beginning pilot projects to test several approaches to sequencing long stretches of DNA, using both automation and manual methods. Ordered sets of yeast artificial chromosome and cosmid clones have been assembled to span more than 2 million base pairs of several human chromosomes, and a region of 10 million base pairs has been assembled for
Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.
Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP ? subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled an...
Aylward, Janneke; Steenkamp, Emma T; Dreyer, Léanne L; Roets, Francois; Wingfield, Brenda D; Wingfield, Michael J
The majority of plant pathogens are fungi and many of these adversely affect food security. This mini-review aims to provide an analysis of the plant pathogenic fungi for which genome sequences are publically available, to assess their general genome characteristics, and to consider how genomics has impacted plant pathology. A list of sequenced fungal species was assembled, the taxonomy of all species verified, and the potential reason for sequencing each of the species considered. The genomes of 1090 fungal species are currently (October 2016) in the public domain and this number is rapidly rising. Pathogenic species comprised the largest category (35.5 %) and, amongst these, plant pathogens are predominant. Of the 191 plant pathogenic fungal species with available genomes, 61.3 % cause diseases on food crops, more than half of which are staple crops. The genomes of plant pathogens are slightly larger than those of other fungal species sequenced to date and they contain fewer coding sequences in relation to their genome size. Both of these factors can be attributed to the expansion of repeat elements. Sequenced genomes of plant pathogens provide blueprints from which potential virulence factors were identified and from which genes associated with different pathogenic strategies could be predicted. Genome sequences have also made it possible to evaluate adaptability of pathogen genomes and genomic regions that experience selection pressures. Some genomic patterns, however, remain poorly understood and plant pathogen genomes alone are not sufficient to unravel complex pathogen-host interactions. Genomes, therefore, cannot replace experimental studies that can be complex and tedious. Ultimately, the most promising application lies in using fungal plant pathogen genomics to inform disease management and risk assessment strategies. This will ultimately minimize the risks of future disease outbreaks and assist in preparation for emerging pathogen outbreaks.
Sharp, Andrew J; Hansen, Sierra; Selzer, Rebecca R; Cheng, Ze; Regan, Regina; Hurst, Jane A; Stewart, Helen; Price, Sue M; Blair, Edward; Hennekam, Raoul C; Fitzpatrick, Carrie A; Segraves, Rick; Richmond, Todd A; Guiver, Cheryl; Albertson, Donna G; Pinkel, Daniel; Eis, Peggy S; Schwartz, Stuart; Knight, Samantha J L; Eichler, Evan E
Genomic disorders are characterized by the presence of flanking segmental duplications that predispose these regions to recurrent rearrangement. Based on the duplication architecture of the genome, we investigated 130 regions that we hypothesized as candidates for previously undescribed genomic disorders. We tested 290 individuals with mental retardation by BAC array comparative genomic hybridization and identified 16 pathogenic rearrangements, including de novo microdeletions of 17q21.31 found in four individuals. Using oligonucleotide arrays, we refined the breakpoints of this microdeletion, defining a 478-kb critical region containing six genes that were deleted in all four individuals. We mapped the breakpoints of this deletion and of four other pathogenic rearrangements in 1q21.1, 15q13, 15q24 and 17q12 to flanking segmental duplications, suggesting that these are also sites of recurrent rearrangement. In common with the 17q21.31 deletion, these breakpoint regions are sites of copy number polymorphism in controls, indicating that these may be inherently unstable genomic regions.
Full Text Available The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR mediated by low-copy repeats (LCRs. Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ~1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR-mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.
Catacchio, Claudia Rita; Maggiolini, Flavia Angela Maria; D'Addabbo, Pietro; Bitonto, Miriana; Capozzi, Oronzo; Signorile, Martina Lepore; Miroballo, Mattia; Archidiacono, Nicoletta; Eichler, Evan E; Ventura, Mario; Antonacci, Francesca
For many years, inversions have been proposed to be a direct driving force in speciation since they suppress recombination when heterozygous. Inversions are the most common large-scale differences among humans and great apes. Nevertheless, they represent large events easily distinguishable by classical cytogenetics, whose resolution, however, is limited. Here, we performed a genome-wide comparison between human, great ape, and macaque genomes using the net alignments for the most recent releases of genome assemblies. We identified a total of 156 putative inversions, between 103 kb and 91 Mb, corresponding to 136 human loci. Combining literature, sequence, and experimental analyses, we analyzed 109 of these loci and found 67 regions inverted in one or multiple primates, including 28 newly identified inversions. These events overlap with 81 human genes at their breakpoints, and seven correspond to sites of recurrent rearrangements associated with human disease. This work doubles the number of validated primate inversions larger than 100 kb, beyond what was previously documented. We identified 74 sites of errors, where the sequence has been assembled in the wrong orientation, in the reference genomes analyzed. Our data serve two purposes: First, we generated a map of evolutionary inversions in these genomes representing a resource for interrogating differences among these species at a functional level; second, we provide a list of misassembled regions in these primate genomes, involving over 300 Mb of DNA and 1978 human genes. Accurately annotating these regions in the genome references has immediate applications for evolutionary and biomedical studies on primates. © 2018 Catacchio et al.; Published by Cold Spring Harbor Laboratory Press.
Li, Ruiqiang; Li, Yingrui; Zheng, Hancheng
analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing...
Chen, X.; IJkel, W.F.J.; Tarchini, R.; Sun, X.; Sandbrink, H.; Wang, H.; Peters, S.; Zuidema, D.; Klein Lankhorst, R.; Vlak, J.M.; Hu, Z.
The nucleotide sequence of the Helicoverpa armigera single-nucleocapsid nucleopolyhedrovirus (HaSNPV) DNA genome was determined and analysed. The circular genome encompasses 131 403 bp, has a G C content of 39.1 molnd contains five homologous regions with a unique pattern of repeats.
Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning
The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length, gene composition and arrangement conformed to other lions, which contained the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristic of the mitochondrial genome was analyzed in detail.
Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover
Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...
Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun
A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic materials between different species. Horizontal gene transfer has been found prevalent in prokaryotes but very rare in eukaryote. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched with ion binding. Compared to known horizontal gene transfer regions in the human genome, there were few overlapping regions, which indicated horizontal gene transfer is more common than we expected in the human genome. Horizontal gene transfer impacts hundreds of human genes and this study provided insight into potential mechanisms of HGT in the human genome.
CERN. Geneva; Deutsch, Sam; Michielin, Olivier; Thomas, Arthur; Descombes, Patrick
Extracting the fundamental genomic sequence from the DNA From Genome to Sequence : Biology in the early 21st century has been radically transformed by the availability of the full genome sequences of an ever increasing number of life forms, from bacteria to major crop plants and to humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences is increasingly conditioned by the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds. Once the sequence is assembled, a major challenge is the mapping of biologically relevant information onto this sequence: promoters, introns and exons of protein-encoding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological ...
Poke, Fiona S; Vaillancourt, René E; Potts, Brad M; Reid, James B
Eucalyptus L'Hérit. is a genus comprised of more than 700 species that is of vital importance ecologically to Australia and to the forestry industry world-wide, being grown in plantations for the production of solid wood products as well as pulp for paper. With the sequencing of the genomes of Arabidopsis thaliana and Oryza sativa and the recent completion of the first tree genome sequence, Populus trichocarpa, attention has turned to the current status of genomic research in Eucalyptus. For several eucalypt species, large segregating families have been established, high-resolution genetic maps constructed and large EST databases generated. Collaborative efforts have been initiated for the integration of diverse genomic projects and will provide the framework for future research including exploiting the sequence of the entire eucalypt genome which is currently being sequenced. This review summarises the current position of genomic research in Eucalyptus and discusses the direction of future research.
Sun, Siyang; Rao, Venigalla B.; Rossmann, Michael G.
Genome packaging is a fundamental process in a viral life cycle. Many viruses assemble preformed capsids into which the genomic material is subsequently packaged. These viruses use a packaging motor protein that is driven by the hydrolysis of ATP to condense the nucleic acids into a confined space. How these motor proteins package viral genomes had been poorly understood until recently, when a few X-ray crystal structures and cryo-electron microscopy structures became available. Here we discu...
Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.
Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves. PMID:25324969
Langie, Sabine A S; Koppen, Gudrun; Desaulniers, Daniel
function, chromosome segregation, telomere length). The purpose of this review is to describe the crucial aspects of genome instability, to outline the ways in which environmental chemicals can affect this cancer hallmark and to identify candidate chemicals for further study. The overall aim is to make......Genome instability is a prerequisite for the development of cancer. It occurs when genome maintenance systems fail to safeguard the genome's integrity, whether as a consequence of inherited defects or induced via exposure to environmental agents (chemicals, biological agents and radiation). Thus...
The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone and contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.
Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X
The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.
Full Text Available Abstract Background The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris, birds (Gallus gallus, fishes (Danio rerio, invertebrates (Drosophila melanogaster and Caenorhabditis elegans, plants (Arabidopsis thaliana and yeasts (Saccharomyces cerevisiae. We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
Kalkatawi, Manal M.
Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered as a way of summarizing part of existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying functions of these regions is considered as a functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and timeconsuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameters calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally
Duchaud, Eric; Rochat, Tatiana; Habib, Christophe
genome accounting for similar to 80% of the genes in each genome. The pan-genome seems nevertheless "open" according to the scaling exponent of a power-law fitted on the rate of new gene discovery when genomes are added one-by-one. Recombination is a key component of the evolutionary process...... of recombination and mutations to nucleotide-level differentiation (r/m) was estimated to similar to 13. Within CC-ST10, evolutionary distances computed on non-recombined regions and comparisons between 22 isolates sampled up to 27 years apart suggest a most recent common ancestor in the second half...
Homologous sequences were used in the definition of synteny relationships and subsequent identification of the shared disease response genes. The homologous genes within the human genome were then identified and aligned to the bovine radiation hybrid map in order to identify the mouse/bovine homologous regions.
Moraes, Fernanda; Góes, Andréa
The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016. © 2016 The International Union of Biochemistry and Molecular Biology.
Full Text Available Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I and S. pasteurianus ATCC 43144 (biotype II.2. The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosome of ATCC 43143 and ATCC 43144 are 2,36 and 2,10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92% and 1607 (86% of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constitute around 30% of total CDS and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that the S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability for S. pasteurianus to live in rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops.
Li, Huie; Guo, Qiqiang
The complete chloroplast (cp) genome of the Sinopodophyllum hexandrum (Berberidaceae) was determined in this study. The circular genome is 157,940 bp in size, and comprises a pair of inverted repeat (IR) regions of 26,077 bp each, a large single-copy (LSC) region of 86,460 bp and a small single-copy (SSC) region of 19,326 bp. The GC content of the whole cp genome was 38.5%. A total of 133 genes were identified, including 88 protein-coding genes, 37 tRNA genes and eight rRNA genes. The whole cp genome consists of 114 unique genes, and 19 genes are duplicated in the IR regions. The phylogenetic analysis revealed that S. hexandrum is closely related to Nandina domestica within the family Berberidaceae.
Shade Larry L
Full Text Available Abstract Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site for pairwise comparisons among cattle, dog and human; however substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10(-9 change/site/year was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10(-9 change/site/year was approximately half of the overall rate (1.9–2.0 × 10(-9 change/site/year. Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
Chakravarthi, Balabhadrapatruni V S K; Nepal, Saroj; Varambally, Sooryanarayana
Multiple genetic and epigenetic events characterize tumor progression and define the identity of the tumors. Advances in high-throughput technologies, like gene expression profiling, next-generation sequencing, proteomics, and metabolomics, have enabled detailed molecular characterization of various tumors. The integration and analyses of these high-throughput data have unraveled many novel molecular aberrations and network alterations in tumors. These molecular alterations include multiple cancer-driving mutations, gene fusions, amplification, deletion, and post-translational modifications, among others. Many of these genomic events are being used in cancer diagnosis, whereas others are therapeutically targeted with small-molecule inhibitors. Multiple genes/enzymes that play a role in DNA and histone modifications are also altered in various cancers, changing the epigenomic landscape during cancer initiation and progression. Apart from protein-coding genes, studies are uncovering the critical regulatory roles played by noncoding RNAs and noncoding regions of the genome during cancer progression. Many of these genomic and epigenetic events function in tandem to drive tumor development and metastasis. Concurrent advances in genome-modulating technologies, like gene silencing and genome editing, are providing ability to understand in detail the process of cancer initiation, progression, and signaling as well as opening up avenues for therapeutic targeting. In this review, we discuss some of the recent advances in cancer genomic and epigenomic research. Copyright © 2016 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
Wang, Kai; Wang, Yuyu; Yang, Ding
The complete mitochondrial (mt) genome of a stonefly species, Togoperla sp. (Plecoptera: Perlidae), was sequenced. The 15,723 bp long genome has the standard metazoan complement of 37 genes and an A+T-rich region, which is the same as the insect ancestral genome arrangement.
Li, Bi Jun; Li, Hong Lian; Meng, Zining; Zhang, Yong; Lin, Haoran; Yue, Gen Hua; Xia, Jun Hong
Discovering the nature and pattern of genome variation is fundamental in understanding phenotypic diversity among populations. Although several millions of single nucleotide polymorphisms (SNPs) have been discovered in tilapia, the genome-wide characterization of larger structural variants, such as copy number variation (CNV) regions has not been carried out yet. We conducted a genome-wide scan for CNVs in 47 individuals from three tilapia populations. Based on 254 Gb of high-quality paired-end sequencing reads, we identified 4642 distinct high-confidence CNVs. These CNVs account for 1.9% (12.411 Mb) of the used Nile tilapia reference genome. A total of 1100 predicted CNVs were found overlapping with exon regions of protein genes. Further association analysis based on linear model regression found 85 CNVs ranging between 300 and 27,000 base pairs significantly associated to population types (R 2 > 0.9 and P > 0.001). Our study sheds first insights on genome-wide CNVs in tilapia. These CNVs among and within tilapia populations may have functional effects on phenotypes and specific adaptation to particular environments.
Werken, van de H.J.G.
With the ever increasing number of completely sequenced prokaryotic genomes and the subsequent use of functional genomics tools, e.g. DNA microarray and proteomics, computational data analysis and the integration of microbial and molecular data is inevitable. This thesis describes the computational
Sørensen, Claus Storgaard; Syljuåsen, Randi G
Mechanisms that preserve genome integrity are highly important during the normal life cycle of human cells. Loss of genome protective mechanisms can lead to the development of diseases such as cancer. Checkpoint kinases function in the cellular surveillance pathways that help cells to cope with D...
Möller, Mareike; Stukenbrock, Eva H
The fungal kingdom comprises some of the most devastating plant pathogens. Sequencing the genomes of fungal pathogens has shown a remarkable variability in genome size and architecture. Population genomic data enable us to understand the mechanisms and the history of changes in genome size and adaptive evolution in plant pathogens. Although transposable elements predominantly have negative effects on their host, fungal pathogens provide prominent examples of advantageous associations between rapidly evolving transposable elements and virulence genes that cause variation in virulence phenotypes. By providing homogeneous environments at large regional scales, managed ecosystems, such as modern agriculture, can be conducive for the rapid evolution and dispersal of pathogens. In this Review, we summarize key examples from fungal plant pathogen genomics and discuss evolutionary processes in pathogenic fungi in the context of molecular evolution, population genomics and agriculture.
Bohlin, J; Snipen, L; Hardy, S.P.
the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content......Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how...... both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exits when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content...
Wang, Jun; Wang, Wei; Li, Ruiqiang
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...... used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP...... identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J...
Full Text Available The plant chloroplast (cp genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp in length including a pair of short inverted repeat regions (IRa and IRb of 139 bp each separated by a small single copy (SSC region of 54,323 bp (SSC and a large single copy region of 66,735 bp (LSC. It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR have been detected in A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp. The A. nephrolepis cp genome is identical to Abies koreana’s which is closely related to taxa. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. Complete cp genome sequence of A. nephrolepis has a significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for development of DNA markers for easy identification and classification.
Morrison, M.; Nelson, K.E.
Improving microbial degradation of plant cell wall polysaccharides remains one of the highest priority goals for all livestock enterprises, including the cattle herds and draught animals of developing countries. The North American Consortium for Genomics of Fibrolytic Ruminal Bacteria was created to promote the sequencing and comparative analysis of rumen microbial genomes, offering the potential to fully assess the genetic potential in a functional and comparative fashion. It has been found that the Fibrobacter succinogenes genome encodes many more endoglucanases and cellodextrinases than previously isolated, and several new processive endoglucanases have been identified by genome and proteomic analysis of Ruminococcus albus, in addition to a variety of strategies for its adhesion to fibre. The ramifications of acquiring genome sequence data for rumen microorganisms are profound, including the potential to elucidate and overcome the biochemical, ecological or physiological processes that are rate limiting for ruminal fibre degradation. (author)
Doolittle, Russell F.
The publication of the first complete sequence of a bacterial genome in 1995 was a signal event, underscored by the fact that the article has been cited more than 2,100 times during the intervening seven years. It was a marvelous technical achievement, made possible by automatic DNA-sequencing machines. The feat is the more impressive in that complete genome sequencing has now been adopted in many different laboratories around the world. Four years ago in these columns I examined the situation after a dozen microbial genomes had been completed. Now, with upwards of 60 microbial genome sequences determined and twice that many in progress, it seems reasonable to assess just what is being learned. Are new concepts emerging about how cells work? Have there been practical benefits in the fields of medicine and agriculture? Is it feasible to determine the genomic sequence of every bacterial species on Earth? The answers to these questions maybe Yes, Perhaps, and No, respectively.
Full Text Available During the meeting in Arlington, USA in 2001, the scientists grouped in PROMUSA agreed with the launching of the Global Musa Genomics Consortium. The Consortium aims to apply genomics technologies to the improvement of this important crop. These genome projects put banana as the third model species after Arabidopsis and rice that will be analyzed and sequenced. Comparing to Arabidopsis and rice, banana genome provides a unique and powerful insight into structural and in functional genomics that could not be found in those two species. This paper discussed these subjects-including the importance of banana as the fourth main food in the world, the evolution and biodiversity of this genetic resource and its parasite.
Stella, Stefano; Montoya, Guillermo
-Cas system has become the main tool for genome editing in many laboratories. Currently the targeted genome editing technology has been used in many fields and may be a possible approach for human gene therapy. Furthermore, it can also be used to modifying the genomes of model organisms for studying human......In the last 10 years, we have witnessed a blooming of targeted genome editing systems and applications. The area was revolutionized by the discovery and characterization of the transcription activator-like effector proteins, which are easier to engineer to target new DNA sequences than...... sequence). This ribonucleoprotein complex protects bacteria from invading DNAs, and it was adapted to be used in genome editing. The CRISPR ribonucleic acid (RNA) molecule guides to the specific DNA site the Cas9 nuclease to cleave the DNA target. Two years and more than 1000 publications later, the CRISPR...
Matarin, Mar; Simon-Sanchez, Javier; Fung, Hon-Chung; Scholz, Sonja; Gibbs, J. Raphael; Hernandez, Dena G.; Crews, Cynthia; Britton, Angela; Wavrant De Vrieze, Fabienne; Brott, Thomas G.; Brown, Robert D.; Worrall, Bradford B.; Silliman, Scott; Case, L. Douglas; Hardy, John A.; Rich, Stephen S.; Meschia, James F.; Singleton, Andrew B.
Technological advances in molecular genetics allow rapid and sensitive identification of genomic copy number variants (CNVs). This, in turn, has sparked interest in the function such variation may play in disease. While a role for copy number mutations as a cause of Mendelian disorders is well established, it is unclear whether CNVs may affect risk for common complex disorders. We sought to investigate whether CNVs may modulate risk for ischemic stroke (IS) and to provide a catalog of CNVs in patients with this disorder by analyzing copy number metrics produced as a part of our previous genome-wide single-nucleotide polymorphism (SNP)-based association study of ischemic stroke in a North American white population. We examined CNVs in 263 patients with ischemic stroke (IS). Each identified CNV was compared with changes identified in 275 neurologically normal controls. Our analysis identified 247 CNVs, corresponding to 187 insertions (76%; 135 heterozygous; 25 homozygous duplications or triplications; 2 heterosomic) and 60 deletions (24%; 40 heterozygous deletions;3 homozygous deletions; 14 heterosomic deletions). Most alterations (81%) were the same as, or overlapped with, previously reported CNVs. We report here the first genome-wide analysis of CNVs in IS patients. In summary, our study did not detect any common genomic structural variation unequivocally linked to IS, although we cannot exclude that smaller CNVs or CNVs in genomic regions poorly covered by this methodology may confer risk for IS. The application of genome-wide SNP arrays now facilitates the evaluation of structural changes through the entire genome as part of a genome-wide genetic association study. PMID:18288507
Full Text Available We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a identification of horizontally transferred genes, (b identification of genomic islands with special properties and (c binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a calculation of the k-mer based barcode image for a provided DNA sequence; (b detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c clustering of provided DNA sequences into groups having similar barcodes; and (d homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.
Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo
New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
Audit, Benjamin; Zaghloul, Lamia; Baker, Antoine; Arneodo, Alain; Chen, Chun-Long; d'Aubenton-Carafa, Yves; Thermes, Claude
In higher eukaryotes, the absence of specific sequence motifs, marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origins activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabased-size replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.
Martin, Frank N.; Douda, Bensasson; Tyler, Brett M.; Boore,Jeffrey L.
The complete sequences of the mitochondrial genomes of theoomycetes of Phytophthora ramorum and P. sojae were determined during thecourse of their complete nuclear genome sequencing (Tyler, et al. 2006).Both are circular, with sizes of 39,314 bp for P. ramorum and 42,975 bpfor P. sojae. Each contains a total of 37 identifiable protein-encodinggenes, 25 or 26 tRNAs (P. sojae and P. ramorum, respectively)specifying19 amino acids, and a variable number of ORFs (7 for P. ramorum and 12for P. sojae) which are potentially additional functional genes.Non-coding regions comprise approximately 11.5 percent and 18.4 percentof the genomes of P. ramorum and P. sojae, respectively. Relative to P.sojae, there is an inverted repeat of 1,150 bp in P. ramorum thatincludes an unassigned unique ORF, a tRNA gene, and adjacent non-codingsequences, but otherwise the gene order in both species is identical.Comparisons of these genomes with published sequences of the P. infestansmitochondrial genome reveals a number of similarities, but the gene orderin P. infestans differs in two adjacent locations due to inversions.Sequence alignments of the three genomes indicated sequence conservationranging from 75 to 85 percent and that specific regions were morevariable than others.
Goodstein, David; Batra, Sajeev; Carlson, Joseph; Hayes, Richard; Phillips, Jeremy; Shu, Shengqiang; Schmutz, Jeremy; Rokhsar, Daniel
The Dept. of Energy Joint Genome Institute is a genomics user facility supporting DOE mission science in the areas of Bioenergy, Carbon Cycling, and Biogeochemistry. The Plant Program at the JGI applies genomic, analytical, computational and informatics platforms and methods to: 1. Understand and accelerate the improvement (domestication) of bioenergy crops 2. Characterize and moderate plant response to climate change 3. Use comparative genomics to identify constrained elements and infer gene function 4. Build high quality genomic resource platforms of JGI Plant Flagship genomes for functional and experimental work 5. Expand functional genomic resources for Plant Flagship genomes
Williamson, Scott H; Hubisz, Melissa J; Clark, Andrew G
, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome......-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome...
Ruan, Jiangxing; Cheng, Jian; Zhang, Tongcun; Jiang, Huifeng
Exploring the evolutionary patterns of mitochondrial genomes is important for our understanding of the Saccharomyces sensu stricto (SSS) group, which is a model system for genomic evolution and ecological analysis. In this study, we first obtained the complete mitochondrial sequences of two important species, Saccharomyces mikatae and Saccharomyces kudriavzevii. We then compared the mitochondrial genomes in the SSS group with those of close relatives, and found that the non-coding regions evolved rapidly, including dramatic expansion of intergenic regions, fast evolution of introns and almost 20-fold higher rearrangement rates than those of the nuclear genomes. However, the coding regions, and especially the protein-coding genes, are more conserved than those in the nuclear genomes of the SSS group. The different evolutionary patterns of coding and non-coding regions in the mitochondrial and nuclear genomes may be related to the origin of the aerobic fermentation lifestyle in this group. Our analysis thus provides novel insights into the evolution of mitochondrial genomes.
Dermauw, Wannes; Vanholme, Bartel; Tirry, Luc; Van Leeuwen, Thomas
In this study we sequenced and analysed the complete mitochondrial (mt) genome of the Chilean predatory mite Phytoseiulus persimilis Athias-Henriot (Chelicerata: Acari: Mesostigmata: Phytoseiidae: Amblyseiinae). The 16 199 bp genome (79.8% AT) contains the standard set of 13 protein-coding and 24 RNA genes. Compared with the ancestral arthropod mtDNA pattern, the gene order is extremely reshuffled (35 genes changed position) and represents a novel arrangement within the arthropods. This is probably related to the presence of several large noncoding regions in the genome. In contrast with the mt genome of the closely related species Metaseiulus occidentalis (Phytoseiidae: Typhlodrominae) - which was reported to be unusually large (24 961 bp), to lack nad6 and nad3 protein-coding genes, and to contain 22 tRNAs without T-arms - the genome of P. persimilis has all the features of a standard metazoan mt genome. Consequently, we performed additional experiments on the M. occidentalis mt genome. Our preliminary restriction digests and Southern hybridization data revealed that this genome is smaller than previously reported. In addition, we cloned nad3 in M. occidentalis and positioned this gene between nad4L and 12S-rRNA on the mt genome. Finally, we report that at least 15 of the 22 tRNAs in the M. occidentalis mt genome can be folded into canonical cloverleaf structures similar to their counterparts in P. persimilis.
Peter J Walker
Full Text Available RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3' to 5' direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae.
Walker, Peter J; Firth, Cadhla; Widen, Steven G; Blasdell, Kim R; Guzman, Hilda; Wood, Thomas G; Paradkar, Prasad N; Holmes, Edward C; Tesh, Robert B; Vasilakis, Nikos
RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3' to 5' direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae.
Walker, Peter J.; Firth, Cadhla; Widen, Steven G.; Blasdell, Kim R.; Guzman, Hilda; Wood, Thomas G.; Paradkar, Prasad N.; Holmes, Edward C.; Tesh, Robert B.; Vasilakis, Nikos
RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3’ to 5’ direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae. PMID:25679389
De Groot, Anne S; Rappuoli, Rino
Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed by novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed by genome-predicted, assembled and engineered T- and Bcell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.
Droc, Gaëtan; Larivière, Delphine; Guignon, Valentin; Yahiaoui, Nabila; This, Dominique; Garsmeur, Olivier; Dereeper, Alexis; Hamelin, Chantal; Argout, Xavier; Dufayard, Jean-François; Lengelle, Juliette; Baurens, Franc-Christophe; Cenci, Alberto; Pitollat, Bertrand; D’Hont, Angélique; Ruiz, Manuel; Rouard, Mathieu; Bocs, Stéphanie
Banana is one of the world’s favorite fruits and one of the most important crops for developing countries. The banana reference genome sequence (Musa acuminata) was recently released. Given the taxonomic position of Musa, the completed genomic sequence has particular comparative value to provide fresh insights about the evolution of the monocotyledons. The study of the banana genome has been enhanced by a number of tools and resources that allows harnessing its sequence. First, we set up essential tools such as a Community Annotation System, phylogenomics resources and metabolic pathways. Then, to support post-genomic efforts, we improved banana existing systems (e.g. web front end, query builder), we integrated available Musa data into generic systems (e.g. markers and genetic maps, synteny blocks), we have made interoperable with the banana hub, other existing systems containing Musa data (e.g. transcriptomics, rice reference genome, workflow manager) and finally, we generated new results from sequence analyses (e.g. SNP and polymorphism analysis). Several uses cases illustrate how the Banana Genome Hub can be used to study gene families. Overall, with this collaborative effort, we discuss the importance of the interoperability toward data integration between existing information systems. Database URL: http://banana-genome.cirad.fr/ PMID:23707967
Hacker-Klom, U.B.; Goehde, W.
Ionising irradiation may induce genomic instability. The broad spectrum of stress reactions in eukaryontic cells to irradiation complicates the discovery of cellular targets and pathways inducing genomic instability. Irradiation may initiate genomic instability by deletion of genes controlling stability, by induction of genes stimulating instability and/or by activating endogeneous cellular viruses. Alternatively or additionally it is discussed that the initiation of genomic instability may be a consequence of radiation or other agents independently of DNA damage implying non nuclear targets, e.g. signal cascades. As a further mechanism possibly involved our own results may suggest radiation-induced changes in chromatin structure. Once initiated the process of genomic instability probably is perpetuated by endogeneous processes necessary for proliferation. Genomic instability may be a cause or a consequence of the neoplastic phenotype. As a conclusion from the data available up to now a new interpretation of low level radiation effects for radiation protection and in radiotherapy appears useful. The detection of the molecular mechanisms of genomic instability will be important in this context and may contribute to a better understanding of phenomenons occurring at low doses <10 cSv which are not well understood up to now. (orig.)
Gao, Lei; Yi, Xuan; Yang, Yong-Xia; Su, Ying-Juan; Wang, Ting
Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp) genome reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. Availability of cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes of fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern Alsophila spinulosa (Cyatheaceae). The Alsophila cp genome is 156,661 base pairs (bp) in size, and has a typical quadripartite structure with the large (LSC, 86,308 bp) and small single copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each). This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU are also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG) is found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome of the polypod fern Adiantum capillus-veneris, including the absence of 5 tRNA genes that exist in most other cp genomes. The genome shows a high degree of synteny with that of Adiantum, but differs considerably from two basal ferns (Angiopteris evecta and Psilotum nudum). At one endpoint of an ancient inversion we detected a highly repeated 565-bp-region that is absent from the Adiantum cp genome. An additional minor inversion of the trnD-GUC, which is possibly shared by all ferns, was identified by comparison between the fern and other land plant cp genomes. By comparing four fern cp genome sequences it was confirmed that two major rearrangements distinguish higher leptosporangiate ferns from basal fern lineages. The Alsophila cp genome is very similar to that of the
Full Text Available Abstract Background Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp genome reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. Availability of cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes of fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern Alsophila spinulosa (Cyatheaceae. Results The Alsophila cp genome is 156,661 base pairs (bp in size, and has a typical quadripartite structure with the large (LSC, 86,308 bp and small single copy (SSC, 21,623 bp regions separated by two copies of an inverted repeat (IRs, 24,365 bp each. This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU are also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG is found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome