Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S
Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.
Katoh, Masuko; Katoh, Masaru
DAND1 (NBL1), DAND2 (CKTSF1B1 or GREM1 or GREMLIN), DAND3 (CKTSF1B2 or GREM2 or PRDC), DAND4 (CER1), DAND5 (CKTSF1B3 or GREM3 or DANTE), MUC2, MUC5AC, MUC5B, MUC6, MUC19, WISP1, WISP2, WISP3, VWF, NOV and Norrie disease (NDP or NORRIN) genes encode proteins with a cysteine knot domain. Cysteine-knot superfamily proteins regulate ligand-receptor interactions for a variety of signaling pathways implicated in embryogenesis, homeostasis, and carcinogenesis. Although Ndp is unrelated to Wnt family members, Ndp is claimed to function as a ligand for Fzd4. Here, we identified and characterized rat Ndp, cow Ndp, chicken ndp and zebrafish ndp genes by using bioinformatics. The rat Ndp gene, consisting of three exons, was located within the AC105563.4 genome sequence. Cow Ndp and chicken ndp complete CDS were derived from the CB467544.1 EST and the BX932859.2 cDNA, respectively. The zebrafish ndp gene was located within the BX572627.5 genome sequence. Rat Ndp (131 aa) was a secreted protein with a C-terminal cysteine knot-like (CTCK) domain. Rat Ndp showed 100, 96.9, 95.4, 87.8 and 66.4% total-amino-acid identity with mouse Ndp, cow Ndp, human NDP, chicken ndp and zebrafish ndp, respectively. The exon-intron structure of mammalian Ndp orthologs was well conserved. FOXA2, CUTL1 (CCAAT displacement protein), LMO2, CEBPA (C/EBPalpha)-binding sites and triple POU2F1 (OCT1)-binding sites were conserved among promoters of mammalian Ndp orthologs.
Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.
Singh, Sangeeta; Chand, Suresh; Singh, N. K.; Sharma, Tilak Raj
The resistance (R) genes and defense response (DR) genes have become very important resources for the development of disease-resistant cultivars. In the present investigation, genome-wide identification, expression, phylogenetic and synteny analyses were done for R- and DR-genes across three species of rice, viz. Oryza sativa ssp indica cv 93-11, Oryza sativa ssp japonica and the wild rice species Oryza brachyantha. We used an in silico approach to identify and map 786 R-genes and 167 DR-genes, 672 R-genes and 142 DR-genes, and 251 R-genes and 86 DR-genes in the japonica, indica and O. brachyantha genomes, respectively. Our analysis showed that 60.5% and 55.6% of the R-genes are tandemly repeated within clusters and distributed over all the rice chromosomes in the indica and japonica genomes, respectively. The phylogenetic analysis along with motif distribution shows a high degree of conservation of R- and DR-genes in clusters. In silico expression analysis of R-genes and DR-genes showed that more than 85% were expressed genes with corresponding EST matches in the databases. This study placed special emphasis on mechanisms of gene evolution and duplication for R- and DR-genes across species. Analysis of paralogs across rice species indicated 17% and 4.38% R-gene duplication and 29% and 11.63% DR-gene duplication in indica and Oryza brachyantha, as compared to 20% and 26% duplication of R-genes and DR-genes in japonica, respectively. We found that during the course of duplication only 9.5% of R- and DR-genes changed their function and the rest of the genes have maintained their identity. Syntenic relationships across the three genomes indicated that more orthology is shared between the indica and japonica genomes than with the brachyantha genome. Genome-wide identification of R-genes and DR-genes in the rice genome will help in allele mining and functional validation of these genes, and in understanding the molecular mechanism of disease resistance and its evolution in rice and related species. PMID:25902056
Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L
Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome. PMID:19594883
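The recall and precision measurements mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' validation code: the function name `precision_recall` and the gene-disease pairs are hypothetical and chosen only to show how the two rates are computed from a predicted set and a reference set.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted annotations against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted annotations that are in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference annotations that were recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gene-disease pairs, for illustration only.
reference = {
    ("APOE", "Alzheimer's disease"),
    ("SNCA", "Parkinson's disease"),
    ("HTT", "Huntington's disease"),
    ("BRCA2", "breast cancer"),
}
predicted = {
    ("APOE", "Alzheimer's disease"),
    ("SNCA", "Parkinson's disease"),
    ("TP53", "breast cancer"),
}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predictions are correct and two of the four reference pairs are recovered, which is the same trade-off the abstract reports when contrasting GeneRIF-based annotation with OMIM.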
DiCarlo, James E; Mahajan, Vinit B; Tsang, Stephen H
Precision medicine seeks to treat disease with molecular specificity. Advances in genome sequence analysis, gene delivery, and genome surgery have allowed clinician-scientists to treat genetic conditions at the level of their pathology. As a result, progress in treating retinal disease using genetic tools has advanced tremendously over the past several decades. Breakthroughs in gene delivery vectors, both viral and nonviral, have allowed the delivery of genetic payloads in preclinical models of retinal disorders and have paved the way for numerous successful clinical trials. Moreover, the adaptation of CRISPR-Cas systems for genome engineering has enabled the correction of both recessive and dominant pathogenic alleles, expanding the disease-modifying power of gene therapies. Here, we highlight the translational progress of gene therapy and genome editing of several retinal disorders, including RPE65-, CEP290-, and GUCY2D-associated Leber congenital amaurosis, as well as choroideremia, achromatopsia, Mer tyrosine kinase- (MERTK-) and RPGR X-linked retinitis pigmentosa, Usher syndrome, neovascular age-related macular degeneration, X-linked retinoschisis, Stargardt disease, and Leber hereditary optic neuropathy.
Poland, Jesse; Rutkoski, Jessica
Breeding for disease resistance is a central focus of plant breeding programs, as any successful variety must have the complete package of high yield, disease resistance, agronomic performance, and end-use quality. With the need to accelerate the development of improved varieties, genomics-assisted breeding is becoming an important tool in breeding programs. With marker-assisted selection, there has been success in breeding for disease resistance; however, much of this work and research has focused on identifying, mapping, and selecting for major resistance genes that tend to be highly effective but vulnerable to breakdown with rapid changes in pathogen races. In contrast, breeding for minor-gene quantitative resistance tends to produce more durable varieties but is a more challenging breeding objective. As the genetic architecture of resistance shifts from single major R genes to a diffused architecture of many minor genes, the best approach for molecular breeding will shift from marker-assisted selection to genomic selection. Genomics-assisted breeding for quantitative resistance will therefore necessitate whole-genome prediction models and selection methodology as implemented for classical complex traits such as yield. Here, we examine multiple case studies testing whole-genome prediction models and genomic selection for disease resistance. In general, whole-genome models for disease resistance can produce prediction accuracy suitable for application in breeding. These models also largely outperform multiple linear regression as would be applied in marker-assisted selection. With the implementation of genomic selection for yield and other agronomic traits, whole-genome marker profiles will be available for the entire set of breeding lines, enabling genomic selection for disease at no additional direct cost. In this context, the scope of implementing genomic selection for disease resistance, and specifically for quantitative resistance and quarantined pathogens
Xu, Wei; Wang, Huisong; Cheng, Wenqing; Fu, Dong; Xia, Tian; Kibbe, Warren A; Lin, Simon M
Identification of gene-disease associations is crucial to understanding disease mechanisms. A rapid increase in biomedical literature, led by advances in genome-scale technologies, poses a challenge for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.
Taye H Hamza; Honglei Chen; Erin M Hill-Burns; Shannon L Rhodes; Jennifer Montimurro; Denise M Kay; Albert Tenesa; Victoria I Kusel; Patricia Sheehan; Muthukrishnan Eaaswarkhanth; Dora Yearout; Ali Samii; John W Roberts; Pinky Agarwal; Yvette Bordelon
Our aim was to identify genes that influence the inverse association of coffee with the risk of developing Parkinson's disease (PD). We used genome-wide genotype data and lifetime caffeinated-coffee-consumption data on 1,458 persons with PD and 931 without PD from the NeuroGenetics Research Consortium (NGRC), and we performed a genome-wide association and interaction study (GWAIS), testing each SNP's main-effect plus its interaction with coffee, adjusting for sex, age, and two principal compo...
Liu, Xuewu; Wang, Yuanyuan; Liang, Jiao; Wang, Luojun; Qin, Na; Zhao, Ya; Zhao, Gang
Plasmodium falciparum is the most virulent malaria parasite capable of parasitizing human erythrocytes. The identification of genes related to this capability can enhance our understanding of the molecular mechanisms underlying human malaria and lead to the development of new therapeutic strategies for malaria control. With the availability of several malaria parasite genome sequences, performing computational analysis is now a practical strategy to identify genes contributing to this disease. Here, we developed and used a virtual genome method to assign 33,314 genes from three human malaria parasites, namely, P. falciparum, P. knowlesi and P. vivax, and three rodent malaria parasites, namely, P. berghei, P. chabaudi and P. yoelii, to 4605 clusters. Each cluster consisted of genes whose protein sequences were significantly similar and was considered as a virtual gene. Comparing the enriched values of all clusters in human malaria parasites with those in rodent malaria parasites revealed 115 P. falciparum genes putatively responsible for parasitizing human erythrocytes. These genes are mainly located in the chromosome internal regions and participate in many biological processes, including membrane protein trafficking and thiamine biosynthesis. Meanwhile, 289 P. berghei genes were included in the rodent parasite-enriched clusters. Most are located in subtelomeric regions and encode erythrocyte surface proteins. Comparing cluster values in P. falciparum with those in P. vivax and P. knowlesi revealed 493 candidate genes linked to virulence. Some of them encode proteins present on the erythrocyte surface and participate in cytoadhesion, virulence factor trafficking, or erythrocyte invasion, but many genes with unknown function were also identified. Cerebral malaria is characterized by accumulation of infected erythrocytes at the trophozoite stage in the brain microvasculature. To discover cerebral malaria-related genes, fast Fourier transformation (FFT) was introduced to extract
Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin
Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
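The random walk with restart step named in the abstract above can be sketched as follows. This is a generic illustration on a single small network, not the RWRB implementation: the abstract describes a phenotype-gene bilayer network, whereas this sketch uses one hypothetical four-node adjacency matrix, and the function name, the restart probability of 0.7 and the convergence tolerance are all assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the scores stop changing.

    adjacency: square list of lists with non-negative edge weights.
    seeds: indices of the known (seed) nodes; the walker restarts at these.
    """
    n = len(adjacency)
    # Column sums turn each column into a transition probability distribution.
    col_sums = [sum(adjacency[i][j] for i in range(n)) or 1.0 for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        p_next = [
            (1 - restart) * sum(adjacency[i][j] / col_sums[j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is below the tolerance.
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical chain network 0 - 1 - 2 - 3, with node 0 as the only seed.
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_network, seeds=[0])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Nodes closer to the seed receive higher steady-state scores, which is the property that makes the ranking usable for candidate prioritization; a higher restart probability keeps the walker nearer the seeds.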
Escott-Price, Valentina; Bellenguez, Céline; Wang, Li-San; Choi, Seung-Hoan; Harold, Denise; Jones, Lesley; Holmans, Peter; Gerrish, Amy; Vedernikov, Alexey; Richards, Alexander; DeStefano, Anita L; Lambert, Jean-Charles; Ibrahim-Verbaas, Carla A; Naj, Adam C; Sims, Rebecca; Jun, Gyungah; Bis, Joshua C; Beecham, Gary W; Grenier-Boley, Benjamin; Russo, Giancarlo; Thornton-Wells, Tricia A; Denning, Nicola; Smith, Albert V; Chouraki, Vincent; Thomas, Charlene; Ikram, M Arfan; Zelenika, Diana; Vardarajan, Badri N; Kamatani, Yoichiro; Lin, Chiao-Feng; Schmidt, Helena; Kunkle, Brian; Dunstan, Melanie L; Vronskaya, Maria; Johnson, Andrew D; Ruiz, Agustin; Bihoreau, Marie-Thérèse; Reitz, Christiane; Pasquier, Florence; Hollingworth, Paul; Hanon, Olivier; Fitzpatrick, Annette L; Buxbaum, Joseph D; Campion, Dominique; Crane, Paul K; Baldwin, Clinton; Becker, Tim; Gudnason, Vilmundur; Cruchaga, Carlos; Craig, David; Amin, Najaf; Berr, Claudine; Lopez, Oscar L; De Jager, Philip L; Deramecourt, Vincent; Johnston, Janet A; Evans, Denis; Lovestone, Simon; Letenneur, Luc; Hernández, Isabel; Rubinsztein, David C; Eiriksdottir, Gudny; Sleegers, Kristel; Goate, Alison M; Fiévet, Nathalie; Huentelman, Matthew J; Gill, Michael; Brown, Kristelle; Kamboh, M Ilyas; Keller, Lina; Barberger-Gateau, Pascale; McGuinness, Bernadette; Larson, Eric B; Myers, Amanda J; Dufouil, Carole; Todd, Stephen; Wallon, David; Love, Seth; Rogaeva, Ekaterina; Gallacher, John; George-Hyslop, Peter St; Clarimon, Jordi; Lleo, Alberto; Bayer, Anthony; Tsuang, Debby W; Yu, Lei; Tsolaki, Magda; Bossù, Paola; Spalletta, Gianfranco; Proitsi, Petra; Collinge, John; Sorbi, Sandro; Garcia, Florentino Sanchez; Fox, Nick C; Hardy, John; Naranjo, Maria Candida Deniz; Bosco, Paolo; Clarke, Robert; Brayne, Carol; Galimberti, Daniela; Scarpini, Elio; Bonuccelli, Ubaldo; Mancuso, Michelangelo; Siciliano, Gabriele; Moebus, Susanne; Mecocci, Patrizia; Zompo, Maria Del; Maier, Wolfgang; Hampel, Harald; Pilotto, Alberto; 
Frank-García, Ana; Panza, Francesco; Solfrizzi, Vincenzo; Caffarra, Paolo; Nacmias, Benedetta; Perry, William; Mayhaus, Manuel; Lannfelt, Lars; Hakonarson, Hakon; Pichler, Sabrina; Carrasquillo, Minerva M; Ingelsson, Martin; Beekly, Duane; Alvarez, Victoria; Zou, Fanggeng; Valladares, Otto; Younkin, Steven G; Coto, Eliecer; Hamilton-Nelson, Kara L; Gu, Wei; Razquin, Cristina; Pastor, Pau; Mateo, Ignacio; Owen, Michael J; Faber, Kelley M; Jonsson, Palmi V; Combarros, Onofre; O'Donovan, Michael C; Cantwell, Laura B; Soininen, Hilkka; Blacker, Deborah; Mead, Simon; Mosley, Thomas H; Bennett, David A; Harris, Tamara B; Fratiglioni, Laura; Holmes, Clive; de Bruijn, Renee F A G; Passmore, Peter; Montine, Thomas J; Bettens, Karolien; Rotter, Jerome I; Brice, Alexis; Morgan, Kevin; Foroud, Tatiana M; Kukull, Walter A; Hannequin, Didier; Powell, John F; Nalls, Michael A; Ritchie, Karen; Lunetta, Kathryn L; Kauwe, John S K; Boerwinkle, Eric; Riemenschneider, Matthias; Boada, Mercè; Hiltunen, Mikko; Martin, Eden R; Schmidt, Reinhold; Rujescu, Dan; Dartigues, Jean-François; Mayeux, Richard; Tzourio, Christophe; Hofman, Albert; Nöthen, Markus M; Graff, Caroline; Psaty, Bruce M; Haines, Jonathan L; Lathrop, Mark; Pericak-Vance, Margaret A; Launer, Lenore J; Van Broeckhoven, Christine; Farrer, Lindsay A; van Duijn, Cornelia M; Ramirez, Alfredo; Seshadri, Sudha; Schellenberg, Gerard D; Amouyel, Philippe; Williams, Julie
Alzheimer's disease is a common debilitating dementia with known heritability, for which 20 late-onset susceptibility loci have been identified, but more remain to be discovered. This study sought to identify new susceptibility genes, using an alternative gene-wide analytical approach which tests for patterns of association within genes, in the powerful genome-wide association dataset of the International Genomics of Alzheimer's Project Consortium, comprising over 7 million genotypes from 25,580 Alzheimer's cases and 48,466 controls. In addition to earlier reported genes, we detected genome-wide significant loci on chromosomes 8 (TP53INP1, p = 1.4×10^-6) and 14 (IGHV1-67, p = 7.9×10^-8) which indexed novel susceptibility loci. The additional genes identified in this study have an array of functions previously implicated in Alzheimer's disease, including aspects of energy metabolism, protein degradation and the immune system, and add further weight to these pathways as potential therapeutic targets in Alzheimer's disease.
Timothy G Lesnick
While major inroads have been made in identifying the genetic causes of rare Mendelian disorders, little progress has been made in the discovery of common gene variations that predispose to complex diseases. The single gene variants that have been shown to associate reproducibly with complex diseases typically have small effect sizes or attributable risks. However, the joint actions of common gene variants within pathways may play a major role in predisposing to complex diseases (the paradigm of complex genetics). The goal of this study was to determine whether polymorphism in a candidate pathway (axon guidance) predisposed to a complex disease (Parkinson disease [PD]). We mined a whole-genome association dataset and identified single nucleotide polymorphisms (SNPs) that were within axon-guidance pathway genes. We then constructed models of axon-guidance pathway SNPs that predicted three outcomes: PD susceptibility (odds ratio = 90.8, p = 4.64 × 10^-38), survival free of PD (hazards ratio = 19.0, p = 5.43 × 10^-48), and PD age at onset (R^2 = 0.68, p = 1.68 × 10^-51). By contrast, models constructed from thousands of random selections of genomic SNPs predicted the three PD outcomes poorly. Mining of a second whole-genome association dataset and mining of an expression profiling dataset also supported a role for many axon-guidance pathway genes in PD. These findings could have important implications regarding the pathogenesis of PD. This genomic pathway approach may also offer insights into other complex diseases such as Alzheimer disease, diabetes mellitus, nicotine and alcohol dependence, and several cancers.
Sims, Katherine B.; Ozelius, Laurie; Corey, Timothy; Rinehart, William B.; Liberfarb, Ruth; Haines, Jonathan; Chen, Wei Jane; Norio, Reijo; Sankila, Eeva; de la Chapelle, Albert; Murphy, Dennis L.; Gusella, James; Breakefield, Xandra O.
The genes for MAO-A and MAO-B appear to be very close to the Norrie disease gene, on the basis of loss and/or disruption of the MAO genes and activities in atypical Norrie disease patients deleted for the DXS7 locus; linkage among the MAO genes, the Norrie disease gene, and the DXS7 locus; and mapping of all these loci to the chromosomal region Xp11. The present study provides evidence that the MAO genes are not disrupted in “classic” Norrie disease patients. Genomic DNA from these “nondelet...
The mosquito Aedes aegypti transmits some of the most important human arboviruses, including dengue, yellow fever and chikungunya viruses. It has a large genome containing many repetitive sequences, which has resulted in the genome being poorly assembled - there are 4,758 scaffolds, few of which have been assigned to a chromosome. To allow the mapping of genes affecting disease transmission, we have improved the genome assembly by scoring a large number of SNPs in recombinant progeny from a cross between two strains of Ae. aegypti, and used these to generate a genetic map. This revealed a high rate of misassemblies in the current genome, where, for example, sequences from different chromosomes were found on the same scaffold. Once these were corrected, we were able to assign 60% of the genome sequence to chromosomes and approximately order the scaffolds along the chromosome. We found that there are very large regions of suppressed recombination around the centromeres, which can extend to as much as 47% of the chromosome. To illustrate the utility of this new genome assembly, we mapped a gene that makes Ae. aegypti resistant to the human parasite Brugia malayi, and generated a list of candidate genes that could be affecting the trait. © 2014 Juneja et al.
Li, Jie; Wang, Shunli; Li, Shanshan; Ge, Pei; Li, Xiaohui; Ma, Wujun; Zeller, F J; Hsam, Sai L K; Yan, Yueming
The α-gliadins are associated with human celiac disease. A total of 23 noninterrupted full open reading frame α-gliadin genes and 19 pseudogenes were cloned and sequenced from the C, M, N, and U genomes of four diploid Aegilops species. Sequence comparison of α-gliadin genes from Aegilops and Triticum species demonstrated extensive allelic variation at the Gli-2 loci of the four Aegilops genomes. Specific structural features were found, including the compositions and variations of two polyglutamine domains (QI and QII) and four T cell stimulatory toxic epitopes. The mean numbers of glutamine residues in the QI domain in C and N genomes and the QII domain in C, N, and U genomes were much higher than those in Triticum genomes, and the QI domain in C and N genomes and the QII domain in C, M, N, and U genomes displayed greater length variations. Interestingly, the types and numbers of the four T cell stimulatory toxic epitopes in α-gliadins from the four Aegilops genomes were significantly fewer than those from Triticum A, B, D, and their progenitor genomes. Relationships between the structural variations of the two polyglutamine domains and the distributions of the four T cell stimulatory toxic epitopes were found, allowing the α-gliadin genes from the Aegilops and Triticum genomes to be classified into three groups.
Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.
The publicly funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has so far produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary "draft" format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000 - 40,000 human "genes." However, the bulk of the effort still remains: to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome with genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).
... MD): National Center for Biotechnology Information (US); 1998-. Genes and Disease [Internet].
Bandrés-Ciga, Sara; Ruz, Clara; Barrero, Francisco J; Escamilla-Sevilla, Francisco; Pelegrina, Javier; Vives, Francisco; Duran, Raquel
Parkinson's disease (PD) is the second most common neurodegenerative disease, whose prevalence is projected to be between 8.7 and 9.3 million by 2030. Until about 20 years ago, PD was considered to be the textbook example of a "non-genetic" disorder. Nowadays, PD is generally considered a multifactorial disorder that arises from the combination and complex interaction of genes and environmental factors. To date, a total of 7 genes, including SNCA, LRRK2, PARK2, DJ-1, PINK1, VPS35 and ATP13A2, have been shown unequivocally to cause Mendelian PD. Also, variants with incomplete penetrance in the genes LRRK2 and GBA are considered to be strong risk factors for PD worldwide. Although genetic studies have provided valuable insights into the pathogenic mechanisms underlying PD, the role of structural variation in PD has been understudied in comparison with other genomic variations. Structural genomic variations might substantially account for such genetic substrates yet to be discovered. The present review aims to provide an overview of the structural genomic variants implicated in the pathogenesis of PD.
Rao, Soumya; Nandineni, Madhusudan R
Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.
Nucleotide-binding site (NBS) disease resistance genes play an important role in defending plants from a variety of pathogens and insect pests. Many R-genes have been identified in various plant species. However, little is known about the NBS-encoding genes in Brachypodium distachyon. In this study, using computational analysis of the B. distachyon genome, we identified 126 regular NBS-encoding genes and characterized them on the basis of structural diversity, conserved protein motifs, chromosomal locations, gene duplications, promoter region, and phylogenetic relationships. EST hits and full-length cDNA sequences (from the Brachypodium database) of the 126 R-like candidates supported their existence. Based on the occurrence of conserved protein motifs such as coiled-coil (CC), NBS, and leucine-rich repeat (LRR), these regular NBS-LRR genes were classified into four subgroups: CC-NBS-LRR, NBS-LRR, CC-NBS, and X-NBS. Further expression analysis of the regular NBS-encoding genes in the Brachypodium database revealed that these genes are expressed in a wide range of libraries, including those constructed from various developmental stages, tissue types, and drought-challenged or nonchallenged tissue.
Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang
Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for the systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description in GeneRIFs is annotated by the text mining tool Open Biomedical Annotator (OBA), and each Entrez gene is mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the GeneRIF records for human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. The high consistency of term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) between GeneRIFs and GOA shows that our annotation resource is very reliable.
Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin
Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in outcomes such as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to identify the composition and function of the pan-genome and core genome. The results showed that a total of 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, and were mainly responsible for energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the growth and reproduction of Brucella spp. This study might help us to better understand the mechanisms of persistent Brucella infection and provide some clues for further exploring the gene modules involved in intracellular survival in Brucella spp.
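The three-way split of clusters reported above (core, dispensable, strain-specific) can be illustrated with a minimal Python sketch. It assumes the common convention that a core cluster is present in every genome and a strain-specific cluster in exactly one; the abstract does not state the authors' exact criteria, and the function name `partition_pan_genome` and the toy data are hypothetical.

```python
from collections import defaultdict


def partition_pan_genome(cluster_to_genomes, n_genomes):
    """Split gene clusters into core, dispensable and strain-specific sets.

    cluster_to_genomes maps a cluster ID to the genomes in which at least
    one member gene was found (duplicates from paralogs are tolerated).
    Assumed convention, not taken from the abstract:
      core            = present in all n_genomes
      strain_specific = present in exactly one genome
      dispensable     = everything in between
    """
    partition = defaultdict(list)
    for cluster_id, genomes in cluster_to_genomes.items():
        n_present = len(set(genomes))  # count each genome once
        if n_present == n_genomes:
            partition["core"].append(cluster_id)
        elif n_present == 1:
            partition["strain_specific"].append(cluster_id)
        else:
            partition["dispensable"].append(cluster_id)
    return dict(partition)


# Hypothetical toy input: three genomes (A, B, C) and four clusters.
toy_clusters = {
    "c1": ["A", "B", "C"],
    "c2": ["A", "B"],
    "c3": ["C"],
    "c4": ["A", "B", "C", "A"],  # paralog in genome A
}
toy_partition = partition_pan_genome(toy_clusters, n_genomes=3)
```

With this convention the three categories are mutually exclusive and exhaustive, which is consistent with the reported counts summing to the total number of clusters (1710 + 1182 + 2477 = 5369).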
Yang, Hyun-Jin; Ratnapriya, Rinki; Cogliati, Tiziana; Kim, Jung-Woong; Swaroop, Anand
Genomics and genetics have invaded all aspects of biology and medicine, opening uncharted territory for scientific exploration. The definition of "gene" itself has become ambiguous, and the central dogma is continuously being revised and expanded. Computational biology and computational medicine are no longer intellectual domains of the chosen few. Next generation sequencing (NGS) technology, together with novel methods of pattern recognition and network analyses, has revolutionized the way we think about fundamental biological mechanisms and cellular pathways. In this review, we discuss NGS-based genome-wide approaches that can provide deeper insights into retinal development, aging and disease pathogenesis. We first focus on gene regulatory networks (GRNs) that govern the differentiation of retinal photoreceptors and modulate adaptive response during aging. Then, we discuss NGS technology in the context of retinal disease and develop a vision for therapies based on network biology. We should emphasize that basic strategies for network construction and analyses can be transported to any tissue or cell type. We believe that specific and uniform guidelines are required for generation of genome, transcriptome and epigenome data to facilitate comparative analysis and integration of multi-dimensional data sets, and for constructing networks underlying complex biological processes. As cellular homeostasis and organismal survival are dependent on gene-gene and gene-environment interactions, we believe that network-based biology will provide the foundation for deciphering disease mechanisms and discovering novel drug targets for retinal neurodegenerative diseases. Published by Elsevier Ltd.
Escott-Price, Valentina; Bellenguez, Céline; Wang, Li-San; Choi, Seung-Hoan; Harold, Denise; Jones, Lesley; Holmans, Peter Alan; Gerrish, Amy; Vedernikov, Alexey; Richards, Alexander; DeStefano, Anita L.; Lambert, Jean-Charles; Ibrahim-Verbaas, Carla A.; Naj, Adam C.; Sims, Rebecca
BACKGROUND: Alzheimer's disease is a common debilitating dementia with known heritability, for which 20 late onset susceptibility loci have been identified, but more remain to be discovered. This study sought to identify new susceptibility genes, using an alternative gene-wide analytical approach which tests for patterns of association within genes, in the powerful genome-wide association dataset of the International Genomics of Alzheimer's Project Consortium, comprising over...
Vivianne G A A Vleeshouwers
Potato is the world's fourth largest food crop yet it continues to endure late blight, a devastating disease caused by the Irish famine pathogen Phytophthora infestans. Breeding broad-spectrum disease resistance (R) genes into potato (Solanum tuberosum) is the best strategy for genetically managing late blight but current approaches are slow and inefficient. We used a repertoire of effector genes predicted computationally from the P. infestans genome to accelerate the identification, functional characterization, and cloning of potentially broad-spectrum R genes. An initial set of 54 effectors containing a signal peptide and a RXLR motif was profiled for activation of innate immunity (avirulence or Avr activity) on wild Solanum species and tentative Avr candidates were identified. The RXLR effector family IpiO induced hypersensitive responses (HR) in S. stoloniferum, S. papita and the more distantly related S. bulbocastanum, the source of the R gene Rpi-blb1. Genetic studies with S. stoloniferum showed cosegregation of resistance to P. infestans and response to IpiO. Transient co-expression of IpiO with Rpi-blb1 in a heterologous Nicotiana benthamiana system identified IpiO as Avr-blb1. A candidate gene approach led to the rapid cloning of S. stoloniferum Rpi-sto1 and S. papita Rpi-pta1, which are functionally equivalent to Rpi-blb1. Our findings indicate that effector genomics enables discovery and functional profiling of late blight R genes and Avr genes at an unprecedented rate and promises to accelerate the engineering of late blight resistant potato varieties.
Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder; Murphy, Denis J
Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops. PMID:29672525
Versteeg, Bart; Bruisten, Sylvia M.; Pannekoek, Yvonne; Jolley, Keith A.; Maiden, Martin C. J.; van der Ende, Arie; Harrison, Odile B.
Background: Chlamydia trachomatis (Ct) plasmid has been shown to encode genes essential for infection. We evaluated the population structure of Ct using whole-genome sequence data (WGS). In particular, the relationship between the Ct genome, plasmid and disease was investigated. Results: WGS data
Gulia-Nuss, Monika; Nuss, Andrew B.; Meyer, Jason M.; Sonenshine, Daniel E.; Roe, R. Michael; Waterhouse, Robert M.; Sattelle, David B.; de la Fuente, José; Ribeiro, Jose M.; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R.; Walenz, Brian P.; Koren, Sergey; Hostetler, Jessica B.; Thiagarajan, Mathangi; Joardar, Vinita S.; Hannick, Linda I.; Bidwell, Shelby; Hammond, Martin P.; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L.; Almeida, Francisca C.; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W.; Bonzon-Kulichenko, Elena; Buckingham, Steven D.; Caffrey, Daniel R.; Caimano, Melissa J.; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J.; Giraldo-Calderón, Gloria I.; Grabowski, Jeffrey M.; Jiang, David; Khalil, Sayed M. S.; Kim, Donghun; Kocan, Katherine M.; Koči, Juraj; Kuhn, Richard J.; Kurtti, Timothy J.; Lees, Kristin; Lang, Emma G.; Kennedy, Ryan C.; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin; Radolf, Justin D.; Sakamoto, Joyce M.; Sánchez-Gracia, Alejandro; Severo, Maiara S.; Silverman, Neal; Šimo, Ladislav; Tojo, Marta; Tornador, Cristian; Van Zee, Janice P.; Vázquez, Jesús; Vieira, Filipe G.; Villar, Margarita; Wespiser, Adam R.; Yang, Yunlong; Zhu, Jiwei; Arensburger, Peter; Pietrantonio, Patricia V.; Barker, Stephen C.; Shao, Renfu; Zdobnov, Evgeny M.; Hauser, Frank; Grimmelikhuijzen, Cornelis J. P.; Park, Yoonseong; Rozas, Julio; Benton, Richard; Pedra, Joao H. F.; Nelson, David R.; Unger, Maria F.; Tubio, Jose M. C.; Tu, Zhijian; Robertson, Hugh M.; Shumway, Martin; Sutton, Granger; Wortman, Jennifer R.; Lawson, Daniel; Wikel, Stephen K.; Nene, Vishvanath M.; Fraser, Claire M.; Collins, Frank H.; Birren, Bruce; Nelson, Karen E.; Caler, Elisabet; Hill, Catherine A.
Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome reveals 20,486 protein-coding genes and expansions of gene families associated with tick–host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host 'questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent. PMID:26856261
Pouget, Jennie G; Gonçalves, Vanessa F; Spain, Sarah L
There has been intense debate over the immunological basis of schizophrenia, and the potential utility of adjunct immunotherapies. The major histocompatibility complex is consistently the most powerful region of association in genome-wide association studies (GWASs) of schizophrenia and has been...... in immune genes contributes to schizophrenia. We show that there is no enrichment of immune loci outside of the MHC region in the largest genetic study of schizophrenia conducted to date, in contrast to 5 diseases of known immune origin. Among 108 regions of the genome previously associated...
Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita
Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are then necessary to further dissect the genetics of obesity. Pig has emerged as one of the most attractive models, because of the similarity with humans in the mechanisms regulating the fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on STRING protein interaction network. Conclusions. The collected set consists of 361 obesity related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only for 3 human genes there is no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity related genes are mostly involved in regulation and signaling processes/pathways and relevant connection emerges between obesity-related genes and diseases such as cancer and infectious diseases.
Pemberton, Trevor J; Szpiech, Zachary A
Genomic regions of autozygosity (ROAs) represent segments of individual genomes that are homozygous for haplotypes inherited identical-by-descent (IBD) from a common ancestor. ROAs are nonuniformly distributed across the genome, and increased ROA levels are a reported risk factor for numerous complex diseases. Previously, we hypothesized that long ROAs are enriched for deleterious homozygotes as a result of young haplotypes with recent deleterious mutations (relatively untouched by purifying selection) being paired IBD as a consequence of recent parental relatedness, a pattern supported by ROA and whole-exome sequence data on 27 individuals. Here, we significantly bolster support for our hypothesis and expand upon our original analyses using ROA and whole-genome sequence data on 2,436 individuals from The 1000 Genomes Project. Considering CADD deleteriousness scores, we reaffirm our previous observation that long ROAs are enriched for damaging homozygotes worldwide. We show that strongly damaging homozygotes experience greater enrichment than weaker damaging homozygotes, while overall enrichment varies appreciably among populations. Mendelian disease genes and those encoding FDA-approved drug targets have significantly increased rates of gain in damaging homozygotes with increasing ROA coverage relative to all other genes. In genes implicated in eight complex phenotypes for which ROA levels have been identified as a risk factor, rates of gain in damaging homozygotes vary across phenotypes and populations but frequently differ significantly from non-disease genes. These findings highlight the potential confounding effects of population background in the assessment of associations between ROA levels and complex disease risk, which might underlie reported inconsistencies in ROA-phenotype associations. Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
For the past decade, the development of genomic technology has revolutionized modern biological research. Functional genomic analyses enable biologists to study genetic events on a genome-wide scale. Examples of applications are gene discovery, biomarker determination, disease classification, and drug target identification. Global expression profiles performed with microarrays enable a better understanding of the molecular signature of human disease, including acute and chronic kidney disease. About 10 % of the population in western industrialized nations suffers from chronic kidney disease (CKD). Treatment of end stage renal disease, the final stage of CKD, is performed by either hemo- or peritoneal dialysis or renal transplantation. The preferred treatment is renal transplantation, because of the higher quality of life. But the pathophysiology of the disease on a molecular level is not well enough understood, and early biomarkers for acute and chronic kidney disease are missing. In my studies I focused on genomics of allograft biopsies, prevention of delayed graft function after renal transplantation, anemia after renal transplantation, biocompatibility of hemodialysis membranes and peritoneal dialysis fluids, and cardiovascular diseases and bone disorders in CKD patients. Gene expression profiles, pathway analysis and protein-protein interaction networks were used to elucidate the underlying pathophysiological mechanism of the disease or phenomena, identifying early biomarkers or predictors of disease state and potential drug targets. In summary, my PhD thesis represents the application of functional genomic analyses in chronic kidney disease and renal transplantation. The results provide a deeper view into the molecular and cellular mechanisms of kidney disease. Nevertheless, future multicenter collaborative studies, meta-analyses of existing data, incorporation of functional genomics into large-scale prospective clinical trials are needed and will give biomedical
Winnaker, E L
The goal of my lecture is to show the new dimensions of genome research, which is replacing classic recombinant DNA technologies. The search for single genes is being replaced by the analysis of gene activities of whole cells, organs or organisms. This development radically changes basic biomedical research and points to new therapeutic strategies (examples: cancer, Alzheimer's disease). I will also show the rapid changes in our understanding of gene activity. Mendel's definition of genes is now replaced by molecular terms which teach us how gene expression is regulated and controlled. Finally, I will try to outline the limits of genetic analysis and how it raises ethical and moral questions. If the analysis of changes in the genetic read-out is related to diseases for which there is no therapy, or if such knowledge only predisposes to genetic diseases, the handling of such information requires extraordinary care. The genome projects thus have to be and are being pursued in conjunction with careful ethical analyses ...
Nicholls, Andrew W.; Salek, Reza M.; Marques-Vidal, Pedro; Morya, Edgard; Sameshima, Koichi; Montoliu, Ivan; Da Silva, Laeticia; Collino, Sebastiano; Martin, François-Pierre; Rezzi, Serge; Steinbeck, Christoph; Waterworth, Dawn M.; Waeber, Gérard; Vollenweider, Peter; Beckmann, Jacques S.; Le Coutre, Johannes; Mooser, Vincent; Bergmann, Sven; Genick, Ulrich K.; Kutalik, Zoltán
Metabolic traits are molecular phenotypes that can drive clinical phenotypes and may predict disease progression. Here, we report results from a metabolome- and genome-wide association study on 1H-NMR urine metabolic profiles. The study was conducted within an untargeted approach, employing a novel method for compound identification. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P<5×10−8) and independent associations between single nucleotide polymorphisms (SNP) and metabolome features. Fifty-six of these associations replicated in the TasteSensomics cohort, comprising 601 individuals from São Paulo of vastly diverse ethnic background. They correspond to eleven gene-metabolite associations, six of which had been previously identified in the urine metabolome and three in the serum metabolome. Our key novel findings are the associations of two SNPs with NMR spectral signatures pointing to fucose (rs492602, P = 6.9×10−44) and lysine (rs8101881, P = 1.2×10−33), respectively. Fine-mapping of the first locus pinpointed the FUT2 gene, which encodes a fucosyltransferase enzyme and has previously been associated with Crohn's disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within the SLC7A9 gene, rare mutations of which have been linked to severe kidney damage. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers. PMID:24586186
Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie
Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon), with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
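The GC3 measure defined in the abstract above (fraction of cytosine and guanine in the third position of a codon) can be computed directly from a coding sequence. The Python sketch below is a minimal illustration, assuming an in-frame coding sequence that starts at the first codon position; the function names are hypothetical, only the 0.75286 cut-off comes from the abstract, and how the authors handled stop codons or ambiguous bases is not stated.

```python
GC3_RICH_THRESHOLD = 0.75286  # cut-off quoted in the abstract


def gc3(cds):
    """Return the fraction of G or C bases at third codon positions.

    Assumes `cds` is in frame, starting at codon position 1. A trailing
    partial codon, if any, is ignored. Ambiguous bases such as N count
    as non-GC, which is a simplification.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any trailing partial codon
    third_positions = seq[2:usable:3]     # every third base: indices 2, 5, 8, ...
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


def is_gc3_rich(cds, threshold=GC3_RICH_THRESHOLD):
    """Flag a coding sequence as GC3-rich using the abstract's cut-off."""
    return gc3(cds) >= threshold


# Hypothetical toy sequences, not taken from the oil palm genome.
example_rich = "ATGGCCGCGTCC"   # third positions: G, C, G, C
example_poor = "ATGAAATTTGGA"   # third positions: G, A, T, A
```

Here `gc3(example_rich)` evaluates all four third positions as G or C, whereas `example_poor` has only one of four, so only the first sequence passes `is_gc3_rich`.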
It is recognized that genetic factors contribute to human longevity. Besides the hypothesis of the existence of longevity genes, another suggests that a lower frequency of risk alleles decreases the incidence of age-related diseases in long-lived people. However, the latter finds no support from recent genetic studies. Considering the crucial role of epigenetic modification in gene regulation, we hypothesize that the suppression of disease-related genes in long-lived individuals is likely achieved by epigenetic modification, e.g. DNA methylation. To test this hypothesis, we investigated the genome-wide methylation profile in 4 Chinese female centenarians and 4 middle-aged controls using methyl-DNA immunoprecipitation sequencing. 626 differentially methylated regions (DMRs) were observed between the two groups. Interestingly, genes with these DMRs were enriched in age-related diseases, including type-2 diabetes, cardiovascular disease, stroke and Alzheimer's disease. This pattern remains rather stable after including methylomes of two white individuals. Further analyses suggest that the observed DMRs likely have functional roles in regulating disease-associated gene expression, with some genes [e.g. caspase 3 (CASP3)] being down-regulated whereas others [i.e. interleukin 1 receptor, type 2 (IL1R2)] are up-regulated. Therefore, our study suggests that suppressing disease-related genes via epigenetic modification is an important contributor to human longevity.
Hitomi, Yuki; Tokunaga, Katsushi
Human genome variation may cause differences in traits and disease risks. Disease-causal/susceptible genes and variants for both common and rare diseases can be detected by comprehensive whole-genome analyses, such as whole-genome sequencing (WGS) using next-generation sequencing (NGS) technology, and genome-wide association studies (GWAS). Here, in addition to the application of NGS as a whole-genome analysis method, we summarize approaches for the identification of functional disease-causal/susceptible variants from abundant genetic variants in the human genome and methods for evaluating their functional effects in human diseases, using NGS and in silico and in vitro functional analyses. We also discuss the clinical applications of the functional disease-causal/susceptible variants to personalized medicine.
Raymond, Amy; Haffner, Taryn; Ng, Nathan; Lorimer, Don; Staker, Bart; Stewart, Lance
An overview of one salvage strategy for high-value SSGCID targets is given. Any structural genomics endeavor, particularly ambitious ones such as the NIAID-funded Seattle Structural Genomics Center for Infectious Disease (SSGCID) and Center for Structural Genomics of Infectious Disease (CSGID), faces technical challenges at all points of the production pipeline. One salvage strategy employed by SSGCID is combined gene engineering and structure-guided construct design to overcome challenges at the levels of protein expression and protein crystallization. Multiple constructs of each target are cloned in parallel using Polymerase Incomplete Primer Extension cloning, and small-scale expressions of these are rapidly analyzed by capillary electrophoresis. Using the methods reported here, which have proven particularly useful for high-value targets, otherwise intractable targets can be resolved.
Lorimer, Don; Raymond, Amy; Mixon, Mark; Burgin, Alex; Staker, Bart; Stewart, Lance
For structural biology applications, protein-construct engineering is guided by comparative sequence analysis and structural information, which allow the researcher to better define domain boundaries for terminal deletions and nonconserved regions for surface mutants. A database software application called Gene Composer has been developed to facilitate construct design. The structural genomics effort at the Seattle Structural Genomics Center for Infectious Disease (SSGCID) requires the manipulation of large numbers of amino-acid sequences and the underlying DNA sequences which are to be cloned into expression vectors. To improve efficiency in high-throughput protein structure determination, a database software package, Gene Composer, has been developed which facilitates the information-rich design of protein constructs and their underlying gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An example of the structure determination of H1N1 RNA-dependent RNA polymerase PB2 subunit is given
Sims, K B; Ozelius, L; Corey, T; Rinehart, W B; Liberfarb, R; Haines, J; Chen, W J; Norio, R; Sankila, E; de la Chapelle, A
The genes for MAO-A and MAO-B appear to be very close to the Norrie disease gene, on the basis of loss and/or disruption of the MAO genes and activities in atypical Norrie disease patients deleted for the DXS7 locus; linkage among the MAO genes, the Norrie disease gene, and the DXS7 locus; and mapping of all these loci to the chromosomal region Xp11. The present study provides evidence that the MAO genes are not disrupted in "classic" Norrie disease patients. Genomic DNA from these "nondeletion" Norrie disease patients did not show rearrangements at the MAOA or DXS7 loci. Normal levels of MAO-A activities, as well as normal amounts and size of the MAO-A mRNA, were observed in cultured skin fibroblasts from these patients, and MAO-B activity in their platelets was normal. Catecholamine metabolites evaluated in plasma and urine were in the control range. Thus, although some atypical Norrie disease patients lack both MAO-A and MAO-B activities, MAO does not appear to be an etiologic factor in classic Norrie disease.
The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') versus log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes a linear model, Y' = 0.045 + 0.977X', both with high significance (p < 0.001, R² > 0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10^6 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.
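The two regression models quoted in this abstract can be evaluated directly. The following is a minimal sketch, assuming the rounded coefficients as printed and X' = log10(genome size in kbp); the function names are hypothetical and are not taken from the paper. Because the coefficients are rounded, the outputs land in the same range as the published projections (roughly 4×10^4 genes for a 3×10^6 kbp genome) without reproducing them exactly.

```python
import math


def eukaryote_log_genes(log10_size_kbp: float) -> float:
    """Logarithmic model for eukaryotes: Y' = ln(-46.200 + 22.678 X')."""
    return math.log(-46.200 + 22.678 * log10_size_kbp)


def non_eukaryote_log_genes(log10_size_kbp: float) -> float:
    """Linear model for non-eukaryotes: Y' = 0.045 + 0.977 X'."""
    return 0.045 + 0.977 * log10_size_kbp


def predicted_gene_count(genome_size_kbp: float, eukaryote: bool = True) -> float:
    """Back-transform Y' (log10 of protein-coding gene number) to a gene count."""
    x = math.log10(genome_size_kbp)
    y = eukaryote_log_genes(x) if eukaryote else non_eukaryote_log_genes(x)
    return 10 ** y


# The two dinoflagellate genome sizes mentioned in the abstract.
small = predicted_gene_count(3e6)
large = predicted_gene_count(245e6)
print(round(small), round(large))
```

Note that the eukaryotic model is only defined where -46.200 + 22.678X' is positive, that is, for genomes larger than roughly 10^2 kbp.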
Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on inhouse and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.
Singh, Param Priya; Arora, Jatin; Isambert, Hervé
Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.
Perez-Iratxeta, Carolina; Wjst, Matthias; Bork, Peer; Andrade, Miguel A
Background Human inherited diseases can be associated by genetic linkage with one or more genomic regions. The availability of the complete sequence of the human genome allows examining those locations for an associated gene. We previously developed an algorithm to prioritize genes on a chromosomal region according to their possible relation to an inherited disease using a combination of data mining on biomedical databases and gene sequence analysis. Results We have implemented this ...
Anne Z Phillips
Cotton bacterial blight (CBB), an important disease of cotton (Gossypium hirsutum) in the early 20th century, had been controlled by resistant germplasm for over half a century. Recently, CBB re-emerged as an agronomic problem in the United States. Here, we report analysis of cotton variety planting statistics that indicate a steady increase in the percentage of susceptible cotton varieties grown each year since 2009. Phylogenetic analysis revealed that strains from the current outbreak cluster with race 18 Xanthomonas citri pv. malvacearum (Xcm) strains. Illumina based draft genomes were generated for thirteen Xcm isolates and analyzed along with 4 previously published Xcm genomes. These genomes encode 24 conserved and nine variable type three effectors. Strains in the race 18 clade contain 3 to 5 more effectors than other Xcm strains. SMRT sequencing of two geographically and temporally diverse strains of Xcm yielded circular chromosomes and accompanying plasmids. These genomes encode eight and thirteen distinct transcription activator-like effector genes. RNA-sequencing revealed 52 genes induced within two cotton cultivars by both tested Xcm strains. This gene list includes a homeologous pair of genes, with homology to the known susceptibility gene, MLO. In contrast, the two strains of Xcm induce different clade III SWEET sugar transporters. Subsequent genome wide analysis revealed patterns in the overall expression of homeologous gene pairs in cotton after inoculation by Xcm. These data reveal important insights into the Xcm-G. hirsutum disease complex and strategies for future development of resistant cultivars.
Wang, Shur-Jen; Laulederkind, Stanley J. F.; Hayman, G. T.; Smith, Jennifer R.; Petri, Victoria; Lowry, Timothy F.; Nigam, Rajni; Dwinell, Melinda R.; Worthey, Elizabeth A.; Munzenmaier, Diane H.; Shimoyama, Mary; Jacob, Howard J.
The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene–disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, ‘regulation of programmed cell death’ was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where ‘lipid metabolic process’ was the most enriched term. ‘Cytosol’ and ‘nucleus’ were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with ‘nucleus’ annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term–annotated gene list showed enrichment in physiologically related diseases. For example, the ‘regulation of blood pressure’ genes were enriched with cardiovascular disease annotations, and the ‘lipid metabolic process’ genes with obesity annotations. Furthermore, we were able to enhance enrichment of neurological
Keats, B. [Louisiana State Univ. Medical Center, New Orleans, LA (United States)]
The Human Genome Project has had a major impact on genetic research over the past five years. The number of mapped genes is now over 3,000 compared with approximately 1,600 in 1989 and only about 260 ten years before that. The realization that extensive variation could be detected in anonymous DNA segments greatly enhanced the potential for mapping by linkage analysis. Previously, linkage studies had depended on polymorphisms that could be detected in red blood cell antigens, proteins (revealed by electrophoresis and isoelectric focusing), and cytogenetic heteromorphisms. The identification of thousands of polymorphic DNA markers throughout the human genome has led to the construction of high density genetic linkage maps. These maps provide the data necessary to test hypotheses concerning differences in recombination rates and levels of interference. They are also important for disease gene mapping because the existence of these genes must be inferred from the phenotype. Showing linkage of a disease gene to a DNA marker is the first step towards isolating the disease gene, determining its protein product, and developing effective therapies. However, interpretation of results is not always straightforward. Factors such as etiological heterogeneity and undetected irregular segregation can lead to confusing linkage results and incorrect conclusions about the locations of disease genes. This paper will discuss these phenomena and present examples that illustrate the problems, as well as approaches to dealing with them. 23 refs., 3 figs., 3 tabs.
Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese
Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful for studying an organism, so high-quality, stable gene annotation is critical. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice; the exception is chlamy, whose annotation was done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.
Xiong, Xin; Chen, Meng; Lim, Wendell A; Zhao, Dehua; Qi, Lei S
The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system, a versatile RNA-guided DNA targeting platform, has been revolutionizing our ability to modify, manipulate, and visualize the human genome, which greatly advances both biological research and therapeutics development. Here, we review the current development of CRISPR/Cas9 technologies for gene editing, transcription regulation, genome imaging, and epigenetic modification. We discuss the broad application of this system to the study of functional genomics, especially genome-wide genetic screening, and to therapeutics development, including establishing disease models, correcting defective genetic mutations, and treating diseases.
Smith, J; Gheyas, A; Burt, D W
Avian pathogens are responsible for major costs to society, both in terms of huge economic losses to the poultry industry and their implications for human health. The health and welfare of millions of birds is under continued threat from many infectious diseases, some of which are increasing in virulence and thus becoming harder to control, such as Marek's disease virus and avian influenza viruses. The current era in animal genomics has seen huge developments in both technologies and resources, which means that researchers have never been in a better position to investigate the genetics of disease resistance and determine the underlying genes/mutations which make birds susceptible or resistant to infection. Avian genomics has reached a point where the biological mechanisms of infectious diseases can be investigated and understood in poultry and other avian species. Knowledge of genes conferring disease resistance can be used in selective breeding programmes or to develop vaccines which help to control the effects of these pathogens, which have such a major impact on birds and humans alike.
Kettleborough, R.N.; Busch-Nentwich, E.M.; Harvey, S.A.; Dooley, C.M.; de Bruijn, E.; van Eeden, F.; Sealy, I.; White, R.J.; Herd, C.; Nijman, I.J.; Fenyes, F.; Mehroke, S.; Scahill, C.; Gibbons, R.; Wali, N.; Carruthers, S.; Hall, A.; Yen, J.; Cuppen, E.; Stemple, D.L.
Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms,
Andersen, Ethan J; Nepal, Madhav P
We report data associated with the identification of 242 disease resistance genes (R-genes) in the genome of Setaria italica as presented in "Genetic diversity of disease resistance genes in foxtail millet (Setaria italica L.)" (Andersen and Nepal, 2017). Our data describe the structure and evolution of the Coiled-coil, Nucleotide-binding site, Leucine-rich repeat (CNL) R-genes in foxtail millet. The CNL genes were identified through rigorous extraction and analysis of recently available plant genome sequences using cutting-edge analytical software. Data visualization includes gene structure diagrams, chromosomal syntenic maps, a chromosomal density plot, and a maximum-likelihood phylogenetic tree comparing Sorghum bicolor, Panicum virgatum, Setaria italica, and Arabidopsis thaliana. Compilation of InterProScan annotations, Gene Ontology (GO) annotations, and Basic Local Alignment Search Tool (BLAST) results for the 242 R-genes identified in the foxtail millet genome are also included in tabular format.
Gluckman, Peter D; Hanson, Mark A; Beedle, Alan S
That there is a heritable or familial component of susceptibility to chronic non-communicable diseases such as type 2 diabetes, obesity and cardiovascular disease is well established, but there is increasing evidence that some elements of such heritability are transmitted non-genomically and that the processes whereby environmental influences act during early development to shape disease risk in later life can have effects beyond a single generation. Such heritability may operate through epigenetic mechanisms involving regulation of either imprinted or non-imprinted genes but also through broader mechanisms related to parental physiology or behaviour. We review evidence and potential mechanisms for non-genomic transgenerational inheritance of 'lifestyle' disease and propose that the 'developmental origins of disease' phenomenon is a maladaptive consequence of an ancestral mechanism of developmental plasticity that may have had adaptive value in the evolution of generalist species such as Homo sapiens. Copyright 2007 Wiley Periodicals, Inc.
In the post genome era, a major goal of biology is the identification of specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET). Celsius, the largest co-normalized microarray dataset of Affymetrix based gene expression, was used to calculate the correlation between all possible gene pairs on all platforms, and generate stored indexes in a web searchable format. The size of Celsius makes UGET a powerful gene characterization tool. Using a small seed list of known cartilage-selective genes, UGET extended the list of known genes by identifying 32 new highly cartilage-selective genes. Of these, 7 of 10 tested were validated by qPCR including the novel cartilage-specific genes SDK2 and FLJ41170. In addition, we retrospectively tested UGET and other gene expression based prioritization tools to identify disease-causing genes within known linkage intervals. We first demonstrated this utility with UGET using genetically heterogeneous disorders such as Joubert syndrome, microcephaly, neuropsychiatric disorders and type 2 limb girdle muscular dystrophy (LGMD2), and then compared UGET to other gene expression based prioritization programs which use small but discrete and well annotated datasets. Finally, we observed a significantly higher gene correlation shared between genes in disease networks associated with similar complex or Mendelian disorders. UGET is an invaluable resource for a geneticist that permits the rapid inclusion of expression criteria from one to hundreds of genes in genomic intervals linked to disease. By using thousands of arrays UGET annotates and prioritizes genes better than other tools especially with rare tissue disorders or complex multi-tissue biological processes. This information can be critical in prioritization of candidate genes for sequence analysis.
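The idea of extending a seed list by pairwise expression correlation can be illustrated with a small sketch. This is not the UGET implementation: it assumes Pearson correlation as the similarity measure and ranks candidates by their mean correlation with the seed genes, and the gene names and expression values are invented toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_seed_correlation(expression, seeds):
    """Rank non-seed genes by their mean correlation with the seed genes."""
    scores = {}
    for gene, profile in expression.items():
        if gene in seeds:
            continue
        total = sum(pearson(profile, expression[s]) for s in seeds)
        scores[gene] = total / len(seeds)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression profiles across four arrays.
expression = {
    "SEED_A": [1.0, 2.0, 3.0, 4.0],
    "SEED_B": [2.0, 4.1, 5.9, 8.0],
    "GENE_X": [0.9, 2.1, 2.9, 4.2],  # tracks the seeds
    "GENE_Y": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with the seeds
}
ranking = rank_by_seed_correlation(expression, {"SEED_A", "SEED_B"})
print(ranking)
```

At the scale described in the abstract (all gene pairs across thousands of arrays), the correlations would be precomputed and indexed rather than calculated on demand as they are here.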
Bergholdt, Regine; Brorsson, Caroline; Palleja, Albert
Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated with disease, and they do not typically inform the broader context in which the disease genes operate. Here, we integrated type 1 diabetes GWAS data with protein-protein interactions to construct biological networks of relevance for disease. A total of 17 networks were identified. To prioritize ... -cells. Our results provide novel insight to the mechanisms behind type 1 diabetes pathogenesis and, thus, may provide the basis for the design of novel treatment strategies.
Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, makes it difficult to set an overall p-value because of the complicated correlation structure; this is the multiple testing problem, which causes unacceptable false negative results. The number of SNP-SNP pairs is larger than the sample size, the so-called large p, small n problem, which precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) to handle the large number of SNP-SNP interactions appropriately by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (General-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
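The screening step of sure independence screening can be sketched in a few lines. The following is a simplified illustration rather than the EPISIS procedure: it assumes a product coding of the two genotypes for each SNP-SNP interaction (the abstract does not specify its dummy coding), ranks interactions by absolute marginal Pearson correlation with the phenotype instead of a marginal logistic fit, and keeps the top n/log(n) terms, a commonly used SIS cutoff. All names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sis_pairs(genotypes, phenotype, keep=None):
    """Rank SNP-SNP product terms by |marginal correlation| and keep the top ones.

    genotypes: dict mapping SNP name -> list of 0/1/2 allele counts.
    phenotype: list of 0/1 case-control labels.
    """
    n = len(phenotype)
    if keep is None:
        keep = max(1, int(n / math.log(n)))  # assumed cutoff d = n / log(n)
    scored = []
    for a, b in combinations(sorted(genotypes), 2):
        interaction = [u * v for u, v in zip(genotypes[a], genotypes[b])]
        if len(set(interaction)) < 2:
            continue  # constant term carries no information
        scored.append((abs(pearson(interaction, phenotype)), (a, b)))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:keep]]


# Toy data: case status is driven by the snp1 x snp2 product.
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 0, 1, 1, 2, 2, 0, 1],
    "snp3": [1, 1, 1, 0, 0, 0, 2, 2],
}
phenotype = [0, 0, 1, 0, 1, 1, 0, 0]
top_pair = sis_pairs(genotypes, phenotype, keep=1)
print(top_pair)
```

The retained interaction terms would then be passed to a lower-dimensional variable selection step, such as penalized logistic regression, which this sketch does not cover.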
Scherer Stephen W
Background Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E
The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Rocha Eduardo PC
Background Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering took into account neither the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that it is the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes that maintains their size fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering.
Bloss, Cinnamon S.; Scott-Van Zeeland, Ashley A.; Topol, Sarah E.; Darst, Burcu F.; Boeldt, Debra L.; Erikson, Galina A.; Bethel, Kelly J.; Bjork, Robert L.; Friedman, Jennifer R.; Hwynn, Nelson; Patay, Bradley A.; Pockros, Paul J.; Scott, Erick R.; Simon, Ronald A.; Williams, Gary W.; Schork, Nicholas J.; Topol, Eric J.; Torkamani, Ali
Purpose The Scripps Idiopathic Diseases of huMan (IDIOM) study aims to discover novel gene-disease relationships and provide molecular genetic diagnosis and treatment guidance for individuals with novel diseases using genome sequencing integrated with clinical assessment and multidisciplinary case review. Methods Here we describe the IDIOM study operational protocol and initial results. Results 121 cases underwent first tier review by the principal investigators to determine if the primary inclusion criteria were satisfied, 59 (48.8%) underwent second tier review by our clinician-scientist review panel, and 17 (14.0%) patients and their family members were enrolled. 60% of cases resulted in a plausible molecular diagnosis. 18% of cases resulted in a confirmed molecular diagnosis. 2 of 3 confirmed cases led to the identification of novel gene-disease relationships. In the third confirmed case, a previously described but unrecognized disease was revealed. In all three confirmed cases, a new clinical management strategy was initiated based on the genetic findings. Conclusions Genome sequencing provides tangible clinical benefit for individuals with idiopathic genetic disease, not only in the context of molecular genetic diagnosis of known rare conditions, but also in cases where prior clinical information regarding a new genetic disorder is lacking. PMID:25790160
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.
Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D
The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
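The intersection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the gene identifiers are hypothetical, and the abstract does not say how the two datasets were actually thresholded or compared.

```python
def candidate_virulence_genes(fun_genes, differentially_expressed, required_for_infection):
    """Return FUN genes supported by both functional genomic approaches.

    All three arguments are iterables of gene identifiers (hypothetical here).
    """
    # Keep only genes of unknown function that are BOTH differentially
    # expressed during infection AND required to cause disease.
    shared = set(fun_genes) & set(differentially_expressed) & set(required_for_infection)
    return sorted(shared)


# Hypothetical gene identifiers, for illustration only.
fun = ["funA", "funB", "funC", "funD"]
expressed = ["funA", "funC", "knownX"]
required = ["funC", "funD", "knownX"]
print(candidate_virulence_genes(fun, expressed, required))
```

Using plain set intersection keeps the candidate list small by construction, which matches the stated goal of narrowing hundreds of FUN genes to a short list.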
Xu, Shuqing; Clark, Terry; Zheng, Hongkun
-chromosomal conversions distributed between chromosome 1 and 5, 2 and 6, and 3 and 5 are more frequent than genome average (Z-test, P ... is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication on gene conversion statistics, we determined locations of conversion partners with respect to inter-chromosomal segment duplication. The number of conversions associated with segmentation is less...... involved in conversion events. CONCLUSION: The evolution of gene families in the rice genome may have been accelerated by conversion with pseudogenes. Our analysis suggests a possible role for gene conversion in the evolution of pathogen-response genes....
Barbara E Stranger
The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
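Two of the multiple-test corrections named in the abstract above, Bonferroni and false discovery rate, can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg procedure as the FDR method, which the abstract does not specify, and the p-values are invented for the example; the permutation approach is not shown.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject a test when its p-value is at most alpha divided by the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Invented p-values, for illustration only.
pvals = [0.001, 0.012, 0.025, 0.039, 0.20]
print(bonferroni(pvals))          # [True, False, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

On these values Bonferroni keeps one association while Benjamini-Hochberg keeps four, which shows why results can differ across correction methods and why the abstract compares signal stability across them.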
Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun
A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is horizontal gene transfer (HGT), which involves movement of genetic material between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between the human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched for ion binding. There were few overlaps between these regions and known horizontal gene transfer regions in the human genome, which indicates that horizontal gene transfer is more common in the human genome than we expected. Horizontal gene transfer impacts hundreds of human genes, and this study provides insight into potential mechanisms of HGT in the human genome.
Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción
In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a
Hamza, Taye H.; Chen, Honglei; Hill-Burns, Erin M.; Rhodes, Shannon L.; Montimurro, Jennifer; Kay, Denise M.; Tenesa, Albert; Kusel, Victoria I.; Sheehan, Patricia; Eaaswarkhanth, Muthukrishnan; Yearout, Dora; Samii, Ali; Roberts, John W.; Agarwal, Pinky; Bordelon, Yvette; Park, Yikyung; Wang, Liyong; Gao, Jianjun; Vance, Jeffery M.; Kendler, Kenneth S.; Bacanu, Silviu-Alin; Scott, William K.; Ritz, Beate; Nutt, John; Factor, Stewart A.; Zabetian, Cyrus P.; Payami, Haydeh
Our aim was to identify genes that influence the inverse association of coffee with the risk of developing Parkinson's disease (PD). We used genome-wide genotype data and lifetime caffeinated-coffee-consumption data on 1,458 persons with PD and 931 without PD from the NeuroGenetics Research Consortium (NGRC), and we performed a genome-wide association and interaction study (GWAIS), testing each SNP's main-effect plus its interaction with coffee, adjusting for sex, age, and two principal components. We then stratified subjects as heavy or light coffee-drinkers and performed genome-wide association study (GWAS) in each group. We replicated the most significant SNP. Finally, we imputed the NGRC dataset, increasing genomic coverage to examine the region of interest in detail. The primary analyses (GWAIS, GWAS, Replication) were performed using genotyped data. In GWAIS, the most significant signal came from rs4998386 and the neighboring SNPs in GRIN2A. GRIN2A encodes an NMDA-glutamate-receptor subunit and regulates excitatory neurotransmission in the brain. Achieving P(2df) = 10^-6, GRIN2A surpassed all known PD susceptibility genes in significance in the GWAIS. In stratified GWAS, the GRIN2A signal was present in heavy coffee-drinkers (OR = 0.43; P = 6×10^-7) but not in light coffee-drinkers. The a priori Replication hypothesis that "Among heavy coffee-drinkers, rs4998386_T carriers have lower PD risk than rs4998386_CC carriers" was confirmed: OR(Replication) = 0.59, P(Replication) = 10^-3; OR(Pooled) = 0.51, P(Pooled) = 7×10^-8. Compared to light coffee-drinkers with rs4998386_CC genotype, heavy coffee-drinkers with rs4998386_CC genotype had 18% lower risk (P = 3×10^-3), whereas heavy coffee-drinkers with rs4998386_TC genotype had 59% lower risk (P = 6×10^-13). Imputation revealed a block of SNPs that achieved P(2df)... coffee-drinkers. This study is proof of concept that inclusion of environmental factors can help identify genes that
Titus, Tom A.; Yan, Yi-Lin; Wilson, Catherine; Starks, Amber M.; Frohnmayer, Jonathan D.; Canestro, Cristian; Rodriguez-Mari, Adriana; He, Xinjun; Postlethwait, John H.
Fanconi anemia (FA) is a genetic disease resulting in bone marrow failure, high cancer risk, infertility, and developmental anomalies including microphthalmia, microcephaly, and hypoplastic radius and thumb. Here we present cDNA sequences, genetic mapping, and genomic analyses for the four previously undescribed zebrafish FA genes (fanci, fancj, fancm, and fancn), and show that they reverted to single copy after the teleost genome duplication. We tested the hypothesis that FA genes are expresse...
Background Rice is one of the most important food crops in the world. With increasing world demand for food crops, there is an urgent need to develop new cultivars that have enhanced performance with regard to yield, disease resistance, and so on. Wild rice is expected to provide useful genetic resources that could improve the present cultivated species. However, the quantity and quality of these unexplored resources remain unclear. Recent accumulation of the genomic information of both cultivated and wild rice species allows for their comparison at the molecular level. Here, we compared the genome sequence of Oryza sativa ssp. japonica with sets of bacterial artificial chromosome end sequences (BESs) from two wild rice species, O. rufipogon and O. nivara, and an African rice species, O. glaberrima. Results We found that about four to five percent of the BESs of the two wild rice species and about seven percent of the African rice could not be mapped to the japonica genome, suggesting that a substantial number of genes have been lost in the japonica rice lineage; however, their close relatives still possess their counterpart genes. We estimated that during evolution, O. sativa has lost at least one thousand genes that are still preserved in the genomes of the other species. In addition, our BLASTX searches against the non-redundant protein sequence database showed that disease resistance-related proteins were significantly overrepresented in the close relative-specific genomic portions. In total, 235 unmapped BESs of the three relatives matched 83 non-redundant proteins that contained a disease resistance protein domain, most of which corresponded to an NBS-LRR domain. Conclusion We found that the O. sativa lineage appears to have recently experienced massive gene losses following divergence from its wild ancestor. Our results imply that the domestication process accelerated large-scale genomic deletions in the lineage of Asian
In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain, and two different Non-TIR groups in which most of the proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found that most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.
Trevisan, Marta; Palù, Giorgio; Barzon, Luisa
Genome editing by programmable nucleases represents a promising tool that could be exploited to develop new therapeutic strategies to fight infectious diseases. These nucleases, such as zinc-finger nucleases, transcription activator-like effector nucleases, clustered regularly interspaced short palindromic repeat (CRISPR)-CRISPR-associated protein 9 (Cas9) and homing endonucleases, are molecular scissors that can be targeted at predetermined loci in order to modify the genome sequence of an organism. Areas covered: By perturbing genomic DNA at predetermined loci, programmable nucleases can be used as antiviral and antimicrobial treatments. This approach includes targeting of essential viral genes or of viral sequences that, once mutated, inhibit viral replication; repurposing of the CRISPR-Cas9 system for lethal self-targeting of bacteria; targeting antibiotic-resistance and virulence genes in bacteria, fungi, and parasites; and engineering arthropod vectors to prevent vector-borne infections. Expert commentary: While progress has been made in demonstrating the feasibility of using genome editing as an antimicrobial strategy, there are still many hurdles to overcome, such as the risk of off-target mutations, the emergence of escape mutants, and the inefficiency of delivery methods, before translating results from preclinical studies into clinical applications.
Tong, Pin; Monahan, Jack; Prendergast, James G D
Large-scale gene expression datasets are providing an increasing understanding of the location of cis-eQTLs in the human genome and their role in disease. However, little is currently known regarding the extent of regulatory site-sharing between genes. This is despite it having potentially wide-ranging implications, from the determination of the way in which genetic variants may shape multiple phenotypes to the understanding of the evolution of human gene order. By first identifying the location of non-redundant cis-eQTLs, we show that regulatory site-sharing is a relatively common phenomenon in the human genome, with over 10% of non-redundant regulatory variants linked to the expression of multiple nearby genes. We show that these shared, local regulatory sites are linked to high levels of chromatin looping between the regulatory sites and their associated genes. In addition, these co-regulated gene modules are found to be strongly conserved across mammalian species, suggesting that shared regulatory sites have played an important role in shaping human gene order. The association of these shared cis-eQTLs with multiple genes means they also appear to be unusually important in understanding the genetics of human phenotypes and pleiotropy, with shared regulatory sites more often linked to multiple human phenotypes than other regulatory variants. This study shows that regulatory site-sharing is likely an underappreciated aspect of gene regulation and has important implications for the understanding of various biological phenomena, including how the two and three dimensional structures of the genome have been shaped and the potential causes of disease pleiotropy outside coding regions.
Shaw, Chris D.
This paper introduces an analysis-based zoomable visualization technique for displaying the location of genes across many related species of microbes. The purpose of this visualization is to enable a biologist to examine the layout of genes in the organism of interest with respect to the gene organization of related organisms. During the genomic annotation process, the ability to observe gene organization in common with previously annotated genomes can help a biologist better confirm the structure and function of newly analyzed microbe DNA sequences. We have developed a visualization and analysis tool that enables the biologist to observe and examine gene organization among genomes, in the context of the primary sequence of interest. This paper describes the visualization and analysis steps, and presents a case study using a number of Rickettsia genomes.
Streptococcus agalactiae strain 138P was isolated from the kidney of diseased Nile tilapia in Idaho during a 2007 streptococcal disease outbreak. The full genome of S. agalactiae 138P is 1,838,716 bp. The availability of this genome will allow comparative genomics to identify genes for antigen disco...
König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario
As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Have, Christian Theil; Mørk, Søren
We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...... and are evaluated by the effect on prediction performance. Since bacterial gene finding is to a large extent a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...
Rakitina, Daria V; Manolov, Alexander I; Kanygina, Alexandra V; Garushyants, Sofya K; Baikova, Julia P; Alexeev, Dmitry G; Ladygina, Valentina G; Kostryukova, Elena S; Larin, Andrei K; Semashko, Tatiana A; Karpova, Irina Y; Babenko, Vladislav V; Ismagilova, Ruzilya K; Malanin, Sergei Y; Gelfand, Mikhail S; Ilina, Elena N; Gorodnichev, Roman B; Lisitsyna, Eugenia S; Aleshkin, Gennady I; Scherbakov, Petr L; Khalif, Igor L; Shapina, Marina V; Maev, Igor V; Andreev, Dmitry N; Govorun, Vadim M
Escherichia coli (E. coli) has been increasingly implicated in the pathogenesis of Crohn's disease (CD). The phylogeny of E. coli isolated from Crohn's disease patients (CDEC) was controversial, and while genotyping results suggested heterogeneity, the sequenced strains of E. coli from CD patients were closely related. We performed the shotgun genome sequencing of 28 E. coli isolates from ten CD patients and compared genomes from these isolates with already published genomes of CD strains and other pathogenic and non-pathogenic strains. CDEC was shown to belong to A, B1, B2 and D phylogenetic groups. The plasmid and several operons from the reference CD-associated E. coli strain LF82 were demonstrated to be more often present in CDEC genomes belonging to different phylogenetic groups than in genomes of commensal strains. The operons include carbon-source induced invasion GimA island, prophage I, iron uptake operons I and II, capsular assembly pathogenetic island IV and propanediol and galactitol utilization operons. Our findings suggest that CDEC are phylogenetically diverse. However, some strains isolated from independent sources possess highly similar chromosome or plasmids. Though no CD-specific genes or functional domains were present in all CD-associated strains, some genes and operons are more often found in the genomes of CDEC than in commensal E. coli. They are principally linked to gut colonization and utilization of propanediol and other sugar alcohols.
Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles
A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society
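A toy simulation can make the duplication-and-divergence idea above concrete. The sketch below is not the authors' model: the function name, the single divergence probability `p_diverge` standing in for accumulated point mutations, and all parameter values are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def simulate_family_sizes(n_precursors, n_final, p_diverge, seed=0):
    """Grow a genome by random gene duplication and return a histogram
    mapping family size -> number of families of that size."""
    rng = random.Random(seed)
    family_of = list(range(n_precursors))  # gene index -> family id
    next_family = n_precursors
    while len(family_of) < n_final:
        parent = rng.randrange(len(family_of))  # pick a random gene to duplicate
        if rng.random() < p_diverge:
            # Assumption: the copy has mutated enough to found a new family.
            family_of.append(next_family)
            next_family += 1
        else:
            # Otherwise the copy stays in its parent's family.
            family_of.append(family_of[parent])
    sizes = Counter(family_of)      # family id -> number of member genes
    return Counter(sizes.values())  # family size -> number of families


histogram = simulate_family_sizes(n_precursors=50, n_final=500, p_diverge=0.1)
for size in sorted(histogram):
    print(size, histogram[size])
```

Because larger families are more likely to be picked for duplication, a process of this kind tends to produce many small families and a few large ones, which is the qualitative shape a model of this type would be fitted against.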
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.
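The two assumptions in the abstract above suggest a simple scoring scheme, sketched here in Python. This is a hypothetical illustration, not the published algorithm: the function name, the `weight` parameter, the way the two signals are combined, and the gene names are all assumptions.

```python
def prioritize(candidates, de_score, interactions, weight=0.5):
    """Rank candidate genes by their own differential-expression score plus
    a strength-weighted average of their interaction partners' scores."""
    # Build an undirected adjacency list from (gene_a, gene_b, strength) triples.
    neighbors = {}
    for a, b, strength in interactions:
        neighbors.setdefault(a, []).append((b, strength))
        neighbors.setdefault(b, []).append((a, strength))

    scores = {}
    for gene in candidates:
        own = de_score.get(gene, 0.0)
        links = neighbors.get(gene, [])
        total = sum(s for _, s in links)
        # Network term: 0.0 when the gene has no recorded interactions.
        network = sum(s * de_score.get(n, 0.0) for n, s in links) / total if total else 0.0
        scores[gene] = own + weight * network
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Hypothetical genes: K1 stands in for a known disease gene with a high score.
de = {"G1": 0.2, "G2": 0.9, "G3": 0.1, "K1": 1.0}
ppi = [("G3", "K1", 1.0)]
print(prioritize(["G1", "G2", "G3"], de, ppi, weight=1.0))  # ['G3', 'G2', 'G1']
```

In the example, G3 has weak expression evidence of its own but rises to the top because it interacts with a high-scoring gene, which is the network-proximity intuition the abstract describes.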
Agung, Muhammad Budi; Budiarsa, I. Made; Suwastika, I. Nengah
Cocoa beans are one of the main commodities that Indonesia supplies to the world, but yields are still degraded by pathogen and disease attack. Developing robust cacao plants that are genetically resistant to pathogen and disease attack is an ideal solution to this problem. The aim of this study was to identify, through in silico analysis, Theobroma cacao genes in the cacao genome database that are homologous to genes that respond to pathogen and disease attack in other plants. The basic information survey and gene identification were performed in GenBank and The Arabidopsis Information Resource database. The in silico analysis comprised protein BLAST, a homology test of each gene's protein candidates, and identification of homologous genes in the Cacao Genome Database using the "Theobroma cacao cv. Matina 1-6 v1.1" genome as the data source. The analysis identified Thecc1EG011959t1 (EDS1), Thecc1EG006803t1 (EDS5), Thecc1EG013842t1 (ICS1), and Thecc1EG015614t1 (BG_PPAP) in the Cacao Genome Database as Theobroma cacao genes homologous to plant resistance genes; these genes are highly likely to have functions similar to those of their homologs.
Purpose: An increasing number of Mendelian disorders have been identified for which two or more variants in one or more genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of variants underlying oligogenic diseases in individual whole exome or whole genome sequences. Methods: Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical research can provide useful information and improve variant prioritization for Mendelian diseases. Additionally, background knowledge about interactions between genes can be utilized to guide and restrict the selection of candidate disease modules. Results: We developed OligoPVP, an algorithm that can be used to identify variants in oligogenic diseases and their interactions, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods. Conclusions: Our results show that OligoPVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.
Castro-Santos, Patricia; Díaz-Peña, Roberto
Most rheumatic diseases are complex or multifactorial entities with pathogeneses that interact with both multiple genetic factors and a high number of diverse environmental factors. Knowledge of the human genome sequence and its diversity among populations has provided a crucial step forward in our understanding of genetic diseases, identifying many genetic loci or genes associated with diverse phenotypes. In general, susceptibility to autoimmunity is associated with multiple risk factors, but the mechanism of the environmental component influence is poorly understood. Studies in twins have demonstrated that genetics do not explain the totality of the pathogenesis of rheumatic diseases. One method of modulating gene expression through environmental effects is via epigenetic modifications. These techniques open a new field for identifying useful new biomarkers and therapeutic targets. In this context, the development of "-omics" techniques is an opportunity to progress in our knowledge of complex diseases, impacting the discovery of new potential biomarkers suitable for their introduction into clinical practice. In this review, we focus on the recent advances in the fields of genomics and epigenomics in rheumatic diseases and their potential to be useful for the diagnosis, follow-up, and treatment of these diseases. The ultimate aim of genomic studies in any human disease is to understand its pathogenesis, thereby enabling the prediction of the evolution of the disease to establish new treatments and address the development of personalized therapies.
Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl
Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
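The idea of scoring co-occurrences both within and between sentences can be illustrated with a minimal sketch. This is not the scoring scheme of the DISEASES resource: the weights `W_SENTENCE` and `W_ABSTRACT`, the function name `cooccurrence_scores`, and the input layout are all assumptions made for illustration, and the dictionary-based tagging step is taken as already done.

```python
from collections import defaultdict
from itertools import product

# Hypothetical weights (assumptions, not values from the paper): a pair
# mentioned in the same sentence counts more than a pair that only shares
# an abstract.
W_SENTENCE = 1.0
W_ABSTRACT = 0.25


def cooccurrence_scores(abstracts):
    """Score gene-disease pairs by weighted co-occurrence.

    abstracts: list of abstracts; each abstract is a list of sentences;
    each sentence is a (genes, diseases) pair of sets of already-tagged names.
    Returns {(gene, disease): score}.
    """
    scores = defaultdict(float)
    for sentences in abstracts:
        # Pairs that appear together in at least one sentence of this abstract.
        same_sentence = set()
        for genes, diseases in sentences:
            same_sentence.update(product(genes, diseases))
        # Every gene and disease mentioned anywhere in this abstract.
        all_genes = set().union(*(g for g, _ in sentences))
        all_diseases = set().union(*(d for _, d in sentences))
        for pair in product(all_genes, all_diseases):
            scores[pair] += W_SENTENCE if pair in same_sentence else W_ABSTRACT
    return dict(scores)


# Toy input: two abstracts with pre-tagged entities.
abstracts = [
    [({"GRIN2A"}, {"Parkinson's disease"}), ({"SPON1"}, set())],
    [({"SPON1"}, set()), (set(), {"Alzheimer's disease"})],
]
scores = cooccurrence_scores(abstracts)
```

In this toy input, the pair tagged in the same sentence receives the full weight, while pairs that only share an abstract receive the smaller one. A real system would also normalize these counts against how often each entity is mentioned overall.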
Mannini, Linda; Menga, Stefania; Musio, Antonio
Cohesin is responsible for sister chromatid cohesion, ensuring the correct chromosome segregation. Beyond this role, cohesin and regulatory cohesin genes seem to play a role in preserving genome stability and gene transcription regulation. DNA damage is thought to be a major culprit for many human diseases, including cancer. Our present knowledge of the molecular basis underlying genome instability is extremely limited. Mutations in cohesin genes cause human diseases such as Cornelia de Lange syndrome and Roberts syndrome/SC phocomelia, and all the cell lines derived from affected patients show genome instability. Cohesin mutations have also been identified in colorectal cancer. Here, we will discuss the human disorders caused by alterations of cohesin function, with emphasis on the emerging role of cohesin as a genome stability caretaker.
Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G
The nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of disease-resistant candidate genes, which encode the NBS sequence, was filtered in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties. The eighth chromosome contained the largest number of genes and the ninth chromosome contained the lowest number of genes. Exactly 34 gene clusters containing the 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters, which had 151 genes, with an average of 5.4 genes in each cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database, whereas it contained 16 genes in the 'Zhang gu' database. The classification results showed that the CC-NBS-LRR gene made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.
Doerr, Daniel; Kowada, Luis Antonio B; Araujo, Eloi; Deshpande, Shachi; Dantas, Simone; Moret, Bernard M E; Stoye, Jens
Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful (but also most complex) models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases the computational cost is the same as in the general family-free case, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that differ in expressive power. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, demonstrating the applicability and performance of our newly defined similarity measures.
Jahanshad, Neda; Rajagopalan, Priya; Hua, Xue; Hibar, Derrek P.; Nir, Talia M.; Toga, Arthur W.; Jack, Clifford R.; Saykin, Andrew J.; Green, Robert C.; Weiner, Michael W.; Medland, Sarah E.; Montgomery, Grant W.; Hansell, Narelle K.; McMahon, Katie L.; de Zubicaray, Greig I.; Martin, Nicholas G.; Wright, Margaret J.; Thompson, Paul M.; Weiner, Michael; Aisen, Paul; Weiner, Michael; Aisen, Paul; Petersen, Ronald; Jack, Clifford R.; Jagust, William; Trojanowski, John Q.; Toga, Arthur W.; Beckett, Laurel; Green, Robert C.; Saykin, Andrew J.; Morris, John; Liu, Enchi; Green, Robert C.; Montine, Tom; Petersen, Ronald; Aisen, Paul; Gamst, Anthony; Thomas, Ronald G.; Donohue, Michael; Walter, Sarah; Gessert, Devon; Sather, Tamie; Beckett, Laurel; Harvey, Danielle; Gamst, Anthony; Donohue, Michael; Kornak, John; Jack, Clifford R.; Dale, Anders; Bernstein, Matthew; Felmlee, Joel; Fox, Nick; Thompson, Paul; Schuff, Norbert; Alexander, Gene; DeCarli, Charles; Jagust, William; Bandy, Dan; Koeppe, Robert A.; Foster, Norm; Reiman, Eric M.; Chen, Kewei; Mathis, Chet; Morris, John; Cairns, Nigel J.; Taylor-Reinwald, Lisa; Trojanowki, J.Q.; Shaw, Les; Lee, Virginia M.Y.; Korecka, Magdalena; Toga, Arthur W.; Crawford, Karen; Neu, Scott; Saykin, Andrew J.; Foroud, Tatiana M.; Potkin, Steven; Shen, Li; Khachaturian, Zaven; Frank, Richard; Snyder, Peter J.; Molchan, Susan; Kaye, Jeffrey; Quinn, Joseph; Lind, Betty; Dolen, Sara; Schneider, Lon S.; Pawluczyk, Sonia; Spann, Bryan M.; Brewer, James; Vanderswag, Helen; Heidebrink, Judith L.; Lord, Joanne L.; Petersen, Ronald; Johnson, Kris; Doody, Rachelle S.; Villanueva-Meyer, Javier; Chowdhury, Munir; Stern, Yaakov; Honig, Lawrence S.; Bell, Karen L.; Morris, John C.; Ances, Beau; Carroll, Maria; Leon, Sue; Mintun, Mark A.; Schneider, Stacy; Marson, Daniel; Griffith, Randall; Clark, David; Grossman, Hillel; Mitsis, Effie; Romirowsky, Aliza; deToledo-Morrell, Leyla; Shah, Raj C.; Duara, Ranjan; Varon, Daniel; Roberts, Peggy; Albert, 
Marilyn; Onyike, Chiadi; Kielb, Stephanie; Rusinek, Henry; de Leon, Mony J.; Glodzik, Lidia; De Santi, Susan; Doraiswamy, P. Murali; Petrella, Jeffrey R.; Coleman, R. Edward; Arnold, Steven E.; Karlawish, Jason H.; Wolk, David; Smith, Charles D.; Jicha, Greg; Hardy, Peter; Lopez, Oscar L.; Oakley, MaryAnn; Simpson, Donna M.; Porsteinsson, Anton P.; Goldstein, Bonnie S.; Martin, Kim; Makino, Kelly M.; Ismail, M. Saleem; Brand, Connie; Mulnard, Ruth A.; Thai, Gaby; Mc-Adams-Ortiz, Catherine; Womack, Kyle; Mathews, Dana; Quiceno, Mary; Diaz-Arrastia, Ramon; King, Richard; Weiner, Myron; Martin-Cook, Kristen; DeVous, Michael; Levey, Allan I.; Lah, James J.; Cellar, Janet S.; Burns, Jeffrey M.; Anderson, Heather S.; Swerdlow, Russell H.; Apostolova, Liana; Lu, Po H.; Bartzokis, George; Silverman, Daniel H.S.; Graff-Radford, Neill R.; Parfitt, Francine; Johnson, Heather; Farlow, Martin R.; Hake, Ann Marie; Matthews, Brandy R.; Herring, Scott; van Dyck, Christopher H.; Carson, Richard E.; MacAvoy, Martha G.; Chertkow, Howard; Bergman, Howard; Hosein, Chris; Black, Sandra; Stefanovic, Bojana; Caldwell, Curtis; Hsiung, Ging-Yuek Robin; Feldman, Howard; Mudge, Benita; Assaly, Michele; Kertesz, Andrew; Rogers, John; Trost, Dick; Bernick, Charles; Munic, Donna; Kerwin, Diana; Mesulam, Marek-Marsel; Lipowski, Kristina; Wu, Chuang-Kuo; Johnson, Nancy; Sadowsky, Carl; Martinez, Walter; Villena, Teresa; Turner, Raymond Scott; Johnson, Kathleen; Reynolds, Brigid; Sperling, Reisa A.; Johnson, Keith A.; Marshall, Gad; Frey, Meghan; Yesavage, Jerome; Taylor, Joy L.; Lane, Barton; Rosen, Allyson; Tinklenberg, Jared; Sabbagh, Marwan; Belden, Christine; Jacobson, Sandra; Kowall, Neil; Killiany, Ronald; Budson, Andrew E.; Norbash, Alexander; Johnson, Patricia Lynn; Obisesan, Thomas O.; Wolday, Saba; Bwayo, Salome K.; Lerner, Alan; Hudson, Leon; Ogrocki, Paula; Fletcher, Evan; Carmichael, Owen; Olichney, John; DeCarli, Charles; Kittur, Smita; Borrie, Michael; Lee, T.-Y.; Bartha, Rob; 
Johnson, Sterling; Asthana, Sanjay; Carlsson, Cynthia M.; Potkin, Steven G.; Preda, Adrian; Nguyen, Dana; Tariot, Pierre; Fleisher, Adam; Reeder, Stephanie; Bates, Vernice; Capote, Horacio; Rainka, Michelle; Scharre, Douglas W.; Kataki, Maria; Zimmerman, Earl A.; Celmins, Dzintra; Brown, Alice D.; Pearlson, Godfrey D.; Blank, Karen; Anderson, Karen; Saykin, Andrew J.; Santulli, Robert B.; Schwartz, Eben S.; Sink, Kaycee M.; Williamson, Jeff D.; Garg, Pradeep; Watkins, Franklin; Ott, Brian R.; Querfurth, Henry; Tremont, Geoffrey; Salloway, Stephen; Malloy, Paul; Correia, Stephen; Rosen, Howard J.; Miller, Bruce L.; Mintzer, Jacobo; Longmire, Crystal Flynn; Spicer, Kenneth; Finger, Elizabeth; Rachinsky, Irina; Rogers, John; Kertesz, Andrew; Drost, Dick
Aberrant connectivity is implicated in many neurological and psychiatric disorders, including Alzheimer’s disease and schizophrenia. However, other than a few disease-associated candidate genes, we know little about the degree to which genetics play a role in the brain networks; we know even less about specific genes that influence brain connections. Twin and family-based studies can generate estimates of overall genetic influences on a trait, but genome-wide association scans (GWASs) can screen the genome for specific variants influencing the brain or risk for disease. To identify the heritability of various brain connections, we scanned healthy young adult twins with high-field, high-angular resolution diffusion MRI. We adapted GWASs to screen the brain’s connectivity pattern, allowing us to discover genetic variants that affect the human brain’s wiring. The association of connectivity with the SPON1 variant at rs2618516 on chromosome 11 (11p15.2) reached connectome-wide, genome-wide significance after stringent statistical corrections were enforced, and it was replicated in an independent subsample. rs2618516 was shown to affect brain structure in an elderly population with varying degrees of dementia. Older people who carried the connectivity variant had significantly milder clinical dementia scores and lower risk of Alzheimer’s disease. As a posthoc analysis, we conducted GWASs on several organizational and topological network measures derived from the matrices to discover variants in and around genes associated with autism (MACROD2), development (NEDD4), and mental retardation (UBE2A) significantly associated with connectivity. Connectome-wide, genome-wide screening offers substantial promise to discover genes affecting brain connectivity and risk for brain diseases. PMID:23471985
Taye H Hamza
Our aim was to identify genes that influence the inverse association of coffee with the risk of developing Parkinson's disease (PD). We used genome-wide genotype data and lifetime caffeinated-coffee-consumption data on 1,458 persons with PD and 931 without PD from the NeuroGenetics Research Consortium (NGRC), and we performed a genome-wide association and interaction study (GWAIS), testing each SNP's main-effect plus its interaction with coffee, adjusting for sex, age, and two principal components. We then stratified subjects as heavy or light coffee-drinkers and performed genome-wide association study (GWAS) in each group. We replicated the most significant SNP. Finally, we imputed the NGRC dataset, increasing genomic coverage to examine the region of interest in detail. The primary analyses (GWAIS, GWAS, Replication) were performed using genotyped data. In GWAIS, the most significant signal came from rs4998386 and the neighboring SNPs in GRIN2A. GRIN2A encodes an NMDA-glutamate-receptor subunit and regulates excitatory neurotransmission in the brain. Achieving P(2df) = 10^-6, GRIN2A surpassed all known PD susceptibility genes in significance in the GWAIS. In stratified GWAS, the GRIN2A signal was present in heavy coffee-drinkers (OR = 0.43; P = 6×10^-7) but not in light coffee-drinkers. The a priori Replication hypothesis that "Among heavy coffee-drinkers, rs4998386_T carriers have lower PD risk than rs4998386_CC carriers" was confirmed: OR(Replication) = 0.59, P(Replication) = 10^-3; OR(Pooled) = 0.51, P(Pooled) = 7×10^-8. Compared to light coffee-drinkers with rs4998386_CC genotype, heavy coffee-drinkers with rs4998386_CC genotype had 18% lower risk (P = 3×10^-3), whereas heavy coffee-drinkers with rs4998386_TC genotype had 59% lower risk (P = 6×10^-13). Imputation revealed a block of SNPs that achieved P(2df) < 5×10^-8 in GWAIS, and OR = 0.41, P = 3×10^-8 in heavy coffee-drinkers. This study is proof of
Frech, Christian; Chen, Nansheng
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221
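The comparative strategy described above, choosing clustering parameters by how well they recover curated families in a reference species, can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: `threshold_clusters` is a toy stand-in for a real clustering program such as TRIBE-MCL, the recovery measure (mean best Jaccard overlap) is a choice made for this example, and all gene names and similarity values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def family_recovery(clusters, curated_families):
    """Mean, over curated families, of the best overlap with any cluster."""
    total = 0.0
    for family in curated_families:
        total += max((jaccard(family, c) for c in clusters), default=0.0)
    return total / len(curated_families)


def pick_parameter(candidates, cluster_fn, curated_families):
    """Return the candidate whose clustering best recovers the curated families."""
    return max(candidates,
               key=lambda p: family_recovery(cluster_fn(p), curated_families))


def threshold_clusters(similarity, genes, threshold):
    """Toy clustering (assumption): connected components of the graph that
    links two genes whenever their similarity is at least `threshold`."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (a, b), s in similarity.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())


# Hypothetical reference species with two curated families.
genes = ["a1", "a2", "a3", "b1", "b2"]
similarity = {("a1", "a2"): 0.9, ("a2", "a3"): 0.6,
              ("b1", "b2"): 0.8, ("a3", "b1"): 0.3}
curated = [{"a1", "a2", "a3"}, {"b1", "b2"}]

best = pick_parameter([0.2, 0.5, 0.85],
                      lambda t: threshold_clusters(similarity, genes, t),
                      curated)
```

Here the loosest threshold merges both families into one cluster and the strictest one splits a family, so the middle value is selected. The selected parameter would then be reused when clustering genes of a related, uncurated genome.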
Susanta K Behura
Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that (1) are components of developmental signaling pathways, (2) regulate fundamental developmental processes, (3) are critical for the development of tissues of vector importance, (4) function in developmental processes known to have diverged within insects, and (5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.
Sorek, Rotem; Rubin, Edward M.
We describe a method for mining microbial genomes to discover antimicrobial genes and proteins with a broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.
Tabara, Yasuharu; Kohara, Katsuhiko; Miki, Tetsuro
The Millennium Genome Project for Hypertension was started in 2000 to identify genetic variants conferring susceptibility to hypertension, with the aim of furthering the understanding of the pathogenesis of this condition and realizing genome-based personalized medical care. Two different approaches were launched, genome-wide association analysis using single-nucleotide polymorphisms (SNPs) and microsatellite markers, and systematic candidate gene analysis, under the hypothesis that common variants have an important role in the etiology of common diseases. These multilateral approaches identified ATP2B1 as a gene responsible for hypertension in not only Japanese but also Caucasians. The high blood pressure susceptibility conferred by certain alleles of ATP2B1 has been widely replicated in various populations. Ex vivo mRNA expression analysis in umbilical artery smooth muscle cells indicated that reduced expression of this gene associated with the risk allele may be an underlying mechanism relating the ATP2B1 variant to hypertension. However, the effect size of a SNP was too small to clarify the entire picture of the genetic basis of hypertension. Further, dense genome analysis with accurate phenotype data may be required.
Stata, Matt; Wang, Wei; White, Merlin M.; Moncalvo, Jean-Marc
Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target the digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. PMID:29764946
Ferrin Thomas E
Background: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results: In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion: The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.
Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi
Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare them with analogues in Arabidopsis thaliana based on a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after its split from A. thaliana. Here we present genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Through the employment of HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated the differential expression pattern of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among 3 species suggested that orthologous genes in B. rapa have undergone stronger negative selection than those in B. oleracea. For the TNL type, however, there are no significant differences in the orthologous gene pairs between the two species. This study is the first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome
Rossin, Elizabeth J.; Hansen, Kasper Lage; Raychaudhuri, Soumya
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these r... in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more... that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non...
Tandemly arrayed genes (TAGs) are duplicated genes that are linked as neighbors on a chromosome, many of which have important physiological and biochemical functions. Here we performed a survey of these genes in 11 available vertebrate genomes. TAGs account for an average of about 14% of all genes in these vertebrate genomes, and about 25% of all duplications. The majority of TAGs (72–94%) have parallel transcription orientation (i.e., they are encoded on the same strand), in contrast to the genome, which has about 50% of its genes in parallel transcription orientation. The majority of tandem arrays have only two members. In all species, the proportion of genes that belong to TAGs tends to be higher in large gene families than in small ones; together with our recent finding that tandem duplication played a more important role than retroposition in large families, this fact suggests that among all types of duplication mechanisms, tandem duplication is the predominant mechanism of duplication, especially in large families. Finally, several species have a higher proportion of large tandem arrays that are species-specific than random expectation.
Carlice-Dos-Reis, Tânia; Viana, Jaime; Moreira, Fabiano Cordeiro; Cardoso, Greice de Lemos; Guerreiro, João; Santos, Sidney; Ribeiro-Dos-Santos, Ândrea
Mutations in the HBB gene are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Sickle cell anemia is one of the most common monogenic diseases worldwide. Due to its prevalence, diverse strategies have been developed for a better understanding of its molecular mechanisms. In silico analysis has been increasingly used to investigate the genotype-phenotype relationship of many diseases, and the sequences of healthy individuals deposited in the 1,000 Genomes database appear to be an excellent tool for such analysis. The objective of this study is to analyze the variations in the HBB gene in the 1,000 Genomes database, to describe the mutation frequencies in the different population groups, and to investigate the pattern of pathogenicity. The computational tool SNPEFF was used to align the data from 2,504 samples of the 1,000 Genomes database with the HG19 genome reference. The pathogenicity of each amino acid change was investigated using the databases CLINVAR, dbSNP and HbVar and five different predictors. Twenty different mutations were found in 209 healthy individuals. The African group had the highest number of individuals with mutations, and the European group had the lowest number. Thus, it is concluded that approximately 8.3% of phenotypically healthy individuals from the 1,000 Genomes database have some mutation in the HBB gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002. In total, 193 subjects had a non-synonymous mutation, of whom 186 (7.4%) had a deleterious mutation. Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation.
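The figures in this abstract are reproduced by a simple Hardy-Weinberg style calculation, sketched below. The abstract does not state its formula, so the model here is an assumption: each carrier is taken to be heterozygous for one variant allele, mating is taken to be random, and the function names are invented for illustration. Only the counts (2,504 samples, 209 carriers, 186 carriers of a deleterious variant) come from the text.

```python
N_SAMPLES = 2504        # individuals analysed (from the abstract)
N_CARRIERS = 209        # individuals with any HBB variant (from the abstract)
N_DELETERIOUS = 186     # individuals with a deleterious variant (from the abstract)


def carrier_fraction(carriers, samples):
    """Share of individuals carrying at least one variant."""
    return carriers / samples


def allele_frequency(carriers, samples):
    """Variant allele frequency q.

    Assumption: every carrier is heterozygous, so each contributes one
    variant allele out of the 2 * samples chromosomes examined.
    """
    return carriers / (2 * samples)


def biallelic_frequency(q):
    """Expected share of offspring with two variant alleles (homozygous or
    compound heterozygous), assuming random mating: q squared."""
    return q ** 2


q_all = allele_frequency(N_CARRIERS, N_SAMPLES)
q_del = allele_frequency(N_DELETERIOUS, N_SAMPLES)

pct_carriers = 100 * carrier_fraction(N_CARRIERS, N_SAMPLES)
pct_deleterious = 100 * carrier_fraction(N_DELETERIOUS, N_SAMPLES)
affected_per_10000 = 10_000 * biallelic_frequency(q_del)

print(f"carriers: {pct_carriers:.1f}%")
print(f"variant allele frequency: {q_all:.3f}")
print(f"expected biallelic frequency: {biallelic_frequency(q_all):.3f}")
print(f"deleterious carriers: {pct_deleterious:.1f}%")
print(f"expected affected per 10,000: {affected_per_10000:.0f}")
```

Under these assumptions the rounded results match the reported 8.3%, 0.042, 0.002, 7.4%, and fourteen per 10,000, which suggests, but does not confirm, that the authors used this model.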
A study was conducted to detect the presence of disease resistance genes to infection of wheat powdery mildew (Blumeria graminis f. sp. tritici) in selected wheat cultivars from China using molecular markers. Genomic DNA of sixty cultivars was extracted and tested for the presence of selected prominent resistance genes to ...
van Veelen Peter A
Background: Bread wheat (Triticum aestivum) is an important staple food. However, wheat gluten proteins cause celiac disease (CD) in 0.5 to 1% of the general population. Among these proteins, the α-gliadins contain several peptides that are associated with the disease. Results: We obtained 230 distinct α-gliadin gene sequences from several diploid wheat species representing the ancestral A, B, and D genomes of the hexaploid bread wheat. The large majority of these sequences (87%) contained an internal stop codon. All α-gliadin sequences could be distinguished according to the genome of origin on the basis of sequence similarity, of the average length of the polyglutamine repeats, and of the differences in the presence of four peptides that have been identified as T cell stimulatory epitopes in CD patients through binding to HLA-DQ2/8. By sequence similarity, α-gliadins from the public database of hexaploid T. aestivum could be assigned directly to chromosome 6A, 6B, or 6D. T. monococcum (A genome) sequences, as well as those from chromosome 6A of bread wheat, almost invariably contained epitope glia-α9 and glia-α20, but never the intact epitopes glia-α and glia-α2. A number of sequences from T. speltoides, as well as a number of sequences from chromosome 6B of bread wheat, did not contain any of the four T cell epitopes screened for. The sequences from T. tauschii (D genome), as well as those from chromosome 6D of bread wheat, were found to contain all of these T cell epitopes in variable combinations per gene. The differences in epitope composition resulted mainly from point mutations. These substitutions appeared to be genome specific. Conclusion: Our analysis shows that α-gliadin sequences from the three genomes of bread wheat form distinct groups. The four known T cell stimulatory epitopes are distributed non-randomly across the sequences, indicating that the three genomes contribute differently to epitope content. A systematic
Andersen, Ethan J.; Nepal, Madhav P.
We report data associated with the identification of 242 disease resistance genes (R-genes) in the genome of Setaria italica as presented in “Genetic diversity of disease resistance genes in foxtail millet (Setaria italica L.)” (Andersen and Nepal, 2017). Our data describe the structure and evolution of the Coiled-coil, Nucleotide-binding site, Leucine-rich repeat (CNL) R-genes in foxtail millet. The CNL genes were identified through rigorous extraction and analysis of recently available ...
McKay, M J; Gaballa, M A
Somatic gene therapy of vascular diseases is a promising new field in modern medicine. Recent advancements in gene transfer technology have greatly evolved our understanding of the pathophysiologic role of candidate disease genes. With this knowledge, the expression of selective gene products provides the means to test the therapeutic use of gene therapy in a multitude of medical conditions. In addition, with the completion of genome sequencing programs, gene transfer can be used also to study the biologic function of novel genes in vivo. Novel genes are delivered to targeted tissue via several different vehicles. These vectors include adenoviruses, retroviruses, plasmids, plasmid/liposomes, and oligonucleotides. However, each one of these vectors has inherent limitations. Further investigations into developing delivery systems that not only allow for efficient, targeted gene transfer, but also are stable and nonimmunogenic, will optimize the clinical application of gene therapy in vascular diseases. This review further discusses the available mode of gene delivery and examines six major areas in vascular gene therapy, namely prevention of restenosis, thrombosis, hypertension, atherosclerosis, peripheral vascular disease in congestive heart failure, and ischemia. Although we highlight some of the recent advances in the use of gene therapy in treating vascular disease discovered primarily during the past two years, many excellent studies published during that period are not included in this review due to space limitations. The following is a selective review of practical uses of gene transfer therapy in vascular diseases. This review primarily covers work performed in the last 2 years. For earlier work, the reader may refer to several excellent review articles. For instance, Belalcazer et al. (6) reviewed general aspects of somatic gene therapy and the different vehicles used for the delivery of therapeutic genes. Gene therapy in restenosis and stimulation of
Marek’s disease (MD) is a commercially important neoplastic disease of chickens caused by Marek’s disease virus (MDV), an oncogenic alphaherpesvirus. Selecting for increased genetic resistance to MD is a control strategy that can augment vaccinal control measures. To identify high-confidence candidate MD resistance genes, we conducted a genome-wide screen for allele-specific expression (ASE) amongst F1 progeny of two inbred chicken lines that differ in MD resistance. High-throughput sequencing was used to profile transcriptomes from pools of uninfected and infected individuals at 4 days post-infection to identify any genes showing ASE in response to MDV infection. RNA sequencing identified 22,655 single nucleotide polymorphisms (SNPs), of which 5,360 in 3,773 genes exhibited significant allelic imbalance. Illumina GoldenGate assays were subsequently used to quantify regulatory variation controlled at the gene (cis) and elsewhere in the genome (trans) by examining differences in expression between F1 individuals and artificial F1 RNA pools over 6 time periods in 1,536 of the most significant SNPs identified by RNA sequencing. Allelic imbalance as a result of cis-regulatory changes was confirmed in 861 of the 1,233 GoldenGate assays successfully examined. Furthermore, we have identified 7 genes that display trans-regulation only in infected animals and approximately 500 SNPs that show a complex interaction between cis- and trans-regulatory changes. Our results indicate that ASE analyses are a powerful approach to identify regulatory variation responsible for differences in transcript abundance in genes underlying complex traits. The genes with SNPs exhibiting ASE provide a strong foundation to further investigate the causative polymorphisms and genetic mechanisms for MD resistance. Finally, the methods used here for identifying specific genes and SNPs may have practical implications for applying marker-assisted selection to complex traits that are
Integrated analyses of functional genomics data have enormous potential for identifying phenotype-associated genes. Tissue-specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. Accounting for tissue specificity in global integration of functional genomics data is challenging, as "functionality" and "functional relationships" are often not resolved for specific tissue types. We address this challenge by generating tissue-specific functional networks, which can effectively represent the diversity of protein function for more accurate identification of phenotype-associated genes in the laboratory mouse. Specifically, we created 107 tissue-specific functional relationship networks through integration of genomic data utilizing knowledge of tissue-specific gene expression patterns. Cross-network comparison revealed significantly changed genes enriched for functions related to specific tissue development. We then utilized these tissue-specific networks to predict genes associated with different phenotypes. Our results demonstrate that prediction performance is significantly improved through using the tissue-specific networks as compared to the global functional network. We used a testis-specific functional relationship network to predict genes associated with male fertility and spermatogenesis phenotypes, and experimentally confirmed one top prediction, Mybl1. We then focused on a less-common genetic disease, ataxia, and identified candidates uniquely predicted by the cerebellum network, which are supported by both literature and experimental evidence. Our systems-level, tissue-specific scheme advances over traditional global integration and analyses and establishes a prototype to address the tissue-specific effects of genetic perturbations, diseases and drugs.
Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.
Brucella abortus is the etiological agent of brucellosis, a disease that affects cattle and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10^-5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979
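The expect-value cutoff described in the abstract above can be illustrated with a minimal Python sketch. It assumes each BLAST hit has already been parsed into a (query, subject, E-value) tuple; the record names and the helper `filter_hits` are hypothetical and are not taken from the study, which does not describe its tooling.

```python
def filter_hits(hits, max_evalue=1e-5):
    """Return the best-scoring hit per query among hits below the E-value cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue  # not significant at the chosen threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical (query, subject, E-value) records, for illustration only.
example_hits = [
    ("gss_001", "subject_A", 3e-20),
    ("gss_001", "subject_B", 1e-8),
    ("gss_002", "subject_C", 0.02),
    ("gss_003", "subject_D", 9e-6),
]

matched = filter_hits(example_hits)
# Queries with no significant hit remain candidates for further study.
unmatched = {query for query, _, _ in example_hits} - set(matched)
```

Sequences left in `unmatched` correspond to the class of GSSs that the abstract reports as having no significant database match.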
Orr, J.L.; Back, W.; Gu, J.; Leegwater, P.H.; Govindarajan, P.; Conroy, J.; Ducro, B.J.; Arendonk, van J.A.M.
The recent completion of the horse genome and commercial availability of an equine SNP genotyping array has facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of
Andersen, Jannik N; Jansen, Peter G; Echwald, Søren M
sequence databases, we discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts. Direct orthologs were present in the mouse genome for all 38 human PTP genes. In addition, we identified 12 PTP pseudogenes unique to humans...... that have probably contaminated previous bioinformatics analysis of this gene family. PCR amplification and transcript sequencing indicate that some PTP pseudogenes are expressed, but their function (if any) is unknown. Furthermore, we analyzed the enhanced diversity generated by alternative splicing...
Rigby, R J; Fernando, M M A; Vyse, T J
Defining the polymorphisms that contribute to the development of complex genetic disease traits is a challenging, although increasingly tractable problem. Historically, the technical difficulties in conducting association studies across the entire human genome are such that murine models have been used to generate candidate genes for analysis in human complex diseases, such as SLE. In this article we discuss the advantages and disadvantages of this approach and specifically address some assumptions made in the transition from studying one species to another, using lupus as an example. These issues include differences in genetic structure and genetic organisation which are a reflection on the population history. Clearly there are major differences in the histories of the human population and inbred laboratory strains of mice. Both human and murine genomes do exhibit structure at the genetic level. That is to say, they comprise haplotypes which are genomic regions that carry runs of polymorphisms that are not independently inherited. Haplotypes therefore reduce the number of combinations of the polymorphisms in the DNA in that region and facilitate the identification of disease susceptibility genes in both mice and humans. There are now novel means of generating candidate genes in SLE using mutagenesis (with ENU) in mice and identifying mice that generate antinuclear autoimmunity. In addition, murine models still provide a valuable means of exploring the functional consequences of genetic variation. However, advances in technology are such that human geneticists can now screen large fractions of the human genome for disease associations using microchip technologies that provide information on upwards of 100,000 different polymorphisms. These approaches are aimed at identifying haplotypes that carry disease susceptibility mutations and rely less on the generation of candidate genes.
One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 European unrelated subjects. We applied a method of extracting expression patterns, independent component analysis, to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated with a pattern strongly correlating with blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes. This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the
Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.
Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. PMID:22679494
Bertram, Lars; Tanzi, Rudolph E
Genome-wide association studies (GWAS) have gained considerable momentum over the last couple of years for the identification of novel complex disease genes. In the field of Alzheimer's disease (AD), there are currently eight published and two provisionally reported GWAS, highlighting over two dozen novel potential susceptibility loci beyond the well-established APOE association. On the basis of the data available at the time of this writing, the most compelling novel GWAS signal has been observed in GAB2 (GRB2-associated binding protein 2), followed by less consistently replicated signals in galanin-like peptide (GALP), piggyBac transposable element derived 1 (PGBD1), tyrosine kinase, non-receptor 1 (TNK1). Furthermore, consistent replication has been recently announced for CLU (clusterin, also known as apolipoprotein J). Finally, there are at least three replicated loci in hitherto uncharacterized genomic intervals on chromosomes 14q32.13, 14q31.2 and 6q24.1 likely implicating the existence of novel AD genes in these regions. In this review, we will discuss the characteristics and potential relevance to pathogenesis of the outcomes of all currently available GWAS in AD. A particular emphasis will be laid on findings with independent data in favor of the original association.
Munang'andu, Hetron Mweemba; Galindo-Villegas, Jorge; David, Lior
Genome-wide studies based on conventional molecular tools and upcoming omics technologies are beginning to gain functional applications in the control and prevention of diseases in teleost fish. Herein, we provide insights into current progress and prospects in the use of genomics studies for the control and prevention of fish diseases. Metagenomics has emerged as an important tool for identifying emerging infectious diseases for the timely design of rational disease control strategies, determining microbial compositions in different aquatic environments used for fish farming, and using host microbiota to monitor the health status of fish. Expounding the use of antimicrobial peptides (AMPs) as therapeutic agents against different pathogens as well as elucidating their role in tissue regeneration is another vital aspect of genomics studies that has taken precedence in recent years. In vaccine development, progress includes the identification of highly immunogenic proteins for use in recombinant vaccine designs as well as identifying gene signatures that correlate with protective immunity for use as benchmarks in optimizing vaccine efficacy. Progress in quantitative trait loci (QTL) mapping is beginning to yield considerable success in identifying resistance traits against some of the highly infectious diseases that have previously ravaged the aquaculture industry. Altogether, the synopsis put forth shows that genomics studies are beginning to make a positive contribution to the prevention and control of fish diseases in aquaculture.
Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji
Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.
J. Alberto Romero Navarro
Chocolate is a highly valued and palatable confectionery product. Chocolate is primarily made from the processed seeds of the tree species Theobroma cacao. Cacao cultivation is highly relevant for small-holder farmers throughout the tropics, yet its productivity remains limited by low yields and widespread pathogens. A panel of 148 improved cacao clones was assembled based on productivity and disease resistance, and phenotypic single-tree replicated clonal evaluation was performed for 8 years. Using high-density markers, the diversity of clones was expressed relative to 10 known ancestral cacao populations, and significant effects of ancestry were observed in productivity and disease resistance. Genome-wide association (GWA) was performed, and six markers were significantly associated with frosty pod disease resistance. In addition, genomic selection was performed, and consistent with the observed extensive linkage disequilibrium, high predictive ability was observed at low marker densities for all traits. Finally, quantitative trait locus mapping and differential expression analysis of two cultivars with contrasting disease phenotypes were performed to identify genes underlying frosty pod disease resistance, identifying a significant quantitative trait locus and 35 differentially expressed genes using two independent differential expression analyses. These results indicate that in breeding populations of heterozygous and recently admixed individuals, mapping approaches can be used for low-complexity traits such as pod color in cacao or, in other species, single-gene disease resistance; however, genomic selection for quantitative traits remains highly effective relative to mapping. Our results can help guide the breeding process for sustainable improved cacao productivity.
Thomas W. Jeffries; Jennifer R. Headman Van Vleet
Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...
The human genome hosts several active families of transposable elements (TEs), including the Alu, LINE-1, and SVA retrotransposons that are mobilized via reverse transcription of RNA intermediates. We evaluated how insertion polymorphisms generated by human retrotransposon activity may be related to common health and disease phenotypes that have been previously interrogated through genome-wide association studies (GWAS). To address this question, we performed a genome-wide screen for retrotransposon polymorphism disease associations that are linked to TE-induced gene regulatory changes. Our screen first identified polymorphic retrotransposon insertions found in linkage disequilibrium (LD) with single nucleotide polymorphisms that were previously associated with common complex diseases by GWAS. We further narrowed this set of candidate disease-associated retrotransposon polymorphisms by identifying insertions that are located within tissue-specific enhancer elements. We then performed expression quantitative trait loci analysis on the remaining set of candidates in order to identify polymorphic retrotransposon insertions that are associated with gene expression changes in B-cells of the human immune system. This progressive and stringent screen yielded a list of six retrotransposon insertions as the strongest candidates for TE polymorphisms that lead to disease via enhancer-mediated changes in gene regulation. For example, we found an SVA insertion within a cell-type specific enhancer located in the second intron of the B4GALT1 gene. B4GALT1 encodes a glycosyltransferase that functions in the glycosylation of the Immunoglobulin G (IgG) antibody in such a way as to convert its activity from pro- to anti-inflammatory. The disruption of the B4GALT1 enhancer by the SVA insertion is associated with down-regulation of the gene in B-cells, which would serve to keep the IgG molecule in a pro-inflammatory state. Consistent with this idea, the B4GALT1 enhancer
Parker, Heidi G.; Meurs, Kathryn M.; Ostrander, Elaine A.
Recent advances in canine genomics are changing the landscape of veterinary biology, and by default, veterinary medicine. No longer are clinicians locked into traditional methods of diagnoses and therapy. Rather major advances in canine genetics and genomics from the past five years are now changing the way the veterinarian of the 21st century practices medicine. First, the availability of a dense genome map gives canine genetics a much needed foothold in comparative medicine, allowing advances made in human and mouse genetics to be applied to companion animals. Second, the recently released 7.5x whole genome sequence of the dog is facilitating the identification of hereditary disease genes. Finally, development of genetic tools for rapid screening of families and populations at risk for inherited disease means that the cost of identifying and testing for disease loci will significantly decrease in coming years. Out of these advances will come major changes in companion animal diagnostics and therapy. Clinicians will be able to offer their clients genetic testing and counseling for a myriad of disorders. Such advances are certain to generate healthier and more long lived dogs, improving quality of life for owner and pet alike. The clinician of the 21st century, therefore, faces incredible opportunities as well as challenges in the management of genetic disease. In this review we summarize recent findings in canine genomics and discuss their application to the study of canine cardiac health. PMID:19083345
Coughlan, Simone; Taylor, Ali Shirley; Feane, Eoghan; Sanders, Mandy; Schonian, Gabriele; Cotton, James A; Downing, Tim
The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. (Viannia) braziliensis and L. (V.) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. (V.) naiffi and L. (V.) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia: aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia, there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni, L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3' end of chromosome 34. This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance.
Pritykin, Yuri; Ghersi, Dario; Singh, Mona
Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
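As a rough illustration of the idea in the abstract above, the following Python sketch flags a gene as multifunctional when it carries at least two annotation terms that are not ancestor and descendant of each other in a term hierarchy. The abstract does not state how "distinct" processes are defined, so this ancestor-based rule, the toy hierarchy, and all gene and term names are assumptions made for illustration only.

```python
from itertools import combinations


def ancestors(term, parents):
    """Collect every ancestor of a term by walking the child -> parents map."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_multifunctional(terms, parents):
    """True if at least two terms are unrelated (neither is an ancestor of the other)."""
    for a, b in combinations(sorted(terms), 2):
        if a not in ancestors(b, parents) and b not in ancestors(a, parents):
            return True
    return False


# Hypothetical hierarchy and annotations, for illustration only.
parents = {"mitotic_cell_cycle": ["cell_cycle"], "cell_cycle": [], "dna_repair": []}
annotations = {
    "geneA": {"mitotic_cell_cycle", "cell_cycle"},  # nested terms, one process
    "geneB": {"cell_cycle", "dna_repair"},          # two unrelated processes
    "geneC": {"dna_repair"},
}

multifunctional = sorted(
    gene for gene, terms in annotations.items() if is_multifunctional(terms, parents)
)
```

Requiring unrelated terms keeps a gene annotated to both a process and its parent process from being counted as multifunctional.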
Smithies, O; Maeda, N
Gene targeting allows precise, predetermined changes to be made in a chosen gene in the mouse genome. To date, targeting has been used most often for generation of animals completely lacking the product of a gene of interest. The resulting "knockout" mice have confirmed some hypotheses, have upset others, but have rarely been uninformative. Models of several human genetic diseases have been produced by targeting--including Gaucher disease, cystic fibrosis, and the fragile X syndrome. These di...
Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; Oven, Mannis; Wallace, D.C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J.; Gai, Xiaowu
MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR ...
Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
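The two-model idea described above can be sketched in Python as a likelihood comparison between two Markov models. This is a deliberate simplification and not the GeneMark.hmm algorithm: it assumes first-order homogeneous chains trained on toy sequences, and the function names and training data are hypothetical.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition log-probabilities with add-one style smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: math.log(c / sum(row.values())) for b, c in row.items()}
        for a, row in counts.items()
    }


def log_likelihood(seq, model):
    """Sum of transition log-probabilities along the sequence."""
    return sum(model[a][b] for a, b in zip(seq, seq[1:]))


def classify(seq, typical, atypical):
    """Assign the sequence to whichever model explains it better."""
    if log_likelihood(seq, typical) >= log_likelihood(seq, atypical):
        return "typical"
    return "atypical"


# Toy training sets (hypothetical): GC-rich "typical" vs AT-rich "atypical" genes.
typical_model = train_markov(["GCGCGCGGCC", "CCGGCGCGCG"])
atypical_model = train_markov(["ATATTAATAT", "TTAATATATA"])

label_gc = classify("GCGCGGCG", typical_model, atypical_model)
label_at = classify("ATTATAAT", typical_model, atypical_model)
```

A production gene finder would use higher-order, codon-position-aware models inside a hidden Markov model; the sketch only shows how composition differences separate the two classes.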
Wolf Yuri I
Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. Results: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome compared to a ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile
Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis
Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...
Background: We conducted a genome-wide association study (GWAS) to identify specific genetic variants that underlie susceptibility to disease caused by Staphylococcus aureus in humans. Methods: Cases (n=309) and controls (n=2,925) were genotyped at 508,921 single nucleotide polymorphisms (SNPs). Cases had at least one laboratory and clinician confirmed disease caused by S. aureus whereas controls did not. R-package (for SNP association), EIGENSOFT (to estimate and adjust for population stratification) and gene- (VEGAS) and pathway-based (DAVID, PANTHER, and Ingenuity Pathway Analysis) analyses were performed. Results: No SNP reached genome-wide significance. Four SNPs exceeded the p... Conclusion: We identified potential susceptibility genes for S. aureus diseases in this preliminary study but confirmation by other studies is needed. The observed associations could be relevant given the complexity of S. aureus as a pathogen and its ability to exploit multiple biological pathways to cause infections in humans.
Stewart Lindsay B
Background: Gene copy number variation (CNV) is responsible for several important phenotypes of the malaria parasite Plasmodium falciparum, including drug resistance, loss of infected erythrocyte cytoadherence and alteration of receptor usage for erythrocyte invasion. Despite the known effects of CNV, little is known about its extent throughout the genome. Results: We performed a whole-genome survey of CNV genes in P. falciparum using comparative genome hybridisation of a diverse set of 16 laboratory culture-adapted isolates to a custom designed high density Affymetrix GeneChip array. Overall, 186 genes showed hybridisation signals consistent with deletion or amplification in one or more isolates. There is a strong association of CNV with gene length, genomic location, and low orthology to genes in other Plasmodium species. Sub-telomeric regions of all chromosomes are strongly associated with CNV genes independent from members of previously described multigene families. However, ~40% of CNV genes were located in more central regions of the chromosomes. Among the previously undescribed CNV genes, several that are of potential phenotypic relevance are identified. Conclusion: CNV represents a major form of genetic variation within the P. falciparum genome; the distribution of gene features indicates the involvement of highly non-random mutational and selective processes. Additional studies should be directed at examining CNV in natural parasite populations to extend conclusions to clinical settings.
Biernacka, Joanna M.; Geske, Jennifer; Jenkins, Gregory D.; Colby, Colin; Rider, David N.; Karpyak, Victor M.; Choi, Doo-Sup; Fridley, Brooke L.
It is believed that multiple genetic variants with small individual effects contribute to the risk of alcohol dependence. Such polygenic effects are difficult to detect in genome-wide association studies that test for association of the phenotype with each single nucleotide polymorphism (SNP) individually. To overcome this challenge, gene set analysis (GSA) methods that jointly test for the effects of pre-defined groups of genes have been proposed. Rather than testing for association between the phenotype and individual SNPs, these analyses evaluate the global evidence of association with a set of related genes enabling the identification of cellular or molecular pathways or biological processes that play a role in development of the disease. It is hoped that by aggregating the evidence of association for all available SNPs in a group of related genes, these approaches will have enhanced power to detect genetic associations with complex traits. We performed GSA using data from a genome-wide study of 1165 alcohol dependent cases and 1379 controls from the Study of Addiction: Genetics and Environment (SAGE), for all 200 pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results demonstrated a potential role of the “Synthesis and Degradation of Ketone Bodies” pathway. Our results also support the potential involvement of the “Neuroactive Ligand Receptor Interaction” pathway, which has previously been implicated in addictive disorders. These findings demonstrate the utility of GSA in the study of complex disease, and suggest specific directions for further research into the genetic architecture of alcohol dependence. PMID:22717047
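The aggregation step described above can be sketched in a few lines. The abstract does not say which gene set statistic was used, so the example below substitutes Fisher's combined probability test as a generic stand-in; it assumes independent SNP-level p-values, which linkage disequilibrium violates in real data. The SNP identifiers, gene names, and the helper names `fisher_combined_p` and `gene_set_p` are hypothetical.

```python
import math

def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even df.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    half = -sum(math.log(p) for p in pvalues)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def gene_set_p(snp_pvalues, snp_to_gene, gene_set):
    """Pool the p-values of all SNPs mapped to genes in `gene_set`."""
    pooled = [p for snp, p in snp_pvalues.items()
              if snp_to_gene.get(snp) in gene_set]
    return fisher_combined_p(pooled)

# Hypothetical toy inputs: no single SNP is striking, but the set is.
snp_pvalues = {"rs1": 0.01, "rs2": 0.01, "rs3": 0.60}
snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C"}
print(gene_set_p(snp_pvalues, snp_to_gene, {"GENE_A", "GENE_B"}))
```

The point of the sketch is only that modest signals at several SNPs in related genes can combine into stronger set-level evidence than any one SNP provides; published GSA methods typically add permutation or model-based corrections for correlated markers and gene size.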
Background: Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences, makes such studies worthwhile. Results: I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions: I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their
Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun
The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position
Kim, Woonsu; Park, Hyesun; Seo, Seongwon
The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating an organism's genomic annotations from multiple sources was updated. Using this pipeline, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxifications. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID
Burren Oliver S
The genetic dissection of complex disease remains a significant challenge. Sample-tracking and the recording, processing and storage of high-throughput laboratory data with public domain data require integration of databases, genome informatics and genetic analyses in an easily updated and scalable format. To find genes involved in multifactorial diseases such as type 1 diabetes (T1D), chromosome regions are defined based on functional candidate gene content, linkage information from humans and animal model mapping information. For each region, genomic information is extracted from Ensembl, converted and loaded into ACeDB for manual gene annotation. Homology information is examined using ACeDB tools and the gene structure verified. Manually curated genes are extracted from ACeDB and read into the feature database, which holds relevant local genomic feature data and an audit trail of laboratory investigations. Public domain information, manually curated genes, polymorphisms, primers, linkage and association analyses, with links to our genotyping database, are shown in Gbrowse. This system scales to include genetic, statistical, quality control (QC) and biological data such as expression analyses of RNA or protein, all linked from a genomics integrative display. Our system is applicable to any genetic study of complex disease, of either large or small scale.
Integrated Enrichment Analysis of Variants and Pathways in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease
Carbonetto, Peter; Stephens, Matthew
Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14
Winick-Ng, Warren; Rylett, R Jane
Alzheimer's disease (AD) is a progressive neurodegenerative disease characterized by synapse dysfunction and cognitive impairment. Understanding the development and progression of AD is challenging, as the disease is highly complex and multifactorial. Both environmental and genetic factors play a role in AD pathogenesis, highlighted by observations of complex DNA modifications at the single gene level, and by new evidence that also implicates changes in genome architecture in AD patients. The four-dimensional structure of chromatin in space and time is essential for context-dependent regulation of gene expression in post-mitotic neurons. Dysregulation of epigenetic processes has been observed in the aging brain and in patients with AD, though there is not yet agreement on the impact of these changes on transcription. New evidence shows that proteins involved in genome organization have altered expression and localization in the AD brain, suggesting that the genomic landscape may play a critical role in the development of AD. This review discusses the role of the chromatin organizers and epigenetic modifiers in post-mitotic cells, the aging brain, and in the development and progression of AD. How these new insights can be used to help determine disease risk and inform treatment strategies will also be discussed.
Ronco, Troels; Lyhs, Ulrike; Stegger, Marc
to be important for the development of NE in chickens and piglets, respectively, while the role of these toxins is less well elucidated in diseased turkeys. Methods: We carried out comparative genomic analysis of 40 C. perfringens genomes from healthy and NE-suffering chickens and turkeys, and diseased pigs using......B, NELoc-1 and -3 seem to play an important role in the NE pathogenesis in chickens, whereas cpb2 is important in diseased pigs. • The VirSR two-component system is involved in regulating NE-associated virulence genes. • Conjugative plasmid genes are widely spread among C. perfringens. • WGS is a powerful...
Uno, Narumi; Abe, Satoshi; Oshimura, Mitsuo; Kazuki, Yasuhiro
Chromosome transfer technology, including chromosome modification, enables the introduction of Mb-sized or multiple genes to desired cells or animals. This technology has allowed innovative developments to be made for models of human disease and humanized animals, including Down syndrome model mice and humanized transchromosomic (Tc) immunoglobulin mice. Genome editing techniques are developing rapidly, and permit modifications such as gene knockout and knockin to be performed in various cell lines and animals. This review summarizes chromosome transfer-related technologies and the combined technologies of chromosome transfer and genome editing mainly for the production of cell/animal models of human disease and humanized animal models. Specifically, these include: (1) chromosome modification with genome editing in Chinese hamster ovary cells and mouse A9 cells for efficient transfer to desired cell types; (2) single-nucleotide polymorphism modification in humanized Tc mice with genome editing; and (3) generation of a disease model of Down syndrome-associated hematopoiesis abnormalities by the transfer of human chromosome 21 to normal human embryonic stem cells and the induction of mutation(s) in the endogenous gene(s) with genome editing. These combinations of chromosome transfer and genome editing open up new avenues for drug development and therapy as well as for basic research.
Postlethwait, J H
Zebrafish is one of several important teleost models for understanding principles of vertebrate developmental, molecular, organismal, genetic, evolutionary, and genomic biology. Efficient investigation of the molecular genetic basis of induced mutations depends on knowledge of the zebrafish genome. Principles of zebrafish genomic analysis, including gene mapping, ortholog identification, conservation of syntenies, genome duplication, and evolution of duplicate gene function, are discussed here using as a case study the zebrafish msxa, msxb, msxc, msxd, and msxe genes, which together constitute zebrafish orthologs of tetrapod Msx1, Msx2, and Msx3. Genomic analysis suggests orthologs for this difficult-to-understand group of paralogs.
Peprah, Emmanuel; Xu, Huichun; Tekola-Ayele, Fasil; Royal, Charmaine D.
Genomic research is one of the tools for elucidating the pathogenesis of diseases of global health relevance, and paving the research dimension to clinical and public health translation. Recent advances in genomic research and technologies have increased our understanding of human diseases, genes associated with these disorders, and the relevant mechanisms. Genome-wide association studies (GWAS) have proliferated since the first studies were published several years ago, and have become an important tool in helping researchers comprehend human variation and the role genetic variants play in disease. However, the need to expand the diversity of populations in GWAS has become increasingly apparent as new knowledge is gained about genetic variation. Inclusion of diverse populations in genomic studies is critical to a more complete understanding of human variation and elucidation of the underpinnings of complex diseases. In this review, we summarize the available data on GWAS in recent African ancestry populations within the western hemisphere (i.e. African Americans and peoples of the Caribbean) and continental African populations. Furthermore, we highlight ways in which genomic studies in populations of recent African ancestry have led to advances in the areas of malaria, HIV, prostate cancer, and other diseases. Finally, we discuss the advantages of conducting GWAS in recent African ancestry populations in the context of addressing existing and emerging global health conditions. PMID:25427668
Schizophrenia (SZ) is a devastating mental disorder afflicting 1% of the population. Recent genome-wide association studies (GWASs) of SZ have identified >100 risk loci. However, the causal variants/genes and the causal mechanisms remain largely unknown, which hinders the translation of GWAS findings into disease biology and drug targets. Most risk variants are noncoding and thus likely regulate gene expression. A major mechanism of transcriptional regulation is chromatin remodeling, and open chromatin is a versatile predictor of regulatory sequences. MicroRNA-mediated post-transcriptional regulation plays an important role in SZ pathogenesis. Neurons differentiated from patient-specific induced pluripotent stem cells (iPSCs) provide an experimental model to characterize the genetic perturbation of regulatory variants that are often specific to cell type and/or developmental stage. The emerging genome-editing technology enables the creation of isogenic iPSCs and neurons to efficiently characterize the effects of SZ-associated regulatory variants on SZ-relevant molecular and cellular phenotypes involving dopaminergic, glutamatergic, and GABAergic neurotransmissions. SZ GWAS findings equipped with the emerging functional genomics approaches provide an unprecedented opportunity for understanding new disease biology and identifying novel drug targets.
Sato, Yukuto; Tsukamoto, Katsumi; Nishida, Mutsumi
Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis. PMID:26578810
Maldonado, Lucas L; Assis, Juliana; Araújo, Flávio M Gomes; Salim, Anna C M; Macchiaroli, Natalia; Cucher, Marcela; Camicia, Federico; Fox, Adolfo; Rosenzvit, Mara; Oliveira, Guilherme; Kamenetzky, Laura
The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated high nucleotide sequence divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high
Hotta, Akitsu; Yamanaka, Shinya
The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.
A. Parsa (Afshin); C. Fuchsberger (Christian); A. Köttgen (Anna); C.M. O'Seaghdha (Conall); C. Pattaro (Cristian); M. de Andrade (Mariza); D.I. Chasman (Daniel); A. Teumer (Alexander); K. Endlich (Karlhans); M. Olden (Matthias); M-H. Chen (Ming-Huei); A. Tin (Adrienne); Y-J. Kim (Yong-Jin); D. Taliun (Daniel); M. Li (Man); M.F. Feitosa (Mary Furlan); M. Gorski (Mathias); Q. Yang (Qiong); C. Hundertmark (Claudia); M.C. Foster (Michael); N. Glazer (Nicole); A.J. Isaacs (Aaron); M. Rao (Madhumathi); G.D. Smith; J.R. O'Connell; M.V. Struchalin (Maksim); T. Tanaka (Toshiko); G. Li (Guo); S.J. Hwang; E.J. Atkinson (Elizabeth); K. Lohman (Kurt); M. Cornelis (Marilyn); A. Johansson (Åsa); A. Tönjes (Anke); A. Dehghan (Abbas); V. Couraki (Vincent); E.G. Holliday (Elizabeth); R. Sorice; Z. Kutalik (Zoltán); T. Lehtimäki (Terho); T. Esko (Tõnu); H. Deshmukh (Harshal); S. Ulivi (Shelia); A.Y. Chu (Audrey); D. Murgia (Daniela); S. Trompet (Stella); M. Imboden (Medea); B. Kollerits (Barbara); G. Pistis (Giorgio); T.B. Harris (Tamara); L.J. Launer (Lenore); T. Aspelund (Thor); G. Eiriksdottir (Gudny); B.D. Mitchell (Braxton); E.A. Boerwinkle (Eric); H. Schmidt (Helena); E. Hofer (Edith); F.B. Hu (Frank); A. Demirkan (Ayşe); B.A. Oostra (Ben); S.T. Turner (Stephen); J. Ding (Jingzhong); J.S. Andrews (Jeanette); B.I. Freedman (Barry); F. Giulianini (Franco); W. Koenig (Wolfgang); T. Illig (Thomas); A. Döring (Angela); H.E. Wichmann (Heinz Erich); L. Zgaga (Lina); T. Zemunik (Tatijana); M. Boban (Mladen); C. Minelli (Cosetta); H.E. Wheeler (Heather); W. Igl (Wilmar); G. Zaboli (Ghazal); S.H. Wild (Sarah); A.F. Wright (Alan); H. Campbell (Harry); D. Ellinghaus (David); U. Nöthlings (Ute); G. Jacobs (Gunnar); R. Biffar (Reiner); F.D.J. Ernst (Florian); G. Homuth (Georg); H.K. Kroemer (Heyo); M. Nauck (Matthias); S. Stracke (Sylvia); U. Völker (Uwe); H. Völzke (Henry); P. Kovacs (Peter); M. Stumvoll (Michael); R. Mägi (Reedik); A. Hofman (Albert); A.G. Uitterlinden (André); F. Rivadeneira Ramirez (Fernando); Y.S. Aulchenko (Yurii); O. Polasek (Ozren); N. Hastie (Nick); V. Vitart (Veronique); C. Helmer (Catherine); J.J. Wang (Jie Jin); B. Stengel (Bernd); D. Ruggiero; S.M. Bergmann (Sven); M. Kähönen (Mika); J. Viikari (Jorma); T. Nikopensius (Tiit); M.A. Province (Mike); H.M. Colhoun (H.); A.S.F. Doney (Alex); A. Robino (Antonietta); B.K. Krämer (Bernhard); L. Portas (Laura); I. Ford (Ian); B.M. Buckley (Brendan M.); M. Adam (Martin); G.-A. Thun (Gian-Andri); B. Paulweber (Bernhard); M. Haun (Margot); C. Sala (Cinzia); P. Mitchell (Paul); M. Ciullo; P. Vollenweider (Peter); O. Raitakari (Olli); A. Metspalu (Andres); C.N.A. Palmer (Colin); P. Gasparini (Paolo); M. Pirastu (Mario); J.W. Jukema (Jan Wouter); N.M. Probst-Hensch (Nicole M.); F. Kronenberg (Florian); D. Toniolo (Daniela); V. Gudnason (Vilmundur); A.R. Shuldiner (Alan); J. Coresh (Josef); R. Schmidt (Reinhold); L. Ferrucci (Luigi); C.M. van Duijn (Cornelia); I.B. Borecki (Ingrid); S.L.R. Kardia (Sharon); Y. Liu (YongMei); G.C. Curhan (Gary); I. Rudan (Igor); U. Gyllensten (Ulf); J.F. Wilson (James); A. Franke (Andre); P.P. Pramstaller (Peter Paul); R. Rettig (Rainer); I. Prokopenko (Inga); J.C.M. Witteman (Jacqueline); C. Hayward (Caroline); P.M. Ridker (Paul); M. Bochud (Murielle); I.M. Heid (Iris); D.S. Siscovick (David); C.S. Fox (Caroline); W.H.L. Kao (Wen); C.A. Böger (Carsten)
Many common genetic variants identified by genome-wide association studies for complex traits map to genes previously linked to rare inherited Mendelian disorders. A systematic analysis of common single-nucleotide polymorphisms (SNPs) in genes responsible for Mendelian diseases with
Linehan, W. Marston
Kidney cancer is not a single disease; it is made up of a number of different types of cancer, including clear cell, type 1 papillary, type 2 papillary, chromophobe, TFE3, TFEB, and oncocytoma. Sporadic, nonfamilial kidney cancer includes clear cell kidney cancer (75%), type 1 papillary kidney cancer (10%), type 2 papillary kidney cancer (including collecting duct and medullary RCC) (5%), the microphthalmia-associated transcription factor (MiT) family translocation kidney cancers (TFE3, TFEB, and MITF), chromophobe kidney cancer (5%), and oncocytoma (5%). Each has a distinct histology and a different clinical course, responds differently to therapy, and is caused by mutation in a different gene. Genomic studies identifying the genes for kidney cancer, including the VHL, MET, FLCN, fumarate hydratase, succinate dehydrogenase, TSC1, TSC2, and TFE3 genes, have significantly altered the ways in which patients with kidney cancer are managed. While seven agents that target the VHL pathway have been approved by the FDA for the treatment of patients with advanced kidney cancer, further genomic studies, such as whole genome sequencing, gene expression patterns, and gene copy number, will be required to gain a complete understanding of the genetic basis of kidney cancer and of the kidney cancer gene pathways and, most importantly, to provide the foundation for the development of effective forms of therapy for patients with this disease. PMID:23038766
Genomics is the study of all of a person's genes, including interactions of those genes ... Our environment and our biology are two factors that strongly influence our health. ... The completion of the Human Genome Project signaled that the genome ...
The different environments that humans experience are likely to impact physiology and disease susceptibility. In order to estimate the magnitude of the impact of environment on transcript abundance, we examined gene expression in peripheral blood leukocyte samples from 46 desert nomadic, mountain agrarian and coastal urban Moroccan Amazigh individuals. Despite great expression heterogeneity in humans, as much as one third of the leukocyte transcriptome was found to be associated with differences among regions. Genome-wide polymorphism analysis indicates that genetic differentiation in the total sample is limited and is unlikely to explain the expression divergence. Methylation profiling of 1,505 CpG sites suggests limited contribution of methylation to the observed differences in gene expression. Genetic network analysis further implies that specific aspects of immune function are strongly affected by regional factors and may influence susceptibility to respiratory and inflammatory disease. Our results show a strong genome-wide gene expression signature of regional population differences that presumably include lifestyle, geography, and biotic factors, implying that these can play at least as great a role as genetic divergence in modulating gene expression variation in humans.
Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A
Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.
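As a purely illustrative aside, the notion of a breakpoint that underlies the abstract above can be sketched in a few lines of Python. This is not the MGRA or MGRA2 algorithm; it assumes the simplest setting the abstract mentions (two genomes with identical gene content, each gene in a single copy), models each genome as one linear chromosome of signed integers, and ignores chromosome ends. The function names `adjacencies` and `breakpoints` are hypothetical.

```python
def adjacencies(genome):
    """Return the set of gene adjacencies of one linear chromosome.

    `genome` is a list of non-zero signed integers; the sign encodes strand.
    """
    adj = set()
    for left, right in zip(genome, genome[1:]):
        # The pair (a, b) read forward is the same adjacency as (-b, -a)
        # read from the opposite strand, so store one canonical form.
        adj.add(min((left, right), (-right, -left)))
    return adj


def breakpoints(genome_a, genome_b):
    """Count adjacencies of genome_a that are absent from genome_b."""
    genes_a = sorted(abs(g) for g in genome_a)
    genes_b = sorted(abs(g) for g in genome_b)
    # Assumption of this sketch: equal gene content, one copy per gene.
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("genomes must share the same single-copy genes")
    return len(adjacencies(genome_a) - adjacencies(genome_b))


reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted relative to the reference
print(breakpoints(reference, inverted))
```

In this toy case the inversion disrupts the two adjacencies flanking the inverted segment, while the adjacency inside the segment is preserved. Breakpoint reuse, as described in the abstract, arises when later rearrangements cut the same flanking regions again.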
Meehan, Terrence F.; Conte, Nathalie; West, David B.; Jacobsen, Julius O.; Mason, Jeremy; Warren, Jonathan; Chen, Chao-Kung; Tudose, Ilinca; Relac, Mike; Matthews, Peter; Karp, Natasha; Santos, Luis; Fiegel, Tanja; Ring, Natalie; Westerberg, Henrik; Greenaway, Simon; Sneddon, Duncan; Morgan, Hugh; Codner, Gemma F; Stewart, Michelle E; Brown, James; Horner, Neil; Haendel, Melissa; Washington, Nicole; Mungall, Christopher J.; Reynolds, Corey L; Gallegos, Juan; Gailus-Durner, Valerie; Sorg, Tania; Pavlovic, Guillaume; Bower, Lynette R; Moore, Mark; Morse, Iva; Gao, Xiang; Tocchini-Valentini, Glauco P; Obata, Yuichi; Cho, Soo Young; Seong, Je Kyung; Seavitt, John; Beaudet, Arthur L.; Dickinson, Mary E.; Herault, Yann; Wurst, Wolfgang; de Angelis, Martin Hrabe; Lloyd, K.C. Kent; Flenniken, Ann M; Nutter, Lauryl MJ; Newbigging, Susan; McKerlie, Colin; Justice, Monica J.; Murray, Stephen A.; Svenson, Karen L.; Braun, Robert E.; White, Jacqueline K.; Bradley, Allan; Flicek, Paul; Wells, Sara; Skarnes, William C.; Adams, David J.; Parkinson, Helen; Mallon, Ann-Marie; Brown, Steve D.M.; Smedley, Damian
Although next generation sequencing has revolutionised the ability to associate variants with human diseases, diagnostic rates and development of new therapies are still limited by our lack of knowledge of function and pathobiological mechanism for most genes. To address this challenge, the International Mouse Phenotyping Consortium (IMPC) is creating a genome- and phenome-wide catalogue of gene function by characterizing new knockout mouse strains across diverse biological systems through a broad set of standardised phenotyping tests, with all mice made readily available to the biomedical community. Analysing the first 3328 genes reveals models for 360 diseases including the first for type C Bernard-Soulier, Bardet-Biedl-5 and Gordon Holmes syndromes. 90% of our phenotype annotations are novel, providing the first functional evidence for 1092 genes and candidates in unsolved diseases such as Arrhythmogenic Right Ventricular Dysplasia 3. Finally, we describe our role in variant functional validation with the 100,000 Genomes and other projects. PMID:28650483
Wajid, Abdul; Rehmani, Shafqat Fatima; Sharma, Poonam; Goraichuk, Iryna V.; Dimitrov, Kiril M.; Afonso, Claudio L.
Two complete genome sequences of Newcastle disease virus (NDV) are described here. Virulent isolates pigeon/Pakistan/Lahore/21A/2015 and pigeon/Pakistan/Lahore/25A/2015 were obtained from racing pigeons sampled in the Pakistani province of Punjab during 2015. Phylogenetic analysis of the fusion protein genes and complete genomes classified the isolates as members of NDV class II, genotype VI.
Gubala, Aneta; Davis, Steven; Weir, Richard; Melville, Lorna; Cowled, Chris; Boyle, David
Tibrogargan virus (TIBV) and Coastal Plains virus (CPV) were isolated from cattle in Australia and TIBV has also been isolated from the biting midge Culicoides brevitarsis. Complete genomic sequencing revealed that the viruses share a novel genome structure within the family Rhabdoviridae, each virus containing two additional putative genes between the matrix protein (M) and glycoprotein (G) genes and one between the G and viral RNA polymerase (L) genes. The predicted novel protein products are highly diverged at the sequence level but demonstrate clear conservation of secondary structure elements, suggesting conservation of biological functions. Phylogenetic analyses showed that TIBV and CPV form an independent group within the 'dimarhabdovirus supergroup'. Although no disease has been observed in association with these viruses, antibodies were detected at high prevalence in cattle and buffalo in northern Australia, indicating the need for disease monitoring and further study of this distinctive group of viruses.
Lu, Jianguo; Peatman, Eric; Tang, Haibao; Lewis, Joshua; Liu, Zhanjiang
Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes. Here, using a Markov clustering algorithm directed approach we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish. We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting their origin of independent and continuous duplication
Parkinson disease (PD) is a complex neurodegenerative disorder with largely unknown genetic mechanisms. While the degeneration of dopaminergic neurons in PD mainly takes place in the substantia nigra pars compacta (SN) region, other brain areas, including the prefrontal cortex, develop Lewy bodies, the neuropathological hallmark of PD. We generated and analyzed expression data from the prefrontal cortex Brodmann Area 9 (BA9) of 27 PD and 26 control samples using the 44K One-Color Agilent 60-mer Whole Human Genome Microarray. All samples were male, without significant Alzheimer disease pathology and with extensive pathological annotation available. 507 of the 39,122 analyzed expression probes were different between PD and control samples at false discovery rate (FDR) of 5%. One of the genes with significantly increased expression in PD was the forkhead box O1 (FOXO1) transcription factor. Notably, genes carrying the FoxO1 binding site were significantly enriched in the FDR-significant group of genes (177 genes covered by 189 probes), suggesting a role for FoxO1 upstream of the observed expression changes. Single-nucleotide polymorphisms (SNPs) selected from a recent meta-analysis of PD genome-wide association studies (GWAS) were successfully genotyped in 50 out of the 53 microarray brains, allowing a targeted expression-SNP (eSNP) analysis for 52 SNPs associated with PD affection at genome-wide significance and the 189 probes from FoxO1 regulated genes. A significant association was observed between a SNP in the cyclin G associated kinase (GAK) gene and a probe in the spermine oxidase (SMOX) gene. Further examination of the FOXO1 region in a meta-analysis of six available GWAS showed two SNPs significantly associated with age at onset of PD. These results implicate FOXO1 as a PD-relevant gene and warrant further functional analyses of its transcriptional regulatory mechanisms.
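For readers unfamiliar with selecting probes at a 5% false discovery rate, the following minimal Python sketch shows one common way such a threshold is applied. The abstract does not state which FDR procedure was used; the Benjamini-Hochberg step-up procedure shown here is an assumption, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the sorted indices of tests rejected at the given FDR.

    Implements the Benjamini-Hochberg step-up rule: find the largest
    rank k such that p_(k) <= (k / m) * fdr, then reject the k tests
    with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical per-probe p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example))
```

Because the rule is step-up, a probe whose own p-value misses its rank threshold is still reported when a probe with a larger p-value passes its threshold.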
Background: Vibrio vulnificus is the leading cause of reported death from consumption of seafood in the United States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the mechanisms of virulence of this opportunistic bacterial pathogen. The two complete and annotated genomic DNA sequences of V. vulnificus belong to strains of clade 2, which is the predominant clade among clinical strains. Clade 2 strains generally possess higher virulence potential in animal models of disease compared with clade 1, which predominates among environmental strains. SOLiD sequencing of four V. vulnificus strains representing different clades (1 and 2) and biotypes (1 and 2) was used for comparative genomic analysis. Results: Greater than 4,100,000 bases were sequenced of each strain, yielding approximately 100-fold coverage for each of the four genomes. Although the read lengths of SOLiD genomic sequencing were only 35 nt, we were able to make significant conclusions about the unique and shared sequences among the genomes, including identification of single nucleotide polymorphisms. Comparative analysis of the newly sequenced genomes to the existing reference genomes enabled the identification of 3,459 core V. vulnificus genes shared among all six strains and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes. Conclusions: We were able to glean much information about the genomic content of each strain using next generation sequencing. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence factors because of their presence in virulent sequenced strains. Genomic comparisons also point toward the involvement of sialic acid catabolism in pathogenesis.
In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approxima...
Otto, Thomas D
Background: Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome wide analysis of Plasmodium gene regulation and function. Results: We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilized it to improve gene-model prediction and to provide quantitative, genome-wide, data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the 'Plasmodium interspersed repeat genes' (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family. Conclusions: Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.
Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup
Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Johnson, Rory; Richter, Nadine; Bogu, Gireesh K.; Bhinge, Akshay; Teng, Siaw Wei; Choo, Siew Hua; Andrieux, Lise O.; de Benedictis, Cinzia; Jauch, Ralf; Stanton, Lawrence W.
Increasing numbers of human diseases are being linked to genetic variants, but our understanding of the mechanistic links leading from DNA sequence to disease phenotype is limited. The majority of disease-causing nucleotide variants fall within the non-protein-coding portion of the genome, making it likely that they act by altering gene regulatory sequences. We hypothesised that SNPs within the binding sites of the transcriptional repressor REST alter the degree of repression of target genes. Given that changes in the effective concentration of REST contribute to several pathologies (various cancers, Huntington's disease, cardiac hypertrophy, vascular smooth muscle proliferation), these SNPs should alter disease-susceptibility in carriers. We devised a strategy to identify SNPs that affect the recruitment of REST to target genes through the alteration of its DNA recognition element, the RE1. A multi-step screen combining genetic, genomic, and experimental filters yielded 56 polymorphic RE1 sequences with robust and statistically significant differences of affinity between alleles. These SNPs have a considerable effect on the functional recruitment of REST to DNA in a range of in vitro, reporter gene, and in vivo analyses. Furthermore, we observe allele-specific biases in deeply sequenced chromatin immunoprecipitation data, consistent with predicted differences in RE1 affinity. Amongst the targets of polymorphic RE1 elements are important disease genes including NPPA, PTPRT, and CDH4. Thus, considerable genetic variation exists in the DNA motifs that connect gene regulatory networks. Recently available ChIP-seq data allow the annotation of human genetic polymorphisms with regulatory information to generate prior hypotheses about their disease-causing mechanism. PMID:22496669
Harrison, Paul M; Khachane, Amit; Kumar, Manish
Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified, that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF), in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel, was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage, since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures, across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates, and that overlaps the meiosis gene, SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents, is conserved in all high-coverage genome assemblies of primates (human, chimp, orang utan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long
Stokkers, P. C.; Huibregtse, K.; Leegwater, A. C.; Reitsma, P. H.; Tytgat, G. N.; van Deventer, S. J.
Genome scans have identified a region spanning 40 cM on the long arm of chromosome 12 as a susceptibility locus for inflammatory bowel disease (IBD). This locus contains several candidate genes for IBD, one of which is the gene for the natural resistance associated macrophage protein 2 (NRAMP2).
Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A
The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard
Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait ... to investigate locomotor activity, and applied genomic feature prediction models to identify gene ontology (GO) categories predictive of this phenotype. Next, we applied the covariance association test to partition the genomic variance of the predictive GO terms to the genes within these terms. We then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated ...
Johnson, Michael R; Shkura, Kirill; Langley, Sarah R; Delahaye-Duriez, Andree; Srivastava, Prashant; Hill, W David; Rackham, Owen J L; Davies, Gail; Harris, Sarah E; Moreno-Moral, Aida; Rotival, Maxime; Speed, Doug; Petrovski, Slavé; Katz, Anaïs; Hayward, Caroline; Porteous, David J; Smith, Blair H; Padmanabhan, Sandosh; Hocking, Lynne J; Starr, John M; Liewald, David C; Visconti, Alessia; Falchi, Mario; Bottolo, Leonardo; Rossetti, Tiziana; Danis, Bénédicte; Mazzuferi, Manuela; Foerch, Patrik; Grote, Alexander; Helmstaedter, Christoph; Becker, Albert J; Kaminski, Rafal M; Deary, Ian J; Petretto, Enrico
Genetic determinants of cognition are poorly characterized, and their relationship to genes that confer risk for neurodevelopmental disease is unclear. Here we performed a systems-level analysis of genome-wide gene expression data to infer gene-regulatory networks conserved across species and brain regions. Two of these networks, M1 and M3, showed replicable enrichment for common genetic variants underlying healthy human cognitive abilities, including memory. Using exome sequence data from 6,871 trios, we found that M3 genes were also enriched for mutations ascertained from patients with neurodevelopmental disease generally, and intellectual disability and epileptic encephalopathy in particular. M3 consists of 150 genes whose expression is tightly developmentally regulated, but which are collectively poorly annotated for known functional pathways. These results illustrate how systems-level analyses can reveal previously unappreciated relationships between neurodevelopmental disease-associated genes in the developed human brain, and provide empirical support for a convergent gene-regulatory network influencing cognition and neurodevelopmental disease.
Belizário, Jose E
Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. The underlying reasons for this failure are explained by recent studies of resequencing and comparison of over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and full characterization of coding and noncoding RNAs (transcriptome) being transcribed. These studies have provided the most comprehensive catalogues of functional elements and genetic variants that are now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses for the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels as the cause of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on humans and model organism phenotypes, we will be able to find candidate genes and new clues for disease's etiology and treatment. This article describes essential concepts in genetics and genomic technologies as well as the emerging computational framework to comprehensively search websites and platforms available for the analysis and interpretation of genomic data.
Upadhyay, Atul K; Chacko, Anita R; Gandhimathi, A; Ghosh, Pritha; Harini, K; Joseph, Agnel P; Joshi, Adwait G; Karpe, Snehal D; Kaushik, Swati; Kuravadi, Nagesh; Lingu, Chandana S; Mahita, J; Malarini, Ramya; Malhotra, Sony; Malini, Manoharan; Mathew, Oommen K; Mutt, Eshita; Naika, Mahantesha; Nitish, Sathyanarayanan; Pasha, Shaik Naseer; Raghavender, Upadhyayula S; Rajamani, Anantharamanan; Shilpa, S; Shingate, Prashant N; Singh, Heikham Russiachand; Sukhwal, Anshul; Sunitha, Margaret S; Sumathi, Manojkumar; Ramaswamy, S; Gowda, Malali; Sowdhamini, Ramanathan
Krishna Tulsi, a member of the Lamiaceae family, is a herb well known for its spiritual, religious and medicinal importance in India. The common name of this plant is 'Tulsi' (or 'Tulasi' or 'Thulasi') and it is considered sacred by Hindus. We present the draft genome of Ocimum tenuiflorum L. (subtype Krishna Tulsi) in this report. Paired-end and mate-pair sequence libraries were generated for the whole genome and sequenced with the Illumina HiSeq 1000, resulting in an assembled genome of 374 Mb, with a genome coverage of 61% (612 Mb estimated genome size). We have also studied transcriptomes (RNA-Seq) of two subtypes of O. tenuiflorum, Krishna and Rama Tulsi, and report the relative expression of genes in both varieties. The pathways leading to the production of medicinally important specialized metabolites have been studied in detail, in relation to similar pathways in Arabidopsis thaliana and other plants. Expression levels of anthocyanin biosynthesis-related genes in leaf samples of Krishna Tulsi were observed to be relatively high, explaining the purple colouration of Krishna Tulsi leaves. The expression of six important genes identified from genome data was validated by performing q-RT-PCR in different tissues of five different species, which shows the high extent of ursolic acid-producing genes in young leaves of the Rama subtype. In addition, the presence of eugenol and ursolic acid, implicated as potential drugs in the cure of many diseases including cancer, was confirmed using mass spectrometry. The availability of the whole genome of O. tenuiflorum and our sequence analysis suggest that small amino acid changes at the functional sites of genes involved in metabolite synthesis pathways confer special medicinal properties on this herb.
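The coverage figure quoted in the abstract above follows from the two sizes it reports. A minimal sketch of that arithmetic, assuming coverage here means assembled size divided by estimated genome size; the function name is illustrative and does not come from the paper.

```python
def assembly_coverage(assembled_mb: float, estimated_mb: float) -> float:
    """Return the percentage of the estimated genome present in the assembly."""
    if estimated_mb <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * assembled_mb / estimated_mb


# Sizes as reported in the abstract: 374 Mb assembled, 612 Mb estimated.
coverage = assembly_coverage(374, 612)
print(f"{coverage:.0f}%")
```

The result rounds to the 61% stated in the abstract.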
Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.
The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the Alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.
Chen, Yang; Gao, Zhen; Wang, Bingcheng; Xu, Rong
Glioblastoma (GBM) is the most common and aggressive brain tumor. It has a poor prognosis even with optimal radio- and chemotherapies. Since GBM is highly heterogeneous, drugs that target the specific molecular profiles of individual tumors may achieve maximized efficacy. Currently, The Cancer Genome Atlas (TCGA) projects have identified hundreds of GBM-associated genes. We develop a drug repositioning approach combining disease genomics and mouse phenotype data towards predicting targeted therapies for GBM. We first identified disease-specific mouse phenotypes using the most recently discovered GBM genes. Then we systematically searched all FDA-approved drugs for candidates that share similar mouse phenotype profiles with GBM. We evaluated the ranks for approved and novel GBM drugs, and compared them with an existing approach, which also uses the mouse phenotype data but not the disease genomics data. We achieved significantly higher ranks for the approved and novel GBM drugs than the earlier approach. For all positive examples of GBM drugs, we achieved a median rank of 9.2 45.6 of the top predictions have been demonstrated effective in inhibiting the growth of human GBM cells. We developed a computational drug repositioning approach based on both genomic and phenotypic data. Our approach prioritized existing GBM drugs and outperformed a recent approach. Overall, our approach shows potential in discovering new targeted therapies for GBM.
Background: The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs). We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results: Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions: This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis.
Wajid, Abdul; Rehmani, Shafqat Fatima; Sharma, Poonam; Goraichuk, Iryna V.; Dimitrov, Kiril M.
Two complete genome sequences of Newcastle disease virus (NDV) are described here. Virulent isolates pigeon/Pakistan/Lahore/21A/2015 and pigeon/Pakistan/Lahore/25A/2015 were obtained from racing pigeons sampled in the Pakistani province of Punjab during 2015. Phylogenetic analysis of the fusion protein genes and complete genomes classified the isolates as members of NDV class II, genotype VI. PMID:27540069
Tanzi, Rudolph E
The rich and colorful history of gene discovery in Alzheimer's disease (AD) over the past three decades is as complex and heterogeneous as the disease itself. Twin and family studies indicate that genetic factors are estimated to play a role in at least 80% of AD cases. The inheritance of AD exhibits a dichotomous pattern. On one hand, rare mutations in APP, PSEN1, and PSEN2 are fully penetrant for early-onset (95%) late-onset AD. These four genes account for 30-50% of the inheritability of AD. Genome-wide association studies have recently led to the identification of additional highly confirmed AD candidate genes. Here, I review the past, present, and future of attempts to elucidate the complex and heterogeneous genetic underpinnings of AD along with some of the unique events that made these discoveries possible.
The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested, including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO) term and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue.
Human health is determined by the interplay of genetic factors and the environment. In this context the recent advances in human genomics are expected to play a central role in medicine and public health by providing genetic information for disease prediction and prevention.
After the completion of the human genome sequencing, a fundamental step will be the translation of these discoveries into meaningful actions to improve health and prevent diseases, and the field of epidemiology plays a central role in this effort. These are some of the issues addressed by Human Genome Epidemiology: A scientific foundation for using genetic information to improve health and prevent disease, a volume edited by Prof. M. Khoury, Prof. J. Little and Prof. W. Burke and published by Oxford University Press, 2004.
This book describes the important role that epidemiological methods play in the continuum from gene discovery to the development and application of genetic tests. The authors call this continuum human genome epidemiology (HuGE) to denote an evolving field of inquiry that uses systematic applications of epidemiological methods to assess the impact of human genetic variation on health and disease.
The book is divided into four sections and it is structured to allow readers to proceed systematically from the fundamentals of genome technology and discovery, to the epidemiological approaches, to gene characterisation, to the evaluation of genetic tests and their use in health services and public health.
Min Kyung Sung
Genome-wide association studies have proven the highly polygenic architecture of complex diseases or traits; therefore, single-locus-based methods are usually unable to detect all involved loci, especially when individual loci exert small effects. Moreover, the majority of associated single-nucleotide polymorphisms reside in non-coding regions, making it difficult to understand their phenotypic contribution. In this work, we studied epistatic interactions associated with three common diseases using Korea Association Resource (KARE) data: type 2 diabetes mellitus (DM), hypertension (HT), and coronary artery disease (CAD). We showed that epistatic single-nucleotide polymorphisms (SNPs) were enriched in enhancers, as well as in DNase I footprints (the Encyclopedia of DNA Elements [ENCODE] Project Consortium 2012), which suggested that the disruption of the regulatory regions where transcription factors bind may be involved in the disease mechanism. Accordingly, to identify the genes affected by the SNPs, we employed whole-genome multiple-cell-type enhancer data which were discovered using DNase I profiles and Cap Analysis Gene Expression (CAGE). Assigned genes were significantly enriched in known disease-associated gene sets, which were explored based on the literature, suggesting that this approach is useful for detecting relevant affected genes. In our knowledge-based epistatic network, the three diseases share many associated genes and are also closely related with each other through many epistatic interactions. These findings elucidate the genetic basis of the close relationship between DM, HT, and CAD.
Boland, Mary Regina; Tatonetti, Nicholas P
Prenatal and perinatal exposures vary seasonally (e.g., sunlight, allergens) and many diseases are linked with variance in exposure. Epidemiologists often measure these changes using birth month as a proxy for seasonal variance. Likewise, Genome-Wide Association Studies have associated or implicated these same diseases with many genes. Both disparate data types (epidemiological and genetic) can provide key insights into the underlying disease biology. We developed an algorithm that links 1) epidemiological data from birth month studies with 2) genetic data from published gene-disease association studies. Our framework uses existing data repositories - PubMed, DisGeNET and Gene Ontology - to produce a bipartite network that connects enriched seasonally varying biofactors with birth month dependent diseases (BMDDs) through their overlapping developmental gene sets. As a proof-of-concept, we investigate 7 known BMDDs, highlight three important biological networks revealed by our algorithm, and explore some interesting genetic mechanisms potentially responsible for the seasonal contribution to BMDDs.
Cook, Daniel J; Nielsen, Jens
Advances in genome sequencing, high throughput measurement of gene and protein expression levels, data accessibility, and computational power have allowed genome-scale metabolic models (GEMs) to become a useful tool for understanding metabolic alterations associated with many different diseases. Despite the proven utility of GEMs, researchers confront multiple challenges in the use of GEMs, their application to human health and disease, and their construction and simulation in an organ-specific and disease-specific manner. Several approaches that researchers are taking to address these challenges include using proteomic and transcriptomic-informed methods to build GEMs for individual organs, diseases, and patients and using constraints on model behavior during simulation to match observed metabolic fluxes. We review the challenges facing researchers in the use of GEMs, review the approaches used to address these challenges, and describe advances that are on the horizon and could lead to a better understanding of human metabolism. WIREs Syst Biol Med 2017, 9:e1393. doi: 10.1002/wsbm.1393 © 2017 Wiley Periodicals, Inc.
Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A; Patil, S.; Gubbala, S.; Aqrawi, P.; Arias, F.; Bess, C.; Blankenburg, K. B.; Brocchini, M.; Buhay, C.; Challis, D.; Chang, K.; Chen, D.; Coleman, P.; Drummond, J.; English, A.; Evani, U.; Francisco, L.; Fu, Q.; Goodspeed, R.; Haessly, T. H.; Hale, W.; Han, H.; Hu, Y.; Jackson, L.; Jakkamsetti, A.; Jayaseelan, J. C.; Kakkar, N.; Kalra, D.; Kandadi, H.; Lee, S.; Li, H.; Liu, Y.; Macmil, S.; Mandapat, C. M.; Mata, R.; Mathew, T.; Matskevitch, T.; Munidasa, M.; Nagaswamy, U.; Najjar, R.; Nguyen, N.; Niu, J.; Opheim, D.; Palculict, T.; Paul, S.; Pellon, M.; Perales, L.; Pham, C.; Pham, P.
Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. 2014 Elsik et al.; licensee BioMed Central Ltd.
Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we revealed the conserved essential gene set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.
Kim, Jihye; Yoo, Minjae; Shin, Jimin; Kim, Hyunmin; Kang, Jaewoo; Tan, Aik Choon
Traditional Chinese medicine (TCM), which originated in ancient China, has been practiced for thousands of years for treating various symptoms and diseases. However, the molecular mechanisms of TCM in treating these diseases remain unknown. In this study, we employ a systems pharmacology-based approach for connecting GWAS diseases with TCM for potential drug repurposing and repositioning. We studied 102 TCM components and their target genes by analyzing microarray gene expression experiments. We constructed disease-gene networks from 2558 GWAS studies. We applied a systems pharmacology approach to prioritize disease-target genes. Using this bioinformatics approach, we analyzed 14,713 GWAS disease-TCM-target gene pairs and identified 115 disease-gene pairs with q value < 0.2. We validated several of these GWAS disease-TCM-target gene pairs with literature evidence, demonstrating that this computational approach could reveal novel indications for TCM. We also developed the TCM-Disease web application to facilitate traditional Chinese medicine drug repurposing efforts. Systems pharmacology is a promising approach for connecting GWAS diseases with TCM for potential drug repurposing and repositioning. The computational approaches described in this study could easily be extended to other disease-gene network analyses.
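The abstract above filters disease-gene pairs at q value < 0.2 but does not say how the q values were computed. A minimal sketch of one common way to obtain them, assuming the Benjamini-Hochberg adjustment; the function name and the p-values are made up for illustration and are not data from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four disease-gene pairs (illustrative only).
p = [0.001, 0.04, 0.03, 0.5]
q = benjamini_hochberg(p)
# Keep the pairs that pass the threshold quoted in the abstract.
selected = [i for i, qi in enumerate(q) if qi < 0.2]
print(selected)
```

With these made-up inputs the first three pairs pass the threshold and the fourth does not.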
Background: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (histone genes for short), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have similar structure as promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii) identify in this way genes that have high likelihood of being coregulated with the histone genes. Results: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with similar structure as histone promoters. Conclusions: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that
BACKGROUND: The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). METHODOLOGY/PRINCIPAL FINDINGS: The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. CONCLUSION/SIGNIFICANCE: The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between "environmental" strains, the main contributors to the genetic diversity within the subspecies, and "domesticated" strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the "domesticated" strains essentially arose through substantial genomic flux within the dispensable
James F Denton
Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
Because it is suspected that gene content may partly explain host adaptation and ecology of pathogenic bacteria, it is important to study factors affecting genome composition and its evolution. While recent genomic advances have revealed extremely large pan-genomes for some bacterial species, it remains difficult to predict to what extent the gene pool is accessible within or transferable between populations. As genomes bear imprints of the history of the organisms, gene distribution pattern analyses should provide insights into the forces and factors at play in the shaping and maintaining of bacterial genomes. In this study, we revisited the data obtained from a previous CGH microarray analysis in order to assess the genomic plasticity of the R. solanacearum species complex. Gene distribution analyses demonstrated the remarkably dispersed genome of R. solanacearum, with more than half of the genes being accessory. From the reconstruction of the ancestral genome compositions, we were able to infer the number of gene gain and loss events along the phylogeny. Analyses of gene movement patterns reveal that factors associated with gene function, genomic localization and ecology delineate gene flow patterns. While the chromosome displayed lower rates of movement, the megaplasmid was clearly associated with hot-spots of gene gain and loss. Gene function was also confirmed to be an essential factor in gene gain and loss dynamics, with significant differences in movement patterns between different COG categories. Finally, analyses of gene distribution highlighted possible highways of horizontal gene transfer. Due to sampling and design bias, we can only speculate on factors at play in this gene movement dynamic. Further studies examining precise conditions that favor gene transfer would provide invaluable insights into the fate of bacteria, species delineation and the emergence of successful pathogens.
Lin, Chen-Ching; Zhao, Junfei; Jia, Peilin; Li, Wen-Hsiung; Zhao, Zhongming
Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics. PMID:26352260
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Background: GATA transcription factors influence many developmental processes, including the specification of embryonic germ layers. The GATA gene family has significantly expanded in many animal lineages: whereas diverse cnidarians have only one GATA transcription factor, six GATA genes have been identified in many vertebrates, five in many insects, and eleven to thirteen in Caenorhabditis nematodes. All bilaterian animal genomes have at least one member each of two classes, GATA123 and GATA456. Results: We have identified one GATA123 gene and one GATA456 gene from the genomic sequence of two invertebrate deuterostomes, a cephalochordate (Branchiostoma floridae) and a hemichordate (Saccoglossus kowalevskii). We also have confirmed the presence of six GATA genes in all vertebrate genomes, as well as additional GATA genes in teleost fish. Analyses of conserved sequence motifs and of changes to the exon-intron structure, and molecular phylogenetic analyses of these deuterostome GATA genes support their origin from two ancestral deuterostome genes, one GATA123 and one GATA456. Comparison of the conserved genomic organization across vertebrates identified eighteen paralogous gene families linked to multiple vertebrate GATA genes (GATA paralogons), providing the strongest evidence yet for expansion of vertebrate GATA gene families via genome duplication events. Conclusion: From our analysis, we infer the evolutionary birth order and relationships among vertebrate GATA transcription factors, and define their expansion via multiple rounds of whole genome duplication events. As the genomes of four independent invertebrate deuterostome lineages contain single copy GATA123 and GATA456 genes, we infer that the 0R (pre-genome duplication) invertebrate deuterostome ancestor also had two GATA genes, one of each class. Synteny analyses identify duplications of paralogous chromosomal regions (paralogons), from single ancestral vertebrate GATA123 and GATA456
Convergent functional genomics (CFG) is a translational methodology that integrates, in a Bayesian fashion, multiple lines of evidence from studies in humans and animal models to get a better understanding of the genetics of a disease or pathological behavior. Here the integration of data sets that derive from forward genetics in animals and from genetic association studies in humans, including genome-wide association studies (GWAS), is described for addictive behavior. The aim of forward genetics in animals and association studies in humans is to identify mutations (e.g. SNPs) that produce a certain phenotype, i.e. "from phenotype to genotype". The most powerful forward-genetics approach combines quantitative trait loci (QTL) analysis and gene expression profiling in recombinant inbred rodent lines or in animals genetically selected for a specific phenotype, e.g. high vs. low drug consumption. By Bayesian scoring, genomic information from forward genetics in animals is then combined with human GWAS data on a similar addiction-relevant phenotype. This integrative approach generates a robust candidate gene list that has to be functionally validated by means of reverse genetics in animals, i.e. "from genotype to phenotype". It is proposed that studying addiction-relevant phenotypes and endophenotypes by this CFG approach will allow a better determination of the genetics of addictive behavior.
Mycobacterium lepraemurium is the causative agent of murine leprosy, a chronic, granulomatous disease similar to human leprosy. Due to the similar clinical manifestations of human and murine leprosy and the difficulty of growing both bacilli axenically, Mycobacterium leprae and M. lepraemurium were once thought to be closely related, although it was later suggested that M. lepraemurium might be related to Mycobacterium avium. In this study, the complete genome of M. lepraemurium was sequenced using a combination of PacBio and Illumina sequencing. Phylogenomic analyses confirmed that M. lepraemurium is a distinct species within the M. avium complex (MAC). The M. lepraemurium genome is 4.05 Mb in length, which is considerably smaller than other MAC genomes, and it comprises 2,682 functional genes and 1,139 pseudogenes, which indicates that M. lepraemurium has undergone genome reduction. An error-prone repair homologue of the DNA polymerase III α-subunit was found to be nonfunctional in M. lepraemurium, which might contribute to pseudogene formation due to the accumulation of mutations in nonessential genes. M. lepraemurium has retained the functionality of several genes thought to influence virulence among members of the MAC.
Jensen, Majken Karoline; Pers, Tune Hannes; Dworzynski, Piotr
in genes associated with risk of coronary heart disease (CHD). Methods and Results-Genome-wide association analyses of approximately 700 000 single-nucleotide polymorphisms in 899 incident CHD cases and 1823 age- and sex-matched controls within the Nurses' Health and the Health Professionals...... complex. Conclusions-The integration of a GWA study with PPI data successfully identifies a set of candidate susceptibility genes for incident CHD that would have been missed in single-marker GWA analysis. (Circ Cardiovasc Genet. 2011; 4:549-556.)...
Zeng, Huicai; Fan, Dingding; Zhu, Yabin; Feng, Yue; Wang, Guofen; Peng, Chunfang; Jiang, Xuanting; Zhou, Dajie; Ni, Peixiang; Liang, Changcong; Liu, Lei; Wang, Jun; Mao, Chao
Background The asexual fungus Fusarium oxysporum f. sp. cubense (Foc) causing vascular wilt disease is one of the most devastating pathogens of banana (Musa spp.). To understand the molecular underpinning of pathogenicity in Foc, the genomes and transcriptomes of two Foc isolates were sequenced. Methodology/Principal Findings Genome analysis revealed that the genome structures of race 1 and race 4 isolates were highly syntenic with those of F. oxysporum f. sp. lycopersici strain Fol4287. A large number of putative virulence associated genes were identified in both Foc genomes, including genes putatively involved in root attachment, cell degradation, detoxification of toxin, transport, secondary metabolites biosynthesis and signal transductions. Importantly, relative to the Foc race 1 isolate (Foc1), the Foc race 4 isolate (Foc4) has evolved with some expanded gene families of transporters and transcription factors for transport of toxins and nutrients that may facilitate its ability to adapt to host environments and contribute to pathogenicity to banana. Transcriptome analysis disclosed a significant difference in transcriptional responses between Foc1 and Foc4 at 48 h post inoculation to the banana ‘Brazil’ in comparison with the vegetative growth stage. Of particular note, more virulence-associated genes were up regulated in Foc4 than in Foc1. Several signaling pathways like the mitogen-activated protein kinase Fmk1 mediated invasion growth pathway, the FGA1-mediated G protein signaling pathway and a pathogenicity associated two-component system were activated in Foc4 rather than in Foc1. Together, these differences in gene content and transcription response between Foc1 and Foc4 might account for variation in their virulence during infection of the banana variety ‘Brazil’. Conclusions/Significance Foc genome sequences will facilitate us to identify pathogenicity mechanism involved in the banana vascular wilt disease development. These will thus advance
Using reverse genetics technology, many strains of Newcastle disease virus (NDV) have been developed as vectors to express foreign genes for vaccine and gene therapy purposes. The foreign gene is usually inserted into a non-coding region of the NDV genome as an independent transcription unit. Eval...
Suciu, Radu M; Aydin, Emir; Chen, Brian E
With the exponential increase and widespread availability of genomic, transcriptomic, and proteomic data, accessing these '-omics' data is becoming increasingly difficult. The current resources for accessing and analyzing these data have been created to perform highly specific functions intended for specialists, and thus typically emphasize functionality over user experience. We have developed a web-based application, GeneDig.org, that allows any general user access to genomic information with ease and efficiency. GeneDig allows for searching and browsing genes and genomes, while a dynamic navigator displays genomic, RNA, and protein information simultaneously for co-navigation. We demonstrate that our application allows more than five times faster and efficient access to genomic information than any currently available methods. We have developed GeneDig as a platform for bioinformatics integration focused on usability as its central design. This platform will introduce genomic navigation to broader audiences while aiding the bioinformatics analyses performed in everyday biology research.
The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene
Ringman, John M.; Coppola, Giovanni
Purpose of Review: This article discusses the current status of knowledge regarding the genetic basis of Alzheimer disease (AD) with a focus on clinically relevant aspects. Recent Findings: The genetic architecture of AD is complex, as it includes multiple susceptibility genes and likely nongenetic factors. Rare but highly penetrant autosomal dominant mutations explain a small minority of the cases but have allowed tremendous advances in understanding disease pathogenesis. The identification of a strong genetic risk factor, APOE, reshaped the field and introduced the notion of genetic risk for AD. More recently, large-scale genome-wide association studies are adding to the picture a number of common variants with very small effect sizes. Large-scale resequencing studies are expected to identify additional risk factors, including rare susceptibility variants and structural variation. Summary: Genetic assessment is currently of limited utility in clinical practice because of the low frequency (Mendelian mutations) or small effect size (common risk factors) of the currently known susceptibility genes. However, genetic studies are identifying with confidence a number of novel risk genes, and this will further our understanding of disease biology and possibly the identification of therapeutic targets. PMID:23558482
Zhang, Na; Huang, Xing; Bao, Yaning; Wang, Bo; Zeng, Hongxia; Cheng, Weishun; Tang, Mi; Li, Yuhua; Ren, Jian; Sun, Yuhong
The early auxin-responsive SAUR family is an important gene family in auxin signal transduction. Here we present the first genome-wide identification of SAUR genes in the watermelon genome. We identified 65 ClaSAURs and provide a genomic framework for future study of these genes. Phylogenetic analysis revealed a Cucurbitaceae-specific SAUR subfamily and contributes to the understanding of the evolutionary pattern of SAUR genes in plants. Quantitative RT-PCR analysis demonstrated the expression of 11 randomly selected SAUR genes in watermelon tissues. ClaSAUR36 was highly expressed in fruit, and further study of it might bring a new perspective on watermelon fruit development. Moreover, correlation analysis revealed similar expression profiles of SAUR genes between watermelon and Arabidopsis during shoot organogenesis. This work provides new support for conserved auxin machinery in plants.
Interpreting the impact of human genome variation on phenotype is challenging. The functional effect of protein-coding variants is often predicted using sequence conservation and population frequency data; however, other factors are likely relevant. We hypothesized that variants in protein post-translational modification (PTM) sites contribute to phenotype variation and disease. We analyzed the fraction of rare variants and the non-synonymous to synonymous variant ratio (Ka/Ks) in 7,500 human genomes and found a significant negative selection signal in PTM regions independent of six factors, including conservation, codon usage, and GC-content, that is widely distributed across tissue-specific genes and function classes. PTM regions are also enriched in known disease mutations, suggesting that PTM variation is more likely deleterious. PTM constraint also affects flanking sequence around modified residues and increases around clustered sites, indicating the presence of functionally important short linear motifs. Using target site motifs of 124 kinases, we predict at least ∼180,000 motif-breaker amino acid residues that disrupt PTM sites when substituted, and highlight kinase motifs that show specific negative selection and enrichment of disease mutations. We provide this dataset with corresponding hypothesized mechanisms as a community resource. As an example of our integrative approach, we propose that PTPN11 variants in Noonan syndrome aberrantly activate the protein by disrupting an uncharacterized cluster of phosphorylation sites. Further, as PTMs are molecular switches that are modulated by drugs, we study mutated binding sites of PTM enzymes in disease genes and define a drug-disease network containing 413 novel predicted disease-gene links.
Whiteflies are a group of invasive crop pests that impact global agriculture. An analysis was conducted to compare draft genomes of two whitefly strains, which demonstrated the relative conserved gene order, but a number of genes were either novel (added) or omitted (deleted) between genomes. This...
Jirsová, Pavla; Snijders, A.M.; Kwek, S.; Roydasgupta, R.; Fridlyand, J.; Tokuyasu, T.; Pinkel, D.; Albertson, D. G.
Vol. 8, No. 6 (2007), r120 ISSN 1474-760X Institutional research plan: CEZ:AV0Z50040507; CEZ:AV0Z50040702 Keywords: gene amplification * array comparative genomic hybridization * oncogene Subject RIV: BO - Biophysics Impact factor: 6.589, year: 2007
Background: The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. Results: In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. Conclusions: RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher’s specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch’s flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
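As a rough illustration of the region-to-gene association idea described above, the following minimal Python sketch assigns a genomic region to its closest gene and labels where the region falls relative to that gene. It is not RGmatch's implementation: the `Gene` class, the `annotate` function, the area labels (`PROMOTER`, `UPSTREAM`, `GENE_BODY`, `DOWNSTREAM`), the use of the region midpoint, and the 1,000 bp promoter window are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical minimal gene record (not RGmatch's data model).
    name: str
    chrom: str
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def annotate(chrom, start, end, genes, promoter=1000):
    """Return (gene_name, area, distance) for the gene closest to the
    region midpoint on the same chromosome, or None if there is none."""
    mid = (start + end) // 2
    best = None
    for g in genes:
        if g.chrom != chrom:
            continue
        # The transcription start site depends on the strand.
        tss = g.start if g.strand == "+" else g.end
        if g.start <= mid <= g.end:
            area, dist = "GENE_BODY", 0
        else:
            # Signed offset along the direction of transcription;
            # a negative value means the region lies upstream of the TSS.
            offset = (mid - tss) if g.strand == "+" else (tss - mid)
            if offset < 0:
                dist = -offset
                area = "PROMOTER" if dist <= promoter else "UPSTREAM"
            else:
                # Past the gene end: distance is measured from the 3' end.
                dist = (mid - g.end) if g.strand == "+" else (g.start - mid)
                area = "DOWNSTREAM"
        if best is None or dist < best[2]:
            best = (g.name, area, dist)
    return best


genes = [
    Gene("A", "chr1", 1000, 2000, "+"),
    Gene("B", "chr1", 5000, 6000, "-"),
]
print(annotate("chr1", 400, 600, genes))
print(annotate("chr1", 6400, 6600, genes))
```

Keeping the area rules in one small function mirrors the point made in the abstract that annotation rules should be easy to modify; a real tool would also work at the transcript and exon level, which this sketch does not attempt.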
Xu, Jian-zhong; Zhang, Wei-guo
With the availability of the whole genome sequences of Escherichia coli and Corynebacterium glutamicum, strategies for directed DNA manipulation have developed rapidly. DNA manipulation plays an important role in understanding the function of genes and in constructing novel engineered bacteria as required. DNA manipulation involves modifying autologous genes and expressing heterologous genes. Two alternative approaches, electroporation of linear DNA or use of a recombinant suicide plasmid, allow a wide variety of DNA manipulation. However, over-expression of the desired gene is generally plasmid-mediated. The current review summarizes the common strategies used for genetically modifying E. coli and C. glutamicum genomes, and discusses the technical problem of multi-layered DNA manipulation. Strategies for gene over-expression via integration into the genome are proposed. This review is intended to be an accessible introduction to DNA manipulation within the bacterial genome for novices and a source of the latest experimental information for experienced investigators. PMID:26834010
Anand K Ganesan
Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo), neurologic disorders (Parkinson's disease), auditory disorders (Waardenburg's syndrome), and ophthalmologic disorders (age related macular degeneration). Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi) provides the opportunity to derive unbiased comprehensive collections of pharmaceutically tractable single gene targets supporting melanin production. In this study, we have combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed library of chemically synthesized, small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. Isolation of molecular machinery known to support autophagosome biosynthesis from this screen, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype
Similar to other malignancies, urothelial carcinoma (UC) is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21) and BCL2L1 (20q11). We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E2F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common for advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and that these circuits may reflect distinct pathways of tumor development.
Supported by the U.S. Department of Energy (DOE), the first tree genome, that of black cottonwood (Populus trichocarpa), has been completely sequenced and publicly released. This milestone marks the beginning of the post-genome era for forest trees. Identifying and cloning genes underlying important traits is one of the main tasks of post-genome-era tree genomics. Recently, great achievements have been made in cloning genes that control important domestication traits in crops such as rice, tomato, and maize. Molecular breeding has been applied in practical breeding programs for many crops. By contrast, molecular studies in trees are lagging behind. Trees possess characteristics that make them difficult organisms in which to locate and clone genes. With advances in techniques and the fast growth of tree genomic resources, great achievements can be expected in cloning unknown genes from trees, which will facilitate tree improvement programs by means of molecular breeding. In this paper, the author reviews progress in tree genomics and gene cloning studies and discusses future prospects, in order to provide a useful reference for researchers working in this area.
BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Genes Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Genes Database can be freely accessed through the DGD website at http://dgd.genouest.org.
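The per-group co-expression measure mentioned in the methodology can be sketched as follows. This is a minimal illustration only, assuming each gene in a group has an expression vector measured over the same samples; the function names `pearson` and `group_coexpression` and the toy data are hypothetical and are not taken from the DGD.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def group_coexpression(expr):
    """Map every gene pair in one duplicated-gene group to its correlation.

    `expr` maps gene name -> list of expression values (same samples).
    """
    return {
        (g1, g2): pearson(expr[g1], expr[g2])
        for g1, g2 in combinations(sorted(expr), 2)
    }


# Toy expression profiles for a hypothetical group of three genes.
group = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in group_coexpression(group).items():
    print(pair, round(r, 3))
```

A constant expression vector would make the denominator zero; a production version would need to handle that case, which is left out here for brevity.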
Gao, Yu-Han; Guo, Rong-Jun; Li, Shi-Dong
The draft genome of Bacillus velezensis strain B6, a rhizobacterium with good biocontrol performance isolated from soil in China, was sequenced. The assembly comprises 32 scaffolds with a total size of 3.88 Mb. Gene clusters coding either ribosomally encoded bacteriocins or nonribosomally encoded antimicrobial polyketides and lipopeptides in the genome may contribute to plant disease control. Copyright © 2018 Gao et al.
Babu, B Kalyana; Dinesh, Pandey; Agrawal, Pawan K; Sood, S; Chandrashekara, C; Bhatt, Jagadish C; Kumar, Anil
The major limiting factor for production and productivity of finger millet crop is blast disease caused by Magnaporthe grisea. Since, the genome sequence information available in finger millet crop is scarce, comparative genomics plays a very important role in identification of genes/QTLs linked to the blast resistance genes using SSR markers. In the present study, a total of 58 genic SSRs were developed for use in genetic analysis of a global collection of 190 finger millet genotypes. The 58 SSRs yielded ninety five scorable alleles and the polymorphism information content varied from 0.186 to 0.677 at an average of 0.385. The gene diversity was in the range of 0.208 to 0.726 with an average of 0.487. Association mapping for blast resistance was done using 104 SSR markers which identified four QTLs for finger blast and one QTL for neck blast resistance. The genomic marker RM262 and genic marker FMBLEST32 were linked to finger blast disease at a P value of 0.007 and explained phenotypic variance (R²) of 10% and 8% respectively. The genomic marker UGEP81 was associated to finger blast at a P value of 0.009 and explained 7.5% of R². The QTLs for neck blast was associated with the genomic SSR marker UGEP18 at a P value of 0.01, which explained 11% of R². Three QTLs for blast resistance were found common by using both GLM and MLM approaches. The resistant alleles were found to be present mostly in the exotic genotypes. Among the genotypes of NW Himalayan region of India, VHC3997, VHC3996 and VHC3930 were found highly resistant, which may be effectively used as parents for developing blast resistant cultivars in the NW Himalayan region of India. The markers linked to the QTLs for blast resistance in the present study can be further used for cloning of the full length gene, fine mapping and their further use in the marker assisted breeding programmes for introgression of blast resistant alleles into locally adapted cultivars.
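The two marker statistics reported above, gene diversity and polymorphism information content (PIC), can be computed from allele frequencies at a locus. The sketch below uses the standard textbook definitions (gene diversity as expected heterozygosity, and PIC in the form given by Botstein and colleagues); the abstract does not state which formulas or software the authors used, so this is an assumption, and the function names are chosen for illustration.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2) over allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - cross


# A biallelic marker with equal allele frequencies (illustrative values).
alleles = [0.5, 0.5]
print(gene_diversity(alleles))  # 0.5
print(pic(alleles))             # 0.375
```

Because the cross term is never negative, PIC is always at most the gene diversity for the same locus, which is consistent with the averages quoted in the abstract (0.385 versus 0.487).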
Promponas Vasilis J
Background: The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results: GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions: GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating
Hagberg, Emma Elisabeth; Pedersen, Anders Gorm; Larsen, Lars E
Aleutian mink disease virus (AMDV) is a frequently encountered pathogen associated with mink farming. Previous phylogenetic analyses of AMDV have been based on shorter and more conserved parts of the genome, e.g. the partial NS1 gene. Such fragments are suitable for detection but are less useful...... direction of spread. It was however impossible to infer transmission pathways from the partial NS1 gene tree, since all samples from the case farms branched out from a single internal node. A sliding window analysis showed that there were no shorter genomic regions providing the same phylogenetic resolution...
Wu, Jing; Zhu, Jifeng; Wang, Lanfen; Wang, Shumin
Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most important disease resistance genes in plants. The genome sequence of the common bean ( Phaseolus vulgaris L.) provides valuable data for determining the genomic organization of NBS-LRR genes. However, data on the NBS-LRR genes in the common bean are limited. In total, 178 NBS-LRR-type genes and 145 partial genes (with or without a NBS) located on 11 common bean chromosomes were identified from genome sequences database. Furthermore, 30 NBS-LRR genes were classified into Toll/interleukin-1 receptor (TIR)-NBS-LRR (TNL) types, and 148 NBS-LRR genes were classified into coiled-coil (CC)-NBS-LRR (CNL) types. Moreover, the phylogenetic tree supported the division of these PvNBS genes into two obvious groups, TNL types and CNL types. We also built expression profiles of NBS genes in response to anthracnose and common bacterial blight using qRT-PCR. Finally, we detected nine disease resistance loci for anthracnose (ANT) and seven for common bacterial blight (CBB) using the developed NBS-SSR markers. Among these loci, NSSR24, NSSR73, and NSSR265 may be located at new regions for ANT resistance, while NSSR65 and NSSR260 may be located at new regions for CBB resistance. Furthermore, we validated NSSR24, NSSR65, NSSR73, NSSR260, and NSSR265 using a new natural population. Our results provide useful information regarding the function of the NBS-LRR proteins and will accelerate the functional genomics and evolutionary studies of NBS-LRR genes in food legumes. NBS-SSR markers represent a wide-reaching resource for molecular breeding in the common bean and other food legumes. Collectively, our results should be of broad interest to bean scientists and breeders.
Hiscock, D; Upton, C
The Viral Genome DataBase (VGDB) contains detailed information on the genes and predicted protein sequences from 15 completely sequenced genomes of large (>100 kb) viruses (2847 genes). The stored data include DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a MySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .
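Several of the per-gene parameters listed above (A + T%, nucleotide frequency, dinucleotide frequency and codon use) can be derived directly from a DNA string. The following is a minimal sketch of such a calculation, not the VGDB implementation: the function name `sequence_stats`, the use of overlapping dinucleotides, and the choice to read codons in frame 0 are all assumptions made for illustration.

```python
from collections import Counter


def sequence_stats(dna):
    """Return A+T %, nucleotide, dinucleotide and codon counts for a DNA string.

    Hypothetical helper for illustration; not part of VGDB.
    """
    seq = dna.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    nucleotides = Counter(seq)
    at_percent = 100.0 * (nucleotides["A"] + nucleotides["T"]) / len(seq)

    # Overlapping dinucleotides: positions 0-1, 1-2, 2-3, ...
    dinucleotides = Counter(seq[i:i + 2] for i in range(len(seq) - 1))

    # Codons are read in frame 0; trailing bases that do not fill a
    # complete codon are ignored.
    usable = len(seq) - len(seq) % 3
    codons = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    return {
        "at_percent": at_percent,
        "nucleotides": nucleotides,
        "dinucleotides": dinucleotides,
        "codons": codons,
    }


if __name__ == "__main__":
    stats = sequence_stats("ATGAAATAA")
    print(round(stats["at_percent"], 2))
    print(stats["codons"])
```

Counting overlapping dinucleotides is only one common convention; a database could equally count non-overlapping pairs, so results should be compared only between tools that use the same definition.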
Jia, Hongge; Zhang, Yunzeng; Orbović, Vladimir; Xu, Jin; White, Frank F; Jones, Jeffrey B; Wang, Nian
Citrus is a highly valued tree crop worldwide, while, at the same time, citrus production faces many biotic challenges, including bacterial canker and Huanglongbing (HLB). Breeding for disease-resistant varieties is the most efficient and sustainable approach to control plant diseases. Traditional breeding of citrus varieties is challenging due to multiple limitations, including polyploidy, polyembryony, extended juvenility and long crossing cycles. Targeted genome editing technology has the potential to shorten varietal development for some traits, including disease resistance. Here, we used CRISPR/Cas9/sgRNA technology to modify the canker susceptibility gene CsLOB1 in Duncan grapefruit. Six independent lines, D LOB 2, D LOB 3, D LOB 9, D LOB 10, D LOB 11 and D LOB 12, were generated. Targeted next-generation sequencing of the six lines showed the mutation rate was 31.58%, 23.80%, 89.36%, 88.79%, 46.91% and 51.12% for D LOB 2, D LOB 3, D LOB 9, D LOB 10, D LOB 11 and D LOB 12, respectively, of the cells in each line. D LOB 2 and D LOB 3 showed canker symptoms similar to wild-type grapefruit, when inoculated with the pathogen Xanthomonas citri subsp. citri (Xcc). No canker symptoms were observed on D LOB 9, D LOB 10, D LOB 11 and D LOB 12 at 4 days postinoculation (DPI) with Xcc. Pustules caused by Xcc were observed on D LOB 9, D LOB 10, D LOB 11 and D LOB 12 in later stages, which were much reduced compared to that on wild-type grapefruit. The pustules on D LOB 9 and D LOB 10 did not develop into typical canker symptoms. No side effects and off-target mutations were detected in the mutated plants. This study indicates that genome editing using CRISPR technology will provide a promising pathway to generate disease-resistant citrus varieties. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Transfer of mitochondrial genes to the nucleus, and subsequent gain of regulatory elements for expression, is an ongoing evolutionary process in plants. Many examples have been characterized, which in some cases have revealed sources of mitochondrial targeting sequences and cis-regulatory elements. In contrast, there have been no reports of a nuclear gene that has undergone intracellular transfer to the mitochondrial genome and become expressed. Here we show that the orf164 gene in the mitochondrial genome of several Brassicaceae species, including Arabidopsis, is derived from the nuclear ARF17 gene that codes for an auxin responsive protein and is present across flowering plants. Orf164 corresponds to a portion of ARF17, and the nucleotide and amino acid sequences are 79% and 81% identical, respectively. Orf164 is transcribed in several organ types of Arabidopsis thaliana, as detected by RT-PCR. In addition, orf164 is transcribed in five other Brassicaceae within the tribes Camelineae, Erysimeae and Cardamineae, but the gene is not present in Brassica or Raphanus. This study shows that nuclear genes can be transferred to the mitochondrial genome and become expressed, providing a new perspective on the movement of genes between the genomes of subcellular compartments.
Santpere, Gabriel; Darre, Fleur; Blanco, Soledad; Alcami, Antonio; Villoslada, Pablo; Mar Albà, M; Navarro, Arcadi
Most people in the world (∼90%) are infected by the Epstein-Barr virus (EBV), which establishes itself permanently in B cells. Infection by EBV is related to a number of diseases including infectious mononucleosis, multiple sclerosis, and different types of cancer. So far, only seven complete EBV strains have been described, all of them coming from donors presenting EBV-related diseases. To perform a detailed comparative genomic analysis of EBV including, for the first time, EBV strains derived from healthy individuals, we reconstructed EBV sequences infecting lymphoblastoid cell lines (LCLs) from the 1000 Genomes Project. As strain B95-8 was used to transform B cells to obtain LCLs, it is always present, but a specific deletion in its genome sets it apart from natural EBV strains. After studying hundreds of individuals, we determined the presence of natural EBV in at least 10 of them and obtained a set of variants specific to wild-type EBV. By mapping the natural EBV reads into the EBV reference genome (NC007605), we constructed nearly complete wild-type viral genomes from three individuals. Adding them to the five disease-derived EBV genomic sequences available in the literature, we performed an in-depth comparative genomic analysis. We found that latency genes harbor more nucleotide diversity than lytic genes and that six out of nine latency-related genes, as well as other genes involved in viral attachment and entry into host cells, packaging, and the capsid, present the molecular signature of accelerated protein evolution rates, suggesting rapid host-parasite coevolution.
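The comparison of nucleotide diversity between latency and lytic genes can be made concrete with a small calculation. The abstract does not say how diversity was estimated, so the sketch below assumes the common definition: the average proportion of differing sites over all pairs of aligned sequences. The function name `nucleotide_diversity` is hypothetical, and the code assumes equal-length, gap-free alignments.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise proportion of differing sites (often written as pi).

    `aligned` is a list of equal-length sequences from one alignment.
    Assumption: no gap or ambiguity handling; every mismatch counts.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")

    per_pair = []
    for a, b in combinations(aligned, 2):
        differences = sum(1 for x, y in zip(a, b) if x != y)
        per_pair.append(differences / length)

    # Mean over all unordered pairs of sequences.
    return sum(per_pair) / len(per_pair)


if __name__ == "__main__":
    gene_alignment = ["ACGT", "ACGA", "ACGT"]
    print(nucleotide_diversity(gene_alignment))
```

Applied gene by gene to aligned strain sequences, a statistic of this kind gives one value per gene, which is what a latency-versus-lytic comparison would then contrast.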
Inês C Conceição
BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high
Zhang, Yan-Cong; Lin, Kui
Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828
Oryza meyeriana (O. meyeriana), with a GG genome type (2n = 24), has accumulated many valuable traits, including resistance to many diseases such as rice shade and blast, and even immunity to bacterial blight. It is very important to know whether disease-resistance genes exist and are expressed in this wild rice under native conditions. However, limited genomic or transcriptomic data of O. meyeriana are currently available. In this study, we present the first comprehensive characterization of the O. meyeriana transcriptome using RNA-seq and obtained 185,323 contigs with an average length of 1,692 bp and an N50 of 2,391 bp. Differential expression analysis showed that the most tissue-specifically expressed genes were found in roots, followed by stems and leaves. By similarity search against protein databases, 146,450 contigs had at least one significant alignment to existing gene models. Comparison with the Oryza sativa (japonica-type Nipponbare and indica-type 93-11) genomes revealed that 13% of the O. meyeriana contigs had not been detected in O. sativa. Many disease-resistance genes, such as bacterial blight resistant, blast resistant, rust resistant, fusarium resistant, cyst nematode resistant and downy mildew genes, were mined from the transcriptomic database. Two kinds of rice bacterial blight-resistance genes (Xa1 and Xa26) are differentially or specifically expressed in O. meyeriana. The 4 Xa1 contigs were all expressed only in roots, while three of the Xa26 contigs had their highest expression level in leaves, two had their highest expression in stems and one was expressed predominantly in roots. The transcriptomic database of O. meyeriana has been constructed and many disease-resistance genes were found to be expressed under native conditions, which provides a foundation for future discovery of a number of novel genes and provides a basis for studying the molecular mechanisms associated with disease
Katharine J Sepp
While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, 
Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Using DNA markers and genome organization, several important disease resistance genes have been analyzed in mungbean (Vigna radiata), cowpea (Vigna unguiculata), common bean (Phaseolus vulgaris), and soybean (Glycine max). In the process, medium-density linkage maps consisting of restriction fragment length polymorphism (RFLP) markers were constructed for both mungbean and cowpea. Comparisons between these maps, as well as the maps of soybean and common bean, indicate that there is significant conservation of DNA marker order, though the conserved blocks in soybean are much shorter than in the others. DNA mapping results also indicate that a gene for seed weight may be conserved between mungbean and cowpea. Using the linkage maps, genes that control bruchid (genus Callosobruchus) and powdery mildew (Erysiphe polygoni) resistance in mungbean, aphid resistance in cowpea (Aphis craccivora), and cyst nematode (Heterodera glycines) resistance in soybean have all been mapped and characterized. For some of these traits resistance was found to be oligogenic and DNA mapping uncovered multiple genes involved in the phenotype. (author)
Qiu, Huan; Lee, Jun Mo; Yoon, Hwan Su; Bhattacharya, Debashish
Red algae (Rhodophyta) putatively diverged from the eukaryote tree of life >1.2 billion years ago and are the source of plastids in the ecologically important diatoms, haptophytes, and dinoflagellates. In general, red algae contain the largest plastid gene inventory among all such organelles derived from primary, secondary, or additional rounds of endosymbiosis. In contrast, their nuclear gene inventory is reduced when compared to their putative sister lineage, the Viridiplantae, and other photosynthetic lineages. The latter is thought to have resulted from a phase of genome reduction that occurred in the stem lineage of Rhodophyta. A recent comparative analysis of a taxonomically broad collection of red algal and Viridiplantae plastid genomes demonstrates that the red algal ancestor encoded ~1.5× more plastid genes than Viridiplantae. This difference is primarily explained by more extensive endosymbiotic gene transfer (EGT) in the stem lineage of Viridiplantae, when compared to red algae. We postulate that limited EGT in Rhodophytes resulted from the countervailing force of ancient, and likely recurrent, nuclear genome reduction. In other words, the propensity for nuclear gene loss led to the retention of red algal plastid genes that would otherwise have undergone intracellular gene transfer to the nucleus. This hypothesis recognizes the primacy of nuclear genome evolution over that of plastids, which have no inherent control of their gene inventory and can change dramatically (e.g., secondarily non-photosynthetic eukaryotes, dinoflagellates) in response to selection acting on the host lineage. © 2017 Phycological Society of America.
William A. Toscano, Jr.
Approximately 100,000 different environmental chemicals that are in use as high production volume chemicals confront us in our daily lives. Many of the chemicals we encounter are persistent and have long half-lives in the environment and our bodies. These compounds are referred to as Persistent Organic Pollutants, or POPs. The total environment, however, is broader than just toxic pollutants. It includes social capital, socioeconomic status, and other factors that are not commonly considered in traditional approaches to studying environment-human interactions. The mechanism of action of environmental agents in altering the human phenotype from health to disease is more complex than once thought. The focus in public health has shifted away from the study of single-gene rare diseases and has given way to the study of multifactorial complex diseases that are common in the population. To understand common complex diseases, we need teams of scientists from different fields working together with common aims. We review some approaches for studying the action of the environment by discussing use-inspired research and transdisciplinary research approaches. The genomic era has yielded new tools for the study of gene-environment interactions, including genomics, epigenomics, and systems biology. We use environmentally driven diabetes mellitus type two as an example of environmental epigenomics and disease. The aim of this review is to start the conversation about how advances in biomedical science can be applied to advance public health.
In the history of infectious diseases, the plague stands out as one of the most devastating for human beings. Far too often considered an ancient disease, the plague has resurged so frequently that it is now considered a reemerging disease in Madagascar, Algeria, Libya and Congo. The genetic factors associated with the pathogenicity of Yersinia pestis, the causative agent of the plague, involve the acquisition of the pPCP1 plasmid that promotes host invasion through the expression of the virulence factor Pla. The surveillance of plague foci after the 2003 outbreak in Algeria resulted in a positive detection of the specific pla gene of Y. pestis in rodents. However, the phenotypic characterization of the isolate identified a Citrobacter koseri. The comparative genomics of our sequenced C. koseri URMITE genome revealed a mosaic gene structure resulting from the lifestyle of our isolate and provided evidence for gene exchanges with different enteric bacteria. The most striking was the acquisition of a continuous 2 kb genomic fragment containing the virulence factor Pla of the Y. pestis pPCP1 plasmid; however, the subcutaneous injection of the CKU strain in mice did not produce any pathogenic effect. Our findings demonstrate that fast molecular detection of plague using solely the pla gene is unsuitable and should rather require Y. pestis gene marker combinations. We also suggest that the evolutionary force that might govern the expression of pathogenicity can occur through the acquisition of virulence genes but could also require the loss or the inactivation of resident genes such as antivirulence genes.
Schnable, James C.; Freeling, Michael; Lyons, Eric
The grasses, Poaceae, are one of the largest and most successful angiosperm families. Like many radiations of flowering plants, the divergence of the major grass lineages was preceded by a whole-genome duplication (WGD), although these events are not rare for flowering plants. By combining identification of syntenic gene blocks with measures of gene pair divergence and different frequencies of ancient gene loss, we have separated the two subgenomes present in modern grasses. Reciprocal loss of duplicated genes or genomic regions has been hypothesized to reproductively isolate populations and, thus, speciation. However, in contrast to previous studies in yeast and teleost fishes, we found very little evidence of reciprocal loss of homeologous genes between the grasses, suggesting that post-WGD gene loss may not be the cause of the grass radiation. The sets of homeologous and orthologous genes and predicted locations of deleted genes identified in this study, as well as links to the CoGe comparative genomics web platform for analyzing pan-grass syntenic regions, are provided along with this paper as a resource for the grass genetics community. PMID:22275519
Hwang, In Sun; Oh, Eom-Ji; Kim, Donghyuk; Oh, Chang-Sik
Clavibacter michiganensis ssp. capsici is a Gram-positive plant-pathogenic bacterium causing bacterial canker disease in pepper. Virulence genes and mechanisms of C. michiganensis ssp. capsici in pepper have not yet been studied. To identify virulence genes of C. michiganensis ssp. capsici, comparative genome analyses with C. michiganensis ssp. capsici and its related C. michiganensis subspecies, and functional analysis of its putative virulence genes during infection were performed. The C. michiganensis ssp. capsici type strain PF008 carries one chromosome (3.056 Mb) and two plasmids (39 kb pCM1 Cmc and 145 kb pCM2 Cmc ). The genome analyses showed that this bacterium lacks a chromosomal pathogenicity island and celA gene that are important for disease development by C. michiganensis ssp. michiganensis in tomato, but carries most putative virulence genes in both plasmids. Virulence of pCM1 Cmc -cured C. michiganensis ssp. capsici was greatly reduced compared with the wild-type strain in pepper. The complementation analysis with pCM1 Cmc -located putative virulence genes showed that at least five genes, chpE, chpG, ppaA1, ppaB1 and pelA1, encoding serine proteases or pectate lyase contribute to disease development in pepper. In conclusion, C. michiganensis ssp. capsici has a unique genome structure, and its multiple plasmid-borne genes play critical roles in virulence in pepper, either separately or together. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V
Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...
Skovgaard, M; Jensen, L J; Brunak, S
In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.
Watson, Corey T; Roussos, Panos; Garg, Paras; Ho, Daniel J; Azam, Nidha; Katsel, Pavel L; Haroutunian, Vahram; Sharp, Andrew J
Alzheimer's disease affects ~13% of people in the United States 65 years and older, making it the most common neurodegenerative disorder. Recent work has identified roles for environmental, genetic, and epigenetic factors in Alzheimer's disease risk. We performed a genome-wide screen of DNA methylation using the Illumina Infinium HumanMethylation450 platform on bulk tissue samples from the superior temporal gyrus of patients with Alzheimer's disease and non-demented controls. We paired a sliding window approach with multivariate linear regression to characterize Alzheimer's disease-associated differentially methylated regions (DMRs). We identified 479 DMRs exhibiting a strong bias for hypermethylated changes, a subset of which were independently associated with aging. DMR intervals overlapped 475 RefSeq genes enriched for gene ontology categories with relevant roles in neuron function and development, as well as cellular metabolism, and included genes reported in Alzheimer's disease genome-wide and epigenome-wide association studies. DMRs were enriched for brain-specific histone signatures and for binding motifs of transcription factors with roles in the brain and Alzheimer's disease pathology. Notably, hypermethylated DMRs preferentially overlapped poised promoter regions, marked by H3K27me3 and H3K4me3, previously shown to co-localize with aging-associated hypermethylation. Finally, the integration of DMR-associated single nucleotide polymorphisms with Alzheimer's disease genome-wide association study risk loci and brain expression quantitative trait loci highlights multiple potential DMRs of interest for further functional analysis. We have characterized changes in DNA methylation in the superior temporal gyrus of patients with Alzheimer's disease, highlighting novel loci that facilitate better characterization of pathways and mechanisms underlying Alzheimer's disease pathogenesis, and improve our understanding of epigenetic signatures that may contribute to the
Shao, Mingfu; Moret, Bernard M E
A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of their higher-level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although last year we provided a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignments, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.
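As a rough illustration of the breakpoint distance mentioned in this abstract, the sketch below counts nonconserved adjacencies for the easy case only: two genomes with the same genes and no duplicates. It assumes a single linear chromosome written as a list of signed integers and ignores chromosome ends; the function names are hypothetical, and this is not the integer-linear program the authors describe for genomes with duplicate genes.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a signed, linear gene order."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) read forward is the same adjacency as (-b, -a) read on the
        # opposite strand, so keep one canonical form of the pair.
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are not conserved in g2.

    Assumes both genomes contain the same genes, each exactly once.
    """
    genes1 = sorted(abs(g) for g in g1)
    genes2 = sorted(abs(g) for g in g2)
    if genes1 != genes2 or len(set(genes1)) != len(genes1):
        raise ValueError("genomes must share the same duplicate-free gene set")
    return len(adjacencies(g1) - adjacencies(g2))


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted
    print(breakpoint_distance(reference, inverted))
```

With duplicate genes the adjacency sets depend on which copies are matched to each other, which is where the exemplar, maximum matching and any matching formulations come in.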
Yang, Yanmei; Wang, Jinpeng; Di, Jianyong
Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate the soybean genome for its economic and scientific value. Polyploidy is a widespread and recurrent phenomenon during plant evolution, and it can generate large numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, genes duplicated by whole-genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that whole-genome duplication occurred more than once in the genome evolution of soybean, and that the duplicated genes are often distributed near the ends of chromosomes.
Mondego, Jorge M C; Carazzolle, Marcelo F; Costa, Gustavo G L; Formighieri, Eduardo F; Parizzi, Lucas P; Rincones, Johana; Cotomacci, Carolina; Carraro, Dirce M; Cunha, Anderson F; Carrer, Helaine; Vidal, Ramon O; Estrela, Raíssa C; García, Odalys; Thomazella, Daniela P T; de Oliveira, Bruno V; Pires, Acássia Bl; Rio, Maria Carolina S; Araújo, Marcos Renato R; de Moraes, Marcos H; Castro, Luis A B; Gramacho, Karina P; Gonçalves, Marilda S; Neto, José P Moura; Neto, Aristóteles Góes; Barbosa, Luciana V; Guiltinan, Mark J; Bailey, Bryan A; Meinhardt, Lyndel W; Cascardo, Julio Cm; Pereira, Gonçalo A G
The basidiomycete fungus Moniliophthora perniciosa is the causal agent of Witches' Broom Disease (WBD) in cacao (Theobroma cacao). It is a hemibiotrophic pathogen that colonizes the apoplast of cacao's meristematic tissues as a biotrophic pathogen, switching to a saprotrophic lifestyle during later stages of infection. M. perniciosa and the related species M. roreri are pathogens of aerial parts of the plant, an uncommon characteristic in the order Agaricales. A genome survey (1.9x coverage) of M. perniciosa was analyzed to evaluate the overall gene content of this phytopathogen. Genes encoding proteins involved in retrotransposition, reactive oxygen species (ROS) resistance, drug efflux transport and cell wall degradation were identified. The large number of genes encoding cytochrome P450 monooxygenases (1.15% of gene models) indicates that M. perniciosa has great potential for detoxification and for production of toxins and hormones, which may confer a high adaptive ability on the fungus. We have also discovered new genes encoding putative secreted polypeptides rich in cysteine, as well as genes related to methylotrophy and plant hormone biosynthesis (gibberellin and auxin). Analysis of gene families indicated that M. perniciosa has similar numbers of carboxylesterases and similar repertoires of plant cell wall degrading enzymes to other hemibiotrophic fungi. In addition, an approach for normalization of gene family data using incomplete genome data was developed and applied to the M. perniciosa genome survey. This genome survey gives an overview of the M. perniciosa genome, and reveals that a significant portion is involved in stress adaptation and plant necrosis, two necessary characteristics for a hemibiotrophic fungus to fulfill its infection cycle. Our analysis provides new evidence revealing potential adaptive traits that may play major roles in the mechanisms of pathogenicity in the M. perniciosa/cacao pathosystem.
Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong
Functional disruptions of susceptibility genes by large germline structural variant (SV) deletions are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated from blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early-onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of the BRCA1 (approximately 80 kb, in two patients) and TP53 (∼1.6 kb, in two patients) genes, and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C (one patient) genes. We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes, and the identification of such SV deletions may improve clinical testing.
Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing.
Myelodysplastic syndromes (MDS) are characterized by clonal proliferation of hematopoietic stem/progenitor cells and their apoptosis, and show a propensity to progress to acute myelogenous leukemia (AML). Although MDS are recognized as neoplastic diseases caused by genomic aberrations of hematopoietic cells, the details of the genetic abnormalities underlying disease development have not as yet been fully elucidated due to difficulties in analyzing chromosomal abnormalities. Recent advances in comprehensive analyses of disease genomes including whole-genome sequencing technologies have revealed the genomic abnormalities in MDS. Surprisingly, gene mutations were found in approximately 80-90% of cases with MDS, and the novel mutations discovered with these technologies included previously unknown, MDS-specific, mutations such as those of the genes in the RNA-splicing machinery. It is anticipated that these recent studies will shed new light on the pathophysiology of MDS due to genomic aberrations.
Whole genome duplication (WGD) and tandem duplication (TD) are both important modes of gene expansion. However, how whole genome duplication influences tandemly duplicated genes is not well studied. We used Brassica rapa, which has undergone an additional genome triplication (WGT) and shares a common ancestor with Arabidopsis thaliana, Arabidopsis lyrata and Thellungiella parvula, to investigate the impact of genome triplication on tandem gene evolution. We identified 2,137, 1,569, 1,751 and 1,135 tandem gene arrays in B. rapa, A. thaliana, A. lyrata and T. parvula respectively. Among them, 414 conserved tandem arrays are shared by the three species without WGT, and these were also considered to have existed in the diploid ancestor of B. rapa. Thus, after genome triplication, B. rapa should have 1,242 tandem arrays according to the 414 conserved tandems. Here, we found that 400 out of the 414 tandems had at least one syntenic ortholog in the genome of B. rapa. Furthermore, 294 out of the 400 shared syntenic orthologs maintain tandem arrays (more than one gene for each syntenic hit in B. rapa). For the 294 tandem arrays, we obtained 426 copies of syntenic paralogous tandems in the triplicated genome of B. rapa. In this study, we demonstrated that tandem arrays in B. rapa were dramatically fractionated after WGT when compared either to non-tandem genes in the B. rapa genome or to the tandem arrays in closely related species that have not experienced a recent whole-genome polyploidization event.
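The counts reported in this abstract can be tied together with a little arithmetic. The sketch below recomputes the expected number of tandem arrays after triplication (3 x 414) and a few ratios derived from the reported figures. The variable and function names are hypothetical, and reading 426 / 1,242 as an overall "retained fraction" is an assumption made here for illustration, not a statistic stated by the authors.

```python
# Counts taken from the abstract above.
CONSERVED_ARRAYS = 414       # tandem arrays shared by the three species without WGT
WITH_SYNTENIC_ORTHOLOG = 400  # of the 414, at least one syntenic ortholog in B. rapa
STILL_TANDEM = 294            # of the 400, still organised as a tandem array
OBSERVED_COPIES = 426         # syntenic paralogous tandem copies found in B. rapa
SUBGENOMES = 3                # genome triplication


def retention_summary():
    """Recompute the expectation and simple ratios from the reported counts."""
    expected = CONSERVED_ARRAYS * SUBGENOMES
    return {
        "expected_arrays": expected,
        "ancestral_arrays_with_ortholog": WITH_SYNTENIC_ORTHOLOG / CONSERVED_ARRAYS,
        "ancestral_arrays_still_tandem": STILL_TANDEM / CONSERVED_ARRAYS,
        # Assumption: observed tandem copies compared with the 3x expectation.
        "copies_vs_expectation": OBSERVED_COPIES / expected,
    }


if __name__ == "__main__":
    for name, value in retention_summary().items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under that reading, roughly a third of the tandem copies expected from a perfect triplication remain, which is consistent with the abstract's description of strong fractionation.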
Sox domain containing genes are important metazoan transcriptional regulators implicated in a wide range of developmental processes. The vertebrate B subgroup contains the Sox1, Sox2 and Sox3 genes that have early functions in neural development. Previous studies show that Drosophila Group B genes have been functionally conserved since they play essential roles in early neural specification and mutations in the Drosophila Dichaete and SoxN genes can be rescued with mammalian Sox genes. Despite their importance, the extent and organisation of the Group B family in Drosophila has not been fully characterised, an important step in using Drosophila to examine conserved aspects of Group B Sox gene function. We have used directed cDNA sequencing along with the output from the publicly available genome sequencing projects to examine the structure of Group B Sox domain genes in Drosophila melanogaster, Drosophila pseudoobscura, Anopheles gambiae and Apis mellifera. All of the insect genomes contain four genes encoding Group B proteins, two of which are intronless, as is the case with vertebrate group B genes. As has been previously reported and unusually for Group B genes, two of the insect group B genes, Sox21a and Sox21b, contain introns within their DNA-binding domains. We find that the highly unusual multi-exon structure of the Sox21b gene is common to the insects. In addition, we find that three of the group B Sox genes are organised in a linked cluster in the insect genomes. By in situ hybridisation we show that the pattern of expression of each of the four group B genes during embryogenesis is conserved between D. melanogaster and D. pseudoobscura. The DNA-binding domain sequences and genomic organisation of the group B genes have been conserved over 300 My of evolution since the last common ancestor of the Hymenoptera and the Diptera. Our analysis suggests insects have two Group B1 genes, SoxN and
Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J
Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and to a predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison with the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
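The reverse database searching mentioned in this abstract is commonly used to estimate a false discovery rate: spectra are also searched against reversed (decoy) sequences, and decoy hits above a score threshold approximate the number of false target hits. The sketch below shows that general idea on made-up scores. It assumes the simple decoys / targets estimator and hypothetical function names, and it does not reproduce the Average Peptide Scoring procedure used by the authors.

```python
def estimate_fdr(matches, threshold):
    """Estimate FDR at a score threshold from target and decoy matches.

    `matches` is an iterable of (score, is_decoy) pairs. The estimator used
    here is decoys / targets among matches scoring at or above the threshold.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def lowest_threshold_at_fdr(matches, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    passing = [
        score
        for score in sorted({score for score, _ in matches})
        if estimate_fdr(matches, score) <= max_fdr
    ]
    return passing[0] if passing else None


if __name__ == "__main__":
    # Toy data: (score, is_decoy). Values are illustrative only.
    toy = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
    print(estimate_fdr(toy, 6))
    print(lowest_threshold_at_fdr(toy, 0.05))
```

Other estimators exist (for example, doubling the decoy count when targets and decoys are searched in one concatenated database), so the choice above should be treated as one convention among several.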
Correlations of genetic variation in DNA with functional brain activity have already provided a starting point for delving into human cognitive mechanisms. However, these analyses do not provide the specific genes driving the associations, which are complicated by intergenic localization as well as tissue-specific epigenetics and expression. The use of brain-derived expression datasets could build upon the foundation of these initial genetic insights and yield genes and molecular pathways for testing new hypotheses regarding the molecular bases of human brain development, cognition, and disease. Thus, coupling these human brain gene expression data with measurements of brain activity may provide genes with critical roles in brain function. However, these brain gene expression datasets have their own set of caveats, most notably a reliance on postmortem tissue. In this perspective, I summarize and examine the progress that has been made in this realm to date, and discuss the various frontiers remaining, such as the inclusion of cell-type-specific information, additional physiological measurements, and genomic data from patient cohorts.
Butterworth, A.S.; Braund, P.S.; Hardwick, R.J.; Saleheen, D.; Peden, J.F.; Soranzo, N.; Chambers, J.C.; Kleber, M.E.; Keating, B.; Qasim, A.; Klopp, N.; Erdmann, J.; Basart, H.; Baumert, J.H.; Bezzina, C.R.; Boehm, B.O.; Brocheton, J.; Bugert, P.; Cambien, F.; Collins, R.; Couper, D.; Jong, J.S. de; Diemert, P.; Ejebe, K.; Elbers, C.C.; Elliott, P.; Fornage, M.; Frossard, P.; Garner, S.; Hunt, S.E.; Kastelein, J.J.; Klungel, O.H.; Kluter, H.; Koch, K.; Konig, I.R.; Kooner, A.S.; Liu, K.; McPherson, R.; Musameh, M.D.; Musani, S.; Papanicolaou, G.; Peters, A.; Peters, B.J.; Potter, S.; Psaty, B.M.; Rasheed, A.; Scott, J.; Seedorf, U.; Sehmi, J.S.; Sotoodehnia, N.; Stark, K.; Stephens, J.; Schoot, C.E. van der; Schouw, Y.T. van der; Harst, P. van der; Vasan, R.S.; Wilde, A.A.; Willenborg, C.; Winkelmann, B.R.; Zaidi, M.; Zhang, W.; Ziegler, A.; Koenig, W.; Matz, W.; Trip, M.D.; Reilly, M.P.; Kathiresan, S.; Schunkert, H.; Hamsten, A.; Hall, A.S.; Kooner, J.S.; Thompson, S.G.; Thompson, J.R.; Watkins, H.; Danesh, J.; Barnes, T.; Rafelt, S.; Codd, V.; Bruinsma, N.; Dekker, L.R.; Henriques, J.P.; Koch, K.T.; Winter, R.J. de; Alings, M.; Allaart, C.F.; Gorgels, A.P.; Verheugt, F.W.A.; Mueller, M.; Meisinger, C.; DerOhannessian, S.; Mehta, N.N.; Ferguson, J.; Hakonarson, H.; Matthai, W.; Wilensky, R.; Hopewell, J.C.; Parish, S.; Linksted, P.; Notman, J.; Gonzalez, H.; Young, A.; Ostley, T.; Munday, A.; Goodwin, N.; Verdon, V.; Shah, S.; Edwards, C.; Mathews, C.; Gunter, R.; Benham, J.; Davies, C.; Cobb, M.; Cobb, L.; Crowther, J.; Richards, A.; Silver, M.; Tochlin, S.; Mozley, S.; Clark, S.; Radley, M.; Kourellias, K.; Olsson, P.; Barlera, S.; Tognoni, G.; Rust, S.; Assmann, G.; Heath, S.; Zelenika, D.; Gut, I.; Green, F.; Farrall, M.; Goel, A.; Ongen, H.; Franzosi, M.G.; Lathrop, M.; Clarke, R.; Aly, A.; Anner, K.; Bjorklund, K.; Blomgren, G.; Cederschiold, B.; Danell-Toverud, K.; Eriksson, P.; Grundstedt, U.; Heinonen, M.; Hellenius, M.L.; Hooft, F. van 't; Husman, K.; Lagercrantz, J.; Larsson, A.; Larsson, M.; Mossfeldt, M.; Malarstig, A.; Olsson, G.; Sabater-Lleal, M.; Sennblad, B.; Silveira, A.; Strawbridge, R.; Soderholm, B.; Ohrvik, J.; Zaman, K.S.; Mallick, N.H.; Azhar, M.; Samad, A.; Ishaq, M.; Shah, N.; Samuel, M.; Kathiresan, S.C.; Assimes, T.L.; Holm, H.; Preuss, M.; Stewart, A.F.; Barbalic, M.; Gieger, C.; Absher, D.; Aherrahrou, Z.; Allayee, H.; Altshuler, D.; Anand, S.; Andersen, K.; Anderson, J.L.; Ardissino, D.; Ball, S.G.; Balmforth, A.J.; Barnes, T.A.; Becker, L.C.; Becker, D.M.; Berger, K.; Bis, J.C.; Boekholdt, S.M.; Boerwinkle, E.; Brown, M.J.; Burnett, M.S.; Buysschaert, I.; Carlquist, J.F.; Chen, L.; Davies, R.W.; Dedoussis, G.; Dehghan, A.; Demissie, S.; Devaney, J.; Do, R.; Doering, A.; El Mokhtari, N.E.; Ellis, S.G.; Elosua, R.; Engert, J.C.; Epstein, S.; Faire, U. de; Fischer, M.; Folsom, A.R.; Freyer, J.; Gigante, B.; Girelli, D.; Gretarsdottir, S.; Gudnason, V.; Gulcher, J.R.; Tennstedt, S.; Halperin, E.; Hammond, N.; Hazen, S.L.; Hofman, A.; Horne, B.D.; Illig, T.; Iribarren, C.; Jones, G.T.; Jukema, J.W.; Kaiser, M.A.; Kaplan, L.M.; Khaw, K.T.; Knowles, J.W.; Kolovou, G.; Kong, A.; Laaksonen, R.; Lambrechts, D.; Leander, K.; Li, M.; Lieb, W.; Lettre, G.; Loley, C.; Lotery, A.J.; Mannucci, P.M.; Martinelli, N.; McKeown, P.P.; Meitinger, T.; Melander, O.; Merlini, P.A.; Mooser, V.; Morgan, T.; Muhleisen, T.W.; Muhlestein, J.B.; Musunuru, K.; Nahrstaedt, J.; Nothen, Markus; Olivieri, O.; Peyvandi, F.; Patel, R.S.; Patterson, C.C.; Qu, L.; Quyyumi, A.A.; Rader, D.J.; Rallidis, L.S.; Rice, C.; Roosendaal, F.R.; Rubin, D.; Salomaa, V.; Sampietro, M.L.; Sandhu, M.S.; Schadt, E.; Schafer, A.; Schillert, A.; Schreiber, S.; Schrezenmeir, J.; Schwartz, S.M.; Siscovick, D.S.; Sivananthan, M.; Sivapalaratnam, S.; Smith, A.V.; Smith, T.B.; Snoep, J.D.; Spertus, J.A.; Stefansson, K.; Stirrups, K.; Stoll, M.; Tang, W.H.; Thorgeirsson, G.; Thorleifsson, G.; Tomaszewski, M.; Uitterlinden, A.G.; Rij, A.M. van; Voight, B.F.; Wareham, N.J.; AWells, G.; Wichmann, H.E.; Witteman, J.C.; Wright, B.J.; Ye, S.; Cupples, L.A.; Quertermous, T.; Marz, W.; Blankenberg, S.; Thorsteinsdottir, U.; Roberts, R.; O'Donnell, C.J.; Onland-Moret, N.C.; Setten, J. van; Bakker, P.I. de; Verschuren, W.M.; Boer, J.M.; Wijmenga, C.; Hofker, M.H.; Maitland-van der Zee, A.H.; Boer, A. de; Grobbee, D.E.; Attwood, T.; Belz, S.; Cooper, J.; Crisp-Hihn, A.; Deloukas, P.; Foad, N.; Goodall, A.H.; Gracey, J.; Gray, E.; Gwilliams, R.; Heimerl, S.; Hengstenberg, C.; Jolley, J.; Krishnan, U.; Lloyd-Jones, H.; Lugauer, I.; Lundmark, P.; Maouche, S.; Moore, J.S.; Muir, D.; Murray, E.; Nelson, C.P.; Neudert, J.; Niblett, D.; O'Leary, K.; Ouwehand, W.H.; Pollard, H.; Rankin, A.; Rice, C.M.; Sager, H.; Samani, N.J.; Sambrook, J.; Schmitz, G.; Scholz, M.; Schroeder, L.; Syvannen, A.C.; Wallace, C.
Coronary artery disease (CAD) has a significant genetic contribution that is incompletely characterized. To complement genome-wide association (GWA) studies, we conducted a large and systematic candidate gene study of CAD susceptibility, including analysis of many uncommon and functional variants.
Rossin, Elizabeth J.; Lage, Kasper; Raychaudhuri, Soumya; Xavier, Ramnik J.; Tatar, Diana; Benita, Yair
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in
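To make the idea of testing whether a network is "more densely connected than chance expectation" concrete, the sketch below runs a very simple permutation test: it counts interactions among a gene set and compares that count with randomly drawn gene sets of the same size. This is a simplified stand-in under stated assumptions (uniform random gene sets, a toy interaction network, hypothetical function names); the authors describe multiple permutation approaches, which are not reproduced here.

```python
import random
from itertools import combinations


def internal_edges(genes, edges):
    """Count interactions whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for edge in edges if edge <= gene_set)


def permutation_p_value(genes, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value for the observed number of internal interactions.

    Random gene sets of the same size are drawn uniformly from `all_genes`.
    """
    rng = random.Random(seed)
    genes = list(genes)
    observed = internal_edges(genes, edges)
    pool = sorted(all_genes)
    hits = sum(
        internal_edges(rng.sample(pool, len(genes)), edges) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


def toy_network(n_genes=20, clique_size=5):
    """Toy interaction network: a ring of genes plus one fully connected group."""
    ring = {frozenset((i, (i + 1) % n_genes)) for i in range(n_genes)}
    clique = {frozenset(pair) for pair in combinations(range(clique_size), 2)}
    return ring | clique


if __name__ == "__main__":
    edges = toy_network()
    observed, p = permutation_p_value(range(5), range(20), edges)
    print(observed, p)
```

A uniform draw ignores the fact that some proteins have many more known interactions than others, so a more careful analysis would preserve properties such as node degree when permuting.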
Celiac disease is a common autoimmune disorder characterized by an intestinal inflammation triggered by gluten, a storage protein found in wheat, rye and barley. Similar to other autoimmune diseases such as type 1 diabetes, psoriasis and rheumatoid arthritis, celiac disease is the result of an immune response to self-antigens leading to tissue destruction and production of autoantibodies. Common diseases like celiac disease have a complex pattern of inheritance with inputs from both environmental as well as additive and non-additive genetic factors. In the past few years, Genome Wide Association Studies (GWAS) have been successful in finding genetic risk variants behind many common diseases and traits. To complement and add to the previous findings, we performed a GWAS including 206 trios from 97 nuclear Swedish and Norwegian families affected with celiac disease. By stratifying for HLA-DQ, we identified a new genome-wide significant risk locus covering the DUSP10 gene. To further investigate the associations from the GWAS we performed pathway analyses and two-locus interaction analyses. These analyses showed an over-representation of genes involved in type 2 diabetes and identified a set of candidate mechanisms and genes of which some were selected for mRNA expression analysis using small intestinal biopsies from 98 patients. Several genes were expressed differently in the small intestinal mucosa from patients with celiac autoimmunity compared to intestinal mucosa from control patients. From top-scoring regions we identified susceptibility genes in several categories: (1) polarity and epithelial cell functionality; (2) intestinal smooth muscle; (3) growth and energy homeostasis, including proline and glutamine metabolism; and finally (4) innate and adaptive immune system. These genes and pathways, including specific functions of DUSP10, together reveal a new potential biological mechanism that could influence the genesis of celiac disease, and possibly
Mollenhauer, J; Holmskov, U; Wiemann, S
Increasing evidence has accumulated for an involvement of the inactivation of tumour suppressor genes at chromosome 10q in the carcinogenesis of brain tumours, melanomas, and carcinomas of the lung, the prostate, the pancreas, and the endometrium. The gene DMBT1 (Deleted in Malignant Brain Tumours...... 1) is located at chromosome 10q25.3-q26.1, within one of the putative intervals for tumour suppressor genes. DMBT1 is a member of the scavenger-receptor cysteine-rich (SRCR) superfamily and displays homozygous deletions or lack of expression in glioblastoma multiforme, medulloblastoma......, and in gastrointestinal and lung cancers. Based on these properties, DMBT1 has been proposed to be a candidate tumour suppressor gene. We have determined the genomic sequence of DMBT1 to allow analyses of mutations. The gene has at least 54 exons that span a genomic region of about 80 kb. We have identified a putative...
Li, Xueyan; Fan, Dingding; Zhang, Wei; Liu, Guichun; Zhang, Lu; Zhao, Li; Fang, Xiaodong; Chen, Lei; Dong, Yang; Chen, Yuan; Ding, Yun; Zhao, Ruoping; Feng, Mingji; Zhu, Yabing; Feng, Yue; Jiang, Xuanting; Zhu, Deying; Xiang, Hui; Feng, Xikan; Li, Shuaicheng; Wang, Jun; Zhang, Guojie; Kronforst, Marcus R.; Wang, Wen
Butterflies are exceptionally diverse but their potential as an experimental system has been limited by the difficulty of deciphering heterozygous genomes and a lack of genetic manipulation technology. Here we use a hybrid assembly approach to construct high-quality reference genomes for Papilio xuthus (contig and scaffold N50: 492 kb, 3.4 Mb) and Papilio machaon (contig and scaffold N50: 81 kb, 1.15 Mb), highly heterozygous species that differ in host plant affiliations, and adult and larval colour patterns. Integrating comparative genomics and analyses of gene expression yields multiple insights into butterfly evolution, including potential roles of specific genes in recent diversification. To test gene function, we develop an efficient (up to 92.5%) CRISPR/Cas9 gene editing method that yields obvious phenotypes with three genes, Abdominal-B, ebony and frizzled. Our results provide valuable genomic and technological resources for butterflies and unlock their potential as a genetic model system. PMID:26354079
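The contig and scaffold N50 values quoted in this abstract follow the usual definition: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation, with a hypothetical function name and made-up lengths, might look like this:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Illustrative lengths only; not taken from the Papilio assemblies.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about whether the sequences are assembled correctly.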
Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T
SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species including primitive single cell algae to angiosperms with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. Present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research
Li, Zuofeng; Liu, Xingnan; Wen, Jingran; Xu, Ye; Zhao, Xin; Li, Xuan; Liu, Lei; Zhang, Xiaoyan
With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate one or more specific genes' mutation data along with high-quality phenotypes. Although some genotype-phenotype data from LSDBs have been integrated into central databases, little effort has been made to integrate all these data by a search engine approach. In this work, we have developed the disease related unique gene mutation search engine (DRUMS), a search engine for human disease related unique gene mutations, as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from an LSDB or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS could be treated as a domain specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html. © 2011 Wiley-Liss, Inc.
Zaghloul, Lamia; Baker, Antoine; Audit, Benjamin; Arneodo, Alain
We investigate the large-scale organization of human genes with respect to "master" replication origins that were previously identified as bordering nucleotide compositional skew domains. We separate genes into two categories depending on their CpG enrichment at the promoter, which can be considered as a marker of germline DNA methylation. Using expression data in mouse, we confirm that CpG-rich genes are highly expressed in germline whereas CpG-poor genes are in a silent state. We further show that, whether tissue-specific or broadly expressed (housekeeping genes), the CpG-rich genes are over-represented close to the replication skew domain borders, suggesting some coordination of replication and transcription. We also reveal that the transcription of the longest CpG-rich genes is co-oriented with replication fork progression, so that the promoters of these transcriptionally active genes are located in the accessible open chromatin environment surrounding the master replication origins that border the replication skew domains. The observation of a similar gene organization in the mouse genome confirms the interplay of replication, transcription and chromatin structure as the cornerstone of mammalian genome architecture.
The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.
Betancourt, Angela M.; King, Adrienne L.; Fetterman, Jessica L.; Millender-Swain, Telisha; Finley, Rachel D.; Oliva, Claudia R.; Crowe, David Ralph; Ballinger, Scott W.; Bailey, Shannon M.
Nonalcoholic fatty liver disease (NAFLD) involves significant changes in liver metabolism characterized by oxidative stress, lipid accumulation, and fibrogenesis. Mitochondrial dysfunction and bioenergetic defects also contribute to NAFLD. Herein, we examined whether differences in mtDNA influence NAFLD. To determine the role of mitochondrial and nuclear genomes in NAFLD, Mitochondrial-Nuclear eXchange (MNX) mice were fed an atherogenic diet. MNX mice have mtDNA from C57BL/6J mice on a C3H/HeN nuclear background and vice versa. Results from MNX mice were compared to wild-type C57BL/6J and C3H/HeN mice fed a control or atherogenic diet. Mice with the C57BL/6J nuclear genome developed more macrosteatosis, inflammation, and fibrosis compared with mice containing the C3H/HeN nuclear genome when fed the atherogenic diet. These changes were associated with parallel alterations in inflammation and fibrosis gene expression in wild-type mice, with intermediate responses in MNX mice. Mice with the C57BL/6J nuclear genome had increased State 4 respiration, whereas MNX mice had decreased State 3 respiration and RCR when fed the atherogenic diet. Complex IV activity and most mitochondrial biogenesis genes were increased in mice with the C57BL/6J nuclear or mitochondrial genome, or both fed the atherogenic diet. These results reveal new interactions between mitochondrial and nuclear genomes and support the concept that mtDNA influences mitochondrial function and metabolic pathways implicated in NAFLD. PMID:24758559
Chiu, Yu-Chiao; Wang, Li-Ju; Hsiao, Tzu-Hung; Chuang, Eric Y; Chen, Yidong
With the advances in high-throughput gene profiling technologies, a large volume of gene interaction maps has been constructed. A higher-level layer of gene-gene interaction, namely modulated gene interaction, is composed of gene pairs whose interaction strengths are modulated by (i.e., dependent on) the expression level of a key modulator gene. Systematic investigations into the modulation by estrogen receptor (ER), the best-known modulator gene, have revealed its functional and prognostic significance in breast cancer. However, a genome-wide identification of key modulator genes that may further unveil the landscape of modulated gene interaction is still lacking. We proposed a systematic workflow to screen for key modulators based on genome-wide gene expression profiles. We designed four modularity parameters to measure the ability of a putative modulator to perturb gene interaction networks. Applying the method to a dataset of 286 breast tumors, we comprehensively characterized the modularity parameters and identified a total of 973 key modulator genes. The modularity of these modulators was verified in three independent breast cancer datasets. ESR1, the encoding gene of ER, appeared in the list, and abundant novel modulators were illuminated. For instance, a prognostic predictor of breast cancer, SFRP1, was found to be the second modulator. Functional annotation analysis of the 973 modulators revealed involvements in ER-related cellular processes as well as immune- and tumor-associated functions. Here we present, as far as we know, the first comprehensive analysis of key modulator genes on a genome-wide scale. The validity of filtering parameters as well as the conservativity of modulators among cohorts were corroborated. Our data bring new insights into the modulated layer of gene-gene interaction and provide candidates for further biological investigations.
The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.
Kaneko-Ishino, Tomoko; Ishino, Fumitoshi
Mammals, including human beings, have evolved a unique viviparous reproductive system and a highly developed central nervous system. How did these unique characteristics emerge in mammalian evolution, and what kinds of changes occurred in the mammalian genomes as evolution proceeded? A key conceptual term in approaching these issues is "mammalian-specific genomic functions", a concept covering both mammalian-specific epigenetics and genetics. Genomic imprinting and LTR retrotransposon-derived genes are reviewed as the representative mammalian-specific genomic functions that are essential not only for the current mammalian developmental system, but also for mammalian evolution itself. First, the essential roles of genomic imprinting in mammalian development, especially related to viviparous reproduction via placental function, as well as the emergence of genomic imprinting in mammalian evolution, are discussed. Second, we introduce the novel concept of "mammalian-specific traits generated by mammalian-specific genes from LTR retrotransposons", based on the finding that LTR retrotransposons served as a critical driving force in mammalian evolution by generating mammalian-specific genes.
Gruenert, D C; Novelli, G; Dallapiccola, B; Colosimo, A
The recent surge of DNA sequence information resulting from the efforts of agencies interested in deciphering the human genetic code has facilitated technological developments that have been critical in the identification of genes associated with numerous disease pathologies. In addition, these efforts have opened the door to the opportunity to develop novel genetic therapies to treat a broad range of inherited disorders. Through a joint effort by the University of Vermont, the University of Rome, Tor Vergata, University of Rome, La Sapienza, and the CSS Mendel Institute, Rome, an international meeting, 'Genome Medicine: Gene Therapy for the Millennium' was organized. This meeting provided a forum for the discussion of scientific and clinical advances stimulated by the explosion of sequence information generated by the Human Genome Project and the implications these advances have for gene therapy. The meeting had six sessions that focused on the functional evaluation of specific genes via biochemical analysis and through animal models, the development of novel therapeutic strategies involving gene targeting, artificial chromosomes, DNA delivery systems and non-embryonic stem cells, and on the ethical and social implications of these advances.
Qiu, Ying-Hua; Deng, Fei-Yan; Li, Min-Jing; Lei, Shu-Feng
Type 1 diabetes mellitus is a serious disorder characterized by destruction of pancreatic β-cells, culminating in absolute insulin deficiency. Genetic factors contribute to the susceptibility of type 1 diabetes mellitus. The aim of the present study was to identify more susceptibility genes of type 1 diabetes mellitus. We carried out an initial gene-based genome-wide association study in a total of 4,075 type 1 diabetes mellitus cases and 2,604 controls by using the Gene-based Association Test using Extended Simes procedure. Furthermore, we carried out replication studies, differential expression analysis and functional annotation clustering analysis to support the significance of the identified susceptibility genes. We identified 452 genes associated with type 1 diabetes mellitus, even after adapting the genome-wide threshold for significance (P diabetes mellitus, which were ignored in single-nucleotide polymorphism-based association analysis and were not previously reported. We found that 53 genes have supportive evidence from replication studies and/or differential expression studies. In particular, seven genes including four non-human leukocyte antigen (HLA) genes (RASIP1, STRN4, BCAR1 and MYL2) are replicated in at least one independent population and also differentially expressed in peripheral blood mononuclear cells or monocytes. Furthermore, the associated genes tend to enrich in immune-related pathways or Gene Ontology project terms. The present results suggest the high power of gene-based association analysis in detecting disease-susceptibility genes. Our findings provide more insights into the genetic basis of type 1 diabetes mellitus.
Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay
A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
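The pan- and core-genome notions used in the abstract above can be illustrated with plain set operations. The following is only a minimal sketch under stated assumptions, not PanCoreGen's actual method: it assumes each isolate has already been reduced to a set of gene-family identifiers, and the function name `pan_and_core` and the toy isolate and gene names are hypothetical.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan_genome, core_genome) for a mapping isolate -> gene-family IDs.

    Pan-genome: every gene family seen in at least one isolate (set union).
    Core genome: gene families present in every isolate (set intersection).
    """
    if not genomes:
        return set(), set()
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


# Hypothetical toy input; real input would come from an ortholog-clustering step.
isolates = {
    "isolate_A": {"gyrB", "recA", "viaB"},
    "isolate_B": {"gyrB", "recA", "stgA"},
    "isolate_C": {"gyrB", "recA"},
}

pan, core = pan_and_core(isolates)
accessory = pan - core  # families missing from at least one isolate
```

In this sketch the accessory set is simply the pan-genome minus the core genome; group-specific datasets of the kind the abstract mentions could be obtained the same way by restricting the mapping to a chosen subset of isolates.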
Arya, Preeti; Kumar, Gulshan; Acharya, Vishal; Singh, Anil K
Nucleotide binding site leucine-rich repeat (NBS-LRR) disease resistance proteins play an important role in plant defense against pathogen attack. A number of recent studies have been carried out to identify and characterize NBS-LRR gene families in many important plant species. In this study, we identified an NBS-LRR gene family comprising 1015 NBS-LRRs using highly stringent computational methods. These NBS-LRRs were characterized on the basis of conserved protein motifs, gene duplication events, chromosomal locations, phylogenetic relationships and digital gene expression analysis. Surprisingly, an equal distribution of Toll/interleukin-1 receptor (TIR) and coiled coil (CC) types (1:1) was detected in apple, while an unequal distribution was reported in the majority of other known plant genome studies. Prediction of gene duplication events intriguingly revealed that not only tandem duplication but also segmental duplication may equally be responsible for the expansion of the apple NBS-LRR gene family. Gene expression profiling using the expressed sequence tags database of apple and quantitative real-time PCR (qRT-PCR) revealed the expression of these genes in a wide range of tissues and disease conditions, respectively. Taken together, this study will provide a blueprint for future efforts towards improvement of disease resistance in apple.
Ren, Shancheng; Wei, Gong-Hong; Liu, Dongbing
BACKGROUND: Global disparities in prostate cancer (PCa) incidence highlight the urgent need to identify genomic abnormalities in prostate tumors in different ethnic populations including Asian men. OBJECTIVE: To systematically explore the genomic complexity and define disease-driven genetic......-scale and comprehensive genomic data of prostate cancer from Asian population. Identification of these genetic alterations may help advance prostate cancer diagnosis, prognosis, and treatment....... alterations in PCa. DESIGN, SETTING, AND PARTICIPANTS: The study sequenced whole-genome and transcriptome of tumor-benign paired tissues from 65 treatment-naive Chinese PCa patients. Subsequent targeted deep sequencing of 293 PCa-relevant genes was performed in another cohort of 145 prostate tumors. OUTCOME...
Background: Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own, they are often helpless at identifying the specific gene(s) or mutation(s) targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources that are currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim to look for candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti) insecticidal toxins. Results: The genomes of a Bti-resistant and a Bti-susceptible strain were surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti-resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphisms consistent with the action of positive selection (e.g. an excess of high compared to intermediate frequency mutations), as well as a significant under-expression compared to the susceptible strain. Conclusion: Both sequence and gene expression analyses agree to suggest a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full
Background: Whole-genome physical maps facilitate genome sequencing, sequence assembly, mapping of candidate genes, and the design of targeted genetic markers. An automated protocol was used to construct a Vitis vinifera 'Cabernet Sauvignon' physical map. The quality of the result was addressed with regard to the effect of high heterozygosity on the accuracy of contig assembly. Its usefulness for the genome-wide mapping of genes for disease resistance, which is an important trait for grapevine, was then assessed. Results: The physical map included 29,727 BAC clones assembled into 1,770 contigs, spanning 715,684 kbp, and corresponding to 1.5-fold the genome size. Map inflation was due to high heterozygosity, which caused either the separation of allelic BACs in two different contigs, or local mis-assembly in contigs containing BACs from the two haplotypes. Genetic markers anchored 395 contigs or 255,476 kbp to chromosomes. The fully automated assembly and anchorage procedures were validated by BAC-by-BAC blast of the end sequences against the grape genome sequence, unveiling 7.3% of chimerical contigs. The distribution across the physical map of candidate genes for non-host and host resistance, and for defence signalling pathways was then studied. NBS-LRR and RLK genes for host resistance were found in 424 contigs; 133 of them (32%) were assigned to chromosomes, on which they are mostly organised in clusters. Non-host and defence signalling genes were found in 99 contigs dispersed without a discernible pattern across the genome. Conclusion: Despite some limitations that interfere with the correct assembly of heterozygous clones into contigs, the 'Cabernet Sauvignon' physical map is a useful and reliable intermediary step between a genetic map and the genome sequence. This tool was successfully exploited for a quick mapping of complex families of genes, and it strengthened previous clues of co-localisation of major NBS-LRR clusters and
Hung, Sandy S C; McCaughey, Tristan; Swann, Olivia; Pébay, Alice; Hewitt, Alex W
The Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) and CRISPR-associated protein (Cas) system has enabled an accurate and efficient means to edit the human genome. Rapid advances in this technology could result in imminent clinical application, and with favourable anatomical and immunological profiles, ophthalmic disease will be at the forefront of such work. There have been a number of breakthroughs improving the specificity and efficacy of CRISPR/Cas-mediated genome editing. Similarly, better methods to identify off-target cleavage sites have also been developed. With the impending clinical utility of CRISPR/Cas technology, complex ethical issues related to the regulation and management of the precise applications of human gene editing must be considered. This review discusses the current progress and recent breakthroughs in CRISPR/Cas-based gene engineering, and outlines some of the technical issues that must be addressed before gene correction, be it in vivo or in vitro, is integrated into ophthalmic care. We outline a clinical pipeline for CRISPR-based treatments of inherited eye diseases and provide an overview of the important ethical implications of gene editing and how these may influence the future of this technology. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kim, Jeongwoo; Kim, Hyunjin; Yoon, Youngmi; Park, Sanghyun
Since the genome project in the 1990s, a number of studies associated with genes have been conducted and researchers have confirmed that genes are involved in disease. For this reason, the identification of the relationships between diseases and genes is important in biology. We propose a method called LGscore, which identifies disease-related genes using Google data and literature data. To implement this method, first, we construct a disease-related gene network using text-mining results. We then extract gene-gene interactions based on co-occurrences in abstract data obtained from PubMed, and calculate the weights of edges in the gene network by means of Z-scoring. The weights contain two values: the frequency and the Google search results. The frequency value is extracted from literature data, and the Google search result is obtained using Google. We assign a score to each gene through a network analysis. We assume that genes with a large number of links and numerous Google search results and frequency values are more likely to be involved in disease. For validation, we investigated the top 20 inferred genes for five different diseases using answer sets. The answer sets comprised six databases that contain information on disease-gene relationships. We identified a significant number of disease-related genes as well as candidate genes for Alzheimer's disease, diabetes, colon cancer, lung cancer, and prostate cancer. Our method was up to 40% more accurate than existing methods. Copyright © 2015 Elsevier Inc. All rights reserved.
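The co-occurrence and Z-scoring steps described in the abstract above can be sketched roughly as follows. This is an illustrative approximation only, not the published LGscore implementation: it uses the literature frequency alone (the Google search component is omitted), it scores a gene as the sum of the Z-scored weights of its edges, and all function names and the toy gene lists are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from statistics import mean, pstdev


def cooccurrence_counts(abstract_genes):
    """Count, for each unordered gene pair, the abstracts mentioning both genes."""
    counts = Counter()
    for genes in abstract_genes:
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return counts


def z_scores(counts):
    """Standardize raw pair counts: z = (count - mean) / population std dev."""
    if not counts:
        return {}
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: (n - mu) / sigma for pair, n in counts.items()}


def gene_scores(weights):
    """Assumed scoring rule: a gene's score is the sum of its edge weights."""
    scores = defaultdict(float)
    for (a, b), w in weights.items():
        scores[a] += w
        scores[b] += w
    return dict(scores)


# Hypothetical gene mentions extracted from four abstracts.
abstracts = [
    ["APP", "PSEN1"],
    ["APP", "PSEN1", "APOE"],
    ["APP", "APOE"],
    ["APP", "PSEN1"],
]

counts = cooccurrence_counts(abstracts)
weights = z_scores(counts)
scores = gene_scores(weights)
top_gene = max(scores, key=scores.get)
```

Standardizing the counts puts edge weights from different sources on a comparable scale, which is presumably why the abstract combines the frequency and search-result values through Z-scoring; how the two values are actually merged is not stated there and is left out of this sketch.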
Zhang, Yifei; Zhen, Min; Zhan, Yalin; Song, Yeqing; Zhang, Qian; Wang, Jinfeng
High-throughput sequencing has helped to reveal the close relationship between Prevotella and periodontal disease, but the roles of subspecies diversity and genomic variation within this genus in periodontal diseases still need to be investigated. We performed a comparative genome analysis of 48 Prevotella intermedia and Prevotella nigrescens isolates from the same cohort of subjects to identify the main drivers of their pathogenicity and adaptation to different environments. The comparisons were done between the two species and between disease and health based on pooled sequences. The results showed that both P. intermedia and P. nigrescens have highly dynamic genomes and can take up various exogenous factors through horizontal gene transfer. The major differences between disease-derived and health-derived samples of P. intermedia and P. nigrescens were factors related to genome modification and recombination, indicating that the Prevotella isolates from disease sites may be more capable of genomic reconstruction. We also identified genetic elements specific to each sample, and found that disease groups had more unique virulence factors related to capsule and lipopolysaccharide synthesis, secretion systems, proteinases, and toxins, suggesting that strains from disease sites may have more specific virulence, particularly for P. intermedia. The differentially represented pathways between samples from disease and health were related to energy metabolism, carbohydrate and lipid metabolism, and amino acid metabolism, consistent with data from the whole subgingival microbiome in periodontal disease and health. Disease-derived samples had gained or lost several metabolic genes compared to health-derived samples, which could be linked with the difference in virulence performance between diseased and healthy sample groups. Our findings suggest that P. intermedia and P. nigrescens may serve as "crucial substances" in subgingival plaque, which may reflect changes in
Skovgaard, Marie; Jensen, L.J.; Brunak, Søren
In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only similar to 3800 genes, and that a similar discrepancy exists for almost all published genomes....
Taylor Derek J
Background: Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results: We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions: Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained
Ford, Susan E.
More and more infectious diseases affect marine molluscs. Some diseases have impacted commercial species including MSX and Dermo of the eastern oyster, QPX of hard clams, withering syndrome of abalone and ostreid herpesvirus 1 (OsHV-1) infections of many molluscs. Although the exact transmission mechanisms are not well understood, human activities and associated environmental changes often correlate with increased disease prevalence. For instance, hatcheries and large-scale aquaculture create high host densities, which, along with increasing ocean temperature, might have contributed to OsHV-1 epizootics in scallops and oysters. A key to understanding linkages between the environment and disease is to understand how the environment affects the host immune system. Although we might be tempted to downplay the role of immunity in invertebrates, recent advances in genomics have provided insights into host and parasite genomes and revealed surprisingly sophisticated innate immune systems in molluscs. All major innate immune pathways are found in molluscs with many immune receptors, regulators and effectors expanded. The expanded gene families provide great diversity and complexity in innate immune response, which may be key to mollusc's defence against diverse pathogens in the absence of adaptive immunity. Further advances in host and parasite genomics should improve our understanding of genetic variation in parasite virulence and host disease resistance. PMID:26880838
Cheng, Yanbo; Ma, Qibin; Ren, Hailong; Xia, Qiuju; Song, Enliang; Tan, Zhiyuan; Li, Shuxian; Zhang, Gengyun; Nian, Hai
Using a combination of phenotypic screening, genetic and statistical analyses, and high-throughput genome-wide sequencing, we have fine-mapped a dominant Phytophthora resistance gene in the soybean cultivar Wayao. Phytophthora root rot (PRR), caused by Phytophthora sojae, is one of the most important soil-borne diseases in many soybean-production regions of the world. Identifying resistance gene(s) and incorporating them into elite varieties is an effective breeding strategy for protecting soybean from this disease. Two soybean populations, of 191 F2 individuals and 196 F7:8 recombinant inbred lines (RILs), were developed to map the Rps gene by crossing the susceptible cultivar Huachun 2 with the resistant cultivar Wayao. Genetic analysis of the F2 population indicated that PRR resistance in Wayao was controlled by a single dominant gene, temporarily named RpsWY, which was mapped on chromosome 3. A high-density genetic linkage bin map was constructed using 3469 recombination bins of the RILs to explore the candidate genes by high-throughput genome-wide sequencing. Genotypic analysis of lines 71 and 100 of the RILs showed that the RpsWY gene was located in bin 401, between 4466230 and 4502773 bp on chromosome 3. Four predicted genes (Glyma03g04350, Glyma03g04360, Glyma03g04370, and Glyma03g04380) were found in the narrowed region of 36.5 kb in bin 401. These results suggest that high-throughput genome-wide resequencing is an effective method to fine-map PRR candidate genes.
The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.
Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P
While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
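The split between genes present in all lines and genes present in only some lines can be illustrated with a small sketch. This is only a toy, assuming a simple presence/absence table; the line names, gene identifiers and the function `partition_pan_genome` are hypothetical and are not taken from the study, which relied on whole-genome de novo assembly and annotation.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into core genes (found in every line) and
    differentially present genes (found in only some lines)."""
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)            # every gene seen in any line
    core = set.intersection(*gene_sets)      # genes shared by all lines
    return core, pan - core


# Toy input: hypothetical lines and gene IDs, for illustration only.
lines = {
    "lineA": {"g1", "g2", "g3"},
    "lineB": {"g1", "g2", "g4"},
    "lineC": {"g1", "g3", "g5"},
}
core, variable = partition_pan_genome(lines)
```

In this toy table the pan-genome holds five genes while each line carries only three, which mirrors, at a very small scale, the observation that a pan-genome can be much larger than any individual genome.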
Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J
Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours' biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription-quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes.
Honour C McCann
The origins of crop diseases are linked to domestication of plants. Most crops were domesticated centuries, even millennia, ago, thus limiting opportunity to understand the concomitant emergence of disease. Kiwifruit (Actinidia spp.) is an exception: domestication began in the 1930s, with outbreaks of canker disease caused by P. syringae pv. actinidiae (Psa) first recorded in the 1980s. Based on SNP analyses of two circularized and 34 draft genomes, we show that Psa is comprised of distinct clades exhibiting negligible within-clade diversity, consistent with disease arising by independent samplings from a source population. Three clades correspond to their geographical source of isolation; a fourth, encompassing the Psa-V lineage responsible for the 2008 outbreak, is now globally distributed. Psa has an overall clonal population structure; however, genomes carry a marked signature of within-pathovar recombination. SNP analysis of Psa-V reveals hundreds of polymorphisms; however, most reside within PPHGI-1-like conjugative elements whose evolution is unlinked to the core genome. Removal of SNPs due to recombination yields an uninformative (star-like) phylogeny consistent with diversification of Psa-V from a single clone within the last ten years. Growth assays provide evidence of cultivar specificity, with rapid systemic movement of Psa-V in Actinidia chinensis. Genomic comparisons show a dynamic genome with evidence of positive selection on type III effectors and other candidate virulence genes. Each clade has highly varied complements of accessory genes encoding effectors and toxins with evidence of gain and loss via multiple genetic routes. Genes with orthologs in vascular pathogens were found exclusively within Psa-V. Our analyses capture a pathogen in the early stages of emergence from a predicted source population associated with wild Actinidia species. In addition to candidate genes as targets for resistance breeding programs, our findings
Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun
Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR gene and the Pspac-driven chloramphenicol resistance gene (cat) were located on a helper plasmid. In the presence of xylose, inactivation of the XylR repressor by xylose induced LacI expression; LacI then repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. This recombination introduces the mutation into the target gene. The remaining helper plasmid was easily removed in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful for constructing designer Bacillus strains for various industrial applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Chipman, Ariel D.; Ferrier, David E. K.; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S. T.; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C.; Alonso, Claudio R.; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C. J.; Blankenburg, Kerstin P.; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K.; Du Pasquier, Louis; Duncan, Elizabeth J.; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D.; Extavour, Cassandra G.; Francisco, Liezl; Gabaldón, Toni; Gillis, William J.; Goodwin-Horn, Elizabeth A.; Green, Jack E.; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J. P.; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H. L.; Hunn, Julia P.; Hunnekuhl, Vera S.; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N.; Jiggins, Francis M.; Jones, Tamsin E.; Kaiser, Tobias S.; Kalra, Divya; Kenny, Nathan J.; Korchina, Viktoriya; Kovar, Christie L.; Kraus, F. Bernhard; Lapraz, François; Lee, Sandra L.; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N.; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J.; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H.; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C.; Robertson, Helen E.; Robertson, Hugh M.; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E.; Schurko, Andrew M.; Siggens, Kenneth W.; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J.; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M.; Willis, Judith H.; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M.; Worley, Kim C.; Gibbs, Richard A.; Akam, Michael; Richards, Stephen
Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific
Grzybowska, Ewa A.
Highlights: ► Functional characteristics of intronless genes (IGs). ► Diseases associated with IGs. ► Origin and evolution of IGs. ► mRNA processing without splicing. -- Abstract: Intronless genes (IGs) constitute approximately 3% of the human genome. Human IGs are essentially different in evolution and functionality from the IGs of unicellular eukaryotes, which represent the majority in their genomes. Functional analysis of IGs has revealed a massive over-representation of signal transduction genes and genes encoding regulatory proteins important for growth, proliferation, and development. IGs also often display tissue-specific expression, usually in the nervous system and testis. These characteristics translate into IG-associated diseases, mainly neuropathies, developmental disorders, and cancer. IGs represent recent additions to the genome, created mostly by retroposition of processed mRNAs with retained functionality. Processing, nuclear export, and translation of these mRNAs should be hampered dramatically by the lack of splice factors, which normally tightly cover mature transcripts and govern their fate. However, natural IGs manage to maintain satisfactory expression levels. Different mechanisms by which IGs solve the problem of mRNA processing and nuclear export are discussed here, along with their possible impact on reporter studies.
Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong
Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to advances in high-throughput sequencing technologies, family genome sequencing is becoming more and more prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, because of the complex genetic relationships and high similarity among the genomes of consanguineous family members, family genomes are difficult to visualize in a traditional genome visualization framework. How to visualize family genome variants and their functions together with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of the parental origin of variants, detection of de novo mutations, and identification of potential recombination events and identical-by-descent segments, can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP membership, linkage disequilibrium, genes, variant effects and potential phenotypes, are illustrated as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants in a selected individual, and guide investigators to high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press.
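As an illustration of what detection of de novo mutations in a family involves, the sketch below flags variants whose child genotype cannot be explained by Mendelian inheritance from the two parents. It is a minimal toy and not the FGB's implementation: the variant IDs, the genotype encoding and the function names are assumptions made here for illustration, and real pipelines also weigh genotype quality and read depth.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # unphased pair of alleles, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if one child allele can come from the father and the other from the mother."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def candidate_de_novo(trio_calls: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Return IDs of variants whose child genotype the parents cannot explain."""
    return [
        variant_id
        for variant_id, (child, father, mother) in trio_calls.items()
        if not is_mendelian_consistent(child, father, mother)
    ]


# Hypothetical calls, ordered as (child, father, mother).
calls = {
    "var1": (("A", "G"), ("A", "A"), ("G", "G")),  # inherited: A from father, G from mother
    "var2": (("C", "T"), ("C", "C"), ("C", "C")),  # T is absent from both parents
    "var3": (("G", "G"), ("A", "G"), ("A", "G")),  # inherited: G from each parent
}
flagged = candidate_de_novo(calls)
```

A check of this kind only yields candidates; genotyping errors in any of the three family members produce the same signal, which is one reason pedigree-aware visual inspection is useful.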
Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that aqp genes in other teleosts were lost during their evolutionary history because of functional redundancy and conservation. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences were observed, revealing substantial expression or functional divergence in the duplicated aqp genes after the latest WGD event. To some extent, gene families are also considered a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family provides an
Lyne, Mike; Smith, Richard N; Lyne, Rachel; Aleksic, Jelena; Hu, Fengyuan; Kalderimis, Alex; Stepan, Radek; Micklem, Gos
Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. The web interface makes it accessible to first
Background: Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures hinders further elucidation of differences among genomes. There remains interest in developing new ab initio algorithms that not only predict genes accurately but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure before predicting genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
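For readers unfamiliar with the objects such a gene finder scores, the sketch below enumerates candidate ORFs on the forward strand of a DNA string. It is explicitly not the MED 2.0 algorithm: it contains no EDP model, no TIS model and no iteration, and the function name, start/stop codon sets and toy sequence are assumptions chosen for illustration.

```python
from typing import List, Tuple

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon (inclusive).
    In each frame, only the first ATG after the previous stop is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


example = "CCATGAAATAGGG"  # toy sequence, not from any genome
orfs = find_orfs(example)
```

Choosing among alternative in-frame start codons is exactly the translation initiation site problem that the abstract describes as being modelled separately; this sketch sidesteps it by always taking the first ATG.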
Vanavichit, Apichart [Kasetsart University, Kamphaengsaen, Nakorn Pathom (Thailand)]
A map-based approach has allowed scientists to discover few genes at a time. In addition, the reproductive barrier between cultivated rice and wild relatives has prevented us from utilizing the germ plasm by a map-based approach. Most genetic traits important to agriculture or human diseases are manifested as observable, quantitative phenotypes called Quantitative Trait Loci (QTL). In many instances, the complexity of the phenotype/genotype interaction and the general lack of clearly identifiable gene products render the direct molecular cloning approach ineffective, thus additional strategies like genome mapping are required to identify the QTL in question. Genome mapping requires no prior knowledge of the gene function, but utilizes statistical methods to identify the most likely gene location. To completely characterize genes of interest, the initially mapped region of a gene location will have to be narrowed down to a size that is suitable for cloning and sequencing. Strategies for gene identification within the critical region have to be applied after the sequencing of a potentially large clone or set of clones that contains this gene(s). Tremendous success of positional cloning has been shown for cloning many genes responsible for human diseases, including cystic fibrosis and muscular dystrophy as well as plant disease resistance genes. Genome and QTL mapping, positional cloning: the pre-genomics era, comparative approaches to gene identification, and positional cloning: the genomics era are discussed in the report. (M. Suetake)
B Kalyana Babu
The major limiting factor for production and productivity of the finger millet crop is blast disease caused by Magnaporthe grisea. Since the genome sequence information available for finger millet is scarce, comparative genomics plays a very important role in the identification of genes/QTLs linked to blast resistance using SSR markers. In the present study, a total of 58 genic SSRs were developed for use in genetic analysis of a global collection of 190 finger millet genotypes. The 58 SSRs yielded ninety-five scorable alleles, and the polymorphism information content varied from 0.186 to 0.677, with an average of 0.385. The gene diversity was in the range of 0.208 to 0.726, with an average of 0.487. Association mapping for blast resistance was done using 104 SSR markers, which identified four QTLs for finger blast and one QTL for neck blast resistance. The genomic marker RM262 and genic marker FMBLEST32 were linked to finger blast disease at a P value of 0.007 and explained phenotypic variance (R²) of 10% and 8%, respectively. The genomic marker UGEP81 was associated with finger blast at a P value of 0.009 and explained 7.5% of R². The QTL for neck blast was associated with the genomic SSR marker UGEP18 at a P value of 0.01, which explained 11% of R². Three QTLs for blast resistance were found in common using both GLM and MLM approaches. The resistant alleles were found to be present mostly in the exotic genotypes. Among the genotypes of the NW Himalayan region of India, VHC3997, VHC3996 and VHC3930 were found to be highly resistant and may be effectively used as parents for developing blast-resistant cultivars in that region. The markers linked to the QTLs for blast resistance in the present study can be used for cloning of the full-length gene, for fine mapping, and in marker-assisted breeding programmes for introgression of blast-resistant alleles into locally adapted cultivars.
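The abstract does not state how polymorphism information content (PIC) and gene diversity were computed. A common convention, assumed here, is gene diversity as expected heterozygosity, H = 1 - Σ p_i², and PIC following Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are the allele frequencies at one marker. The sketch below applies those formulas to made-up frequencies.

```python
from itertools import combinations
from typing import Sequence


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content using the Botstein et al. (1980) formula."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical marker with two equally frequent alleles.
freqs = [0.5, 0.5]
h = gene_diversity(freqs)   # 1 - (0.25 + 0.25)
p = pic(freqs)              # 1 - 0.5 - 2 * 0.25 * 0.25
```

Under these definitions PIC never exceeds gene diversity for the same marker, which is consistent with the reported averages (0.385 for PIC against 0.487 for gene diversity).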
Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen
MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual source of genomic data tends to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.
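The area under the ROC curve quoted above can be read as the probability that a randomly chosen true miRNA-disease association receives a higher score than a randomly chosen non-association. The sketch below computes that quantity directly from two score lists. The scores are invented, the function name is an assumption, and nothing here reproduces CHNmiRD's scoring or its cross-validation splits.

```python
from typing import Sequence


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up prediction scores for known associations and for non-associations.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3]
auc = pairwise_auc(positives, negatives)
```

In a 5-fold cross-validation setting, a value like this would typically be computed on each held-out fold, or on the pooled held-out predictions, and then summarised.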
Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation, are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha, an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis, with a consensus DTD in single-letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangement in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of the two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that the gene arrangement specific to Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.
Bipolar disorder (BD) is a common and severe mental illness with unsolved pathophysiology. A genome-wide association study (GWAS) has been used to find a number of risk genes, but it is difficult for a GWAS to find genes indirectly associated with a disease. To find core hub genes, we introduce a network analysis after the GWAS was conducted. Six thousand four hundred fifty-eight single nucleotide polymorphisms (SNPs) with p < 0.01 were sifted out from the Wellcome Trust Case Control Consortium (WTCCC) dataset and mapped to 2045 genes, which were then compared with the protein–protein network. One hundred twelve genes with a degree >17 were chosen as hub genes, from which five significant modules and four core hub genes (FBXL13, WDFY2, bFGF, and MTHFD1L) were found. These core hub genes have not been reported to be directly associated with BD but may function by interacting with genes directly related to BD. Our method engenders new thoughts on finding genes indirectly associated with, but important for, complex diseases.
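The hub-gene selection step in this abstract (keeping GWAS-mapped genes whose degree in the protein–protein network exceeds 17) can be illustrated with a minimal sketch. The function name `hub_genes`, the edge-list input format, and the choice to count degree over the whole network rather than only among candidate genes are assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict


def hub_genes(edges, candidates, min_degree=17):
    """Return {gene: degree} for candidate genes with degree > min_degree.

    edges: iterable of (gene_a, gene_b) interaction pairs (hypothetical format).
    candidates: genes mapped from the GWAS SNPs.
    Degree is counted over the whole network (an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    result = {}
    for gene in candidates:
        degree = len(neighbors.get(gene, ()))
        if degree > min_degree:
            result[gene] = degree
    return result


# Toy network: G0 interacts with 20 partners, every partner only with G0.
toy_edges = [("G0", f"G{i}") for i in range(1, 21)]
print(hub_genes(toy_edges, ["G0", "G1"]))  # {'G0': 20}
```

Storing neighbors in sets means duplicated edges in the input do not inflate a gene's degree.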
The Human Genome Project is a massive international research project, costing 3 to 5 billion dollars and expected to take 15 years, which will identify all the genes in the human genome - i.e. the complete sequence of bases in human DNA. The prize will be the ability to identify genes causing or predisposing to disease, and in some cases the development of gene therapy, but this new knowledge will raise important ethical issues.
Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda
The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis
Song, Hui; Wang, Pengfei; Lin, Jer-Young; Zhao, Chuanzhi; Bi, Yuping; Wang, Xingjun
WRKY, an important transcription factor family, is widely distributed in the plant kingdom. Many reports focused on analysis of phylogenetic relationship and biological function of WRKY protein at the whole genome level in different plant species. However, little is known about WRKY proteins in the genome of Arachis species and their response to salicylic acid (SA) and jasmonic acid (JA) treatment. In this study, we identified 77 and 75 WRKY proteins from the two wild ancestral diploid genomes of cultivated tetraploid peanut, Arachis duranensis and Arachis ipaënsis, using bioinformatics approaches. Most peanut WRKY coding genes were located on A. duranensis chromosome A6 and A. ipaënsis chromosome B3, while the least number of WRKY genes was found in chromosome 9. The WRKY orthologous gene pairs in A. duranensis and A. ipaënsis chromosomes were highly syntenic. Our analysis indicated that segmental duplication events played a major role in AdWRKY and AiWRKY genes, and strong purifying selection was observed in gene duplication pairs. Furthermore, we translate the knowledge gained from the genome-wide analysis result of wild ancestral peanut to cultivated peanut to reveal that gene activities of specific cultivated peanut WRKY gene were changed due to SA and JA treatment. Peanut WRKY7, 8 and 13 genes were down-regulated, whereas WRKY1 and 12 genes were up-regulated with SA and JA treatment. These results could provide valuable information for peanut improvement.
Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad
Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum. Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.
Sánchez-Vallet, Andrea; Fouché, Simone; Fudal, Isabelle; Hartmann, Fanny E; Soyer, Jessica L; Tellier, Aurélien; Croll, Daniel
Filamentous pathogens, including fungi and oomycetes, pose major threats to global food security. Crop pathogens cause damage by secreting effectors that manipulate the host to the pathogen's advantage. Genes encoding such effectors are among the most rapidly evolving genes in pathogen genomes. Here, we review how the major characteristics of the emergence, function, and regulation of effector genes are tightly linked to the genomic compartments where these genes are located in pathogen genomes. The presence of repetitive elements in these compartments is associated with elevated rates of point mutations and sequence rearrangements with a major impact on effector diversification. The expression of many effectors converges on an epigenetic control mediated by the presence of repetitive elements. Population genomics analyses showed that rapidly evolving pathogens show high rates of turnover at effector loci and display a mosaic in effector presence-absence polymorphism among strains. We conclude that effective pathogen containment strategies require a thorough understanding of the effector genome biology and the pathogen's potential for rapid adaptation. Expected final online publication date for the Annual Review of Phytopathology Volume 56 is August 25, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Background: Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS) and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classified with regard to their substrates. To identify such enzymes in prokaryotic genomes, different approaches based on similarity, enzyme profiles or patterns exist. Unfortunately, several problems persist in the annotation, classification and naming of these enzymes, due mainly to erroneous entries in databases, mistake propagation, absence of updating and disparity in function description. Description: In order to improve the current annotation of oxidative stress subsystems, an innovative platform named OxyGene has been developed. It integrates an original database called OxyDB, holding thoroughly tested anchor-based signatures associated with subfamilies of oxidative stress enzymes, and a new anchor-driven annotator for ab initio detection of ROS/RNS response genes. All complete Bacterial and Archaeal genomes have been re-annotated, and the results stored in the OxyGene repository can be interrogated via a Graphical User Interface. Conclusion: OxyGene enables the exploration and comparative analysis of enzymes belonging to 37 detoxification subclasses in 664 microbial genomes. It proposes a new classification that improves both the ontology and the annotation of the detoxification subsystems in prokaryotic whole genomes, while discovering new ORFs and attributing precise function to hypothetical annotated proteins. OxyGene is freely available at: http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software
Gubala, Aneta J; Proll, David F; Barnard, Ross T; Cowled, Chris J; Crameri, Sandra G; Hyatt, Alex D; Boyle, David B
Viruses belonging to the family Rhabdoviridae infect a variety of different hosts, including insects, vertebrates and plants. Currently, there are approximately 200 ICTV-recognised rhabdoviruses isolated around the world. However, the majority remain poorly characterised and only a fraction have been definitively assigned to genera. The genomic and transcriptional complexity displayed by several of the characterised rhabdoviruses indicates large diversity and complexity within this family. To enable an improved taxonomic understanding of this family, it is necessary to gain further information about the poorly characterised members of this family. Here we present the complete genome sequence and predicted transcription strategy of Wongabel virus (WONV), a previously uncharacterised rhabdovirus isolated from biting midges (Culicoides austropalpalis) collected in northern Queensland, Australia. The 13,196 nucleotide genome of WONV encodes five typical rhabdovirus genes N, P, M, G and L. In addition, the WONV genome contains three genes located between the P and M genes (U1, U2, U3) and two open reading frames overlapping with the N and G genes (U4, U5). These five additional genes and their putative protein products appear to be novel, and their functions are unknown. Predictive analysis of the U5 gene product revealed characteristics typical of viroporins, and indicated structural similarities with the alpha-1 protein (putative viroporin) of viruses in the genus Ephemerovirus. Phylogenetic analyses of the N and G proteins of WONV indicated closest similarity with the avian-associated Flanders virus; however, the genomes of these two viruses are significantly diverged. WONV displays a novel and unique genome structure that has not previously been described for any animal rhabdovirus.
Plants have evolved an elaborate innate immune system against invading pathogens. Within this system, intracellular nucleotide-binding leucine-rich repeat (NLR) immune receptors are known to play critical roles in effector-triggered immunity (ETI) plant defense. We performed genome-wide identification and classification of NLR-coding sequences from the genomes of pepper, tomato, and potato using fixed criteria. We then compared genomic duplication and evolution features. We identified 267, 443, and 755 intact NLR-encoding genes in tomato, potato, and pepper genomes, respectively. Phylogenetic analyses and classification of Solanaceae NLRs revealed that the majority of NLR superfamily members fell into 14 subgroups, including a TIR-NLR (TNL) subgroup and 13 non-TNL subgroups. Specific subgroups have expanded in each genome, with the expansion in pepper showing subgroup-specific physical clusters. Comparative analysis of duplications showed distinct duplication patterns within pepper and among Solanaceae plants, suggesting subgroup- or species-specific gene duplication events after speciation, resulting in divergent evolution. Taken together, genome-wide analyses of NLR family members provide insights into their evolutionary history in Solanaceae. These findings also provide important foundational knowledge for understanding NLR evolution and will empower broader characterization of disease resistance genes to be used for crop breeding.
Franck, E.; Hulsen, T.; Huynen, M.A.; Jong, de W.W.; Lunsen, N.H.; Madsen, O.
The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of
Thiesen, H-J; Steinbeck, F; Maruschke, M; Koczan, D; Ziems, B; Hakenberg, O W
Tumorigenic processes are understood to be driven by epi-/genetic and genomic alterations from single point mutations to chromosomal alterations such as insertions and deletions of nucleotides up to gains and losses of large chromosomal fragments including products of chromosomal rearrangements e.g. fusion genes and proteins. Overall comparisons of copy number alterations (CNAs) presented in 48 clear cell renal cell carcinoma (ccRCC) genomes resulted in ratios of gene losses versus gene gains between 26 ccRCC Fuhrman malignancy grades G1 (ratio 1.25) and 20 G3 (ratio 0.58). Gene losses and gains of 15762 CNA genes were mapped to 795 chromosomal cytoband loci including 280 KEGG pathways. CNAs were classified according to their contribution to Fuhrman tumour gradings G1 and G3. Gene gains and losses turned out to be highly structured processes in ccRCC genomes enabling the subclassification and stratification of ccRCC tumours in a genome-wide manner. CNAs of ccRCC seem to start with common tumour related gene losses flanked by CNAs specifying Fuhrman grade G1 losses and CNA gains favouring grade G3 tumours. The appearance of recurrent CNA signatures implies the presence of causal mechanisms most likely implicated in the pathogenesis and disease-outcome of ccRCC tumours distinguishing lower from higher malignant tumours. The diagnostic quality of initial 201 genes (108 genes supporting G1 and 93 genes G3 phenotypes) has been successfully validated on published Swiss data (GSE19949) leading to a restricted CNA gene set of 171 CNA genes of which 85 genes favour Fuhrman grade G1 and 86 genes Fuhrman grade G3. Regarding these gene sets overall survival decreased with the number of G3 related gene losses plus G3 related gene gains. CNA gene sets presented define an entry to a gene-directed and pathway-related functional understanding of ongoing copy number alterations within and between individual ccRCC tumours leading to CNA genes of prognostic and predictive value.
SHAO, Ming; XU, Tian-Rui; CHEN, Ce-Shi
Targeted genome editing technology has been widely used in biomedical studies. The CRISPR-associated RNA-guided endonuclease Cas9 has become a versatile genome editing tool. The CRISPR/Cas9 system is useful for studying gene function through efficient knock-out, knock-in or chromatin modification of the targeted gene loci in various cell types and organisms. It can be applied in a number of fields, such as genetic breeding, disease treatment and gene functional investigation. In this review, we introduce the most recent developments and applications, the challenges, and future directions of Cas9 in generating disease animal model. Derived from the CRISPR adaptive immune system of bacteria, the development trend of Cas9 will inevitably fuel the vital applications from basic research to biotechnology and biomedicine. PMID:27469250
Cytokinin oxidase/dehydrogenase (CKX; EC 1.5.99.12) regulates cytokinin (CK) levels in plants and plays an essential role in CK regulatory processes. CKX proteins are encoded by a small gene family with a varying number of members in different plants. In spite of their physiological importance, SiCKX genes in foxtail millet have not yet been systematically analyzed. In this paper, we report the genome-wide isolation and characterization of SiCKXs using bioinformatic methods. A total of 11 members of the family were identified in the foxtail millet genome. SiCKX genes were distributed in seven chromosomes (chromosome 1, 3, 4, 5, 6, 7, and 11). The coding sequences of all the SiCKX genes were disrupted by introns, with numbers varying from one to four. These genes expanded in the genome mainly due to segmental duplication events. Multiple alignment and motif display results showed that all SiCKX proteins share FAD- and CK-binding domains. Putative cis-elements involved in Ca2+-response, abiotic stress response, light and circadian rhythm regulation, disease resistance and seed development were present in the promoters of SiCKX genes. Expression data mining suggested that SiCKX genes have diverse expression patterns. Real-time PCR analysis indicated that all 11 SiCKX genes were up-regulated in embryos under 6-BA treatment, and some were NaCl or PEG inducible. Collectively, these results provide molecular insights into CKX research in plants.
Stenson, Peter D; Mort, Matthew; Ball, Edward V; Shaw, Katy; Phillips, Andrew; Cooper, David N
The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.
Mochizuki, Atsushi; Yahara, Koji; Kobayashi, Ichizo; Iwasa, Yoh
The evolution and maintenance of the phenomenon of postsegregational host killing or genetic addiction are paradoxical. In this phenomenon, a gene complex, once established in a genome, programs death of a host cell that has eliminated it. The intact form of the gene complex would survive in other members of the host population. It is controversial as to why these genetic elements are maintained, due to the lethal effects of host killing, or perhaps some other properties are beneficial to the host. We analyzed their population dynamics by analytical methods and computer simulations. Genetic addiction turned out to be advantageous to the gene complex in the presence of a competitor genetic element. The advantage is, however, limited in a population without spatial structure, such as that in a well-mixed liquid culture. In contrast, in a structured habitat, such as the surface of a solid medium, the addiction gene complex can increase in frequency, irrespective of its initial density. Our demonstration that genomes can evolve through acquisition of addiction genes has implications for the general question of how a genome can evolve as a community of potentially selfish genes.
Betancourt, Angela M; King, Adrienne L; Fetterman, Jessica L; Millender-Swain, Telisha; Finley, Rachel D; Oliva, Claudia R; Crowe, David R; Ballinger, Scott W; Bailey, Shannon M
NAFLD (non-alcoholic fatty liver disease) involves significant changes in liver metabolism characterized by oxidative stress, lipid accumulation and fibrogenesis. Mitochondrial dysfunction and bioenergetic defects also contribute to NAFLD. In the present study, we examined whether differences in mtDNA influence NAFLD. To determine the role of mitochondrial and nuclear genomes in NAFLD, MNX (mitochondrial-nuclear exchange) mice were fed an atherogenic diet. MNX mice have mtDNA from C57BL/6J mice on a C3H/HeN nuclear background and vice versa. Results from MNX mice were compared with wild-type C57BL/6J and C3H/HeN mice fed a control or atherogenic diet. Mice with the C57BL/6J nuclear genome developed more macrosteatosis, inflammation and fibrosis compared with mice containing the C3H/HeN nuclear genome when fed the atherogenic diet. These changes were associated with parallel alterations in inflammation and fibrosis gene expression in wild-type mice, with intermediate responses in MNX mice. Mice with the C57BL/6J nuclear genome had increased State 4 respiration, whereas MNX mice had decreased State 3 respiration and RCR (respiratory control ratio) when fed the atherogenic diet. Complex IV activity and most mitochondrial biogenesis genes were increased in mice with the C57BL/6J nuclear or mitochondrial genome, or both fed the atherogenic diet. These results reveal new interactions between mitochondrial and nuclear genomes and support the concept that mtDNA influences mitochondrial function and metabolic pathways implicated in NAFLD.
The CRISPR/Cas9 prokaryotic adaptive immune system and its swift repurposing for genome editing enables modification of any prespecified genomic sequence with unprecedented accuracy and efficiency, including targeted gene repair. We used the CRISPR/Cas9 system for targeted repair of patient-specific point mutations in the Cytochrome b-245 heavy chain gene (CYBB), whose inactivation causes chronic granulomatous disease (XCGD), a life-threatening immunodeficiency disorder characterized by the inability of neutrophils and macrophages to produce microbicidal reactive oxygen species (ROS). We show that frameshift mutations can be effectively repaired in hematopoietic cells by non-integrating lentiviral vectors carrying RNA-guided Cas9 endonucleases (RGNs). Because about 25% of most inherited blood disorders are caused by frameshift mutations, our results suggest that up to a quarter of all patients suffering from monogenic blood disorders could benefit from gene therapy employing personalized, donor template-free RGNs.
The zebrafish (Danio rerio) is an ideal vertebrate model to investigate the developmental molecular mechanism of organogenesis and regeneration. Recent innovations in genome editing technologies, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system, have allowed researchers to generate diverse genomic modifications in whole animals and in cultured cells. The CRISPR/Cas9 and TALEN techniques frequently induce DNA double-strand breaks (DSBs) at the targeted gene, resulting in frameshift-mediated gene disruption. As a useful application of genome editing technology, several groups have recently reported efficient site-specific integration of exogenous genes into targeted genomic loci. In this review, we provide an overview of TALEN- and CRISPR/Cas9-mediated site-specific integration of exogenous genes in zebrafish.
Wu, Chen; Twort, Victoria G; Crowhurst, Ross N; Newcomb, Richard D; Buckley, Thomas R
Stick insects (Phasmatodea) have a high incidence of parthenogenesis and other alternative reproductive strategies, yet the genetic basis of reproduction is poorly understood. Phasmatodea includes nearly 3000 species, yet only the genome of Timema cristinae has been published to date. Clitarchus hookeri is a geographical parthenogenetic stick insect distributed across New Zealand. Sexual reproduction dominates in northern habitats but is replaced by parthenogenesis in the south. Here, we present a de novo genome assembly of a female C. hookeri and use it to detect candidate genes associated with gamete production and development in females and males. We also explore the factors underlying large genome size in stick insects. The C. hookeri genome assembly was 4.2 Gb, similar to the flow cytometry estimate, making it the second largest insect genome sequenced and assembled to date. Like the large genome of Locusta migratoria, the genome of C. hookeri is also highly repetitive and the predicted gene models are much longer than those from most other sequenced insect genomes, largely due to longer introns. Miniature inverted repeat transposable elements (MITEs), absent in the much smaller T. cristinae genome, are the most abundant repeat type in the C. hookeri genome assembly. Mapping RNA-Seq reads from female and male gonadal transcriptomes onto the genome assembly resulted in the identification of 39,940 gene loci, 15.8% and 37.6% of which showed female-biased and male-biased expression, respectively. The genes that were over-expressed in females were mostly associated with molecular transportation, developmental process, oocyte growth and reproductive process, whereas the male-biased genes were enriched in rhythmic process, molecular transducer activity and synapse. Several genes involved in the juvenile hormone synthesis pathway were also identified. The evolution of large insect genomes such as L. migratoria and C. hookeri genomes is most likely due to the
Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V.
A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen – a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars – Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. PMID:26456591
Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David
Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...
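The distinction drawn above between a strict core (clusters in 100% of genomes), a 'soft core' (clusters in at least 95%) and the pan-genome (all clusters seen) can be illustrated with a minimal sketch. The function name, the input layout (genome name mapped to a set of cluster identifiers) and the toy data are assumptions made for illustration; the abstract does not describe how the authors computed these sets.

```python
from typing import Dict, Set, Tuple


def classify_gene_clusters(
    presence: Dict[str, Set[str]], soft_core_fraction: float = 0.95
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, soft_core, strict_core) sets of gene cluster ids.

    `presence` maps a genome name to the set of cluster ids found in it
    (a hypothetical input format, not taken from the abstract).
    """
    n_genomes = len(presence)
    counts: Dict[str, int] = {}
    for clusters in presence.values():
        for cluster in clusters:
            counts[cluster] = counts.get(cluster, 0) + 1

    pan = set(counts)  # every cluster seen in at least one genome
    strict_core = {c for c, n in counts.items() if n == n_genomes}
    soft_core = {
        c for c, n in counts.items() if n / n_genomes >= soft_core_fraction
    }
    return pan, soft_core, strict_core


if __name__ == "__main__":
    toy = {
        "g1": {"a", "b", "c"},
        "g2": {"a", "b"},
        "g3": {"a", "b", "d"},
        "g4": {"a"},
    }
    # A 75% threshold is used here only because the toy set has four genomes.
    pan, soft, strict = classify_gene_clusters(toy, soft_core_fraction=0.75)
    print(sorted(pan), sorted(soft), sorted(strict))
```

Lowering the threshold below 100% is what lets a cluster missing from a few draft-quality assemblies still count as core, which is the motivation the abstract gives for preferring the soft core.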
Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A
Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types that are consistent with known biology, and we find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples of the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
Liu, Bin; Jin, Min; Zeng, Pan
The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate genes and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015 Elsevier Inc. All rights reserved.
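As a rough illustration of the random walk with restart described above, the sketch below iterates p(t+1) = (1 - r) W p(t) + r p0 on a small graph, where W is the column-normalized adjacency matrix and p0 is the initial probability vector. Everything specific here is an assumption: the function name, the restart probability of 0.7, the toy path graph, and the seed scores, which merely stand in for the semantic similarity values whose exact definition the abstract does not give.

```python
from typing import List, Sequence


def random_walk_with_restart(
    adjacency: Sequence[Sequence[float]],
    seed_scores: Sequence[float],
    restart: float = 0.7,
    tol: float = 1e-12,
    max_iter: int = 10000,
) -> List[float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalized adjacency matrix. `seed_scores` are
    hypothetical per-gene weights (for example similarity to known disease
    genes); they are rescaled to sum to 1 to form p0.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    total = sum(seed_scores)
    p0 = [s / total for s in seed_scores]
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # Probability flowing into node i from its neighbours.
            inflow = sum(
                adjacency[i][j] / col_sums[j] * p[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            nxt.append((1 - restart) * inflow + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with all seed weight on node 0.
    path = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(random_walk_with_restart(path, [1.0, 0.0, 0.0, 0.0]))
```

Candidate genes would then be ranked by their steady-state probability. Replacing a uniform seed vector with graded similarity scores changes only p0, which is the modification the abstract describes.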
Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.
Abstract Background Plant mitochondrial genomes are known for their complexity, and there is abundant evidence demonstrating that this organelle is important for plant sexual reproduction. Cytoplasmic male sterility (CMS) is a phenomenon caused by incompatibility between the nucleus and mitochondria that has been discovered in various plant species. As the exact sequence of steps leading to CMS has not yet been revealed, efforts should be made to elucidate the factors underlying the mechanism of this important trait for crop breeding. Results Two CMS mitochondrial genomes, LD-CMS, derived from Oryza sativa L. ssp. indica (434,735 bp), and CW-CMS, derived from Oryza rufipogon Griff. (559,045 bp), were newly sequenced in this study. Compared to the previously sequenced Nipponbare (Oryza sativa L. ssp. japonica) mitochondrial genome, the presence of 54 out of 56 protein-encoding genes (including pseudo-genes), 22 tRNA genes (including pseudo-tRNAs), and three rRNA genes was conserved. Two other genes were not present in the CW-CMS mitochondrial genome, and one of them was present as part of the newly identified chimeric ORF, CW-orf307. At least 12 genomic recombination events were predicted between the LD-CMS mitochondrial genome and Nipponbare, and 15 between the CW-CMS genome and Nipponbare, and novel genetic structures were formed by these genomic rearrangements in the two CMS lines. At least one of the genomic rearrangements was completely unique to each CMS line and not present in 69 rice cultivars or 9 accessions of O. rufipogon. Conclusion Our results demonstrate novel mitochondrial genomic rearrangements that are unique in CMS cytoplasm, and one of the genes that is unique in the CW mitochondrial genome, CW-orf307, appeared to be the candidate most likely responsible for the CW-CMS event. Genomic rearrangements were dynamic in the CMS lines in comparison with those of rice cultivars, suggesting that 'death' and possible 'birth' processes of the
Background Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism. PMID:21816042
Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong
This study investigated the genome sequence of Serratia sp. S2. Genomic DNA of Serratia sp. S2 was extracted and a sequencing library was constructed. Sequencing was carried out by Illumina 2000 and the complete genomic sequence was obtained. Gene function annotation and bioinformatics analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein-coding genes, of which 3732, 3614, and 3942 were annotated in the GO, KEGG, and COG databases, respectively. Twelve genes related to chromium metabolism were found in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 has been submitted to the GenBank database under accession number LNRP00000000. Our findings may provide a theoretical basis for the subsequent development of new biotechnology to remediate environmental chromium pollution.
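The two summary statistics quoted above, genome size and G+C content, are simple to compute from an assembly. The following is a minimal sketch assuming FASTA-formatted text as input; the function name and the toy sequence are hypothetical, and this is not the pipeline used in the study. Ambiguous bases such as N count toward the length but are excluded from the G+C denominator, which is one common convention among several.

```python
from typing import Tuple


def genome_size_and_gc(fasta_text: str) -> Tuple[int, float]:
    """Return (total length in bp, G+C percentage) for FASTA-formatted text.

    Header lines (starting with '>') are skipped. G+C is reported as a
    percentage of unambiguous A/C/G/T bases only.
    """
    parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    seq = "".join(parts)
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return len(seq), gc_percent


if __name__ == "__main__":
    toy = ">contig1\nGGCC\nAATT\n>contig2\nGCNN\n"
    print(genome_size_and_gc(toy))
```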
Demirci, Selami; Uchida, Naoya; Tisdale, John F
Sickle cell disease (SCD) is one of the most common life-threatening monogenic diseases affecting millions of people worldwide. Allogeneic hematopoietic stem cell transplantation is the only known cure for the disease with high success rates, but the limited availability of matched sibling donors and the high risk of transplantation-related side effects force the scientific community to envision additional therapies. Ex vivo gene therapy through globin gene addition has been investigated extensively and is currently being tested in clinical trials that have begun reporting encouraging data. Recent improvements in our understanding of the molecular pathways controlling mammalian erythropoiesis and globin switching offer new and exciting therapeutic options. Rapid and substantial advances in genome engineering tools, particularly CRISPR/Cas9, have raised the possibility of genetic correction in induced pluripotent stem cells as well as patient-derived hematopoietic stem and progenitor cells. However, these techniques are still in their infancy, and safety/efficacy issues remain that must be addressed before translating these promising techniques into clinical practice. Published by Elsevier Inc.
Seah, Yu Fen Samantha; EL Farran, Chadi A.; Warrier, Tushar; Xu, Jian; Loh, Yuin-Han
Embryonic stem cells (ESCs) are chiefly characterized by their ability to self-renew and to differentiate into any cell type derived from the three main germ layers. It was demonstrated that somatic cells could be reprogrammed to form induced pluripotent stem cells (iPSCs) via various strategies. Gene editing is a technique that can be used to make targeted changes in the genome, and the efficiency of this process has been significantly enhanced by recent advancements. The use of engineered endonucleases, such as homing endonucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and Cas9 of the CRISPR system, has significantly enhanced the efficiency of gene editing. The combination of somatic cell reprogramming with gene editing enables us to model human diseases in vitro, in a manner considered superior to animal disease models. In this review, we discuss the various strategies of reprogramming and gene targeting with an emphasis on the current advancements and challenges of using these techniques to model human diseases. PMID:26633382
Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A
Vitaceae is well known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument. Plastomes were assembled using a combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data, especially from chloroplast and mitochondrial DNAs.
Periwal, Vinita; Scaria, Vinod
Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most SVs, such as inversions, deletions and translocations, have been studied largely in the context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have a profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into the organization of bacterial genomes at a much better resolution. SVs can confer changes in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much finer resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss their potential applicability in the emerging fields of synthetic biology and genome engineering, where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved.
Scala, Valeria; Grottoli, Alessandro; Aiese Cigliano, Riccardo; Anzar, Irantzu; Beccaccioli, Marzia; Fanelli, Corrado; Dall'Asta, Chiara; Battilani, Paola; Reverberi, Massimo; Sanseverino, Walter
Fusarium verticillioides causes ear rot disease in maize and its contamination with fumonisins, mycotoxins harmful for humans and livestock. Lipids, and their oxidized forms, may drive the fate of this disease. In a previous study, we explored the role of oxylipins in this interaction by deleting, by standard transformation procedures, a linoleate diol synthase-coding gene, lds1, in F. verticillioides. A profound phenotypic diversity in the mutants generated has prompted us to investigate more deeply the whole genome of two lds1-deleted strains. Bioinformatics analyses pinpoint significant differences in genome sequence between the wild type and the lds1 mutants beyond those trivially attributable to the deletion of the lds1 locus, such as single nucleotide polymorphisms, small deletion/insertion polymorphisms and structural variations. Results suggest that the effect of a (theoretically) punctual transformation event might have enhanced the natural mechanisms of genomic variability and that transformation practices, commonly used in the reverse genetics of fungi, may potentially be responsible for unexpected, stochastic and hence off-target rearrangements throughout the genome.
Adomas Aleksandra B
Abstract Background Complementary approaches to assaying global gene expression are needed to assess gene expression in regions that are poorly assayed by current methodologies. A key component of nearly all gene expression assays is the reverse transcription of transcribed sequences that has traditionally been performed by priming the poly-A tails on many of the transcribed genes in eukaryotes with oligo-dT, or by priming RNA indiscriminately with random hexamers. We designed an algorithm to find common sequence motifs that were present within most protein-coding genes of Saccharomyces cerevisiae and of Neurospora crassa, but that were not present within their ribosomal RNA or transfer RNA genes. We then experimentally tested whether degenerately priming these motifs with multi-targeted primers improved the accuracy and completeness of transcriptomic assays. Results We discovered two multi-targeted primers that would prime a preponderance of genes in the genomes of Saccharomyces cerevisiae and Neurospora crassa while avoiding priming ribosomal RNA or transfer RNA. Examining the response of Saccharomyces cerevisiae to nitrogen deficiency and profiling Neurospora crassa early sexual development, we demonstrated that using multi-targeted primers in reverse transcription led to superior performance of microarray profiling and next-generation RNA tag sequencing. Priming with multi-targeted primers in addition to oligo-dT resulted in higher sensitivity, a larger number of well-measured genes and greater power to detect differences in gene expression. Conclusions Our results provide the most complete and detailed expression profiles of the yeast nitrogen starvation response and N. crassa early sexual development to date. Furthermore, our multi-targeting priming methodology for genome-wide gene expression assays provides selective targeting of multiple sequences and counter-selection against undesirable sequences, facilitating a more complete and
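The motif search described above, finding sequences common to most protein-coding genes but absent from rRNA and tRNA genes, can be approximated with a simple k-mer filter. This is a minimal sketch under stated assumptions and not the authors' algorithm: it considers only exact k-mers (the study used degenerate motifs), and the function name, the choice of k, the coverage threshold and the toy sequences are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def candidate_priming_motifs(
    coding_seqs: Sequence[str],
    excluded_seqs: Sequence[str],
    k: int = 6,
    min_fraction: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank exact k-mers by the fraction of coding sequences containing them.

    Any k-mer that occurs in `excluded_seqs` (standing in for rRNA/tRNA
    genes) is discarded. Returns (k-mer, fraction) pairs, best first.
    """
    excluded = set()
    for seq in excluded_seqs:
        excluded.update(seq[i:i + k] for i in range(len(seq) - k + 1))

    gene_counts: Counter = Counter()
    for seq in coding_seqs:
        # A set ensures each gene is counted once per k-mer.
        gene_counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})

    n_genes = len(coding_seqs)
    hits = [
        (kmer, count / n_genes)
        for kmer, count in gene_counts.items()
        if kmer not in excluded and count / n_genes >= min_fraction
    ]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    coding = ["AAACCCGGG", "TTACCCGTT", "GGACCCGAA"]
    rrna_like = ["TTACCC"]
    print(candidate_priming_motifs(coding, rrna_like, k=4, min_fraction=1.0))
```

A real primer design would also need to handle strand orientation, degeneracy and annealing properties, none of which this sketch attempts.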
Extracellular electron transfer (EET) is recognized as a key biochemical process in circumneutral pH Fe(II)-oxidizing bacteria (FeOB). In this study, we searched for candidate EET genes in 73 neutrophilic FeOB genomes, among which 43 genomes are complete or close-to-complete and the rest have estimated genome completeness ranging from 5 to 91%. These neutrophilic FeOB span members of the microaerophilic, anaerobic phototrophic, and anaerobic nitrate-reducing FeOB groups. We found that many microaerophilic and several anaerobic FeOB possess homologs of Cyc2, an outer membrane cytochrome c originally identified in Acidithiobacillus ferrooxidans. The “porin-cytochrome c complex” (PCC) gene clusters homologous to MtoAB/PioAB are present in eight FeOB, accounting for 19% of complete and close-to-complete genomes examined, whereas PCC genes homologous to OmbB-OmaB-OmcB in Geobacter sulfurreducens are absent. Further, we discovered gene clusters that may potentially encode two novel PCC types. First, a cluster (tentatively named “PCC3”) encodes a porin, an extracellular and a periplasmic cytochrome c with remarkably large numbers of heme-binding motifs. Second, a cluster (tentatively named “PCC4”) encodes a porin and three periplasmic multiheme cytochromes c. A conserved inner membrane protein (IMP) encoded in PCC3 and PCC4 gene clusters might be responsible for translocating electrons across the inner membrane. Other bacteria possessing PCC3 and PCC4 are mostly Proteobacteria isolated from environments with a potential niche for Fe(II) oxidation. In addition to cytochrome c, multicopper oxidase (MCO) genes potentially involved in Fe(II) oxidation were also identified. Notably, candidate EET genes were not found in some FeOB, especially the anaerobic ones, probably suggesting EET genes or Fe(II) oxidation mechanisms are different from the searched models. Overall, based on current EET models, the search extends our understanding of bacterial EET and
Coleman, J.J.; Rounsley, S.D.; Rodriguez-Carres, M.; Kuo, A.; Wasmann, C.c.; Grimwood, J.; Schmutz, J.; Taga, M.; White, G.J.; Zhuo, S.; Schwartz, D.C.; Freitag, M.; Ma, L.-J.; Danchin, E.G.J.; Henrissat, B.; Cutinho, P.M.; Nelson, D.R.; Straney, D.; Napoli, C.A.; Baker, B.M.; Gribskov, M.; Rep, M.; Kroken, S.; Molnar, I.; Rensing, C.; Kennell, J.C.; Zamora, J.; Farman, M.L.; Selker, E.U.; Salamov, A.; Shapiro, H.; Pangilinan, J.; Lindquist, E.; Lamers, C.; Grigoriev, I.V.; Geiser, D.M.; Covert, S.F.; Temporini, S.; VanEtten, H.D.
The ascomycetous fungus Nectria haematococca (asexual name Fusarium solani) is a member of a group of >50 species known as the "Fusarium solani species complex". Members of this complex have diverse biological properties including the ability to cause disease on >100 genera of plants and opportunistic infections in humans. The current research analyzed the most extensively studied member of this complex, N. haematococca mating population VI (MPVI). Several genes controlling the ability of individual isolates of this species to colonize specific habitats are located on supernumerary chromosomes. Optical mapping revealed that the sequenced isolate has 17 chromosomes ranging from 530 kb to 6.52 Mb and that the physical size of the genome, 54.43 Mb, and the number of predicted genes, 15,707, are among the largest reported for ascomycetes. Two classes of genes have contributed to gene expansion: specific genes that are not found in other fungi including its closest sequenced relative, Fusarium graminearum; and genes that commonly occur as single copies in other fungi but are present as multiple copies in N. haematococca MPVI. Some of these additional genes appear to have resulted from gene duplication events, while others may have been acquired through horizontal gene transfer. The supernumerary nature of three chromosomes, 14, 15, and 17, was confirmed by their absence in pulsed field gel electrophoresis experiments of some isolates and by demonstrating that these isolates lacked chromosome-specific sequences found on the ends of these chromosomes. These supernumerary chromosomes contain more repeat sequences, are enriched in unique and duplicated genes, and have a lower G+C content in comparison to the other chromosomes. Although the origin(s) of the extra genes and the supernumerary chromosomes is not known, the gene expansion and its large genome size are consistent with this species' diverse range of habitats. Furthermore, the presence of unique genes on
Steven W Cole
A growing literature in human social genomics has begun to analyze how everyday life circumstances influence human gene expression. Social-environmental conditions such as urbanity, low socioeconomic status, social isolation, social threat, and low or unstable social status have been found to associate with differential expression of hundreds of gene transcripts in leukocytes and diseased tissues such as metastatic cancers. In leukocytes, diverse types of social adversity evoke a common conserved transcriptional response to adversity (CTRA) characterized by increased expression of proinflammatory genes and decreased expression of genes involved in innate antiviral responses and antibody synthesis. Mechanistic analyses have mapped the neural "social signal transduction" pathways that stimulate CTRA gene expression in response to social threat and may contribute to social gradients in health. Research has also begun to analyze the functional genomics of optimal health and thriving. Two emerging opportunities now stand to revolutionize our understanding of the everyday life of the human genome: network genomics analyses examining how systems-level capabilities emerge from groups of individual socially sensitive genomes and near-real-time transcriptional biofeedback to empirically optimize individual well-being in the context of the unique genetic, geographic, historical, developmental, and social contexts that jointly shape the transcriptional realization of our innate human genomic potential for thriving.
Leila do Nascimento Vieira
BACKGROUND: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, as was previously reported for the plastid genomes of another Podocarpaceae (Nageia nagi) and an Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in Podocarpaceae cp genomes. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of
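The SSR counts reported above, broken down by repeat unit length from homo- to hexapolymers, come from the kind of scan sketched below. This is an illustrative regular-expression approach only: the function name and the minimum repeat thresholds are assumptions, since the abstract does not state which tool or cut-offs were used, and matches are non-overlapping within each unit length, so an SSR that begins inside an earlier match of the same unit length can be missed.

```python
import re
from typing import Dict, List, Optional, Tuple


def find_ssrs(
    seq: str, min_repeats: Optional[Dict[int, int]] = None
) -> List[Tuple[int, str, int]]:
    """Return (start, unit, n_repeats) for simple sequence repeats in `seq`.

    `min_repeats` maps unit length (1 to 6) to the minimum number of tandem
    copies required. The defaults are hypothetical thresholds.
    """
    if min_repeats is None:
        min_repeats = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}
    seq = seq.upper()
    found = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter unit,
            # e.g. "AA" (homopolymer) or "ATAT" (dinucleotide).
            if any(
                unit == unit[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            found.append((match.start(), unit, len(match.group(0)) // unit_len))
    return found


if __name__ == "__main__":
    toy = "GG" + "A" * 10 + "CGC" + "AT" * 5 + "GCA"
    for start, unit, n in find_ssrs(toy):
        print(start, unit, n)
```

Grouping the results by `len(unit)` gives the per-class tallies (homo-, di-, tri- and so on) of the kind summarized in the abstract.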
DAVIS J M
Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.
Background: The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally. Results: Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent. Conclusion: The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed.
Bryant Susan V
Background: The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: c-value = 32 × 10^9 bp) were isolated and sequenced to characterize the structure of genic regions. Results: Annotation of genes within BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage specific genes in the axolotl genome than the human genome, but the great majority (86%) of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be incredibly large, approximately 2.8 gigabases! Conclusion: This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have incredibly long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes that are unique to salamanders.
Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J
Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and
Lau, Cia-Hin; Suh, Yousin
The recent advent of genome and epigenome editing technologies has provided a new paradigm in which the landscape of the human genome and epigenome can be precisely manipulated in their native context. Genome and epigenome editing technologies can be applied to many aspects of aging research and offer the potential to develop novel therapeutics against age-related diseases. Here, we discuss the latest technological advances in the CRISPR-based genome and epigenome editing toolbox, and provide insight into how these synthetic biology tools could facilitate aging research by establishing in vitro cell and in vivo animal models to dissect genetic and epigenetic mechanisms underlying aging and age-related diseases. We discuss recent developments in the field with the aims to precisely modulate gene expression and dynamic epigenetic landscapes in a spatial and temporal manner in cellular and animal models, by complementing the CRISPR-based editing capability with conditional genetic manipulation tools including chemically inducible expression systems, optogenetics, logic gate genetic circuits, tissue-specific promoters, and the serotype-specific adeno-associated virus. We also discuss how the combined use of genome and epigenome editing tools permits investigators to uncover novel molecular pathways involved in the pathophysiology and etiology conferred by risk variants associated with aging and aging-related disease. A better understanding of the genetic and epigenetic regulatory mechanisms underlying human aging and age-related disease will significantly contribute to the developments of new therapeutic interventions for extending health span and life span, ultimately improving the quality of life in the elderly populations. © 2016 S. Karger AG, Basel.
Joanna L Davies
Genome-wide association study (GWAS) data on a disease are increasingly available from multiple related populations. In this scenario, meta-analyses can improve power to detect homogeneous genetic associations, but if there exist ancestry-specific effects, via interactions on genetic background or with a causal effect that co-varies with genetic background, then these will typically be obscured. To address this issue, we have developed a robust statistical method for detecting susceptibility gene-ancestry interactions in multi-cohort GWAS based on closely-related populations. We use the leading principal components of the empirical genotype matrix to cluster individuals into "ancestry groups" and then look for evidence of heterogeneous genetic associations with disease or other trait across these clusters. Robustness is improved when there are multiple cohorts, as the signal from true gene-ancestry interactions can then be distinguished from gene-collection artefacts by comparing the observed interaction effect sizes in collection groups relative to ancestry groups. When applied to colorectal cancer, we identified a missense polymorphism in iron-absorption gene CYBRD1 that associated with disease in individuals of English, but not Scottish, ancestry. The association replicated in two additional, independently-collected data sets. Our method can be used to detect associations between genetic variants and disease that have been obscured by population genetic heterogeneity. It can be readily extended to the identification of genetic interactions on other covariates such as measured environmental exposures. We envisage our methodology being of particular interest to researchers with existing GWAS data, as ancestry groups can be easily defined and thus tested for interactions.
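The abstract above does not spell out its test statistic, so the following is only a rough sketch of the general idea of checking whether an association differs across ancestry groups. It assumes individuals have already been assigned to groups (for example by clustering on leading principal components) and uses Cochran's Q on per-group log odds ratios as a stand-in for the authors' method. The function names and the carrier counts are hypothetical and purely illustrative.

```python
import math


def log_odds_ratio(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Log odds ratio and its standard error for one 2x2 table.

    A 0.5 continuity correction (Haldane-Anscombe) is added to every cell
    so that empty cells do not cause a division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (case_carriers, case_noncarriers,
                                    ctrl_carriers, ctrl_noncarriers))
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return beta, se


def cochran_q(effects):
    """Cochran's Q heterogeneity statistic for a list of (beta, se) pairs.

    Under homogeneity Q is approximately chi-square distributed with
    len(effects) - 1 degrees of freedom.
    """
    weights = [1 / se ** 2 for _, se in effects]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (beta - pooled) ** 2 for w, (beta, _) in zip(weights, effects))
    return q, len(effects) - 1


# Made-up counts for two hypothetical ancestry groups (not data from the study).
group_a = log_odds_ratio(70, 30, 40, 60)   # variant enriched in cases
group_b = log_odds_ratio(50, 50, 50, 50)   # no association
q_stat, dof = cochran_q([group_a, group_b])
print(f"Q = {q_stat:.2f} on {dof} degree(s) of freedom")
```

A large Q relative to the chi-square reference distribution suggests that the effect differs between groups. Separating true gene-ancestry interactions from collection artefacts, as the abstract describes, would need the same comparison repeated over collection groups.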
Tarbell, John M; Shi, Zhong-Dong; Dunn, Jessilyn; Jo, Hanjoong
This review places modern research developments in vascular mechanobiology in the context of hemodynamic phenomena in the cardiovascular system and the discrete localization of vascular disease. The modern origins of this field are traced, beginning in the 1960s when associations between flow characteristics, particularly blood flow-induced wall shear stress, and the localization of atherosclerotic plaques were uncovered, and continuing to fluid shear stress effects on the vascular lining (endothelial) cells (ECs), including their effects on EC morphology, biochemical production, and gene expression. The earliest single-gene studies and genome-wide analyses are considered. The final section moves from the ECs lining the vessel wall to the smooth muscle cells and fibroblasts within the wall that are fluid mechanically activated by interstitial flow that imposes shear stresses on their surfaces comparable with those of flowing blood on EC surfaces. Interstitial flow stimulates biochemical production and gene expression, much like blood flow on ECs.
Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J
Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.
Strebel, K.; Beck, E.; Strohmaier, K.; Schaller, H.
Defined segments of the cloned foot-and-mouth disease virus genome corresponding to all parts of the coding region were expressed in Escherichia coli as fusions to the N-terminal part of the MS2-polymerase gene under the control of the inducible λPL promoter. All constructs yielded large amounts of proteins, which were purified and used to raise sequence-specific antisera in rabbits. These antisera were used to identify the corresponding viral gene products in 35S-labeled extracts from foot-and-mouth disease virus-infected BHK cells. This allowed us to locate unequivocally all mature foot-and-mouth disease virus gene products in the nucleotide sequence, to identify precursor-product relationships, and to detect several foot-and-mouth disease virus gene products not previously identified in vivo or in vitro.
Ji, Yan; Shi, Yixiang; Wang, Chuan; Dai, Jianliang; Li, Yixue
The human gut microbial ecosystem (HGME) exerts an important influence on human health. In recent research, meta-genomics has provided deep insights into the HGME in terms of gene contents, metabolic processes and genome constitutions of the meta-genome. Here we present a novel methodology to investigate the HGME on the basis of a set of functionally coupled genes, regardless of their genome origins, by considering the co-evolution properties of genes. By analyzing these coupled genes, we showed that some basic properties of the HGME are significantly associated with each other, and further constructed a protein interaction map of the human gut meta-genome to discover some functional modules that may relate to essential metabolic processes. Compared with other studies, our method provides a new way to extract basic functional elements from meta-genome systems and to investigate complex microbial environments by associating their biological traits with the co-evolutionary fingerprints encoded in them.
Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun
Background: A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic materials between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. Results: From the pa...
Based on Darwin’s concept of the tree of life, vertical inheritance was thought to be dominant, and mutations, deletions and duplication were streaming the genomes of living organisms. In the current genomic era, increasing data indicated that both vertical and lateral gene inheritance interact in space and time to trigger genome evolution, particularly among microorganisms sharing a given ecological niche. As a paradigm to their diversity and their survival in a variety of cell types, intracellular microorganisms, and notably intracellular bacteria, were considered as less prone to lateral genetic exchanges. Such specialized microorganisms generally have a smaller gene repertoire because they do rely on their host’s factors for some basic regulatory and metabolic functions. Here we review events of lateral gene transfer (LGT) that illustrate the genetic exchanges among intra-amoebal microorganisms or between the microorganism and its amoebal host. We tentatively investigate the functions of laterally transferred genes in the light of the interaction with their host as they should confer a selective advantage and success to the amoeba-resisting microorganisms.
The lack of knowledge about the earliest events in disease development is due to the multi-factorial nature of disease risk. This information gap is the consequence of the lack of appreciation for the fact that most diseases arise from the complex interactions between genes and the environment as a function of the age or stage of development of the individual. Whether an environmental exposure causes illness or not is dependent on the efficiency of the so-called "environmental response machinery" (i.e., the complex of metabolic pathways that can modulate response to environmental perturbations) that one has inherited. Thus, elucidating the causes of most chronic diseases will require an understanding of both the genetic and environmental contribution to their etiology. Unfortunately, the exploration of the relationship between genes and the environment has been hampered in the past by the limited knowledge of the human genome, and by the inclination of scientists to study disease development using experimental models that consider exposure to a single environmental agent. Rarely in the past were interactions between multiple genes or between genes and environmental agents considered in studies of human disease etiology. The most critical issue is how to relate exposure-disease association studies to pathways and mechanisms. To understand how genes and environmental factors interact to perturb biological pathways to cause injury or disease, scientists will need tools with the capacity to monitor the global expression of thousands of genes, proteins and metabolites simultaneously. The generation of such data in multiple species can be used to identify conserved and functionally significant genes and pathways involved in gene-environment interactions. Ultimately, it is this knowledge that will be used to guide agencies such as the U.S. Department of Health and Human Services in decisions regarding biomedical research funding
Richards, Stephen; Liu, Yue; Bettencourt, Brian R.
years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences......We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each...... between the species-but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence...
Adam Alexander Thil Smith
Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
Ling, Jian; Jiang, Weijie; Zhang, Ying; Yu, Hongjun; Mao, Zhenchuan; Gu, Xingfang; Huang, Sanwen; Xie, Bingyan
WRKY proteins are a large family of transcriptional regulators in higher plants. They are involved in many biological processes, such as plant development, metabolism, and responses to biotic and abiotic stresses. Prior to the present study, only one full-length cucumber WRKY protein had been reported. The recent publication of the draft genome sequence of cucumber allowed us to conduct a genome-wide search for cucumber WRKY proteins, and to compare these positively identified proteins with their homologs in model plants, such as Arabidopsis. We identified a total of 55 WRKY genes in the cucumber genome. According to structural features of their encoded proteins, the cucumber WRKY (CsWRKY) genes were classified into three groups (group 1-3). Analysis of expression profiles of CsWRKY genes indicated that 48 WRKY genes display differential expression either in their transcript abundance or in their expression patterns under normal growth conditions, and 23 WRKY genes were differentially expressed in response to at least one abiotic stress (cold, drought or salinity). The expression profiles of stress-inducible CsWRKY genes were correlated with those of their putative Arabidopsis WRKY (AtWRKY) orthologs, except for the group 3 WRKY genes. Interestingly, duplicated group 3 AtWRKY genes appear to have been under positive selection pressure during evolution. In contrast, there was no evidence of recent gene duplication or positive selection pressure among CsWRKY group 3 genes, which may have led to the expressional divergence of group 3 orthologs. Fifty-five WRKY genes were identified in cucumber and the structure of their encoded proteins, their expression, and their evolution were examined. Considering that there has been extensive expansion of group 3 WRKY genes in angiosperms, the occurrence of different evolutionary events could explain the functional divergence of these genes.
de Graaf Dirk C
Background: As scientists continue to pursue various 'omics-based research, there is a need for high quality data for the most fundamental 'omics of all: genomics. The bacterium Paenibacillus larvae is the causative agent of the honey bee disease American foulbrood. If untreated, it can lead to the demise of an entire hive; the highly social nature of bees also leads to easy disease spread, between both individuals and colonies. Biologists have studied this organism since the early 1900s, and a century later, the molecular mechanism of infection remains elusive. Transcriptomics and proteomics, because of their ability to analyze multiple genes and proteins in a high-throughput manner, may be very helpful to its study. However, the power of these methodologies is severely limited without a complete genome; we undertake to address that deficiency here. Results: We used the Illumina GAIIx platform and conventional Sanger sequencing to generate a 182-fold sequence coverage of the P. larvae genome, and assembled the data using ABySS into a total of 388 contigs spanning 4.5 Mbp. Comparative genomics analysis against fully-sequenced soil bacteria P. JDR2 and P. vortex showed that regions of poor conservation may contain putative virulence factors. We used GLIMMER to predict 3568 gene models, and named them based on homology revealed by BLAST searches; proteases, hemolytic factors, toxins, and antibiotic resistance enzymes were identified in this way. Finally, mass spectrometry was used to provide experimental evidence that at least 35% of the genes are expressed at the protein level. Conclusions: This update on the genome of P. larvae and annotation represents an immense advancement from what we had previously known about this species. We provide here a reliable resource that can be used to elucidate the mechanism of infection, and by extension, more effective methods to control and cure this widespread honey bee disease.
Chen, Xinwei; Hedley, P.E.; Morris, J.; Liu, Hui; Niks, R.E.; Waugh, R.
Positional gene isolation in unsequenced species generally requires either a reference genome sequence or an inference of gene content based on conservation of synteny with a genomic model. In the large unsequenced genomes of the Triticeae cereals the latter, i.e. conservation of synteny with the
Cross-species translation of genomic information may play a pivotal role in applying biological knowledge gained from a relatively simple model system to other less studied, but related, genomes. The information of abiotic stress (ABS)-responsive genes in Arabidopsis was identified and translated into the legume model system, Medicago truncatula. Various data resources, such as TAIR/AtGI DB, expression profiles and literature, were used to build a genome-wide list of ABS genes. tBlastX/BlastP similarity search tools and manual inspection of alignments were used to identify orthologous genes between the two genomes. A total of 1,377 genes were finally collected and classified into 18 functional criteria of gene ontology (GO). The data analysis according to the expression cues showed that there was a substantial level of interaction among three major types (i.e., drought, salinity and cold stress) of abiotic stresses. In an attempt to translate the ABS genes between these two species, genomic locations for each gene were mapped using an in-house-developed comparative analysis platform. The comparative analysis revealed that fragmental colinearity, represented by only 37 synteny blocks, existed between Arabidopsis and M. truncatula. Based on the combination of E-value and alignment remarks, the estimated translation rate was 60.2% for this cross-family translation. As a prelude to functional comparative genomic approaches, in-silico gene network/interactome analyses were conducted to predict key components in the ABS responses, and one of the sub-networks was integrated with the corresponding comparative map. The results demonstrated that core members of the sub-network were well aligned with previously reported ABS regulatory networks. Taken together, the results indicate that network-based integrative approaches of comparative and functional genomics are important to interpret and translate genomic information for complex traits such as abiotic stresses.
Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David
://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence......The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...... genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including...
Table of contents. India, Genomic diversity & Disease susceptibility · India, a paradise for Genetic Studies · Involved in earlier stages of Immune response protecting us from Diseases, Responsible for kidney and other transplant rejections Inherited from our parents.
Genome-wide association studies (GWAS) have successfully identified a number of single-nucleotide polymorphisms (SNPs) associated with colorectal cancer (CRC) risk. However, these susceptibility loci known today explain only a small fraction of the genetic risk. Gene-gene interaction (GxG) is considered to be one source of the missing heritability. To address this, we performed a genome-wide search for pair-wise GxG associated with CRC risk using 8,380 cases and 10,558 controls in the discovery phase and 2,527 cases and 2,658 controls in the replication phase. We developed a simple, but powerful method for testing interaction, which we term the Average Risk Due to Interaction (ARDI). With this method, we conducted a genome-wide search to identify SNPs showing evidence for GxG with previously identified CRC susceptibility loci from 14 independent regions. We also conducted a genome-wide search for GxG using the marginal association screening and examining interaction among SNPs that pass the screening threshold (p < 10^-4). For the known locus rs10795668 (10p14), we found an interacting SNP rs367615 (5q21) with replication p = 0.01 and combined p = 4.19×10^-8. Among the top marginal SNPs after LD pruning (n = 163), we identified an interaction between rs1571218 (20p12.3) and rs10879357 (12q21.1) (nominal combined p = 2.51×10^-6; Bonferroni adjusted p = 0.03). Our study represents the first comprehensive search for GxG in CRC, and our results may provide new insight into the genetic etiology of CRC.
Singh, Himanshu Narayan; Rajeswari, Moganty R
Purine repeat sequences present in a gene are unique as they have a high propensity to form unusual DNA-triple helix structures. Friedreich's ataxia is the only human disease that is well known to be associated with DNA-triplexes formed by purine repeats. The purpose of this study was to recognize the expanded purine repeats (EPRs) in the human genome and find their correlation with cancer pathogenesis. We developed the "PuRepeatFinder.pl" algorithm to identify non-overlapping EPRs without pyrimidine interruptions in the human genome and customized it for searching repeat lengths n ≥ 200. A total of 1158 EPRs were identified in the genome, which followed a Wakeby distribution. Two hundred and ninety-six EPRs were found in genic regions of 282 genes (EPR-genes). Gene clustering of EPR-genes was done based on their cellular function, and a large number of EPR-genes were found to be enzymes/enzyme modulators. Meta-analysis of the 282 EPR-genes identified only 63 EPR-genes in association with cancer, mostly in breast, lung, and blood cancers. Protein-protein interaction network analysis of all 282 EPR-genes identified proteins including those in cadherins and VEGF. Two observations, that EPRs can induce mutations under malignant conditions and that some EPR-gene products participate in vital cell signaling-mediated pathways, together suggest a crucial role of EPRs in carcinogenesis. The new link between EPR-genes and their functionally interacting proteins adds a new dimension to the present understanding of cancer pathogenesis and can help in planning therapeutic strategies. Validation of the present results using techniques like NGS is required to establish the role of the EPR genes in cancer pathology.
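The search criterion described above (non-overlapping purine runs with no pyrimidine interruption and a minimum length) can be sketched with a regular expression. This is an illustrative Python sketch, not the authors' PuRepeatFinder.pl; the function name and the toy sequence are hypothetical.

```python
import re

def find_purine_runs(seq: str, min_len: int = 200):
    """Return (start, end) spans of uninterrupted A/G runs of at least min_len bases."""
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    # finditer yields maximal, non-overlapping matches from left to right.
    return [m.span() for m in pattern.finditer(seq.upper())]

# Toy sequence: a 12-base purine run flanked by pyrimidines, then a short 3-base run.
toy = "CCT" + "AGGAAGGAGAAG" + "TTC" + "AGA"
print(find_purine_runs(toy, min_len=10))
```

The sketch scans a single strand only; a purine run on the opposite strand appears as a pyrimidine run here and would need a second pass with `[CT]`.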
Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard; Hertweck, Christian; Linde, Jörg
Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies.
BACKGROUND: The concept of ribosomal constraints on rRNA genes is deduced primarily based on the comparison of consensus rRNA sequences between closely related species, but recent advances in whole-genome sequencing allow evaluation of this concept within organisms with multiple rRNA operons. METHODOLOGY/PRINCIPAL FINDINGS: Using the 23S rRNA gene as an example, we analyzed the diversity among individual rRNA genes within a genome. Of 184 prokaryotic species containing multiple 23S rRNA genes, diversity was observed in 113 (61.4%) genomes (mean 0.40%, range 0.01%-4.04%). Significant (1.17%-4.04%) intragenomic variation was found in 8 species. In 5 of the 8 species, the diversity in the primary structure had only minimal effect on the secondary structure (stem versus loop transition). In the remaining 3 species, the diversity significantly altered local secondary structure, but the alteration appears minimized through complex rearrangement. Intervening sequences (IVS), ranging between 9 and 1471 nt in size, were found in 7 species. IVS in Deinococcus radiodurans and Nostoc sp. encode transposases. T. tengcongensis was the only species in which intragenomic diversity >3% was observed among 4 paralogous 23S rRNA genes. CONCLUSIONS/SIGNIFICANCE: These findings indicate tight ribosomal constraints on individual 23S rRNA genes within a genome. Although classification using primary 23S rRNA sequences could be erroneous, significant diversity among paralogous 23S rRNA genes was observed only once in the 184 species analyzed, indicating little overall impact on the mainstream of 23S rRNA gene-based prokaryotic taxonomy.
Neighboring genes in the eukaryotic genome have a tendency to express concurrently, and the proximity of two adjacent genes is often considered a possible explanation for their co-expression behavior. However, the actual contribution of the physical distance between two genes to their co-expression behavior has yet to be defined. To further investigate this issue, we studied the co-expression of neighboring genes in zebrafish, which has a compact genome and has experienced a whole genome duplication event. Our analysis shows that the proportion of highly co-expressed neighboring pairs (Pearson’s correlation coefficient R > 0.7) is low (0.24%-0.67%); however, it is still significantly higher than that of random pairs. In particular, the statistical result implies that the co-expression tendency of neighboring pairs is negatively correlated with their physical distance. Our findings therefore suggest that physical distance may play an important role in the co-expression of neighboring genes. Possible mechanisms related to the neighboring genes’ co-expression are also discussed.
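The proportion of highly co-expressed neighboring pairs can be computed along the following lines. This is a minimal sketch under stated assumptions: the expression profiles, gene names and helper functions are invented for illustration, and only the R > 0.7 cutoff comes from the abstract.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for a constant profile

def fraction_highly_coexpressed(pairs, expr, threshold=0.7):
    """Fraction of gene pairs whose correlation exceeds the threshold."""
    hits = sum(1 for g1, g2 in pairs if pearson(expr[g1], expr[g2]) > threshold)
    return hits / len(pairs)

# Hypothetical expression values across four conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.2, 7.9],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
neighbors = [("geneA", "geneB"), ("geneB", "geneC")]
print(fraction_highly_coexpressed(neighbors, expr))
```

Running the same function on randomly drawn gene pairs would give the background proportion against which the neighboring pairs are compared.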
Bouma, G; Baggen, J M; van Bodegraven, A A; Mulder, C J J; Kraal, G; Zwiers, A; Horrevoets, A J; van der Pouw Kraan, C T M
Crohn's disease (CD) is characterized by chronic inflammation of the gastrointestinal tract, as a result of aberrant activation of the innate immune system through TLR stimulation by bacterial products. The conventional immunosuppressive thiopurine derivatives (azathioprine and mercaptopurine) are used to treat CD. The effects of thiopurines on circulating immune cells and TLR responsiveness are unknown. To obtain a global view of affected gene expression of the immune system in CD patients and the treatment effect of thiopurine derivatives, we performed genome-wide transcriptome analysis on whole blood samples from 20 CD patients in remission, of which 10 patients received thiopurine treatment, compared to 16 healthy controls, before and after TLR4 stimulation with LPS. Several immune abnormalities were observed, including increased baseline interferon activity, while baseline expression of ribosomal genes was reduced. After LPS stimulation, CD patients showed reduced cytokine and chemokine expression. None of these effects were related to treatment. Strikingly, only one highly correlated set of 69 genes was affected by treatment, not influenced by LPS stimulation and consisted of genes reminiscent of effector cytotoxic NK cells. The most reduced cytotoxicity-related gene in CD was the cell surface marker CD160. Concordantly, we could demonstrate an in vivo reduction of circulating CD160(+)CD3(-)CD8(-) cells in CD patients after treatment with thiopurine derivatives in an independent cohort. In conclusion, using genome-wide profiling, we identified a disturbed immune activation status in peripheral blood cells from CD patients and a clear treatment effect of thiopurine derivatives selectively affecting effector cytotoxic CD160-positive cells. Copyright © 2013 Elsevier Ltd. All rights reserved.
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.
The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.
Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong
Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) progr...... is freely available on a web server at http://fgf.genomics.org.cn/...
Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.
Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning
Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.
Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remains lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these
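The pan-genome and core-genome notions used above reduce to a union and an intersection over per-strain gene sets once genes have been grouped into orthologous families. The sketch below assumes that grouping has already been done; strain and gene labels are placeholders, not data from the study.

```python
def pan_and_core(strain_genes):
    """Return (pan-genome, core-genome) for a mapping of strain -> gene families.

    Requires at least one strain.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)        # families present in any strain
    core = set.intersection(*gene_sets)  # families present in every strain
    return pan, core

# Hypothetical strains and gene-family labels.
strains = {
    "strain1": {"fam1", "fam2", "fam3", "fam4"},
    "strain2": {"fam1", "fam3", "fam4", "fam5"},
    "strain3": {"fam2", "fam3", "fam4"},
}
pan, core = pan_and_core(strains)
print(sorted(pan))
print(sorted(core))
```

Families found in the pan-genome of one genogroup but absent from the other correspond to the genogroup-exclusive proteins counted in the abstract; a set difference gives them directly.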
Liu, Yuan; Wei, Haichao
Soybean (Glycine max) is one of the most important crop plants. Wild and cultivated soybean varieties have significant differences worth further investigation, such as plant morphology, seed size, and seed coat development; these characters may be related to auxin biology. The PIN gene family encodes essential transport proteins in cell-to-cell auxin transport, but little research on soybean PIN genes (GmPIN genes) has been done, especially with respect to the evolution and differences between wild and cultivated soybean. In this study, we retrieved 23 GmPIN genes from the latest updated G. max genome database; six GmPIN protein sequences were changed compared with the previous database. Based on the Plant Genome Duplication Database, 18 GmPIN genes have been involved in segmental duplication. Three pairs of GmPIN genes arose after the second soybean genome duplication, and six occurred after the first genome duplication. The duplicated GmPIN genes retained similar expression patterns. All the duplicated GmPIN genes experienced purifying selection (Ka/Ks ...... genome sequence of 17 wild and 14 cultivated soybean varieties. Our research provides useful and comprehensive basic information for understanding GmPIN genes.
Lui, Julian C; Nilsson, Ola; Chan, Yingleong; Palmer, Cameron D; Andrade, Anenisia C; Hirschhorn, Joel N; Baron, Jeffrey
Previous meta-analysis of genome-wide association (GWA) studies has identified 180 loci that influence adult height. However, each GWA locus typically comprises a set of contiguous genes, only one of which presumably modulates height. We reasoned that many of the causative genes within these loci influence height because they are expressed in and function in the growth plate, a cartilaginous structure that causes bone elongation and thus determines stature. Therefore, we used expression microarray studies of mouse and rat growth plate, human disease databases and a mouse knockout phenotype database to identify genes within the GWAS loci that are likely required for normal growth plate function. Each of these approaches identified significantly more genes within the GWA height loci than at random genomic locations (P ...... analysis strongly implicates 78 genes in growth plate function, including multiple genes that participate in PTHrP-IHH, BMP and CNP signaling, and many genes that have not previously been implicated in the growth plate. Thus, this analysis reveals a large number of novel genes that regulate human growth plate chondrogenesis and thereby contribute to the normal variations in human adult height. The analytic approach developed for this study may be applied to GWA studies for other common polygenic traits and diseases, thus providing a new general strategy to identify causative genes within GWA loci and to translate genetic associations into mechanistic biological insights.
Jul 6, 2016 ... candidate genes for drought tolerance in sesame. (Sesamum ... Our results provided genomic resources for further functional analysis and genetic engineering .... reverse transcribed using the Reverse Transcription System.
Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat
In this study, bioinformatic analysis of the genome sequence of Enterobacter aerogenes was carried out to identify genes encoding gelatinase. E. aerogenes was isolated from hot spring water and produces gelatinase that is species-specific to porcine and fish gelatin. This bacterium therefore offers the possibility of producing enzymes specific to the gelatin of each of the two species. E. aerogenes was partially genome sequenced, resulting in a total sequence size of 5.0 mega base pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, with 67.5% of contigs over 1 kilo base pair (kbp), an N50 contig length of 124856 bp and a GC content of 55.17%. About 4705 protein-coding genes were identified by protein prediction analysis. The two candidate genes selected have the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were therefore designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
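The ORF and protein lengths quoted for the two candidate genes agree with each other if each ORF length is taken to include one terminal stop codon. The short sketch below makes that arithmetic explicit; the function name is illustrative, and the stop-codon convention is an assumption about how the lengths were reported.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons

for bp in (1029, 717):
    print(bp, orf_protein_length(bp))
```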
Chuang, Katherine; Fields, Mark A; Del Priore, Lucian V
The advent of gene editing has introduced the ability to make changes to the genome of cells, thus allowing for correction of genetic mutations in patients with monogenic diseases. Retinal diseases are particularly suitable for the application of this new technology because many retinal diseases, such as Stargardt disease, retinitis pigmentosa (RP), and Leber congenital amaurosis (LCA), are monogenic. Moreover, gene delivery techniques such as the use of adeno-associated virus (AAV) vectors have been optimized for intraocular use, and phase III trials are well underway to treat LCA, a severe form of inherited retinal degeneration, with gene therapy. This review focuses on the use of gene editing techniques and another relatively recent advent, induced pluripotent stem cells (iPSCs), and their potential for the study and treatment of retinal disease. Investment in these technologies, including overcoming challenges such as off-target mutations and low transplanted cell integration, may allow for future treatment of many debilitating inherited retinal diseases.
Shittu, Ismaila; Sharma, Poonam; Joannis, Tony M.; Volkening, Jeremy D.; Odaibo, Georgina N.; Olaleye, David O.; Williams-Coplin, Dawn; Solomon, Ponman; Abolnik, Celia; Miller, Patti J.; Dimitrov, Kiril M.; Afonso, Claudio L.
The first complete genome sequence of a strain of Newcastle disease virus (NDV) of genotype XVII is described here. A velogenic strain (duck/Nigeria/903/KUDU-113/1992) was isolated from an apparently healthy free-roaming domestic duck sampled in Kuru, Nigeria, in 1992. Phylogenetic analysis of the fusion protein gene and complete genome classified the isolate as a member of NDV class II, genotype XVII.
Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S
The PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse functions of PHB genes in plants for development, senescence, defence, and others. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried out to evaluate the relatedness of PHB genes across different species. In order to better guide the gene function analysis and understand the evolution of the PHB gene family, we therefore carried out a comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that the PHB genes are a relatively conserved gene family. The PHB genes can be classified into 5 classes, and each class has a very deep evolutionary origin. The PHB genes within each class maintained the same motif patterns during evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion. However, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions so that the duplicated genes are under evolutionary pressure to derive new functions. The PHB gene family is a conserved gene family and accounts for diverse but important biological functions based on similar molecular mechanisms. The highly diverse biological function indicated that more research needs to be carried out
Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not yet been characterized. Included in this category are small proteins (SPs, 30-150 amino acids) encoded by short open reading frames (sORFs). SPs play important roles in plant physiology, growth, and development. Unfortunately, protocols focused on the genome-wide identification and characterization of sORFs are scarce or remain poorly implemented. As a result, these genes are underrepresented in many genome annotations. In this work, we exploited publicly available genome sequences of Phaseolus vulgaris, Medicago truncatula, Glycine max and Lotus japonicus to analyze the abundance of annotated SPs in plant legumes. Our strategy to uncover bona fide sORFs at the genome level was centered in bioinformatics analysis of characteristics such as evidence of expression (transcription), presence of known protein regions or domains, and identification of orthologous genes in the genomes explored. We collected 6170, 10461, 30521, and 23599 putative sORFs from P. vulgaris, G. max, M. truncatula, and L. japonicus genomes, respectively. Expressed sequence tags (ESTs) available in the DFCI Gene Index database provided evidence that ~one-third of the predicted legume sORFs are expressed. Most potential SPs have a counterpart in a different plant species and counterpart regions or domains in larger proteins. Potential functional sORFs were also classified according to a reduced set of GO categories, and the expression of 13 of them during P. vulgaris nodule ontogeny was confirmed by qPCR. This analysis provides a collection of sORFs that potentially encode for meaningful SPs, and offers the possibility of their further functional evaluation.
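The size criterion for small proteins (30-150 amino acids) can be illustrated with a naive ORF scan. This sketch is not the authors' pipeline, which relied on expression evidence, known domains and orthology; it only shows the length filter, on the forward strand, with ATG starts and the standard stop codons assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_sorfs(seq, min_aa=30, max_aa=150):
    """Return (start, end, n_aa) for forward-strand ORFs encoding min_aa..max_aa residues.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon;
    end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop, Met included
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None
    return hits

# Hypothetical sequence: a 35-codon ORF (ATG plus 34 alanine codons) and a stop.
toy = "CC" + "ATG" + "GCT" * 34 + "TAA" + "G"
print(find_sorfs(toy))
```

A genome-scale version would also scan the reverse complement and then apply the evidence filters described in the abstract.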
Background: One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results: We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions: To our knowledge, this is the first effort to systematically identify and validate drivers for expression based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811
Spradling, A C; Stern, D M; Kiss, I; Roote, J; Laverty, T; Rubin, G M
Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and a...
Osuna-Cruz, Cristina M; Paytuvi-Gallart, Andreu; Di Donato, Antimo; Sundesha, Vicky; Andolfo, Giuseppe; Aiese Cigliano, Riccardo; Sanseverino, Walter; Ercolano, Maria R
The Plant Resistance Genes database (PRGdb; http://prgdb.org) has been redesigned with a new user interface, new sections, new tools and new data for genetic improvement, allowing easy access not only to the plant science research community but also to breeders who want to improve plant disease resistance. The home page offers an overview of easy-to-read search boxes that streamline data queries and directly show plant species for which data from candidate or cloned genes have been collected. Bulk data files and curated resistance gene annotations are made available for each plant species hosted. The new Gene Model view offers detailed information on each cloned resistance gene structure to highlight shared attributes with other genes. PRGdb 3.0 offers 153 reference resistance genes and 177 072 annotated candidate Pathogen Receptor Genes (PRGs). Compared to the previous release, the number of putative genes has been increased from 106 to 177 K from 76 sequenced Viridiplantae and algae genomes. The DRAGO 2 tool, which automatically annotates and predicts PRGs from DNA and amino acid sequences with high accuracy and sensitivity, has been added. BLAST search has been implemented to offer users the opportunity to annotate and compare their own sequences. The improved section on plant diseases displays useful information linked to genes and genomes to connect complementary data and better address specific needs. Through a revised and enlarged collection of data, the development of new tools and a renewed portal, PRGdb 3.0 engages the plant science community in developing a consensus plan to improve knowledge and strategies to fight diseases that afflict main crops and other plants. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Christine M Costello
BACKGROUND: The differential pathophysiologic mechanisms that trigger and maintain the two forms of inflammatory bowel disease (IBD), Crohn disease (CD) and ulcerative colitis (UC), are only partially understood. cDNA microarrays can be used to decipher gene regulation events at a genome-wide level and to identify novel unknown genes that might be involved in perpetuating inflammatory disease progression. METHODS AND FINDINGS: High-density cDNA microarrays representing 33,792 UniGene clusters were prepared. Biopsies were taken from the sigmoid colon of normal controls (n = 11), CD patients (n = 10) and UC patients (n = 10). 33P-radiolabeled cDNA from purified poly(A)+ RNA extracted from biopsies (unpooled) was hybridized to the arrays. We identified 500 and 272 transcripts differentially regulated in CD and UC, respectively. Interesting hits were independently verified by real-time PCR in a second sample of 100 individuals, and immunohistochemistry was used for exemplary localization. The main findings point to novel molecules important in abnormal immune regulation and the highly disturbed cell biology of colonic epithelial cells in IBD pathogenesis, e.g., CYLD (cylindromatosis, turban tumor syndrome) and CDH11 (cadherin 11, type 2). By the nature of the array setup, many of the genes identified were to our knowledge previously uncharacterized, and prediction of the putative function of a subsection of these genes indicates that some could be involved in early events in disease pathophysiology. CONCLUSION: A comprehensive set of candidate genes not previously associated with IBD was revealed, which underlines the polygenic and complex nature of the disease. It points out substantial differences in pathophysiology between CD and UC. The multiple unknown genes identified may stimulate new research in the fields of barrier mechanisms and cell signalling in the context of IBD, and ultimately new therapeutic approaches.
Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J
The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user-entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository of curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
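As a rough illustration of the "genes common to multiple gene sets" operation mentioned in this abstract, the following Python sketch counts how many sets each gene appears in. The function name, the gene identifiers and the set names are all hypothetical, and the code does not use or reproduce GeneWeaver's own tools or API.

```python
from collections import Counter


def genes_shared_by(gene_sets, min_sets=2):
    """Return the genes found in at least `min_sets` of the given sets, sorted by name."""
    # Convert each collection to a set so a gene is counted once per gene set.
    counts = Counter(gene for members in gene_sets.values() for gene in set(members))
    return sorted(gene for gene, n in counts.items() if n >= min_sets)


# Hypothetical gene sets, for illustration only (not real experimental data).
example_sets = {
    "qtl_candidates": {"GeneA", "GeneB", "GeneC"},
    "differentially_expressed": {"GeneB", "GeneC", "GeneD"},
    "coexpression_module": {"GeneC", "GeneE"},
}

print(genes_shared_by(example_sets))              # genes in two or more sets
print(genes_shared_by(example_sets, min_sets=3))  # genes in all three sets
```

Raising `min_sets` tightens the criterion from "shared by any two sets" towards "present in every set", which is one simple way to rank candidates by how much independent evidence supports them.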
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Kurbasic, Azra; Poveda, Alaitz; Chen, Yan; Agren, Asa; Engberg, Elisabeth; Hu, Frank B; Johansson, Ingegerd; Barroso, Ines; Brändström, Anders; Hallmans, Göran; Renström, Frida; Franks, Paul W
Most complex diseases have well-established genetic and non-genetic risk factors. In some instances, these risk factors are likely to interact, whereby their joint effects convey a level of risk that is either significantly more or less than the sum of these risks. Characterizing these gene-environment interactions may help elucidate the biology of complex diseases and guide strategies for their targeted prevention. In most cases, the detection of gene-environment interactions will require sample sizes in excess of those needed to detect the marginal effects of the genetic and environmental risk factors. Although many consortia have been formed, comprising multiple diverse cohorts to detect gene-environment interactions, few robust examples of such interactions have been discovered. This may be because combining data across studies, usually through meta-analysis of summary data from the contributing cohorts, is often a statistically inefficient approach for the detection of gene-environment interactions. Ideally, single, very large and well-genotyped prospective cohorts with validated measures of environmental risk factors and disease outcomes should be used to study interactions. The presence of strong founder effects within those cohorts might further strengthen the capacity to detect novel genetic effects and gene-environment interactions. Access to accurate genealogical data would also aid in studying the diploid nature of the human genome, such as genomic imprinting (parent-of-origin effects). Here we describe two studies from northern Sweden (the GLACIER and VIKING studies) that fulfill these characteristics.
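The idea that a joint effect can be "more or less than the sum" of the separate risks can be made concrete with the additive-scale interaction contrast commonly used in epidemiology. The sketch below is a minimal Python illustration; the function name and the risk values are hypothetical and are not taken from the GLACIER or VIKING studies.

```python
def interaction_contrast(r00, r10, r01, r11):
    """Additive-scale interaction between a genetic and an environmental factor.

    r00: risk with neither factor, r10: genetic factor only,
    r01: environmental factor only, r11: both factors present.
    Positive values mean the joint excess risk exceeds the sum of the
    separate excess risks; negative values mean it falls short of that sum.
    """
    joint_excess = r11 - r00
    sum_of_separate_excess = (r10 - r00) + (r01 - r00)
    return joint_excess - sum_of_separate_excess


# Hypothetical absolute risks, for illustration only.
ic = interaction_contrast(r00=0.02, r10=0.04, r01=0.05, r11=0.10)
print(f"interaction contrast = {ic:.3f}")
```

A contrast of zero corresponds to purely additive effects. Estimating a non-zero contrast precisely requires enough individuals in all four exposure groups, which is one reason interaction studies need larger samples than studies of marginal effects.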
Fares, Mario A; Sabater-Muñoz, Beatriz; Toft, Christina
Gene duplication generates new genetic material, which has been shown to lead to major innovations in unicellular and multicellular organisms. A whole-genome duplication occurred in the ancestor of Saccharomyces yeast species but 92% of duplicates returned to single-copy genes shortly after duplication. The persisting duplicated genes in Saccharomyces led to the origin of major metabolic innovations, which have been the source of the unique biotechnological capabilities of the baker's yeast Saccharomyces cerevisiae. What factors have determined the fate of duplicated genes remains unknown. Here, we report the first demonstration that the local genome mutation and transcription rates determine the fate of duplicates. We show, for the first time, a preferential location of duplicated genes in the mutational and transcriptional hotspots of the S. cerevisiae genome. The mechanism of duplication matters, with whole-genome duplicates exhibiting different preservation trends compared to small-scale duplicates. Genome mutational and transcriptional hotspots are rich in duplicates with large repetitive promoter elements. Saccharomyces cerevisiae shows more tolerance to deleterious mutations in duplicates with repetitive promoter elements, which in turn exhibit higher transcriptional plasticity against environmental perturbations. Our data demonstrate that the genome traps duplicates through the accelerated regulatory and functional divergence of their gene copies, providing a source of novel adaptations in yeast. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Sex differences in human liver gene expression were characterized on a genome-wide scale using a large liver sample collection, allowing for detection of small expression differences with high statistical power. 1,249 sex-biased genes were identified, 70% showing higher expression in females. Chromosomal bias was apparent, with female-biased genes enriched on chrX and male-biased genes enriched on chrY and chr19, where 11 male-biased zinc-finger KRAB-repressor domain genes are distributed in six clusters. Top biological functions and diseases significantly enriched in sex-biased genes include transcription, chromatin organization and modification, sexual reproduction, lipid metabolism and cardiovascular disease. Notably, sex-biased genes are enriched at loci associated with polygenic dyslipidemia and coronary artery disease in genome-wide association studies. Moreover, of the 8 sex-biased genes at these loci, 4 have been directly linked to monogenic disorders of lipid metabolism and show an expression profile in females (elevated expression of ABCA1, APOA5 and LDLR; reduced expression of LIPC) that is consistent with the lower female risk of coronary artery disease. Female-biased expression was also observed for CYP7A1, which is activated by drugs used to treat hypercholesterolemia. Several sex-biased drug-metabolizing enzyme genes were identified, including members of the CYP, UGT, GPX and ALDH families. Half of 879 mouse orthologs, including many genes of lipid metabolism and homeostasis, show growth hormone-regulated sex-biased expression in mouse liver, suggesting growth hormone might play a similar regulatory role in human liver. Finally, the evolutionary rate of protein coding regions for human-mouse orthologs, revealed by dN/dS ratio, is significantly higher for genes showing the same sex-bias in both species than for non-sex-biased genes. These findings establish that human hepatic sex differences are widespread and affect diverse cell
Wolfgang Hueber (1,2,3); William H Robinson (1,2)
(1) VA Palo Alto Health Care System, Palo Alto, CA, USA; (2) Division of Immunology and Rheumatology, Stanford University School of Medicine, Stanford, CA, USA; (3) Novartis Institutes of Biomedical Research, Novartis, Basle, Switzerland
Abstract: Tremendous progress has been made over the past decade in the development and refinement of genomic and proteomic technologies for the identification of novel drug targets and molecular signatures associated with clinically important disease states, disease subsets, or differential responses to therapies. The rapid progress in high-throughput technologies has been preceded and paralleled by the elucidation of cytokine networks, followed by the stepwise clinical development of pathway-specific biological therapies that revolutionized the treatment of autoimmune diseases. Together, these advances provide opportunities for a long-anticipated personalized medicine approach to the treatment of autoimmune disease. The ever-increasing numbers of novel, innovative therapies will need to be harnessed wisely to achieve optimal long-term outcomes in as many patients as possible while complying with the demands of health authorities and health care providers for evidence-based, economically sound prescription of these expensive drugs. Genomic and proteomic profiling of patients with autoimmune diseases holds great promise in two major clinical areas: (1) rapid identification of new targets for the development of innovative therapies and (2) identification of patients who will experience optimal benefit and minimal risk from a specific (targeted) therapy. In this review, we attempt to capture important recent developments in the application of genomic and proteomic technologies to translational research by discussing informative examples covering a diversity of autoimmune diseases.
Keywords: proteomics, genomics, autoimmune diseases, antigen microarrays, 2-Dih, rheumatoid arthritis
Balestrini, Raffaella; Sillo, Fabiano; Kohler, Annegret; Schneider, Georg; Faccio, Antonella; Tisserant, Emilie; Martin, Francis; Bonfante, Paola
A genome-wide inventory of proteins involved in cell wall synthesis and remodeling has been obtained by taking advantage of the recently released genome sequence of the ectomycorrhizal Tuber melanosporum black truffle. Genes that encode cell wall biosynthetic enzymes, enzymes involved in cell wall polysaccharide synthesis or modification, GPI-anchored proteins and other cell wall proteins were identified in the black truffle genome. As a second step, array data were validated and the symbiotic stage was chosen as the main focus. Quantitative RT-PCR experiments were performed on 29 selected genes to verify their expression during ectomycorrhizal formation. The results confirmed the array data, and this suggests that cell wall-related genes are required for morphogenetic transition from mycelium growth to the ectomycorrhizal branched hyphae. Labeling experiments were also performed on T. melanosporum mycelium and ectomycorrhizae to localize cell wall components.
Lee Bernett TK
Background: Genes are not randomly distributed on a chromosome, as was once thought, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results: Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion: Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
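The comparison of an observed number of clusters with the number "expected by chance" can be sketched as a permutation test. The Python example below is only an illustration under stated assumptions: it treats a chromosome as an ordered list of genes, defines a cluster as a run of at least two adjacent tissue-specific genes, and shuffles the labels to build a null distribution. The abstract does not specify the authors' actual cluster definition or statistical procedure, and the toy data are invented.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of at least `min_size` consecutive flagged (tissue-specific) genes."""
    clusters, run = 0, 0
    for flagged in flags:
        if flagged:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    if run >= min_size:  # close a run that reaches the end of the chromosome
        clusters += 1
    return clusters


def permutation_p_value(flags, n_perm=1000, seed=0):
    """Estimate how often randomly placed labels give at least the observed cluster count."""
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy chromosome of 40 genes in positional order: 1 = tissue-specific, 0 = other.
toy_flags = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0] + [0] * 20 + [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
observed, p_value = permutation_p_value(toy_flags)
print(f"observed clusters = {observed}, permutation p = {p_value:.4f}")
```

Shuffling labels along the chromosome keeps the number of tissue-specific genes fixed, so the test asks only whether their positions are more contiguous than random placement would produce.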
Glas, Jürgen; Seiderer, Julia; Nagy, Melinda; Fries, Christoph; Beigel, Florian; Weidinger, Maria; Pfennig, Simone; Klein, Wolfram; Epplen, Jörg T.; Lohse, Peter; Folwaczny, Matthias; Göke, Burkhard; Ochsenkühn, Thomas; Diegelmann, Julia; Müller-Myhsok, Bertram
BACKGROUND: Recent studies demonstrated an association of STAT4 variants with systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA), indicating that multiple autoimmune diseases share common susceptibility genes. We therefore investigated the influence of STAT4 variants on the susceptibility and phenotype of inflammatory bowel diseases (IBD) in a large patient and control cohort. METHODOLOGY/PRINCIPAL FINDINGS: Genomic DNA from 2704 individuals of Caucasian origin including 857 pat...
Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C
The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species such as the penicillin-producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues to the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 Kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin) and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus) contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing Aspergillus systematics. The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population
Background: With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies on comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genome may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies. Results: A BLAST-based method was exploited to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from a high-quality and specific set of orthology predictions, as provided by Ensembl, hidden relationships between genes and genomes of different mammalian species were unveiled using a highly sensitive approach, based on sequence similarity and genomic comparison. Conclusions: The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, among which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may potentially have orthologs which were missed in the annotation process, despite having a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features and gave an overview of the other side of orthology which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.
Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich
The study of mutants to elucidate gene functions has a long and successful history; however, discovering causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild-type copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.
Walter, Stefan; Atzmon, Gil; Demerath, Ellen W; Garcia, Melissa E; Kaplan, Robert C; Kumari, Meena; Lunetta, Kathryn L; Milaneschi, Yuri; Tanaka, Toshiko; Tranah, Gregory J; Völker, Uwe; Yu, Lei; Arnold, Alice; Benjamin, Emelia J; Biffar, Reiner; Buchman, Aron S; Boerwinkle, Eric; Couper, David; De Jager, Philip L; Evans, Denis A; Harris, Tamara B; Hoffmann, Wolfgang; Hofman, Albert; Karasik, David; Kiel, Douglas P; Kocher, Thomas; Kuningas, Maris; Launer, Lenore J; Lohman, Kurt K; Lutsey, Pamela L; Mackenbach, Johan; Marciante, Kristin; Psaty, Bruce M; Reiman, Eric M; Rotter, Jerome I; Seshadri, Sudha; Shardell, Michelle D; Smith, Albert V; van Duijn, Cornelia; Walston, Jeremy; Zillikens, M Carola; Bandinelli, Stefania; Baumeister, Sebastian E; Bennett, David A; Ferrucci, Luigi; Gudnason, Vilmundur; Kivimaki, Mika; Liu, Yongmei; Murabito, Joanne M; Newman, Anne B; Tiemeier, Henning; Franceschini, Nora
Human longevity and healthy aging show moderate heritability (20%-50%). We conducted a meta-analysis of genome-wide association studies from 9 studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium for 2 outcomes: (1) all-cause mortality, and (2) survival free of major disease or death. No single nucleotide polymorphism (SNP) was a genome-wide significant predictor of either outcome (p < 5 × 10^-8). We found 14 independent SNPs that predicted risk of death, and 8 SNPs that predicted event-free survival (p < 10^-5). These SNPs are in or near genes that are highly expressed in the brain (HECW2, HIP1, BIN2, GRIA1), genes involved in neural development and function (KCNQ4, LMO4, GRIA1, NETO1) and autophagy (ATG4C), and genes that are associated with risk of various diseases including cancer and Alzheimer's disease. In addition to considerable overlap between the traits, pathway and network analysis corroborated these findings. These findings indicate that variation in genes involved in neurological processes may be an important factor in regulating aging free of major disease and achieving longevity. Copyright © 2011 Elsevier Inc. All rights reserved.
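The genome-wide significance level quoted in this abstract (p below 5 × 10^-8) is the conventional threshold that results from a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent tests. The short Python sketch below shows that arithmetic; the figure of one million tests is the usual convention rather than a number stated in the abstract, and the function names and example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold=5e-8):
    """True if a SNP's p-value passes the conventional genome-wide threshold."""
    return p_value < threshold


# Assumption: about one million independent common-variant tests.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(f"Bonferroni threshold: {threshold:.1e}")

# Hypothetical p-values: one suggestive (below 1e-5) and one genome-wide significant.
for p in (3e-6, 2e-9):
    print(p, is_genome_wide_significant(p))
```

Under this reading, the 14 and 8 SNPs reported at p below 10^-5 are suggestive associations that fall short of the stricter genome-wide cutoff.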
Shittu, Ismaila; Sharma, Poonam; Joannis, Tony M.; Volkening, Jeremy D.; Odaibo, Georgina N.; Olaleye, David O.; Williams-Coplin, Dawn; Solomon, Ponman; Abolnik, Celia; Miller, Patti J.; Dimitrov, Kiril M.
The first complete genome sequence of a strain of Newcastle disease virus (NDV) of genotype XVII is described here. A velogenic strain (duck/Nigeria/903/KUDU-113/1992) was isolated from an apparently healthy free-roaming domestic duck sampled in Kuru, Nigeria, in 1992. Phylogenetic analysis of the fusion protein gene and complete genome classified the isolate as a member of NDV class II, genotype XVII. PMID:26847901
Genomics was introduced with big promises and expectations of its future contribution to our society. Medical genomics was introduced as that which would lay the foundation for a revolution in our management of common diseases. Genomics would lead the way towards a future of personalised medicine.