WorldWideScience

Sample records for genomic structure gene

  1. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene struc

  2. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  3. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Background: The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results: GeneViTo is a Java-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions: GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  4. Genome-wide identification of structural variants in genes encoding drug targets

    DEFF Research Database (Denmark)

    Rasmussen, Henrik Berg; Dahmcke, Christina Mackeprang

    2012-01-01

    The objective of the present study was to identify structural variants of drug target-encoding genes on a genome-wide scale. We also aimed at identifying drugs that are potentially amenable for individualization of treatments based on knowledge about structural variation in the genes encoding the...

  5. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    BACKGROUND: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  6. The Complete Chloroplast Genome Sequence of Podocarpus lambertii: Genome Structure, Evolutionary Aspects, Gene Content and SSR Detection

    Science.gov (United States)

    Vieira, Leila do Nascimento; Faoro, Helisson; Rogalski, Marcelo; Fraga, Hugo Pacheco de Freitas; Cardoso, Rodrigo Luis Alves; de Souza, Emanuel Maltempi; de Oliveira Pedrosa, Fábio; Nodari, Rubens Onofre; Guerra, Miguel Pedro

    2014-01-01

    Background: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. Methodology/Principal Findings: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. Conclusion: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of this genus.

  7. Gene finding with a hidden Markov model of genome structure and evolution

    DEFF Research Database (Denmark)

    Pedersen, Jakob Skou; Hein, Jotun

    2003-01-01

    Motivation: A growing number of genomes are sequenced. The differences in evolutionary pattern between functional regions can thus be observed genome-wide in a whole set of organisms. The diverse evolutionary pattern of different functional regions can be exploited in the process of genomic annotation. The modelling of evolution by the existing comparative gene finders leaves room for improvement. Results: A probabilistic model of both genome structure and evolution is designed. This type of model is called an Evolutionary Hidden Markov Model (EHMM), being composed of an HMM and a set of region...

  8. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    Science.gov (United States)

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides.

  9. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
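    The GC3 measure defined in the abstract above (the fraction of codons whose third position is G or C) can be illustrated with a minimal Python sketch. The function names are hypothetical, the input is assumed to be an in-frame coding sequence, and the abstract does not say whether stop codons or ambiguous bases were counted, so this is not the authors' pipeline. Only the 0.75286 cutoff is taken from the abstract.

```python
# Cutoff for "GC3-rich" genes as quoted in the abstract.
GC3_RICH_THRESHOLD = 0.75286


def gc3_fraction(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` starts in frame; trailing bases that do not
    complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop an incomplete last codon
    third_bases = seq[2:usable:3]         # every third base: positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


def is_gc3_rich(cds: str) -> bool:
    """Apply the abstract's GC3 >= 0.75286 cutoff."""
    return gc3_fraction(cds) >= GC3_RICH_THRESHOLD


if __name__ == "__main__":
    # Codons ATG, GCC, AAA: third bases G, C, A.
    print(gc3_fraction("ATGGCCAAA"))
```

    For a real analysis one would likely read coding sequences from an annotation file and decide explicitly how to treat stop codons and ambiguous bases.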

  10. Alpha tubulin genes from Leishmania braziliensis: genomic organization, gene structure and insights on their expression.

    Science.gov (United States)

    Ramírez, César A; Requena, José M; Puerta, Concepción J

    2013-07-06

    Alpha tubulin is a fundamental component of the cytoskeleton which is responsible for cell shape and is involved in cell division, ciliary and flagellar motility and intracellular transport. Alpha tubulin gene expression varies according to the morphological changes undergone by Leishmania in its life cycle. However, studying the mechanisms responsible for the differential expression has proved difficult because of the complex genome organization of tubulin genes and the non-conventional mechanisms of gene regulation operating in Leishmania. We started this work by analyzing the genomic organization of α-tubulin genes in the Leishmania braziliensis genome database. The genomic organization of L. braziliensis α-tubulin genes differs from that existing in the L. major and L. infantum genomes. Two loci containing α-tubulin genes were found in chromosomes 13 and 29, although sequence gaps prevent the exact number of genes at each locus from being determined. Southern blot assays showed that the α-tubulin locus at chromosome 13 contains at least 8 gene copies, which are tandemly organized with a 2.08-kb repetition unit; the locus at chromosome 29 seems to contain a sole α-tubulin gene. In addition, it was found that the L. braziliensis α-tubulin locus at chromosome 13 contains two types of α-tubulin genes differing in their 3' UTR, each one presumably containing different regulatory motifs. It was also determined that the mRNA expression levels of these genes are controlled by post-transcriptional mechanisms tightly linked to the growth temperature. Moreover, the decrease in the α-tubulin mRNA abundance observed when promastigotes were cultured at 35°C was accompanied by parasite morphology alterations, similar to those occurring during the promastigote to amastigote differentiation. Information found in the genome databases indicates that α-tubulin genes have been reorganized in a drastic manner along Leishmania

  11. The genomic structure of the DMBT1 gene

    DEFF Research Database (Denmark)

    Mollenhauer, J; Holmskov, U; Wiemann, S

    1999-01-01

    Increasing evidence has accumulated for an involvement of the inactivation of tumour suppressor genes at chromosome 10q in the carcinogenesis of brain tumours, melanomas, and carcinomas of the lung, the prostate, the pancreas, and the endometrium. The gene DMBT1 (Deleted in Malignant Brain Tumour...

  12. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded... allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  13. Gene order data from a model amphibian (Ambystoma): new perspectives on vertebrate genome structure and evolution

    Directory of Open Access Journals (Sweden)

    Voss S Randal

    2006-08-01

    Background: Because amphibians arise from a branch of the vertebrate evolutionary tree that is juxtaposed between fishes and amniotes, they provide important comparative perspective for reconstructing character changes that have occurred during vertebrate evolution. Here, we report the first comparative study of vertebrate genome structure that includes a representative amphibian. We used 491 transcribed sequences from a salamander (Ambystoma) genetic map and whole genome assemblies for human, mouse, rat, dog, chicken, zebrafish, and the freshwater pufferfish Tetraodon nigroviridis to compare gene orders and rearrangement rates. Results: Ambystoma has experienced a rate of genome rearrangement that is substantially lower than mammalian species but similar to that of chicken and fish. Overall, we found greater conservation of genome structure between Ambystoma and tetrapod vertebrates; nevertheless, 57% of Ambystoma-fish orthologs are found in conserved syntenies of four or more genes. Comparisons between Ambystoma and amniotes reveal extensive conservation of segmental homology for 57% of the presumptive Ambystoma-amniote orthologs. Conclusion: Our analyses suggest relatively constant interchromosomal rearrangement rates from the euteleost ancestor to the origin of mammals and illustrate the utility of amphibian mapping data in establishing ancestral amniote and tetrapod gene orders. Comparisons between Ambystoma and amniotes reveal some of the key events that have structured the human genome since diversification of the ancestral amniote lineage.

  14. Extensive loss of translational genes in the structurally dynamic mitochondrial genome of the angiosperm Silene latifolia

    Directory of Open Access Journals (Sweden)

    Sloan Daniel B

    2010-09-01

    Background: Mitochondrial gene loss and functional transfer to the nucleus is an ongoing process in many lineages of plants, resulting in substantial variation across species in mitochondrial gene content. The Caryophyllaceae represents one lineage that has experienced a particularly high rate of mitochondrial gene loss relative to other angiosperms. Results: In this study, we report the first complete mitochondrial genome sequence from a member of this family, Silene latifolia. The genome can be mapped as a 253,413 bp circle, but its structure is complicated by a large repeated region that is present in 6 copies. Active recombination among these copies produces a suite of alternative genome configurations that appear to be at or near "recombinational equilibrium". The genome contains the fewest genes of any angiosperm mitochondrial genome sequenced to date, with intact copies of only 25 of the 41 protein genes inferred to be present in the common ancestor of angiosperms. As observed more broadly in angiosperms, ribosomal proteins have been especially prone to gene loss in the S. latifolia lineage. The genome has also experienced a major reduction in tRNA gene content, including loss of functional tRNAs of both native and chloroplast origin. Even assuming expanded wobble-pairing rules, the mitochondrial genome can support translation of only 17 of the 61 sense codons, which code for only 9 of the 20 amino acids. In addition, genes encoding 18S and, especially, 5S rRNA exhibit exceptional sequence divergence relative to other plants. Divergence in one region of 18S rRNA appears to be the result of a gene conversion event, in which recombination with a homologous gene of chloroplast origin led to the complete replacement of a helix in this ribosomal RNA. Conclusions: These findings suggest a markedly expanded role for nuclear gene products in the translation of mitochondrial genes in S. latifolia and raise the possibility of altered

  15. Functional genomics and structural biology in the definition of gene function.

    Science.gov (United States)

    Hrmova, Maria; Fincher, Geoffrey B

    2009-01-01

    By mid-2007, the three-dimensional (3D) structures of some 45,000 proteins had been solved, over a period where the linear structures of millions of genes had been defined. Technical challenges associated with X-ray crystallography are being overcome and high-throughput methods both for crystallization of proteins and for solving their 3D structures are under development. The question arises as to how structural biology can be integrated with and add value to functional genomics programs. Structural biology will assist in the definition of gene function through the identification of the likely function of the protein products of genes. The 3D information allows protein sequences predicted from DNA sequences to be classified into broad groups, according to the overall 'fold', or 3D shape, of the protein. Structural information can be used to predict the preferred substrate of a protein, and thereby greatly enhance the accurate annotation of the corresponding gene. Furthermore, it will enable the effects of amino acid substitutions in enzymes to be better understood with respect to enzyme function and could thereby provide insights into natural variation in genes. If the molecular basis of transcription factor-DNA interactions were defined through precise 3D knowledge of the protein-DNA binding site, it would be possible to predict the effects of base substitutions within the motif on the specificity and/or kinetics of binding. In this chapter, we present specific examples of how structural biology can provide valuable information for functional genomics programs.

  16. Genomic structure of the human BCCIP gene and its expression in cancer.

    Science.gov (United States)

    Meng, Xiangbing; Liu, Jingmei; Shen, Zhiyuan

    2003-01-02

    Human BCCIPalpha (Tok-1alpha) is a BRCA2 and CDKN1A (Cip1, p21) interacting protein. Our previous studies have shown that overexpression of BCCIPalpha inhibits the growth of certain tumor cells [Oncogene 20 (2001) 336]. In this study, we report the genomic structure of the human BCCIP gene, which contains nine exons. Alternative splicing of the 3'-terminal exons produces two isoforms of BCCIP transcripts, BCCIPalpha and BCCIPbeta. The BCCIP gene is flanked by two genes that are transcribed in the opposite orientation of the BCCIP gene. It lies head-to-head and shares a bi-directional promoter with the uroporphyrinogen III synthase (UROS) gene. The last three exons of the BCCIP gene overlap the 3'-terminal seven exons of a DEAD/H helicase-like gene (DDX32). Using a matched normal/tumor cDNA array, we identified a reduced expression of BCCIP in kidney tumor, suggesting a role of BCCIP in cancer etiology.

  17. Genomic structure and refined chromosomal localization of the mouse Ptch2 gene.

    Science.gov (United States)

    Fröhlich, L; Liu, Z; Beier, D R; Lanske, B

    2002-01-01

    The vertebrate Patched 2 (Ptch2) gene encodes a putative membrane-embedded protein which may have roles in Hedgehog signaling during development and in tumorigenesis. We determined the genomic structure of the mouse Ptch2 gene and show that Ptch2 is composed of 22 exons spanning approximately 18 kb of genomic DNA. The exon-intron boundaries were found to be conserved within the human and mouse Ptch2 genes. Analysis of the 5' flanking region revealed a CpG island, the putative promoter region and the transcriptional start site, while a polyadenylation signal as well as an mRNA destabilizing motif were identified in the 3' flanking region. Single-strand conformation polymorphism analysis was used to map mouse Ptch2 to chromosome 4 between the microsatellite markers D4Mit20 and D4Mit334.

  18. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    ...allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results: We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of about 84% and specificity of about 97%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  19. The mouse Fau gene: genomic structure, chromosomal localization, and characterization of two retropseudogenes.

    Science.gov (United States)

    Casteels, D; Poirier, C; Guénet, J L; Merregaert, J

    1995-01-01

    The Fau gene is the cellular homolog of the fox sequence of the Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV). FBR-MuSV acquired the Fau gene by transduction in a transcriptional orientation opposite to that of the genomic Fau gene. The genomic structure of the mouse Fau gene (MMFAU) and its upstream elements have been determined and are similar to those of the human FAU gene. The gene consists of five exons and is located on chromosome 19. The first exon is not translated. The promoter region has no well-defined TATA box but contains the polypyrimidine initiator flanked by regions of high GC content (65%) and shows all of the characteristics of a housekeeping gene. The 5' end of the mRNA transcript was determined by 5' RACE analysis and is located, as expected, in the polypyrimidine initiator site. Furthermore, the sequences of two retropseudogenes (Fau-ps1 and Fau-ps2) are reported. Both pseudogenes are approximately 75% identical to the Fau cDNA, but both are shorter due to a deletion at the 5' end and do not encode a functional protein. Fau-prs is interrupted by an AG-rich region of about 350 bp within the S30 region of the Fau cDNA. Fau-ps1 was localized on chromosome 1 and Fau-ps2 on chromosome 7.

  20. Global transcript structure resolution of high gene density genomes through multi-platform data integration.

    Science.gov (United States)

    O'Grady, Tina; Wang, Xia; Höner Zu Bentrup, Kerstin; Baddoo, Melody; Concha, Monica; Flemington, Erik K

    2016-10-14

    Annotation of herpesvirus genomes has traditionally been undertaken through the detection of open reading frames and other genomic motifs, supplemented with sequencing of individual cDNAs. Second generation sequencing and high-density microarray studies have revealed vastly greater herpesvirus transcriptome complexity than is captured by existing annotation. The pervasive nature of overlapping transcription throughout herpesvirus genomes, however, poses substantial problems in resolving transcript structures using these methods alone. We present an approach that combines the unique attributes of Pacific Biosciences Iso-Seq long-read, Illumina short-read and deepCAGE (Cap Analysis of Gene Expression) sequencing to globally resolve polyadenylated isoform structures in replicating Epstein-Barr virus (EBV). Our method, Transcriptome Resolution through Integration of Multi-platform Data (TRIMD), identifies nearly 300 novel EBV transcripts, quadrupling the size of the annotated viral transcriptome. These findings illustrate an array of mechanisms through which EBV achieves functional diversity in its relatively small, compact genome including programmed alternative splicing (e.g. across the IR1 repeats), alternative promoter usage by LMP2 and other latency-associated transcripts, intergenic splicing at the BZLF2 locus, and antisense transcription and pervasive readthrough transcription throughout the genome.

  1. The genomic structure of human BTK, the defective gene in X-linked agammaglobulinemia

    Energy Technology Data Exchange (ETDEWEB)

    Rohrer, J.; Parolini, O. [St. Jude Children's Research Hospital, Memphis, TN (United States)]; Conley, M.E. [St. Jude Children's Research Hospital, Memphis, TN (United States); Univ. of Tennessee College of Medicine, Memphis, TN (United States)]; Belmont, J.W. [Baylor College of Medicine, Houston, TX (United States)]

    1994-12-31

    It has recently been demonstrated that mutations in the gene for Bruton's tyrosine kinase (BTK) are responsible for X-linked agammaglobulinemia. Southern blot analysis and sequencing of cDNA were used to document deletions, insertions, and single base pair substitutions. To facilitate analysis of BTK regulation and to permit the development of assays that could be used to screen genomic DNA for mutations in BTK, the authors determined the genomic organization of this gene. Subcloning of a cosmid and a yeast artificial chromosome showed that BTK is divided into 19 exons spanning 37 kilobases of genomic DNA. Analysis of the region 5' to the first untranslated exon revealed no consensus TATAA or CAAT boxes; however, three retinoic acid binding sites were identified in this region. Comparison of the structure of BTK with that of other nonreceptor tyrosine kinases, including SRC, FES, and CSK, demonstrated a lack of conservation of exon borders. Information obtained in this study will contribute to understanding of the evolution of nonreceptor tyrosine kinases. It will also be useful in diagnostic studies, including carrier detection, and in studies directed towards gene therapy or gene replacement. 29 refs., 2 figs., 2 tabs.

  2. Structure and organization of Marchantia polymorpha chloroplast genome. I. Cloning and gene identification.

    Science.gov (United States)

    Ohyama, K; Fukuzawa, H; Kohchi, T; Sano, T; Sano, S; Shirai, H; Umesono, K; Shiki, Y; Takeuchi, M; Chang, Z

    1988-09-20

    We have determined the complete nucleotide sequence of chloroplast DNA from a liverwort, Marchantia polymorpha, using a clone bank of chloroplast DNA fragments. The circular genome consists of 121,024 base-pairs and includes two large inverted repeats (IRA and IRB, each 10,058 base-pairs), a large single-copy region (LSC, 81,095 base-pairs), and a small single-copy region (SSC, 19,813 base-pairs). The nucleotide sequence was analysed with a computer to deduce the entire gene organization, assuming the universal genetic code and the presence of introns in the coding sequences. We detected 136 possible genes, 103 gene products of which are related to known stable RNA or protein molecules. Stable RNA genes for four species of ribosomal RNA and 32 species of tRNA were located, although one of the tRNA genes may be defective. Twenty genes encoding polypeptides involved in photosynthesis and electron transport were identified by comparison with known chloroplast genes. Twenty-five open reading frames (ORFs) show structural similarities to Escherichia coli RNA polymerase subunits, 19 ribosomal proteins and two related proteins. Seven ORFs are comparable with human mitochondrial NADH dehydrogenase genes. A computer-aided homology search predicted possible chloroplast homologues of bacterial proteins: two ORFs for bacterial 4Fe-4S-type ferredoxin, two for distinct subunits of a protein-dependent transport system, one ORF for a component of nitrogenase, and one for an antenna protein of a light-harvesting complex. The other 33 ORFs, consisting of 29 to 2136 codons, remain to be identified, but some of them seem to be conserved in evolution. Detailed information on gene identification is presented in the accompanying papers. We postulated that there were 22 introns in 20 genes (8 tRNA genes and 12 ORFs), which may be classified into the groups I and II found in fungal mitochondrial genes. The structural gene for ribosomal protein S12 is trans-split on the opposite DNA strand
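    The region sizes quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes the usual quadripartite layout of a chloroplast genome (one LSC, one SSC and two copies of the inverted repeat); the function and variable names are illustrative only and do not come from the record.

```python
# Region lengths in base pairs, as reported in the abstract above.
LSC = 81_095   # large single-copy region
SSC = 19_813   # small single-copy region
IR = 10_058    # each inverted repeat (IRA and IRB)
REPORTED_TOTAL = 121_024


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)  # 121024 True
```

    Under that assumption the four regions account for the full reported genome length, with no bases left unassigned.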

  3. Genomic structure, characterization, and identification of the promoter of the human IL-8 receptor A gene

    Energy Technology Data Exchange (ETDEWEB)

    Sprenger, H.; Lloyd, A.R.; Meyer, R.G.; Johnston, J.A.; Kelvin, D.J. [National Cancer Institute, Frederick, MA (United States)]

    1994-09-15

    Two unique but homologous receptors for the neutrophil chemoattractant IL-8 have been cloned (designated IL-8RA and IL-8RB), each of which binds IL-8 with high affinity. IL-8RA mRNA expression was found to be regulated by granulocyte-CSF and LPS. In an attempt to understand the tissue-specific expression and to identify transcriptional regulatory elements, the authors have cloned, sequenced, and characterized the human IL-8RA gene. A λ-DASH clone encoding the entire human IL-8RA gene was isolated by screening a genomic library with a PCR-generated cDNA. After mapping, subcloning, and sequencing several restriction fragments, a 9.2-kb continuous DNA sequence was obtained. As the sizes of the published cDNA (1.9 kb) and the mRNA determined by Northern blot analysis (2.1 kb) were not in agreement, a full-length cDNA was cloned by using a modified rapid amplification of cDNA ends technique. They identified a 5′-untranslated region of 119 bp. After comparison with the genomic sequence, they found the gene consisted of two exons interrupted by an intron of 1.7 kb. A 1050-bp ORF was encoded entirely in the second exon together with an 834-bp 3′-untranslated region. The immediate GC-rich 5′-flanking region upstream of exon 1 could serve as a constitutively active promoter in chloramphenicol acetyltransferase expression assays. Expression analysis of additional upstream regions suggested the presence of silencer elements between positions -841 and -280. In conclusion, cloning a full-length cDNA permitted cloning of the human IL-8RA gene, identification of the genomic structure, and characterization of the promoter region. 45 refs., 6 figs.

  4. The population genomics of begomoviruses: global scale population structure and gene flow

    Directory of Open Access Journals (Sweden)

    Prasanna HC

    2010-09-01

    Background: The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant infecting genus, Begomovirus, in the family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results: We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions: Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could

  5. Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

    Directory of Open Access Journals (Sweden)

    Ovchinnikov Sergey

    2012-03-01

    Background: Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results: We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic

  6. Genomic structure, promoter analysis, and expression of the porcine (Sus scrofa) Mx1 gene.

    Science.gov (United States)

    Thomas, Anne V; Palm, Melanie; Broers, Aurore D; Zezafoun, Hussein; Desmecht, Daniel J-M

    2006-06-01

    Allelic polymorphisms at the mouse Mx1 locus affect the probability of survival after experimental influenzal disease, raising the possibility that marker-assisted selection using the homologous locus could improve the innate resistance of pigs to natural influenza infections. Several issues need to be resolved before efficient large scale screening of the allelic polymorphism at the porcine (Sus scrofa) Mx1 locus can be implemented. First, the Mx1 genomic structure has to be established and sufficient flanking intronic sequences have to be gathered to enable simple PCR amplification of the coding portions of the gene. Then, a basic knowledge of the promoter region needs to be obtained as an allelic variation there can significantly alter absolute levels and/or tissue-specificity of MX protein expression. The results gathered here show that the porcine Mx1 gene and promoter share the major structural and functional characteristics displayed by their homologs described in cattle, mouse, chicken, and man. The crucial function of the proximal interferon-sensitive response elements motif for gene expression is also demonstrated. The sequence data compiled here will allow an extensive analysis of the polymorphisms present among the widest spectrum possible of porcine breeds with the aim to identify an Mx1 allele providing antiviral resistance.

  7. A highly conserved gene island of three genes on chromosome 3B of hexaploid wheat: diverse gene function and genomic structure maintained in a tightly linked block

    Directory of Open Access Journals (Sweden)

    Ma Wujun

    2010-05-01

    Background: The complexity of the wheat genome has resulted from waves of retrotransposable element insertions. Gene deletions and disruptions generated by the fast replacement of repetitive elements in wheat have resulted in disruption of colinearity at a micro (sub-megabase) level among the cereals. In view of genomic changes that are possible within a given time span, conservation of genes between species tends to imply an important functional or regional constraint that does not permit a change in genomic structure. The ctg1034 contig completed in this paper was initially studied because it was assigned to the Sr2 resistance locus region, but detailed mapping studies subsequently assigned it to the long arm of 3B and revealed its unusual features. Results: BAC shotgun sequencing of the hexaploid wheat (Triticum aestivum cv. Chinese Spring) genome has been used to assemble a group of 15 wheat BACs from the chromosome 3B physical map FPC contig ctg1034 into a 783,553 bp genomic sequence. This ctg1034 sequence was annotated for biological features such as genes and transposable elements. A three-gene island was identified among >80% repetitive DNA sequence. Bioinformatics analysis showed no observable similarity in the functions of the three genes. The ctg1034 gene island also displayed complete conservation of gene order and orientation with syntenic gene islands found in publicly available genome sequences of Brachypodium distachyon, Oryza sativa, Sorghum bicolor and Zea mays, even though the intergenic space and introns were divergent. Conclusion: We propose that ctg1034 is located within the heterochromatic C-band region of deletion bin 3BL7 based on the identification of heterochromatic tandem repeats and presence of significant matches to chromodomain-containing gypsy LTR retrotransposable elements. We also speculate that this location, among other highly repetitive sequences, may account for the relative stability in gene order and

  8. The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

    Energy Technology Data Exchange (ETDEWEB)

    Dyer, K.D. [National Inst. of Health, Bethesda, MD (United States)]|[Georgetown Univ. Medical Center, Washington, DC (United States)]; Handen, J.S.; Rosenberg, H.F. [National Inst. of Health, Bethesda, MD (United States)]

    1997-03-01

    The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.

  9. Ultra high-resolution gene centric genomic structural analysis of a non-syndromic congenital heart defect, Tetralogy of Fallot.

    Directory of Open Access Journals (Sweden)

    Douglas C Bittel

    Tetralogy of Fallot (TOF) is one of the most common severe congenital heart malformations. Great progress has been made in identifying key genes that regulate heart development, yet approximately 70% of TOF cases are sporadic and nonsyndromic with no known genetic cause. We created an ultra high-resolution gene centric comparative genomic hybridization (gcCGH) microarray based on 591 genes with a validated association with cardiovascular development or function. We used our gcCGH array to analyze the genomic structure of 34 infants with sporadic TOF without a deletion on chromosome 22q11.2 (n male = 20; n female = 14; age range of 2 to 10 months). Using our custom-made gcCGH microarray platform, we identified a total of 613 copy number variations (CNVs) ranging in size from 78 base pairs to 19.5 Mb. We identified 16 subjects with 33 CNVs that contained 13 different genes which are known to be directly associated with heart development. Additionally, there were 79 genes from the broader list of genes that were partially or completely contained in a CNV. All 34 individuals examined had at least one CNV involving these 79 genes. Furthermore, we had available whole genome exon arrays from right ventricular tissue in 13 of our subjects. We analyzed these for correlations between copy number and gene expression level. Surprisingly, we could detect only one clear association between CNVs and expression (GSTT1) for any of the 591 focal genes on the gcCGH array. The expression levels of GSTT1 were correlated with copy number in all cases examined (r = 0.95, p = 0.001). We identified a large number of small CNVs in genes with varying associations with heart development. Our results illustrate the complexity of human genome structural variation and underscore the need for multifactorial assessment of potential genetic/genomic factors that contribute to congenital heart defects.

  10. Gene expression in chicken reveals correlation with structural genomic features and conserved patterns of transcription in the terrestrial vertebrates.

    Directory of Open Access Journals (Sweden)

    Haisheng Nie

    BACKGROUND: The chicken is an important agricultural and avian-model species. A survey of gene expression in a range of different tissues will provide a benchmark for understanding expression levels under normal physiological conditions in birds. With expression data for birds being very scant, this benchmark is of particular interest for comparative expression analysis among various terrestrial vertebrates. METHODOLOGY/PRINCIPAL FINDINGS: We carried out a gene expression survey in eight major chicken tissues using whole genome microarrays. A global picture of gene expression is presented for the eight tissues, and tissue-specific as well as common gene expression patterns were identified. A Gene Ontology (GO) term enrichment analysis showed that tissue-specific genes are enriched with GO terms reflecting the physiological functions of the specific tissue, and housekeeping genes are enriched with GO terms related to essential biological functions. Comparisons of structural genomic features between tissue-specific genes and housekeeping genes show that housekeeping genes are more compact. Specifically, their coding sequences and particularly their introns are shorter than those of genes that display more variation in expression between tissues, and their intergenic space is also shorter. Meanwhile, housekeeping genes are more likely to co-localize with other abundantly or highly expressed genes on the same chromosomal regions. Furthermore, comparisons of gene expression in a panel of five common tissues between birds, mammals and amphibians showed that the expression patterns across tissues are highly similar for orthologous genes compared to random gene pairs within each pair-wise comparison, indicating a high degree of functional conservation in gene expression among terrestrial vertebrates. CONCLUSIONS: The housekeeping genes identified in this study have shorter gene length, shorter coding sequence length, shorter introns, and shorter intergenic regions, there seems

  11. Genomic structure, organisation, and promoter analysis of the bovine (Bos taurus) Mx1 gene.

    Science.gov (United States)

    Gérardin, Joël A; Baise, Etienne A; Pire, Grégory A; Leroy, Michaël P-P; Desmecht, Daniel J-M

    2004-02-04

    Some MX proteins are known to confer a specific resistance against a panel of single-stranded RNA viruses. Many diseases due to such viruses are known to affect cattle worldwide, raising the possibility that the identification of an antiviral isoform of a bovine MX protein would allow the implementation of genetic selection programs aimed at improving innate resistance of cattle. With this potential application in mind, the present study was designed to isolate the bovine Mx1 gene including its promoter region and to investigate its genomic organisation and promoter reactivity. The bovine Mx1 gene is made up of 15 exons. All exon-intron boundaries conformed to the consensus sequences. A PCR product that contained an approximately 1-kb 5'-flanking region upstream from the putative transcription start site was sequenced. Unexpectedly, this DNA region did not contain TATA or CCAAT motifs. A computer scan of the region disclosed a series of putative binding sites for known cytokines and transcription factors. There was a GAAAN(1-2)GAAA(C/G) motif, typical of an interferon-sensitive responsive element, between -118 and -107 from the putative transcription start site. There were also an NF-kappaB site, two interleukin-6 binding sites, two Sp1 sites and five GC-rich boxes. The region also contained 12 stretches of the GAAA type, as described in all IFN-inducible genes. Bovine Mx1 expression was assessed by Northern blotting and immunofluorescence in the Madin Darby bovine kidney (MDBK) cell line treated with several stimuli. In conclusion, the bovine Mx1 gene and promoter region share the major structural and functional characteristics displayed by their homologs described in the rainbow trout, chicken, mouse and man.
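    The consensus GAAAN(1-2)GAAA(C/G) quoted in this abstract translates directly into a regular expression. The sketch below is one possible forward-strand scan, assuming N means any of A, C, G or T; the helper name and the demonstration sequence are hypothetical and are not taken from the bovine Mx1 promoter.

```python
import re

# GAAAN(1-2)GAAA(C/G): GAAA, one or two arbitrary bases, GAAA, then C or G.
ISRE = re.compile(r"GAAA[ACGT]{1,2}GAAA[CG]")


def find_isre(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in ISRE.finditer(seq.upper())]


# Hypothetical sequence for illustration only.
demo = "ttcGAAACTGAAACtagGAAAGGAAAGcc"
print(find_isre(demo))  # [(3, 'GAAACTGAAAC'), (17, 'GAAAGGAAAG')]
```

    This scans one strand only and reports non-overlapping hits; a fuller search would also scan the reverse complement.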

  12. Genome-wide analysis of the expansin gene superfamily reveals grapevine-specific structural and functional characteristics.

    Directory of Open Access Journals (Sweden)

    Silvia Dal Santo

    BACKGROUND: Expansins are proteins that loosen plant cell walls in a pH-dependent manner, probably by increasing the relative movement among polymers thus causing irreversible expansion. The expansin superfamily (EXP) comprises four distinct families: expansin A (EXPA), expansin B (EXPB), expansin-like A (EXLA) and expansin-like B (EXLB). There is experimental evidence that EXPA and EXPB proteins are required for cell expansion and developmental processes involving cell wall modification, whereas the exact functions of EXLA and EXLB remain unclear. The complete grapevine (Vitis vinifera) genome sequence has allowed the characterization of many gene families, but an exhaustive genome-wide analysis of expansin gene expression has not been attempted thus far. METHODOLOGY/PRINCIPAL FINDINGS: We identified 29 EXP superfamily genes in the grapevine genome, representing all four EXP families. Members of the same EXP family shared the same exon-intron structure, and phylogenetic analysis confirmed a closer relationship between EXP genes from woody species, i.e. grapevine and poplar (Populus trichocarpa), compared to those from Arabidopsis thaliana and rice (Oryza sativa). We also identified grapevine-specific duplication events involving the EXLB family. Global gene expression analysis confirmed a strong correlation among EXP genes expressed in mature and green/vegetative samples, respectively, as reported for other gene families in the recently-published grapevine gene expression atlas. We also observed the specific co-expression of EXLB genes in woody organs, and the involvement of certain grapevine EXP genes in berry development and post-harvest withering. CONCLUSION: Our comprehensive analysis of the grapevine EXP superfamily confirmed and extended current knowledge about the structural and functional characteristics of this gene family, and also identified properties that are currently unique to grapevine expansin genes. Our data provide a model for the

  13. Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression

    Directory of Open Access Journals (Sweden)

    Adam Rodney D

    2007-04-01

    Background: Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results: We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding an H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion: In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome.

  14. Comparisons of Copy Number, Genomic Structure, and Conserved Motifs for α-Amylase Genes from Barley, Rice, and Wheat

    Directory of Open Access Journals (Sweden)

    Qisen Zhang

    2017-10-01

    Barley is an important crop for the production of malt and beer. However, crops such as rice and wheat are rarely used for malting. α-amylase is the key enzyme that degrades starch during malting. In this study, we compared the genomic properties, gene copies, and conserved promoter motifs of α-amylase genes in barley, rice, and wheat. In all three crops, α-amylase consists of four subfamilies designated amy1, amy2, amy3, and amy4. In wheat and barley, members of amy1 and amy2 genes are localized on chromosomes 6 and 7, respectively. In rice, members of amy1 genes are found on chromosomes 1 and 2, and amy2 genes on chromosome 6. The barley genome has six amy1 members and three amy2 members. The wheat B genome contains four amy1 members and three amy2 members, while the rice genome has three amy1 members and one amy2 member. The B genome has mostly amy1 and amy2 members among the three wheat genomes. Amy1 promoters from all three crop genomes contain a GA-responsive complex consisting of a GA-responsive element (CAATAAA), pyrimidine box (CCTTTT) and TATCCAT/C box. This study has shown that amy1 and amy2 from both wheat and barley have similar genomic properties, including exon/intron structures and GA-responsive elements on promoters, but these differ in rice. Like barley, wheat should have sufficient amy activity to degrade starch completely during malting. Other factors, such as high protein with haze issues and the lack of husk causing lautering difficulty, may limit the use of wheat for brewing.

  15. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

    Science.gov (United States)

    Gao, Feng; Song, Weibo; Katz, Laura A

    2014-08-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.

  16. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection.

    Science.gov (United States)

    Gichira, Andrew W; Li, Zhizhong; Saina, Josphat K; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W; Wang, Qingfeng; Chen, Jinming

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) of 25,971 bp each, separated by two single-copy regions: a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica's chloroplast genome has a 37.1% GC content and encodes 112 unique genes, of which 78 code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  17. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection

    Directory of Open Access Journals (Sweden)

    Andrew W. Gichira

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) of 25,971 bp each, separated by two single-copy regions: a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, of which 78 code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  18. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection

    Science.gov (United States)

    Saina, Josphat K.; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W.

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) of 25,971 bp each, separated by two single-copy regions: a large single copy (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, of which 78 code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species sequenced to date from the family Rosaceae revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  19. The genomic structure and developmental expression patterns of the human OPA-containing gene (HOPA).

    Science.gov (United States)

    Philibert, R A; Winfield, S L; Damschroder-Williams, P; Tengstrom, C; Martin, B M; Ginns, E I

    1999-01-01

    We determined the genomic organization of the human OPA-containing gene (HOPA) and characterized its developmental expression. The gene encoding HOPA, which contains a rare polymorphism tightly associated with non-specific mental retardation, is 25 kb in length and consists of 44 exons. A promoter scan analysis demonstrates two possible transcription initiation sites without TATA boxes upstream from the putative translation initiation start site. Several informative polymorphisms are evident in the sequence including a large pentanucleotide repeat. Northern blot analysis of the gene transcript and its murine orthologue, MOPA-1, demonstrates that only one transcript is expressed throughout the soma and the CNS, and that the transcript is highly expressed during early fetal development. We conclude that the delineation of the function of the HOPA gene locus merits further study.

  20. Structural analysis of the genome of breast cancer cell line ZR-75-30 identifies twelve expressed fusion genes

    Directory of Open Access Journals (Sweden)

    Schulte Ina

    2012-12-01

    Full Text Available. Background: It has recently emerged that common epithelial cancers such as breast cancers have fusion genes like those in leukaemias. In a representative breast cancer cell line, ZR-75-30, we searched for fusion genes by analysing genome rearrangements. Results: We first analysed rearrangements of the ZR-75-30 genome, to around 10 kb resolution, by molecular cytogenetic approaches, combining array painting and array CGH. We then compared this map with genomic junctions determined by paired-end sequencing. Most of the breakpoints found by array painting and array CGH were identified in the paired-end sequencing: 55% of the unamplified breakpoints and 97% of the amplified breakpoints (as these are represented by more sequence reads). From this analysis we identified 9 expressed fusion genes: APPBP2-PHF20L1, BCAS3-HOXB9, COL14A1-SKAP1, TAOK1-PCGF2, TIAM1-NRIP1, TIMM23-ARHGAP32, TRPS1-LASP1, USP32-CCDC49 and ZMYM4-OPRD1. We also determined the genomic junctions of a further three expressed fusion genes that had been described by others, BCAS3-ERBB2, DDX5-DEPDC6/DEPTOR and PLEC1-ENPP2. Of this total of 12 expressed fusion genes, 9 were in the coamplification. Due to the sensitivity of the technologies used, we estimate these 12 fusion genes to be around two-thirds of the true total. Many of the fusions seem likely to be driver mutations. For example, PHF20L1, BCAS3, TAOK1, PCGF2, and TRPS1 are fused in other breast cancers. HOXB9 and PHF20L1 are members of gene families that are fused in other neoplasms. Several of the other genes are relevant to cancer: in addition to ERBB2, SKAP1 is an adaptor for Src, DEPTOR regulates the mTOR pathway and NRIP1 is an estrogen-receptor coregulator. Conclusions: This is the first structural analysis of a breast cancer genome that combines classical molecular cytogenetic approaches with sequencing. Paired-end sequencing was able to detect almost all breakpoints, where there was adequate read depth. It supports

  1. The human glia maturation factor-gamma gene: genomic structure and mutation analysis in gliomas with chromosome 19q loss.

    Science.gov (United States)

    Peters, N; Smith, J S; Tachibana, I; Lee, H K; Pohl, U; Portier, B P; Louis, D N; Jenkins, R B

    1999-09-01

    Human glia maturation factor-gamma (hGMF-gamma) is a recently identified gene that may be involved in glial differentiation, neural regeneration, and inhibition of tumor cell proliferation. The gene maps to the long arm of chromosome 19 at band q13.2, a region that is frequently deleted in human malignant gliomas and is thus suspected to harbor a glioma tumor suppressor gene. Given the putative role of hGMF-gamma in cell differentiation and proliferation and its localization to chromosome 19q13, this gene is an interesting candidate for the chromosome 19q glioma tumor suppressor gene. To evaluate this possibility, we determined the genomic structure of human hGMF-gamma and performed mutation screening in a series of 41 gliomas with and without allelic loss of chromosome 19q. Mutations were not detected, which suggests that hGMF-gamma is not the chromosome 19q glioma suppressor gene. However, the elucidation of the genomic structure of hGMF-gamma may prove useful in future investigations of hGMF-gamma in the normal adult and developing human nervous system.

  2. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    Energy Technology Data Exchange (ETDEWEB)

    Tschaplinski, Timothy J [ORNL; Tsai, Chung-Jui [Michigan Technological University; Harding, Scott A [Michigan Technological University; Lindroth, richard L [University of Wisconsin, Madison; Yuan, Yinan [Michigan Technological University

    2006-01-01

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  3. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    Science.gov (United States)

    Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-03-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.

  4. Genomic structure and mapping of precerebellin and a precerebellin-related gene.

    Science.gov (United States)

    Kavety, B; Jenkins, N A; Fletcher, C F; Copeland, N G; Morgan, J I

    1994-11-01

    The cerebellum-specific hexadecapeptide, cerebellin, is derived from a larger precursor, precerebellin, that has sequence homology to the complement component C1qB. We report the cloning of the murine homolog of precerebellin, Cbln1, and a closely related gene, Cbln2. Amino acid comparison of Cbln1 with Cbln2 revealed that Cbln2 is 88% identical to the carboxy terminal region of Cbln1. That these are independent genes was confirmed by Southern analysis and genome mapping. Cbln1 was positioned to the central region of mouse chromosome 8, 2.3 cM distal of JunB and 6.0 cM proximal of Mt1, while Cbln2 mapped to the distal end of mouse chromosome 18, 1.7 cM telomeric of Mbp.

  5. The chicken transforming growth factor-beta 3 gene: genomic structure, transcriptional analysis, and chromosomal location.

    Science.gov (United States)

    Burt, D W; Dey, B R; Paton, I R; Morrice, D R; Law, A S

    1995-02-01

    In this paper, we report the isolation, characterization, and mapping of the chicken transforming growth factor-beta 3 (TGF-beta 3) gene. The gene contains seven exons and six introns spanning 16-kb of the chicken genome. A comparison of the 5'-flanking regions of human and chicken TGF-beta 3 genes reveals two regions of sequence conservation. The first contains ATF/CRE and TBP/TATA sequence motifs within an 87-bp region. The second is a 162-bp region with no known sequence motifs. Identification of transcription start sites using chicken RNA isolated from various embryonic and adult tissues reveals two sites of initiation, P1 and P2, which map to these two conserved regions. Comparison of 3'-flanking regions of chicken and mammalian TGF-beta 3 genes also revealed conserved sequences. The most significant homologies were found in the 3'-most end of the transcribed region. DNA sequence analysis of chicken TGF-beta 3 cDNAs isolated by 3'-RACE revealed multiple polyadenylation sites unusually distant from a poly(A) signal motif. A Msc I restriction fragment length polymorphism (RFLP) marker was used to map the TGFB3 locus to linkage group E7 on the East Lansing reference backcross. Linkage to the TH locus showed that the TGFB3 locus was physically located on chicken chromosome 5.

  6. The genomic structure of the chicken ICSBP gene and its transcriptional regulation by chicken interferon.

    Science.gov (United States)

    Dosch, E; Zöller, B; Redmann-Müller, I; Nanda, I; Schmid, M; Viciano-Gofferge, A; Jungwirth, C

    1998-04-14

    The chicken interferon consensus sequence binding protein (ChICSBP) gene spans over 9 kb of DNA and consists, like its murine homolog, of nine exons. The first untranslated exon was identified by 5'-RACE technology. The second exon contains the translation initiation codon. Canonical consensus splice sites are found on every exon/intron junction. The introns are generally smaller than their mammalian counterparts. The ChICSBP and ChIRF-1 genes have been mapped by fluorescence in situ hybridization to different microchromosomes. The transcription start site has been mapped by primer extension. Inspection of the DNA sequence of a genomic clone containing the first exon and the region 1700 bp upstream revealed several potential cis-regulatory elements of transcription. The ChICSBP mRNA is induced by recombinant ChIFN type I and ChIFN-gamma. A palindromic IFN regulatory element (pIRE) with high sequence homology to gamma activation site (GAS) sequences was functionally required in transient transfection assays for the induction of transcription by ChIFN-gamma.

  7. Genomic structure and marker-derived gene networks for growth and meat quality traits of Brazilian Nelore beef cattle.

    Science.gov (United States)

    Mudadu, Maurício A; Porto-Neto, Laercio R; Mokry, Fabiana B; Tizioto, Polyana C; Oliveira, Priscila S N; Tullio, Rymer R; Nassu, Renata T; Niciura, Simone C M; Tholon, Patrícia; Alencar, Maurício M; Higa, Roberto H; Rosa, Antônio N; Feijó, Gélson L D; Ferraz, André L J; Silva, Luiz O C; Medeiros, Sérgio R; Lanna, Dante P; Nascimento, Michele L; Chaves, Amália S; Souza, Andrea R D L; Packer, Irineu U; Torres, Roberto A A; Siqueira, Fabiane; Mourão, Gerson B; Coutinho, Luiz L; Reverter, Antonio; Regitano, Luciana C A

    2016-03-15

    Nelore is the major beef cattle breed in Brazil, with more than 130 million head. Genome-wide association studies (GWAS) are often used to associate markers and genomic regions with growth and meat quality traits that can be used to assist selection programs. An alternative to traditional GWAS, which constructs gene interaction networks from the results of several GWAS, is the AWM (Association Weight Matrices)/PCIT (Partial Correlation and Information Theory) methodology. With the aim of evaluating the genetic architecture of Brazilian Nelore cattle, we used high-density SNP genotyping data (~770,000 SNP) from 780 Nelore animals comprising 34 half-sibling families derived from highly disseminated and unrelated sires from across Brazil. The AWM/PCIT methodology was employed to evaluate the genes that participate in a series of eight phenotypes related to growth and meat quality obtained from this Nelore sample. Our results indicate a lack of structuring among the individuals studied, since principal component analyses were not able to differentiate families by their sires or by their ancestral lineages. The application of the AWM/PCIT methodology revealed a trio of transcription factors (VDR, LHX9 and ZEB1) which in combination connected 66 genes through 359 edges and whose biological functions were inspected; literature searches indicated that some participate in biological growth processes. The diversity of the Nelore sample studied is not high enough to differentiate among families, either by sires or by the available ancestral lineage information. The gene networks constructed with the AWM/PCIT methodology were a useful alternative for characterizing genes and gene networks putatively influential in growth and meat quality traits in Nelore cattle.

  8. Genomic clones of Aspergillus nidulans containing alcA, the structural gene for alcohol dehydrogenase and alcR, a regulatory gene for ethanol metabolism.

    Science.gov (United States)

    Doy, C H; Pateman, J A; Olsen, J E; Kane, H J; Creaser, E H

    1985-04-01

    Our aim was to obtain from Aspergillus nidulans a genomic bank and then clone a region we expected from earlier genetic mapping to contain two closely linked genes, alcA, the structural gene for alcohol dehydrogenase (ADH) and alcR, a positive trans-acting regulatory gene for ethanol metabolism. The expression of alcA is repressed by carbon catabolites. A genomic restriction fragment characteristic of the alcA-alcR region was identified, cloned in pBR322, and used to select from a genomic bank in lambda EMBL3A three overlapping clones covering 24 kb of DNA. Southern genomic analysis of wild-type, alcA and alcR mutants showed that the mutants contained extra DNA at sites near the center of the cloned DNA and are close together, as expected for alcA and alcR. Transcription from the cloned DNA and hybridization with a clone carrying the Saccharomyces cerevisiae gene for ADHI (ADC1) are both confined to the alcA-alcR region. At least one of several species of mature mRNA is about 1 kb, the size required to code for ADH. For all species, carbon catabolite repression overrides control by induction. The overall characteristics of transcription, hybridization to ADC1 and earlier work suggest that alcA consists of a number of exons and/or that the alcA-alcR region represents a cluster of alcA-related genes or sequences.

  9. Gene finding in novel genomes

    Directory of Open Access Journals (Sweden)

    Korf Ian

    2004-05-01

    Full Text Available. Background: Computational gene prediction continues to be an important problem, especially for genomes with little experimental data. Results: I introduce the SNAP gene finder which has been designed to be easily adaptable to a variety of genomes. In novel genomes without an appropriate gene finder, I demonstrate that employing a foreign gene finder can produce highly inaccurate results, and that the most compatible parameters may not come from the nearest phylogenetic neighbor. I find that foreign gene finders are more usefully employed to bootstrap parameter estimation and that the resulting parameters can be highly accurate. Conclusion: Since gene prediction is sensitive to species-specific parameters, every genome needs a dedicated gene finder.

  10. Revised genomic structure of the human ghrelin gene and identification of novel exons, alternative splice variants and natural antisense transcripts

    Directory of Open Access Journals (Sweden)

    Herington Adrian C

    2007-08-01

    Full Text Available. Background: Ghrelin is a multifunctional peptide hormone expressed in a range of normal tissues and pathologies. It has been reported that the human ghrelin gene consists of five exons which span 5 kb of genomic DNA on chromosome 3 and includes a 20 bp non-coding first exon (20 bp exon 0). The availability of bioinformatic tools enabling comparative analysis and the finalisation of the human genome prompted us to re-examine the genomic structure of the ghrelin locus. Results: We have demonstrated the presence of an additional novel exon (exon -1) and 5' extensions to exons 0 and 1 using comparative in silico analysis and have demonstrated their existence experimentally using RT-PCR and 5' RACE. A revised exon-intron structure demonstrates that the human ghrelin gene spans 7.2 kb and consists of six rather than five exons. Several ghrelin gene-derived splice forms were detected in a range of human tissues and cell lines. We have demonstrated ghrelin gene-derived mRNA transcripts that do not code for ghrelin, but instead may encode the C-terminal region of full-length preproghrelin (C-ghrelin, which contains the coding region for obestatin), and a transcript encoding obestatin only. Splice variants that differed in their 5' untranslated regions were also found, suggesting a role of these regions in the post-transcriptional regulation of preproghrelin translation. Finally, several natural antisense transcripts, termed ghrelinOS (ghrelin opposite strand) transcripts, were demonstrated via orientation-specific RT-PCR, 5' RACE and in silico analysis of ESTs and cloned amplicons. Conclusion: The sense and antisense alternative transcripts demonstrated in this study may function as non-coding regulatory RNA, or code for novel protein isoforms. This is the first demonstration of putative obestatin and C-ghrelin specific transcripts and these findings suggest that these ghrelin gene-derived peptides may also be produced independently of preproghrelin

  11. Chloroplast Genome Sequence of the Moss Tortula ruralis: Gene Content and Structural Arrangement Relative to Other Green Plant Chloroplast Genomes

    Science.gov (United States)

    Tortula ruralis, a widely distributed moss species in the family Pottiaceae, is increasingly being used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of Tortula ruralis, only the second publishe...

  12. Genomic survey, gene expression analysis and structural modeling suggest diverse roles of DNA methyltransferases in legumes.

    Directory of Open Access Journals (Sweden)

    Rohini Garg

    Full Text Available DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified in known MET, CMT, DRM and DNA nucleotide methyltransferases (DNMT2) subfamilies based on their domain organization. First three MTases represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems have been reported to assign structural features in context of biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, structural model of DRM suggested that circular permutation of motifs does not have any effect on overall structure of DNA methyltransferase domain. These results provide valuable insights into role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes.

  13. Genomic survey, gene expression analysis and structural modeling suggest diverse roles of DNA methyltransferases in legumes.

    Science.gov (United States)

    Garg, Rohini; Kumari, Romika; Tiwari, Sneha; Goyal, Shweta

    2014-01-01

    DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified in known MET, CMT, DRM and DNA nucleotide methyltransferases (DNMT2) subfamilies based on their domain organization. First three MTases represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems have been reported to assign structural features in context of biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, structural model of DRM suggested that circular permutation of motifs does not have any effect on overall structure of DNA methyltransferase domain. These results provide valuable insights into role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes.

  14. Genomic organization of the structural genes controlling the astaxanthin biosynthesis pathway of Xanthophyllomyces dendrorhous.

    Science.gov (United States)

    Niklitschek, Mauricio; Alcaíno, Jennifer; Barahona, Salvador; Sepúlveda, Dionisia; Lozano, Carla; Carmona, Marisela; Marcoleta, Andrés; Martínez, Claudio; Lodato, Patricia; Baeza, Marcelo; Cifuentes, Víctor

    2008-01-01

    The genes (idi, crtE, crtYB, crtI and crtS) controlling the astaxanthin biosynthesis pathway of the wild-type ATCC 24230 strain of Xanthophyllomyces dendrorhous were cloned and their nucleotide sequences obtained in both genomic and cDNA versions. The idi, crtE, crtYB, crtI and crtS genes were cloned as fragments of 10.9, 11.5, 15.8, 5.9 and 4 kb, respectively. Analysis of the nucleotide sequence data indicates that the idi, crtE, crtYB, crtI and crtS genes have 4, 8, 4, 11 and 17 introns and 5, 9, 5, 12 and 18 exons, respectively. In addition, a highly efficient site-directed mutagenesis system was developed by transformation by integration, followed by mitotic recombination (the double recombinant method). Heterozygote idi (idi+/idi-::hph), crtE (crtE+/crtE-::hph), crtYB (crtYB+/crtYB-::hph), crtI (crtI+/crtI-::hph) and crtS (crtS+/crtS-::hph) and homozygote mutants crtYB (crtYB-::hph/crtYB-::hph), crtI (crtI-::hph/crtI-::hph) and crtS (crtS-::hph/crtS-::hph) were constructed. All the heterozygote mutants have a pale phenotype and produce fewer carotenoids than the wild-type strain. The genetic analysis of the crtYB, crtI and crtS loci in the wild-type, heterozygote and homozygote strains gives evidence of the diploid constitution of ATCC 24230 strains. In addition, a truncated form of crtYB that lacks 153 amino acids of the N-terminal region, derived from alternatively spliced mRNA, was cloned. Its heterologous expression in Escherichia coli carrying the carotenogenic cluster of Erwinia uredovora results in trans-complementation and gives evidence of its functionality in this bacterium, maintaining its phytoene synthase activity but not the lycopene cyclase activity.

  15. The structure of HIV-1 genomic RNA in the gp120 gene determines a recombination hot spot in vivo.

    Science.gov (United States)

    Galetto, Román; Moumen, Abdeladim; Giacomoni, Véronique; Véron, Michel; Charneau, Pierre; Negroni, Matteo

    2004-08-27

    By frequently rearranging large regions of the genome, genetic recombination is a major determinant in the plasticity of the human immunodeficiency virus type I (HIV-1) population. In retroviruses, recombination mostly occurs by template switching during reverse transcription. The generation of retroviral vectors provides a means to study this process after a single cycle of infection of cells in culture. Using HIV-1-derived vectors, we present here the first characterization and estimate of the strength of a recombination hot spot in HIV-1 in vivo. In the hot spot region, located within the C2 portion of the gp120 envelope gene, the rate of recombination is up to ten times higher than in the surrounding regions. The hot region corresponds to a previously identified RNA hairpin structure. Although recombination breakpoints in vivo cluster in the top portion of the hairpin, the bias for template switching in this same region appears less marked in a cell-free system. By modulating the stability of this hairpin we were able to affect the local recombination rate both in vitro and in infected cells, indicating that the local folding of the genomic RNA is a major parameter in the recombination process. This characterization of reverse transcription products generated after a single cycle of infection provides insights in the understanding of the mechanism of recombination in vivo and suggests that specific regions of the genome might be prompted to yield different rates of evolution due to the presence of circumscribed recombination hot spots.

  16. Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns

    Directory of Open Access Journals (Sweden)

    Landweber Laura F

    2008-12-01

    Full Text Available. Background: Nyctotherus ovalis is a single-celled eukaryote that has hydrogen-producing mitochondria and lives in the hindgut of cockroaches. Like all members of the ciliate taxon, it has two types of nuclei, a micronucleus and a macronucleus. N. ovalis generates its macronuclear chromosomes by forming polytene chromosomes that subsequently develop into macronuclear chromosomes by DNA elimination and rearrangement. Results: We examined the structure of these gene-sized macronuclear chromosomes in N. ovalis. We determined the telomeres, subtelomeric regions, UTRs, coding regions and introns by sequencing a large set of macronuclear DNA sequences (4,242) and cDNAs (5,484) and comparing them with each other. The telomeres consist of repeats CCC(AAAACCCC)n, similar to those in spirotrichous ciliates such as Euplotes, Sterkiella (Oxytricha) and Stylonychia. Per sequenced chromosome we found evidence for either a single protein-coding gene, a single tRNA, or the complete ribosomal RNA cluster. Hence the chromosomes appear to encode single transcripts. In the short subtelomeric regions we identified a few overrepresented motifs that could be involved in gene regulation, but there is no consensus polyadenylation site. The introns are short (21–29 nucleotides), and a significant fraction (1/3) of the tiny introns is conserved in the distantly related ciliate Paramecium tetraurelia. As has been observed in P. tetraurelia, the N. ovalis introns tend to contain in-frame stop codons or have a length that is not divisible by three. This pattern causes premature termination of mRNA translation in the event of intron retention, and potentially degradation of unspliced mRNAs by the nonsense-mediated mRNA decay pathway. Conclusion: The combination of short leaders, tiny introns and single genes leads to very minimal macronuclear chromosomes. The smallest we identified contained only 150 nucleotides.
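    The intron pattern described in this abstract (a length not divisible by three, or an in-frame stop codon) can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names, the `offset` parameter and the toy sequences are hypothetical, and the default stop-codon set is the standard genetic code, which is an assumption and may not match the genetic code of a given ciliate.

```python
# Standard-code stop codons; an assumption, pass a different set for other genetic codes.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def has_in_frame_stop(intron, offset=0, stops=STANDARD_STOPS):
    """Return True if a retained intron contains a stop codon in the reading frame.

    offset (0, 1 or 2) is the number of intron bases needed to complete a codon
    that started in the upstream exon; only codons lying entirely inside the
    intron are checked.
    """
    seq = intron.upper()
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return True
    return False


def retention_disrupts_translation(intron, offset=0, stops=STANDARD_STOPS):
    """Return True if retaining the intron shifts the frame or introduces a stop."""
    shifts_frame = len(intron) % 3 != 0
    return shifts_frame or has_in_frame_stop(intron, offset, stops)


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from N. ovalis.
    examples = {
        "in-frame stop": "GTATGATTTTAG",
        "frame-neutral, no stop": "GTACCCAAATTTCCCAAACAG",
        "frameshift": "GTACCCAAATTTCCCAAACCAG",
    }
    for label, seq in examples.items():
        print(label, len(seq), retention_disrupts_translation(seq))
```

    Passing the stop-codon set as a parameter keeps the sketch usable for organisms with a non-standard genetic code.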

  17. Clustering of gene ontology terms in genomes.

    Science.gov (United States)

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them. Copyright © 2014 Elsevier B.V. All
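
The idea of scanning a fixed number of consecutive genes, rather than a fixed genomic span, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the statistical test, so the hypergeometric tail probability used here is a stand-in, and the function names and input layout (one set of GO terms per gene, in chromosomal order) are hypothetical.

```python
from math import comb


def window_term_counts(gene_terms, term, window):
    """Count genes annotated with `term` in every run of `window` consecutive genes.

    gene_terms: list of sets of GO terms, one set per gene, in genome order.
    Returns one count per window start position.
    """
    if window < 1 or window > len(gene_terms):
        raise ValueError("window must be between 1 and the number of genes")
    flags = [term in terms for terms in gene_terms]
    count = sum(flags[:window])
    counts = [count]
    for i in range(window, len(flags)):
        # Slide by one gene: add the entering gene, drop the leaving one.
        count += flags[i] - flags[i - window]
        counts.append(count)
    return counts


def hypergeom_tail(k, total_genes, annotated_genes, window):
    """P(X >= k) for X genes with the term among `window` genes drawn at random.

    Assumption: a hypergeometric null model, chosen here for illustration only.
    """
    upper = min(annotated_genes, window)
    favourable = sum(
        comb(annotated_genes, i) * comb(total_genes - annotated_genes, window - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total_genes, window)
```

Because the window is defined by gene count, long and short genes contribute equally, which matches the stated motivation of avoiding problems caused by varying gene lengths and densities.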

  18. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    Science.gov (United States)

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  19. Chloroplast genome sequence of the moss Tortula ruralis: gene content, polymorphism, and structural arrangement relative to other green plant chloroplast genomes

    Directory of Open Access Journals (Sweden)

    Wolf Paul G

    2010-02-01

    Background: Tortula ruralis, a widely distributed species in the moss family Pottiaceae, is increasingly used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of T. ruralis, only the second published chloroplast genome for a moss, and the first for a vegetatively desiccation-tolerant plant. Results: The Tortula chloroplast genome is ~123,500 bp, and differs in a number of ways from that of Physcomitrella patens, the first published moss chloroplast genome. For example, Tortula lacks the ~71 kb inversion found in the large single copy region of the Physcomitrella genome and other members of the Funariales. Also, the Tortula chloroplast genome lacks petN, a gene found in all known land plant plastid genomes. In addition, an unusual case of nucleotide polymorphism was discovered. Conclusions: Although the chloroplast genome of Tortula ruralis differs from that of the only other sequenced moss, Physcomitrella patens, we have yet to determine the biological significance of the differences. The polymorphisms we have uncovered in the sequencing of the genome offer a rare possibility (for mosses) of the generation of DNA markers for fine-level phylogenetic studies, or to investigate individual variation within populations.

  20. Sequencing and analysis of the prolate-headed lactococcal bacteriophage c2 genome and identification of the structural genes.

    Science.gov (United States)

    Lubbers, M W; Waterfield, N R; Beresford, T P; Le Page, R W; Jarvis, A W

    1995-12-01

    The 22,163-bp genome of the lactococcal prolate-headed phage c2 was sequenced. Thirty-nine open reading frames (ORFs), early and late promoters, and a putative transcription terminator were identified. Twenty-two ORFs were in the early gene region, and 17 were in the late gene region. Putative genes for a DNA polymerase, a recombination protein, a sigma factor protein, a transcription regulatory protein, holin proteins, and a terminase were identified. Transcription of the early and late genes proceeded divergently from a noncoding 611-bp region. A 521-bp fragment contained within the 611-bp intergenic region could act as an origin of replication in Lactococcus lactis. Three major structural proteins, with sizes of 175, 90, and 29 kDa, and eight minor proteins, with sizes of 143, 82, 66, 60, 44, 42, 32, and 28 kDa, were identified. Several of these proteins appeared to be posttranslationally modified by proteolytic cleavage. The 175- and 90-kDa proteins were identified as the major phage head proteins, and the 29- and 60-kDa proteins were identified as the major tail protein and (possibly) the tail adsorption protein, respectively. The head proteins appeared to be covalently linked multimers of the same 30-kDa gene product. Phage c2 and prolate-headed lactococcal phage bIL67 (C. Schouler, S. D. Ehrlich, and M.-C. Chopin, Microbiology 140:3061-3069, 1994) shared 80% nucleotide sequence identity. However, several DNA deletions or insertions which corresponded to the loss or acquisition of specific ORFs, respectively, were noted. The identification of direct nucleotide repeats flanking these sequences indicated that recombination may be important in the evolution of these phages.(ABSTRACT TRUNCATED AT 250 WORDS)

  1. Complete structure, genomic organization, and expression of channel catfish (Ictalurus punctatus, Rafinesque 1818) matrix metalloproteinase-9 gene.

    Science.gov (United States)

    Yeh, Hung-Yueh; Klesius, Phillip H

    2008-03-01

    In this study, the channel catfish (CC) matrix metalloproteinase-9 (MMP-9) gene was cloned, sequenced, and characterized at both the cDNA and the genomic DNA levels. The complete sequence of the CC MMP-9 cDNA consisted of 2,551 nucleotides, including one open reading frame and 5'- and 3'-end untranslated regions. The open reading frame potentially encoded a 686-amino-acid peptide with a calculated molecular mass (without glycosylation) of approximately 77.4 kDa, which included a signal peptide and potentially heavy O-glycosylation sites. CC MMP-9 did not have the tripeptide Arg-Gly-Asp motif. The degree of conservation of the CC MMP-9 amino acid sequence to human and mouse counterparts was 55%, while to those of other fish species was 67-74%. The full-length CC MMP-9 genomic DNA comprised 5,663 nucleotides, much shorter than human or mouse counterparts. The exon-intron structure followed the splice acceptor/donor consensus rule, and the sequence contained 13 exons. The MMP-9 transcript was constitutively expressed in restrictive CC tissues. This result should provide fundamental information for further exploration of the role of MMP-9 in fish pathophysiology.

  2. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols.

    Science.gov (United States)

    Martínez-García, Pedro J; Crepeau, Marc W; Puiu, Daniela; Gonzalez-Ibeas, Daniel; Whalen, Jeanne; Stevens, Kristian A; Paul, Robin; Butterfield, Timothy S; Britton, Monica T; Reagan, Russell L; Chakraborty, Sandeep; Walawage, Sriema L; Vasquez-Gross, Hans A; Cardeno, Charis; Famula, Randi A; Pratt, Kevin; Kuruganti, Sowmya; Aradhya, Mallikarjuna K; Leslie, Charles A; Dandekar, Abhaya M; Salzberg, Steven L; Wegrzyn, Jill L; Langley, Charles H; Neale, David B

    2016-09-01

    The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-d-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.
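
The N50 statistic quoted in this abstract is computed against an assumed genome size rather than the assembly total. A minimal sketch of that calculation follows. The function name and the optional `genome_size` parameter are hypothetical, and the variant that uses an external genome size is often called NG50 elsewhere.

```python
def n50(lengths, genome_size=None):
    """Length L such that sequences of length >= L cover at least half the total.

    lengths: iterable of scaffold or contig lengths in bp.
    genome_size: optional reference size in bp; when omitted, the sum of
    `lengths` is used. Returns None if the sequences never reach half of
    the reference size.
    """
    lengths = list(lengths)
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return None
```

Passing a larger `genome_size` than the assembly total can only lower the reported value, which is why the basis of the calculation is worth stating next to the number.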

  3. Brief Guide to Genomics: DNA, Genes and Genomes

    Science.gov (United States)

    ... Breve guía de genómica A Brief Guide to Genomics DNA, Genes and Genomes Deoxyribonucleic acid (DNA) is ... genetic basis for health and disease. Implications of Genomics for Medical Science Virtually every human ailment has ...

  4. Comparative genomic analysis of soybean flowering genes.

    Directory of Open Access Journals (Sweden)

    Chol-Hee Jung

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant

  5. Population structure and comparative genome hybridization of European flor yeast reveal a unique group of Saccharomyces cerevisiae strains with few gene duplications in their genome.

    Science.gov (United States)

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associate the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation.

  6. Population structure and comparative genome hybridization of European flor yeast reveal a unique group of Saccharomyces cerevisiae strains with few gene duplications in their genome.

    Directory of Open Access Journals (Sweden)

    Jean-Luc Legras

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associate the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation.

  7. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    Science.gov (United States)

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing.

  8. Genomics technologies to study structural variations in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Cardone Maria Francesca

    2016-01-01

    Grapevine is one of the most important crop plants in the world. Recently there has been a great expansion of genomics resources for the grapevine genome, supporting increasing efforts in molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs) on plant genomes, few data are available on copy number variation (CNV). Furthermore, association between structural variations and phenotypes has been described in only a few cases. We combined high throughput biotechnologies and bioinformatics tools to reveal the first inter-varietal atlas of structural variation (SV) for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We detected roughly 8% of the grapevine genome affected by genomic variations. Taking into account the phenotypic differences existing among the studied varieties, we compared SVs among them and the reference, and then performed an in-depth analysis of the gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.

  9. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity.

    Science.gov (United States)

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation-induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome.
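
The preference for A- and T-ending codons (NNA and NNT) reported in this abstract can be measured, in its simplest form, as the share of codons whose third base is A or T. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the input is taken to be an in-frame coding sequence, and no per-amino-acid normalisation of the kind a full codon usage analysis would use is applied.

```python
from collections import Counter


def third_position_counts(cds):
    """Count the third base of each complete codon in an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i + 2] for i in range(0, usable, 3))


def at_ending_fraction(cds):
    """Fraction of complete codons that end in A or T (NNA and NNT)."""
    counts = third_position_counts(cds)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no complete codon")
    return (counts["A"] + counts["T"]) / total
```

Computing this fraction per gene, or per exon, would allow the kind of comparison across intron numbers and exon positions that the abstract describes.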

  10. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    Directory of Open Access Journals (Sweden)

    Arce-Johnson Patricio

    2008-07-01

    Background: The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results: We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion: This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggest that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions.

  11. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    Science.gov (United States)

    Benítez-Páez, Alfonso; Moreno, F. Javier; Sanz, María L.; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  12. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    Directory of Open Access Journals (Sweden)

    Alfonso Benítez-Páez

    2016-04-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome.

  13. Characterization of gene rearrangements resulted from genomic structural aberrations in human esophageal squamous cell carcinoma KYSE150 cells.

    Science.gov (United States)

    Hao, Jia-Jie; Gong, Ting; Zhang, Yu; Shi, Zhi-Zhou; Xu, Xin; Dong, Jin-Tang; Zhan, Qi-Min; Fu, Song-Bin; Wang, Ming-Rong

    2013-01-15

    Chromosomal rearrangements and the genes they involve have been reported to play important roles in the development and progression of human malignancies, but the gene rearrangements in esophageal squamous cell carcinoma (ESCC) remain to be identified. In the present study, array-based comparative genomic hybridization (array-CGH) was performed on the ESCC cell line KYSE150. Eight disrupted genes were detected according to the obviously distinct unbalanced breakpoints. The splitting of these genes was validated by dual-color fluorescence in-situ hybridization (FISH). By using rapid amplification of cDNA ends (RACE), genome walking and sequencing analysis, we further identified gene disruptions and rearrangements. A fusion transcript DTL-1q42.2 was derived from an intrachromosomal rearrangement of chromosome 1. Highly amplified segments of DTL and PTPRD were self-rearranged. The sequences on either side of the junctions possess micro-homology with each other. FISH results indicated that the split DTL and PTPRD were also involved in comprising parts of the derivative chromosomes resulting from t(1q;9p;12p) and t(9;1;9). Further, we found that regions harboring DTL (1q32.3) and PTPRD (9p23) were also split in ESCC tumors. The data add significant information to the existing genetic background of KYSE150, which may be used as a model for studying these gene rearrangements.

  14. Genomic structure analysis of SNC6, a progesterone-receptor associated protein gene, and cloning and characterization of its 5'-flanking region.

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Objective: To analyze the genomic structure of SNC6, a progesterone-receptor associated protein gene, and the regulatory elements in its 5'-flanking region. Methods: Genomic sequence from the GenBank database (accession number: Z98048) covering the whole SNC6 gene was used to analyze the genomic structure of SNC6 and to design primers for PCR amplification of its 5'-flanking region. A 1894 bp fragment of the 5'-flanking region (-1814 to +75) was cloned by PCR using genomic DNA from peripheral blood lymphocytes of a healthy donor as template. This fragment, as well as 3 shorter derivative fragments (1423 bp, 632 bp and 416 bp, which correspond to -1344 to +75, -552 to +75 and -337 to +75 respectively), was subcloned into pGL2 series luciferase reporter vectors. These constructs were introduced into the colorectal cancer cell line SW620 for transient expression of the reporter gene, and luciferase activities were measured. Results: The genomic structure analysis showed that the SNC6 gene has 12 exons and spans 32017 bp (nt71529 to nt39513 in the Z98048 sequence). All SW620 cells transfected with the above 5'-flanking region-containing constructs showed luciferase activity. The highest luciferase activities were measured in cells transfected with vectors containing the 1894 bp fragment, and the lowest in cells transfected with vectors containing the 416 bp fragment. Luciferase activities were higher in cells transfected with vectors containing the 632 bp fragment than in cells transfected with vectors containing the 1423 bp fragment. Conclusion: The basic transcription-promoting element (promoter) for SNC6 expression resides between 0 and -337, two transcription-enhancing elements (enhancers) reside between -337 and -552 and between -1344 and -1814, and one transcription-inhibiting element (silencer) exists between -552 and -1344.
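
The reasoning that leads from the reporter results to the stated conclusion can be made explicit with a small sketch. The abstract gives only the ordering of luciferase activities (416 bp lowest, then 1423 bp, then 632 bp, with 1894 bp highest), so the code uses ranks rather than measured values. The function name and data layout are hypothetical, and the rule applied (activity rising when a segment is added implies an enhancing element, falling implies an inhibiting one) is the usual reading of a 5' deletion series, not a procedure quoted from the paper.

```python
# Each construct: (5' end of the fragment relative to the start site, activity rank).
# Ranks are derived from the ordering stated in the abstract; 1 = lowest activity.
CONSTRUCTS = [(-337, 1), (-552, 3), (-1344, 2), (-1814, 4)]


def classify_intervals(constructs):
    """Label the segment added between successive constructs of a deletion series."""
    # Order from the shortest fragment (closest to the start site) outwards.
    ordered = sorted(constructs, key=lambda item: -item[0])
    labels = []
    for (near, rank_near), (far, rank_far) in zip(ordered, ordered[1:]):
        if rank_far > rank_near:
            kind = "enhancer"
        elif rank_far < rank_near:
            kind = "silencer"
        else:
            kind = "neutral"
        labels.append((near, far, kind))
    return labels
```

Applied to `CONSTRUCTS`, the function labels -337 to -552 and -1344 to -1814 as enhancing segments and -552 to -1344 as an inhibiting segment, in line with the abstract's conclusion.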

  15. Evolutionary origin of Rosaceae-specific active non-autonomous hAT elements and their contribution to gene regulation and genomic structural variation.

    Science.gov (United States)

    Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng

    2016-05-01

    Transposable elements account for approximately 30% of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact or truncated transposase genes, respectively, while the mMoshan type is a miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment was amplified predominantly in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into the S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, all these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus.

  16. From the genome to the phenome and back: linking genes with human brain function and structure using genetically informed neuroimaging

    DEFF Research Database (Denmark)

    Siebner, H R; Callicott, J H; Sommer, T

    2009-01-01

    In recent years, an array of brain mapping techniques has been successfully employed to link individual differences in circuit function or structure in the living human brain with individual variations in the human genome. Several proof-of-principle studies provided converging evidence that brain...

  17. Weeding out the genes: the Arabidopsis genome project.

    Science.gov (United States)

    Martienssen, R A

    2000-05-01

    The Arabidopsis genome sequence is scheduled for completion at the end of this year (December 2000). It will be the first higher plant genome to be sequenced, and will allow a detailed comparison with bacterial, yeast and animal genomes. Already, two of the five chromosomes have been sequenced, and we have had our first glimpse of higher eukaryotic centromeres, and the structure of heterochromatin. The implications for understanding plant gene function, genome structure and genome organization are profound. In this review, the lessons learned for future genome projects are reviewed as well as a summary of the initial findings in Arabidopsis.

  18. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context...

  19. Heat Shock Protein 70 and 90 Genes in the Harmful Dinoflagellate Cochlodinium polykrikoides: Genomic Structures and Transcriptional Responses to Environmental Stresses

    Directory of Open Access Journals (Sweden)

    Ruoyu Guo

    2015-01-01

    The marine dinoflagellate Cochlodinium polykrikoides is responsible for harmful algal blooms in aquatic environments and has spread into the world’s oceans. As a microeukaryote, it seems to have distinct genomic characteristics, such as gene structure and regulation. In the present study, we characterized heat shock proteins (HSP) 70 and 90 of C. polykrikoides and evaluated their transcriptional responses to environmental stresses. Both HSPs contained the conserved motif patterns, showing the highest homology with those of other dinoflagellates. Genomic analysis showed that CpHSP70 had no introns and was encoded in tandemly arranged copies separated by intergenic spacers. However, CpHSP90 had one intron in the coding genomic region, and no intergenic region was found. Phylogenetic analyses of the separate HSPs showed that CpHSP70 was closely related to that of the dinoflagellate Crypthecodinium cohnii and CpHSP90 to those of other Gymnodiniales dinoflagellates. Gene expression analyses showed that both HSP genes were upregulated by treatment with the algicides CuSO4 and NaOCl; however, they were downregulated by PCB treatment. The transcription of CpHSP90 and CpHSP70 showed similar expression patterns under the same toxicant treatment, suggesting that both genes might have cooperative functions in toxicant-induced gene regulation in the dinoflagellate.

  20. Characterization of promoter region and genomic structure of the murine and human genes encoding Src like adapter protein.

    Science.gov (United States)

    Kratchmarova, I; Sosinowski, T; Weiss, A; Witter, K; Vincenz, C; Pandey, A

    2001-01-10

    Src-like adapter protein (SLAP) was identified as a signaling molecule in a yeast two-hybrid system using the cytoplasmic domain of EphA2, a receptor protein tyrosine kinase (Pandey et al., 1995. Characterization of a novel Src-like adapter protein that associates with the Eck receptor tyrosine kinase. J. Biol. Chem. 270, 19201-19204). It is very similar to members of the Src family of cytoplasmic tyrosine kinases in that it contains highly homologous SH3 and SH2 domains (Abram and Courtneidge, 2000. Src family tyrosine kinases and growth factor signaling. Exp. Cell. Res. 254, 1-13). However, instead of a kinase domain at the C-terminus, it contains a unique C-terminal region. In order to exclude the possibility that an alternative form exists, we have isolated genomic clones containing the murine Slap gene as well as the human SLA gene. The coding regions of the murine Slap and human SLA genes contain seven exons and six introns. The absence of any kinase domain in the genomic region confirms its designation as an adapter protein. Additionally, we have cloned and sequenced approximately 2.6 kb of the region 5' to the initiator methionine of the murine Slap gene. When subcloned upstream of a luciferase gene, this fragment increased the transcriptional activity about 6-fold in a human Jurkat T cell line and approximately 52-fold in a murine T cell line, indicating that this region contains promoter elements that dictate SLAP expression. We have also cloned the promoter region of the human SLA gene. Since SLAP is transcriptionally regulated by retinoic acid and by activation of B cells, the cloning of its promoter region will permit a detailed analysis of the elements required for its transcriptional regulation.

  1. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Background: Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences, makes such studies worthwhile. Results: I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions: I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  2. Genomic structure and sequence polymorphism of E,E-alpha-farnesene synthase gene in apples (Malus domestica Borkh.)

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Primer pairs were designed to amplify the genomic DNA sequence of the alpha-farnesene synthase (AFS) gene by PCR. The PCR products were sequenced, spliced and compared to cDNA sequences in GenBank (accession No. AY182241). The genomic sequence and intron-exon organization of the AFS gene were thus obtained. The AFS genomic sequence has been registered in GenBank (accession No. DQ901739). It has 6 introns and 7 exons, encoding a protein of 576 amino acids. The sizes of the 6 introns were 108 bp, 113 bp, >1000 bp, 125 bp, 220 bp and 88 bp, and their phases were 0, 1, 2, 2, 0 and 0, respectively. The numbers of deduced amino acids encoded by the 7 exons were 57, 89, 127, 73, 48, 83 and 99, respectively. The AFS protein contained three motifs: the RR(X8)W motif encoded by a sequence in exon 1, and the RxR motif and DDxxD motif encoded by two sequences in exon 4. After comparing the AFS genomic sequence (accession No. DQ901739) to the cDNA sequence (accession No. AY523409) in GenBank, it was found that there were 6 single-nucleotide polymorphisms between the two sequences, four of which caused mutations at the amino acid level. Interestingly, one amino acid mutation (291R→G) was found in the RxR motif, and further investigation is needed to determine whether the alpha-farnesene synthesis ability and superficial scald susceptibility of apples are influenced by this amino acid mutation and other mutations.
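
    The exon and intron counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check on the numbers given in the abstract; the function name check_gene_model is hypothetical and nothing here reproduces the authors' sequence analysis.

```python
# Values copied from the abstract: residues encoded by exons 1-7 and the protein length.
EXON_RESIDUES = [57, 89, 127, 73, 48, 83, 99]
REPORTED_PROTEIN_LENGTH = 576
REPORTED_INTRON_COUNT = 6


def check_gene_model(exon_residues, protein_length, intron_count):
    """Return simple consistency checks for an exon/intron description."""
    return {
        # Per-exon residue counts should add up to the full protein.
        "residues_sum": sum(exon_residues),
        "residues_match": sum(exon_residues) == protein_length,
        # n exons in a single gene are separated by n - 1 introns.
        "introns_match": len(exon_residues) - 1 == intron_count,
    }


print(check_gene_model(EXON_RESIDUES, REPORTED_PROTEIN_LENGTH, REPORTED_INTRON_COUNT))
```

    The per-exon residue counts sum to 576, matching the reported protein length, and 7 exons imply the 6 introns stated.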

  3. Genomic structure and expression analysis of the RNase kappa family ortholog gene in the insect Ceratitis capitata.

    Science.gov (United States)

    Rampias, Theodoros N; Fragoulis, Emmanuel G; Sideris, Diamantis C

    2008-12-01

    Cc RNase is the founding member of the recently identified RNase kappa family, which is represented by a single ortholog in a wide range of animal taxonomic groups. Although the precise biological role of this protein is still unknown, it has been shown that the recombinant proteins isolated so far from the insect Ceratitis capitata and from human exhibit ribonucleolytic activity. In this work, we report the genomic organization and molecular evolution of the RNase kappa gene from various animal species, as well as expression analysis of the ortholog gene in C. capitata. The high degree of amino acid sequence similarity, in combination with the fact that exon sizes and intronic positions are extremely conserved among RNase kappa orthologs in 15 diverse genomes from sea anemone to human, implies a very significant biological function for this enzyme. In C. capitata, two forms of RNase kappa mRNA (0.9 and 1.5 kb) with different 3' UTR lengths were identified as alternative products of a single gene, resulting from the use of different polyadenylation signals. Both transcripts are expressed in all insect tissues and developmental stages. Sequence analysis of the extended region of the longer transcript revealed the existence of three mRNA instability motifs (AUUUA) and five poly(U) tracts, whose functional importance in RNase kappa mRNA decay remains to be explored.

  4. Practical applications of structural genomics technologies for mutagen research.

    Science.gov (United States)

    Zemla, Adam; Segelke, Brent W

    2011-06-17

    Here we present a perspective on a range of practical uses of structural genomics for mutagen research. Structural genomics is an overloaded term and requires some definition to bound the discussion; we give a brief description of public and private structural genomics endeavors, along with some of their objectives, their activities, their capabilities, and their limitations. We discuss how structural genomics might impact mutagen research in three different scenarios: at a structural genomics center, at a lab with modest resources that also conducts structural biology research, and at a lab that is conducting mutagen research without in-house experimental structural biology. Applications range from functional annotation of single genes or SNPs, to constructing gene networks and pathways, to an integrated systems biology approach. Structural genomics centers can take advantage of systems biology models to select high-value targets for structure determination and in turn extend systems models to better understand systems biology diseases or phenomena. Individual investigator-run structural biology laboratories can collaborate with structural genomics centers, but can also take advantage of technical advances and tools developed by structural genomics centers and can employ a structural genomics approach to advancing biological understanding. Individual investigator-run non-structural biology laboratories can also collaborate with structural genomics centers, possibly influencing targeting decisions, but can also use structure-based annotation tools enabled by the growing coverage of protein fold space provided by structural genomics. Better functional annotation can inform pathway and systems biology models.

  5. Structural variations in pig genomes

    NARCIS (Netherlands)

    Paudel, Y.

    2015-01-01

    Abstract Paudel, Y. (2015). Structural variations in pig genomes. PhD thesis, Wageningen University, the Netherlands Structural variations are chromosomal rearrangements such as insertions-deletions (INDELs), duplications, inversions, translocations, and copy number variations (CNVs

  6. Comparative genomic analysis of eutherian kallikrein genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2017-03-01

    The present study attempted to update and revise the eutherian kallikrein genes implicated in major physiological and pathological processes and in medical molecular diagnostics. Using the eutherian comparative genomic analysis protocol and freely available genomic sequence assemblies, tests of the reliability of eutherian public genomic sequences were used to annotate the most comprehensive curated third-party gene data set of eutherian kallikrein genes, including 121 complete coding sequences among 335 potential coding sequences. The present analysis first described 13 major gene clusters of eutherian kallikrein genes and explained their differential gene expansion patterns. An updated classification and nomenclature of eutherian kallikrein genes was proposed as a new framework for future experiments.

  7. Disturbances in metabolic, transport and structural genes in experimental colonic inflammation in the rat: a longitudinal genomic analysis

    Directory of Open Access Journals (Sweden)

    Suárez María

    2008-10-01

    Background: Trinitrobenzenesulphonic acid (TNBS)-induced rat colitis is one of the most widely used models of inflammatory bowel disease (IBD), a condition whose aetiology and pathophysiology are incompletely understood. We have characterized this model at the genomic level using a longitudinal approach. Six control rats were compared with colitic animals at 2, 5, 7 and 14 days after TNBS administration (n = 3). The Affymetrix Rat Expression Array 230 2.0 system was used. Results: TNBS-induced colitis had a profound impact on the gene expression profile, which was maximal 5 and 7 days post-induction. Most genes were affected at more than one time point. They were related to a number of biological functions, not only inflammation/immunity but also transport, metabolism, signal transduction, tissue remodeling and angiogenesis. Gene changes generally correlated with the severity of colitis. The results were successfully validated in a subset of genes by real-time PCR. Conclusion: The TNBS model of rat colitis has been described in detail at the transcriptome level. The changes observed correlate with pathophysiological disturbances such as tissue remodelling and alterations in ion transport, which are characteristic of both this model and IBD.

  8. Genomic structure and characterization of the Drosophila S3 ribosomal/DNA repair gene and mutant alleles.

    Science.gov (United States)

    Kelley, M R; Xu, Y; Wilson, D M; Deutsch, W A

    2000-03-01

    The Drosophila S3 protein is known to be associated with ribosomes, where it is thought to play a role in the initiation of protein translation. The S3 protein also contains a DNA repair activity, efficiently processing 8-oxoguanine residues in DNA via an N-glycosylase/apurinic-apyrimidinic (AP) lyase activity. The gene that encodes S3 has previously been localized to one of the Minute loci on chromosome 3 in Drosophila. This study focused on the genomic organization of S3 at M(3)95A, initial promoter characterization, and analysis of three mutant alleles at this locus. The S3 gene was found to be a single-copy gene 2 to 3 kb in length and containing a single intron. The upstream 1.6-kb region was analyzed for promoter activity, identifying a presumptive regulatory domain containing potential enhancer and suppressor elements. This finding is of interest, as the S3 gene is constitutively expressed throughout development and mRNA is most likely maternally inherited. Lastly, three Minute alleles from the same locus were sequenced and two alleles found to contain a 22-bp deletion in exon 2, resulting in a truncated S3 protein, although wildtype levels of S3 mRNA and protein were detected in the viable heterozygous Minute alleles, possibly reflecting dosage compensation.

  9. Human retina-specific amine oxidase: genomic structure of the gene (AOC2), alternatively spliced variant, and mRNA expression in retina.

    Science.gov (United States)

    Imamura, Y; Noda, S; Mashima, Y; Kudoh, J; Oguchi, Y; Shimizu, N

    1998-07-15

    Previously, we reported the isolation of cDNA for human retina-specific amine oxidase (RAO) and the expression of RAO exclusively in retina. Bacterial artificial chromosome clones containing the human RAO gene (AOC2) were mapped to human chromosome 17q21 (Imamura et al., 1997, Genomics 40: 277-283). Here, we report the complete genomic structure of the RAO gene, including 5' flanking sequence, and mRNA expression in retina. The human RAO gene spans 6 kb and is composed of four exons corresponding to the amino acid sequence 1-530, 530-598, 598-641, and 642-729 separated by three introns of 3000, 310, and 351 bp. Screening of a human retina cDNA library revealed the existence of an alternatively spliced cDNA variant with an additional 81 bp at the end of exon 2. The sizes of exons and the locations of exon/intron boundaries in the human RAO gene showed remarkable similarity to those of the human kidney diamine oxidase gene (AOC1). In situ hybridization revealed that mRNA coding for RAO is expressed preferentially in the ganglion cell layer of the mouse retina. We designed four sets of PCR primers to amplify four exons, which will be valuable for analyzing mutations in patients with ocular diseases affecting the retinal ganglion cell layer.

  10. Uses of antimicrobial genes from microbial genome

    Science.gov (United States)

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having a broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  11. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    Directory of Open Access Journals (Sweden)

    Łucja Przysiecka

    2015-04-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of the CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. A bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of the two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots in the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, the LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and a cross-genera synteny survey, identified representatives of all but one (CHI1) main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis

  12. Informational laws of genome structures

    Science.gov (United States)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
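
    The choice of k and a k-mer entropy can be illustrated with a short sketch. It assumes that lg2(n) denotes the base-2 logarithm of the genome length, rounded to an integer, and it computes the plain Shannon entropy of the empirical k-mer distribution; the function names choose_k and kmer_entropy are hypothetical, and the specific informational indexes and laws of the study are not reproduced here.

```python
import math
from collections import Counter


def choose_k(genome_length):
    """Pick k as log2 of the genome length (ASSUMED reading of 'lg2(n)'), rounded."""
    return max(1, round(math.log2(genome_length)))


def kmer_entropy(sequence, k):
    """Shannon entropy (in bits) of the empirical distribution of overlapping k-mers."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy input only; a real genome would be read from a FASTA file.
toy_genome = "ACGTACGTTTGACCAGTAGGCTAACGTTAGCA"
k = choose_k(len(toy_genome))
print(k, kmer_entropy(toy_genome, k))
```

    A sequence of length n contains n - k + 1 overlapping k-mers, so the entropy is bounded above by log2(n - k + 1) bits; comparing the observed value with that of random sequences of the same length is the kind of comparison the abstract alludes to.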

  13. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    Science.gov (United States)

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  14. Genomics of local adaptation with gene flow.

    Science.gov (United States)

    Tigano, Anna; Friesen, Vicki L

    2016-05-01

    Gene flow is a fundamental evolutionary force in adaptation that is especially important to understand as humans are rapidly changing both the natural environment and natural levels of gene flow. Theory proposes a multifaceted role for gene flow in adaptation, but it focuses mainly on the disruptive effect that gene flow has on adaptation when selection is not strong enough to prevent the loss of locally adapted alleles. The role of gene flow in adaptation is now better understood due to the recent development of both genomic models of adaptive evolution and genomic techniques, which both point to the importance of genetic architecture in the origin and maintenance of adaptation with gene flow. In this review, we discuss three main topics on the genomics of adaptation with gene flow. First, we investigate selection on migration and gene flow. Second, we discuss the three potential sources of adaptive variation in relation to the role of gene flow in the origin of adaptation. Third, we explain how local adaptation is maintained despite gene flow: we provide a synthesis of recent genomic models of adaptation, discuss the genomic mechanisms and review empirical studies on the genomics of adaptation with gene flow. Despite predictions on the disruptive effect of gene flow in adaptation, an increasing number of studies show that gene flow can promote adaptation, that local adaptations can be maintained despite high gene flow, and that genetic architecture plays a fundamental role in the origin and maintenance of local adaptation with gene flow.

  15. [Integration of different T-DNA structures of ACC oxidase gene into carnation genome extended cut flower vase-life differently].

    Science.gov (United States)

    Yu, Yi-Xun; Bao, Man-Zhu

    2004-09-01

    The carnation (Dianthus caryophyllus L.) cultivar 'Master' was transformed, by Agrobacterium tumefaciens-mediated transformation, with four T-DNA structures containing sense, antisense, sense direct repeat and antisense direct repeat genes of ACC oxidase. Southern blotting showed that the foreign gene was integrated into the carnation genome, and 14 transgenic lines were obtained. The transgenic plants were transplanted to soil and grew normally in the greenhouse. Of the 12 transgenic lines screened, the cut flower vase life of 8 transgenic lines was up to 11 days and the longest was 12.8 days, while the vase life of the control was 5.8 days at 25 °C. The vase life of 2 lines out of 3 with a single sense ACO gene was the same as that of the control, while the vase life of 3 lines out of 4 with a single antisense ACO gene was prolonged. The vase life of cut flowers of all 5 lines with direct repeat ACO genes was prolonged by about 6 days, while the vase life of 3 out of 7 lines with a single ACO gene was the same as that of the control. During the senescence of cut flowers, the ethylene production of most of the transgenic lines decreased significantly, and ethylene production was not detectable in lines T456, T556 and T575. The results demonstrate that the antisense foreign gene inhibits expression of the endogenous gene more strongly than the sense one. Both sense direct repeat and antisense direct repeat foreign genes suppress endogenous gene expression more strongly than single foreign genes. The transgenic lines obtained in this research are useful for minimizing carnation cut flower transportation and storage expenses.

  16. Identification and Categorization of Horizontally Transferred Genes in Prokaryotic Genomes

    Institute of Scientific and Technical Information of China (English)

    Shuo-Yong SHI; Xiao-Hui CAI; Da-fu DING

    2005-01-01

    Horizontal gene transfer (HGT), a process through which genomes acquire genetic materials from distantly related organisms, is believed to be one of the major forces in prokaryotic genome evolution. However, systematic investigation is still scarce to clarify two basic issues about HGT: (1) what types of genes are transferred; and (2) what influence HGT events have on the organization and evolution of biological pathways. Genome-scale investigations of these two issues will advance the systematic understanding of HGT in the context of prokaryotic genome evolution. Having investigated 82 genomes, we constructed an HGT database across broad evolutionary timescales. We identified four function categories containing a high proportion of horizontally transferred genes: cell envelope, energy metabolism, regulatory functions, and transport/binding proteins. Such a biased function distribution indicates that HGT is not completely random; instead, it is under high selective pressure, imposed by functional constraints in organisms. Furthermore, we mapped the transferred genes onto the connectivity structure map of organism-specific pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Our results suggest that recruitment of transferred genes into pathways is also selectively constrained because of the tuned interaction between original pathway members. Pathway organization structures are still well conserved through evolution even with the recruitment of horizontally transferred genes. Interestingly, in pathways whose organization was significantly affected by HGT events, the operon-like arrangement of transferred genes was found to be prevalent. Such results suggest that the operon plays an essential and directional role in the integration of alien genes into pathways.

  17. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    Science.gov (United States)

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.
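
    The idea of a core defined by conserved gene order can be illustrated with a deliberately simplified stand-in. The sketch below is not the method of this study: it computes a longest common subsequence (LCS) of orthologous group (OG) identifiers for just two linear genomes, ignoring strand, circularity, segment length thresholds and multiple-genome alignment. The function name conserved_order_core and the toy OG labels are hypothetical.

```python
def conserved_order_core(genome_a, genome_b):
    """Longest common subsequence of OG identifiers shared by two gene orders."""
    n, m = len(genome_a), len(genome_b)
    # dp[i][j] = LCS length of genome_a[i:] and genome_b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if genome_a[i] == genome_b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back through the table to recover one optimal ordering.
    core, i, j = [], 0, 0
    while i < n and j < m:
        if genome_a[i] == genome_b[j]:
            core.append(genome_a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return core


# Toy gene orders (hypothetical OG labels); "og9" is an insertion, og4/og5 are swapped.
genome_a = ["og1", "og2", "og3", "og4", "og5"]
genome_b = ["og1", "og9", "og2", "og3", "og5", "og4"]
print(conserved_order_core(genome_a, genome_b))
```

    In this toy case the inserted gene and one of the two swapped genes fall outside the order-conserved set, which conveys why an order-based core can be stricter than one based on gene presence alone.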

  18. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    Directory of Open Access Journals (Sweden)

    Uchiyama Ikuo

    2008-10-01

    Background: Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results: The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion: The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  19. Evolution of the P-type II ATPase gene family in the fungi and presence of structural genomic changes among isolates of Glomus intraradices

    Directory of Open Access Journals (Sweden)

    Sanders Ian R

    2006-03-01

    that structural genomic changes, such as exonic indel mutations and gene duplications are less rare than previously thought and that these also occur within fungal populations.

  20. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  1. [Determination and analysis of the primary structure of a genomic sequence adjacent to the 3'-end of the human tissue plasminogen activator gene].

    Science.gov (United States)

    Sarafanov, A G; Timofeeva, M Ia; Bannikov, V M; Zakhar'ev, V M; Mamaeva, O K; Tikhomirova, T I; Baev, A A

    1995-01-01

    Primary structure was determined for the recently cloned f1/BglII-fragment [19] containing 2102 b.p. of the human tissue plasminogen activator (tPA) gene 3' end and adjacent DNA region. Computer analysis has revealed an Alu-repeat 820 b.p. downstream of the tPA gene; the sequence proved to have a considerable homology (86-88%) with the Alus from the 3'-untranslated regions (3'UTRs) of cytochrome P-450, lysozyme and p53 protein human mRNAs. The same homology was estimated for this Alu in reversed orientation and Alus from the 3'UTRs of some other human mRNAs. In contrast, the homology between this 3' end tPA gene flanking Alu-repeat and other Alus dispersed throughout the gene introns, either direct or reversed, was less than 70%. The polyadenylation signal AATAAA downstream of the Alu and two nearby signals CACAG and GTGTT resembling consensus sequences CACAG and YGTGTTYY, respectively, were also detected. The two latter motifs, located close to the 3' ends in most mammalian genes, are likely to regulate mature mRNA formation. The comparison of the sequenced spacer flank adjacent to the tPA gene with the previously reported primary structure of a short homologous sequence from the same genomic region has revealed discrepancies (substitutions, deletions or insertions) in 21 nucleotide positions. The nucleotide sequence of E. coli uvrB gene fragment (980 b.p.) is also reported. This E. coli gene fragment was cloned accidentally within the f1/BglII-fragment, being an artifact of the host-vector system used.
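    The motif search mentioned in this abstract (AATAAA, CACAG, and the degenerate consensus YGTGTTYY, where Y stands for a pyrimidine, C or T) can be sketched with regular expressions. This is only an illustration of scanning for IUPAC-coded motifs, not the computer analysis used by the authors; the helper names and the example sequence are hypothetical, and only a subset of IUPAC codes is included.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif):
    """Return 0-based start positions of every occurrence of the motif."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]


# Hypothetical sequence, invented for this example (not taken from the tPA locus).
example = "GGAATAAACCCACAGTTTGTGTTCCAA"
hits = {motif: find_motif(example, motif) for motif in ("AATAAA", "CACAG", "YGTGTTYY")}
```

    Using a lookahead keeps overlapping matches, which matters for short, repetitive signals; positions are 0-based, so add 1 when comparing against coordinates reported in base pairs.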

  2. KEGG: kyoto encyclopedia of genes and genomes.

    Science.gov (United States)

    Kanehisa, M; Goto, S

    2000-01-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

  3. Chloroplast genome sequence of the moss Tortula ruralis: gene content, polymorphism, and structural arrangement relative to other green plant chloroplast genomes

    OpenAIRE

    Wolf Paul G; Everett Karin DE; Mandoli Dina F; Boore Jeffrey L; Kuehl Jennifer V; Mishler Brent D; Murdock Andrew G; Oliver Melvin J; Duffy Aaron M; Karol Kenneth G

    2010-01-01

    Abstract Background Tortula ruralis, a widely distributed species in the moss family Pottiaceae, is increasingly used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of T. ruralis, only the second published chloroplast genome for a moss, and the first for a vegetatively desiccation-t...

  4. Honeybee (Apis mellifera L.) mrjp gene family: computational analysis of putative promoters and genomic structure of mrjp1, the gene coding for the most abundant protein of larval food.

    Science.gov (United States)

    Malecová, Barbora; Ramser, Juliane; O'Brien, John K; Janitz, Michal; Júdová, Jana; Lehrach, Hans; Simúth, Jozef

    2003-01-16

    Mrjp1 gene belongs to the honeybee mrjp gene family encoding the major royal jelly proteins (MRJPs), secreted by nurse bees into the royal jelly. In this study, we have isolated the genomic clone containing the entire mrjp1 gene and determined its sequence. The mrjp1 gene sequence spans over 3038 bp and contains six exons separated by five introns. Seven mismatches between the mrjp1 gene sequence and two previously independently published cDNA sequences were found, but these differences do not lead to any change in the deduced amino acid sequence of MRJP1. With the aid of inverse polymerase chain reaction we obtained sequences flanking the 5' ends of other mrjp genes (mrjp2, mrjp3, mrjp4 and mrjp5). Putative promoters were predicted upstream of all mrjp genes (including mrjp1). The predicted promoters contain the TATA motif (TATATATT), highly conserved both in sequence and position. Ultraspiracle (USP) transcription factor (TF) binding sites in putative promoter regions and clusters of dead ringer TF binding sites upstream of these promoters were predicted computationally. We propose that USP, as a juvenile hormone (JH) binding TF, might possibly act as a mediator of mrjp expression in response to JH. Mrjp1's genomic locus is predicted to encode an antisense transcript, partially overlapping with five mrjp1 exons and entirely overlapping with the putative promoter and predicted transcriptional start point of mrjp1. This finding may shed light on the mechanisms of regulation of mrjps expression. Southern blot analysis of genomic DNA revealed that all so far known members of mrjp gene family (mrjp1, mrjp2, mrjp3, mrjp4 and mrjp5) are present as single-copy genes per haploid honeybee genome. 
    Although MRJPs and the yellow protein of Drosophila melanogaster share a certain degree of similarity in aa sequence and although it has been shown that they share a common evolutionary origin, neither structural similarities in the gene organization, nor significant similarities

  5. A unified gene catalog for the laboratory mouse reference genome.

    Science.gov (United States)

    Zhu, Y; Richardson, J E; Hale, P; Baldarelli, R M; Reed, D J; Recla, J M; Sinclair, R; Reddy, T B K; Bult, C J

    2015-08-01

    We report here a semi-automated process by which mouse genome feature predictions and curated annotations (i.e., genes, pseudogenes, functional RNAs, etc.) from Ensembl, NCBI and Vertebrate Genome Annotation database (Vega) are reconciled with the genome features in the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org) into a comprehensive and non-redundant catalog. Our gene unification method employs an algorithm (fjoin--feature join) for efficient detection of genome coordinate overlaps among features represented in two annotation data sets. Following the analysis with fjoin, genome features are binned into six possible categories (1:1, 1:0, 0:1, 1:n, n:1, n:m) based on coordinate overlaps. These categories are subsequently prioritized for assessment of annotation equivalencies and differences. The version of the unified catalog reported here contains more than 59,000 entries, including 22,599 protein-coding genes, 12,455 pseudogenes, and 24,007 other feature types (e.g., microRNAs, lincRNAs, etc.). More than 23,000 of the entries in the MGI gene catalog have equivalent gene models in the annotation files obtained from NCBI, Vega, and Ensembl. 12,719 of the features are unique to NCBI relative to Ensembl/Vega; 11,957 are unique to Ensembl/Vega relative to NCBI; and 3095 are unique to MGI. More than 4000 genome features fall into categories that require manual inspection to resolve structural differences in the gene models from different annotation sources. Using the MGI unified gene catalog, researchers can easily generate a comprehensive report of mouse genome features from a single source and compare the details of gene and transcript structure using MGI's mouse genome browser.
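    The six overlap categories can be made concrete with a small sketch. This is not fjoin itself, which the abstract describes as an efficient algorithm; the code below uses a naive all-pairs comparison and assumes that a category is assigned to each connected group of mutually overlapping features, with 1-based inclusive coordinates. The function names and the feature dictionaries are hypothetical.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if two features on the same chromosome share at least one base."""
    return a["chrom"] == b["chrom"] and a["start"] <= b["end"] and b["start"] <= a["end"]


def category(n_a, n_b):
    """Name the bin for a group holding n_a features from set A and n_b from set B."""
    if n_a == 1 and n_b == 1:
        return "1:1"
    if n_b == 0:
        return "1:0"
    if n_a == 0:
        return "0:1"
    if n_a == 1:
        return "1:n"
    if n_b == 1:
        return "n:1"
    return "n:m"


def classify_overlaps(set_a, set_b):
    """Group overlapping features and label each group with one of six categories."""
    adj = defaultdict(set)  # bipartite overlap graph over ("A", i) and ("B", j) nodes
    for i in range(len(set_a)):
        adj[("A", i)]
    for j in range(len(set_b)):
        adj[("B", j)]
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):  # naive O(n*m) scan, for clarity only
            if overlaps(a, b):
                adj[("A", i)].add(("B", j))
                adj[("B", j)].add(("A", i))

    seen, groups = set(), []
    for node in list(adj):
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:  # depth-first walk of one connected component
            cur = stack.pop()
            component.append(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        n_a = sum(1 for source, _ in component if source == "A")
        groups.append((category(n_a, len(component) - n_a), sorted(component)))
    return groups


# Hypothetical features from two annotation sources.
source_a = [
    {"chrom": "chr1", "start": 100, "end": 200},
    {"chrom": "chr1", "start": 500, "end": 600},
    {"chrom": "chr1", "start": 1000, "end": 1500},
]
source_b = [
    {"chrom": "chr1", "start": 150, "end": 250},
    {"chrom": "chr1", "start": 700, "end": 800},
    {"chrom": "chr1", "start": 1000, "end": 1100},
    {"chrom": "chr1", "start": 1400, "end": 1600},
]
bins = sorted(label for label, _ in classify_overlaps(source_a, source_b))
```

    In this toy input the 1:1 group is a candidate equivalence, while the 1:n group is the kind of case the abstract says needs manual inspection. A production tool would sort features by coordinate and sweep once instead of comparing all pairs.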

  6. Ligninolytic peroxidase genes in the oyster mushroom genome: heterologous expression, molecular structure, catalytic and stability properties, and lignin-degrading ability

    Science.gov (United States)

    2014-01-01

    Background The genome of Pleurotus ostreatus, an important edible mushroom and a model ligninolytic organism of interest in lignocellulose biorefineries due to its ability to delignify agricultural wastes, was sequenced with the purpose of identifying and characterizing the enzymes responsible for lignin degradation. Results Heterologous expression of the class II peroxidase genes, followed by kinetic studies, enabled their functional classification. The resulting inventory revealed the absence of lignin peroxidases (LiPs) and the presence of three versatile peroxidases (VPs) and six manganese peroxidases (MnPs), the crystal structures of two of them (VP1 and MnP4) were solved at 1.0 to 1.1 Å showing significant structural differences. Gene expansion supports the importance of both peroxidase types in the white-rot lifestyle of this fungus. Using a lignin model dimer and synthetic lignin, we showed that VP is able to degrade lignin. Moreover, the dual Mn-mediated and Mn-independent activity of P. ostreatus MnPs justifies their inclusion in a new peroxidase subfamily. The availability of the whole POD repertoire enabled investigation, at a biochemical level, of the existence of duplicated genes. Differences between isoenzymes are not limited to their kinetic constants. Surprising differences in their activity T50 and residual activity at both acidic and alkaline pH were observed. Directed mutagenesis and spectroscopic/structural information were combined to explain the catalytic and stability properties of the most interesting isoenzymes, and their evolutionary history was analyzed in the context of over 200 basidiomycete peroxidase sequences. Conclusions The analysis of the P. ostreatus genome shows a lignin-degrading system where the role generally played by LiP has been assumed by VP. 
    Moreover, it enabled the first characterization of the complete set of peroxidase isoenzymes in a basidiomycete, revealing strong differences in stability properties and providing

  7. Structural Genomics of Protein Phosphatases

    Energy Technology Data Exchange (ETDEWEB)

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  8. Genome structures and halophyte-specific gene expression of the extremophile Thellungiella parvula in comparison with Thellungiella salsuginea (Thellungiella halophila) and Arabidopsis

    KAUST Repository

    Oh, Dongha

    2010-09-10

    The genome of Thellungiella parvula, a halophytic relative of Arabidopsis (Arabidopsis thaliana), is being assembled using Roche-454 sequencing. Analyses of a 10-Mb scaffold revealed synteny with Arabidopsis, with recombination and inversion and an uneven distribution of repeat sequences. T. parvula genome structure and DNA sequences were compared with orthologous regions from Arabidopsis and publicly available bacterial artificial chromosome sequences from Thellungiella salsuginea (previously Thellungiella halophila). The three-way comparison of sequences, from one abiotic stress-sensitive species and two tolerant species, revealed extensive sequence conservation and microcolinearity, but grouping Thellungiella species separately from Arabidopsis. However, the T. parvula segments are distinguished from their T. salsuginea counterparts by a pronounced paucity of repeat sequences, resulting in a 30% shorter DNA segment with essentially the same gene content in T. parvula. Among the genes is SALT OVERLY SENSITIVE1 (SOS1), a sodium/proton antiporter, which represents an essential component of plant salinity stress tolerance. Although the SOS1 coding region is highly conserved among all three species, the promoter regions show conservation only between the two Thellungiella species. Comparative transcript analyses revealed higher levels of basal as well as salt-induced SOS1 expression in both Thellungiella species as compared with Arabidopsis. The Thellungiella species and other halophytes share conserved pyrimidine-rich 5' untranslated region proximal regions of SOS1 that are missing in Arabidopsis. Completion of the genome structure of T. parvula is expected to highlight distinctive genetic elements underlying the extremophile lifestyle of this species. © American Society of Plant Biologists.

  9. Gene enrichment in plant genomic shotgun libraries.

    Science.gov (United States)

    Rabinowicz, Pablo D; McCombie, W Richard; Martienssen, Robert A

    2003-04-01

    The Arabidopsis genome (about 130 Mbp) has been completely sequenced, whereas a draft sequence of the rice genome (about 430 Mbp) is now available and the sequencing of this genome will be completed in the near future. The much larger genomes of several important crop species, such as wheat (about 16,000 Mbp) or maize (about 2500 Mbp), may not be fully sequenced with current technology. Instead, sequencing-analysis strategies are being developed to obtain sequencing and mapping information selectively for the genic fraction (gene space) of complex plant genomes.

  10. Genome classification by gene distribution: An overlapping subspace clustering approach

    Directory of Open Access Journals (Sweden)

    Halgamuge Saman K

    2008-04-01

    Background: Large numbers of horizontal gene transfers have been observed in the genomes of lower organisms, which causes difficulties in their evolutionary study. Bacteriophage genomes are a typical example. One recent approach that addresses this problem is the unsupervised clustering of genomes based on gene order and genome position, which helps to reveal species relationships that may not be apparent from traditional phylogenetic methods. Results: We propose the use of an overlapping subspace clustering algorithm for such genome classification problems. The advantage of subspace clustering over traditional clustering is that it can associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced. Additionally, overlapping capability is desirable for the discovery of multiple conserved patterns within a single genome, such as those acquired from different species via horizontal gene transfers. The proposed method involves a novel strategy to vectorize genomes based on their gene distribution. A number of existing subspace clustering and biclustering algorithms were evaluated to identify the best framework upon which to develop our algorithm; we extended a generic subspace clustering algorithm called HARP to incorporate overlapping capability. The proposed algorithm was assessed and applied on bacteriophage genomes. The phage grouping results are consistent overall with the Phage Proteomic Tree and showed common genomic characteristics among the TP901-like, Sfi21-like and sk1-like phage groups. Among 441 phage genomes, we identified four significantly conserved distribution patterns structured by the terminase, portal, integrase, holin and lysin genes. We also observed a subgroup of Sfi21-like phages comprising a distinctive divergent genome organization and identified nine new phage members to the Sfi21-like genus: Staphylococcus 71, phiPVL108, Listeria A118, 2389, Lactobacillus phi AT3, A2

  11. Maximum likelihood for genome phylogeny on gene content.

    Science.gov (United States)

    Zhang, Hongmei; Gu, Xun

    2004-01-01

    With the rapid growth of whole-genome data, reconstructing the phylogenetic relationships among different genomes has become a hot topic in comparative genomics. The maximum likelihood approach is one of several available approaches and has been very successful. However, there is no reported study applying it to genome tree-making, mainly because of the lack of an analytical form of a probability model and/or the heavy computational burden. In this paper we studied the mathematical structure of the stochastic model of genome evolution, and then developed a simplified likelihood function for observing a specific phylogenetic pattern in the four-genome case using gene content information. We use the maximum likelihood approach to identify phylogenetic trees. Simulation results indicate that the proposed method works well and identifies the correct tree at a high rate. Application to real data gives satisfactory results. The approach developed in this paper can serve as the basis for reconstructing phylogenies of more than four genomes.

  12. Gene conversion in the rice genome

    DEFF Research Database (Denmark)

    Xu, Shuqing; Clark, Terry; Zheng, Hongkun;

    2008-01-01

    BACKGROUND: Gene conversion causes a non-reciprocal transfer of genetic information between similar sequences. Gene conversion can both homogenize genes and recruit point mutations thereby shaping the evolution of multigene families. In the rice genome, the large number of duplicated genes...... is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication on gene conversion statistics, we determined locations of conversion partners with respect to inter-chromosomal segment duplication. The number of conversions associated with segmentation is less...

  13. Toward Elucidating the Structure of Tetraploid Cotton Genome

    Institute of Scientific and Technical Information of China (English)

    GUO Wang-zhen

    2008-01-01

    Upland cotton has the highest yield, and accounts for >95% of world cotton production. Decoding upland cotton genomes will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC tagging information approaches to predict the general composition and structure of the allotetraploid cotton genome.

  14. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication, and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and their transcriptomes have been sequenced as well. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homologous peptides and transcript ORFs. See Methods for detail. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  15. Using Genomics for Natural Product Structure Elucidation.

    Science.gov (United States)

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  16. Genes but not genomes reveal bacterial domestication of Lactococcus lactis.

    Directory of Open Access Journals (Sweden)

    Delphine Passerini

    BACKGROUND: The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). METHODOLOGY/PRINCIPAL FINDINGS: The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. CONCLUSION/SIGNIFICANCE: The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between "environmental" strains, the main contributors to the genetic diversity within the subspecies, and "domesticated" strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the "domesticated" strains essentially arose through substantial genomic flux within the dispensable

  17. Genomic evidence for adaptation by gene duplication.

    Science.gov (United States)

    Qian, Wenfeng; Zhang, Jianzhi

    2014-08-01

    Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation.

  18. hSmad5 gene, a human hSmad family member: its full length cDNA, genomic structure, promoter region and mutation analysis in human tumors.

    Science.gov (United States)

    Gemma, A; Hagiwara, K; Vincent, F; Ke, Y; Hancock, A R; Nagashima, M; Bennett, W P; Harris, C C

    1998-02-19

    hSmad (mothers against decapentaplegic)-related proteins are important messengers within the Transforming Growth Factor-beta1 (TGF-beta1) superfamily signal transduction pathways. To further characterize a member of this family, we obtained a full length cDNA of the human hSmad5 (hSmad5) gene by rapid amplification of cDNA ends (RACE) and then determined the genomic structure of the gene. There are eight exons and two alternative transcripts; the shorter transcript lacks exon 2. We identified the hSmad5 promoter region from a human genomic YAC clone by obtaining the nucleotide sequence extending 1235 base pairs upstream of the 5' end of the cDNA. We found a CpG island consistent with a promoter region, and we demonstrated promoter activity in a 1232 bp fragment located upstream of the transcription initiation site. To investigate the frequency of somatic hSmad5 mutations in human cancers, we designed intron-based primers to examine coding regions by polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) analysis. Neither homozygous deletions nor point mutations were found in 40 primary gastric tumors and 51 cell lines derived from diverse types of human cancer, including 20 cell lines resistant to the growth inhibitory effects of TGF-beta1. These results suggest that the hSmad5 gene is not commonly mutated and that other genetic alterations mediate the loss of TGF-beta1 responsiveness in human cancers.

  19. Gene finding in the chicken genome

    Directory of Open Access Journals (Sweden)

    Antonarakis Stylianos E

    2005-05-01

    Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. Results: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.

  20. Genomic disorders: A window into human gene and genome evolution

    Science.gov (United States)

    Carvalho, Claudia M. B.; Zhang, Feng; Lupski, James R.

    2010-01-01

    Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes. In this context, the study of genomic disorders has uncovered the essential role played by the genomic architecture, especially low copy repeats (LCRs) or segmental duplications (SDs). In fact, regardless of the mechanism, LCRs can mediate or stimulate rearrangements, inciting genomic instability and generating dynamic and unstable regions prone to rapid molecular evolution. In humans, copy-number variation (CNV) has been implicated in common traits such as neuropathy, hypertension, color blindness, infertility, and behavioral traits including autism and schizophrenia, as well as disease susceptibility to HIV, lupus nephritis, and psoriasis among many other clinical phenotypes. The same mechanisms implicated in the origin of genomic disorders may also play a role in the emergence of segmental duplications and the evolution of new genes by means of genomic and gene duplication and triplication, exon shuffling, exon accretion, and fusion/fission events. PMID:20080665

  1. Reproduction-related genes in the pearl oyster genome.

    Science.gov (United States)

    Matsumoto, Toshie; Masaoka, Tetsuji; Fujiwara, Atsushi; Nakamura, Yoji; Satoh, Nori; Awaji, Masahiko

    2013-10-01

    Molluscan reproduction has been a target of biological research because of the various reproductive strategies that have evolved in this phylum. It has also been studied for the development of fisheries technologies, particularly aquaculture. Although fundamental processes of reproduction in other phyla, such as vertebrates and arthropods, have been well studied, information on the molecular mechanisms of molluscan reproduction remains limited. The recently released draft genome of the pearl oyster Pinctada fucata provides a novel and powerful platform for obtaining structural information on the genes and proteins involved in bivalve reproduction. In the present study, we analyzed the pearl oyster draft genome to screen reproduction-related genes. Analysis was mainly conducted for genes reported from other molluscs for encoding orthologs of reproduction-related proteins in other phyla. Gene searches in the P. fucata gene models (version 1.1) and genome assembly (version 1.0) were performed using Genome Browser and BLAST software. The obtained gene models were then BLASTP searched against a public database to confirm the best-hit sequences. As a result, more than 40 gene models were identified with high accuracy to encode reproduction-related genes reported for P. fucata and other molluscs. These include vasa, nanos, doublesex- and mab-3-related transcription factor, 5-hydroxytryptamine (5-HT) receptors, vitellogenin, estrogen receptor, and others. The set of reproduction-related genes of P. fucata identified in the present study constitutes a new tool for research on bivalve reproduction at the molecular level.

  2. Correlation of microsynteny conservation and disease gene distribution in mammalian genomes

    Directory of Open Access Journals (Sweden)

    Li Xiting

    2009-11-01

    Background: With the completion of the whole genome sequence for many organisms, investigations into genomic structure have revealed that gene distribution is variable, and that genes with similar function or expression are located within clusters. This clustering suggests that there are evolutionary constraints that determine genome architecture. However, as most of the evidence for constraints on genome evolution comes from studies on yeast, it is unclear how much of this prior work can be extrapolated to mammalian genomes. Therefore, in this work we wished to examine the constraints on regions of the mammalian genome containing conserved gene clusters. Results: We first identified regions of the mouse genome with microsynteny conservation by comparing gene arrangement in the mouse genome to the human, rat, and dog genomes. We then asked if any particular gene types were found preferentially in conserved regions. We found a significant correlation between conserved microsynteny and the density of mouse orthologs of human disease genes, suggesting that disease genes are clustered in genomic regions of increased microsynteny conservation. Conclusion: The correlation between microsynteny conservation and disease gene locations indicates that regions of the mouse genome with microsynteny conservation may contain undiscovered human disease genes. This study not only demonstrates that gene function constrains mammalian genome organization, but also identifies regions of the mouse genome that can be experimentally examined to produce mouse models of human disease.

  3. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    Directory of Open Access Journals (Sweden)

    Brunham Robert C

    2004-07-01

    Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics.
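
    The abstract above does not give the functional form it fitted. As a rough illustration only, the sketch below assumes the standard two-parameter Weibull reliability function R(x) = exp(-(x/eta)^beta), with beta taken to be the usual shape parameter, and recovers beta and eta by a straight-line fit on the Weibull plot. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def weibull_reliability(x, beta, eta):
    """Two-parameter Weibull reliability: R(x) = exp(-(x / eta) ** beta)."""
    return math.exp(-((x / eta) ** beta))

def fit_weibull(xs, rs):
    """Least-squares fit on the Weibull plot.

    Taking logs twice gives a straight line:
        ln(-ln R) = beta * ln x - beta * ln eta
    so the slope is beta and the intercept is -beta * ln eta.
    """
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(r)) for r in rs]
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    slope = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / sum(
        (a - mean_u) ** 2 for a in u
    )
    beta = slope
    eta = math.exp(mean_u - mean_v / beta)
    return beta, eta

# Hypothetical, noise-free data generated from beta = 1.5, eta = 10.
xs = [1.0, 2.0, 5.0, 10.0, 20.0]
rs = [weibull_reliability(x, 1.5, 10.0) for x in xs]
beta_hat, eta_hat = fit_weibull(xs, rs)
print(round(beta_hat, 6), round(eta_hat, 6))
```

    With real fold-occurrence counts the points would scatter around the line, and comparing the residuals with those of a power-law fit would be one way to approach the comparison the abstract reports.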

  4. The mitochondrial genome of Xiphinema americanum sensu stricto (Nematoda: Enoplea): considerable economization in the length and structural features of encoded genes.

    Science.gov (United States)

    He, Y; Jones, J; Armstrong, M; Lamberti, F; Moens, M

    2005-12-01

    The complete sequence of the mitochondrial genome of the plant parasitic nematode Xiphinema americanum sensu stricto has been determined. At 12,626 bp it is the smallest metazoan mitochondrial genome reported to date. Genes are transcribed from both strands. Genes coding for 12 proteins, 2 rRNAs and 17 putative tRNAs (with the tRNA-C, I, N, S1, S2 missing) are predicted from the sequence. The arrangement of genes within the X. americanum mitochondrial genome is unique and includes gene overlaps. Comparisons with the mtDNA of other nematodes show that the small size of the X. americanum mtDNA is due to a combination of factors. The two mitochondrial rRNA genes are considerably smaller than those of other nematodes, with most of the protein encoding and tRNA genes also slightly smaller. In addition, five tRNA genes are absent, lengthy noncoding regions are not present in the mtDNA, and several gene overlaps are present.

  5. Genome-wide SNPs and re-sequencing of growth habit and inflorescence genes in barley: implications for association mapping in germplasm arrays varying in size and structure

    Directory of Open Access Journals (Sweden)

    Muehlbauer Gary J

    2010-12-01

    then conducted association analyses - with SNP data only - in the larger germplasm arrays. For both vernalization sensitivity and inflorescence type, the most significant associations in the larger data sets were found with SNPs coincident with the synthetic markers used in the CAP Core and with SNPs detected via interaction analysis in the CAP Core. Conclusions Small and highly structured collections of germplasm, such as the CAP Core, are cost-effectively phenotyped and genotyped with high-throughput markers. They are also useful for characterizing allelic diversity at loci in germplasm of interest. Our results suggest that discovery-oriented exercises in AM in such small arrays may generate a large number of false-positives. However, if haplotypes in candidate genes are available, they may be used as anchors in an analysis of interactions to identify other candidate regions harboring genes determining target traits. Using larger germplasm arrays, genome regions where the principal genes determining vernalization sensitivity and row type are located were identified.

  6. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    cells are capable of regulating their gene expression, so that each cell can only express a particular set of genes yielding limited numbers of proteins with specialized functions. Therefore a rigid control of differential gene expression is necessary for cellular diversity. On the other hand, aberrant...... gene regulation will disrupt the cell’s fundamental processes, which in turn can cause disease. Hence, understanding gene regulation is essential for deciphering the code of life. Along with the development of high throughput sequencing (HTS) technology and the subsequent large-scale data analysis......, genome-wide assays have increased our understanding of gene regulation significantly. This thesis describes the integration and analysis of HTS data across different important aspects of gene regulation. Gene expression can be regulated at different stages when the genetic information is passed from gene...

  7. Structural Genomics of Minimal Organisms: Pipeline and Results

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, covering target selection, cloning, expression, purification, and ultimately structure determination. At the time of this writing, structural information for more than 93 percent of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  8. 2004 Structural, Function and Evolutionary Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  9. From trees to the forest: genes to genomics.

    Science.gov (United States)

    Mullighan, Charles; Petersdorf, Effie; Davies, Stella M; DiPersio, John

    2011-01-01

    Crick, Watson, and colleagues revealed the genetic code in 1953, and since that time, remarkable progress has been made in understanding what makes each of us who we are. Identification of single genes important in disease, and the development of a mechanistic understanding of genetic elements that regulate gene function, have cast light on the pathophysiology of many heritable and acquired disorders. In 1990, the human genome project commenced, with the goal of sequencing the entire human genome, and a "first draft" was published with astonishing speed in 2001. The first draft, although an extraordinary achievement, reported essentially an imaginary haploid mix of alleles rather than a true diploid genome. In the years since 2001, technology has further improved, and efforts have been focused on filling in the gaps in the initial genome and starting the huge task of looking at normal variation in the human genome. This work is the beginning of understanding human genetics in the context of the structure of the genome as a complete entity, and as more than simply the sum of a series of genes. We present 3 studies in this review that apply genomic approaches to leukemia and to transplantation to improve and extend therapies.

  10. Genome editing for human gene therapy.

    Science.gov (United States)

    Meissner, Torsten B; Mandal, Pankaj K; Ferreira, Leonardo M R; Rossi, Derrick J; Cowan, Chad A

    2014-01-01

    The rapid advancement of genome-editing techniques holds much promise for the field of human gene therapy. From bacteria to model organisms and human cells, genome editing tools such as zinc-finger nucleases (ZFNs), TALENs, and CRISPR/Cas9 have been successfully used to manipulate the respective genomes with unprecedented precision. With regard to human gene therapy, it is of great interest to test the feasibility of genome editing in primary human hematopoietic cells that could potentially be used to treat a variety of human genetic disorders such as hemoglobinopathies, primary immunodeficiencies, and cancer. In this chapter, we explore the use of the CRISPR/Cas9 system for the efficient ablation of genes in two clinically relevant primary human cell types, CD4+ T cells and CD34+ hematopoietic stem and progenitor cells. By using two guide RNAs directed at a single locus, we achieve highly efficient and predictable deletions that ablate gene function. The use of a Cas9-2A-GFP fusion protein allows FACS-based enrichment of the transfected cells. The ease of designing, constructing, and testing guide RNAs makes this dual guide strategy an attractive approach for the efficient deletion of clinically relevant genes in primary human hematopoietic stem and effector cells and enables the use of CRISPR/Cas9 for gene therapy.

  11. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    Directory of Open Access Journals (Sweden)

    Nobutaka Nakashima

    2014-02-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genomes. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features of and recent advances in methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering.

  12. Gene discovery in the Entamoeba invadens genome.

    Science.gov (United States)

    Wang, Zheng; Samuelson, John; Clark, C Graham; Eichinger, Daniel; Paul, Jaishree; Van Dellen, Katrina; Hall, Neil; Anderson, Iain; Loftus, Brendan

    2003-06-01

    Entamoeba invadens, a parasite of reptiles, is a model for the study of encystation by the human enteric pathogen Entamoeba histolytica, because E. invadens form cysts in axenic culture. With approximately 0.5-fold sequence coverage of the genome, we were able to get insights into E. invadens gene and genome features. Overall, the E. invadens genome displays many of the features that are emerging from ongoing genome sequencing efforts in E. histolytica. At the nucleotide level the E. invadens genome has on average 60% sequence identity with that of E. histolytica. The presence of introns in E. invadens was predicted with similar consensus (GTTTGT...A/TAG) sequences to those identified in E. histolytica and Entamoeba dispar. Sequences highly repeated in the genome of E. histolytica (rRNAs, tRNAs, CXXC-rich proteins, and Leu-rich repeat proteins) were found to be highly repeated in the E. invadens genome. Numerous proteins homologous to those implicated in amoebic virulence (Gal/GalNAc lectins, amoebapores, and cysteine proteinases) and drug resistance (p-glycoproteins) were identified. Homologs of proteins involved in cell cycle, vesicular trafficking and signal transduction were identified, which may be involved in en/excystation and cell growth of E. invadens. Finally, multiple copies of a number of E. invadens genes coding for predicted enzymes involved in core metabolism and the targets of anti-amoebic drugs were identified.

  13. Gene Expression in Chicken Reveals Correlation with Structural Genomic Features and Conserved Patterns of Transcription in the Terrestrial Vertebrates

    NARCIS (Netherlands)

    Nie, H.; Crooijmans, R.P.M.A.; Lammers, A.; Schothorst, van E.M.; Keijer, J.; Neerincx, P.; Leunissen, J.A.M.; Megens, H.J.W.C.; Groenen, M.A.M.

    2010-01-01

    Background - The chicken is an important agricultural and avian-model species. A survey of gene expression in a range of different tissues will provide a benchmark for understanding expression levels under normal physiological conditions in birds. With expression data for birds being very scant, thi

  14. Stem-loop structures in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Boccia Angelo

    2006-07-01

    Background: Prediction of secondary structures in the expressed sequences of bacterial genomes allows investigation of the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops significantly contribute to the formation of more complex secondary structures, often important for the activity of sequence elements controlling gene expression. Results: Systematic analysis of the distribution of stem-loop structures (SLSs) in 40 wholly-sequenced bacterial genomes is presented. SLSs were searched as stems measuring at least 12 bp, bordering loops 5 to 100 nt in length. G-U pairing in the stems was allowed. SLSs found in natural genomes are consistently more numerous and stable than those expected to randomly form in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions but enrichment of specific, non-random SLS sub-populations of higher stability was observed within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature at the 5'-end an additional A-rich stretch complementary to the 3' uridines. In all species, a clearly biased SLS distribution was observed within the intergenic space, with most concentrating at the 3'-end side of flanking CDSs. Some intergenic SLS regions are members of novel repeated sequence families. Conclusion: In-depth analysis of SLS features and distribution in 40 different bacterial genomes showed the presence of non-random populations of such structures in all species. Many of these structures are plausibly transcribed, and might be involved in the control of transcription termination, or might serve as RNA elements which can enhance either the stability or

  15. Stem-loop structures in prokaryotic genomes

    Science.gov (United States)

    Petrillo, Mauro; Silvestro, Giustina; Di Nocera, Pier Paolo; Boccia, Angelo; Paolella, Giovanni

    2006-01-01

    Background: Prediction of secondary structures in the expressed sequences of bacterial genomes allows investigation of the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops significantly contribute to the formation of more complex secondary structures, often important for the activity of sequence elements controlling gene expression. Results: Systematic analysis of the distribution of stem-loop structures (SLSs) in 40 wholly-sequenced bacterial genomes is presented. SLSs were searched as stems measuring at least 12 bp, bordering loops 5 to 100 nt in length. G-U pairing in the stems was allowed. SLSs found in natural genomes are consistently more numerous and stable than those expected to randomly form in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions but enrichment of specific, non-random SLS sub-populations of higher stability was observed within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature at the 5'-end an additional A-rich stretch complementary to the 3' uridines. In all species, a clearly biased SLS distribution was observed within the intergenic space, with most concentrating at the 3'-end side of flanking CDSs. Some intergenic SLS regions are members of novel repeated sequence families. Conclusion: In-depth analysis of SLS features and distribution in 40 different bacterial genomes showed the presence of non-random populations of such structures in all species. Many of these structures are plausibly transcribed, and might be involved in the control of transcription termination, or might serve as RNA elements which can enhance either the stability or the turnover of cotranscribed
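
    The search criteria quoted in this abstract (stems of at least 12 bp, loops of 5 to 100 nt, G-U pairing allowed) can be illustrated with a small brute-force scan. This is only a sketch under simplifying assumptions: it considers perfectly paired stems without bulges or mismatches, does no stability scoring, and the function name find_stem_loops is hypothetical. The abstract does not describe the authors' actual search procedure.

```python
# Watson-Crick pairs plus the G-U wobble pair allowed in the abstract.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_stem_loops(seq, min_stem=12, min_loop=5, max_loop=100):
    """Return (stem_start, stem_len, loop_len) tuples for perfect stems.

    For every candidate loop, the stem is grown outward from the two bases
    flanking the loop for as long as they keep pairing.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(rna)
    hits = []
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # innermost base of the 5' arm
            right = loop_start + loop_len  # innermost base of the 3' arm
            if right >= n:
                break
            stem = 0
            while left >= 0 and right < n and (rna[left], rna[right]) in PAIRS:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, stem, loop_len))
    return hits

# Toy sequence: a 12-nt G arm, a 5-nt A loop, and a 12-nt C arm.
print(find_stem_loops("G" * 12 + "AAAAA" + "C" * 12))  # [(0, 12, 5)]
```

    The scan is quadratic-ish in sequence length and is meant for short test sequences; a genome-scale survey would need a faster method and a thermodynamic stability filter.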

  16. Regulation of methane genes and genome expression

    Energy Technology Data Exchange (ETDEWEB)

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, that encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  18. Pseudomonas aeruginosa genomic structure and diversity

    Directory of Open Access Journals (Sweden)

    Jens Klockgether

    2011-07-01

    The Pseudomonas aeruginosa genome (G + C content 65-67%, size 5.5 – 7 Mbp) is made up of a single circular chromosome and a variable number of plasmids. Sequencing of complete genomes or blocks of the accessory genome has revealed that the genome encodes a large repertoire of transporters, transcriptional regulators and two-component regulatory systems, which reflects its metabolic versatility in utilizing a broad range of nutrients. The conserved core component of the genome is largely collinear among P. aeruginosa strains and exhibits an interclonal sequence diversity of 0.5 – 0.7%. Only a few loci of the core genome are subject to diversifying selection. Genome diversity is mainly caused by accessory DNA elements located in 79 regions of genome plasticity that are scattered around the genome and show an anomalous usage of mono- to tetradecanucleotides. Genomic islands of the pKLC102/PAGI-2 family that integrate into tRNALys or tRNAGly genes represent hotspots of inter- and intraclonal genomic diversity. The individual islands differ in their repertoire of metabolic genes that make a large contribution to the pangenome. In order to unravel intraclonal diversity of P. aeruginosa, the genomes of two members of the PA14 clonal complex from diverse habitats and geographic origin were compared. The genome sequences differed by less than 0.01% from each other. 198 of the 231 SNPs were non-randomly distributed in the genome. Non-synonymous SNPs were mainly found in an integrated Pf1-like phage and in genes involved in transcriptional regulation, membrane and extracellular constituents, transport and secretion. In summary, P. aeruginosa is endowed with a highly conserved core genome of low sequence diversity and a highly variable accessory genome that communicates with other pseudomonads and genera via horizontal gene transfer.

  19. Tandemly Arrayed Genes in Vertebrate Genomes

    Directory of Open Access Journals (Sweden)

    Deng Pan

    2008-01-01

    Tandemly arrayed genes (TAGs) are duplicated genes that are linked as neighbors on a chromosome, many of which have important physiological and biochemical functions. Here we performed a survey of these genes in 11 available vertebrate genomes. TAGs account for an average of about 14% of all genes in these vertebrate genomes, and about 25% of all duplications. The majority of TAGs (72–94%) have parallel transcription orientation (i.e., they are encoded on the same strand), in contrast to the genome as a whole, which has about 50% of its genes in parallel transcription orientation. The majority of tandem arrays have only two members. In all species, the proportion of genes that belong to TAGs tends to be higher in large gene families than in small ones; together with our recent finding that tandem duplication played a more important role than retroposition in large families, this fact suggests that among all types of duplication mechanisms, tandem duplication is the predominant mechanism of duplication, especially in large families. Finally, several species have a higher proportion of large, species-specific tandem arrays than expected by chance.
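
    The definition above (duplicated genes that sit next to each other on a chromosome, with orientation compared between neighbors) can be sketched in a few lines. The sketch assumes strictly adjacent family members, a gene list already sorted by chromosomal position, and family labels assigned beforehand, with singleton genes carrying unique labels. The abstract does not state how many intervening genes the survey tolerated, and the gene names, family labels, and function names here are hypothetical.

```python
from itertools import groupby

def tandem_arrays(genes):
    """Group consecutive genes of the same family into tandem arrays.

    genes: list of (name, family, strand) tuples in chromosomal order
           for a single chromosome.
    Returns a list of arrays, each a list of two or more gene tuples.
    """
    arrays = []
    for _family, run in groupby(genes, key=lambda g: g[1]):
        run = list(run)
        if len(run) >= 2:  # a lone family member is not an array
            arrays.append(run)
    return arrays

def parallel_fraction(arrays):
    """Fraction of neighboring pairs within arrays encoded on the same strand."""
    pairs = [(a, b) for arr in arrays for a, b in zip(arr, arr[1:])]
    if not pairs:
        return 0.0
    return sum(a[2] == b[2] for a, b in pairs) / len(pairs)

# Hypothetical chromosome with two tandem arrays (families F1 and F3).
chromosome = [
    ("a", "F1", "+"), ("b", "F1", "+"),
    ("c", "F2", "-"),
    ("d", "F3", "+"), ("e", "F3", "-"), ("f", "F3", "-"),
]
arrays = tandem_arrays(chromosome)
print([[g[0] for g in arr] for arr in arrays])  # [['a', 'b'], ['d', 'e', 'f']]
print(parallel_fraction(arrays))
```

    Allowing a small number of unrelated spacer genes between family members would change the grouping step but not the orientation comparison.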

  20. Chromatin structure regulates gene conversion.

    Directory of Open Access Journals (Sweden)

    W Jason Cummings

    2007-10-01

    Homology-directed repair is a powerful mechanism for maintaining and altering genomic structure. We asked how chromatin structure contributes to the use of homologous sequences as donors for repair using the chicken B cell line DT40 as a model. In DT40, immunoglobulin genes undergo regulated sequence diversification by gene conversion templated by pseudogene donors. We found that the immunoglobulin Vlambda pseudogene array is characterized by histone modifications associated with active chromatin. We directly demonstrated the importance of chromatin structure for gene conversion, using a regulatable experimental system in which the heterochromatin protein HP1 (Drosophila melanogaster Su[var]205), expressed as a fusion to Escherichia coli lactose repressor, is tethered to polymerized lactose operators integrated within the pseudo-Vlambda donor array. Tethered HP1 diminished histone acetylation within the pseudo-Vlambda array, and altered the outcome of Vlambda diversification, so that nontemplated mutations rather than templated mutations predominated. Thus, chromatin structure regulates homology-directed repair. These results suggest that histone modifications may contribute to maintaining genomic stability by preventing recombination between repetitive sequences.

  1. Chloroplast genome structure in Ilex (Aquifoliaceae).

    Science.gov (United States)

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T

    2016-07-05

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA and 8 rRNA. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species.

  2. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome.

    Science.gov (United States)

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-11-28

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.

  3. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae.

    Science.gov (United States)

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-11-07

    Numerous gene rearrangements and transfer RNA gene absences occur in the mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we have sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long, and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene trnI has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of 21 tRNA genes have the conventional cloverleaf-shaped secondary structure and two (trnS₁ and trnS₂) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we have identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most apparently absent genes exist in the genomes but have not been identified, owing to limitations in technology and in the sequences available for inference. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains cluster together. These results are identical to the relationships inferred from gene order. We infer that gene rearrangement has played an important role in the evolution of whitefly mt genomes.

  4. The genomic environment around the Aromatase gene: evolutionary insights

    Directory of Open Access Journals (Sweden)

    Reis-Henriques Maria A

    2005-08-01

    Background: The cytochrome P450 aromatase (CYP19) catalyses the aromatisation of androgens to estrogens, a key mechanism in vertebrate reproductive physiology. A current evolutionary hypothesis suggests that the CYP19 gene arose at the origin of vertebrates, given that it has not been found outside this clade. The human CYP19 gene is located in one of the proposed MHC-paralogon regions (HSA15q). At present it is unclear whether this genomic location is ancestral (which would suggest an invertebrate origin for CYP19) or derived (genomic location with no evolutionary meaning). The distinction between these possibilities should help to clarify the timing of the CYP19 emergence and which taxa should be investigated. Results: Here we determine the "genomic environment" around CYP19 in three vertebrate species: Homo sapiens, Tetraodon nigroviridis and Xenopus tropicalis. Paralogy studies and phylogenetic analysis of six gene families suggest that the CYP19 gene region was structured through "en bloc" genomic duplication (as part of the MHC-paralogon formation). Four gene families have specifically duplicated in the vertebrate lineage. Moreover, the mapping location of the different paralogues is consistent with a model of "en bloc" duplication. Furthermore, we also determine that this region has retained the same gene content since the divergence of Actinopterygii and Tetrapods. A single inversion in gene order has taken place, probably in the mammalian lineage. Finally, we describe the first invertebrate CYP19 sequence, from Branchiostoma floridae. Conclusion: Contrary to previous suggestions, our data indicate an invertebrate origin for the aromatase gene, given the striking conservation pattern in both gene order and gene content, and the presence of aromatase in amphioxus. We propose that CYP19 duplicated in the vertebrate lineage to yield four paralogues, followed by the subsequent loss of all but one gene in vertebrate evolution. Finally, we

  5. Genomics of the human carnitine acyltransferase genes

    NARCIS (Netherlands)

    van der Leij, FR; Huijkman, NCA; Boomsma, C; Kuipers, JRG; Bartelds, B

    2000-01-01

    Five genes in the human genome are known to encode different active forms of related carnitine acyltransferases: CPT1A for liver-type carnitine palmitoyltransferase I, CPT1B for muscle-type carnitine palmitoyltransferase I, CPT2 for carnitine palmitoyltransferase II, CROT for carnitine octanoyltrans

  6. Genomic organization of the CC chemokine mip-3alpha/CCL20/larc/exodus/SCYA20, showing gene structure, splice variants, and chromosome localization.

    Science.gov (United States)

    Nelson, R T; Boyd, J; Gladue, R P; Paradis, T; Thomas, R; Cunningham, A C; Lira, P; Brissette, W H; Hayes, L; Hames, L M; Neote, K S; McColl, S R

    2001-04-01

    We describe the genomic organization of a recently identified CC chemokine, MIP-3alpha/CCL20 (HGMW-approved symbol SCYA20). The MIP-3alpha/CCL20 gene was cloned and sequenced, revealing a four-exon, three-intron structure, and was localized by FISH analysis to 2q35-q36. Two distinct cDNAs were identified, encoding two forms of MIP-3alpha/CCL20, Ala MIP-3alpha/CCL20 and Ser MIP-3alpha/CCL20, that differ by one amino acid at the predicted signal peptide cleavage site. Examination of the sequence around the boundary of intron 1 and exon 2 showed that use of alternative splice acceptor sites could give rise to Ala MIP-3alpha/CCL20 or Ser MIP-3alpha/CCL20. Both forms of MIP-3alpha/CCL20 were chemically synthesized and tested for biological activity. Both flu antigen plus IL-2-activated CD4(+) and CD8(+) T lymphoblasts and cord blood-derived dendritic cells responded to Ser and Ala MIP-3alpha/CCL20. T lymphocytes exposed only to IL-2 responded inconsistently, while no response was detected in naive T lymphocytes, monocytes, or neutrophils. The biological activity of Ser MIP-3alpha/CCL20 and Ala MIP-3alpha/CCL20 and the tissue-specific preference of different splice acceptor sites are not yet known.

  7. Structural and functional analysis of rice genome

    Indian Academy of Sciences (India)

    Akhilesh K. Tyagi; Jitendra P. Khurana; Paramjit Khurana; Saurabh Raghuvanshi; Anupama Gaur; Anita Kapur; Vikrant Gupta; Dibyendu Kumar; V. Ravi; Shubha Vij; Parul Khurana; Sulabha Sharma

    2004-04-01

    Rice is an excellent system for plant genomics as it represents a modest size genome of 430 Mb. It feeds more than half the population of the world. Draft sequences of the rice genome, derived by whole-genome shotgun approach at relatively low coverage (4–6 X), were published and the International Rice Genome Sequencing Project (IRGSP) declared high quality (>10 X), genetically anchored, phase 2 level sequence in 2002. In addition, phase 3 level finished sequence of chromosomes 1, 4 and 10 (out of 12 chromosomes of rice) has already been reported by scientists from IRGSP consortium. Various estimates of genes in rice place the number at > 50,000. Already, over 28,000 full-length cDNAs have been sequenced, most of which map to genetically anchored genome sequence. Such information is very useful in revealing novel features of macro- and micro-level synteny of rice genome with other cereals. Microarray analysis is unraveling the identity of rice genes expressing in temporal and spatial manner and should help target candidate genes useful for improving traits of agronomic importance. Simultaneously, functional analysis of rice genome has been initiated by marker-based characterization of useful genes and employing functional knock-outs created by mutation or gene tagging. Integration of this enormous information is expected to catalyze tremendous activity on basic and applied aspects of rice genomics.

  8. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave prediction accuracy similar to that of the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm
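
    The TRN20-TST80 strategy described in this abstract (a random split into 20% training and 80% testing accessions) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the rounding rule for the training-set size is an assumption, and measuring prediction accuracy as the Pearson correlation between predicted and observed values is a common convention in genomic prediction that the abstract itself does not confirm.

```python
import random


def trn20_tst80_split(n_accessions, seed=0):
    """Randomly assign accession indices to a 20% training and 80% testing set.

    Hypothetical helper: the rounding of the training-set size is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(n_accessions))
    rng.shuffle(indices)
    n_train = round(0.2 * n_accessions)
    return indices[:n_train], indices[n_train:]


def pearson(xs, ys):
    """Pearson correlation, used here as an assumed measure of prediction accuracy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Collection size taken from the abstract (8416 Mexican accessions).
train_idx, test_idx = trn20_tst80_split(8416)
print(len(train_idx), len(test_idx))
```

    In a full analysis, a prediction model would be fitted on the training indices and `pearson` applied to predicted versus observed trait values on the testing indices, typically averaged over many random splits.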

  9. Evolutionary genomics of LysM genes in land plants

    Directory of Open Access Journals (Sweden)

    Stacey Gary

    2009-08-01

    Background: The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. Results: We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. Conclusion: We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given the fact that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.

  10. Evolutionary genomics of LysM genes in land plants.

    Science.gov (United States)

    Zhang, Xue-Cheng; Cannon, Steven B; Stacey, Gary

    2009-08-03

    The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given the fact that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.

  11. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a common limitation to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from the conventional unidimensional functional profiling methods. We report sound results of gene set associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.
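
    One common way to write a logistic model with two genome-scale measurements and their interaction is shown below. It is offered only as an illustration of what "multidimensional" could mean here; the abstract does not give the model's form, so the symbols (g for a gene, M for a gene module, x_{1g} and x_{2g} for two measurements on gene g, and the beta coefficients) are all assumptions.

```latex
\operatorname{logit} \Pr(g \in M)
  = \beta_0 + \beta_1 x_{1g} + \beta_2 x_{2g} + \beta_{12}\, x_{1g} x_{2g}
```

    Under this reading, a conventional one-dimensional analysis corresponds to keeping only one of the terms \beta_1 x_{1g} or \beta_2 x_{2g}, while a non-zero \beta_{12} captures an association that depends on both variables jointly.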

  12. Floral gene resources from basal angiosperms for comparative genomics research

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohong

    2005-03-01

    Background: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage

  13. Floral gene resources from basal angiosperms for comparative genomics research

    Science.gov (United States)

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  14. An enigmatic fourth runt domain gene in the fugu genome: ancestral gene loss versus accelerated evolution

    Directory of Open Access Journals (Sweden)

    Hood Leroy

    2004-11-01

    Background: The runt domain transcription factors are key regulators of developmental processes in bilaterians, involved both in cell proliferation and differentiation, and their disruption usually leads to disease. Three runt domain genes have been described in each vertebrate genome (the RUNX gene family), but only one in other chordates. Therefore, the common ancestor of vertebrates has been thought to have had a single runt domain gene. Results: Analysis of the genome draft of the fugu pufferfish (Takifugu rubripes) reveals the existence of a fourth runt domain gene, FrRUNT, in addition to the orthologs of human RUNX1, RUNX2 and RUNX3. The tiny FrRUNT packs six exons and two putative promoters in just 3 kb of genomic sequence. The first exon is located within an intron of FrSUPT3H, the ortholog of human SUPT3H, and the first exon of FrSUPT3H resides within the first intron of FrRUNT. The two gene structures are therefore "interlocked". In the human genome, SUPT3H is instead interlocked with RUNX2. FrRUNT has no detectable ortholog in the genomes of mammals, birds or amphibians. We consider alternative explanations for an apparent contradiction between the phylogenetic data and the comparison of the genomic neighborhoods of human and fugu runt domain genes. We hypothesize that an ancient RUNT locus was lost in the tetrapod lineage, together with FrFSTL6, a member of a novel family of follistatin-like genes. Conclusions: Our results suggest that the runt domain family may have started expanding in chordates much earlier than previously thought, and exemplify the importance of detailed analysis of whole-genome draft sequence to provide new insights into gene evolution.

  15. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    Science.gov (United States)

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  16. The genome BLASTatlas-a GeneWiz extension for visualization of whole-genome homology.

    Science.gov (United States)

    Hallin, Peter F; Binnewies, Tim T; Ussery, David W

    2008-05-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context of regions. Additional information can be added to these plots, and as an example we have added circles showing the probability of the DNA helix opening up under superhelical tension. The tool is SOAP compliant and WSDL (web services description language) files are located on our website: (http://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence enabling automation of repeated tasks. This tool can be relevant in many pangenomic as well as in metagenomic studies, by giving a quick overview of clusters of insertion sites, genomic islands and overall homology between a reference sequence and a data set.

  17. The evolution of chloroplast genes and genomes in ferns.

    Science.gov (United States)

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  18. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome level, effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination of (i) a gene finder score for each gene candidate and (ii) the sequence of the reading frames of gene candidates through a genome. The model, as well as a higher-order variant, is developed and tested ... and are evaluated by the effect on prediction performance. Since bacterial gene finding is to a large extent a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger-scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames ...
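
    The idea of combining per-candidate gene finder scores with the sequence of reading frames can be illustrated with a toy scoring function. The sketch below is not the authors' model: it assumes a simple first-order Markov chain over reading frames, treats gene finder scores as probabilities in (0, 1], and uses a hypothetical function name and made-up probabilities purely for illustration. Only two frame labels are used to keep the example short.

```python
import math


def annotation_log_score(candidates, start_probs, transition_probs):
    """Log-score of one candidate annotation (hypothetical helper).

    candidates: list of (gene_finder_score, reading_frame) pairs ordered by
    genome position, with scores assumed to lie in (0, 1].
    """
    log_score = 0.0
    previous_frame = None
    for score, frame in candidates:
        # Contribution of the gene finder score for this candidate.
        log_score += math.log(score)
        # Contribution of the reading-frame sequence (first-order Markov chain).
        if previous_frame is None:
            log_score += math.log(start_probs[frame])
        else:
            log_score += math.log(transition_probs[previous_frame][frame])
        previous_frame = frame
    return log_score


# Made-up parameters: consecutive genes tend to stay in the same frame.
start = {"+1": 0.5, "-1": 0.5}
trans = {"+1": {"+1": 0.7, "-1": 0.3}, "-1": {"+1": 0.3, "-1": 0.7}}

same_frame = [(0.9, "+1"), (0.8, "+1")]
switched = [(0.9, "+1"), (0.8, "-1")]
print(annotation_log_score(same_frame, start, trans))
print(annotation_log_score(switched, start, trans))
```

    With these illustrative parameters, the annotation that keeps both genes in the same frame scores higher than the one that switches frames, even though the gene finder scores are identical. Choosing the most probable annotation over all subsets of candidates would require a dynamic-programming search, which this sketch leaves out.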

  19. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    Directory of Open Access Journals (Sweden)

    Yunsheng Wang

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  20. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    Science.gov (United States)

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain, and the other two are distinct non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  1. FGF: a web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program...... to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...... is freely available on a web server at http://fgf.genomics.org.cn/...

  3. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    Science.gov (United States)

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-01-01

    Background: The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results: The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion: The observed differences in genomic structure between C. japonica and other land plants, including

  4. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    Directory of Open Access Journals (Sweden)

    Khan Shafiq A

    2003-06-01

    Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences and direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: A total of 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes, and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells.

  5. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    Energy Technology Data Exchange (ETDEWEB)

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5'-end transcripts allowed for a highly precise annotation. The structural annotation comprises numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts inversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  6. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of the Human Genome Project nears its end, work has begun on discovering novel genes from genome sequences and annotating their biological functions. This article reviews the major bioinformatics tools and technologies currently available for large-scale gene discovery and annotation from human genome sequences. Some ideas about possible future developments are also provided.

  7. Isolation, cDNA, and genomic structure of a conserved gene (NOF) at chromosome 11q13 next to FAU and oriented in the opposite transcriptional orientation.

    Science.gov (United States)

    Kas, K; Lemahieu, V; Meyen, E; Van de Ven, W J; Merregaert, J

    1996-06-15

    In our effort to characterize a gene at chromosome 11q13 involved in a t(11;17)(q13;q21) translocation in B-non-Hodgkin lymphoma, we have identified a novel human gene, NOF (Neighbour of FAU). It maps right next to FAU in a head-to-head configuration separated by a maximum of 146 nucleotides. cDNA clones representing NOF hybridized to a 2.2-kb mRNA present in all tissues tested. The largest open reading frame appeared to contain 166 amino acids and is proline rich, and the sequence shows no homology with any known gene in the public databases. The NOF gene consists of 4 exons and 3 introns spanning approximately 5 kb, and the boundaries between exons and introns follow the GT/AG rule. The NOF locus is conserved during evolution, with the predicted protein having over 80% identity to three translated mouse and rat ESTs of unknown function. Moreover, the mouse ESTs map in the same organization, closely linked to the FAU gene, in the mouse genome. NOF, however, is not affected by the t(11;17)(q13;q21) chromosomal translocation.
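
    The GT/AG rule mentioned in this abstract can be checked mechanically once exon coordinates are known. The sketch below is illustrative only: the function names, the 0-based half-open coordinate convention, and the toy sequence are assumptions made for the example, not data or methods from the NOF study.

```python
def introns_from_exons(exons):
    """Return intron intervals lying between consecutive exons.

    `exons` is a list of (start, end) pairs, 0-based and half-open
    (an assumed convention for this sketch).
    """
    ordered = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def follows_gt_ag_rule(intron_seq):
    """True if the intron begins with GT and ends with AG (sense strand)."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def check_gene(genomic_seq, exons):
    """Map each intron interval to whether it obeys the GT/AG rule."""
    return {(start, end): follows_gt_ag_rule(genomic_seq[start:end])
            for start, end in introns_from_exons(exons)}


# Toy example (made-up sequence): three exons separated by two introns.
toy_seq = "ATGGC" + "GTAAGTTTCAG" + "GGATC" + "GTCCCAG" + "TTTAA"
toy_exons = [(0, 5), (16, 21), (28, 33)]
print(check_gene(toy_seq, toy_exons))
```

    A gene with 4 exons, as reported for NOF, would yield 3 intron intervals from `introns_from_exons`; real annotations would also need strand handling, which this sketch leaves out.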

  8. Molecular Characterization of Soybean Pterocarpan 2-Dimethylallyltransferase in Glyceollin Biosynthesis: Local Gene and Whole-Genome Duplications of Prenyltransferase Genes Led to the Structural Diversity of Soybean Prenylated Isoflavonoids.

    Science.gov (United States)

    Yoneyama, Keisuke; Akashi, Tomoyoshi; Aoki, Toshio

    2016-12-01

    Soybean (Glycine max) accumulates several prenylated isoflavonoid phytoalexins, collectively referred to as glyceollins. Glyceollins (I, II, III, IV and V) possess modified pterocarpan skeletons with C5 moieties from dimethylallyl diphosphate, and they are commonly produced from (6aS, 11aS)-3,9,6a-trihydroxypterocarpan [(-)-glycinol]. The metabolic fate of (-)-glycinol is determined by the enzymatic introduction of a dimethylallyl group into C-4 or C-2, which is reportedly catalyzed by regiospecific prenyltransferases (PTs). 4-Dimethylallyl (-)-glycinol and 2-dimethylallyl (-)-glycinol are precursors of glyceollin I and other glyceollins, respectively. Although multiple genes encoding (-)-glycinol biosynthetic enzymes have been identified, those involved in the later steps of glyceollin formation mostly remain unidentified, except for (-)-glycinol 4-dimethylallyltransferase (G4DT), which is involved in glyceollin I biosynthesis. In this study, we identified four genes that encode isoflavonoid PTs, including (-)-glycinol 2-dimethylallyltransferase (G2DT), using homology-based in silico screening and biochemical characterization in yeast expression systems. Transcript analyses illustrated that changes in G2DT gene expression were correlated with the induction of glyceollins II, III, IV and V in elicitor-treated soybean cells and leaves, suggesting its involvement in glyceollin biosynthesis. Moreover, the genomic signatures of these PT genes revealed that G4DT and G2DT are paralogs derived from whole-genome duplications of the soybean genome, whereas other PT genes [isoflavone dimethylallyltransferase 1 (IDT1) and IDT2] were derived via local gene duplication on soybean chromosome 11.

  9. Genome-level identification, gene expression, and comparative analysis of porcine β-defensin genes

    Directory of Open Access Journals (Sweden)

    Choi Min-Kyeung

    2012-11-01

    Background: Beta-defensins (β-defensins) are innate immune peptides with evolutionary conservation across a wide range of species and have been suggested to play important roles in innate immune reactions against pathogens. However, the complete β-defensin repertoire in the pig has not been fully addressed. Results: A BLAST analysis was performed against the available pig genomic sequence in the NCBI database to identify β-defensin-related sequences using previously reported β-defensin sequences of pigs, humans, and cattle. The porcine β-defensin gene clusters were mapped to chromosomes 7, 14, 15 and 17. The gene expression analysis of 17 newly annotated porcine β-defensin genes across 15 tissues using semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) showed differences in their tissue distribution, with the kidney and testis having the largest pBD expression repertoire. We also analyzed single nucleotide polymorphisms (SNPs) in the mature peptide region of pBD genes from 35 pigs of 7 breeds. We found 8 cSNPs in 7 pBDs. Conclusion: We identified 29 porcine β-defensin (pBD) gene-like sequences, including 17 unreported pBDs in the porcine genome. Comparative analysis of β-defensin genes in the pig genome with those in human and cattle genomes showed structural conservation of β-defensin syntenic regions among these species.

  10. Toward Elucidating the Structure of Tetraploid Cotton Genome

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Upland cotton has the highest yield, and accounts for >95% of world cotton production. Decoding upland cotton genomes will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC

  11. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Lukjancenko, Oksana; Rundsten, Carsten Friis

    2012-01-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over...... genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher...... that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important...

  12. Characterization of histone genes isolated from Xenopus laevis and Xenopus tropicalis genomic libraries.

    Science.gov (United States)

    Ruberti, I; Fragapane, P; Pierandrei-Amaldi, P; Beccari, E; Amaldi, F; Bozzoni, I

    1982-12-11

    Using a cDNA clone for the histone H3 we have isolated, from two genomic libraries of Xenopus laevis and Xenopus tropicalis, clones containing four different histone gene clusters. The structural organization of X. laevis histone genes has been determined by restriction mapping, Southern blot hybridization and translation of the mRNAs which hybridize to the various restriction fragments. The arrangement of the histone genes in X. tropicalis has been determined by Southern analysis using X. laevis genomic fragments, containing individual genes, as probes. Histone genes are clustered in the genome of X. laevis and X. tropicalis and, compared to invertebrates, show a higher organization heterogeneity as demonstrated by structural analysis of the four genomic clones. In fact, the order of the genes within individual clusters is not conserved.

  13. Genomic variation in Salmonella enterica core genes for epidemiological typing

    Directory of Open Access Journals (Sweden)

    Leekitcharoenphon Pimlapas

    2012-03-01

    Background: Technological advances in high throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Standardized procedures for identification of relevant genes and of variation are needed to enable comparison between studies and over time. The core genes, the genes that are conserved in all (or most) members of a genus or species, are potentially good candidates for investigating genomic variation in phylogeny and epidemiology. Results: We identify a set of 2,882 core gene clusters based on 73 publicly available Salmonella enterica genomes and evaluate their value as typing targets, comparing whole genome typing and traditional methods such as 16S and MLST. A consensus tree based on variation of core genes gives much better resolution than 16S and MLST; the pan-genome family tree is similar to the consensus tree, but with higher confidence. The core genes can be divided into two categories: a few highly variable genes and a larger set of conserved core genes, with low variance. For the most variable core genes, the variance in amino acid sequences is higher than for the corresponding nucleotide sequences, suggesting that there is a positive selection towards mutations leading to amino acid changes. Conclusions: Genomic variation within the core genome is useful for investigating molecular evolution and providing candidate genes for bacterial genome typing. Identification of genes with different degrees of variation is important especially in trend analysis.
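
    The notion of core genes used in this abstract (gene clusters conserved in all, or most, genomes of a set) can be sketched as a simple presence filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format, the function name `core_gene_clusters`, and the `min_fraction` threshold are hypothetical, and the genome universe is taken to be every genome that appears in at least one cluster.

```python
def core_gene_clusters(presence, min_fraction=1.0):
    """Select gene clusters present in at least `min_fraction` of genomes.

    `presence` maps a cluster id to the set of genome ids that contain
    a member of that cluster (hypothetical input format).
    """
    # Assumed universe: all genomes seen in any cluster.
    genomes = set()
    for members in presence.values():
        genomes |= members
    if not genomes:
        return set()
    return {cluster for cluster, members in presence.items()
            if len(members) / len(genomes) >= min_fraction}


# Toy data: three genomes, three clusters.
toy_presence = {
    "cluster_1": {"genome_a", "genome_b", "genome_c"},
    "cluster_2": {"genome_a", "genome_b"},
    "cluster_3": {"genome_c"},
}
strict_core = core_gene_clusters(toy_presence)        # present in all genomes
relaxed_core = core_gene_clusters(toy_presence, 0.6)  # present in most genomes
```

    Setting `min_fraction` below 1.0 corresponds to the "or most" reading of the definition, which tolerates a gene missing from a few draft assemblies.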

  14. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    Science.gov (United States)

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  15. Elucidation of operon structures across closely related bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Chuan Zhou

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.

  16. Elucidation of operon structures across closely related bacterial genomes.

    Science.gov (United States)

    Zhou, Chuan; Ma, Qin; Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.
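
    The graph construction described in this abstract can be sketched in a few lines. The code below is a minimal illustration, assuming operons are given as ordered gene lists and that a gene-to-OGG mapping is already available; the function names, the input format, and the simple degree-based path/star test are assumptions for the example and are not taken from the paper.

```python
from collections import defaultdict


def build_ogg_graph(operons, gene_to_ogg):
    """Connect two OGGs when their genes are immediate neighbors in an operon.

    `operons` is an iterable of gene-id lists in genomic order, pooled over
    all genomes; `gene_to_ogg` maps gene id -> OGG id (assumed inputs).
    """
    graph = defaultdict(set)
    for operon in operons:
        for left, right in zip(operon, operon[1:]):
            a, b = gene_to_ogg.get(left), gene_to_ogg.get(right)
            if a is None or b is None or a == b:
                continue  # skip unmapped genes and self-loops
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of an undirected graph as sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def classify(component, graph):
    """Label a component as 'path', 'star', 'tree', or 'other' (has a cycle)."""
    n = len(component)
    degrees = [len(graph[node]) for node in component]
    if sum(degrees) // 2 != n - 1:
        return "other"  # a connected graph is a tree iff it has n - 1 edges
    if max(degrees) <= 2:
        return "path"
    if max(degrees) == n - 1:
        return "star"
    return "tree"


# Toy data: one three-gene operon and three two-gene operons sharing OGG "S".
toy_operons = [["g1", "g2", "g3"], ["s1", "p1"], ["s2", "p2"], ["s3", "p3"]]
toy_map = {"g1": "A", "g2": "B", "g3": "C",
           "s1": "S", "s2": "S", "s3": "S",
           "p1": "P1", "p2": "P2", "p3": "P3"}
toy_graph = build_ogg_graph(toy_operons, toy_map)
toy_labels = {frozenset(c): classify(c, toy_graph)
              for c in connected_components(toy_graph)}
```

    OGGs whose genes never sit next to another gene in an operon do not appear as nodes here, so single-gene operons are ignored by this sketch.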

  17. Alu recombination-mediated structural deletions in the chimpanzee genome.

    Directory of Open Access Journals (Sweden)

    Kyudong Han

    2007-10-01

    With more than 1.2 million copies, Alu elements are one of the most important sources of structural variation in primate genomes. Here, we compare the chimpanzee and human genomes to determine the extent of Alu recombination-mediated deletion (ARMD) in the chimpanzee genome since the divergence of the chimpanzee and human lineages (approximately 6 million years ago). Combining computational data analysis and experimental verification, we have identified 663 chimpanzee lineage-specific deletions (involving a total of approximately 771 kb of genomic sequence) attributable to this process. The ARMD events essentially counteract the genomic expansion caused by chimpanzee-specific Alu inserts. The RefSeq databases indicate that 13 exons in six genes, annotated as either demonstrably or putatively functional in the human genome, and 299 intronic regions have been deleted through ARMDs in the chimpanzee lineage. Therefore, our data suggest that this process may contribute to the genomic and phenotypic diversity between chimpanzees and humans. In addition, we found four independent ARMD events at orthologous loci in the gorilla or orangutan genomes. This suggests that human orthologs of loci at which ARMD events have already occurred in other nonhuman primate genomes may be "at-risk" motifs for future deletions, which may subsequently contribute to human lineage-specific genetic rearrangements and disorders.

  18. Putative essential and core-essential genes in Mycoplasma genomes.

    Science.gov (United States)

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first "synthetic life", has been an important genus in the emerging field of synthetic biology. However, essential genes, an important concept of synthetic biology, for both M. mycoides and M. capricolum, as well as 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information on biased gene strand distribution, homology search and codon adaptation index. The algorithm, which achieved an accuracy of 80.8% and 78.9% in self-consistence and cross-validation tests, respectively, predicted 5880 essential genes in the 16 Mycoplasma genomes. The intersection set of essential genes in available Mycoplasma genomes consists of 153 core essential genes. The predicted essential genes (available from pDEG, tubic.tju.edu.cn/pdeg) and the proposed algorithm can be helpful for studying minimal Mycoplasma genomes as well as essential genes in other genomes.

  19. Gene and genome parameters of mammalian liver circadian genes (LCGs).

    Directory of Open Access Journals (Sweden)

    Gang Wu

    The mammalian circadian system controls various physiological processes and behavioral responses by regulating thousands of circadian genes with rhythmic expression. In this study, we redefined circadian-regulated genes based on published results in the mouse liver and compared them with other gene groups defined relative to circadian regulation, especially the non-circadian-regulated genes expressed in liver, at multiple molecular levels from gene position to protein expression, based on integrative analyses of different datasets from the literature. Based on the intra-tissue analysis, the liver circadian genes or LCGs show unique features when compared to other gene groups. First, LCGs in general have fewer neighboring genes and are larger in both genomic and 3'-UTR lengths but shorter in CDS (coding sequence) lengths. Second, LCGs have higher mRNA and protein abundance, higher temporal expression variations, and shorter mRNA half-life. Third, more than 60% of LCGs form major co-expression clusters centered in four temporal windows: dawn, day, dusk, and night. In addition, larger and smaller LCGs are found mainly expressed in the day and night temporal windows, respectively, and we believe that LCGs are well-partitioned into the gene expression regulatory network that takes advantage of gene size, expression constraint, and chromosomal architecture. Based on inter-tissue analysis, more than half of LCGs are ubiquitously expressed in multiple tissues but only show rhythmical expression in one or a limited number of tissues. LCGs show at least three-fold lower expression variations across the temporal windows than those among different tissues, and this observation suggests that temporal expression variations regulated by the circadian system are relatively subtle as compared with the tissue expression variations formed during development. Taken together, we suggest that the circadian system selects gene parameters in a cost effective way to improve tissue

  20. Genomic structure of the rat major AP endonuclease gene (Apex) with an adjacent putative O-sialoglycoprotease gene (Prsmg1/Gcpl1) and a processed Apex pseudogene (Apexp1).

    Directory of Open Access Journals (Sweden)

    Yao,Ming

    1999-12-01

    Genomic sequencing and chromosomal assignment of the gene encoding rat APEX nuclease, a multifunctional DNA repair enzyme, were performed. An active Apex gene and a processed pseudogene were isolated from a rat genomic library. The active Apex gene consists of 5 exons and 4 introns spanning 2.1 kb. The putative promoter region of the Apex gene lacks the typical TATA box, but contains CAAT boxes and a CpG island having putative binding sites for several transcription factors, such as Sp1, AP-2, GATA-1 and ATF. A putative O-sialoglycoprotease (a homologue of Pasteurella haemolytica glycoprotease, gcp; abbreviated as Prsmg1/Gcpl1) gene consisting of 11 exons and 10 introns spanning 7.3 kb lies immediately adjacent to the Apex gene in a 5'-to-5' orientation. The Apex gene locus was mapped to rat chromosome 15p12 using in situ hybridization. The processed pseudogene (designated as rat Apexp1) has a nucleotide sequence 87.1% identical to that of the rat Apex cDNA, although several stop codons interrupting the coding sequences and multiple nucleotide deletions were observed. The Apexp1 is located in an inactive LINE sequence. Calculation of nucleotide substitution rates suggests that the immediate, active progenitor of Apexp1 arose 23 million years ago and that the non-functionalization occurred 15 million years ago.

  1. Genomic analysis and gene structure of the plant carotenoid dioxygenase 4 family: a deeper study in Crocus sativus and its allies.

    Science.gov (United States)

    Ahrazem, Oussama; Trapero, Almudena; Gómez, M Dolores; Rubio-Moraga, Angela; Gómez-Gómez, Lourdes

    2010-10-01

    The plastoglobule-targeted enzyme carotenoid cleavage dioxygenase (CCD4) mediates the formation of volatile C13 ketones, such as β-ionone, by cleaving the C9-C10 and C9'-C10' double bonds of cyclic carotenoids. Here, we report the isolation and analysis of CCD4 genomic DNA regions in Crocus sativus. Different CCD4 alleles have been identified: CsCCD4a which is found with and without an intron and CsCCD4b that showed the presence of a unique intron. The presence of different CCD4 alleles was also observed in other Crocus species. Furthermore, comparison of the locations of CCD4 introns within the coding region with CCD4 genes from other plant species suggests that independent gain/losses have occurred. The comparison of the promoter region of CsCCD4a and CsCCD4b with available CCD4 gene promoters from other plant species highlighted the conservation of cis-elements involved in light response, heat stress, as well as the absence and unique presence of cis-elements involved in circadian regulation and low temperature responses, respectively. Functional characterization of the Crocus sativus CCD4a promoter using Arabidopsis plants stably transformed with a DNA fragment of 1400 base pairs (P-CsCCD4a) fused to the β-glucuronidase (GUS) reporter gene showed that this sequence was sufficient to drive GUS expression in the flower, in particular high levels were detected in pollen.

  2. Wolbachia genome integrated in an insect chromosome: evolution and fate of laterally transferred endosymbiont genes.

    Science.gov (United States)

    Nikoh, Naruo; Tanaka, Kohjiro; Shibata, Fukashi; Kondo, Natsuko; Hizume, Masahiro; Shimada, Masakazu; Fukatsu, Takema

    2008-02-01

    Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we thoroughly investigated the bacterial genes derived from a Wolbachia endosymbiont on the nuclear genome of the beetle Callosobruchus chinensis. Exhaustive PCR detection and Southern blot analysis suggested that approximately 30% of Wolbachia genes, in terms of the gene repertoire of wMel, are present on the insect nuclear genome. Fluorescent in situ hybridization located the transferred genes on the proximal region of the basal short arm of the X chromosome. Molecular evolutionary and other lines of evidence indicated that the transferred genes are probably derived from a single lateral transfer event. The transferred genes were, for the length examined, structurally disrupted, freed from functional constraints, and transcriptionally inactive. Hence, most, if not all, of the transferred genes have been pseudogenized. Notwithstanding this, the transferred genes were ubiquitously detected from Japanese and Taiwanese populations of C. chinensis, while the number of the transferred genes detected differed between the populations. The transferred genes were not detected from congenic beetle species, indicating that the transfer event occurred after speciation of C. chinensis, which was estimated to be one or several million years ago. These features of the laterally transferred endosymbiont genes are compared with the evolutionary patterns of mitochondrial and plastid genome fragments acquired by nuclear genomes through recent endosymbiotic gene transfers.

  3. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob Hull

    2014-01-01

    BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However......, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure...... lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome...

  4. Genomic structure of the α-amylase gene in the pearl oyster Pinctada fucata and its expression in response to salinity and food concentration.

    Science.gov (United States)

    Huang, Guiju; Guo, Yihui; Li, Lu; Fan, Sigang; Yu, Ziniu; Yu, Dahui

    2016-08-01

    Amylase is one of the most important digestive enzymes for phytophagous animals. In this study, the cDNA, genomic DNA, and promoter region of the α-amylase gene of the pearl oyster Pinctada fucata were cloned by using reverse transcription-polymerase chain reaction (RT-PCR), rapid amplification of cDNA ends, and genome-walking methods. The full-length cDNA sequence was 1704 bp long and consisted of a 5'-untranslated region of 17 bp, a 3'-untranslated region of 118 bp, and a 1569-bp open reading frame encoding a 522-aa polypeptide with a 20-aa signal peptide. Sequence alignment revealed that P. fucata α-amylase (Pfamy) shared the highest identity (91.6%) with Pinctada maxima. The phylogenetic tree showed that it was closely related to P. maxima, based on the amino acid sequences. The genomic DNA was 10850 bp and contained nine exons, eight introns, and a promoter region of 3932 bp. Several transcriptional factors such as GATA-1, AP-1, and SP1 were predicted in the promoter region. Quantitative RT-PCR assay indicated that the relative expression level of Pfamy was significantly higher in the digestive gland than in other tissues (gonad, gills, muscle, and mantle) (P... food concentration was 16×10^4 cells/mL, which was significantly lower than the level observed at 8×10^4 cells/mL and 20×10^4 cells/mL (P<0.05). Our findings provide a genetic basis for further research on Pfamy activity and will facilitate studies on the growth mechanisms and genetic improvement of the pearl oyster P. fucata. Copyright © 2016 Elsevier B.V. All rights reserved.
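
    The lengths reported in this abstract can be checked for internal consistency. The following is a minimal Python sketch that uses only the figures quoted above; the variable names are illustrative, and counting the stop codon as part of the 1569-bp open reading frame is an assumption rather than something the abstract states.

```python
# Lengths quoted in the abstract, in base pairs.
utr5, orf, utr3 = 17, 1569, 118

# The three parts should add up to the full-length cDNA.
cdna_total = utr5 + orf + utr3

# Assumption: the ORF length includes the stop codon, so the
# polypeptide has one residue fewer than the number of codons.
codons = orf // 3
residues = codons - 1

print(cdna_total)  # 1704
print(residues)    # 522
```

    Under that assumption both the 1704-bp cDNA length and the 522-aa polypeptide length agree with the reported values.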

  5. Genome-wide analysis of regions similar to promoters of histone genes

    KAUST Repository

    Chowdhary, Rajesh

    2010-05-28

    Background: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (shortly histone genes), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have similar structure as promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii) identify in this way genes that have high likelihood of being coregulated with the histone genes. Results: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with similar structure as histone promoters. Conclusions: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that
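
    The three accuracy figures quoted in this abstract (sensitivity, specificity, positive predictive value) follow standard definitions over a confusion matrix. The sketch below shows those definitions in Python. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, ppv) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true promoters recovered
    specificity = tn / (tn + fp)  # fraction of non-promoters rejected
    ppv = tp / (tp + fp)          # fraction of positive calls that are correct
    return sensitivity, specificity, ppv

# Hypothetical counts, for illustration only.
sens, spec, ppv = classification_metrics(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ppv={ppv:.3f}")
```

    In a leave-one-out setting such as the one the abstract mentions, each sequence would be scored by a model trained on all the others, and the counts would be accumulated over those held-out predictions.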

  6. p63 gene structure in the phylum mollusca.

    Science.gov (United States)

    Baričević, Ana; Štifanić, Mauro; Hamer, Bojan; Batel, Renato

    2015-08-01

    Roles of p53 family ancestor (p63) in the organisms' response to stressful environmental conditions (mainly pollution) have been studied among molluscs, especially in the genus Mytilus, within the last 15 years. Nevertheless, information about gene structure of this regulatory gene in molluscs is scarce. Here we report the first complete genomic structure of the p53 family orthologue in the mollusc Mediterranean mussel Mytilus galloprovincialis and confirm its similarity to vertebrate p63 gene. Our searches within the available molluscan genomes (Aplysia californica, Lottia gigantea, Crassostrea gigas and Biomphalaria glabrata), found only one p53 family member present in a single copy per haploid genome. Comparative analysis of those orthologues, additionally confirmed the conserved p63 gene structure. Conserved p63 gene structure can be a helpful tool to complement or/and revise gene annotations of any future p63 genomic sequence records in molluscs, but also in other animal phyla. Knowledge of the correct gene structure will enable better prediction of possible protein isoforms and their functions. Our analyses also pointed out possible mis-annotations of the p63 gene in sequenced molluscan genomes and stressed the value of manual inspection (based on alignments of cDNA and protein onto the genome sequence) for a reliable and complete gene annotation.

  7. Four genes encode acetylcholinesterases in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. cDNA sequences, genomic structures, mutations and in vivo expression.

    Science.gov (United States)

    Combes, D; Fedon, Y; Grauso, M; Toutant, J P; Arpagaus, M

    2000-07-21

    We report the full coding sequences and the genomic organization of the four genes encoding acetylcholinesterase (AChE) in Caenorhabditis elegans and Caenorhabditis briggsae, in relation to the properties of the encoded enzymes. ace-1 and ace-2, located on chromosome X and I, respectively, encode two AChEs (ACE-1 and ACE-2) that present 35% identity. The C-terminal end of ACE-1 is homologous to the C terminus of T subunits of vertebrate AChEs. ACE-1 oligomerizes into amphiphilic tetramers. ACE-2 has a hydrophobic C terminus of H type. It associates into glycolipid-anchored dimers. In C. elegans and C. briggsae, ace-3 and ace-4 are organized in tandem on chromosome II, with only 356 nt and 369 nt, respectively, between the stop codon of ace-4 (upstream gene) and the ATG of ace-3. ace-3 produces only 5 % of the total AChE activity. It encodes an H subunit that associates into dimers of glycolipid-anchored catalytic subunits, which are highly resistant to the usual AChE inhibitors, and which hydrolyze butyrylthiocholine faster than acetylthiocholine. ACE-4 is closer to ACE-3 (54 % identity) than to ACE-1 or ACE-2. The usual sequence FGESAG surrounding the active serine residue in cholinesterases is changed to FGQSAG in ace-4. ACE-4 was not detected by our current biochemical methods, although the gene is transcribed in vivo. However the level of ace-4 mRNAs is far lower than those of ace-1, ace-2 and ace-3. The ace-2, ace-3 and ace-4 transcripts were found to be trans-spliced by both SL1 and SL2, although these genes are not included in typical operons. The molecular bases of null mutations g72 (ace-2), p1304 and dc2 (ace-3) have been identified. Copyright 2000 Academic Press.

  8. Structural dynamics of retroviral genome and the packaging

    Directory of Open Access Journals (Sweden)

    Yasuyuki Miyazaki

    2011-12-01

    Retroviruses can cause diseases such as AIDS, leukemia and tumors, but are also used as vectors for human gene therapy. All retroviruses, except foamy viruses, package two copies of unspliced genomic RNA into their progeny viruses. Understanding the molecular mechanisms of retroviral genome packaging will aid the design of new anti-retroviral drugs targeting the packaging process and improve the efficacy of retroviral vectors. Retroviral genomes have to be specifically recognized by the cognate nucleocapsid (NC) domain of the Gag polyprotein from among an excess of cellular and spliced viral mRNA. Extensive virological and structural studies have revealed how retroviral genomic RNA is selectively packaged into the viral particles. The genomic area responsible for the packaging is generally located in the 5’ untranslated region (5’ UTR), and contains dimerization site(s). Recent studies have shown that retroviral genome packaging is modulated by structural changes of RNA at the 5’ UTR accompanied by the dimerization. In this review, we focus on three representative retroviruses, Moloney murine leukemia virus (MoMLV), human immunodeficiency virus type 1 (HIV-1) and 2 (HIV-2), and describe the molecular mechanism of retroviral genome packaging.

  9. Insular organization of gene space in grass genomes.

    Science.gov (United States)

    Gottlieb, Andrea; Müller, Hans-Georg; Massa, Alicia N; Wanjugi, Humphrey; Deal, Karin R; You, Frank M; Xu, Xiangyang; Gu, Yong Q; Luo, Ming-Cheng; Anderson, Olin D; Chan, Agnes P; Rabinowicz, Pablo; Devos, Katrien M; Dvorak, Jan

    2013-01-01

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands "gene insulae" to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion creates gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes.
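
    The contrast drawn in this abstract, between gene locations that follow a homogeneous Poisson process and genes clustered into islands, can be illustrated with a small simulation. The sketch below is not the authors' density-function comparison. It swaps in a simpler diagnostic, the coefficient of variation (CV) of gaps between neighbouring genes, which is close to 1 when gaps are exponentially distributed and larger when genes are clustered. All function names, genome lengths and island parameters are hypothetical.

```python
import random
import statistics

def gap_cv(positions):
    """Coefficient of variation of the gaps between sorted positions."""
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)

def uniform_genes(n, length, rng):
    """Genes placed uniformly at random (homogeneous Poisson, given n)."""
    return [rng.uniform(0, length) for _ in range(n)]

def clustered_genes(n_islands, per_island, length, spread, rng):
    """Genes scattered tightly around randomly placed island centres."""
    genes = []
    for _ in range(n_islands):
        centre = rng.uniform(0, length)
        genes.extend(rng.gauss(centre, spread) for _ in range(per_island))
    return genes

rng = random.Random(0)  # fixed seed so the run is repeatable
cv_uniform = gap_cv(uniform_genes(2000, 1e7, rng))
cv_clustered = gap_cv(clustered_genes(500, 4, 1e7, 500.0, rng))

print(f"uniform placement:   CV = {cv_uniform:.2f}")
print(f"clustered placement: CV = {cv_clustered:.2f}")
```

    A CV well above 1 only hints at clustering; a formal test, such as the comparison against a non-parametric density estimate described in the abstract, is needed to reject the Poisson model.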

  11. Integrase-directed recovery of functional genes from genomic libraries.

    Science.gov (United States)

    Rowe-Magnus, Dean A

    2009-09-01

    Large population sizes, rapid growth and 3.8 billion years of evolution firmly establish microorganisms as a major source of the planet's biological and genetic diversity. However, up to 99% of the microorganisms in a given environment cannot be cultured. Culture-independent methods that directly access the genetic potential of an environmental sample can unveil new proteins with diverse functions, but the sequencing of random DNA can generate enormous amounts of extraneous data. Integrons are recombination systems that accumulate open reading frames (gene cassettes), many of which code for functional proteins with enormous adaptive potential. Some integrons harbor hundreds of gene cassettes and evidence suggests that the gene cassette pool may be limitless in size. Accessing this genetic pool has been hampered since sequence-based techniques, such as hybridization or PCR, often recover only partial genes or a small subset of those present in the sample. Here, a three-plasmid genetic strategy for the sequence-independent recovery of gene cassettes from genomic libraries is described and its use by retrieving functional gene cassettes from the chromosomal integron of Vibrio vulnificus ATCC 27562 is demonstrated. By manipulating the natural activity of integrons, we can gain access to the caches of functional genes amassed by these structures.

  12. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes.

    Science.gov (United States)

    Lin, Michael F; Carlson, Joseph W; Crosby, Madeline A; Matthews, Beverley B; Yu, Charles; Park, Soo; Wan, Kenneth H; Schroeder, Andrew J; Gramates, L Sian; St Pierre, Susan E; Roark, Margaret; Wiley, Kenneth L; Kulathinal, Rob J; Zhang, Peili; Myrick, Kyl V; Antone, Jerry V; Celniker, Susan E; Gelbart, William M; Kellis, Manolis

    2007-12-01

    The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster.

  13. High-Diversity Genes in the Arabidopsis Genome

    OpenAIRE

    Cork, Jennifer M.; Purugganan, Michael D.

    2005-01-01

    High-diversity genes represent an important class of loci in organismal genomes. Since elevated levels of nucleotide variation are a key component of the molecular signature for balancing selection or local adaptation, high-diversity genes may represent loci whose alleles are selectively maintained as balanced polymorphisms. Comparison of 4300 random shotgun sequence fragments of the Arabidopsis thaliana Ler ecotype genome with the whole genomic sequence of the Col-0 ecotype identified 60 gen...

  14. Chicken rRNA Gene Cluster Structure.

    Directory of Open Access Journals (Sweden)

    Alexander G Dyomin

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite the extensive exploration of avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5'ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3'ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited in GenBank under accession number KT445934. The assembly was validated through in situ fluorescent hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity.
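
    The component lengths listed in this abstract can be summed to check them against the reported cluster length. A minimal Python sketch using only the figures quoted above (the dictionary keys are shorthand labels):

```python
# Component lengths quoted in the abstract, in base pairs,
# listed in 5'-to-3' order along the cluster.
components = {
    "5'ETS": 1836,
    "18S": 1823,
    "ITS1": 2530,
    "5.8S": 157,
    "ITS2": 733,
    "28S": 4441,
    "3'ETS": 343,
}

total = sum(components.values())
print(total)  # 11863
```

    The sum matches the 11863 bp given for the assembled cluster.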

  15. Evolution of closely linked gene pairs in vertebrate genomes

    NARCIS (Netherlands)

    Franck, E.; Hulsen, T.; Huynen, M.A.; Jong, de W.W.; Lunsen, N.H.; Madsen, O.

    2008-01-01

    The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of interge

  16. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

    Directory of Open Access Journals (Sweden)

    Orengo Christine A

    2007-03-01

    Background: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is focussed towards structurally characterising protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. Results: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst, in addition, pursuing additional targets from large structurally characterised protein superfamilies. Conclusion: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.

  17. Current challenges in genome annotation through structural biology and bioinformatics.

    Science.gov (United States)

    Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M

    2012-10-01

    With the huge volume of genomic sequences being generated from high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component to this process is to use experimentally determined structures, which provide a means to detect homology that is not discernible from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.

  18. Missing genes in the annotation of prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Feng Wu-chun

    2010-03-01

    Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations are systematically missing small genes. Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
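
    The first step described in this abstract, collecting every ORF of at least 33 aa, can be sketched as a six-frame scan. The abstract does not say how ORFs were delimited, so the sketch assumes the simplest definition: an ATG start codon followed in frame by the first standard stop codon. The function names and the toy sequence are hypothetical, and the later cross-genome comparison and taxonomic filtering are not shown.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_aa=33):
    """Yield (start, end, frame) for ATG..stop ORFs with >= min_aa residues."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                    # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3      # residues, stop codon excluded
                if n_aa >= min_aa:
                    yield start, i + 3, frame
                start = None                 # look for the next ORF

def find_orfs(seq, min_aa=33):
    """Scan both strands; minus-strand coordinates refer to the
    reverse-complemented sequence, not the original one."""
    seq = seq.upper()
    hits = [("+",) + orf for orf in orfs_in_strand(seq, min_aa)]
    hits += [("-",) + orf for orf in orfs_in_strand(reverse_complement(seq), min_aa)]
    return hits

# Toy sequence: one 41-residue ORF (Met + 40 Ala) with short flanks.
toy = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(toy))  # [('+', 2, 128, 2)]
```

    Real prokaryotic gene finders also consider alternative start codons such as GTG and TTG; restricting the scan to ATG keeps the sketch short.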

  19. Evolution of paralogous genes: Reconstruction of genome rearrangements through comparison of multiple genomes within Staphylococcus aureus.

    Science.gov (United States)

    Tsuru, Takeshi; Kawai, Mikihiko; Mizutani-Ui, Yoko; Uchiyama, Ikuo; Kobayashi, Ichizo

    2006-06-01

    Analysis of evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. With species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at the single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed importance of recombination involving long conserved sequence in the evolution of paralogous genes in the genome.

  20. Genome-wide characterization of the Pectate Lyase-like (PLL) genes in Brassica rapa.

    Science.gov (United States)

    Jiang, Jingjing; Yao, Lina; Miao, Ying; Cao, Jiashu

    2013-11-01

    Pectate lyases (PL) depolymerize demethylated pectin (pectate, EC 4.2.2.2) by catalyzing the eliminative cleavage of α-1,4-glycosidic linked galacturonan. Pectate Lyase-like (PLL) genes are one of the largest and most complex families in plants. However, studies on the phylogeny, gene structure, and expression of PLL genes are limited. To understand the potential functions of PLL genes in plants, we characterized their intron-exon structure, phylogenetic relationships, and protein structures, and measured their expression patterns in various tissues, specifically the reproductive tissues in Brassica rapa. Sequence alignments revealed two characteristic motifs in PLL genes. The chromosome location analysis indicated that 18 of the 46 PLL genes were located in the least fractionated sub-genome (LF) of B. rapa, while 16 were located in the medium fractionated sub-genome (MF1) and 12 in the more fractionated sub-genome (MF2). Quantitative RT-PCR analysis showed that BrPLL genes were expressed in various tissues, with most of them being expressed in flowers. Detailed qRT-PCR analysis identified 11 pollen specific PLL genes and several other genes with unique spatial expression patterns. In addition, some duplicated genes showed similar expression patterns. The phylogenetic analysis identified three PLL gene subfamilies in plants, among which subfamily II might have evolved from gene neofunctionalization or subfunctionalization. Therefore, this study opens the possibility for exploring the roles of PLL genes during plant development.

  1. Gene and genome duplication in Acanthamoeba polyphaga Mimivirus.

    Science.gov (United States)

    Suhre, Karsten

    2005-11-01

    Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature in large DNA viruses but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba polyphaga Mimivirus. Here, I present a systematic analysis of gene and genome duplication events in the mimivirus genome. I found that one-third of the mimivirus genes are related to at least one other gene in the mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the mimivirus genome. Using multiple alignments, together with remote-homology detection methods based on Hidden Markov Model comparison, I assign putative functions to some of the paralogous gene families. I suggest that a large part of the duplicated mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. My findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and oppose the paradigm that viral evolution is dominated by lateral gene acquisition, at least in regard to large DNA viruses.

  2. Phenotypic impact of genomic structural variation

    DEFF Research Database (Denmark)

    Weischenfeldt, Joachim; Symmons, Orsolya; Spitz, François;

    2013-01-01

    Genomic structural variants have long been implicated in phenotypic diversity and human disease, but dissecting the mechanisms by which they exert their functional impact has proven elusive. Recently, however, developments in high-throughput DNA sequencing and chromosomal engineering technology have facilitated the analysis of structural variants in human populations and model systems in unprecedented detail. In this Review, we describe how structural variants can affect molecular and cellular processes, leading to complex organismal phenotypes, including human disease. We further present advances...

  3. Microfluidic gene arrays for rapid genomic profiling

    Science.gov (United States)

    West, Jay A.; Hukari, Kyle W.; Hux, Gary A.; Shepodd, Timothy J.

    2004-12-01

    Genomic analysis tools have recently become indispensable for the evaluation of gene expression in a variety of experimental protocols. Two of the main drawbacks of this technology are the labor- and time-intensive process of sample preparation and the relatively long times required for target/probe hybridization. To overcome these two technological barriers, we have developed a microfluidic chip that performs on-chip sample purification and labeling, integrated with a high-density gene array. Sample purification was performed using a porous polymer monolithic material functionalized with an oligo dT nucleotide sequence for the isolation of high-purity mRNA. The purified mRNAs can then be rapidly labeled using a fluorescent molecule that forms a selective covalent bond at the N7 position of guanine residues. The labeled mRNAs can then be released from the polymer monolith to allow direct hybridization with oligonucleotide probes deposited in a microfluidic channel. To allow rapid target/probe hybridization, high-density microarrays were printed in microchannels. The channels can accommodate array densities as high as 4000 probes. When oligonucleotide deposition is complete, the channels are sealed with a polymer film that forms a pressure-tight seal to allow sample reagent flow to the arrayed probes. This process will allow real-time monitoring of target-to-probe hybridization using a top-mounted CCD fiber bundle combination. Using this process, we have been able to go from multi-step sample preparation to labeled target/probe hybridization in less than 30 minutes. These results demonstrate the capability to perform rapid genomic screening on a high-density microfluidic microarray of oligonucleotides.

  4. FGF: A web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes, enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...

  5. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures hinders further elucidation of differences among genomes. It remains of interest to develop new ab initio algorithms that not only predict genes accurately but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence, and the latter comprises several relevant features related to translation initiation. They are combined to form the so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, which incorporates several strategies in an iterative program. The iterations enable a non-supervised learning process that yields a set of genome-specific parameters for the gene structure before genes are predicted. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while predicting TISs more accurately than existing gene finders and the current GenBank annotation.

  6. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.

    Science.gov (United States)

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-04-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs.

  7. Data structures of genome and protein sequences indexing

    Directory of Open Access Journals (Sweden)

    Adeleh Asadi

    2016-03-01

    A data structure is a tool for storing and retrieving information, defined as a logical and mathematical way of organizing specific data. The variety of gene and protein sequences across organisms keeps increasing the amount of data in genome databases, and finding appropriate data structures and indexing methods is the subject of many studies. String data structures are the general-purpose structures for genome indexing, and this article reviews the three most widely used types: the suffix tree, the suffix array, and the Directed Acyclic Word Graph (DAWG). The paper reviews the literature on these three structures in the field of genome database indexing. The findings show that the suffix tree and the DAWG need much space, whereas the suffix array needs less. Unlike the DAWG, a suffix array can be stored on a memory stick. The suffix tree and the DAWG are dynamic structures, but because the suffix array is a sorted structure, it can hardly be changed.
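
    To make the suffix array concrete, the following is a minimal sketch, not taken from the reviewed paper. It builds the array naively by sorting suffix start positions and then finds a pattern by binary search. The function names and the toy sequence are assumptions made for illustration, and the naive construction is far too slow for real genomes.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text` in sorted order.

    Naive construction by direct sorting; adequate for a demonstration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Binary-search the suffix array for every position where `pattern` occurs."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


genome = "GATTACAGATTACA"  # toy sequence, not real data
suffix_array = build_suffix_array(genome)
hits = find_occurrences(genome, suffix_array, "ATTA")
print(hits)
```

    Because the array is just a sorted list of integers, it is compact, but inserting new sequence means re-sorting, which is consistent with the review's remark that the suffix array is hard to change.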

  8. A GeneTrek analysis of the maize genome.

    Science.gov (United States)

    Liu, Renyi; Vitte, Clémentine; Ma, Jianxin; Mahama, A Assibi; Dhliwayo, Thanda; Lee, Michael; Bennetzen, Jeffrey L

    2007-07-10

    Analysis of the sequences of 74 randomly selected BACs demonstrated that the maize nuclear genome contains approximately 37,000 candidate genes with homologues in other plant species. An additional approximately 5,500 predicted genes are severely truncated and probably pseudogenes. The distribution of genes is uneven, with approximately 30% of BACs containing no genes. BAC gene density varies from 0 to 7.9 per 100 kb, whereas most gene islands contain only one gene. The average number of genes per gene island is 1.7. Only 72% of these genes show collinearity with the rice genome. Particular LTR retrotransposon families (e.g., Gyma) are enriched on gene-free BACs, most of which do not come from pericentromeres or other large heterochromatic regions. Gene-containing BACs are relatively enriched in different families of LTR retrotransposons (e.g., Ji). Two major bursts of LTR retrotransposon activity in the last 2 million years are responsible for the large size of the maize genome, but only the more recent of these is well represented in gene-containing BACs, suggesting that LTR retrotransposons are more efficiently removed in these domains. The results demonstrate that sample sequencing and careful annotation of a few randomly selected BACs can provide a robust description of a complex plant genome.

  9. Genomic structure, chromosomal localization and expression profile of a novel melanoma differentiation associated (mda-7) gene with cancer specific growth suppressing and apoptosis inducing properties.

    Energy Technology Data Exchange (ETDEWEB)

    Huang, E. Y.; Madireddi, M. T.; Gopalkrishnan, R. V.; Leszczyniecka, M.; Su, Z. Z.; Lebedeva, I. V.; Kang, D. C.; Jian, H.; Lin, J. J.; Alexandre, D.; Chen, Y.; Vozhilla, N.; Mei, M. X.; Christiansen, K. A.; Sivo, F.; Goldstein, N. I.; Chada, S.; Huberman, E.; Pestka, S.; Fisher, P. B.; Biochip Technology Center; Columbia Univ.; Introgen Therapeutics Inc.; UMDNJ-Robert Wood Johnson Medical School

    2001-10-25

    Abnormalities in cellular differentiation are frequent occurrences in human cancers. Treatment of human melanoma cells with recombinant fibroblast interferon (IFN-beta) and the protein kinase C activator mezerein (MEZ) results in an irreversible loss in growth potential, suppression of tumorigenic properties and induction of terminal cell differentiation. Subtraction hybridization identified melanoma differentiation associated gene-7 (mda-7), as a gene induced during these physiological changes in human melanoma cells. Ectopic expression of mda-7 by means of a replication defective adenovirus results in growth suppression and induction of apoptosis in a broad spectrum of additional cancers, including melanoma, glioblastoma multiforme, osteosarcoma and carcinomas of the breast, cervix, colon, lung, nasopharynx and prostate. In contrast, no apparent harmful effects occur when mda-7 is expressed in normal epithelial or fibroblast cells. Human clones of mda-7 were isolated and its organization resolved in terms of intron/exon structure and chromosomal localization. Hu-mda-7 encompasses seven exons and six introns and encodes a protein with a predicted size of 23.8 kDa, consisting of 206 amino acids. Hu-mda-7 mRNA is stably expressed in the thymus, spleen and peripheral blood leukocytes. De novo mda-7 mRNA expression is also detected in human melanocytes and expression is inducible in cells of melanocyte/melanoma lineage and in certain normal and cancer cell types following treatment with a combination of IFN-beta plus MEZ. Mda-7 expression is also induced during megakaryocyte differentiation induced in human hematopoietic cells by treatment with TPA (12-O-tetradecanoyl phorbol-13-acetate). In contrast, de novo expression of mda-7 is not detected nor is it inducible by IFN-beta+MEZ in a spectrum of additional normal and cancer cells. No correlation was observed between induction of mda-7 mRNA expression and growth suppression following treatment with IFN-beta+MEZ and

  10. Identification of putative noncoding RNA genes in the Burkholderia cenocepacia J2315 genome

    DEFF Research Database (Denmark)

    Coenye, T.; Drevinek, P.; Mahenthiralingam, E.

    2007-01-01

    Noncoding RNA (ncRNA) genes are not involved in the production of mRNA and proteins, but produce transcripts that function directly as structural or regulatory RNAs. In the present study, the presence of ncRNA genes in the genome of Burkholderia cenocepacia J2315 was evaluated by combining compar...

  11. A Method for Identification of Selenoprotein Genes in Archaeal Genomes

    Institute of Scientific and Technical Information of China (English)

    Mingfeng Li; Yanzhao Huang; Yi Xiao

    2009-01-01

    The genetic codon UGA has a dual function: serving as a terminator and encoding selenocysteine. However, most popular gene annotation programs only take it as a stop signal, resulting in misannotation or completely missing selenoprotein genes. We developed a computational method named Asec-Prediction that is specific for the prediction of archaeal selenoprotein genes. To evaluate its effectiveness, we first applied it to 14 archaeal genomes with previously known selenoprotein genes, and Asec-Prediction identified all reported selenoprotein genes without redundant results. When we applied it to 12 archaeal genomes that had not been researched for selenoprotein genes, Asec-Prediction detected a novel selenoprotein gene in Methanosarcina acetivorans. Further evidence was also collected to support that the predicted gene should be a real selenoprotein gene. The result shows that Asec-Prediction is effective for the prediction of archaeal selenoprotein genes.
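
    The effect of treating UGA only as a stop signal can be illustrated with a small sketch. This is not the Asec-Prediction method, whose details are not given in the abstract; it is a simplified forward-strand ORF scan, written on DNA (TGA for UGA), with hypothetical names and a made-up toy sequence. It compares a conventional reading with one that reads through in-frame TGA codons and records them as candidate selenocysteine positions.

```python
from typing import List, NamedTuple

STRICT_STOPS = {"TAA", "TAG"}        # always treated as terminators
ALL_STOPS = STRICT_STOPS | {"TGA"}   # conventional annotation: TGA also stops


class Orf(NamedTuple):
    start: int                 # index of the A in the ATG start codon
    end: int                   # index one past the terminating stop codon
    tga_positions: List[int]   # in-frame TGA codons that were read through


def scan_frame(dna: str, frame: int, read_through_tga: bool) -> List[Orf]:
    """Collect ATG-to-stop ORFs in one forward reading frame."""
    stops = STRICT_STOPS if read_through_tga else ALL_STOPS
    orfs: List[Orf] = []
    start = None
    tga: List[int] = []
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None:
            if codon == "ATG":
                start, tga = i, []
        elif codon in stops:
            orfs.append(Orf(start, i + 3, tga))
            start = None
        elif codon == "TGA":
            tga.append(i)
    return orfs


# Hypothetical toy sequence: ATG GCC TGA GGC AAA TAA
toy = "ATGGCCTGAGGCAAATAA"
conventional = scan_frame(toy, 0, read_through_tga=False)
candidate = scan_frame(toy, 0, read_through_tga=True)
print(conventional)
print(candidate)
```

    Under the conventional reading the ORF is cut short at the TGA, which is the kind of misannotation the abstract describes; the read-through scan only nominates candidates and would need further evidence to call a selenoprotein gene.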

  12. Structure and sequence of the saimiriine herpesvirus 1 genome.

    Science.gov (United States)

    Tyler, Shaun; Severini, Alberto; Black, Darla; Walker, Matthew; Eberle, R

    2011-02-05

    We report here the complete genome sequence of the squirrel monkey α-herpesvirus saimiriine herpesvirus 1 (HVS1). Unlike the simplexviruses of other primate species, only the unique short region of the HVS1 genome is bounded by inverted repeats. While all Old World simian simplexviruses characterized to date lack the herpes simplex virus RL1 (γ34.5) gene, HVS1 has an RL1 gene. HVS1 lacks several genes that are present in other primate simplexviruses (US8.5, US10-12, UL43/43.5 and UL49A). Although the overall genome structure appears more like that of varicelloviruses, the encoded HVS1 proteins are most closely related to homologous proteins of the primate simplexviruses. Phylogenetic analyses confirm that HVS1 is a simplexvirus. Limited comparison of two HVS1 strains revealed a very low degree of sequence variation more typical of varicelloviruses. HVS1 is thus unique among the primate α-herpesviruses in that its genome has properties of both simplexviruses and varicelloviruses.

  13. Chromatin structure and evolution in the human genome

    Directory of Open Access Journals (Sweden)

    Dunlop Malcolm G

    2007-05-01

    Background: Evolutionary rates are not constant across the human genome, but genes in close proximity have been shown to experience similar levels of divergence and selection. The higher-order organisation of chromosomes has often been invoked to explain such phenomena, but previously there has been insufficient data on chromosome structure to investigate this rigorously. Using the results of a recent genome-wide analysis of open and closed human chromatin structures, we have investigated the global association between divergence, selection and chromatin structure for the first time. Results: In this study we have shown that, paradoxically, synonymous site divergence (dS) at non-CpG sites is highest in regions of open chromatin, primarily as a result of an increased number of transitions, while the rates of other traditional measures of mutation (intergenic, intronic and ancient repeat divergence as well as SNP density) are highest in closed regions of the genome. Analysis of human-chimpanzee divergence across intron-exon boundaries indicates that although genes in relatively open chromatin generally display little selection at their synonymous sites, those in closed regions show markedly lower divergence at their fourfold degenerate sites than in neighbouring introns and intergenic regions. Exclusion of known Exonic Splice Enhancer hexamers has little effect on the divergence observed at fourfold degenerate sites across chromatin categories; however, we show that closed chromatin is enriched with certain classes of ncRNA genes whose RNA secondary structure may be particularly important. Conclusion: We conclude that, overall, non-CpG mutation rates are lowest in open regions of the genome and that regions of the genome with a closed chromatin structure have the highest background mutation rate. This might reflect lower rates of DNA damage or enhanced DNA repair processes in regions of open chromatin. Our results also indicate that dS is a poor

  14. The multiple facets of homology and their use in comparative genomics to study the evolution of genes, genomes, and species.

    Science.gov (United States)

    Descorps-Declère, Stéphane; Lemoine, Frédéric; Sculo, Quentin; Lespinet, Olivier; Labedan, Bernard

    2008-04-01

    The rapid development of comparative genomics during the last decade has required correct use of the concept of homology, which was previously used only by evolutionary biologists. Unfortunately, this concept has often been misunderstood and thus misused when exploited outside its evolutionary context. This review returns to the correct definition of homology and explains how this definition has been progressively refined to adapt it to the various new kinds of analysis of gene properties and gene products that have appeared with the progress of comparative genomics. We then illustrate the power of this concept when using the available genomics data to study the evolution of individual genes, of entire genomes and of species, respectively. After explaining how we detect homologues by an exhaustive comparison of a hundred complete proteomes, we describe three main lines of research we have developed in recent years. The first exploits synteny and gene context data to better understand the mechanisms of genome evolution in prokaryotes. The second is based on phylogenomics approaches to reconstruct the tree of life. The last is devoted to the reminder that protein homology is often limited to structural segments (SOH = segment of homology, or module). Detecting and numbering modules allows protein history to be traced back by identifying events of gene duplication and gene fusion. We stress that one of the main present difficulties in such studies is the lack of a reliable method to identify genuine orthologues. Finally, we show how these homology studies help to annotate genes and genomes and to study the complexity of the relationships between the sequence and function of a gene.

  15. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China in 2005. The outbreak is atypical for the apparent large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium has changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries using whole genome PCR scanning (WGPScanning), comparative sequencing of several known virulence factor genes, and multilocus sequence typing (MLST) analysis. WGPScanning analysis showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes are identical in the four strains, and MLST results showed that the four isolates studied and the reference strain all belonged to the ST1 complex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.

  17. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    Science.gov (United States)

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  18. Genome-editing Technologies for Gene and Cell Therapy

    Science.gov (United States)

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  20. A data management system for structural genomics

    Directory of Open Access Journals (Sweden)

    O'Toole Nicholas

    2004-06-01

    Background: Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. Results: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system uses BLAST weekly to match protein sequences against entries in the Protein Data Bank and against targets of other SG centers worldwide. Conclusion: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements.

  1. Profiling of gene duplication patterns of sequenced teleost genomes: evidence for rapid lineage-specific genome expansion mediated by recent tandem duplications

    Directory of Open Access Journals (Sweden)

    Lu Jianguo

    2012-06-01

    Background: Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole-genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes. Results: Here, using a Markov clustering algorithm directed approach, we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish. Conclusions: We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting
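
    As an illustration of what classifying duplicates by "duplication type" can look like, here is a minimal sketch. It is not the authors' pipeline: the gene identifiers, the gene-rank representation and the `max_gap` threshold are all assumptions chosen for the example.

```python
from typing import Dict, Tuple

# gene id -> (chromosome, rank of the gene along that chromosome)
GeneIndex = Dict[str, Tuple[str, int]]


def classify_pair(a: str, b: str, genes: GeneIndex, max_gap: int = 5) -> str:
    """Label a duplicated gene pair by the relative location of its members.

    `max_gap` is the largest number of intervening genes for which a pair is
    still called tandem; it is an illustrative threshold, not a value taken
    from the study.
    """
    chrom_a, rank_a = genes[a]
    chrom_b, rank_b = genes[b]
    if chrom_a != chrom_b:
        return "inter-chromosomal"
    intervening = abs(rank_a - rank_b) - 1
    return "tandem" if intervening <= max_gap else "intra-chromosomal"


# Hypothetical gene positions for demonstration only.
toy_genes: GeneIndex = {
    "geneA": ("chr1", 10),
    "geneB": ("chr1", 12),
    "geneC": ("chr1", 400),
    "geneD": ("chr7", 11),
}
labels = {
    ("geneA", "geneB"): classify_pair("geneA", "geneB", toy_genes),
    ("geneA", "geneC"): classify_pair("geneA", "geneC", toy_genes),
    ("geneA", "geneD"): classify_pair("geneA", "geneD", toy_genes),
}
print(labels)
```

    Counting the labels per species would give the kind of tandem versus intra-chromosomal proportions that the abstract compares across teleost genomes.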

  2. Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

    Science.gov (United States)

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

    2014-11-25

    The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position

  3. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Full Text Available Abstract Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate among a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results: 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines, much as expressed sequence tag (EST) data has been. A comparison with the published genome of another strain of A. niger, sequenced by DSM, showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  4. Amplification and characterization of eukaryotic structural genes.

    Science.gov (United States)

    Maniatis, T; Efstratiadis, A; Sim, G K; Kafatos, F

    1978-05-01

    An approach to the study of eukaryotic structural genes which are differentially expressed during development is described. This approach involves the isolation and amplification of mRNA sequences by in vitro conversion of mRNA to double-stranded cDNA followed by molecular cloning in bacterial plasmids. This procedure provides highly specific hybridization probes that can be used to identify genes and their contiguous DNA sequences in genomic DNA, and to detect specific RNA transcripts during development. The nature of the method allows the isolation of individual mRNA sequences from a complex population of molecules at different stages of development.

  5. Interrogating the druggable genome with structural informatics.

    Science.gov (United States)

    Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A

    2006-08-01

    Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

  6. Testing the infinitely many genes model for the evolution of the bacterial core genome and pangenome.

    Science.gov (United States)

    Collins, R Eric; Higgs, Paul G

    2012-11-01

    When groups of related bacterial genomes are compared, the number of core genes found in all genomes is usually much less than the mean genome size, whereas the size of the pangenome (the set of genes found on at least one of the genomes) is much larger than the mean size of one genome. We analyze 172 complete genomes of Bacilli and compare the properties of the pangenomes and core genomes of monophyletic subsets taken from this group. We then assess the capabilities of several evolutionary models to predict these properties. The infinitely many genes (IMG) model is based on the assumption that each new gene can arise only once. The predictions of the model depend on the shape of the evolutionary tree that underlies the divergence of the genomes. We calculate results for coalescent trees, star trees, and arbitrary phylogenetic trees of predefined fixed branch length. On a star tree, the pangenome size increases linearly with the number of genomes, as has been suggested in some previous studies, whereas on a coalescent tree, it increases logarithmically. The coalescent tree gives a better fit to the data, for all the examples we consider. In some cases, a fixed phylogenetic tree proved better than the coalescent tree at reproducing structure in the gene frequency spectrum, but little improvement was gained in predictions of the core and pangenome sizes. Most of the data are well explained by a model with three classes of gene: an essential class that is found in all genomes, a slow class whose rate of origination and deletion is slow compared with the time of divergence of the genomes, and a fast class showing rapid origination and deletion. Although the majority of genes originating in a genome are in the fast class, these genes are not retained for long periods, and the majority of genes present in a genome are in the slow or essential classes. 
    In general, we show that the IMG model is useful for comparison with experimental genome data both for species level and

  7. Genic regions of a large salamander genome contain long introns and novel genes

    Directory of Open Access Journals (Sweden)

    Bryant Susan V

    2009-01-01

    Full Text Available Abstract Background: The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum: c-value = 32 × 10^9 bp) were isolated and sequenced to characterize the structure of genic regions. Results: Annotation of genes within BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered while manually annotating BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage-specific genes in the axolotl genome than the human genome, but the great majority (86%) of genes between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be incredibly large, approximately 2.8 gigabases! Conclusion: This study shows that a large salamander genome has a correspondingly large genic component, primarily because genes have incredibly long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes that are unique to salamanders.

  8. Complete female mitochondrial genome of Anodonta anatina (Mollusca: Unionidae): confirmation of a novel protein-coding gene (F ORF).

    Science.gov (United States)

    Soroka, Marianna; Burzyński, Artur

    2015-04-01

    Freshwater mussels are among the animals that have two different, gender-specific mitochondrial genomes. We sequenced complete female mitochondrial genomes from five individuals of Anodonta anatina, a bivalve species common in the Palearctic ecozone. The length of the genome was variable: 15,637-15,653 bp. This variation was almost entirely confined to the non-coding parts, which constituted approximately 5% of the genome. Nucleotide diversity was moderate, at 0.3%. Nucleotide composition was typically biased towards AT (66.0%). All genes normally seen in animal mtDNA were identified, as well as the ORF characteristic of unionid mitochondrial genomes, bringing the total number of genes present to 38. If this additional ORF does encode a protein, it must evolve under very relaxed selection, since all substitutions within this gene were non-synonymous. The gene order and structure of the genome were identical to those of all female mitochondrial genomes described in unionid bivalves except the Gonideini.

  9. Genome Variability and Gene Content in Chordopoxviruses: Dependence on Microsatellites

    Science.gov (United States)

    Hatcher, Eneida L.; Wang, Chunlin; Lefkowitz, Elliot J.

    2015-01-01

    To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites—simple sequence nucleotide repeats. On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution. PMID:25912716

  10. Online resources for genomic structural variation.

    Science.gov (United States)

    Sneddon, Tam P; Church, Deanna M

    2012-01-01

    Genomic structural variation (SV) can be thought of on a continuum from a single base pair insertion/deletion (INDEL) to large megabase-scale rearrangements involving insertions, deletions, duplications, inversions, or translocations of whole chromosomes or chromosome arms. These variants can occur in coding or noncoding DNA, and they can be inherited or arise sporadically in the germline or somatic cells. Many of these events are segregating in the population and can be considered common alleles, while others are new alleles and thus rare events. All species studied to date harbor structural variants, and these may be benign, contributing to phenotypes such as sensory perception and immunity, or pathogenic, resulting in genomic disorders including DiGeorge/velocardiofacial, Smith-Magenis, Williams-Beuren, and Prader-Willi syndromes. As structural variants are identified and validated, and their significance, origin, and prevalence are elucidated, it is of critical importance that these data be collected and collated in a way that can be easily accessed and analyzed. This chapter describes current structural variation online resources (see Fig. 1 and Table 1), highlights the challenges in capturing, storing, and displaying SV data, and discusses how dbVar and DGVa, the genomic structural variation databases developed at NCBI and EBI, respectively, were designed to address these issues.

  11. Genome engineering and gene expression control for bacterial strain development.

    Science.gov (United States)

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. Recombineering technology enables simpler and more rapid manipulation of the bacterial genome. Group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to engineer by homologous recombination. Owing to the increasing demand for high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen.

  12. Recent Achievement in Gene Cloning and Functional Genomics in Soybean

    Directory of Open Access Journals (Sweden)

    Zhengjun Xia

    2013-01-01

    Full Text Available Soybean is a model plant for photoperiodism as well as for symbiotic nitrogen fixation. However, a rather low efficiency in soybean transformation hampers functional analysis of genes isolated from soybean. In comparison, rapid development and progress in flowering time and photoperiodic response have been achieved in Arabidopsis and rice. As the soybean genomic information has been released since 2008, gene cloning and functional genomic studies have been revived as indicated by successfully characterizing genes involved in maturity and nematode resistance. Here, we review some major achievements in the cloning of some important genes and some specific features at genetic or genomic levels revealed by the analysis of functional genomics of soybean.

  13. Structural biology sheds light on the puzzle of genomic ORFans.

    Science.gov (United States)

    Siew, Naomi; Fischer, Daniel

    2004-09-10

    Genomic ORFans are orphan open reading frames (ORFs) with no significant sequence similarity to other ORFs. ORFans comprise 20-30% of the ORFs of most completely sequenced genomes. Because nothing can be learnt about ORFans via sequence homology, the functions and evolutionary origins of ORFans remain a mystery. Furthermore, because relatively few ORFans have been experimentally characterized, it has been suggested that most ORFans are not likely to correspond to functional, expressed proteins, but rather to spurious ORFs, pseudo-genes or to rapidly evolving proteins with non-essential roles. As a snapshot view of current ORFan structural studies, we searched for ORFans among proteins whose three-dimensional structures have been recently determined. We find that functional and structural studies of ORFans are not as underemphasized as previously suggested. These recently determined structures correspond to ORFans from all Kingdoms of life, and include proteins that have previously been functionally characterized, as well as structural genomics targets of unknown function labeled as "hypothetical proteins". This suggests that many of the ORFans in the databases are likely to correspond to expressed, functional (and even essential) proteins. Furthermore, the recently determined structures include examples of the various types of ORFans, suggesting that the functions and evolutionary origins of ORFans are diverse. Although this survey sheds some light on the ORFan mystery, further experimental studies are required to gain a better understanding of the role and origins of the tens of thousands of ORFans awaiting characterization.

  14. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Full Text Available Abstract Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way of identifying genetic components that confer susceptibility to human complex diseases. Individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), however, involves difficulty in setting the overall p-value because of the complicated correlation structure, namely the multiple testing problem, which causes unacceptable false negative results. The number of SNP-SNP pairs being larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS), for appropriate handling of the large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods and a subsequent variable selection procedure in the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using the cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–Control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
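
    The screening idea in the abstract above (rank every SNP-SNP interaction term by a marginal measure of association, then keep only the top-ranked terms for further modelling) can be illustrated with a minimal sketch. This is not EPISIS and does not reproduce its dummy coding or logistic-regression step: the product coding of genotypes, the use of absolute Pearson correlation as the marginal score, and the names `pearson`, `screen_pairs`, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_pairs(genotypes, phenotype, keep):
    """Rank SNP-SNP product terms by |marginal correlation|, keep the top `keep`.

    genotypes: dict mapping SNP name -> list of 0/1/2 allele counts (assumed coding)
    phenotype: list of 0/1 case-control labels of the same length
    """
    scores = []
    for a, b in combinations(sorted(genotypes), 2):
        # Hypothetical interaction coding: element-wise product of allele counts.
        interaction = [x * y for x, y in zip(genotypes[a], genotypes[b])]
        scores.append((abs(pearson(interaction, phenotype)), (a, b)))
    scores.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scores[:keep]]


# Toy data, invented for illustration only.
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 0, 1, 1, 2, 2, 0, 1],
    "snp3": [2, 0, 1, 0, 2, 1, 0, 0],
}
toy_phenotype = [0, 0, 1, 0, 1, 1, 0, 1]
top_pairs = screen_pairs(toy_genotypes, toy_phenotype, keep=2)
```

    The screening step only reduces the number of candidate pairs; in a real analysis the retained pairs would still need to be fitted jointly and assessed, which this sketch leaves out.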

  15. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure.

    Science.gov (United States)

    Zuccolo, Andrea; Bowers, John E; Estill, James C; Xiong, Zhiyong; Luo, Meizhong; Sebastian, Aswathy; Goicoechea, José Luis; Collura, Kristi; Yu, Yeisoo; Jiao, Yuannian; Duarte, Jill; Tang, Haibao; Ayyampalayam, Saravanaraj; Rounsley, Steve; Kudrna, Dave; Paterson, Andrew H; Pires, J Chris; Chanderbali, Andre; Soltis, Douglas E; Chamala, Srikar; Barbazuk, Brad; Soltis, Pamela S; Albert, Victor A; Ma, Hong; Mandoli, Dina; Banks, Jody; Carlson, John E; Tomkins, Jeffrey; dePamphilis, Claude W; Wing, Rod A; Leebens-Mack, Jim

    2011-01-01

    Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.

  16. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  17. Genomic location and characterisation of MIC genes in cattle.

    Science.gov (United States)

    Birch, James; De Juan Sanjuan, Cristina; Guzman, Efrain; Ellis, Shirley A

    2008-08-01

    Major histocompatibility complex (MHC) class I chain-related (MIC) genes have been previously identified and characterised in humans. They encode polymorphic class I-like molecules that are stress-inducible, and constitute one of the ligands of the activating natural killer cell receptor NKG2D. We have identified three MIC genes within the cattle genome, located close to three non-classical MHC class I genes. The genomic position relative to other genes is very similar to the arrangement reported in the pig MHC region. Analysis of MIC cDNA sequences derived from a range of cattle cell lines suggests there may be four MIC genes in total. We have investigated the presence of the genes in distinct and well-defined MHC haplotypes, and show that one gene is consistently present, while the configuration of the other three genes appears variable.

  18. A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives

    Directory of Open Access Journals (Sweden)

    Price Nicholas

    2009-09-01

    Full Text Available Abstract The Israeli acute paralysis virus (IAPV) is a honeybee-infecting virus that was found to be associated with colony collapse disorder. The IAPV genome contains two genes encoding a structural and a nonstructural polyprotein. We applied a recently developed method for the estimation of selection in overlapping genes to detect purifying selection and, hence, functionality. We provide evolutionary evidence for the existence of a functional overlapping gene, which is translated in the +1 reading frame of the structural polyprotein gene. Conserved orthologs of this putative gene, which we provisionally call pog (predicted overlapping gene), were also found in the genomes of a monophyletic clade of dicistroviruses that includes IAPV, acute bee paralysis virus, Kashmir bee virus, and Solenopsis invicta (red imported fire ant) virus 1.

  19. Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

    Directory of Open Access Journals (Sweden)

    Yubo Hou

    Full Text Available The ability to predict gene content is highly desirable for characterization of not-yet-sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') versus log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes a linear model, Y' = 0.045 + 0.977X', both with high significance (p0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3 × 10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245 × 10^6 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.
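
    The two regression models quoted in the abstract above can be evaluated directly, as in the following minimal sketch. The function names and the example genome sizes are illustrative assumptions. Because the abstract gives rounded coefficients, the outputs may not reproduce the projected gene counts quoted above exactly.

```python
from math import log, log10


def eukaryote_protein_coding_genes(genome_size_kbp):
    """Logarithmic model: Y' = ln(-46.200 + 22.678 X'), X' = log10(size in kbp)."""
    x = log10(genome_size_kbp)
    inner = -46.200 + 22.678 * x
    if inner <= 0:
        # The natural log is undefined here; the model only applies to larger genomes.
        raise ValueError("genome size is below the range where the model is defined")
    y = log(inner)   # Y' is the log10-transformed gene number
    return 10 ** y   # back-transform to a gene count


def non_eukaryote_protein_coding_genes(genome_size_kbp):
    """Linear model: Y' = 0.045 + 0.977 X', X' = log10(size in kbp)."""
    x = log10(genome_size_kbp)
    return 10 ** (0.045 + 0.977 * x)


# Example inputs: the two dinoflagellate genome sizes named in the abstract,
# plus a hypothetical 4,600 kbp prokaryote-sized genome.
small_dino = eukaryote_protein_coding_genes(3e6)
large_dino = eukaryote_protein_coding_genes(245e6)
prokaryote_like = non_eukaryote_protein_coding_genes(4600)
```

    Note that both models operate on log10-transformed values, so the predicted Y' has to be back-transformed with a power of ten before it can be read as a gene count.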

  20. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    Science.gov (United States)

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed for the pan-genome and core genome by identifying the composition and function of the genomes. The results showed that the total of 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp.
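
    The three-way split of gene clusters described in the abstract above (core, dispensable, strain-specific) can be sketched as follows. The abstract does not state the thresholds used, so this sketch assumes the common convention that a core cluster is present in every genome, a strain-specific cluster in exactly one, and a dispensable cluster in anything in between. The function name `partition_clusters` and the toy input are hypothetical.

```python
def partition_clusters(cluster_to_genomes, n_genomes):
    """Split gene clusters by how many genomes contain them.

    cluster_to_genomes: dict mapping cluster id -> iterable of genome ids
    n_genomes: total number of genomes compared
    Assumed convention: all genomes = core, exactly one = strain-specific,
    anything in between = dispensable.
    """
    partition = {"core": [], "dispensable": [], "strain_specific": []}
    for cluster, genomes in cluster_to_genomes.items():
        count = len(set(genomes))  # ignore paralogs within the same genome
        if count == n_genomes:
            partition["core"].append(cluster)
        elif count == 1:
            partition["strain_specific"].append(cluster)
        else:
            partition["dispensable"].append(cluster)
    return partition


# Toy input with three genomes, invented for illustration.
toy_clusters = {
    "c1": ["g1", "g2", "g3"],
    "c2": ["g1", "g1", "g2"],  # two copies in g1, present in two genomes
    "c3": ["g3"],
    "c4": ["g2", "g3", "g1"],
}
toy_partition = partition_clusters(toy_clusters, n_genomes=3)
```

    As a consistency check on the abstract, the three reported cluster counts add up to the reported total: 1710 + 1182 + 2477 = 5369.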

  1. Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.

    Science.gov (United States)

    Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V

    2017-09-30

    Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.

  2. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives

  3. Identification of conserved gene clusters in multiple genomes based on synteny and homology

    Directory of Open Access Journals (Sweden)

    Nikolski Macha

    2011-10-01

    Full Text Available Abstract Background: Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. Results: Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemiascomycete phylum. Conclusions: The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. Resulting orthologous clusters are comparable to those obtained by manual curation.
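
    The general idea of refining a gene family by comparing the neighborhoods of its members inside a sliding window can be illustrated with the toy sketch below. It is not the SYNS algorithm: the window size, the shared-neighbor threshold, the single-linkage grouping, and the names `neighborhood` and `split_family` are all assumptions chosen to keep the example short.

```python
def neighborhood(genome, index, window):
    """Families of the genes within `window` positions of genome[index], excluding it."""
    lo = max(0, index - window)
    hi = min(len(genome), index + window + 1)
    return {genome[i] for i in range(lo, hi) if i != index}


def split_family(genomes, family, window=2, min_shared=2):
    """Group members of `family` whose neighborhoods share >= min_shared families.

    genomes: dict mapping genome name -> ordered list of family labels, one per gene
    Returns a list of groups; each group is a sorted list of (genome, position).
    Grouping is single-linkage: a member joins every group it matches, merging them.
    """
    members = [(name, i) for name, genes in genomes.items()
               for i, fam in enumerate(genes) if fam == family]
    hoods = {m: neighborhood(genomes[m[0]], m[1], window) for m in members}
    groups = []
    for m in members:
        merged, rest = [m], []
        for group in groups:
            if any(len(hoods[m] & hoods[other]) >= min_shared for other in group):
                merged.extend(group)
            else:
                rest.append(group)
        groups = rest + [merged]
    return [sorted(group) for group in groups]


# Toy genomes, invented for illustration: family "X" has two copies in genome A.
toy_genomes = {
    "A": ["f1", "f2", "X", "f3", "f4", "f9", "X", "f8"],
    "B": ["f1", "f2", "X", "f3", "f5"],
    "C": ["f7", "f9", "X", "f8", "f6"],
}
sub_families = split_family(toy_genomes, "X")
```

    In this toy input the two paralogous copies of "X" in genome A end up in different groups, each paired with the copy from the genome that shares its flanking genes, which is the kind of outcome neighborhood-based refinement aims for.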

  4. Evolution of genes and genomes on the Drosophila phylogeny.

    Science.gov (United States)

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R; Bergman, Casey M; Oliver, Brian; Markow, Therese A; Kaufman, Thomas C; Kellis, Manolis; Gelbart, William; Iyer, Venky N; Pollard, Daniel A; Sackton, Timothy B; Larracuente, Amanda M; Singh, Nadia D; Abad, Jose P; Abt, Dawn N; Adryan, Boris; Aguade, Montserrat; Akashi, Hiroshi; Anderson, Wyatt W; Aquadro, Charles F; Ardell, David H; Arguello, Roman; Artieri, Carlo G; Barbash, Daniel A; Barker, Daniel; Barsanti, Paolo; Batterham, Phil; Batzoglou, Serafim; Begun, Dave; Bhutkar, Arjun; Blanco, Enrico; Bosak, Stephanie A; Bradley, Robert K; Brand, Adrianne D; Brent, Michael R; Brooks, Angela N; Brown, Randall H; Butlin, Roger K; Caggese, Corrado; Calvi, Brian R; Bernardo de Carvalho, A; Caspi, Anat; Castrezana, Sergio; Celniker, Susan E; Chang, Jean L; Chapple, Charles; Chatterji, Sourav; Chinwalla, Asif; Civetta, Alberto; Clifton, Sandra W; Comeron, Josep M; Costello, James C; Coyne, Jerry A; Daub, Jennifer; David, Robert G; Delcher, Arthur L; Delehaunty, Kim; Do, Chuong B; Ebling, Heather; Edwards, Kevin; Eickbush, Thomas; Evans, Jay D; Filipski, Alan; Findeiss, Sven; Freyhult, Eva; Fulton, Lucinda; Fulton, Robert; Garcia, Ana C L; Gardiner, Anastasia; Garfield, David A; Garvin, Barry E; Gibson, Greg; Gilbert, Don; Gnerre, Sante; Godfrey, Jennifer; Good, Robert; Gotea, Valer; Gravely, Brenton; Greenberg, Anthony J; Griffiths-Jones, Sam; Gross, Samuel; Guigo, Roderic; Gustafson, Erik A; Haerty, Wilfried; Hahn, Matthew W; Halligan, Daniel L; Halpern, Aaron L; Halter, Gillian M; Han, Mira V; Heger, Andreas; Hillier, LaDeana; Hinrichs, Angie S; Holmes, Ian; Hoskins, Roger A; Hubisz, Melissa J; Hultmark, Dan; Huntley, Melanie A; Jaffe, David B; Jagadeeshan, Santosh; Jeck, William R; Johnson, Justin; Jones, Corbin D; Jordan, William C; Karpen, Gary H; Kataoka, Eiko; Keightley, Peter D; Kheradpour, Pouya; Kirkness, Ewen F; Koerich, Leonardo B; Kristiansen, Karsten; Kudrna, Dave; Kulathinal, Rob J; Kumar, Sudhir; Kwok, 
Roberta; Lander, Eric; Langley, Charles H; Lapoint, Richard; Lazzaro, Brian P; Lee, So-Jeong; Levesque, Lisa; Li, Ruiqiang; Lin, Chiao-Feng; Lin, Michael F; Lindblad-Toh, Kerstin; Llopart, Ana; Long, Manyuan; Low, Lloyd; Lozovsky, Elena; Lu, Jian; Luo, Meizhong; Machado, Carlos A; Makalowski, Wojciech; Marzo, Mar; Matsuda, Muneo; Matzkin, Luciano; McAllister, Bryant; McBride, Carolyn S; McKernan, Brendan; McKernan, Kevin; Mendez-Lago, Maria; Minx, Patrick; Mollenhauer, Michael U; Montooth, Kristi; Mount, Stephen M; Mu, Xu; Myers, Eugene; Negre, Barbara; Newfeld, Stuart; Nielsen, Rasmus; Noor, Mohamed A F; O'Grady, Patrick; Pachter, Lior; Papaceit, Montserrat; Parisi, Matthew J; Parisi, Michael; Parts, Leopold; Pedersen, Jakob S; Pesole, Graziano; Phillippy, Adam M; Ponting, Chris P; Pop, Mihai; Porcelli, Damiano; Powell, Jeffrey R; Prohaska, Sonja; Pruitt, Kim; Puig, Marta; Quesneville, Hadi; Ram, Kristipati Ravi; Rand, David; Rasmussen, Matthew D; Reed, Laura K; Reenan, Robert; Reily, Amy; Remington, Karin A; Rieger, Tania T; Ritchie, Michael G; Robin, Charles; Rogers, Yu-Hui; Rohde, Claudia; Rozas, Julio; Rubenfield, Marc J; Ruiz, Alfredo; Russo, Susan; Salzberg, Steven L; Sanchez-Gracia, Alejandro; Saranga, David J; Sato, Hajime; Schaeffer, Stephen W; Schatz, Michael C; Schlenke, Todd; Schwartz, Russell; Segarra, Carmen; Singh, Rama S; Sirot, Laura; Sirota, Marina; Sisneros, Nicholas B; Smith, Chris D; Smith, Temple F; Spieth, John; Stage, Deborah E; Stark, Alexander; Stephan, Wolfgang; Strausberg, Robert L; Strempel, Sebastian; Sturgill, David; Sutton, Granger; Sutton, Granger G; Tao, Wei; Teichmann, Sarah; Tobari, Yoshiko N; Tomimura, Yoshihiko; Tsolas, Jason M; Valente, Vera L S; Venter, Eli; Venter, J Craig; Vicario, Saverio; Vieira, Filipe G; Vilella, Albert J; Villasante, Alfredo; Walenz, Brian; Wang, Jun; Wasserman, Marvin; Watts, Thomas; Wilson, Derek; Wilson, Richard K; Wing, Rod A; Wolfner, Mariana F; Wong, Alex; Wong, Gane Ka-Shu; Wu, Chung-I; Wu, 
Gabriel; Yamamoto, Daisuke; Yang, Hsiao-Pei; Yang, Shiaw-Pyng; Yorke, James A; Yoshida, Kiyohito; Zdobnov, Evgeny; Zhang, Peili; Zhang, Yu; Zimin, Aleksey V; Baldwin, Jennifer; Abdouelleil, Amr; Abdulkadir, Jamal; Abebe, Adal; Abera, Brikti; Abreu, Justin; Acer, St Christophe; Aftuck, Lynne; Alexander, Allen; An, Peter; Anderson, Erica; Anderson, Scott; Arachi, Harindra; Azer, Marc; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Berlin, Aaron; Bessette, Daniel; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Bourzgui, Imane; Brown, Adam; Cahill, Patrick; Channer, Sheridon; Cheshatsang, Yama; Chuda, Lisa; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Costello, Maura; D'Aco, Katie; Daza, Riza; De Haan, Georgius; DeGray, Stuart; DeMaso, Christina; Dhargay, Norbu; Dooley, Kimberly; Dooley, Erin; Doricent, Missole; Dorje, Passang; Dorjee, Kunsang; Dupes, Alan; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Fisher, Sheila; Foley, Chelsea D; Franke, Alicia; Friedrich, Dennis; Gadbois, Loryn; Gearin, Gary; Gearin, Christina R; Giannoukos, Georgia; Goode, Tina; Graham, Joseph; Grandbois, Edward; Grewal, Sharleen; Gyaltsen, Kunsang; Hafez, Nabil; Hagos, Birhane; Hall, Jennifer; Henson, Charlotte; Hollinger, Andrew; Honan, Tracey; Huard, Monika D; Hughes, Leanne; Hurhula, Brian; Husby, M Erii; Kamat, Asha; Kanga, Ben; Kashin, Seva; Khazanovich, Dmitry; Kisner, Peter; Lance, Krista; Lara, Marcia; Lee, William; Lennon, Niall; Letendre, Frances; LeVine, Rosie; Lipovsky, Alex; Liu, Xiaohong; Liu, Jinlei; Liu, Shangtao; Lokyitsang, Tashi; Lokyitsang, Yeshi; Lubonja, Rakela; Lui, Annie; MacDonald, Pen; Magnisalis, Vasilia; Maru, Kebede; Matthews, Charles; McCusker, William; McDonough, Susan; Mehta, Teena; Meldrim, James; Meneus, Louis; Mihai, Oana; Mihalev, Atanas; Mihova, Tanya; Mittelman, Rachel; Mlenga, Valentine; Montmayeur, Anna; Mulrain, Leonidas; Navidi, Adam; Naylor, Jerome; Negash, Tamrat; Nguyen, 
Thu; Nguyen, Nga; Nicol, Robert; Norbu, Choe; Norbu, Nyima; Novod, Nathaniel; O'Neill, Barry; Osman, Sahal; Markiewicz, Eva; Oyono, Otero L; Patti, Christopher; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Raghuraman, Sujaa; Rege, Filip; Reyes, Rebecca; Rise, Cecil; Rogov, Peter; Ross, Keenan; Ryan, Elizabeth; Settipalli, Sampath; Shea, Terry; Sherpa, Ngawang; Shi, Lu; Shih, Diana; Sparrow, Todd; Spaulding, Jessica; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Strader, Christopher; Tesfaye, Senait; Thomson, Talene; Thoulutsang, Yama; Thoulutsang, Dawa; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Young, Geneva; Yu, Qing; Zembek, Lisa; Zhong, Danni; Zimmer, Andrew; Zwirko, Zac; Jaffe, David B; Alvarez, Pablo; Brockman, Will; Butler, Jonathan; Chin, CheeWhye; Gnerre, Sante; Grabherr, Manfred; Kleber, Michael; Mauceli, Evan; MacCallum, Iain

    2007-11-08

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

  5. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
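
    As a rough illustration of selecting genes annotated to at least two distinct biological processes, the sketch below treats two terms as distinct when neither is an ancestor of the other in the ontology. This criterion is an assumption made for the example and may well differ from the authors' procedure; the toy ontology, gene names and function names are hypothetical.

```python
def ancestors(term, parents):
    """Collect all ancestors of `term` in a toy ontology given as child -> list of parents."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_multifunctional(terms, parents):
    """True if at least two annotated terms are unrelated,
    i.e. neither term is an ancestor of the other."""
    ordered = sorted(terms)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a not in ancestors(b, parents) and b not in ancestors(a, parents):
                return True
    return False


# Hypothetical mini-ontology (child -> parents) and gene annotations.
parents = {
    "dna_repair": ["dna_metabolism"],
    "dna_metabolism": ["biological_process"],
    "glycolysis": ["carbohydrate_metabolism"],
    "carbohydrate_metabolism": ["biological_process"],
}
annotations = {
    "geneA": {"dna_repair", "dna_metabolism"},  # one process, annotated at two depths
    "geneB": {"dna_repair", "glycolysis"},      # two unrelated processes
    "geneC": {"glycolysis"},
}

multifunctional = sorted(
    gene for gene, terms in annotations.items() if is_multifunctional(terms, parents)
)
print(multifunctional)
```

    Checking ancestry keeps a gene annotated with both a term and its parent from being counted as multifunctional.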

  6. Pinpointing disease genes through phenomic and genomic data fusion.

    Science.gov (United States)

    Jiang, Rui; Wu, Mengmeng; Li, Lianshuo

    2015-01-01

    Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology.

  7. The cavefish genome reveals candidate genes for eye loss

    Science.gov (United States)

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as analogous human diseases, including retinal dysfunction. PMID:25329095

  8. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss.

    Science.gov (United States)

    den Bakker, Henk C; Cummings, Craig A; Ferreira, Vania; Vatta, Paolo; Orsi, Renato H; Degoricija, Lovorka; Barker, Melissa; Petrauskene, Olga; Furtado, Manohar R; Wiedmann, Martin

    2010-12-02

    The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus Listeria thus provides an example of a group of

  9. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    Directory of Open Access Journals (Sweden)

    Barker Melissa

    2010-12-01

    Full Text Available Abstract Background The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. Results To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Conclusions Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus

  10. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  11. Genome-wide gene expression analysis of anguillid herpesvirus 1

    NARCIS (Netherlands)

    Beurden, van S.J.; Peeters, B.P.H.; Rottier, P.J.M.; Davison, A.A.; Engelsma, M.Y.

    2013-01-01

    Background Whereas temporal gene expression in mammalian herpesviruses has been studied extensively, little is known about gene expression in fish herpesviruses. Here we report a genome-wide transcription analysis of a fish herpesvirus, anguillid herpesvirus 1, in cell culture, studied during the

  12. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    Josephine Erhiakporeh

    2016-07-06

    Jul 6, 2016 ... identification of a set of 75 candidate genes (42, 22 and 11 from Arabidopsis, potato and tomato, ... understanding on the genetic basis of drought tolerance by using the .... Comparative genomics and genes expression assay ... Primer code ... physiological and molecular responses to drought stress.

  13. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is tolerant of sequence errors, maintaining its capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  14. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    Energy Technology Data Exchange (ETDEWEB)

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxa. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  15. Building phylogenetic trees by using gene Nucleotide Genomic Signals.

    Science.gov (United States)

    Cristea, Paul Dan

    2012-01-01

    Nucleotide genomic signal (NuGS) methodology allows a molecular level approach to determine distances between homologous genes or between conserved equivalent non-coding genome regions in various species or individuals of the same species. Therefore, distances between the genes of species or individuals can be computed and phylogenetic trees can be built. The paper illustrates the use of the nucleotide imbalance (N) and nucleotide pair imbalance (P) signals to determine the distances between the genes of several Hominidae. The results are in accordance with those of other genetic or phylogenetic approaches to establish distances between Hominidae species.

  16. Learning directed acyclic graphical structures with genetical genomics data.

    Science.gov (United States)

    Gao, Bin; Cui, Yuehua

    2015-12-15

    A large amount of research effort has been focused on estimating gene networks based on gene expression data to understand the functional basis of a living organism. Such networks are often obtained by considering pairwise correlations between genes, and thus may not reflect the true connectivity between genes. By treating gene expressions as quantitative traits while considering genetic markers, genetical genomics analysis has shown its power in enhancing the understanding of gene regulations. Previous works have shown the improved performance on estimating the undirected network graphical structure by incorporating genetic markers as covariates. Knowing that gene expressions are often due to directed regulations, it is more meaningful to estimate the directed graphical network. In this article, we introduce a covariate-adjusted Gaussian graphical model to estimate the Markov equivalence class of the directed acyclic graphs (DAGs) in a genetical genomics analysis framework. We develop a two-stage estimation procedure to first estimate the regression coefficient matrix by [Formula: see text] penalization. The estimated coefficient matrix is then used to estimate the mean values in our multi-response Gaussian model to estimate the regulatory networks of gene expressions using PC-algorithm. The estimation consistency for high dimensional sparse DAGs is established. Simulations are conducted to demonstrate our theoretical results. The method is applied to a human Alzheimer's disease dataset in which differential DAGs are identified between cases and controls. R code for implementing the method can be downloaded at http://www.stt.msu.edu/~cui. R code for implementing the method is freely available at http://www.stt.msu.edu/~cui/software.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. The evolution of chloroplast genome structure in ferns.

    Science.gov (United States)

    Wolf, Paul G; Roper, Jessie M; Duffy, Aaron M

    2010-09-01

    The plastid genome (plastome) is a rich source of phylogenetic and other comparative data in plants. Most land plants possess a plastome of similar structure. However, in a major group of plants, the ferns, a unique plastome structure has evolved. The gene order in ferns has been explained by a series of genomic inversions relative to the plastome organization of seed plants. Here, we examine for the first time the structure of the plastome across fern phylogeny. We used a PCR-based strategy to map and partially sequence plastomes. We found that a pair of partially overlapping inversions in the region of the inverted repeat occurred in the common ancestor of most ferns. However, the ancestral (seed plant) structure is still found in early diverging branches leading to the osmundoid and filmy fern lineages. We found that a second pair of overlapping inversions occurred on a branch leading to the core leptosporangiates. We also found that the unique placement of the gene matK in ferns (lacking a flanking intron) is not a result of a large-scale inversion, as previously thought. This is because the intron loss maps to an earlier point on the phylogeny than the nearby inversion. We speculate on why inversions may occur in pairs and what this may mean for the dynamics of plastome evolution.
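
    To make the notion of a pair of partially overlapping inversions concrete, the sketch below applies two inversions to a signed gene order. An inversion reverses the order of a segment and flips the strand of every gene in it. The five gene labels and the segment coordinates are hypothetical and do not correspond to the fern plastome inversions described above.

```python
def invert(order, start, end):
    """Reverse the segment order[start:end] and flip the strand of each gene in it."""
    segment = [(gene, -strand) for gene, strand in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


# Hypothetical ancestral gene order; strand is +1 or -1.
ancestral = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1)]

# First inversion covers positions 0-2; the second covers positions 2-4,
# so the two inverted segments overlap at position 2.
step1 = invert(ancestral, 0, 3)
step2 = invert(step1, 2, 5)

print(step1)
print(step2)
```

    Gene `a` lies in both segments, so it is flipped twice and returns to its original strand at a new position, while the other genes end up on the opposite strand.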

  18. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    Science.gov (United States)

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant.

  19. Genome engineering using a synthetic gene circuit in Bacillus subtilis.

    Science.gov (United States)

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-03-31

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac-chloramphenicol resistant genes (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, the LacIs repressed the Pspac promoter and the cells become chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily under the chloramphenicol absent condition. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications.

  20. Expression of a transferred nuclear gene in a mitochondrial genome

    Directory of Open Access Journals (Sweden)

    Yichun Qiu

    2014-08-01

    Full Text Available Transfer of mitochondrial genes to the nucleus, and subsequent gain of regulatory elements for expression, is an ongoing evolutionary process in plants. Many examples have been characterized, which in some cases have revealed sources of mitochondrial targeting sequences and cis-regulatory elements. In contrast, there have been no reports of a nuclear gene that has undergone intracellular transfer to the mitochondrial genome and become expressed. Here we show that the orf164 gene in the mitochondrial genome of several Brassicaceae species, including Arabidopsis, is derived from the nuclear ARF17 gene that codes for an auxin responsive protein and is present across flowering plants. Orf164 corresponds to a portion of ARF17, and the nucleotide and amino acid sequences are 79% and 81% identical, respectively. Orf164 is transcribed in several organ types of Arabidopsis thaliana, as detected by RT-PCR. In addition, orf164 is transcribed in five other Brassicaceae within the tribes Camelineae, Erysimeae and Cardamineae, but the gene is not present in Brassica or Raphanus. This study shows that nuclear genes can be transferred to the mitochondrial genome and become expressed, providing a new perspective on the movement of genes between the genomes of subcellular compartments.

  1. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    Science.gov (United States)

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups: a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes, and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency in the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

  2. Putative essential and core-essential genes in Mycoplasma genomes

    OpenAIRE

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first “synthetic life”, has been an important species in the emerging field, synthetic biology. However, essential genes, an important concept of synthetic biology, for both M. mycoides and M. capricolum, as well as 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information of biased gene strand distribution, homologous search and codon adaptation index. The al...

  3. Bacterial sigma factors: a historical, structural, and genomic perspective.

    Science.gov (United States)

    Feklístov, Andrey; Sharon, Brian D; Darst, Seth A; Gross, Carol A

    2014-01-01

    Transcription initiation is the crucial focal point of gene expression in prokaryotes. The key players in this process, sigma factors (σs), associate with the catalytic core RNA polymerase to guide it through the essential steps of initiation: promoter recognition and opening, and synthesis of the first few nucleotides of the transcript. Here we recount the key advances in σ biology, from their discovery 45 years ago to the most recent progress in understanding their structure and function at the atomic level. Recent data provide important structural insights into the mechanisms whereby σs initiate promoter opening. We discuss both the housekeeping σs, which govern transcription of the majority of cellular genes, and the alternative σs, which direct RNA polymerase to specialized operons in response to environmental and physiological cues. The review concludes with a genome-scale view of the extracytoplasmic function σs, the most abundant group of alternative σs.

  4. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    Science.gov (United States)

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.

  5. Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER

    Indian Academy of Sciences (India)

    Gautam Aggarwal; Ramakrishna Ramaswamy

    2002-02-01

    We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark, but are not identified by either of the nonconsensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have bench-marked this program as well as GLIMMER using 3 complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene-identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.
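
    The Fourier measure mentioned above can be illustrated with a minimal sketch. It assumes the measure refers to the period-3 signal commonly used to flag protein-coding DNA, that is, spectral power at frequency 1/3 summed over the four base-indicator sequences. The function name and the normalisation by sequence length squared are illustrative choices, not GeneScan's actual implementation or thresholds, which the abstract does not describe.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over A, C, G and T.

    Hypothetical helper for illustration only; the normalisation by n**2
    is an assumption and is not taken from GeneScan.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    # Phase factor for one step at frequency 1/3.
    w = cmath.exp(-2j * cmath.pi / 3)
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence for this base.
        coeff = sum(w ** (k % 3) for k, b in enumerate(seq.upper()) if b == base)
        total += abs(coeff) ** 2
    return total / (n * n)


# A strictly codon-periodic toy sequence versus a period-4 toy sequence.
coding_like = period3_power("ATG" * 30)
non_coding_like = period3_power("ACGT" * 30)
print(coding_like, non_coding_like)
```

    In this toy comparison the codon-periodic sequence gives a much larger value than the period-4 sequence. A real gene finder would evaluate such a measure in sliding windows and calibrate a threshold on annotated genomes.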

  6. Genomic organization and sequence analysis of the vomeronasal receptor V2R genes in mouse genome

    Institute of Scientific and Technical Information of China (English)

    YANG Hui; Zhang YaPing

    2007-01-01

    Two multigene superfamilies, named V1R and V2R, encoding seven-transmembrane-domain G-protein coupled receptors (GPCRs) have been identified as pheromone receptors in mammals. Three V2R gene families have been described in mouse and rat. Here we screened the updated mouse genome sequence database and retrieved 63 putative functional V2R genes, including three newly identified genes that formed an additional family. We described the genomic organization of these genes and also characterized the conservation of mouse V2R protein sequences. The genomic and sequence information described here is useful as part of the evidence for inferring the functional domain of V2Rs and should aid future functional studies.

  7. Impact of chromatin structures on DNA processing for genomic analyses.

    Directory of Open Access Journals (Sweden)

    Leonid Teytelman

    Chromatin has an impact on recombination, repair, replication, and evolution of DNA. Here we report that chromatin structure also affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation (ChIP) experiments. We initially discovered this effect at the Saccharomyces cerevisiae HMR locus, where we found that silenced chromatin was refractory to shearing, relative to euchromatin. Using input samples from ChIP-Seq studies, we detected a similar bias throughout the heterochromatic portions of the yeast genome. We also observed significant chromatin-related effects at telomeres, protein binding sites, and genes, reflected in the variation of input-Seq coverage. Experimental tests of candidate regions showed that chromatin influenced shearing at some loci, and that chromatin could also lead to enriched or depleted DNA levels in prepared samples, independently of shearing effects. Our results suggested that assays relying on immunoprecipitation of chromatin will be biased by intrinsic differences between regions packaged into different chromatin structures - biases which have been largely ignored to date. These results established the pervasiveness of this bias genome-wide, and suggested that this bias can be used to detect differences in chromatin structures across the genome.

  8. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    Science.gov (United States)

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  9. Genome-wide identification and analysis of the MADS-box gene family in sesame.

    Science.gov (United States)

    Wei, Xin; Wang, Linhai; Yu, Jingyin; Zhang, Yanxin; Li, Donghua; Zhang, Xiurong

    2015-09-10

    MADS-box genes encode transcription factors that play crucial roles in plant growth and development. Sesame (Sesamum indicum L.) is an oil crop that contributes to the daily oil and protein requirements of almost half of the world's population; therefore, a genome-wide analysis of the MADS-box gene family is needed. Fifty-seven MADS-box genes were identified from 14 linkage groups of the sesame genome. Analysis of phylogenetic relationships with Arabidopsis thaliana, Utricularia gibba and Solanum lycopersicum MADS-box genes was performed. Sesame MADS-box genes were clustered into four groups: 28 MIKC(c)-type, 5 MIKC(⁎)-type, 14 Mα-type and 10 Mγ-type. Gene structure analysis revealed from 1 to 22 exons in sesame MADS-box genes. The number of exons in type II MADS-box genes greatly exceeded the number in type I genes. Motif distribution analysis of sesame MADS-box genes also indicated that type II MADS-box genes contained more motifs than type I genes. These results suggested that type II sesame MADS-box genes had more complex structures. By analyzing expression profiles of MADS-box genes in seven sesame transcriptomes, we determined that MIKC(c)-type MADS-box genes played significant roles in sesame flower and seed development. Although most MADS-box genes in the same clade showed similar expression features, some gene functions were diversified from the orthologous Arabidopsis genes. This research will contribute to uncovering the role of MADS-box genes in sesame development.

  10. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

    OpenAIRE

    Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E; Bogdanove, Adam J.

    2015-01-01

    Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we ...

  11. Comparative genomic analysis of Drosophila melanogaster and vector mosquito developmental genes.

    Directory of Open Access Journals (Sweden)

    Susanta K Behura

    Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that (1) are components of developmental signaling pathways, (2) regulate fundamental developmental processes, (3) are critical for the development of tissues of vector importance, (4) function in developmental processes known to have diverged within insects, and (5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.

  12. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    Science.gov (United States)

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-08

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

  13. Gene space dynamics during the evolution of Aegilops tauschii, Brachypodium distachyon, Oryza sativa, and Sorghum bicolor genomes.

    Science.gov (United States)

    Massa, A N; Wanjugi, H; Deal, K R; O'Brien, K; You, F M; Maiti, R; Chan, A P; Gu, Y Q; Luo, M C; Anderson, O D; Rabinowicz, P D; Dvorak, J; Devos, K M

    2011-09-01

    Nine different regions totaling 9.7 Mb of the 4.02 Gb Aegilops tauschii genome were sequenced using the Sanger sequencing technology and compared with orthologous Brachypodium distachyon, Oryza sativa (rice), and Sorghum bicolor (sorghum) genomic sequences. The ancestral gene content in these regions was inferred and used to estimate gene deletion and gene duplication rates along each branch of the phylogenetic tree relating the four species. The total gene number in the extant Ae. tauschii genome was estimated to be 36,371. The gene deletion and gene duplication rates and total gene numbers in the four genomes were used to estimate the total gene number in each node of the phylogenetic tree. The common ancestor of the Brachypodieae and Triticeae lineages was estimated to have had 28,558 genes, and the common ancestor of the Panicoideae, Ehrhartoideae, and Pooideae subfamilies was estimated to have had 27,152 or 28,350 genes, depending on the ancestral gene scenario. Relative to the Brachypodieae and Triticeae common ancestor, the gene number was reduced in B. distachyon by 3,026 genes and increased in Ae. tauschii by 7,813 genes. The sum of gene deletion and gene duplication rates, which reflects the rate of gene synteny loss, was correlated with the rate of structural chromosome rearrangements and was highest in the Ae. tauschii lineage and lowest in the rice lineage. The high rate of gene space evolution in the Ae. tauschii lineage accounts for the fact that, contrary to the expectations, the level of synteny between the phylogenetically more related Ae. tauschii and B. distachyon genomes is similar to the level of synteny between the Ae. tauschii genome and the genomes of the less related rice and sorghum. The ratio of gene duplication to gene deletion rates in these four grass species closely parallels both the total number of genes in a species and the overall genome size. Because the overall genome size is to a large extent a function of the repeated
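
    The gene-number estimates quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the abstract; the variable names are illustrative, and the B. distachyon total is derived here rather than quoted from the source.

```python
# Figures taken from the abstract above.
ancestor_genes = 28_558        # Brachypodieae/Triticeae common ancestor
lost_in_brachypodium = 3_026   # net reduction in B. distachyon
gained_in_aegilops = 7_813     # net increase in Ae. tauschii

# Derived totals (the B. distachyon value is not stated in the abstract).
brachypodium_genes = ancestor_genes - lost_in_brachypodium
aegilops_genes = ancestor_genes + gained_in_aegilops

print("B. distachyon:", brachypodium_genes)
print("Ae. tauschii:", aegilops_genes)
```

    The derived Ae. tauschii total agrees with the 36,371 genes estimated for the extant genome earlier in the abstract.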

  14. Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual terminal and core structures

    DEFF Research Database (Denmark)

    Häring, Monika; Vestergaard, Gisle Alberg; Brügger, Kim;

    2005-01-01

    A novel filamentous virus, AFV2, from the hyperthermophilic archaeal genus Acidianus shows structural similarity to lipothrixviruses but differs from them in its unusual terminal and core structures. The double-stranded DNA genome contains 31,787 bp and carries eight open reading frames homologous to those of other lipothrixviruses, a single tRNA(Lys) gene containing a 12-bp archaeal intron, and a 1,008-bp repeat-rich region near the center of the genome.

  15. The nuclear matrix: a structural milieu for genomic function.

    Science.gov (United States)

    Berezney, R; Mortillaro, M J; Ma, H; Wei, X; Samarabandu, J

    1995-01-01

    While significant progress has been made in elucidating molecular properties of specific genes and their regulation, our understanding of how the whole genome is coordinated has lagged behind. To understand how the genome functions as a coordinated whole, we must understand how the nucleus is put together and functions as a whole. An important step in that direction occurred with the isolation and characterization of the nuclear matrix. Aside from the plethora of functional properties associated with these isolated nuclear structures, they have enabled the first direct examination and molecular cloning of specific nuclear matrix proteins. The isolated nuclear matrix can be used for providing an in vitro model for understanding nuclear matrix organization in whole cells. Recent development of high-resolution and three-dimensional approaches for visualizing domains of genomic organization and function in situ has provided corroborative evidence for the nuclear matrix as the site of organization for replication, transcription, and post-transcriptional processing. As more is learned about these in situ functional sites, appropriate experiments could be designed to test molecular mechanisms with the in vitro nuclear matrix systems. This is illustrated in this chapter by the studies of nuclear matrix-associated DNA replication which have evolved from biochemical studies of in vitro nuclear matrix systems toward three-dimensional computer image analysis of replication sites for individual genes.

  16. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments

    Energy Technology Data Exchange (ETDEWEB)

    Cotmore, S.F.; McKie, V.C.; Anderson, L.J.; Astell, C.R.; Tattersall, P.

    1986-11-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000, which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  17. Construction of gene targeting vectors from lambda KOS genomic libraries.

    Science.gov (United States)

    Wattler, S; Kelly, M; Nehls, M

    1999-06-01

    We describe a highly redundant murine genomic library in a new lambda phage, lambda knockout shuttle (lambda KOS) that facilitates the very rapid construction of replacement-type gene targeting vectors. The library consists of 94 individually amplified subpools, each containing an average of 40,000 independent genomic clones. The subpools are arrayed into a 96-well format that allows a PCR-based efficient recovery of independent genomic clones. The lambda KOS vector backbone permits the CRE-mediated conversion into high-copy number pKOS plasmids, wherein the genomic inserts are automatically flanked by negative-selection cassettes. The lambda KOS vector system exploits the yeast homologous recombination machinery to simplify the construction of replacement-type gene targeting vectors independent of restriction sites within the genomic insert. We outline procedures that allow the generation of simple and more sophisticated conditional gene targeting vectors within 3-4 weeks, beginning with the screening of the lambda KOS genomic library.

  18. Gene duplication in the genome of parasitic Giardia lamblia

    Directory of Open Access Journals (Sweden)

    Flores Roberto

    2010-02-01

    Background: Giardia are a group of widespread intestinal protozoan parasites in a number of vertebrates. Much evidence from G. lamblia indicated they might be the most primitive extant eukaryotes. When and how such a group of the earliest branching unicellular eukaryotes developed the ability to successfully parasitize the latest branching higher eukaryotes (vertebrates) is an intriguing question. Gene duplication has long been thought to be the most common mechanism in the production of primary resources for the origin of evolutionary novelties. In order to parse the evolutionary trajectory of the Giardia parasitic lifestyle, here we carried out a genome-wide analysis of gene duplication patterns in G. lamblia. Results: Although genomic comparison showed that in G. lamblia the contents of many fundamental biological pathways are simplified and the whole genome is very compact, in our study 40% of its genes were identified as duplicated genes. Evolutionary distance analyses of these duplicated genes indicated that two rounds of large-scale duplication events had occurred in the G. lamblia genome. Functional annotation further showed that the majority of recently duplicated genes are VSPs (Variant-specific Surface Proteins), which are essential for the successful parasitic life of Giardia in hosts. Based on evolutionary comparison with their hosts, it was found that the rapid expansion of VSPs in G. lamblia is consistent with the evolutionary radiation of placental mammals. Conclusions: Based on the genome-wide analysis of duplicated genes in G. lamblia, we found that gene duplication was essential for the origin and evolution of the Giardia parasitic lifestyle. The recent expansion of VSPs uniquely occurring in G. lamblia is consistent with the increment of its hosts. Therefore we proposed a hypothesis that the increment of Giardia hosts might be the driving force for the rapid expansion of VSPs.

  19. Regional genomic instability predisposes to complex dystrophin gene rearrangements.

    Science.gov (United States)

    Oshima, Junko; Magner, Daniel B; Lee, Jennifer A; Breman, Amy M; Schmitt, Eric S; White, Lisa D; Crowe, Carol A; Merrill, Michelle; Jayakar, Parul; Rajadhyaksha, Aparna; Eng, Christine M; del Gaudio, Daniela

    2009-09-01

    Mutations in the dystrophin gene (DMD) cause Duchenne and Becker muscular dystrophies and the majority of cases are due to DMD gene rearrangements. Despite the high incidence of these aberrations, little is known about their causative molecular mechanism(s). We examined 792 DMD/BMD clinical samples by oligonucleotide array-CGH and report on the junction sequence analysis of 15 unique deletion cases and three complex intragenic rearrangements to elucidate potential underlying mechanism(s). Furthermore, we present three cases with intergenic rearrangements involving DMD and neighboring loci. The cases with intragenic rearrangements include an inversion with flanking deleted sequences; a duplicated segment inserted in direct orientation into a deleted region; and a splicing mutation adjacent to a deletion. Bioinformatic analysis demonstrated that 7 of 12 breakpoints combined among 3 complex cases aligned with repetitive sequences, as compared to 4 of 30 breakpoints for the 15 deletion cases. Moreover, the inversion/deletion case may involve a stem-loop structure that has contributed to the initiation of this rearrangement. For the duplication/deletion and splicing mutation/deletion cases, the presence of the first mutation, either a duplication or point mutation, may have elicited the deletion events in an attempt to correct preexisting mutations. While NHEJ is one potential mechanism for these complex rearrangements, the highly complex junction sequence of the inversion/deletion case suggests the involvement of a replication-based mechanism. Our results support the notion that regional genomic instability, aided by the presence of repetitive elements, a stem-loop structure, and possibly preexisting mutations, may elicit complex rearrangements of the DMD gene.
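
    The breakpoint comparison reported above is easier to read as proportions. The sketch below uses only the counts given in the abstract (7 of 12 and 4 of 30); the variable names are illustrative, and no significance test is implied beyond what the authors report.

```python
# Breakpoints aligning with repetitive sequences, as counted in the abstract.
complex_hits, complex_total = 7, 12     # three complex rearrangement cases
deletion_hits, deletion_total = 4, 30   # fifteen simple deletion cases

complex_fraction = complex_hits / complex_total
deletion_fraction = deletion_hits / deletion_total

print(f"complex cases:  {complex_fraction:.1%}")
print(f"deletion cases: {deletion_fraction:.1%}")
```

    Roughly 58% of breakpoints in the complex cases fall in repetitive sequence, against about 13% in the simple deletions.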

  20. Plant DNA barcoding: from gene to genome.

    Science.gov (United States)

    Li, Xiwen; Yang, Yang; Henry, Robert J; Rossetto, Maurizio; Wang, Yitao; Chen, Shilin

    2015-02-01

    DNA barcoding is currently a widely used and effective tool that enables rapid and accurate identification of plant species; however, none of the available loci work across all species. Because single-locus DNA barcodes lack adequate variations in closely related taxa, recent barcoding studies have placed high emphasis on the use of whole-chloroplast genome sequences which are now more readily available as a consequence of improving sequencing technologies. While chloroplast genome sequencing can already deliver a reliable barcode for accurate plant identification it is not yet resource-effective and does not yet offer the speed of analysis provided by single-locus barcodes to unspecialized laboratory facilities. Here, we review the development of candidate barcodes and discuss the feasibility of using the chloroplast genome as a super-barcode. We advocate a new approach for DNA barcoding that, for selected groups of taxa, combines the best use of single-locus barcodes and super-barcodes for efficient plant identification. Specific barcodes might enhance our ability to distinguish closely related plants at the species and population levels.

  1. Identification of neural outgrowth genes using genome-wide RNAi.

    Directory of Open Access Journals (Sweden)

    Katharine J Sepp

    2008-07-01

    While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that, when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new

  2. Mapping and annotating obesity-related genes in pig and human genomes.

    Science.gov (United States)

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are then necessary to further dissect the genetics of obesity. Pig has emerged as one of the most attractive models, because of the similarity with humans in the mechanisms regulating the fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on STRING protein interaction network. Conclusions. The collected set consists of 361 obesity related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only for 3 human genes there is no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity related genes are mostly involved in regulation and signaling processes/pathways and relevant connection emerges between obesity-related genes and diseases such as cancer and infectious diseases.

  3. In vitro analysis of integrated global high-resolution DNA methylation profiling with genomic imbalance and gene expression in osteosarcoma.

    Directory of Open Access Journals (Sweden)

    Bekim Sadikovic

    Genetic and epigenetic changes contribute to deregulation of gene expression and development of human cancer. Changes in DNA methylation are key epigenetic factors regulating gene expression and genomic stability. Recent progress in microarray technologies has resulted in the development of high-resolution platforms for profiling of genetic, epigenetic and gene expression changes. Osteosarcoma (OS) is a pediatric bone tumor with a characteristically high level of numerical and structural chromosomal changes. Furthermore, little is known about DNA methylation changes in OS. Our objective was to develop an integrative approach for analysis of high-resolution epigenomic, genomic, and gene expression profiles in order to identify functional epi/genomic differences between OS cell lines and normal human osteoblasts. A combination of Affymetrix Promoter Tiling Arrays for DNA methylation, Agilent array-CGH platform for genomic imbalance and Affymetrix Gene 1.0 platform for gene expression analysis was used. As a result, an integrative high-resolution approach for interrogation of genome-wide tumour-specific changes in DNA methylation was developed. This approach was used to provide the first genomic DNA methylation maps, and to identify and validate genes with aberrant DNA methylation in OS cell lines. This first integrative analysis of global cancer-related changes in DNA methylation, genomic imbalance, and gene expression has provided comprehensive evidence of the cumulative roles of epigenetic and genetic mechanisms in deregulation of gene expression networks.

  4. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group...
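
    The search for the most frequent 9-10mers described above can be illustrated with a plain k-mer count. This is a minimal sketch, not the authors' procedure: the function names, the toy sequence and the per-kilobase density are illustrative assumptions, and the count looks at one strand only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer on the given strand of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def most_frequent_kmer(seq, k):
    """Return (k-mer, count) for the most frequent k-mer in seq."""
    return kmer_counts(seq, k).most_common(1)[0]


def density_per_kb(seq, motif):
    """Occurrences of motif per 1,000 bases of seq (overlaps allowed)."""
    hits = kmer_counts(seq, len(motif))[motif.upper()]
    return 1000 * hits / len(seq)


# Toy sequence with a planted 7-mer; hypothetical, not a real DUS.
toy = "GATTACAGGGATTACACCGATTACA"
print(most_frequent_kmer(toy, 7))
print(density_per_kb(toy, "GATTACA"))
```

    A real analysis would also count the reverse complement and compare densities between annotated gene groups, which this sketch leaves out.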

  5. The Plasmodium apicoplast genome: conserved structure and close relationship of P. ovale to rodent malaria parasites.

    Science.gov (United States)

    Arisue, Nobuko; Hashimoto, Tetsuo; Mitsui, Hideya; Palacpac, Nirianne M Q; Kaneko, Akira; Kawai, Satoru; Hasegawa, Masami; Tanabe, Kazuyuki; Horii, Toshihiro

    2012-09-01

    Apicoplast, a nonphotosynthetic plastid derived from secondary symbiotic origin, is essential for the survival of malaria parasites of the genus Plasmodium. Elucidation of the evolution of the apicoplast genome in Plasmodium species is important to better understand the functions of the organelle. However, the complete apicoplast genome is available for only the most virulent human malaria parasite, Plasmodium falciparum. Here, we obtained the near-complete apicoplast genome sequences from eight Plasmodium species that infect a wide variety of vertebrate hosts and performed structural and phylogenetic analyses. We found that gene repertoire, gene arrangement, and other structural attributes were highly conserved. Phylogenetic reconstruction using 30 protein-coding genes of the apicoplast genome inferred, for the first time, a close relationship between P. ovale and rodent parasites. This close relatedness was robustly supported using multiple evolutionary assumptions and models. The finding suggests that an ancestral host switch occurred between rodent and human Plasmodium parasites.

  6. A genome-wide 20 K citrus microarray for gene expression analysis.

    Science.gov (United States)

    Martinez-Godoy, M Angeles; Mauri, Nuria; Juarez, Jose; Marques, M Carmen; Santiago, Julia; Forment, Javier; Gadea, Jose

    2008-07-03

    Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with a high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. We have designed and constructed a publicly available genome-wide cDNA microarray that includes 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and results of all the structural and functional annotation of the unigenes, like general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability. This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. Furthermore, we also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. Furthermore, we show how this microarray offers a good representation of the citrus genome and present the usefulness of this genomic tool for global studies in citrus by using it to catalogue genes expressed in

  7. A genome-wide 20 K citrus microarray for gene expression analysis

    Directory of Open Access Journals (Sweden)

    Gadea Jose

    2008-07-01

    Background: Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with a high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. Results: We have designed and constructed a publicly available genome-wide cDNA microarray that includes 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and results of all the structural and functional annotation of the unigenes, like general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability. Conclusion: This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. Furthermore, we also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. Furthermore, we show how this microarray offers a good representation of the citrus genome and present the usefulness of this genomic tool for global

  8. Modeling chromosomes in mouse to explore the function of genes, genomic disorders, and chromosomal organization.

    Directory of Open Access Journals (Sweden)

    Véronique Brault

    2006-07-01

    One of the challenges of genomic research after the completion of the human genome project is to assign a function to all the genes and to understand their interactions and organizations. Among the various techniques, the emergence of chromosome engineering tools with the aim to manipulate large genomic regions in the mouse model offers a powerful way to accelerate the discovery of gene functions and provides more mouse models to study normal and pathological developmental processes associated with aneuploidy. The combination of gene targeting in ES cells, recombinase technology, and other techniques makes it possible to generate new chromosomes carrying specific and defined deletions, duplications, inversions, and translocations that are accelerating functional analysis. This review presents the current status of chromosome engineering techniques and discusses the different applications as well as the implication of these new techniques in future research to better understand the function of chromosomal organization and structures.

  9. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
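
    The contig N50 mentioned above can be computed in a few lines. This is a minimal sketch assuming the usual definition (the length L such that contigs of length L or longer hold at least half of the assembled bases); the function name and the toy lengths are illustrative and are not taken from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths in bp (total 300, so half is 150).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 reaches 150, so this prints 70
```

    Merging short contigs into longer ones raises this value, which is why a doubled N50 is read as an improvement in contiguity.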

  10. Mining Bacterial Genomes for Secondary Metabolite Gene Clusters.

    Science.gov (United States)

    Adamek, Martina; Spohn, Marius; Stegmann, Evi; Ziemert, Nadine

    2017-01-01

    With the emergence of bacterial resistance against frequently used antibiotics, novel antibacterial compounds are urgently needed. Traditional bioactivity-guided drug discovery strategies involve laborious screening efforts and display high rediscovery rates. With the progress in next generation sequencing methods and the knowledge that the majority of antibiotics in clinical use are produced as secondary metabolites by bacteria, mining bacterial genomes for secondary metabolites with antimicrobial activity is a promising approach, which can guide a more time- and cost-effective identification of novel compounds. However, what sounds easy to accomplish comes with several challenges. To date, several tools for the prediction of secondary metabolite gene clusters are available, some of which are based on the detection of signature genes, while others search for specific patterns in gene content or regulation. Apart from the mere identification of gene clusters, several other factors such as determining cluster boundaries and assessing the novelty of the detected cluster are important. For this purpose, comparison of the predicted secondary metabolite genes with different cluster and compound databases is necessary. Furthermore, it is advisable to classify detected clusters into gene cluster families. So far, there is no standardized procedure for genome mining; however, different approaches to overcome all of these challenges exist and are addressed in this chapter. We give practical guidance on the workflow for secondary metabolite gene cluster identification, which includes the determination of gene cluster boundaries, addresses problems occurring with the use of draft genomes, and gives an outlook on the different methods for gene cluster classification. Based on comprehensible examples a protocol is set, which should enable the readers to mine their own genome data for interesting secondary metabolites.

  11. Analyses of the complete genome and gene expression of chloroplast of sweet potato [Ipomoea batata].

    Science.gov (United States)

    Yan, Lang; Lai, Xianjun; Li, Xuedan; Wei, Changhe; Tan, Xuemei; Zhang, Yizheng

    2015-01-01

    Sweet potato [Ipomoea batatas (L.) Lam] ranks among the top seven most important food crops cultivated worldwide and is a hexaploid plant (2n=6x=90) in the Convolvulaceae family with a genome size between 2,200 and 3,000 Mb. The genomic resources for this crop are deficient due to its complicated genetic structure. Here, we report the complete nucleotide sequence of the chloroplast (cp) genome of sweet potato, which is a circular molecule of 161,303 bp in the typical quadripartite structure with large (LSC) and small (SSC) single-copy regions separated by a pair of inverted repeats (IRs). The chloroplast DNA contains a total of 145 genes, including 94 protein-encoding genes of which there are 72 single-copy and 11 double-copy genes. The organization and structure of the chloroplast genome (gene content and order, IR expansion/contraction, random repeating sequences, structural rearrangement) of sweet potato were compared with those of Ipomoea (L.) species and some basal important angiosperms, respectively. Some boundary gene-flow and gene gain-and-loss events were identified at intra- and inter-species levels. In addition, by comparing with the transcriptome sequences of sweet potato, the RNA editing events and differential expressions of the chloroplast functional genes were detected. Moreover, phylogenetic analysis was conducted based on 77 protein-coding genes from 33 taxa and the result may contribute to a better understanding of the evolutionary progress of the genus Ipomoea (L.), including phylogenetic relationships, intraspecific differentiation and interspecific introgression.

  12. [Evolution of gene orders in genomes of cyanobacteria].

    Science.gov (United States)

    Markov, A V; Zakharov, I A

    2009-08-01

    Genomes of 23 strains of cyanobacteria were comparatively analyzed using quantitative methods of estimation of gene order similarity. It has been found that reconstructions of the phylogenesis of cyanobacteria based on the comparison of the orders of genes in chromosomes and on nucleotide sequences appear to be similar. This confirms the applicability of quantitative measures of similarity of gene orders for phylogenetic reconstructions. In the evolution of marine unicellular plankton cyanobacteria, genome rearrangements are fixed at a low rate (about 3% of gene order changes per 1% of 16S rRNA changes), whereas in other groups of cyanobacteria the gene order can change several times more rapidly. The gene orders in genomes of cyanobacteria and chloroplasts preserve a considerable degree of similarity. The closest relatives of chloroplasts among the analyzed cyanobacteria are likely to be strains from hot springs belonging to the genus Synechococcus. Comparative analysis of gene orders and nucleotide sequences strongly suggests that Synechococcus strains from different environments (sea, fresh waters, hot springs) are not related and belong to evolutionarily distant lines.
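
    The abstract does not say which quantitative measure of gene order similarity was used. As one common possibility, the sketch below scores two genomes by the fraction of neighbouring gene pairs they share. The function names and gene labels are hypothetical, each gene is assumed to occur once per genome, and chromosomes are treated as linear with strand ignored.

```python
def adjacencies(gene_order):
    """Unordered pairs of neighbouring genes along a linear gene order."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def adjacency_similarity(order_a, order_b):
    """Fraction of gene adjacencies in order_a that also occur in order_b.

    Only genes present in both genomes are compared, so gene gain and
    loss do not count as rearrangements in this simplified measure.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    if not adj_a:
        return 0.0
    return len(adj_a & adj_b) / len(adj_a)


# Hypothetical gene orders: genome_b carries an inversion of c-d-e.
genome_a = ["a", "b", "c", "d", "e", "f"]
genome_b = ["a", "b", "e", "d", "c", "f"]
print(adjacency_similarity(genome_a, genome_b))  # 3 of 5 adjacencies kept: 0.6
```

    One minus this score gives a rough fraction of gene order changes, which could then be set against sequence divergence in the way the abstract relates gene order change to 16S rRNA change.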

  13. Gene mutations of acute myeloid leukemia in the genome era.

    Science.gov (United States)

    Naoe, Tomoki; Kiyoi, Hitoshi

    2013-02-01

    Ten years ago, gene mutations found in acute myeloid leukemia (AML) were conceptually grouped into class I mutation, which causes constitutive activation of intracellular signals that contribute to growth and survival, and class II mutation, which blocks differentiation and/or enhances self-renewal by altered transcription factors. A cooperative model between the two classes of mutations has been suggested by murine experiments and partly supported by epidemiological findings. In the last 5 years, comprehensive genomic analysis has proceeded to find new gene mutations, which are found in epigenome-associated enzymes and in molecules not previously noticed. These new mutations apparently increase the complexity and heterogeneity of AML. Although a long list of gene mutations might have been compiled, the entire picture of molecular pathogenesis in AML remains to be elucidated because gene rearrangement, gene copy number, DNA methylation and expression profiles are not fully studied in conjunction with gene mutations. Comprehensive genome research will deepen the understanding of AML to promote the development of new classification and treatment. This review focuses on gene mutations that were recently discovered by genome sequencing.

  14. Conservation of ribosomal protein gene ordering in 16 complete genomes

    Institute of Scientific and Technical Information of China (English)

    王宁; 陈润生; 王永雄

    2000-01-01

    The organization of ribosomal proteins in 16 prokaryotic genomes was studied as an example of comparative genome analyses of gene systems. Hypothetical ribosomal protein-containing operons were constructed. These operons also contained putative genes and other non-ribosomal genes. The correspondences among these genes across different organisms were clarified by sequence homology computations. In this way a cross tabulation of 70 ribosomal protein genes was constructed. On average, these were organized into 9-14 operons in each genome. There were also 25 non-ribosomal or putative genes in these mainly ribosomal protein operons. Hence the table contains 95 genes in total. It was found that: (i) the conservation of the block of about 20 r-proteins in the L3 and L4 operons across almost the entire eubacteria and archaebacteria is remarkable; (ii) some operons only belong to eubacteria or archaebacteria; (iii) although the ribosomal protein operons are highly conserved within each domain, there are fine variat

  15. Conservation of ribosomal protein gene ordering in 16 complete genomes

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    The organization of ribosomal proteins in 16 prokaryotic genomes was studied as an example of comparative genome analyses of gene systems. Hypothetical ribosomal protein-containing operons were constructed. These operons also contained putative genes and other non-ribosomal genes. The correspondences among these genes across different organisms were clarified by sequence homology computations. In this way a cross tabulation of 70 ribosomal protein genes was constructed. On average, these were organized into 9-14 operons in each genome. There were also 25 non-ribosomal or putative genes in these mainly ribosomal protein operons. Hence the table contains 95 genes in total. It was found that: (i) the conservation of the block of about 20 r-proteins in the L3 and L4 operons across almost the entire eubacteria and archaebacteria is remarkable; (ii) some operons only belong to eubacteria or archaebacteria; (iii) although the ribosomal protein operons are highly conserved within each domain, there are fine variations in some operons across different organisms within each domain, and these variations are informative on the evolutionary relations among the organisms. This method provides a new potential for studying the origin and evolution of old species.

  16. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    Science.gov (United States)

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).

  17. Ancient signals: comparative genomics of plant MAPK and MAPKK gene families

    DEFF Research Database (Denmark)

    Hamel, Louis-Philippe; Nicole, Marie-Claude; Sritubtim, Somrudee;

    2006-01-01

    MAPK signal transduction modules play crucial roles in regulating many biological processes in plants, and their components are encoded by highly conserved genes. The recent availability of genome sequences for rice and poplar now makes it possible to examine how well the previously described Arabidopsis MAPK and MAPKK gene family structures represent the broader evolutionary situation in plants, and analysis of gene expression data for MPK and MKK genes in all three species allows further refinement of those families, based on functionality. The Arabidopsis MAPK nomenclature appears sufficiently robust to allow it to be usefully extended to other well-characterized plant systems.

  18. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared on nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
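
    The abstract does not spell out its measures. The sketch below assumes the sensitivity and specificity customary in gene-finder benchmarks, where specificity means the fraction of predictions that are correct, and computes them at the nucleotide and exon levels. The function names, the interval convention (1-based, inclusive) and the toy exons are illustrative assumptions.

```python
def coding_positions(exons):
    """Expand (start, end) exon intervals, 1-based inclusive, to a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) over individual coding nucleotides."""
    truth = coding_positions(annotated_exons)
    pred = coding_positions(predicted_exons)
    tp = len(truth & pred)  # coding in both annotation and prediction
    sn = tp / len(truth) if truth else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


def exon_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) counting only exact exon matches."""
    truth, pred = set(annotated_exons), set(predicted_exons)
    exact = len(truth & pred)
    sn = exact / len(truth) if truth else 0.0
    sp = exact / len(pred) if pred else 0.0
    return sn, sp


# Hypothetical gene: the second predicted exon is shifted by 50 bp.
annotated = [(1, 100), (201, 300)]
predicted = [(1, 100), (251, 350)]
print(nucleotide_accuracy(annotated, predicted))  # (0.75, 0.75)
print(exon_accuracy(annotated, predicted))        # (0.5, 0.5)
```

    The toy case shows why several levels are reported: a prediction can recover most coding nucleotides while getting only half of the exon boundaries exactly right.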

  19. Chloroplast Genome Analysis of Resurrection Tertiary Relict Haberlea rhodopensis Highlights Genes Important for Desiccation Stress Response.

    Science.gov (United States)

    Ivanova, Zdravka; Sablok, Gaurav; Daskalova, Evelina; Zahmanova, Gergana; Apostolova, Elena; Yahubyan, Galina; Baev, Vesselin

    2017-01-01

    Haberlea rhodopensis is a paleolithic tertiary relict species, best known as a resurrection plant with remarkable tolerance to desiccation. When exposed to severe drought stress, H. rhodopensis shows an ability to maintain the structural integrity of its photosynthetic apparatus, which re-activates easily upon rehydration. We present here the results from the assembly and annotation of the chloroplast (cp) genome of H. rhodopensis, which was further subjected to comparative analysis with the cp genomes of closely related species. H. rhodopensis showed a cp genome size of 153,099 bp, harboring a pair of inverted repeats (IR) of 25,415 bp separated by small and large single-copy regions (SSC and LSC) of 17,826 and 84,443 bp. The genome structure, gene order, GC content and codon usage are similar to those of the typical angiosperm cp genomes. The genome hosts 137 genes representing 70.66% of the plastome, which includes 86 protein-coding genes, 36 tRNAs, and 4 rRNAs. A comparative plastome analysis with other closely related Lamiales members revealed conserved gene order in the IR and LSC/SSC regions. A phylogenetic analysis based on protein-coding genes from 33 species defines this species as belonging to the Gesneriaceae family. From an evolutionary point of view, a site-specific selection analysis detected positively selected sites in 17 genes, most of which are involved in photosynthesis (e.g., rbcL, ndhF, accD, atpE, etc.). The observed codon substitutions may be interpreted as being a consequence of molecular adaptation to drought stress, which ensures an evolutionary advantage to H. rhodopensis.
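
    The region lengths quoted above can be checked against the reported genome size, since a quadripartite plastome consists of the large single-copy region, the small single-copy region and two copies of the inverted repeat. This is a short sketch of that arithmetic; the function name is illustrative.

```python
def quadripartite_total(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region lengths in bp as quoted in the abstract above.
LSC, SSC, IR = 84_443, 17_826, 25_415

total = quadripartite_total(LSC, SSC, IR)
# 84,443 + 17,826 + 2 * 25,415 = 153,099, the reported genome size.
assert total == 153_099

# Share of the plastome taken up by the two inverted-repeat copies.
ir_share = 2 * IR / total
print(f"total = {total} bp, IR share = {ir_share:.1%}")
```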

  20. Genomic analysis reveals extensive gene duplication within the bovine TRB locus

    Directory of Open Access Journals (Sweden)

    Law Andy

    2009-04-01

    Background: Diverse TR and IG repertoires are generated by V(D)J somatic recombination. Genomic studies have been pivotal in cataloguing the V, D, J and C genes present in the various TR/IG loci and describing how duplication events have expanded the number of these genes. Such studies have also provided insights into the evolution of these loci and the complex mechanisms that regulate TR/IG expression. In this study we analyze the sequence of the third bovine genome assembly to characterize the germline repertoire of bovine TRB genes and compare the organization, evolution and regulatory structure of the bovine TRB locus with that of humans and mice. Results: The TRB locus in the third bovine genome assembly is distributed over 5 scaffolds, extending to ~730 Kb. The available sequence contains 134 TRBV genes, assigned to 24 subgroups, and 3 clusters of DJC genes, each comprising a single TRBD gene, 5–7 TRBJ genes and a single TRBC gene. Seventy-nine of the TRBV genes are predicted to be functional. Comparison with the human and murine TRB loci shows that the gene order, as well as the sequences of non-coding elements that regulate TRB expression, are highly conserved in the bovine. Dot-plot analyses demonstrate that expansion of the genomic TRBV repertoire has occurred via a complex and extensive series of duplications, predominantly involving DNA blocks containing multiple genes. These duplication events have resulted in massive expansion of several TRBV subgroups, most notably TRBV6, 9 and 21, which contain 40, 35 and 16 members respectively. Similarly, duplication has led to the generation of a third DJC cluster. Analysis of cDNA data confirms the diversity of the TRBV genes and, in addition, identifies a substantial number of TRBV genes, predominantly from the larger subgroups, which are still absent from the genome assembly. The observed gene duplication within the bovine TRB locus has created a repertoire of phylogenetically

  1. Bacterial genes in the aphid genome: absence of functional gene transfer from Buchnera to its host.

    Directory of Open Access Journals (Sweden)

    Naruo Nikoh

    2010-02-01

    Full Text Available Genome reduction is typical of obligate symbionts. In cellular organelles, this reduction partly reflects transfer of ancestral bacterial genes to the host genome, but little is known about gene transfer in other obligate symbioses. Aphids harbor anciently acquired obligate mutualists, Buchnera aphidicola (Gammaproteobacteria), which have highly reduced genomes (420-650 kb), raising the possibility of gene transfer from ancestral Buchnera to the aphid genome. In addition, aphids often harbor other bacteria that also are potential sources of transferred genes. Previous limited sampling of genes expressed in bacteriocytes, the specialized cells that harbor Buchnera, revealed that aphids acquired at least two genes from bacteria. The newly sequenced genome of the pea aphid, Acyrthosiphon pisum, presents the first opportunity for a complete inventory of genes transferred from bacteria to the host genome in the context of an ancient obligate symbiosis. Computational screening of the entire A. pisum genome, followed by phylogenetic and experimental analyses, provided strong support for the transfer of 12 genes or gene fragments from bacteria to the aphid genome: three LD-carboxypeptidases (LdcA1, LdcA2, psiLdcA), five rare lipoprotein As (RlpA1-5), N-acetylmuramoyl-L-alanine amidase (AmiD), 1,4-beta-N-acetylmuramidase (bLys), DNA polymerase III alpha chain (psiDnaE), and ATP synthase delta chain (psiAtpH). Buchnera was the apparent source of two highly truncated pseudogenes (psiDnaE and psiAtpH). Most other transferred genes were closely related to genes from relatives of Wolbachia (Alphaproteobacteria). At least eight of the transferred genes (LdcA1, AmiD, RlpA1-5, bLys) appear to be functional, and expression of seven (LdcA1, AmiD, RlpA1-5) is highly upregulated in bacteriocytes. The LdcAs and RlpAs appear to have been duplicated after transfer. Our results excluded the hypothesis that genome reduction in Buchnera has been accompanied by gene transfer to the

  2. Gene discovery in the Acanthamoeba castellanii genome

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain J.; Watkins, Russell F.; Samuelson, John; Spencer, David F.; Majoros, William H.; Gray, Michael W.; Loftus, Brendan J.

    2005-08-01

    Acanthamoeba castellanii is a free-living amoeba found in soil, freshwater, and marine environments and an important predator of bacteria. Acanthamoeba castellanii is also an opportunistic pathogen of clinical interest, responsible for several distinct diseases in humans. In order to provide a genomic platform for the study of this ubiquitous and important protist, we generated a sequence survey of approximately 0.5 x coverage of the genome. The data predict that A. castellanii exhibits a greater biosynthetic capacity than the free-living Dictyostelium discoideum and the parasite Entamoeba histolytica, providing an explanation for the ability of A. castellanii to inhabit a diversity of environments. Alginate lyase may provide access to bacteria within biofilms by breaking down the biofilm matrix, and polyhydroxybutyrate depolymerase may facilitate utilization of the bacterial storage compound polyhydroxybutyrate as a food source. Enzymes for the synthesis and breakdown of cellulose were identified, and they likely participate in encystation and excystation as in D. discoideum. Trehalose-6-phosphate synthase is present, suggesting that trehalose plays a role in stress adaptation. Detection of and response to a number of stress conditions are likely accomplished with a large set of signal transduction histidine kinases and a set of putative receptor serine/threonine kinases similar to those found in E. histolytica. Serine, cysteine and metalloproteases were identified, some of which are likely involved in pathogenicity.

  3. Genomic structure and expression of immunoglobulins in Squamata.

    Science.gov (United States)

    Olivieri, David N; Garet, Elina; Estevez, Olivia; Sánchez-Espinel, Christian; Gambón-Deza, Francisco

    2016-04-01

    The Squamata order represents a major evolutionary reptile lineage, yet the structure and expression of immunoglobulins in this order has been scarcely studied in detail. From the genome sequences of four Squamata species (Gekko japonicus, Ophisaurus gracilis, Pogona vitticeps and Ophiophagus hannah) and RNA-seq datasets from 18 other Squamata species, we identified the immunoglobulins present in these animals as well as the tissues in which they are found. All Squamata have at least three immunoglobulin classes; namely, the immunoglobulins M, D, and Y. Unlike mammals, however, we provide evidence that some Squamata lineages possess more than one Cμ gene which is located downstream from the Cδ gene. The existence of two evolutionary lineages of immunoglobulin Y is shown. Additionally, it is demonstrated that while all Squamata species possess the λ light chain, only Iguanidae species possess the κ light chain.

  4. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    IP-seq and small RNA-seq, we delineated the landscape of the promoters with bidirectional transcription that yield steady-state RNA in only one direction (Paper III). A subsequent motif analysis enabled us to uncover specific DNA signals – early polyA sites – that make RNA on the reverse strand sensitive...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’exonuclease as well as nonsense-mediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V......). Gene enrichment analysis on the detected NMD substrates revealed an unappreciated NMD-based regulatory mechanism of the genes hosting multiple intronic snoRNAs, which can facilitate differential expression of individual snoRNAs from a single host gene locus. Finally, supported by RNA-seq and small RNA-seq...

  5. Genome-wide patterns of Arabidopsis gene expression in nature.

    Directory of Open Access Journals (Sweden)

    Christina L Richards

    Full Text Available Organisms in the wild are subject to multiple, fluctuating environmental factors, and it is in complex natural environments that genetic regulatory networks actually function and evolve. We assessed genome-wide gene expression patterns in the wild in two natural accessions of the model plant Arabidopsis thaliana and examined the nature of transcriptional variation throughout its life cycle and gene expression correlations with natural environmental fluctuations. We grew plants in a natural field environment and measured genome-wide time-series gene expression from the plant shoot every three days, spanning the seedling to reproductive stages. We find that 15,352 genes were expressed in the A. thaliana shoot in the field, and accession and flowering status (vegetative versus flowering) were strong components of transcriptional variation in this plant. We identified between ∼110 and 190 time-varying gene expression clusters in the field, many of which were significantly overrepresented by genes regulated by abiotic and biotic environmental stresses. The two main principal components of vegetative shoot gene expression (PC(veg)) correlate to temperature and precipitation occurrence in the field. The largest PC(veg) axes included thermoregulatory genes while the second major PC(veg) was associated with precipitation and contained drought-responsive genes. By exposing A. thaliana to natural environments in an open field, we provide a framework for further understanding the genetic networks that are deployed in natural environments, and we connect plant molecular genetics in the laboratory to plant organismal ecology in the wild.

  6. Genome-wide analysis of homeobox genes from Mesobuthus martensii reveals Hox gene duplication in scorpions.

    Science.gov (United States)

    Di, Zhiyong; Yu, Yao; Wu, Yingliang; Hao, Pei; He, Yawen; Zhao, Huabin; Li, Yixue; Zhao, Guoping; Li, Xuan; Li, Wenxin; Cao, Zhijian

    2015-06-01

    Homeobox genes belong to a large gene group, which encodes the famous DNA-binding homeodomain that plays a key role in development and cellular differentiation during embryogenesis in animals. Here, one hundred forty-nine homeobox genes were identified from the Asian scorpion, Mesobuthus martensii (Chelicerata: Arachnida: Scorpiones: Buthidae) based on our newly assembled genome sequence with approximately 248 × coverage. The identified homeobox genes were categorized into eight classes including 82 families: 67 ANTP class genes, 33 PRD genes, 11 LIM genes, five POU genes, six SINE genes, 14 TALE genes, five CUT genes, two ZF genes and six unclassified genes. Transcriptome data confirmed that more than half of the genes were expressed in adults. The homeobox gene diversity of the eight classes is similar to the previously analyzed Mandibulata arthropods. Interestingly, it is hypothesized that the scorpion M. martensii may have two Hox clusters. The first complete genome-wide analysis of homeobox genes in Chelicerata not only reveals the repertoire of scorpion, arachnid and chelicerate homeobox genes, but also shows some insights into the evolution of arthropod homeobox genes.

  7. Potential of gene drives with genome editing to increase genetic gain in livestock breeding programs.

    Science.gov (United States)

    Gonen, Serap; Jenko, Janez; Gorjanc, Gregor; Mileham, Alan J; Whitelaw, C Bruce A; Hickey, John M

    2017-01-04

    This paper uses simulation to explore how gene drives can increase genetic gain in livestock breeding programs. Gene drives are naturally occurring phenomena that cause a mutation on one chromosome to copy itself onto its homologous chromosome. We simulated nine different breeding and editing scenarios with a common overall structure. Each scenario began with 21 generations of selection, followed by 20 generations of selection based on true breeding values where the breeder used selection alone, selection in combination with genome editing, or selection with genome editing and gene drives. In the scenarios that used gene drives, we varied the probability of successfully incorporating the gene drive. For each scenario, we evaluated genetic gain, genetic variance, rate of change in inbreeding, number of distinct quantitative trait nucleotides (QTN) edited, rate of increase in favourable allele frequencies of edited QTN and the time to fix favourable alleles. Gene drives enhanced the benefits of genome editing in seven ways: (1) they amplified the increase in genetic gain brought about by genome editing; (2) they amplified the rate of increase in the frequency of favourable alleles and reduced the time it took to fix them; (3) they enabled more rapid targeting of QTN with lesser effect for genome editing; (4) they distributed fixed editing resources across a larger number of distinct QTN across generations; (5) they focussed editing on a smaller number of QTN within a given generation; (6) they reduced the level of inbreeding when editing a subset of the sires; and (7) they increased the efficiency of converting genetic variation into genetic gain. Genome editing in livestock breeding results in short-, medium- and long-term increases in genetic gain. The increase in genetic gain occurs because editing increases the frequency of favourable alleles in the population. Gene drives accelerate the increase in allele frequency
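
    The copying mechanism described in this record can be illustrated with a toy deterministic model. The sketch below is not the authors' simulation, which involved selection on breeding values; it assumes random mating, no selection, and a hypothetical conversion probability `c` with which the wild-type allele in a heterozygote is converted to the drive-linked allele. The function names are illustrative only.

```python
def next_freq(p, c):
    """Allele frequency after one generation of random mating with a homing drive.

    p: current frequency of the drive-linked (favourable) allele
    c: assumed probability that a heterozygote converts its other allele
    """
    hom = p * p                # homozygotes always transmit the allele
    het = 2 * p * (1 - p)      # heterozygotes transmit it with prob. (1 + c) / 2
    return hom + het * (1 + c) / 2


def generations_to_reach(p0, c, target=0.99, max_gen=1000):
    """Count generations until the allele frequency reaches `target`.

    Returns None if the target is not reached within `max_gen` generations.
    """
    p = p0
    for gen in range(1, max_gen + 1):
        p = next_freq(p, c)
        if p >= target:
            return gen
    return None


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        print(c, generations_to_reach(0.1, c))
```

    With `c = 0` the recursion reduces to ordinary Mendelian transmission and the frequency does not change, so any increase in this toy model comes from the drive alone.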

  8. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Full Text Available Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we revealed the conserved essential gene set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  9. In-silico human genomics with GeneCards

    Directory of Open Access Journals (Sweden)

    Stelzer Gil

    2011-10-01

    Full Text Available Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.

  10. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    Energy Technology Data Exchange (ETDEWEB)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  11. Variations and classification of toxic epitopes related to celiac disease among α-gliadin genes from four Aegilops genomes.

    Science.gov (United States)

    Li, Jie; Wang, Shunli; Li, Shanshan; Ge, Pei; Li, Xiaohui; Ma, Wujun; Zeller, F J; Hsam, Sai L K; Yan, Yueming

    2012-07-01

    The α-gliadins are associated with human celiac disease. A total of 23 noninterrupted full open reading frame α-gliadin genes and 19 pseudogenes were cloned and sequenced from C, M, N, and U genomes of four diploid Aegilops species. Sequence comparison of α-gliadin genes from Aegilops and Triticum species demonstrated the existence of extensive allelic variations in Gli-2 loci of the four Aegilops genomes. Specific structural features were found, including the compositions and variations of two polyglutamine domains (QI and QII) and four T cell stimulatory toxic epitopes. The mean numbers of glutamine residues in the QI domain in C and N genomes and the QII domain in C, N, and U genomes were much higher than those in Triticum genomes, and the QI domain in C and N genomes and the QII domain in C, M, N, and U genomes displayed greater length variations. Interestingly, the types and numbers of four T cell stimulatory toxic epitopes in α-gliadins from the four Aegilops genomes were significantly fewer than those from Triticum A, B, D, and their progenitor genomes. Relationships between the structural variations of the two polyglutamine domains and the distributions of four T cell stimulatory toxic epitopes were found, resulting in the α-gliadin genes from the Aegilops and Triticum genomes being classified into three groups.

  12. Daysleeper : from genomic parasite to indispensable gene

    NARCIS (Netherlands)

    Knip, Marijn

    2012-01-01

    In this thesis the evolutionary background, function and localization of the domesticated transposase DAYSLEEPER are described. We found that DAYSLEEPER-like genes can be found in angiosperms, but not in lower plants. We also found that DAYSLEEPER interacts with several proteins and is probably

  13. Diversity of 23S rRNA genes within individual prokaryotic genomes.

    Directory of Open Access Journals (Sweden)

    Anna Pei

    Full Text Available BACKGROUND: The concept of ribosomal constraints on rRNA genes is deduced primarily based on the comparison of consensus rRNA sequences between closely related species, but recent advances in whole-genome sequencing allow evaluation of this concept within organisms with multiple rRNA operons. METHODOLOGY/PRINCIPAL FINDINGS: Using the 23S rRNA gene as an example, we analyzed the diversity among individual rRNA genes within a genome. Of 184 prokaryotic species containing multiple 23S rRNA genes, diversity was observed in 113 (61.4%) genomes (mean 0.40%, range 0.01%-4.04%). Significant (1.17%-4.04%) intragenomic variation was found in 8 species. In 5 of the 8 species, the diversity in the primary structure had only minimal effect on the secondary structure (stem versus loop transition). In the remaining 3 species, the diversity significantly altered local secondary structure, but the alteration appears minimized through complex rearrangement. Intervening sequences (IVS), ranging between 9 and 1471 nt in size, were found in 7 species. IVS in Deinococcus radiodurans and Nostoc sp. encode transposases. T. tengcongensis was the only species in which intragenomic diversity >3% was observed among 4 paralogous 23S rRNA genes. CONCLUSIONS/SIGNIFICANCE: These findings indicate tight ribosomal constraints on individual 23S rRNA genes within a genome. Although classification using primary 23S rRNA sequences could be erroneous, significant diversity among paralogous 23S rRNA genes was observed only once in the 184 species analyzed, indicating little overall impact on the mainstream of 23S rRNA gene-based prokaryotic taxonomy.
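
    As a rough illustration of what an intragenomic diversity percentage measures, the sketch below computes the largest pairwise percent mismatch among gene copies from one genome. It assumes the copies are already aligned to equal length and counts every differing column (including gaps) as a mismatch; the record does not state how the authors aligned or scored the sequences, so this is only one plausible definition, and the function names are hypothetical.

```python
from itertools import combinations


def percent_diversity(a, b):
    """Percent of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def max_intragenomic_diversity(copies):
    """Largest pairwise percent diversity among the gene copies of one genome."""
    return max(percent_diversity(a, b) for a, b in combinations(copies, 2))


if __name__ == "__main__":
    # Three made-up, pre-aligned copies; the third differs at one of ten sites.
    copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
    print(max_intragenomic_diversity(copies))
```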

  14. Genome Binding and Gene Regulation by Stem Cell Transcription Factors

    NARCIS (Netherlands)

    J.H. Brandsma (Johan)

    2016-01-01

    Nearly all cells of an individual organism contain the same genome. However, each cell type transcribes a different set of genes due to the presence of different sets of cell type-specific transcription factors. Such transcription factors bind to regulatory regions such as promoters

  15. Gene hunting : molecular analysis of the chicken genome

    NARCIS (Netherlands)

    Crooijmans, R.P.M.A.

    2000-01-01

    This dissertation describes the development of molecular tools to identify genes that are involved in production and health traits in poultry. To unravel the chicken genome, fluorescent molecular markers (microsatellite markers) were developed and optimized to perform high throughput screening of re

  16. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  17. Re-Examining the Gene in Personalized Genomics

    Science.gov (United States)

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  18. Genome-Wide Analysis of the RNA Helicase Gene Family in Gossypium raimondii

    Directory of Open Access Journals (Sweden)

    Jie Chen

    2014-03-01

    Full Text Available The RNA helicases, which help to unwind stable RNA duplexes and have important roles in RNA metabolism, belong to a class of motor proteins that play important roles in plant development and responses to stress. Although this family of genes has been the subject of systematic investigation in Arabidopsis, rice, and tomato, it has not yet been characterized in cotton. In this study, we identified 161 putative RNA helicase genes in the genome of the diploid cotton species Gossypium raimondii. We classified these genes into three subfamilies, based on the presence of either a DEAD-box (51 genes), DEAH-box (52 genes), or DExD/H-box (58 genes) in their coding regions. Chromosome location analysis showed that the genes that encode RNA helicases are distributed across all 13 chromosomes of G. raimondii. Syntenic analysis revealed that 62 of the 161 G. raimondii helicase genes (38.5%) are within the identified syntenic blocks. Sixty-six (40.99%) helicase genes from G. raimondii have one or several putative orthologs in tomato. Additionally, GrDEADs have more conserved gene structures and simpler domains than GrDEAHs and GrDExD/Hs. Transcriptome sequencing data demonstrated that many of these helicases, especially GrDEADs, are highly expressed at the fiber initiation stage and in mature leaves. To our knowledge, this is the first report of a genome-wide analysis of the RNA helicase gene family in cotton.

  19. SINEs, evolution and genome structure in the opossum.

    Science.gov (United States)

    Gu, Wanjun; Ray, David A; Walker, Jerilyn A; Barnes, Erin W; Gentles, Andrew J; Samollow, Paul B; Jurka, Jerzy; Batzer, Mark A; Pollock, David D

    2007-07-01

    Short INterspersed Elements (SINEs) are non-autonomous retrotransposons, usually between 100 and 500 base pairs (bp) in length, which are ubiquitous components of eukaryotic genomes. Their activity, distribution, and evolution can be highly informative on genomic structure and evolutionary processes. To determine recent activity, we amplified more than one hundred SINE1 loci in a panel of 43 M. domestica individuals derived from five diverse geographic locations. The SINE1 family has expanded recently enough that many loci were polymorphic, and the SINE1 insertion-based genetic distances among populations reflected geographic distance. Genome-wide comparisons of SINE1 densities and GC content revealed that high SINE1 density is associated with high GC content in a few long and many short spans. Young SINE1s, whether fixed or polymorphic, showed an unbiased GC content preference for insertion, indicating that the GC preference accumulates over long time periods, possibly in periodic bursts. SINE1 evolution is thus broadly similar to human Alu evolution, although it has an independent origin. High GC content adjacent to SINE1s is strongly correlated with bias towards higher AT to GC substitutions and lower GC to AT substitutions. This is consistent with biased gene conversion, and also indicates that like chickens, but unlike eutherian mammals, GC content heterogeneity (isochore structure) is reinforced by substitution processes in the M. domestica genome. Nevertheless, both high and low GC content regions are apparently headed towards lower GC content equilibria, possibly due to a relative shift to lower recombination rates in the recent Monodelphis ancestral lineage. Like eutherians, metatherian (marsupial) mammals have evolved high CpG substitution rates, but this is apparently a convergence in process rather than a shared ancestral state.

  20. Diversity of 5S rRNA genes within individual prokaryotic genomes.

    Science.gov (United States)

    Pei, Anna; Li, Hongru; Oberdorf, William E; Alekseyenko, Alexander V; Parsons, Tamasha; Yang, Liying; Gerz, Erika A; Lee, Peng; Xiang, Charlie; Nossa, Carlos W; Pei, Zhiheng

    2012-10-01

    We examined intragenomic variation of paralogous 5S rRNA genes to evaluate the concept of ribosomal constraints. In a dataset containing 1161 genomes from 779 unique species, 96 species exhibited > 3% diversity. Twenty-seven species with > 10% diversity contained a total of 421 mismatches between all pairs of the most dissimilar copies of 5S rRNA genes. The large majority (401 of 421) of the diversified positions were conserved at the secondary structure level. The high diversity was associated with partial rRNA operon, split operon, or spacer length-related divergence. In total, these findings indicated that there are tight ribosomal constraints on paralogous 5S rRNA genes in a genome despite the high degree of diversity at the primary structure level. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  1. The properties of genome conformation and spatial gene interaction and regulation networks of normal and malignant human cell types.

    Directory of Open Access Journals (Sweden)

    Zheng Wang

    Full Text Available The spatial conformation of a genome plays an important role in the long-range regulation of genome-wide gene expression and methylation, but has not been extensively studied due to lack of genome conformation data. The recently developed chromosome conformation capturing techniques such as the Hi-C method empowered by next generation sequencing can generate unbiased, large-scale, high-resolution chromosomal interaction (contact) data, providing an unprecedented opportunity to investigate the spatial structure of a genome and its applications in gene regulation, genomics, epigenetics, and cell biology. In this work, we conducted a comprehensive, large-scale computational analysis of this new stream of genome conformation data generated for three different human leukemia cells or cell lines by the Hi-C technique. We developed and applied a set of bioinformatics methods to reliably generate spatial chromosomal contacts from high-throughput sequencing data and to effectively use them to study the properties of the genome structures in one dimension (1D) and two dimensions (2D). Our analysis demonstrates that Hi-C data can be effectively applied to study tissue-specific genome conformation, chromosome-chromosome interaction, chromosomal translocations, and spatial gene-gene interaction and regulation in a three-dimensional genome of primary tumor cells. Particularly, for the first time, we constructed a genome-scale spatial gene-gene interaction network, a transcription factor binding site (TFBS)-TFBS interaction network, and a TFBS-gene interaction network from chromosomal contact information. Remarkably, all these networks possess the properties of scale-free modular networks.

  2. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Background: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results: In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and was the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion: The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  3. The Rhodomonas salina mitochondrial genome: bacteria-like operons, compact gene arrangement and complex repeat region.

    Science.gov (United States)

    Hauth, Amy M; Maier, Uwe G; Lang, B Franz; Burger, Gertraud

    2005-01-01

    To gain insight into the mitochondrial genome structure and gene content of a putatively ancestral group of eukaryotes, the cryptophytes, we sequenced the complete mitochondrial DNA of Rhodomonas salina. The 48 063 bp circular-mapping molecule codes for 2 rRNAs, 27 tRNAs and 40 proteins including 23 components of oxidative phosphorylation, 15 ribosomal proteins and two subunits of tat translocase. One potential protein (ORF161) is without assigned function. Only two introns occur in the genome; both are present within cox1, belong to group II and contain RT open reading frames. Primitive genome features include bacteria-like rRNAs and tRNAs, ribosomal protein genes organized in large clusters resembling bacterial operons and the presence of otherwise rare genes such as rps1 and tatA. The highly compact gene organization contrasts with the presence of a 4.7 kb long, repeat-containing intergenic region. Repeat motifs approximately 40-700 bp long occur up to 31 times, forming a complex repeat structure. Tandem repeats are the major arrangement but the region also includes a large, approximately 3 kb, inverted repeat and several potentially stable approximately 40-80 bp long hairpin structures. We provide evidence that the large repeat region is involved in replication and transcription initiation, predict a promoter motif that occurs in three locations and discuss two likely scenarios of how this highly structured repeat region might have evolved.

  4. Whole genome amplification of DNA for genotyping pharmacogenetics candidate genes.

    Directory of Open Access Journals (Sweden)

    Santosh Philips

    2012-03-01

    Whole genome amplification (WGA) technologies can be used to amplify genomic DNA when only small amounts of DNA are available. The Multiple Displacement Amplification Phi polymerase based amplification has been shown to accurately amplify DNA for a variety of genotyping assays; however, it has not been tested for genotyping many of the clinically relevant genes important for pharmacogenetic studies, such as the cytochrome P450 genes, that are typically difficult to genotype due to multiple pseudogenes, copy number variations, and high similarity to other related genes. We evaluated whole genome amplified samples for Taqman™ genotyping of SNPs in a variety of pharmacogenetic genes. In 24 DNA samples from the Coriell human diversity panel, the call rates and concordance between amplified (~200-fold amplification) and unamplified samples were 100% for two SNPs in CYP2D6 and one in ESR1. In samples from a breast cancer clinical trial (Trial 1), we compared the genotyping results in samples before and after WGA for four SNPs in CYP2D6, one SNP in CYP2C19, one SNP in CYP19A1, two SNPs in ESR1, and two SNPs in ESR2. The concordance rates were all >97%. Finally, we compared the allele frequencies of 143 SNPs determined in Trial 1 (whole genome amplified DNA) to the allele frequencies determined in unamplified DNA samples from a separate trial (Trial 2) that enrolled a similar population. The call rates and allele frequencies between the two trials were 98% and 99.7%, respectively. We conclude that the whole genome amplified DNA is suitable for Taqman™ genotyping for a wide variety of pharmacogenetically relevant SNPs.

  6. Genome-wide associations of gene expression variation in humans.

    Directory of Open Access Journals (Sweden)

    Barbara E Stranger

    2005-12-01

    The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
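
    To make two of the corrections named in this abstract concrete, the following is a minimal Python sketch of the Bonferroni and Benjamini-Hochberg (false discovery rate) procedures. It is not the authors' pipeline: the function names and the example p-values are hypothetical, the significance level of 0.05 is an assumption, and the permutation approach is not shown.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for ten SNP-expression association tests.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.060, 0.074, 0.205, 0.212, 0.216]

n_bonferroni = sum(bonferroni(example_pvalues))
n_bh = sum(benjamini_hochberg(example_pvalues))
print("Bonferroni rejections:", n_bonferroni)
print("Benjamini-Hochberg rejections:", n_bh)
```

    On these made-up values the FDR procedure rejects more hypotheses than Bonferroni, which reflects the general trade-off: Bonferroni controls the chance of any false positive, while the FDR approach tolerates a bounded proportion of them.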

  7. Construction of genomic libraries of Cryptosporidium parvum and identification of antigen-encoding genes.

    Science.gov (United States)

    Dykstra, C C; Blagburn, B L; Tidwell, R R

    1991-01-01

    Genomic libraries have been constructed from bovine C. parvum DNA in the lambda ZAP and lambda DASH vectors. Based on an estimated genome size of 2 × 10^4 kilobases (kb), each recombinant library contains greater than 10 genomic equivalents. The average recombinant size for the lambda ZAP library is 2.1 kb and for the lambda DASH library is 14 kb. We have identified genes to major antigens recognized by hyperimmune bovine antiserum. These recombinants are currently being purified and characterized. Limited DNA sequence analysis of random C. parvum clones confirms suggestions that the genome is quite AT-rich. The DNA sequence of random lambda ZAP fusion proteins has identified a potential ATPase, a structural protein and a DNA-binding protein.
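
    The relation between genome size, insert size and genomic equivalents quoted in this abstract can be checked with simple arithmetic. The Python sketch below is illustrative only: the genome size and average insert sizes are taken from the abstract, while the function names, the use of the standard Clarke-Carbon estimate, and the 99% probability target are assumptions that do not come from the record.

```python
import math

GENOME_SIZE_KB = 2e4       # estimated genome size from the abstract
LAMBDA_ZAP_INSERT_KB = 2.1  # average recombinant size, lambda ZAP library
LAMBDA_DASH_INSERT_KB = 14  # average recombinant size, lambda DASH library


def clones_for_equivalents(genome_kb, insert_kb, equivalents):
    """Clones needed so that total cloned DNA equals `equivalents` genomes."""
    return math.ceil(equivalents * genome_kb / insert_kb)


def clones_for_probability(genome_kb, insert_kb, probability):
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), with f = insert / genome."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


zap_clones = clones_for_equivalents(GENOME_SIZE_KB, LAMBDA_ZAP_INSERT_KB, 10)
dash_clones = clones_for_equivalents(GENOME_SIZE_KB, LAMBDA_DASH_INSERT_KB, 10)
dash_clones_p99 = clones_for_probability(GENOME_SIZE_KB, LAMBDA_DASH_INSERT_KB, 0.99)

print("lambda ZAP clones for 10 genome equivalents:", zap_clones)
print("lambda DASH clones for 10 genome equivalents:", dash_clones)
print("lambda DASH clones for a 99% chance of including a given locus:", dash_clones_p99)
```

    Under these assumptions, the smaller-insert lambda ZAP library needs roughly seven times as many recombinants as the lambda DASH library to reach the same number of genomic equivalents.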

  8. Comparative genome analysis of deleted genes in Shigella flexneri 2a strain 301

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    Comparative genome analysis was performed between Shigella flexneri 2a strain 301 and its close relative, the nonpathogenic E. coli K-12 strain MG1655. The results show that 136 DNA segments larger than 1000 bp are absent from Shigella flexneri 2a strain 301, amounting to 717253 bp in total length. These deleted segments together contain 670 open reading frames (ORFs). Functional prediction for these ORFs indicates that 40% are genes of unknown function. The remaining genes, with defined functions, encode metabolic enzymes, structural proteins, transcription regulatory factors and some elements associated with horizontal transfer. Here we compare the complete genomic sequences of the two closely related species, which differ in pathogenic phenotype. To our knowledge, this not only reveals for the first time the differences in genomic sequence between the two important enteric bacteria, but also provides valuable clues for further research into physiological activity, pathogenesis and the evolution of enteric bacteria.

  9. Use of tiling array data and RNA secondary structure predictions to identify noncoding RNA genes

    DEFF Research Database (Denmark)

    Weile, Christian; Gardner, Paul P; Hedegaard, Mads M

    2007-01-01

    BACKGROUND: Within the last decade a large number of noncoding RNA genes have been identified, but this may only be the tip of the iceberg. Using comparative genomics a large number of sequences that have signals concordant with conserved RNA secondary structures have been discovered in the human...... genome. Moreover, genome wide transcription profiling with tiling arrays indicate that the majority of the genome is transcribed. RESULTS: We have combined tiling array data with genome wide structural RNA predictions to search for novel noncoding and structural RNA genes that are expressed in the human...... of 3 of the hairpin structures and 3 out of 9 high covariance structures in SK-N-AS cells. CONCLUSION: Our results demonstrate that many human noncoding, structured and conserved RNA genes remain to be discovered and that tissue specific tiling array data can be used in combination with computational...

  10. Corynebacterium diphtheriae: genome diversity, population structure and genotyping perspectives.

    Science.gov (United States)

    Mokrousov, Igor

    2009-01-01

    The epidemic re-emergence of diphtheria in Russia and the Newly Independent States (NIS) of the former Soviet Union in the 1990s demonstrated the continued threat of this disease, which had been thought to be rare. The bacteriophage-encoded toxin is a main virulence factor of Corynebacterium diphtheriae; however, an analysis of the first complete genome sequence of C. diphtheriae revealed a recent acquisition of other pathogenicity factors including iron-uptake systems, adhesins and fimbrial proteins, as indeed this extracellular pathogen has more possibilities for lateral gene transfer than, e.g., its close relative, the mainly intracellular Mycobacterium tuberculosis. C. diphtheriae appears to have a phylogeographical structure mainly represented by area-specific variants whose circulation is under strong influence of human host factors, including health control measures (first of all, vaccination) and socioeconomic conditions. This framework core population structure may be challenged by importation of the endemic and eventually toxigenic strains from new areas, thus leading to localized or large epidemics caused directly by imported strains or by bacteriophage-lysogenized indigenous strains converted to toxin production. A feature of C. diphtheriae co-existence with humans is its periodicity: following the large epidemic in the 1990s, the present period is marked by increasing heterogeneity of the circulating populations, whereas the re-emergence of new toxigenic variants along with persistent circulation of invasive non-toxigenic strains appears alarming. To identify and rapidly monitor subtle changes in the genome structure at an infraclonal level during and between epidemics, portable and discriminatory typing methods of C. diphtheriae are still needed. In this view, CRISPRs and minisatellites are promising genomic markers for development of high-resolution typing schemes and databasing of C. diphtheriae.

  11. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    Energy Technology Data Exchange (ETDEWEB)

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinlzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25-fold higher than those between inbred lines and 50-fold lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  12. Invisible cities: segregated domains in the yeast genome with distinct structural and functional attributes.

    Science.gov (United States)

    Nikolaou, Christoforos

    2017-08-05

    Recent advances in our understanding of the three-dimensional organization of the eukaryotic nucleus have rendered the spatial distribution of genes increasingly relevant. In a recent work (Tsochatzidou et al., Nucleic Acids Res 45:5818-5828, 2017), we proposed the existence of a functional compartmentalization of the yeast genome, according to which genes occupying the chromosomal regions at the nuclear periphery have distinct structural, functional and evolutionary characteristics compared to their centromeric-proximal counterparts. Around the same time, it was also shown that the genome of Saccharomyces cerevisiae is organized in topologically associated domains (TADs), which are largely associated with the replication timing. In this work, we proceed to investigate whether such units of three-dimensional genomic organization can be linked to transcriptional activity as a driving force for the shaping of genomic architecture. Through the application of a simple boundary-calling criterion in genome-wide 3C data, we define ~100 TAD-like domains which can be clustered in six different classes with radically different nucleosomal organizations, significant variations in transcription factor binding and uneven chromosomal distribution. Approximately 20% of the genome is found to be confined in regions with "closed" chromatin structure around gene promoters. Most interestingly, we find both "open" and "closed" regions to be segregated, in the sense that they tend to avoid inter-chromosomal interactions. Our data further reinforce the notion of a marked compartmentalization of the yeast genome in isolated territories, with implications for its function and evolution.

  13. Genome-Wide Characterization and Expression Profiles of the Superoxide Dismutase Gene Family in Gossypium

    Directory of Open Access Journals (Sweden)

    Jingbo Zhang

    2016-01-01

    Superoxide dismutase (SOD), a group of significant and ubiquitous enzymes, plays a critical role in plant growth and development. This gene family has previously been investigated in Arabidopsis and rice, but it has not yet been characterized in cotton. In this study, we performed, for the first time, a genome-wide analysis of the SOD gene family in cotton. Our results showed that 10 genes of the SOD gene family were identified in Gossypium arboreum and Gossypium raimondii, including 6 Cu-Zn-SODs, 2 Fe-SODs, and 2 Mn-SODs. The chromosomal distribution analysis revealed that SOD genes are distributed across 7 chromosomes in Gossypium arboreum and 8 chromosomes in Gossypium raimondii. Segmental duplication is the predominant duplication event and a major contributor to the expansion of the SOD gene family. Gene structure and protein structure analysis showed that SOD genes have conserved exon/intron arrangement and motif composition. Microarray-based expression analysis revealed that SOD genes have important functions in abiotic stress. Moreover, the tissue-specific expression profile reveals the functional divergence of SOD genes in the development of different organs of cotton. Taken together, this study provides new insights into the putative functions of the SOD gene family in cotton. Findings of the present investigation could help in understanding the role of the SOD gene family in various aspects of the life cycle of cotton.

  14. Genomic architecture of MHC-linked odorant receptor gene repertoires among 16 vertebrate species.

    Science.gov (United States)

    Santos, Pablo Sandro Carvalho; Kellermann, Thomas; Uchanska-Ziegler, Barbara; Ziegler, Andreas

    2010-09-01

    The recent sequencing and assembly of the genomes of different organisms have shown that almost all vertebrates studied in detail so far have one or more clusters of genes encoding odorant receptors (OR) in close physical linkage to the major histocompatibility complex (MHC). It has been postulated that MHC-linked OR genes could be involved in MHC-influenced mate choice, comprising both pre- as well as post-copulatory mechanisms. We have therefore carried out a systematic comparison of protein sequences of these receptors from the genomes of man, chimpanzee, gorilla, orangutan, rhesus macaque, mouse, rat, dog, cat, cow, pig, horse, elephant, opossum, frog and zebra fish (amounting to a total of 559 protein sequences) in order to identify OR families exhibiting evolutionarily conserved MHC linkage. In addition, we compared the genomic structure of this region within these 16 species, accounting for presence or absence of OR gene families, gene order, transcriptional orientation and linkage to the MHC or framework genes. The results are presented in the form of gene maps and phylogenetic analyses that reveal largely concordant repertoires of gene families, at least among tetrapods, although each of the eight taxa studied (primates, rodents, ungulates, carnivores, proboscids, marsupials, amphibians and teleosts) exhibits a typical architecture of MHC (or MHC framework loci)-linked OR genes. Furthermore, the comparison of the genomic organization of this region has implications for phylogenetic relationships between closely related taxa, especially in disputed cases such as the evolutionary history of even- and odd-toed ungulates and carnivores. Finally, the largely conserved linkage between distinct OR genes and the MHC supports the concept that particular alleles within a given haplotype function in a concerted fashion during self-/non-self-discrimination processes in reproduction.

  15. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...

  16. Evolutionary maintenance of filovirus-like genes in bat genomes

    Directory of Open Access Journals (Sweden)

    Taylor Derek J

    2011-11-01

    Background: Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results: We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions: Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained
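
    The likelihood ratio test mentioned in this abstract compares a null model (no positive selection) with a nested alternative that allows it. The Python sketch below shows only the general mechanics and is not the authors' analysis: the log-likelihood values are hypothetical, and the choice of 2 degrees of freedom is an assumption (it is common for some codon site-model comparisons, but the abstract does not state which models were compared). For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) for nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.exp(-x / 2.0)


# Hypothetical log-likelihoods, chosen only for illustration.
lnl_no_selection = -2450.0    # model that does not allow positive selection
lnl_with_selection = -2440.0  # model that accommodates positive selection

statistic = lrt_statistic(lnl_no_selection, lnl_with_selection)
p_value = chi2_sf_df2(statistic)
print("LRT statistic:", statistic)
print("p-value (df = 2, assumed):", p_value)
```

    A small p-value here would indicate that the model allowing positive selection fits these made-up data significantly better than the null model.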

  17. Genome-Wide Architecture of Disease Resistance Genes in Lettuce.

    Science.gov (United States)

    Christopoulou, Marilena; Wo, Sebastian Reyes-Chin; Kozik, Alex; McHale, Leah K; Truco, Maria-Jose; Wroblewski, Tadeusz; Michelmore, Richard W

    2015-10-08

    Genome-wide motif searches identified 1134 genes in the lettuce reference genome of cv. Salinas that are potentially involved in pathogen recognition, of which 385 were predicted to encode nucleotide binding-leucine rich repeat receptor (NLR) proteins. Using a maximum-likelihood approach, we grouped the NLRs into 25 multigene families and 17 singletons. Forty-one percent of these NLR-encoding genes belong to three families, the largest being RGC16 with 62 genes in cv. Salinas. The majority of NLR-encoding genes are located in five major resistance clusters (MRCs) on chromosomes 1, 2, 3, 4, and 8 and cosegregate with multiple disease resistance phenotypes. Most MRCs contain primarily members of a single NLR gene family but a few are more complex. MRC2 spans 73 Mb and contains 61 NLRs of six different gene families that cosegregate with nine disease resistance phenotypes. MRC3, which is 25 Mb, contains 22 RGC21 genes and colocates with Dm13. A library of 33 transgenic RNA interference tester stocks was generated for functional analysis of NLR-encoding genes that cosegregated with disease resistance phenotypes in each of the MRCs. Members of four NLR-encoding families, RGC1, RGC2, RGC21, and RGC12 were shown to be required for 16 disease resistance phenotypes in lettuce. The general composition of MRCs is conserved across different genotypes; however, the specific repertoire of NLR-encoding genes varied particularly of the rapidly evolving Type I genes. These tester stocks are valuable resources for future analyses of additional resistance phenotypes. Copyright © 2015 Christopoulou et al.

  18. Genomic Analyses of Bacterial Porin-Cytochrome Gene Clusters

    Directory of Open Access Journals (Sweden)

    Liang Shi

    2014-11-01

    The porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  19. Child Development and Structural Variation in the Human Genome

    Science.gov (United States)

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  1. Systematically fragmented genes in a multipartite mitochondrial genome

    Science.gov (United States)

    Vlcek, Cestmir; Marande, William; Teijeiro, Shona; Lukeš, Julius; Burger, Gertraud

    2011-01-01

    Arguably, the most bizarre mitochondrial DNA (mtDNA) is that of the euglenozoan eukaryote Diplonema papillatum. The genome consists of numerous small circular chromosomes none of which appears to encode a complete gene. For instance, the cox1 coding sequence is spread out over nine different chromosomes in non-overlapping pieces (modules), which are transcribed separately and joined to a contiguous mRNA by trans-splicing. Here, we examine how many genes are encoded by Diplonema mtDNA and whether all are fragmented and their transcripts trans-spliced. Module identification is challenging due to the sequence divergence of Diplonema mitochondrial genes. By employing most sensitive protein profile search algorithms and comparing genomic with cDNA sequence, we recognize a total of 11 typical mitochondrial genes. The 10 protein-coding genes are systematically chopped up into three to 12 modules of 60–350 bp length. The corresponding mRNAs are all trans-spliced. Identification of ribosomal RNAs is most difficult. So far, we only detect the 3′-module of the large subunit ribosomal RNA (rRNA); it does not trans-splice with other pieces. The small subunit rRNA gene remains elusive. Our results open new intriguing questions about the biochemistry and evolution of mitochondrial trans-splicing in Diplonema. PMID:20935050

  2. Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes.

    Science.gov (United States)

    Dunning Hotopp, Julie C; Grifantini, Renata; Kumar, Nikhil; Tzeng, Yih Ling; Fouts, Derrick; Frigimelica, Elisabetta; Draghi, Monia; Giuliani, Marzia Monica; Rappuoli, Rino; Stephens, David S; Grandi, Guido; Tettelin, Hervé

    2006-12-01

    To better understand Neisseria meningitidis genomes and virulence, microarray comparative genome hybridization (mCGH) data were collected from one Neisseria cinerea, two Neisseria lactamica, two Neisseria gonorrhoeae and 48 Neisseria meningitidis isolates. For N. meningitidis, these isolates are from diverse clonal complexes, invasive and carriage strains, and all major serogroups. The microarray platform represented N. meningitidis strains MC58, Z2491 and FAM18, and N. gonorrhoeae FA1090. By comparing hybridization data to genome sequences, the core N. meningitidis genome and insertions/deletions (e.g. capsule locus, type I secretion system) related to pathogenicity were identified, including further characterization of the capsule locus, bioinformatics analysis of a type I secretion system, and identification of some metabolic pathways associated with intracellular survival in pathogens. Hybridization data clustered meningococcal isolates from similar clonal complexes that were distinguished by the differential presence of six distinct islands of horizontal transfer. Several of these islands contained prophage or other mobile elements, including a novel prophage and a transposon carrying portions of a type I secretion system. Acquisition of some genetic islands appears to have occurred in multiple lineages, including transfer between N. lactamica and N. meningitidis. However, island acquisition occurs infrequently, such that the genomic-level relationship is not obscured within clonal complexes. The N. meningitidis genome is characterized by the horizontal acquisition of multiple genetic islands; the study of these islands reveals important sets of genes varying between isolates and likely to be related to pathogenicity.

  3. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes.

    Science.gov (United States)

    Chan, Patricia P; Lowe, Todd M

    2016-01-01

    Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/.

  4. Genome-wide analysis of homeobox gene family in legumes: identification, gene duplication and expression profiling.

    Science.gov (United States)

    Bhattacharjee, Annapurna; Ghangal, Rajesh; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Homeobox genes encode transcription factors that are known to play a major role in different aspects of plant growth and development. In the present study, we identified homeobox genes belonging to 14 different classes in five legume species, including chickpea, soybean, Medicago, Lotus and pigeonpea. The characteristic differences within homeodomain sequences among the various classes of the homeobox gene family were quite evident. Genome-wide expression analysis using publicly available datasets (RNA-seq and microarray) indicated that homeobox genes are differentially expressed in various tissues/developmental stages and under stress conditions in different legumes. We validated the differential expression of selected chickpea homeobox genes via quantitative reverse transcription polymerase chain reaction. Genome duplication analysis in soybean indicated that segmental duplication has significantly contributed to the expansion of the homeobox gene family. The Ka/Ks ratio of duplicated homeobox genes in soybean showed that several members of this family have undergone purifying selection. Moreover, expression profiling indicated that duplicated genes might have been retained due to sub-functionalization. The genome-wide identification and comprehensive gene expression profiling of homeobox gene family members in legumes will provide opportunities for functional analysis to unravel their exact role in plant growth and development.
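
    The Ka/Ks reasoning in this abstract can be illustrated with a minimal sketch. It assumes that Ka (nonsynonymous substitution rate) and Ks (synonymous substitution rate) have already been estimated for a duplicated gene pair. The function name and the tolerance band around 1 are hypothetical choices for illustration, not values taken from the study.

```python
def selection_regime(ka: float, ks: float, tol: float = 0.05) -> str:
    """Classify a duplicated gene pair by its Ka/Ks ratio.

    A ratio well below 1 is conventionally read as purifying selection,
    a ratio near 1 as neutral evolution, and a ratio well above 1 as
    positive selection. `tol` is an illustrative assumption.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


# Hypothetical rates for one duplicated pair (not data from the study).
print(selection_regime(0.08, 0.40))
```

    In practice the rates themselves would come from a codon-substitution model; the sketch only covers the final interpretation step.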

  5. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa.

    Directory of Open Access Journals (Sweden)

    Parameswari Paul

    Auxins are key players in plant growth and development, involved in leaf formation, phototropism, and root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) genes are early auxin response genes that act as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of the Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding of cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs, including 15 same-species (BrIAA-BrIAA) and 36 cross-species (BrIAA-AtIAA) IAA gene pairs. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house microarray data from mature leaves of the Chiifu and RcBr lines. Despite the large morphological difference between the parental lines Chiifu and RcBr, tissue-specific expression analysis showed that the BrIAA genes followed a similar pattern of expression during leaf development and a different pattern during the bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single nucleotide polymorphisms between the IAA genes of the reference genome Chiifu and RcBr were also identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops, and we believe our study provides a foundation for understanding the Aux/IAA genes in B. rapa.

  6. Transport genes and chemotaxis in Laribacter hongkongensis: a genome-wide analysis

    Directory of Open Access Journals (Sweden)

    Lau Susanna KP

    2011-08-01

    Background: Laribacter hongkongensis is a Gram-negative, sea gull-shaped rod associated with community-acquired gastroenteritis. The bacterium has been found in diverse freshwater environments including fish, frogs and drinking water reservoirs. Using the complete genome sequence data of L. hongkongensis, we performed a comprehensive analysis of putative transport-related genes and genes related to chemotaxis, motility and quorum sensing, which may help the bacterium adapt to changing environments and combat harmful substances. Results: A genome-wide analysis using the Transporter Classification Database (TCDB), similarity and keyword searches revealed the presence of a large diversity of transporters (n = 457) and genes related to chemotaxis (n = 52) and flagellar biosynthesis (n = 40) in the L. hongkongensis genome. The transporters included those from all seven major transporter categories, which may allow the uptake of essential nutrients or ions, and extrusion of metabolic end products and hazardous substances. L. hongkongensis is unique among closely related members of the Neisseriaceae family in possessing a higher number of proteins related to transport of ammonium, urea and dicarboxylate, which may reflect the importance of nitrogen and dicarboxylate metabolism in this asaccharolytic bacterium. Structural modeling of two C4-dicarboxylate transporters showed that they possessed similar structures to the determined structures of other DctP-TRAP transporters, with one having an unusual disulfide bond. Diverse mechanisms for iron transport, including hemin transporters for iron acquisition from host proteins, were also identified. In addition to the chemotaxis and flagella-related genes, the L. hongkongensis genome also contained two copies of qseB/qseC homologues of the AI-3 quorum sensing system. Conclusions: The large number of diverse transporters and genes involved in chemotaxis, motility and quorum sensing suggested that the bacterium may

  7. Genome-wide analysis of the R2R3-MYB transcription factor gene family in sweet orange (Citrus sinensis).

    Science.gov (United States)

    Liu, Chaoyang; Wang, Xia; Xu, Yuantao; Deng, Xiuxin; Xu, Qiang

    2014-10-01

    MYB transcription factors represent one of the largest gene families in plant genomes. Sweet orange (Citrus sinensis) is one of the most important fruit crops worldwide, and its genome has recently been sequenced. This provides an opportunity to investigate the organization and evolutionary characteristics of sweet orange MYB genes from a whole-genome view. In the present study, we identified 100 R2R3-MYB genes in the sweet orange genome. A comprehensive analysis of this gene family was performed, including phylogeny, gene structure, chromosomal localization and expression pattern analyses. The 100 genes were divided into 29 subfamilies based on sequence similarity and phylogeny, and the classification was also well supported by the highly conserved exon/intron structures and motif composition. The phylogenomic comparison of the MYB gene family among sweet orange and related plant species (Arabidopsis, cacao and papaya) suggested the existence of functional divergence during evolution. Expression profiling indicated that sweet orange R2R3-MYB genes exhibited distinct temporal and spatial expression patterns. Our analysis suggested that the sweet orange MYB genes may play important roles in different plant biological processes, some of which may be involved in citrus fruit quality. These results will be useful for future functional analysis of the MYB gene family in sweet orange.

  8. Classical Oncogenes and Tumor Suppressor Genes: A Comparative Genomics Perspective

    Directory of Open Access Journals (Sweden)

    Oxana K. Pickeral

    2000-05-01

    We have curated a reference set of cancer-related genes and reanalyzed their sequences in the light of molecular information and resources that have become available since they were first cloned. Homology studies were carried out for human oncogenes and tumor suppressors, compared with the complete proteome of the nematode, Caenorhabditis elegans, and partial proteomes of mouse and rat and the fruit fly, Drosophila melanogaster. Our results demonstrate that simple, semi-automated bioinformatics approaches to identifying putative functionally equivalent gene products in different organisms may often be misleading. An electronic supplement to this article provides an integrated view of our comparative genomics analysis as well as mapping data, physical cDNA resources and links to published literature and reviews, thus creating a “window” into the genomes of humans and other organisms for cancer biology.

  9. Genomic discovery of potent chromatin insulators for human gene therapy.

    Science.gov (United States)

    Liu, Mingdong; Maurano, Matthew T; Wang, Hao; Qi, Heyuan; Song, Chao-Zhong; Navas, Patrick A; Emery, David W; Stamatoyannopoulos, John A; Stamatoyannopoulos, George

    2015-02-01

    Insertional mutagenesis and genotoxicity, which usually manifest as hematopoietic malignancy, represent major barriers to realizing the promise of gene therapy. Although insulator sequences that block transcriptional enhancers could mitigate or eliminate these risks, so far no human insulators with high functional potency have been identified. Here we describe a genomic approach for the identification of compact sequence elements that function as insulators. These elements are highly occupied by the insulator protein CTCF, are DNase I hypersensitive and represent only a small minority of the CTCF recognition sequences in the human genome. We show that the elements identified acted as potent enhancer blockers and substantially decreased the risk of tumor formation in a cancer-prone animal model. The elements are small, can be efficiently accommodated by viral vectors and have no detrimental effects on viral titers. The insulators we describe here are expected to increase the safety of gene therapy for genetic diseases.

  10. A whole genome RNAi screen identifies replication stress response genes.

    Science.gov (United States)

    Kavanaugh, Gina; Ye, Fei; Mohni, Kareem N; Luzwick, Jessica W; Glick, Gloria; Cortez, David

    2015-11-01

    Proper DNA replication is critical to maintain genome stability. When the DNA replication machinery encounters obstacles to replication, replication forks stall and the replication stress response is activated. This response includes activation of cell cycle checkpoints, stabilization of the replication fork, and DNA damage repair and tolerance mechanisms. Defects in the replication stress response can result in alterations to the DNA sequence causing changes in protein function and expression, ultimately leading to disease states such as cancer. To identify additional genes that control the replication stress response, we performed a three-parameter, high content, whole genome siRNA screen measuring DNA replication before and after a challenge with replication stress as well as a marker of checkpoint kinase signalling. We identified over 200 replication stress response genes and subsequently analyzed how they influence cellular viability in response to replication stress. These data will serve as a useful resource for understanding the replication stress response.

  11. Functional Genomics of Allergen Gene Families in Fruits

    Directory of Open Access Journals (Sweden)

    Fatemeh Maghuly

    2009-10-01

    Fruit consumption is encouraged for health reasons; however, fruits may harbour a series of allergenic proteins that may cause discomfort or even represent serious threats to certain individuals. Thus, the identification and characterization of allergens in fruits requires novel approaches involving genomic and proteomic tools. Since avoidance of fruits also negatively affects the quality of patients’ lives, biotechnological interventions are ongoing to produce low-allergenic fruits by down-regulating specific genes. In this respect, the control of proteins associated with allergenicity could be achieved by fine-tuning the spatial and temporal expression of the relevant genes.

  12. Genomic organization and evolution of the ULBP genes in cattle.

    Science.gov (United States)

    Larson, Joshua H; Marron, Brandy M; Beever, Jonathan E; Roe, Bruce A; Lewin, Harris A

    2006-09-05

    The cattle UL16-binding protein 1 (ULBP1) and ULBP2 genes encode members of the MHC Class I superfamily that have homology to the human ULBP genes. Human ULBP1 and ULBP2 interact with the NKG2D receptor to activate effector cells in the immune system. The human cytomegalovirus UL16 protein is known to disrupt the ULBP-NKG2D interaction, thereby subverting natural killer cell-mediated responses. Previous Southern blotting experiments identified evidence of increased ULBP copy number within the genomes of ruminant artiodactyls. On the basis of these observations we hypothesized that the cattle ULBPs evolved by duplication and sequence divergence to produce a sufficient number and diversity of ULBP molecules to deliver an immune activation signal in the presence of immunogenic peptides. Given the importance of the ULBPs in antiviral immunity in other species, our goal was to determine the copy number and genomic organization of the ULBP genes in the cattle genome. Sequencing of cattle bacterial artificial chromosome genomic inserts resulted in the identification of 30 cattle ULBP loci existing in two gene clusters. Evidence of extensive segmental duplication and approximately 14 Kbp of novel repetitive sequences were identified within the major cluster. Ten ULBPs are predicted to be expressed at the cell surface. Substitution analysis revealed 11 outwardly directed residues in the predicted extracellular domains that show evidence of positive Darwinian selection. These positively selected residues have only one residue that overlaps with those proposed to interact with NKG2D, thus suggesting the interaction with molecules other than NKG2D. The ULBP loci in the cattle genome apparently arose by gene duplication and subsequent sequence divergence. Substitution analysis of the ULBP proteins provided convincing evidence for positive selection on extracellular residues that may interact with peptide ligands. These results support our hypothesis that the cattle ULBPs

  13. Genomic organization and evolution of the ULBP genes in cattle

    Directory of Open Access Journals (Sweden)

    Lewin Harris A

    2006-09-01

    Background: The cattle UL16-binding protein 1 (ULBP1) and ULBP2 genes encode members of the MHC Class I superfamily that have homology to the human ULBP genes. Human ULBP1 and ULBP2 interact with the NKG2D receptor to activate effector cells in the immune system. The human cytomegalovirus UL16 protein is known to disrupt the ULBP-NKG2D interaction, thereby subverting natural killer cell-mediated responses. Previous Southern blotting experiments identified evidence of increased ULBP copy number within the genomes of ruminant artiodactyls. On the basis of these observations we hypothesized that the cattle ULBPs evolved by duplication and sequence divergence to produce a sufficient number and diversity of ULBP molecules to deliver an immune activation signal in the presence of immunogenic peptides. Given the importance of the ULBPs in antiviral immunity in other species, our goal was to determine the copy number and genomic organization of the ULBP genes in the cattle genome. Results: Sequencing of cattle bacterial artificial chromosome genomic inserts resulted in the identification of 30 cattle ULBP loci existing in two gene clusters. Evidence of extensive segmental duplication and approximately 14 Kbp of novel repetitive sequences were identified within the major cluster. Ten ULBPs are predicted to be expressed at the cell surface. Substitution analysis revealed 11 outwardly directed residues in the predicted extracellular domains that show evidence of positive Darwinian selection. These positively selected residues have only one residue that overlaps with those proposed to interact with NKG2D, thus suggesting the interaction with molecules other than NKG2D. Conclusion: The ULBP loci in the cattle genome apparently arose by gene duplication and subsequent sequence divergence. Substitution analysis of the ULBP proteins provided convincing evidence for positive selection on extracellular residues that may interact with peptide ligands. These

  14. Metabolic Genes within Cyanophage Genomes: Implications for Diversity and Evolution

    Directory of Open Access Journals (Sweden)

    E-Bin Gao

    2016-09-01

    Cyanophages, a group of viruses specifically infecting cyanobacteria, are genetically diverse and extensively abundant in water environments. As a result of selective pressure, cyanophages often acquire a range of metabolic genes from host genomes. The host-derived genes make a significant contribution to the ecological success of cyanophages. In this review, we summarize the host-derived metabolic genes, as well as their origin and roles in cyanophage evolution and important host metabolic pathways, such as the light-dependent reactions of photosynthesis, the pentose phosphate pathway, nutrient acquisition and nucleotide biosynthesis. We also discuss the suitability of the host-derived metabolic genes as potential diagnostic markers for the detection of genetic diversity of cyanophages in natural environments.

  15. Phylogeny, genomic organization and expression of lambda and kappa immunoglobulin light chain genes in a reptile, Anolis carolinensis.

    Science.gov (United States)

    Wu, Qian; Wei, Zhiguo; Yang, Zhi; Wang, Tao; Ren, Liming; Hu, Xiaoxiang; Meng, Qingyong; Guo, Ying; Zhu, Qinghong; Robert, Jacques; Hammarström, Lennart; Li, Ning; Zhao, Yaofeng

    2010-05-01

    The reptiles are the last major taxon of jawed vertebrates in which immunoglobulin light chain isotypes have not been well characterized. Using the recently released genome sequencing data, we show in this study that the reptile Anolis carolinensis expresses both lambda and kappa light chain genes. The genomic organization of both gene loci is structurally similar to their respective counterparts in mammals. The identified lambda locus contains three constant region genes each preceded by a joining gene segment, and a total of 37 variable gene segments. In contrast, the kappa locus contains only a single constant region gene, and two joining gene segments with a single family of 14 variable gene segments located upstream. Analysis of junctions of the recombined VJ transcripts reveals a paucity of N and P nucleotides in both expressed lambda and kappa sequences. These results help us to understand the generation of the immunoglobulin repertoire in reptiles and immunoglobulin evolution in vertebrates.

  16. Increased complexity of gene structure and base composition in vertebrates

    Institute of Scientific and Technical Information of China (English)

    Ying Wu; Huizhong Yuan; Shengjun Tan; Jian-Qun Chen; Dacheng Tian; Haiwang Yang

    2011-01-01

    How the structure and base composition of genes changed with the evolution of vertebrates remains a puzzling question. Here we analyzed 895 orthologous protein-coding genes in six multicellular animals: human, chicken, zebrafish, sea squirt, fruit fly, and worm. Our analyses reveal that many gene regions, particularly introns and 3' UTRs, gradually expanded throughout the evolution of vertebrates from their invertebrate ancestors, and that the number of exons per gene increased. Studies based on all protein-coding genes in each genome provide consistent results. We also find that GC-content increased in many gene regions (especially the 5' UTR) in the evolution of endotherms, except in coding exons. Analysis of individual genomes shows that the 3' UTR demonstrated a stronger length and GC-content correlation with introns than the 5' UTR did, and that genes with large introns in all six species demonstrated relatively similar GC-content. Our data indicate a great increase in complexity in vertebrate genes, and we propose that the requirement for morphological and functional changes is probably the driving force behind the evolution of structure and base composition complexity in multicellular animal genes.
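
    The GC-content measure discussed in this abstract can be computed as in the minimal sketch below. The function name is hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch rather than a detail stated in the record.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases of a sequence.

    Ambiguous symbols (for example N) are left out of both the numerator
    and the denominator; that convention is an assumption of this sketch.
    """
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / unambiguous


# Hypothetical fragments standing in for an intron and a 5' UTR.
for name, fragment in [("intron", "ATTTAGCATA"), ("5' UTR", "GCGGCCATGC")]:
    print(name, round(gc_content(fragment), 2))
```

    Applying the same function to each gene region (5' UTR, coding exon, intron, 3' UTR) would give the per-region values that such a comparison relies on.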

  17. Regulatory Features for Odorant Receptor Genes in the Mouse Genome.

    Science.gov (United States)

    Degl'Innocenti, Andrea; D'Errico, Anna

    2017-01-01

    The odorant receptor genes, seven transmembrane receptor genes constituting the vastest mammalian gene multifamily, are expressed monogenically and monoallelically in each sensory neuron in the olfactory epithelium. This characteristic, often referred to as the one neuron-one receptor rule, is driven by mostly uncharacterized molecular dynamics, generally named odorant receptor gene choice. Much attention has been paid by the scientific community to the identification of sequences regulating the expression of odorant receptor genes within their loci, where related genes are usually arranged in genomic clusters. A number of studies identified transcription factor binding sites on odorant receptor promoter sequences. Similar binding sites were also found on a number of enhancers that regulate their transcription in cis, but have been proposed to form interchromosomal networks. Odorant receptor gene choice seems to occur via the local removal of strongly repressive epigenetic markings, put in place during the maturation of the sensory neuron on each odorant receptor locus. Here we review the fast-changing state of the art for the study of regulatory features for odorant receptor genes.

  18. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
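
    The idea of scoring a gene against two Markov models can be sketched as follows. This is a toy illustration, not the GeneMark.hmm algorithm: it assumes simple first-order homogeneous Markov chains, add-one pseudocounts, and tiny hypothetical training sequences, and all function names are invented for the example.

```python
import math

BASES = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev)."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1
    model = {}
    for a in BASES:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in BASES}
    return model


def log_likelihood(seq, model):
    """Sum the log transition probabilities along the sequence."""
    return sum(
        math.log(model[prev][nxt])
        for prev, nxt in zip(seq, seq[1:])
        if prev in model and nxt in model[prev]
    )


def classify(seq, typical, atypical):
    """Assign the class whose model gives the higher log-likelihood."""
    if log_likelihood(seq, typical) >= log_likelihood(seq, atypical):
        return "typical"
    return "atypical"


# Hypothetical training data: one GC-rich class and one AT-rich class.
typical_model = train_markov(["GCGCGCGCGCGC"])
atypical_model = train_markov(["ATATATATATAT"])
print(classify("GCGCGC", typical_model, atypical_model))
print(classify("ATATAT", typical_model, atypical_model))
```

    Real gene finders use higher-order, frame-aware models inside a hidden Markov model; the sketch only shows why a gene whose composition deviates from the bulk of the genome scores better under a separate model.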

  19. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  20. A genome-wide characterization of microRNA genes in maize.

    Directory of Open Access Journals (Sweden)

    Lifang Zhang

    2009-11-01

    MicroRNAs (miRNAs) are small, non-coding RNAs that play essential roles in plant growth, development, and stress response. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling identified 150 high-confidence genes within 26 miRNA families. For 25 families, expression was verified by deep-sequencing of small RNA libraries that were prepared from an assortment of maize tissues. PCR-RACE amplification of 68 miRNA transcript precursors, representing 18 families conserved across several plant species, showed that splice variation and the use of alternative transcriptional start and stop sites is common within this class of genes. Comparison of sequence variation data from diverse maize inbred lines versus teosinte accessions suggests that the mature miRNAs are under strong purifying selection while the flanking sequences evolve equivalently to other genes. Since maize is derived from an ancient tetraploid, the effect of whole-genome duplication on miRNA evolution was examined. We found that, like protein-coding genes, duplicated miRNA genes underwent extensive gene loss, with approximately 35% of ancestral sites retained as duplicate homoeologous miRNA genes. This number is higher than that observed with protein-coding genes. A search for putative miRNA targets indicated bias towards genes in regulatory and metabolic pathways. As maize is one of the principal models for plant growth and development, this study will serve as a foundation for future research into the functional roles of miRNA genes.

  1. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    Directory of Open Access Journals (Sweden)

    Brahmbhatt Sonal

    2008-11-01

    Background: Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most widely studied groups of fish. Results: 298,304 expressed sequence tags (ESTs) from Atlantic salmon (69% of the total), 11,664 chinook, 10,813 sockeye, 10,051 brook trout, 10,975 grayling, 8,630 lake whitefish, and 3,624 northern pike ESTs were obtained in this study and have been deposited into the public databases. Contigs were built and putative full-length Atlantic salmon clones have been identified. A database containing ESTs, assemblies, consensus sequences, open reading frames, gene predictions and putative annotation is available. The overall similarity between Atlantic salmon ESTs and those of rainbow trout, chinook, sockeye, brook trout, grayling, lake whitefish, northern pike and rainbow smelt is 93.4, 94.2, 94.6, 94.4, 92.5, 91.7, 89.6, and 86.2% respectively. An analysis of 78 transcript sets shows Salmo as a sister group to Oncorhynchus and Salvelinus within Salmoninae, and Thymallinae as a sister group to Salmoninae and Coregoninae within Salmonidae. Extensive gene duplication is consistent with a genome duplication in the common ancestor of salmonids. Using all of the available EST data, a new expanded salmonid cDNA microarray of 32,000 features was created. Cross-species hybridizations to this cDNA microarray indicate that this resource will be useful for studies of all 68 salmonid species. Conclusion: An extensive collection and analysis of salmonid RNA putative transcripts indicate that Pacific salmon, Atlantic salmon and charr are 94–96% similar while the more distant whitefish, grayling, pike and smelt are 93, 92, 89 and 86% similar to salmon. The salmonid transcriptome reveals a complex history of gene duplication that is

  2. The function genomics study

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Genomics is a biological term that appeared ten years ago and is used to describe research on genome mapping, sequencing, structure analysis, etc. Genomics, the first journal for publishing papers on genomics research, was launched in 1986. In the past decade, the concept of genomics has been widely accepted by scientists engaged in biological research. Meanwhile, the research scope of genomics has been extended continuously, from simple gene mapping and sequencing to functional genomics. To reflect this change, genomics is now divided into two parts: structural genomics and functional genomics.

  3. The MI bundle: enabling network and structural biology in genome visualization tools.

    Science.gov (United States)

    Céol, Arnaud; Müller, Heiko

    2015-11-15

    Prioritization of candidate genes emanating from large-scale screens requires integrated analyses at the genomics, molecular, network and structural biology levels. We have extended the Integrated Genome Browser (IGB) to facilitate these tasks. The graphical user interface greatly simplifies building disease networks and zooming in at atomic resolution to identify variations in molecular complexes that may affect molecular interactions in the context of genomic data. All results are summarized in genome tracks and can be visualized and analyzed at the transcript level. The MI Bundle is a plugin for the IGB. The plugin, help, video and tutorial are available at http://cru.genomics.iit.it/igbmibundle/ and https://github.com/CRUiit/igb-mi-bundle/wiki. The source code is released under the Apache License, Version 2. Contact: arnaud.ceol@iit.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  4. CCor: A whole genome network-based similarity measure between two genes.

    Science.gov (United States)

    Hu, Yiming; Zhao, Hongyu

    2016-12-01

    Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.
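
    For reference, the pairwise baseline named in this abstract, the Pearson correlation coefficient, can be sketched as below. The record does not define CCor, so only the baseline is shown; the expression values are hypothetical and the function name is an illustrative choice.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of two genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 8.0]
print(round(pearson(gene_a, gene_b), 3))
```

    This measure uses only the two profiles being compared, which is the pairwise limitation that a network-based measure is meant to address.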

  5. Genomic structure of an economically important cyanobacterium, Arthrospira (Spirulina) platensis NIES-39.

    Science.gov (United States)

    Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki

    2010-04-01

    A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis.

  6. Deep genome-wide measurement of meiotic gene conversion using tetrad analysis in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Yujin Sun

    Full Text Available Gene conversion, the non-reciprocal exchange of genetic information, is one of the potential products of meiotic recombination. It can shape genome structure by acting on repetitive DNA elements, influence allele frequencies at the population level, and is known to be implicated in human disease. But gene conversion is hard to detect directly except in organisms, like fungi, that group their gametes following meiosis. We have developed a novel visual assay that enables us to detect gene conversion events directly in the gametes of the flowering plant Arabidopsis thaliana. Using this assay we measured gene conversion events across the genome of more than one million meioses and determined that the genome-wide average frequency is 3.5×10(-4) conversions per locus per meiosis. We also detected significant locus-to-locus variation in conversion frequency but no intra-locus variation. Significantly, we found one locus on the short arm of chromosome 4 that experienced 3-fold to 6-fold more gene conversions than the other loci tested. Finally, we demonstrated that we could modulate conversion frequency by varying experimental conditions.
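
    To make the unit "conversions per locus per meiosis" concrete, the sketch below computes a plain average frequency as events divided by (loci × meioses). This assumes every locus is scored in every meiosis, which may not match the authors' actual estimator. The function name `conversion_frequency` and all counts are hypothetical and are not taken from the study.

```python
def conversion_frequency(events, loci, meioses):
    """Average number of conversion events per locus per meiosis.

    Assumes every locus was scored in every meiosis (a simplification).
    """
    if loci <= 0 or meioses <= 0:
        raise ValueError("loci and meioses must be positive")
    if events < 0:
        raise ValueError("events must be non-negative")
    return events / (loci * meioses)


# Hypothetical counts, for illustration only: 12 events observed
# at 4 loci scored across 10,000 meioses.
freq = conversion_frequency(events=12, loci=4, meioses=10_000)
print(f"{freq:.1e} conversions per locus per meiosis")
```

    Because the denominator multiplies loci by meioses, rare events of this kind only become measurable when very many meioses are scored, which is consistent with the scale of the assay described above.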

  7. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  8. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Science.gov (United States)

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; 
    Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  9. Genome-wide identification of KANADI1 target genes.

    Directory of Open Access Journals (Sweden)

    Paz Merelo

    Full Text Available Plant organ development and polarity establishment is mediated by the action of several transcription factors. Among these, the KANADI (KAN) subclade of the GARP protein family plays important roles in polarity-associated processes during embryo, shoot and root patterning. In this study, we have identified a set of potential direct target genes of KAN1 through a combination of chromatin immunoprecipitation/DNA sequencing (ChIP-Seq) and genome-wide transcriptional profiling using tiling arrays. Target genes are over-represented for genes involved in the regulation of organ development as well as in the response to auxin. KAN1 affects directly the expression of several genes previously shown to be important in the establishment of polarity during lateral organ and vascular tissue development. We also show that KAN1 controls through its target genes auxin effects on organ development at different levels: transport and its regulation, and signaling. In addition, KAN1 regulates genes involved in the response to abscisic acid, jasmonic acid, brassinosteroids, ethylene, cytokinins and gibberellins. The role of KAN1 in organ polarity is antagonized by HD-ZIPIII transcription factors, including REVOLUTA (REV). A comparison of their target genes reveals that the REV/KAN1 module acts in organ patterning through opposite regulation of shared targets. Evidence of mutual repression between closely related family members is also shown.

  10. Genetics and Genomics of Single-Gene Cardiovascular Diseases : Common Hereditary Cardiomyopathies as Prototypes of Single-Gene Disorders

    NARCIS (Netherlands)

    Marian, Ali J; van Rooij, Eva; Roberts, Robert

    2016-01-01

    This is the first of 2 review papers on genetics and genomics appearing as part of the series on "omics." Genomics pertains to all components of an organism's genes, whereas genetics involves analysis of a specific gene or genes in the context of heredity. The paper provides introductory comments,

  11. Genetics and Genomics of Single-Gene Cardiovascular Diseases : Common Hereditary Cardiomyopathies as Prototypes of Single-Gene Disorders

    NARCIS (Netherlands)

    Marian, Ali J.; van Rooij, Eva; Roberts, Robert

    2016-01-01

    This is the first of 2 review papers on genetics and genomics appearing as part of the series on “omics.” Genomics pertains to all components of an organism's genes, whereas genetics involves analysis of a specific gene or genes in the context of heredity. The paper provides introductory comments,

  12. Regulation of disease-associated gene expression in the 3D genome.

    Science.gov (United States)

    Krijger, Peter Hugo Lodewijk; de Laat, Wouter

    2016-12-01

    Genetic variation associated with disease often appears in non-coding parts of the genome. Understanding the mechanisms by which this phenomenon leads to disease is necessary to translate results from genetic association studies to the clinic. Assigning function to this type of variation is notoriously difficult because the human genome harbours a complex regulatory landscape with a dizzying array of transcriptional regulatory sequences, such as enhancers that have unpredictable, promiscuous and context-dependent behaviour. In this Review, we discuss how technological advances have provided increasingly detailed information on genome folding; for example, genome folding forms loops that bring enhancers and target genes into close proximity. We also now know that enhancers function within topologically associated domains, which are structural and functional units of chromosomes. Studying disease-associated mutations and chromosomal rearrangements in the context of the 3D genome will enable the identification of dysregulated target genes and aid the progression from descriptive genetic association results to discovering molecular mechanisms underlying disease.

  13. Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes.

    Directory of Open Access Journals (Sweden)

    María José Aranzana

    2005-11-01

    Full Text Available There is currently tremendous interest in the possibility of using genome-wide association mapping to identify genes responsible for natural variation, particularly for human disease susceptibility. The model plant Arabidopsis thaliana is in many ways an ideal candidate for such studies, because it is a highly selfing hermaphrodite. As a result, the species largely exists as a collection of naturally occurring inbred lines, or accessions, which can be genotyped once and phenotyped repeatedly. Furthermore, linkage disequilibrium in such a species will be much more extensive than in a comparable outcrossing species. We tested the feasibility of genome-wide association mapping in A. thaliana by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, we were able to identify known major genes for all phenotypes tested, thus demonstrating the potential of genome-wide association mapping in A. thaliana and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rate. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of an appropriate genomic control in association studies.

  14. Genome size diversity in angiosperms and its influence on gene space.

    Science.gov (United States)

    Dodsworth, Steven; Leitch, Andrew R; Leitch, Ilia J

    2015-12-01

    Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C=5.7Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as 'junk' DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Microcollinearity in an ethylene receptor coding gene region of the Coffea canephora genome is extensively conserved with Vitis vinifera and other distant dicotyledonous sequenced genomes

    Directory of Open Access Journals (Sweden)

    Campa Claudine

    2009-02-01

    Full Text Available Background: Coffea canephora, also called Robusta, belongs to the Rubiaceae, the fourth largest angiosperm family. This diploid species (2x = 2n = 22) has a fairly small genome size of ≈ 690 Mb and, despite its extreme economic importance, particularly for developing countries, knowledge of its genome composition, structure and evolution remains very limited. Here, we report the 160 kb of the first C. canephora Bacterial Artificial Chromosome (BAC) clone ever sequenced and its fine analysis. Results: This clone contains the CcEIN4 gene, encoding an ethylene receptor, and twenty other predicted genes showing a high gene density of one gene per 7.8 kb. Most of them display perfect matches with C. canephora expressed sequence tags or show transcriptional activities through PCR amplifications on cDNA libraries. Twenty-three transposable elements, mainly Class II transposon derivatives, were identified at this locus. Most of these Class II elements are Miniature Inverted-repeat Transposable Elements (MITEs) known to be closely associated with plant genes. This BAC composition gives a pattern similar to those found in gene-rich regions of Solanum lycopersicum and Medicago truncatula genomes, indicating that the CcEIN4 region may belong to a gene-rich region in the C. canephora genome. Comparative sequence analysis indicated an extensive conservation between C. canephora and most of the reference dicotyledonous genomes studied in this work, such as tomato (S. lycopersicum), grapevine (V. vinifera), barrel medic (M. truncatula), black cottonwood (Populus trichocarpa) and Arabidopsis thaliana. The highest degree of microcollinearity was found between C. canephora and V. vinifera, which belong respectively to the Asterids and Rosids, two clades that diverged more than 114 million years ago. Conclusion: This study provides a first glimpse of C. canephora genome composition and evolution. Our data revealed a remarkable conservation of the microcollinearity

  16. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Changwei Bi

    2016-09-01

    Full Text Available WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which can be found in the promoter region of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, plenty of WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole genome sequencing of Salix suchowensis, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (groups I–III), with five subgroups (IIa–IIe) in group II. With multiple sequence alignment and manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon–intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. RNA sequencing data revealed distinct and diverse expression patterns of SsWRKY genes among five tissues, including tender roots, young leaves, vegetative buds, non-lignified stems and barks. With the analyses of WRKY gene family in willow, it is not only beneficial to complete the functional and annotation information of WRKY gene family in woody plants, but also provide important references to investigate the

  17. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  18. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  19. Genomic structure of metabotropic glutamate receptor 7 and comparison of genomic structures of extracellular domains of mGluR family

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Metabotropic glutamate receptor 7 (mGluR7), coupled with the chemical neurotransmitter L-glutamate, plays an important role in the development of many psychiatric and neurological disorders. To study the biological and genetic mechanism of mGluR7-related diseases, a physical map covering the full-length mGluR7 genomic sequence has been constructed through seed clone screening and fingerprinting database searching. The BAC clones in the physical map have been sequenced with a shotgun strategy and assembled with the Phred-Phrap-Consed software; the error rate of the final genomic sequence is less than 0.01%. mGluR7 spans an 880 kb genomic region; the GC content and repeat content of the mGluR7 genomic sequence are 38% and 37.5%, respectively. mGluR7 has a typical "house-keeping" promoter and consists of 11 exons, with introns ranging from 6 kb to 285 kb. mGluR7a and mGluR7b are two known alternatively spliced variants. Comparing the genomic structures of the extracellular domains of the mGluR family, the structures can be subdivided into three groups, which are consistent with the grouping of the proteins. Although the genomic organization of mGluR7's group is conserved, the majority of introns in the extracellular segments vary dramatically. There is an obvious trend of intron size increasing in inverse proportion to phylogenetic time. Variation in genomic structure is higher than that in protein structure, which is attributed to species-characteristic regulation of gene expression.
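
    The GC content quoted in this abstract is a simple composition statistic. As a minimal sketch of how such a percentage is typically computed, the function below counts G and C among the unambiguous bases of a sequence. The function name `gc_content`, the choice to skip ambiguous bases such as N, and the example sequence are assumptions for illustration; the abstract does not describe the authors' procedure.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are skipped; this is an assumed convention.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Made-up sequence for illustration only (not mGluR7 sequence).
example = "ATGCGCATTANNAT"
print(f"GC content: {gc_content(example):.1f}%")
```

    Repeat content is usually reported the same way, as the fraction of bases falling inside annotated repeats, but it requires a repeat annotation rather than the raw sequence alone.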

  20. Computational prediction of microRNA genes in silkworm genome

    Institute of Scientific and Technical Information of China (English)

    TONG Chuan-zhou; JIN Yong-feng; ZHANG Yao-zhou

    2006-01-01

    MicroRNAs (miRNAs) constitute a novel, extensive class of small RNAs (~21 nucleotides), and play important gene-regulation roles during growth and development in various organisms. Here we conducted a homology search to identify homologs of previously validated miRNAs in the silkworm genome. We identified 24 potential miRNA genes, and gave each of them a name according to the common criteria. Interestingly, we found that a great number of the newly identified miRNAs were conserved in silkworm and Drosophila, and family alignment revealed that miRNA families might possess single nucleotide polymorphisms. miRNA gene clusters and possible functions of complement miRNA pairs are discussed.

  1. Genomic and gene variation in Mycoplasma hominis strains

    DEFF Research Database (Denmark)

    Christiansen, Gunna; Andersen, H; Birkelund, Svend

    1987-01-01

    DNAs from 14 strains of Mycoplasma hominis isolated from various habitats, including strain PG21, were analyzed for genomic heterogeneity. DNA-DNA filter hybridization values were from 51 to 91%. Restriction endonuclease digestion patterns, analyzed by agarose gel electrophoresis, revealed no identity or cluster formation between strains. Variation within M. hominis rRNA genes was analyzed by Southern hybridization of EcoRI-cleaved DNA hybridized with a cloned fragment of the rRNA gene from the mycoplasma strain PG50. Five of the M. hominis strains showed identical hybridization patterns. These hybridization patterns were compared with those of 12 other mycoplasma species, which showed a much more complex band pattern. Cloned nonribosomal RNA gene fragments of M. hominis PG21 DNA were analyzed, and the fragments were used to demonstrate heterogeneity among the strains. A monoclonal antibody against...

  2. Genome-Wide Identification and Functional Classification of Tomato (Solanum lycopersicum) Aldehyde Dehydrogenase (ALDH) Gene Superfamily.

    Science.gov (United States)

    Jimenez-Lopez, Jose C; Lopez-Valverde, Francisco J; Robles-Bolivar, Paula; Lima-Cabello, Elena; Gachomo, Emma W; Kotchoni, Simeon O

    2016-01-01

    Aldehyde dehydrogenases (ALDHs) are a protein superfamily that catalyzes the oxidation of aldehyde molecules into their corresponding non-toxic carboxylic acids and responds to different environmental stresses, offering promising genetic approaches for improving plant adaptation. The aim of the current study is the systematic identification and functional analysis of the S. lycopersicum ALDH gene superfamily. We performed genome-based identification and functional classification of ALDH genes, together with analyses of phylogenetic relationships, structure and catalytic domains, and microarray-based gene expression. Twenty-nine unique tomato ALDH sequences encoding 11 ALDH families were identified, including a unique member of the family 19 ALDH. Phylogenetic analysis revealed 13 groups, with a conserved relationship among ALDH families. Functional structure analysis of ALDH2 showed a catalytic mechanism involving a Cys-Glu couple. However, the analysis of ALDH3 showed no functional gene duplication or potential neo-functionalities. Gene expression analysis reveals that particular ALDH genes, such as ALDH2B7, might respond to wounding stress by increasing their expression. Overall, this study reveals the complexity of the S. lycopersicum ALDH gene superfamily and offers new insights into the structure-functional features and evolution of ALDH gene families in vascular plants. The functional characterization of ALDHs is valuable for promoting molecular breeding in tomato for the improvement of stress tolerance and signaling.

  3. Genome-wide identification and phylogenetic analysis of the ERF gene family in cucumbers

    Directory of Open Access Journals (Sweden)

    Lifang Hu

    2011-01-01

    Full Text Available Members of the ERF transcription-factor family participate in a number of biological processes, viz., responses to hormones, adaptation to biotic and abiotic stress, metabolism regulation, beneficial symbiotic interactions, cell differentiation and developmental processes. So far, no tissue-expression profile of any cucumber ERF protein has been reported in detail. Recent completion of the cucumber full-genome sequence has come to facilitate, not only genome-wide analysis of ERF family members in cucumbers themselves, but also a comparative analysis with those in Arabidopsis and rice. In this study, 103 hypothetical ERF family genes in the cucumber genome were identified, phylogenetic analysis indicating their classification into 10 groups, designated I to X. Motif analysis further indicated that most of the conserved motifs outside the AP2/ERF domain, are selectively distributed among the specific clades in the phylogenetic tree. From chromosomal localization and genome distribution analysis, it appears that tandem-duplication may have contributed to CsERF gene expansion. Intron/exon structure analysis indicated that a few CsERFs still conserved the former intron-position patterns existent in the common ancestor of monocots and eudicots. Expression analysis revealed the widespread distribution of the cucumber ERF gene family within plant tissues, thereby implying the probability of their performing various roles therein. Furthermore, members of some groups presented mutually similar expression patterns that might be related to their phylogenetic groups.

  4. Coevolution of aah: A dps-Like Gene with the Host Bacterium Revealed by Comparative Genomic Analysis

    Directory of Open Access Journals (Sweden)

    Liyan Ping

    2012-01-01

    Full Text Available A protein named AAH was isolated from the bacterium Microbacterium arborescens SE14, a gut commensal of lepidopteran larvae. It showed not only a high sequence similarity to Dps-like proteins (DNA-binding proteins from starved cells) but also reversible hydrolase activity. A comparative genomic analysis was performed to gain more insights into its evolution. The GC profile of the aah gene indicated that it evolved from a low-GC ancestor. Its stop codon usage was also different from the general pattern of Actinobacterial genomes. The phylogeny of dps-like proteins showed strong correlation with the phylogeny of host bacteria. A conserved genomic synteny was identified in some taxonomically related Actinobacteria, suggesting that the ancestor genes had been incorporated into the genome before the divergence of Micrococcineae from other families. The aah gene had evolved a new function but still retained the typical dodecameric structure.

  5. Genomic and gene expression signature of the pre-invasive testicular carcinoma in situ

    DEFF Research Database (Denmark)

    Almstrup, Kristian; Ottesen, Anne Marie; Sonne, Si Brask

    2005-01-01

    on the pre-invasive CIS and its possible fetal origin by reviewing recent data originating from DNA microarrays and comparative genomic hybridisations. A comparison of gene expression and genomic aberrations reveal chromosomal "hot spots" with mutual clustering of gene expression and genomic amplification...

  6. Three-dimensional Structure of a Viral Genome-delivery Portal Vertex

    Energy Technology Data Exchange (ETDEWEB)

    A Olia; P Prevelige Jr.; J Johnson; G Cingolani

    2011-12-31

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here, we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25-Å-resolution structure of the portal-protein core bound to 12 copies of gene product 4 (gp4) reveals a ~1.1-MDa assembly formed by 24 proteins. Unexpectedly, a lower-resolution structure of the full-length portal protein unveils the unique topology of the C-terminal domain, which forms a ~200-Å-long α-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell.

  7. Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes.

    Science.gov (United States)

    Zhou, T; Wang, Y; Chen, J-Q; Araki, H; Jing, Z; Jiang, K; Shen, J; Tian, D

    2004-05-01

    A complete set of candidate disease resistance (R) genes encoding nucleotide-binding sites (NBSs) was identified in the genome sequence of japonica rice (Oryza sativa L. var. Nipponbare). These putative R genes were characterized with respect to structural diversity, phylogenetic relationships and chromosomal distribution, and compared with those in Arabidopsis thaliana. We found 535 NBS-coding sequences, including 480 non-TIR (Toll/IL-1 receptor) NBS-LRR (Leucine Rich Repeat) genes. TIR NBS-LRR genes, which are common in A. thaliana, have not been identified in the rice genome. The number of non-TIR NBS-LRR genes in rice is 8.7 times higher than that in A. thaliana, and they account for about 1% of all predicted ORFs in the rice genome. Some 76% of the NBS genes were located in 44 gene clusters or in 57 tandem arrays, and 16 apparent gene duplications were detected in these regions. Phylogenetic analyses based on both NBS and N-terminal regions classified the genes into about 200 groups, but no deep clades were detected, in contrast to the two distinct clusters found in A. thaliana. The structural and genetic diversity that exists among NBS-LRR proteins in rice is remarkable, and suggests that diversifying selection has played an important role in the evolution of R genes in this agronomically important species. (Supplemental material is available online at http://gattaca.nju.edu.cn.)

  8. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    Science.gov (United States)

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  9. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling.

    Science.gov (United States)

    Inoue, Jun; Sato, Yukuto; Sinclair, Robert; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-12-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post-teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70-80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis.

  10. Human bZIP transcription factor gene NRL: structure, genomic sequence, and fine linkage mapping at 14q11.2 and negative mutation analysis in patients with retinal degeneration.

    Science.gov (United States)

    Farjo, Q; Jackson, A; Pieke-Dahl, S; Scott, K; Kimberling, W J; Sieving, P A; Richards, J E; Swaroop, A

    1997-10-15

    The NRL gene encodes an evolutionarily conserved basic motif-leucine zipper transcription factor that is implicated in regulating the expression of the photoreceptor-specific gene rhodopsin. NRL is expressed in postmitotic neuronal cells and in lens during embryonic development, but exhibits a retina-specific pattern of expression in the adult. To understand regulation of NRL expression and to investigate its possible involvement in retinopathies, we have determined the complete sequence of the human NRL gene, identified a polymorphic (CA)n repeat (identical to D14S64) within the NRL-containing cosmid, and refined its location by linkage analysis. Since a locus for autosomal recessive retinitis pigmentosa (arRP) has been linked to markers at 14q11 and since mutations in rhodopsin can lead to RP, we sequenced genomic PCR products of the NRL gene and of the rhodopsin-Nrl response element from a panel of patients representing independent families with inherited retinal degeneration. The analysis did not reveal any causative mutations in this group of patients. These investigations provide the basis for delineating the DNA sequence elements that regulate NRL expression in distinct neuronal cell types and should assist in the analysis of NRL as a candidate gene for inherited diseases/syndromes affecting visual function. Copyright 1997 Academic Press.

  11. Evolutionary genomics and adaptive evolution of the Hedgehog gene family (Shh, Ihh and Dhh) in vertebrates.

    Science.gov (United States)

    Pereira, Joana; Johnson, Warren E; O'Brien, Stephen J; Jarvis, Erich D; Zhang, Guojie; Gilbert, M Thomas P; Vasconcelos, Vitor; Antunes, Agostinho

    2014-01-01

    The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While only one Hh gene is found typically in invertebrate genomes, most vertebrate species have three (Sonic hedgehog--Shh; Indian hedgehog--Ihh; and Desert hedgehog--Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of avian Hh-coding sequences in publicly available databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find suggestions that following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and under different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights on the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and on the location of mutational hotspots.

  12. Evolutionary genomics and adaptive evolution of the Hedgehog gene family (Shh, Ihh and Dhh) in vertebrates.

    Directory of Open Access Journals (Sweden)

    Joana Pereira

    Full Text Available The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While only one Hh gene is found typically in invertebrate genomes, most vertebrate species have three (Sonic hedgehog--Shh; Indian hedgehog--Ihh; and Desert hedgehog--Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of avian Hh-coding sequences in publicly available databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find suggestions that following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and under different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights on the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and on the location of mutational hotspots.

  13. Evolutionary Genomics and Adaptive Evolution of the Hedgehog Gene Family (Shh, Ihh and Dhh) in Vertebrates

    Science.gov (United States)

    Pereira, Joana; Johnson, Warren E.; O’Brien, Stephen J.; Jarvis, Erich D.; Zhang, Guojie; Gilbert, M. Thomas P.; Vasconcelos, Vitor; Antunes, Agostinho

    2014-01-01

    The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While only one Hh gene is found typically in invertebrate genomes, most vertebrate species have three (Sonic hedgehog – Shh; Indian hedgehog – Ihh; and Desert hedgehog – Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of avian Hh-coding sequences in publicly available databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find suggestions that following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and under different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights on the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and on the location of mutational hotspots. PMID:25549322

  14. Genome-enabled Discovery of Carbon Sequestration Genes

    Energy Technology Data Exchange (ETDEWEB)

    Tuskan, Gerald A [ORNL; Tschaplinski, Timothy J [ORNL; Kalluri, Udaya C [ORNL; Yin, Tongming [ORNL; Yang, Xiaohan [ORNL; Zhang, Xinye [ORNL; Engle, Nancy L [ORNL; Ranjan, Priya [ORNL; Basu, Manojit M [ORNL; Gunter, Lee E [ORNL; Jawdy, Sara [ORNL; Martin, Madhavi Z [ORNL; Campbell, Alina S [ORNL; DiFazio, Stephen P [ORNL; Davis, John M [University of Florida; Hinchee, Maud [ORNL; Pinnacchio, Christa [U.S. Department of Energy, Joint Genome Institute; Meilan, R [Purdue University; Busov, V. [Michigan Technological University; Strauss, S [Oregon State University

    2009-01-01

    The fate of carbon below ground is likely to be a major factor determining the success of carbon sequestration strategies involving plants. Despite their importance, molecular processes controlling belowground C allocation and partitioning are poorly understood. This project is leveraging the Populus trichocarpa genome sequence to discover genes important to C sequestration in plants and soils. The focus is on the identification of genes that provide key control points for the flow and chemical transformations of carbon in roots, concentrating on genes that control the synthesis of chemical forms of carbon that result in slower turnover rates of soil organic matter (i.e., increased recalcitrance). We propose to enhance carbon allocation and partitioning to roots by 1) modifying the auxin signaling pathway, and the invertase family, which controls sucrose metabolism, and by 2) increasing root proliferation through transgenesis with genes known to control fine root proliferation (e.g., ANT), 3) increasing the production of recalcitrant C metabolites by identifying genes controlling secondary C metabolism by a major mQTL-based gene discovery effort, and 4) increasing aboveground productivity by enhancing drought tolerance to achieve maximum C sequestration. This broad, integrated approach is aimed at ultimately enhancing root biomass as well as root detritus longevity, providing the best prospects for significant enhancement of belowground C sequestration.

  15. Genome Diversification Mechanism of Rodent and Lagomorpha Chemokine Genes

    Directory of Open Access Journals (Sweden)

    Kanako Shibata

    2013-01-01

    Full Text Available Chemokines are a large family of small cytokines that are involved in host defence and body homeostasis through recruitment of cells expressing their receptors. Their genes are known to undergo rapid evolution. Therefore, the number and content of chemokine genes can be quite diverse among the different species, making the orthologous relationships often ambiguous even between closely related species. Given that rodents and rabbit are useful experimental models in medicine and drug development, we have deduced the chemokine genes from the genome sequences of several rodent species and rabbit and compared them with those of human and mouse to determine the orthologous relationships. The interspecies differences should be taken into consideration when experimental results from animal models are extrapolated into humans. The chemokine gene lists and their orthologous relationships presented here will be useful for studies using these animal models. Our analysis also enables us to reconstruct possible gene duplication processes that generated the different sets of chemokine genes in these species.

  16. Genome-Wide Analysis of the Aquaporin Gene Family in Chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Deokar, Amit A; Tar'an, Bunyamin

    2016-01-01

    Aquaporins (AQPs) are essential membrane proteins that play a critical role in the transport of water and many other solutes across cell membranes. In this study, a comprehensive genome-wide analysis identified 40 AQP genes in chickpea (Cicer arietinum L.). A complete overview of the chickpea AQP (CaAQP) gene family is presented, including their chromosomal locations, gene structure, phylogeny, gene duplication, conserved functional motifs, gene expression, and conserved promoter motifs. To understand AQP evolution, a comparative analysis of chickpea AQPs with AQP orthologs from soybean, Medicago, common bean, and Arabidopsis was performed. The chickpea AQP genes were found on all of the chickpea chromosomes, except chromosome 7, with a maximum of six genes on chromosome 6, and a minimum of one gene on chromosome 5. Gene duplication analysis indicated that the expansion of the chickpea AQP gene family might have been due to segmental and tandem duplications. CaAQPs were grouped into four subfamilies including 15 NOD26-like intrinsic proteins (NIPs), 13 tonoplast intrinsic proteins (TIPs), eight plasma membrane intrinsic proteins (PIPs), and four small basic intrinsic proteins (SIPs) based on sequence similarities and phylogenetic position. Gene structure analysis revealed a highly conserved exon-intron pattern within CaAQP subfamilies supporting the CaAQP family classification. Functional prediction based on conserved Ar/R selectivity filters, Froger's residues, and specificity-determining positions suggested wide differences in substrate specificity among the subfamilies of CaAQPs. Expression analysis of the AQP genes indicated that some of the genes are tissue-specific, whereas a few other AQP genes showed differential expression in response to biotic and abiotic stresses. Promoter profiling of CaAQP genes for conserved cis-acting regulatory elements revealed enrichment of cis-elements involved in circadian control, light response, defense and stress responsiveness

  17. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

    Directory of Open Access Journals (Sweden)

    Senanu M Spring-Pearson

    Full Text Available The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.

  18. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

    Science.gov (United States)

    Spring-Pearson, Senanu M; Stone, Joshua K; Doyle, Adina; Allender, Christopher J; Okinaka, Richard T; Mayo, Mark; Broomall, Stacey M; Hill, Jessica M; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; McNew, Lauren A; Rosenzweig, C Nicole; Gibbons, Henry S; Currie, Bart J; Wagner, David M; Keim, Paul; Tuanyok, Apichai

    2015-01-01

    The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.
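    The core/accessory split and the "new genes per added genome" count described in this abstract can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name, the input format, and the toy isolates are hypothetical and are not taken from the study.

```python
def pangenome_summary(genomes):
    """Summarize a toy pangenome.

    `genomes` maps a genome name to the set of gene-family identifiers it
    carries (a hypothetical input format, not the one used in the study).
    """
    pan = set()          # every family seen so far
    core = None          # families present in every genome seen so far
    new_per_genome = []  # families first seen when each genome is added

    for families in genomes.values():
        families = set(families)
        new_per_genome.append(len(families - pan))
        pan |= families
        core = families if core is None else core & families

    core = core or set()
    return {"core": core, "accessory": pan - core, "new_per_genome": new_per_genome}


# Toy data for illustration only.
toy = {
    "isolate_A": {"f1", "f2", "f3"},
    "isolate_B": {"f1", "f2", "f4"},
    "isolate_C": {"f1", "f5"},
}
summary = pangenome_summary(toy)
print(sorted(summary["core"]))    # families shared by all three toy isolates
print(summary["new_per_genome"])  # families first contributed by each isolate in turn
```

    In this picture a pangenome looks 'open' when the per-genome count of new families does not fall to zero as more genomes are added. The count depends on the order in which genomes are added, so real analyses typically average over many orderings.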

  19. Census of solo LuxR genes in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Sanjarbek Hudaiberdiev

    2015-03-01

    Full Text Available luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or can respond to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation.
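    The notion of a solo luxR (a luxR gene with no luxI gene in its vicinity) can be sketched as a simple neighborhood test. The sketch assumes genes are already annotated and ordered along each replicon; the `Gene` record, the `max_gap` threshold, and the toy annotation are hypothetical, since the abstract does not state how "vicinity" was defined in the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str      # hypothetical family label, e.g. "luxR" or "luxI"
    replicon: str  # chromosome or plasmid identifier
    index: int     # ordinal position of the gene along its replicon


def find_solo_luxr(genes, max_gap=1):
    """Return luxR genes with no luxI within `max_gap` gene positions.

    `max_gap` is an assumed definition of "vicinity", chosen for illustration.
    """
    luxi_positions = {}
    for g in genes:
        if g.name == "luxI":
            luxi_positions.setdefault(g.replicon, []).append(g.index)

    solos = []
    for g in genes:
        if g.name != "luxR":
            continue
        neighbours = luxi_positions.get(g.replicon, [])
        if not any(abs(g.index - p) <= max_gap for p in neighbours):
            solos.append(g)
    return solos


# Toy annotation for illustration only.
annotation = [
    Gene("luxR", "chr1", 10),
    Gene("luxI", "chr1", 11),     # adjacent to the luxR at position 10
    Gene("luxR", "chr1", 50),     # no luxI nearby on chr1: solo
    Gene("luxR", "plasmid", 11),  # luxI sits on a different replicon: solo
]
solos = find_solo_luxr(annotation)
print([(g.replicon, g.index) for g in solos])

# Fraction of solos reported in the abstract: 2698 of 3550 luxR genes.
print(round(2698 / 3550, 2))
```

    Widening `max_gap` makes the test stricter about what counts as a solo, so any reported solo count is tied to the vicinity definition used.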

  20. Localizing F(ST) outliers on a QTL map reveals evidence for large genomic regions of reduced gene exchange during speciation-with-gene-flow.

    Science.gov (United States)

    Via, Sara; Conte, Gina; Mason-Foley, Casey; Mills, Kelly

    2012-11-01

    Populations that maintain phenotypic divergence in sympatry typically show a mosaic pattern of genomic divergence, requiring a corresponding mosaic of genomic isolation (reduced gene flow). However, mechanisms that could produce the genomic isolation required for divergence-with-gene-flow have barely been explored, apart from the traditional localized effects of selection and reduced recombination near centromeres or inversions. By localizing F(ST) outliers from a genome scan of wild pea aphid host races on a Quantitative Trait Locus (QTL) map of key traits, we test the hypothesis that between-population recombination and gene exchange are reduced over large 'divergence hitchhiking' (DH) regions. As expected under divergence hitchhiking, our map confirms that QTL and divergent markers cluster together in multiple large genomic regions. Under divergence hitchhiking, the nonoutlier markers within these regions should show signs of reduced gene exchange relative to nonoutlier markers in genomic regions where ongoing gene flow is expected. We use this predicted difference among nonoutliers to perform a critical test of divergence hitchhiking. Results show that nonoutlier markers within clusters of F(ST) outliers and QTL resolve the genetic population structure of the two host races nearly as well as the outliers themselves, while nonoutliers outside DH regions reveal no population structure, as expected if they experience more gene flow. These results provide clear evidence for divergence hitchhiking, a mechanism that may dramatically facilitate the process of speciation-with-gene-flow. They also show the power of integrating genome scans with genetic analyses of the phenotypic traits involved in local adaptation and population divergence. © 2012 Blackwell Publishing Ltd.

  1. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    Science.gov (United States)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  2. Sampling Daphnia's expressed genes: preservation, expansion and invention of crustacean genes with reference to insect genomes

    Directory of Open Access Journals (Sweden)

    Bauer Darren J

    2007-07-01

    Full Text Available Abstract Background: Functional and comparative studies of insect genomes have shed light on the complement of genes, which in part, account for shared morphologies, developmental programs and life-histories. Contrasting the gene inventories of insects to those of the nematodes provides insight into the genomic changes responsible for their diversification. However, nematodes have weak relationships to insects, as each belongs to separate animal phyla. A better outgroup to distinguish lineage specific novelties would include other members of Arthropoda. For example, crustaceans are close allies to the insects (together forming Pancrustacea) and their fascinating aquatic lifestyle provides an important comparison for understanding the genetic basis of adaptations to life on land versus life in water. Results: This study reports on the first characterization of cDNA libraries and sequences for the model crustacean Daphnia pulex. We analyzed 1,546 ESTs of which 1,414 represent approximately 787 nuclear genes, by measuring their sequence similarities with insect and nematode proteomes. The provisional annotation of genes is supported by expression data from microarray studies described in companion papers. Loci expected to be shared between crustaceans and insects because of their mutual biological features are identified, including genes for reproduction, regulation and cellular processes. We identify genes that are likely derived within Pancrustacea or lost within the nematodes. Moreover, lineage specific gene family expansions are identified, which suggest certain biological demands associated with their ecological setting. In particular, up to seven distinct ferritin loci are found in Daphnia compared to three in most insects. Finally, a substantial fraction of the sampled gene transcripts shares no sequence similarity with those from other arthropods. Genes functioning during development and reproduction are comparatively well conserved between

  3. MADS-box gene evolution - structure and transcription patterns

    DEFF Research Database (Denmark)

    Johansen, Bo; Pedersen, Louise Buchholt; Skipper, Martin;

    2002-01-01

    MADS-box genes, ABC model, Evolution, Phylogeny, Transcription patterns, Gene structure, Conserved motifs

  4. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    Science.gov (United States)

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriales genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  5. [Genomic diversity and population structure of Helicobacter pylori isolates in China].

    Science.gov (United States)

    You, Y H; He, L H; Peng, X H; Sun, L; Zhang, J Z

    2016-10-10

    Objective: To learn about the overall genomic characteristics and population structure of Helicobacter pylori isolated in China. Methods: In this study, we used 10 publicly available genome sequences of H. pylori strains isolated in China, combined with other H. pylori sequences from GenBank, to analyze the overall genomic characteristics of H. pylori isolated in China. Core genes and strain-specific genes were determined for further functional definition. Results: A total of 1 203 core genes were found among all sequenced China H. pylori isolates. The number of strain-specific genes ranged from 19 to 32. These genes mainly encode hypothetical proteins which might play an important role in adaptation to different hosts. Genomic variation regions were mainly in genes encoding type four secretion systems and restriction modification systems. All the China isolates belong to the hpEastAsia group, hspEAsia subgroup. Prophage sequences were found in three China H. pylori strains, carrying key elements required for phage assembly. Conclusion: China H. pylori isolates belong to the hpEastAsia group, hspEAsia subgroup, and some isolates contain prophages.

  6. CONTIGuator: a bacterial genomes finishing tool for structural insights on draft genomes

    Directory of Open Access Journals (Sweden)

    Bazzicalupo Marco

    2011-06-01

    Recent developments in sequencing technologies have made it possible to sequence many bacterial genomes with limited cost and labor compared to previous techniques. However, a limiting step of genome sequencing is the finishing process, needed to infer the relative position of each contig and close sequencing gaps. An additional degree of complexity is given by bacterial species harboring more than one replicon, which are not handled by the currently available programs. The availability of a large number of bacterial genomes allows geneticists to use complete genomes (possibly from the same species) as templates for contig mapping. Here we present CONTIGuator, a software tool for mapping contigs onto a reference genome that visualizes a map of contigs, highlights loss and/or gain of genetic elements, and permits the finishing of multipartite genomes. The functionality of CONTIGuator was tested using four genomes, demonstrating improved performance compared to currently available programs. Our approach appears efficient, with a clear visualization, allowing the user to perform comparative structural genomics analysis on draft genomes. CONTIGuator is a Python script for Linux environments that runs on normal desktop machines and can be downloaded from http://contiguator.sourceforge.net.

  7. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Science.gov (United States)

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  8. The vacuolar protein sorting genes in insects: A comparative genome view.

    Science.gov (United States)

    Li, Zhaofei; Blissard, Gary

    2015-07-01

    In eukaryotic cells, regulated vesicular trafficking is critical for directing protein transport and for recycling and degradation of membrane lipids and proteins. Through carefully regulated transport vesicles, the endomembrane system performs a large and important array of dynamic cellular functions while maintaining the integrity of the cellular membrane system. Genetic studies in yeast Saccharomyces cerevisiae have identified approximately 50 vacuolar protein sorting (VPS) genes involved in vesicle trafficking, and most of these genes are also characterized in mammals. The VPS proteins form distinct functional complexes, which include complexes known as ESCRT, retromer, CORVET, HOPS, GARP, and PI3K-III. Little is known about the orthologs of VPS proteins in insects. Here, with the newly annotated Manduca sexta genome, we carried out genomic comparative analysis of VPS proteins in yeast, humans, and 13 sequenced insect genomes representing the Orders Hymenoptera, Diptera, Hemiptera, Phthiraptera, Lepidoptera, and Coleoptera. Amino acid sequence alignments and domain/motif structure analyses reveal that most of the components of ESCRT, retromer, CORVET, HOPS, GARP, and PI3K-III are evolutionarily conserved across yeast, insects, and humans. However, in contrast to the VPS gene expansions observed in the human genome, only four VPS genes (VPS13, VPS16, VPS33, and VPS37) were expanded in the six insect Orders. Additionally, VPS2 was expanded only in species from Phthiraptera, Lepidoptera, and Coleoptera. These studies provide a baseline for understanding the evolution of vesicular trafficking across yeast, insect, and human genomes, and also provide a basis for further addressing specific functional roles of VPS proteins in insects.

  9. Conserved TAAATG sequence at the transcriptional and translational initiation sites of vaccinia virus late genes deduced by structural and functional analysis of the HindIII H genome fragment.

    Science.gov (United States)

    Rosel, J L; Earl, P L; Weir, J P; Moss, B

    1986-11-01

    The sequence of the 8,600-base-pair HindIII H fragment, located at the center of the vaccinia virus genome, was determined to analyze several late genes. Seven major complete open reading frames (ORFs) and two that started from or continued into adjacent DNA segments were identified. ORFs were closely spaced and present on both DNA strands. Some adjacent ORFs had oppositely oriented overlapping termination codons or contiguous stop and start codons. Nucleotide compositional analysis indicated that the A-T frequency was consistently lowest in the first codon position. The sizes of the polypeptides predicted from the DNA sequence were compared with those determined by polyacrylamide gel electrophoresis of cell-free translation products of mRNAs selected by hybridization to cloned single-stranded DNA segments or synthesized in vitro by bacteriophage T7 RNA polymerase. Six transcripts that initiated within the HindIII H DNA fragment were detected, and of these, four were synthesized only at late times, one was synthesized only early, and one was synthesized early and late. The sites on the genome corresponding to the 5' ends of the transcripts were located by high-resolution nuclease S1 analysis. For late genes, the transcriptional and translational initiation sites mapped within a few nucleotides of each other, and in each case the sequence TAAATGG occurred at the start of the ORF. The extremely short leader and the absence of A or G in the -3 position, relative to the first nucleotide of the initiation codon, distinguishes the majority of vaccinia virus late genes from eucaryotic and vaccinia virus early genes.

  10. The complete mitochondrial genome of the house dust mite Dermatophagoides pteronyssinus (Trouessart): a novel gene arrangement among arthropods

    Directory of Open Access Journals (Sweden)

    Vanholme Bartel

    2009-03-01

    Background: The apparent scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). This subclass encompasses over 48,000 species and forms the largest group within the Arachnida. Although mitochondrial genomes are widely utilised for phylogenetic and population genetic studies, only 20 mitochondrial genomes of Acari have been determined, of which only one belongs to the diverse order of the Sarcoptiformes. In this study, we describe the mitochondrial genome of the European house dust mite Dermatophagoides pteronyssinus, the most important member of this largely neglected group. Results: The mitochondrial genome of D. pteronyssinus is a circular DNA molecule of 14,203 bp. It contains the complete set of 37 genes (13 protein coding genes, 2 rRNA genes and 22 tRNA genes) usually present in metazoan mitochondrial genomes. The mitochondrial gene order differs considerably from that of other Acari mitochondrial genomes. Compared to the mitochondrial genome of Limulus polyphemus, considered as the ancestral arthropod pattern, only 11 of the 38 gene boundaries are conserved. The majority strand has a 72.6% AT-content but a GC-skew of 0.194. This skew is the reverse of that normally observed for typical animal mitochondrial genomes. A microsatellite was detected in a large non-coding region (286 bp), which probably functions as the control region. Almost all tRNA genes lack a T-arm, provoking the formation of canonical cloverleaf tRNA-structures, and both rRNA genes are considerably reduced in size. Finally, the genomic sequence was used to perform a phylogenetic study. Both maximum likelihood and Bayesian inference analysis clustered D. pteronyssinus with Steganacarus magnus, forming a sistergroup of the Trombidiformes. Conclusion: Although the mitochondrial genome of D. pteronyssinus shares different features with previously characterised Acari mitochondrial genomes, it is unique in many ways. Gene
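
    The two strand-composition figures quoted in the record above (AT-content and GC-skew) follow standard definitions: AT-content is (A + T) / (A + C + G + T), and GC-skew is (G - C) / (G + C). The sketch below shows the arithmetic on a toy sequence; the function names and the example sequence are illustrative assumptions, and the real 14,203 bp sequence is not reproduced here.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / total


def gc_skew(seq):
    """GC-skew = (G - C) / (G + C); positive values mean G outnumbers C on this strand."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


toy = "ATGGGCATTA"  # 3 A, 3 T, 3 G, 1 C
print(at_content(toy))  # 0.6
print(gc_skew(toy))     # 0.5
```

    Both values are strand-specific, so applying the same functions to the reverse complement flips the sign of the skew while leaving the AT-content unchanged.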

  11. Towards fully automated structure-based function prediction in structural genomics: a case study.

    Science.gov (United States)

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.

  12. The first complete mitochondrial genome sequences of Amblypygi (Chelicerata: Arachnida) reveal conservation of the ancestral arthropod gene order.

    Science.gov (United States)

    Fahrein, Kathrin; Masta, Susan E; Podsiadlowski, Lars

    2009-05-01

    Amblypygi (whip spiders) are terrestrial chelicerates inhabiting the subtropics and tropics. In morphological and rRNA-based phylogenetic analyses, Amblypygi cluster with Uropygi (whip scorpions) and Araneae (spiders) to form the taxon Tetrapulmonata, but there is controversy regarding the interrelationship of these three taxa. Mitochondrial genomes provide an additional large data set of phylogenetic information (sequences, gene order, RNA secondary structure), but in arachnids, mitochondrial genome data are missing for some of the major orders. In the course of an ongoing project concerning arachnid mitochondrial genomics, we present the first two complete mitochondrial genomes from Amblypygi. Both genomes were found to be typical circular duplex DNA molecules with all 37 genes usually present in bilaterian mitochondrial genomes. In both species, gene order is identical to that of Limulus polyphemus (Xiphosura), which is assumed to reflect the putative arthropod ground pattern. All tRNA gene sequences have the potential to fold into structures that are typical of metazoan mitochondrial tRNAs, except for tRNA-Ala, which lacks the D arm in both amblypygids, suggesting the loss of this feature early in amblypygid evolution. Phylogenetic analysis resulted in weak support for Uropygi being the sister group of Amblypygi.

  13. Evolutionary genomics and population structure of Entamoeba histolytica

    Science.gov (United States)

    Das, Koushik; Ganguly, Sandipan

    2014-01-01

    Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Study of genome and evolution of this fascinating parasite will help us to understand the basis of its virulence and explain why, when and how it causes diseases. In this review, we have summarized current knowledge regarding evolutionary genomics of E. histolytica and discussed their association with parasite phenotypes and its differential pathogenic behavior. How genetic diversity reveals parasite population structure has also been discussed. Queries concerning their evolution and population structure which were required to be addressed have also been highlighted. This significantly large amount of genomic data will improve our knowledge about this pathogenic species of Entamoeba. PMID:25505504

  14. The compact Selaginella genome identifies changes in gene content associated with the evolution of vascular plants

    Energy Technology Data Exchange (ETDEWEB)

    Grigoriev, Igor V.; Banks, Jo Ann; Nishiyama, Tomoaki; Hasebe, Mitsuyasu; Bowman, John L.; Gribskov, Michael; dePamphilis, Claude; Albert, Victor A.; Aono, Naoki; Aoyama, Tsuyoshi; Ambrose, Barbara A.; Ashton, Neil W.; Axtell, Michael J.; Barker, Elizabeth; Barker, Michael S.; Bennetzen, Jeffrey L.; Bonawitz, Nicholas D.; Chapple, Clint; Cheng, Chaoyang; Correa, Luiz Gustavo Guedes; Dacre, Michael; DeBarry, Jeremy; Dreyer, Ingo; Elias, Marek; Engstrom, Eric M.; Estelle, Mark; Feng, Liang; Finet, Cedric; Floyd, Sandra K.; Frommer, Wolf B.; Fujita, Tomomichi; Gramzow, Lydia; Gutensohn, Michael; Harholt, Jesper; Hattori, Mitsuru; Heyl, Alexander; Hirai, Tadayoshi; Hiwatashi, Yuji; Ishikawa, Masaki; Iwata, Mineko; Karol, Kenneth G.; Koehler, Barbara; Kolukisaoglu, Uener; Kubo, Minoru; Kurata, Tetsuya; Lalonde, Sylvie; Li, Kejie; Li, Ying; Litt, Amy; Lyons, Eric; Manning, Gerard; Maruyama, Takeshi; Michael, Todd P.; Mikami, Koji; Miyazaki, Saori; Morinaga, Shin-ichi; Murata, Takashi; Mueller-Roeber, Bernd; Nelson, David R.; Obara, Mari; Oguri, Yasuko; Olmstead, Richard G.; Onodera, Naoko; Petersen, Bent Larsen; Pils, Birgit; Prigge, Michael; Rensing, Stefan A.; Riano-Pachon, Diego Mauricio; Roberts, Alison W.; Sato, Yoshikatsu; Scheller, Henrik Vibe; Schulz, Burkhard; Schulz, Christian; Shakirov, Eugene V.; Shibagaki, Nakako; Shinohara, Naoki; Shippen, Dorothy E.; Sorensen, Iben; Sotooka, Ryo; Sugimoto, Nagisa; Sugita, Mamoru; Sumikawa, Naomi; Tanurdzic, Milos; Theilsen, Gunter; Ulvskov, Peter; Wakazuki, Sachiko; Weng, Jing-Ke; Willats, William W.G.T.; Wipf, Daniel; Wolf, Paul G.; Yang, Lixing; Zimmer, Andreas D.; Zhu, Qihui; Mitros, Therese; Hellsten, Uffe; Loque, Dominique; Otillar, Robert; Salamov, Asaf; Schmutz, Jeremy; Shapiro, Harris; Lindquist, Erika; Lucas, Susan; Rokhsar, Daniel

    2011-04-28

    We report the genome sequence of the nonseed vascular plant, Selaginella moellendorffii, and by comparative genomics identify genes that likely played important roles in the early evolution of vascular plants and their subsequent evolution

  15. Comparative analysis of codon usage patterns and identification of predicted highly expressed genes in five Salmonella genomes

    Directory of Open Access Journals (Sweden)

    Mondal U

    2008-01-01

    Purpose: To analyse codon usage patterns of five complete genomes of Salmonella, predict highly expressed genes, examine horizontally transferred pathogenicity-related genes to detect their presence in the strains, and scrutinize the nature of highly expressed genes to infer their lifestyle. Methods: Protein coding genes, ribosomal protein genes, and pathogenicity-related genes were analysed with Codon W and CAI (codon adaptation index) Calculator. Results: Translational efficiency plays a role in codon usage variation in Salmonella genes. Low bias was noticed in most of the genes. GC3 (guanine cytosine at third position) composition does not influence codon usage variation in the genes of these Salmonella strains. Among the clusters of orthologous groups (COGs), translation, ribosomal structure and biogenesis [J], and energy production and conversion [C] contained the highest number of potentially highly expressed (PHX) genes. Correspondence analysis reveals the conserved nature of the genes. Highly expressed genes were detected. Conclusions: Selection for translational efficiency is the major source of variation of codon usage in the genes of Salmonella. Evolution of pathogenicity-related genes as a unit suggests their ability to infect and exist as a pathogen. The presence of many PHX genes in the information storage and processing category of COGs indicated their lifestyle and revealed that they were not subjected to genome reduction.
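
    To make the GC3 quantity in the record above concrete, here is a minimal sketch that computes the fraction of codons with G or C at the third position of a coding sequence. It is an illustration only: the function name and toy sequence are assumptions, it is not the Codon W implementation, and dedicated tools may apply different conventions (for example, excluding some codons from the count).

```python
def gc3(cds):
    """Fraction of codons in an in-frame coding sequence whose third base is G or C."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore any trailing partial codon
    third_bases = [cds[i + 2] for i in range(0, usable, 3)]
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy coding sequence: codons ATG, GCC, AAA, TAA -> third bases G, C, A, A
print(gc3("ATGGCCAAATAA"))  # 0.5
```

    Comparing per-gene GC3 against a codon bias measure across all genes is one simple way to check whether third-position composition tracks codon usage variation.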

  16. Bidirectional promoters of insects: genome-wide comparison, evolutionary implication and influence on gene expression.

    Science.gov (United States)

    Behura, Susanta K; Severson, David W

    2015-01-30

    Bidirectional promoters are widespread in insect genomes. By analyzing 23 insect genomes we show that the frequency of bidirectional gene pairs varies according to genome compactness and density of genes among the species. The density of bidirectional genes expected based on the number of genes per megabase of genome explains the observed density, suggesting that bidirectional pairing of genes may be due to a random event. We identified specific transcription factor binding motifs that are enriched in bidirectional promoters across insect species. Furthermore, we observed that bidirectional promoters may act as transcriptional hotspots in insect genomes where protein coding genes tend to aggregate in significantly biased (p promoters. Natural selection seems to have an association with the extent of bidirectionality of genes among the species. The rate of non-synonymous-to-synonymous changes (dN/dS) shows a second-order polynomial distribution with bidirectionality between species, indicating that bidirectionality is dependent upon evolutionary pressure acting on the genomes. Analysis of genome-wide microarray expression data of multiple insect species suggested that bidirectionality has a similar association with transcriptome variation across species. Furthermore, bidirectional promoters show significant association with correlated expression of the divergent gene pairs depending upon their motif composition. Analysis of gene ontology showed that bidirectional genes tend to have a common association with functions related to "binding" (including ion binding, nucleotide binding and protein binding) across genomes. Such functional constraint of bidirectional genes may explain their widespread persistence in the genomes of diverse insect species.
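
    A bidirectional gene pair is usually taken to mean two adjacent genes on opposite strands that are transcribed away from each other from a short shared upstream region. The sketch below shows one way such pairs could be listed from gene coordinates. The 1,000 bp cutoff, the tuple layout and the function name are assumptions for illustration; the abstract does not state the criteria the authors used.

```python
def bidirectional_pairs(genes, max_gap=1000):
    """List divergent neighbouring gene pairs on one chromosome.

    genes: iterable of (name, start, end, strand) with start < end.
    A pair qualifies when the left gene is on "-" and the right gene on "+",
    so their 5' ends face each other, and the gap between the left gene's end
    and the right gene's start is between 1 and max_gap bp (an assumed cutoff).
    """
    ordered = sorted(genes, key=lambda g: g[1])
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[3] == "-" and right[3] == "+":
            gap = right[1] - left[2]
            if 0 < gap <= max_gap:
                pairs.append((left[0], right[0], gap))
    return pairs


# Toy annotation (hypothetical coordinates).
genes = [
    ("g1", 1000, 2000, "-"),
    ("g2", 2400, 3500, "+"),    # divergent from g1, 400 bp apart
    ("g3", 3600, 4500, "+"),
    ("g4", 9000, 9900, "-"),    # convergent with g3, not counted
    ("g5", 12000, 13000, "+"),  # divergent from g4 but 2,100 bp apart
]

print(bidirectional_pairs(genes))  # [('g1', 'g2', 400)]
```

    Dividing the number of such pairs by genome size in megabases gives a density that can be compared with overall gene density, which is the kind of comparison the abstract describes.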

  17. Genome-wide identification and expression profiling of auxin response factor (ARF) gene family in maize

    Directory of Open Access Journals (Sweden)

    Zhang Yirong

    2011-04-01

    Background: Auxin signaling is vital for plant growth and development, and plays an important role in apical dominance, tropic response, lateral root formation, vascular differentiation, embryo patterning and shoot elongation. Auxin Response Factors (ARFs) are the transcription factors that regulate the expression of auxin-responsive genes. The ARF genes are represented by a large multigene family in plants. The first draft of the full maize genome assembly has recently been released; however, to our knowledge, the ARF gene family from maize (ZmARF genes) has not been characterized in detail. Results: In this study, 31 maize (Zea mays L.) genes that encode ARF proteins were identified in the maize genome. It was shown that maize ARF genes fall into related sister pairs, and chromosomal mapping revealed that duplication of ZmARFs was associated with the chromosomal block duplications. As expected, some duplicated ZmARFs showed a conserved intron/exon structure, whereas others were more divergent, suggesting the possibility of functional diversification for these genes. Out of these 31 ZmARF genes, 14 possess an auxin-responsive element in their promoter region, among which 7 appear to show small or negligible response to exogenous auxin. Eighteen ZmARF genes were predicted to be potential targets of small RNAs. Transgenic analysis revealed that an increased miR167 level could cause degradation of transcripts of six potential targets (ZmARF3, 9, 16, 18, 22 and 30). The expression of maize ARF genes is responsive to exogenous auxin treatment. Dynamic expression patterns of ZmARF genes were observed in different stages of embryo development. Conclusions: The maize ARF gene family is expanded (31 genes) as compared to Arabidopsis (23 genes) and rice (25 genes). The expression of these genes in maize is regulated by auxin and small RNAs. Dynamic expression patterns of ZmARF genes in embryo at different stages were detected which suggest that maize ARF genes may

  18. Genome-Wide Screening of Genes Required for Glycosylphosphatidylinositol Biosynthesis.

    Directory of Open Access Journals (Sweden)

    Yao Rong

    Glycosylphosphatidylinositol (GPI) is synthesized and transferred to proteins in the endoplasmic reticulum (ER). GPI-anchored proteins are then transported from the ER to the plasma membrane through the Golgi apparatus. To date, at least 17 steps have been identified to be required for the GPI biosynthetic pathway. Here, we aimed to establish a comprehensive screening method to identify genes involved in GPI biosynthesis using mammalian haploid screens. Human haploid cells were mutagenized by the integration of gene trap vectors into the genome. Mutagenized cells were then treated with a bacterial pore-forming toxin, aerolysin, which binds to GPI-anchored proteins for targeting to the cell membrane. Cells that showed low surface expression of CD59, a GPI-anchored protein, were further enriched for. Gene trap insertion sites in the non-selected population and in the enriched population were determined by deep sequencing. This screening enriched 23 gene regions among the 26 known GPI biosynthetic genes, which when mutated are expected to decrease the surface expression of GPI-anchored proteins. Our results indicate that the forward genetic approach using haploid cells is a useful and powerful technique to identify factors involved in phenotypes of interest.

  19. Population Structure Analysis of Bull Genomes of European and Western Ancestry

    Science.gov (United States)

    Chung, Neo Christopher; Szyda, Joanna; Frąszczak, Magdalena; Fries, Hans Rudolf; Sandø Lund, Mogens; Guldbrandtsen, Bernt; Boichard, Didier; Stothard, Paul; Veerkamp, Roel; Goddard, Michael; Van Tassell, Curtis P.; Hayes, Ben

    2017-01-01

    Since domestication, population bottlenecks, breed formation, and selective breeding have radically shaped the genealogy and genetics of Bos taurus. In turn, characterization of population structure among diverse bull (males of Bos taurus) genomes enables detailed assessment of genetic resources and origins. By analyzing 432 unrelated bull genomes from 13 breeds and 16 countries, we demonstrate genetic diversity and structural complexity among the European/Western cattle population. Importantly, we relaxed a strong assumption of discrete or admixed population, by adapting latent variable models for individual-specific allele frequencies that directly capture a wide range of complex structure from genome-wide genotypes. As measured by magnitude of differentiation, selection pressure on SNPs within genes is substantially greater than that on intergenic regions. Additionally, broad regions of chromosome 6 harboring largest genetic differentiation suggest positive selection underlying population structure. We carried out gene set analysis using SNP annotations to identify enriched functional categories such as energy-related processes and multiple development stages. Our population structure analysis of bull genomes can support genetic management strategies that capture structural complexity and promote sustainable genetic breadth. PMID:28084449

  20. Comparative genomics of Geobacter chemotaxis genes reveals diverse signaling function

    Directory of Open Access Journals (Sweden)

    Antommattei Frances M

    2008-10-01

    Background: Geobacter species are δ-Proteobacteria and are often the predominant species in a variety of sedimentary environments where Fe(III) reduction is important. Their ability to remediate contaminated environments and produce electricity makes them attractive for further study. Cell motility, biofilm formation, and type IV pili all appear important for the growth of Geobacter in changing environments and for electricity production. Recent studies in other bacteria have demonstrated that signaling pathways homologous to the paradigm established for Escherichia coli chemotaxis can regulate type IV pili-dependent motility, the synthesis of flagella and type IV pili, the production of extracellular matrix material, and biofilm formation. The classification of these pathways by comparative genomics improves the ability to understand how Geobacter thrives in natural environments and to improve their use in microbial fuel cells. Results: The genomes of G. sulfurreducens, G. metallireducens, and G. uraniireducens contain multiple (~70) homologs of chemotaxis genes arranged in several major clusters (six, seven, and seven, respectively). Unlike the single gene cluster of E. coli, the Geobacter clusters are not all located near the flagellar genes. The probable functions of some Geobacter clusters are assignable by homology to known pathways; others appear to be unique to the Geobacter sp. and contain genes of unknown function. We identified large numbers of methyl-accepting chemotaxis protein (MCP) homologs that have diverse sensing domain architectures and generate a potential for sensing a great variety of environmental signals. We discuss mechanisms for class-specific segregation of the MCPs in the cell membrane, which serve to maintain pathway specificity and diminish crosstalk. Finally, the regulation of gene expression in Geobacter differs from E. coli. The sequences of predicted promoter elements suggest that the alternative sigma factors