WorldWideScience

Sample records for gene family annotation

  1. Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists

    Directory of Open Access Journals (Sweden)

    Masseroli Marco

    2007-03-01

    Full Text Available Abstract Background The increasing protein family and domain based annotations constitute important information to understand protein functions and gain insight into relations among their codifying genes. To allow the analysis of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and supports their statistical analysis and mining. Results Exploiting protein information in the Pfam and InterPro databanks, we developed and added to GFINDer original modules specifically devoted to the exploration and analysis of functional signatures of gene protein products. They allow annotating numerous user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, classifying them according to such protein annotation categories, and statistically analyzing the obtained classifications. In particular, when uploaded nucleotide sequence identifiers are subdivided into classes, the Statistics Protein Families&Domains module allows estimating the relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures significantly more represented within user-defined classes of genes. In addition, the Logistic Regression module allows identifying protein functional signatures that better explain the considered gene classification. Conclusion Novel GFINDer modules provide genomic protein family and domain analyses supporting better functional interpretation of gene classes, for instance defined through statistical and clustering analyses of gene expression results from microarray experiments. They can hence help in understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveiling new biomedical knowledge about the codifying genes.
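
    The abstract above does not say which statistical test GFINDer applies when it highlights protein signatures over-represented in a gene class. A common choice for this kind of question is a one-sided hypergeometric test, and the sketch below illustrates only that general idea. The function name and the toy counts are hypothetical and are not taken from GFINDer.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Probability of seeing at least k annotated genes in a class.

    N: total genes uploaded, K: genes carrying the annotation (e.g. one
    Pfam family), n: genes in the user-defined class, k: genes in that
    class carrying the annotation. Assumes 0 <= k and n, K <= N.
    """
    total = comb(N, n)  # number of ways to pick the class from all genes
    upper = min(K, n)   # the class cannot hold more annotated genes than this
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers (illustrative only): 10 genes, 3 annotated, class of 4, 2 hits.
p_value = hypergeom_upper_tail(k=2, N=10, K=3, n=4)
```

    A small p-value would suggest that the annotation is more frequent in the class than random sampling would explain. In practice the test is repeated for every annotation term, so a multiple-testing correction would normally follow.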

  2. Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2007-01-01

    Full Text Available Abstract Background Members of the family Iridoviridae can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their host. In the present study, we describe the re-analysis of the Iridoviridae family of complex DNA viruses using a variety of comparative genomic tools to yield a greater consensus among the annotated sequences of its members. Results A series of genomic sequence comparisons were made among, and between the Ranavirus and Megalocytivirus genera in order to identify novel conserved ORFs. Of these two genera, the Megalocytivirus genomes required the greatest number of altered annotations. Prior to our re-analysis, the Megalocytivirus species orange-spotted grouper iridovirus and rock bream iridovirus shared 99% sequence identity, but only 82 out of 118 potential ORFs were annotated; in contrast, we predict that these species share an identical complement of genes. These annotation changes allowed the redefinition of the group of core genes shared by all iridoviruses. Seven new core genes were identified, bringing the total number to 26. Conclusion Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.

  3. Ixodes scapularis tick serine proteinase inhibitor (serpin) gene family; annotation and transcriptional analysis

    Directory of Open Access Journals (Sweden)

    Chalaire Katelyn C

    2009-05-01

    Full Text Available Abstract Background Serine proteinase inhibitors (Serpins) are a large superfamily of structurally related, but functionally diverse proteins that control essential proteolytic pathways in most branches of life. Given their importance in the biology of many organisms, the idea that ticks might use serpins to evade host defenses makes immunizing against serpins, or disrupting their functions, an appealing option for tick control. Results A sequence homology search strategy has allowed us to identify at least 45 tick serpin genes in the Ixodes scapularis genome that are structurally segregated into 32 intronless and 13 intron-containing genes. Nine of the intron-containing serpins occur in a cluster of 11 genes that span 170 kb of DNA sequence. Based on consensus amino acid residues in the reactive center loop (RCL) and signal peptide scanning, 93% are putatively inhibitory while 82% are putatively extracellular. Among the 11 different amino acid residues that are predicted at the P1 sites, 16 sequences possess basic amino acid (R/K) residues. Temporal and spatial expression analyses revealed that 40 of the 45 serpins are differentially expressed in salivary glands (SG) and/or midguts (MG) of unfed and partially fed ticks. Ten of the 38 serpin genes were expressed from six to 24 hrs of feeding, while six and five genes are predominantly or exclusively expressed in MG and SG, respectively. Conclusion Given the diversity among tick species, sizes of tick serpin families are likely to be variable. However, this study provides insight into the potential sizes of serpin protein families in ticks. Ticks must overcome inflammation, complement activation and blood coagulation to complete feeding. Since these pathways are regulated by serpins that have basic residues at their P1 sites, we speculate that I. scapularis may utilize some of the serpins reported in this study to manipulate host defense. We have discussed our data in the context of

  4. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with many repeats and extensive genome and tandem duplication. Genes encode a wealth of information useful for studying an organism, so high-quality, stable gene annotation is critical. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. Using these very large amounts of sequence data for timely gene annotation or re-annotation requires an automatic pipeline. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, aided by an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, whose annotation is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  5. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

  6. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Science.gov (United States)

    2010-01-01

    Background Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results We present findings from the analysis of the up-dated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information about gene structure and

  7. Annotation, Phylogeny and Expression Analysis of the Nuclear Factor Y Gene Families in Common Bean (Phaseolus vulgaris)

    Directory of Open Access Journals (Sweden)

    Carolina Rípodas

    2015-01-01

    Full Text Available In the past decade, plant nuclear factor Y (NF-Y) genes have gained major interest due to their roles in many biological processes in plant development or adaptation to environmental conditions, particularly in the root nodule symbiosis established between legume plants and nitrogen fixing bacteria. NF-Ys are heterotrimeric transcriptional complexes composed of three subunits, NF-YA, NF-YB and NF-YC, which bind with high affinity and specificity to the CCAAT box, a cis element present in many eukaryotic promoters. In plants, NF-Y subunits consist of gene families with about ten members each. In this study, we have identified and characterized the NF-Y gene families of common bean (Phaseolus vulgaris), a grain legume of worldwide economic importance and the main source of dietary protein in developing countries. Expression analysis showed that some members of each family are up-regulated at early or late stages of the nitrogen fixing symbiotic interaction with its partner Rhizobium etli. We also showed that some genes are differentially accumulated in response to inoculation with high or less efficient R. etli strains, constituting excellent candidates to participate in the strain-specific response during symbiosis. Genes of the NF-YA family exhibit a highly structured intron-exon organization. Moreover, this family is characterized by the presence of upstream ORFs when introns in the 5' UTR are retained and miRNA target sites in their 3' UTR, suggesting that these genes might be subjected to a complex post-transcriptional regulation. Multiple protein alignments indicated the presence of highly conserved domains in each of the NF-Y families, presumably involved in subunit interactions and DNA binding. The analysis presented here constitutes a starting point to understand the regulation and biological function of individual members of the NF-Y families in different developmental processes in this grain legume.

  8. COGNATE: comparative gene annotation characterizer.

    Science.gov (United States)

    Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver

    2017-07-17

    The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy to use standard tool. We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow down-stream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage ( https://www.zfmk.de/en/COGNATE ) and on github ( https

  9. Metagenomic gene annotation by a homology-independent approach

    Energy Technology Data Exchange (ETDEWEB)

    Froula, Jeff; Zhang, Tao; Salmeen, Annette; Hess, Matthias; Kerfeld, Cheryl A.; Wang, Zhong; Du, Changbin

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly-related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As approximately 50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  10. Missing genes in the annotation of prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Feng Wu-chun

    2010-03-01

    Full Text Available Abstract Background Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore, the question arises as to whether current genome annotations are systematically missing small genes. Results We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
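
    The first step described above, collecting every ORF of at least 33 aa, can be sketched as follows. The abstract does not state the authors' ORF definition or their software, so this sketch makes its own assumptions: an ORF runs from an ATG to the next in-frame stop codon, alternative start codons are ignored, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_aa=33):
    """Yield (start, end) for ATG..stop ORFs in the three frames of seq.

    min_aa counts codons from the ATG up to, but not including, the stop.
    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    yield (start, i + 3)
                start = None  # look for the next ORF in this frame


def find_orfs(seq, min_aa=33):
    """Scan both strands; minus-strand coordinates refer to the
    reverse-complemented sequence, not to the original one."""
    seq = seq.upper()
    forward = [("+", s, e) for s, e in orfs_in_strand(seq, min_aa)]
    reverse = [("-", s, e)
               for s, e in orfs_in_strand(reverse_complement(seq), min_aa)]
    return forward + reverse


# Toy sequence: ATG plus 32 alanine codons (33 aa) followed by a stop codon.
example = find_orfs("ATG" + "GCT" * 32 + "TAA")
```

    The later steps of the study (cross-genome comparison and the taxonomic diversity criteria) are not covered by this sketch.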

  11. Discovering gene annotations in biomedical text databases

    Directory of Open Access Journals (Sweden)

    Ozsoyoglu Gultekin

    2008-03-01

    Full Text Available Abstract Background Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns", and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision level of 78% at a recall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieves high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exact matching" with the advantage of locating approximate
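
    The precision and recall figures quoted in this record follow the usual definitions for comparing predicted annotations with a curated reference set. The sketch below shows those definitions on toy data. It is not GEANN's evaluation code, and the gene and term labels are placeholders.

```python
def precision_recall(predicted, gold):
    """Compare predicted (gene, term) pairs against a curated gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder annotations: each pair is (gene, ontology term).
predicted = {("gene1", "term_A"), ("gene1", "term_B"),
             ("gene2", "term_C"), ("gene3", "term_D")}
gold = {("gene1", "term_A"), ("gene2", "term_C"), ("gene2", "term_E")}

precision, recall = precision_recall(predicted, gold)
```

    Here two of the four predictions are correct and two of the three reference annotations are recovered, so precision is 0.5 and recall is 2/3.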

  12. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.

    Science.gov (United States)

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; Wang, Yadong; Rhee, Seung Y; Chen, Jin

    2015-02-14

    Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at http://www.msu.edu/~jinchen/NETSIM .

  13. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL.

    Science.gov (United States)

    Jupp, Simon; Stevens, Robert; Hoehndorf, Robert

    2012-04-24

    Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed. This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of

  14. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example

    Directory of Open Access Journals (Sweden)

    Blanc M

    2005-11-01

    Full Text Available Abstract Background Large-scale, reliable functional annotation of proteins is a major challenge in modern biology. Phylogenetic analyses have been shown to be important for such tasks. However, up to now, phylogenetic annotation did not take into account expression data (i.e. ESTs, microarrays, SAGE, ...). Therefore, integrating such data, like ESTs, in phylogenetic annotation could be a major advance in post-genomic analyses. We developed an approach enabling the combination of expression data and phylogenetic analysis. To illustrate our method, we used an example protein family, the peptidyl arginine deiminases (PADs), probably implicated in rheumatoid arthritis. Results The analysis was performed as follows: we built a phylogeny of PAD proteins from the NCBI's NR protein database. We completed the phylogenetic reconstruction of PADs using an enlarged sequence database containing translations of EST contigs. We then extracted all corresponding expression data contained in the EST database. This analysis allowed us (1) to extend the spectrum of homolog-containing species and to improve the reconstruction of the genes' evolutionary history; (2) to deduce an accurate gene expression pattern for each member of this protein family; and (3) to show a correlation between the evolution rate of paralogous sequences and their pattern of tissue expression. Conclusion Coupling phylogenetic reconstruction and expression data is a promising way of analysis that could be applied to all multigenic families to investigate the relationship between molecular and transcriptional evolution and to improve functional annotation.

  15. A robust data-driven approach for gene ontology annotation

    OpenAIRE

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For th...

  16. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Full Text Available Abstract Background Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results We present the GeneCards Inferred Functionality Score (GIFtS) which allows a quantitative assessment of a gene's annotation status, by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source, and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a

  17. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of the Human Genome Project nears its end, work has begun on discovering novel genes from genome sequences and annotating their biological functions. This article reviews the major bioinformatics tools and technologies currently available for large-scale gene discovery and annotation from human genome sequences. Some ideas about possible future developments are also provided.

  18. Ontology-Based Prediction and Prioritization of Gene Functional Annotations.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2016-01-01

    Genes and their protein products are essential molecular units of a living organism. The knowledge of their functions is key for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. The association of a gene or protein with its functions, described by controlled terms of biomolecular terminologies or ontologies, is named gene functional annotation. Very many and valuable gene annotations expressed through terminologies and ontologies are available. Nevertheless, they might include some erroneous information, since only a subset of annotations are reviewed by curators. Furthermore, they are incomplete by definition, given the rapidly evolving pace of biomolecular knowledge. In this scenario, computational methods that are able to quicken the annotation curation process and reliably suggest new annotations are very important. Here, we first propose a computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations; then, we introduce a new semantic prioritization rule to categorize the predicted annotations by their likelihood of being correct. Our tests and validations proved the effectiveness of our pipeline and prioritization of predicted annotations, by selecting as most likely manifold predicted annotations that were later confirmed.

  19. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. Minimum computer free space required is 2 MB.

  20. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequence errors and retains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  1. Cellular functions of genetically imprinted genes in human and mouse as annotated in the gene ontology.

    Science.gov (United States)

    Hamed, Mohamed; Ismael, Siba; Paulsen, Martina; Helms, Volkhard

    2012-01-01

    By analyzing the cellular functions of genetically imprinted genes as annotated in the Gene Ontology for human and mouse, we found that imprinted genes are often involved in developmental, transport and regulatory processes. In the human, paternally expressed genes are enriched in GO terms related to the development of organs and of anatomical structures. In the mouse, maternally expressed genes regulate cation transport as well as G-protein signaling processes. Furthermore, we investigated whether imprinted genes are regulated by common transcription factors. We identified 25 TF families that showed an enrichment of binding sites in the set of imprinted genes in human and 40 TF families in mouse. In general, maternally and paternally expressed genes are not regulated by different transcription factors. The genes Nnat, Klf14, Blcap, Gnas and Ube3a contribute most to the enrichment of TF families. In the mouse, genes that are maternally expressed in placenta are enriched for AP1 binding sites. In the human, we found that these genes possessed binding sites for both AP1 and SP1.

  2. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. 
Other features include dynamic visualization of

  3. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    Full Text Available With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional

  4. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. From identifying the nearest well annotated homologue of a protein of interest to predicting where misannotation has occurred to knowing how confident you can be in the annotations assigned to those proteins is critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  5. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome...... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms......-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation....

  6. HMM-Based Gene Annotation Methods

    Energy Technology Data Exchange (ETDEWEB)

    Haussler, David; Hughey, Richard; Karplus, Keven

    1999-09-20

    Development of new statistical methods and computational tools to identify genes in human genomic DNA, and to provide clues to their functions by identifying features such as transcription factor binding sites, tissue-specific expression and splicing patterns, and remote homologies at the protein level with genes of known function.

  7. Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae

    Directory of Open Access Journals (Sweden)

    He Ningjia

    2008-01-01

    Full Text Available Abstract Background The most abundant family of insect cuticular proteins, the CPR family, is recognized by the R&R Consensus, a domain of about 64 amino acids that binds to chitin and is present throughout arthropods. Several species have now been shown to have more than 100 CPR genes, inviting speculation as to the functional importance of this large number and diversity. Results We have identified 156 genes in Anopheles gambiae that code for putative cuticular proteins in this CPR family, over 1% of the total number of predicted genes in this species. Annotation was verified using several criteria including identification of TATA boxes, INRs, and DPEs plus support from proteomic and gene expression analyses. Two previously recognized CPR classes, RR-1 and RR-2, form separate, well-supported clades with the exception of a small set of genes with long branches whose relationships are poorly resolved. Several of these outliers have clear orthologs in other species. Although both clades are under purifying selection, the RR-1 variant of the R&R Consensus is evolving at twice the rate of the RR-2 variant and is structurally more labile. In contrast, the regions flanking the R&R Consensus have diversified in amino-acid composition to a much greater extent in RR-2 genes compared with RR-1 genes. Many genes are found in compact tandem arrays that may include similar or dissimilar genes but always include just one of the two classes. Tandem arrays of RR-2 genes frequently contain subsets of genes coding for highly similar proteins (sequence clusters). Properties of the proteins indicated that each cluster may serve a distinct function in the cuticle. Conclusion The complete annotation of this large gene family provides insight on the mechanisms of gene family evolution and clues about the need for so many CPR genes. These data also should assist annotation of other Anopheles genes.

  8. Computational annotation of genes differentially expressed along olive fruit development

    Directory of Open Access Journals (Sweden)

    Martinelli Federico

    2009-10-01

    Full Text Available Abstract Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a high worldwide economic impact. Unlike other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and only a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and their subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B), and 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the gene ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages.
The olive fruit-specific transcriptome dataset was

  9. Novel definition files for human GeneChips based on GeneAnnot

    Directory of Open Access Journals (Sweden)

    Ferrari Sergio

    2007-11-01

    Full Text Available Abstract Background Improvements in genome sequence annotation revealed discrepancies in the original probeset/gene assignment in Affymetrix microarray and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. Results We developed a novel set of custom Chip Definition Files (CDF) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom-probesets, including only probes matching a single gene. Conclusion GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (CDF libraries, installation guidelines and R code, CDF statistics, and analysis results).
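    The filtering idea described in this abstract (keep only probes that match a single gene, then regroup them into one probeset per gene) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeneAnnot pipeline itself: the probe identifiers, gene names and the `build_custom_probesets` helper are hypothetical, and a real workflow would derive the probe-to-gene mapping from sequence alignments.

```python
from collections import defaultdict


def build_custom_probesets(probe_to_genes):
    """Group probes into one probeset per gene, keeping single-gene probes only.

    probe_to_genes maps a probe ID to the set of genes its sequence matches
    (hypothetical input used purely for illustration).
    """
    probesets = defaultdict(list)
    for probe, genes in sorted(probe_to_genes.items()):
        if len(genes) == 1:  # drop probes matching zero or several genes
            (gene,) = genes
            probesets[gene].append(probe)
    return dict(probesets)


# Hypothetical probe-to-gene matches.
example = {
    "p1": {"GENE_A"},
    "p2": {"GENE_A", "GENE_B"},  # ambiguous: matches two genes
    "p3": set(),                 # matches no transcribed sequence
    "p4": {"GENE_B"},
    "p5": {"GENE_A"},
}
custom = build_custom_probesets(example)
```

    With this toy input, probes `p2` and `p3` are discarded and each remaining gene ends up with exactly one probeset, which is the property the abstract attributes to the custom CDFs.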

  10. KEGG as a reference resource for gene and protein annotation.

    Science.gov (United States)

    Kanehisa, Minoru; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2016-01-04

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.

  11. Construction of coffee transcriptome networks based on gene annotation semantics.

    Science.gov (United States)

    Castillo, Luis F; Galeano, Narmer; Isaza, Gustavo A; Gaitán, Alvaro

    2012-07-24

    Gene annotation is a process that encompasses multiple approaches on the analysis of nucleic acids or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are being described in an organism genome, construction and visualization of gene networks impose novel challenges in the understanding of complex expression patterns and the generation of new knowledge in genomics research. In order to take advantage of accumulated text data after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by Basic Local Alignment Search Tool (BLAST) and Interproscan, were filtered out by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained along the Wordnet dictionary in order to construct a Resource Description Framework (RDF) using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created where detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information in the generation of gene connections and for data mining purposes, this tool development opens the possibility to visualize complex and abundant transcriptome data, and triggers the formulation of new hypotheses in metabolic pathways analysis.

  12. Information theory applied to the sparse gene ontology annotation network to predict novel gene function

    Science.gov (United States)

    Tao, Ying; Li, Jianrong

    2010-01-01

    Motivation Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or function genomics approaches. Results We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. We have demonstrated using a 10-fold cross-validation that the ITSS algorithm obtains prediction accuracies (Precision 97%, Recall 77%) comparable to other machine learning algorithms when applied to similarly dense annotated portions of the GO datasets. In addition, the method can generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for an evaluation than the generally used cross-validation type of evaluation. By manually assessing a random sample of 100 predictions conducted in a historical roll-back evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003. Availability The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at http://phenos.bsd.uchicago.edu/mphenogo/prediction_result_2005.txt. PMID:17646340

  13. The Anopheles gambiae glutathione transferase supergene family: annotation, phylogeny and expression profiles

    Directory of Open Access Journals (Sweden)

    Rossiter Louise C

    2003-08-01

    Full Text Available Abstract Background Twenty-eight genes putatively encoding cytosolic glutathione transferases have been identified in the Anopheles gambiae genome. We manually annotated these genes and then confirmed the annotation by sequencing of A. gambiae cDNAs. Phylogenetic analysis with the 37 putative GST genes from Drosophila and representative GSTs from other taxa was undertaken to develop a nomenclature for insect GSTs. The epsilon class of insect GSTs has previously been implicated in conferring insecticide resistance in several insect species. We compared the expression level of all members of this GST class in two strains of A. gambiae to determine whether epsilon GST expression is correlated with insecticide resistance status. Results Two A. gambiae GSTs are alternatively spliced resulting in a maximum number of 32 transcripts encoding cytosolic GSTs. We detected cDNAs for 31 of these in adult mosquitoes. There are at least six different classes of GSTs in insects but 20 of the A. gambiae GSTs belong to the two insect specific classes, delta and epsilon. Members of these two GST classes are clustered on chromosome arms 2L and 3R respectively. Two members of the GST supergene family are intronless. Amongst the remainder, there are 13 unique intron positions but within the epsilon and delta class, there is considerable conservation of intron positions. Five of the eight epsilon GSTs are overexpressed in a DDT resistant strain of A. gambiae. Conclusions The GST supergene family in A. gambiae is extensive and regulation of transcription of these genes is complex. Expression profiling of the epsilon class supports earlier predictions that this class is important in conferring insecticide resistance.

  14. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  15. A collection of bioconductor methods to visualize gene-list annotations

    Directory of Open Access Journals (Sweden)

    Kibbe Warren A

    2010-01-01

    Full Text Available Abstract Background Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes. Findings We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists. Conclusions These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers" described in this publication.

  16. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks.
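    The abstract describes a ranking function that combines cosine similarity with the frequency of GO terms, without giving its form. The sketch below shows one way such a combination could look. Everything specific in it is an assumption made for illustration: the multiplicative weighting `sim * (1 + alpha * prior)`, the `alpha` parameter, the bag-of-words tokenisation, and the placeholder term identifiers `T1`..`T3` are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_terms(sentence, term_names, term_freq, alpha=0.5):
    """Rank candidate terms for one evidence sentence.

    Score = cosine similarity boosted by a normalised frequency prior.
    The combination rule and alpha are assumptions for this sketch.
    """
    query = Counter(sentence.lower().split())
    max_freq = max(term_freq.values(), default=0) or 1
    scored = []
    for term_id, name in term_names.items():
        sim = cosine(query, Counter(name.lower().split()))
        prior = term_freq.get(term_id, 0) / max_freq
        scored.append((term_id, sim * (1 + alpha * prior)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder term IDs and document frequencies (hypothetical data).
term_names = {"T1": "ion transport", "T2": "cation transport", "T3": "signal transduction"}
term_freq = {"T1": 10, "T2": 2}
ranked = rank_terms(
    "the protein mediates cation transport across the membrane", term_names, term_freq
)
```

    In this toy case the more specific term `T2` ranks first because its higher lexical overlap outweighs the frequency boost given to `T1`, and `T3` scores zero because it shares no words with the sentence.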

  17. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
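    To make the mutual-information feature mentioned in this abstract concrete, the sketch below computes the standard mutual information between two binary events (a sequence matches the motif; a sequence carries the GO term) from a 2x2 table of co-occurrence counts. It is an illustrative assumption that the feature is computed this way; the paper's exact definition, smoothing and counts are not given here, and the `mutual_information` helper and its example counts are hypothetical.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) of two binary events from a 2x2 count table.

    n11: sequences with both motif and GO term
    n10: sequences with the motif only
    n01: sequences with the GO term only
    n00: sequences with neither
    """
    total = n11 + n10 + n01 + n00
    # Each cell: (joint count, motif marginal, GO-term marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for count, row, col in cells:
        if count > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (count / total) * math.log2(count * total / (row * col))
    return mi


# Hypothetical counts: independent events vs. perfectly associated events.
mi_independent = mutual_information(25, 25, 25, 25)
mi_perfect = mutual_information(50, 0, 0, 50)
```

    A value near zero indicates that motif and term occur independently, while larger values indicate a stronger association; such a score could then be one input among several frequency-based features of a classifier, as the abstract outlines.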

  18. A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

    Science.gov (United States)

    Huntley, Rachael P; Harris, Midori A; Alam-Faruque, Yasmin; Blake, Judith A; Carbon, Seth; Dietze, Heiko; Dimmer, Emily C; Foulger, Rebecca E; Hill, David P; Khodiyar, Varsha K; Lock, Antonia; Lomax, Jane; Lovering, Ruth C; Mutowo-Meullenet, Prudence; Sawford, Tony; Van Auken, Kimberly; Wood, Valerie; Mungall, Christopher J

    2014-05-21

    The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.

  19. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  20. Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome

    Directory of Open Access Journals (Sweden)

    Dougan Gordon

    2009-12-01

    Full Text Available Abstract Background Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region. Results The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins. Conclusions Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements.
Manually curated gene

  1. Family Life: Literature and Films. An Annotated Bibliography. Supplement to Fourth Revision.

    Science.gov (United States)

    Pitzer, Ronald L., Ed.

    This supplement to the fourth edition of "Family Life Literature and Films: An Annotated Bibliography" includes materials produced since the publication of the fourth edition (see AC 012 492). The materials are listed under nine topic headings, as follows: I. The American Family: Theoretical, Historical, and Subcultural Perspectives; II. Human…

  2. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool, Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the GeneRIF records about human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of the human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e-16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e-14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  3. An improved method for specificity annotation shows a distinct evolutionary divergence among the microbial enzymes of the cholylglycine hydrolase family.

    Science.gov (United States)

    Panigrahi, Priyabrata; Sule, Manas; Sharma, Ranu; Ramasamy, Sureshkumar; Suresh, C G

    2014-06-01

    Bile salt hydrolases (BSHs) are gut microbial enzymes that play a significant role in the bile acid modification pathway. Penicillin V acylases (PVAs) are enzymes produced by environmental microbes, having a possible role in pathogenesis or scavenging of phenolic compounds in their microbial habitats. The correct annotation of such physiologically and industrially important enzymes is thus vital. The current methods relying solely on sequence homology do not always provide accurate annotations for these two members of the cholylglycine hydrolase (CGH) family as BSH/PVA enzymes. Here, we present an improved method [binding site similarity (BSS)-based scoring system] for the correct annotation of the CGH family members as BSH/PVA enzymes, which along with the phylogenetic information incorporates the substrate specificity as well as the binding site information. The BSS scoring system was developed through the analysis of the binding sites and binding modes of the available BSH/PVA structures with substrates glycocholic acid and penicillin V. The 198 sequences in the dataset were then annotated accurately using BSS scores as BSH/PVA enzymes. The dataset presented contained sequences from Gram-positive bacteria, Gram-negative bacteria and archaea. The clustering obtained for the dataset using the method described above showed a clear distinction in annotation of Gram-positive bacteria and Gram-negative bacteria. Based on this clustering and a detailed analysis of the sequences of the CGH family in the dataset, we could infer that the CGH genes might have evolved in accordance with the hypothesis stating the evolution of diderms and archaea from the monoderms.

  4. ParsEval: parallel comparison and analysis of gene structure annotations

    Directory of Open Access Journals (Sweden)

    Standage Daniel S

    2012-08-01

    Full Text Available Abstract Background Accurate gene structure annotation is a fundamental but somewhat elusive goal of genome projects, as witnessed by the fact that (model) genomes typically undergo several cycles of re-annotation. In many cases, it is not only different versions of annotations that need to be compared but also different sources of annotation of the same genome, derived from distinct gene prediction workflows. Such comparisons are of interest to annotation providers, prediction software developers, and end-users, who all need to assess what is common and what is different among distinct annotation sources. We developed ParsEval, a software application for pairwise comparison of sets of gene structure annotations. ParsEval calculates several statistics that highlight the similarities and differences between the two sets of annotations provided. These statistics are presented in an aggregate summary report, with additional details provided as individual reports specific to non-overlapping, gene-model-centric genomic loci. Genome browser styled graphics embedded in these reports help visualize the genomic context of the annotations. Output from ParsEval is both easily read and parsed, enabling systematic identification of problematic gene models for subsequent focused analysis. Results ParsEval is capable of analyzing annotations for large eukaryotic genomes on typical desktop or laptop hardware. In comparison to existing methods, ParsEval exhibits a considerable performance improvement, both in terms of runtime and memory consumption. Reports from ParsEval can provide relevant biological insights into the gene structure annotations being compared. Conclusions Implemented in C, ParsEval provides the quickest and most feature-rich solution for genome annotation comparison to date. The source code is freely available (under an ISC license) at http://parseval.sourceforge.net/.

  5. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    Directory of Open Access Journals (Sweden)

    Burgun Anita

    2006-05-01

    Full Text Available Abstract Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions.

  6. Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation

    Energy Technology Data Exchange (ETDEWEB)

    Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcin P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew W.; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

    2008-10-27

    Hypothetical and conserved hypothetical genes account for >30 percent of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved hypothetical (9.5 percent) along with 887 hypothetical genes (24.4 percent). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of 1234 hypothetical and conserved genes were used from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC-MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. 1212 of these genes were transcribed with 786 producing detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations.

  7. The Literature on Military Families, 1980: An Annotated Bibliography.

    Science.gov (United States)

    1980-08-01

    author acknowledged that the military family has emerged as an important focal point of research. Just as industry and business have acknowledged that a...enhance the family’s ability to cope with and endure the repeated demands of family separations in the armed services. The research underscores the value

  8. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides the quick accessibility to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
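
    The single-linkage grouping of identifiers described in this abstract can be sketched with a union-find structure: any two identifiers that are cross-referenced by some source end up in the same cluster, and clusters merge transitively. This is only an illustration of single-linkage agglomeration in general, not the DAVID implementation; the function name and the identifier strings are hypothetical.

```python
def cluster_identifiers(pairs):
    """Group identifiers into clusters by single linkage.

    pairs: iterable of (id_a, id_b) cross-references between identifiers.
    Returns a sorted list of sorted identifier lists, one per cluster.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical cross-references between identifiers from different resources.
example_pairs = [("ID_A", "ID_B"), ("ID_B", "ID_C"), ("ID_X", "ID_Y")]
example_clusters = cluster_identifiers(example_pairs)
```

    Here ID_A and ID_C are never linked directly, yet they share a cluster because both are linked to ID_B. That transitive behaviour is what distinguishes single linkage from stricter grouping rules.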

  9. Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER

    Indian Academy of Sciences (India)

    Gautam Aggarwal; Ramakrishna Ramaswamy

    2002-02-01

    We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark, but are not identified by either of the nonconsensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have bench-marked this program as well as GLIMMER using 3 complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene-identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.
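
    The sensitivity and specificity figures quoted in this abstract can be made concrete with a short sketch. The abstract does not state the definitions used, so the code assumes the convention common in gene-prediction benchmarks: sensitivity is the fraction of reference genes that are predicted, and specificity is the fraction of predictions that match a reference gene. Genes are compared by exact match of a hypothetical (start, end, strand) tuple; the coordinates are invented for illustration.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for a set of gene predictions.

    Assumed definitions (gene-prediction convention, not taken from the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    false_pos = len(predicted - reference)
    false_neg = len(reference - predicted)
    sens = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    spec = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return sens, spec


# Hypothetical gene models as (start, end, strand).
reference_genes = {(1, 100, "+"), (200, 300, "+"), (400, 500, "-"), (600, 700, "-")}
predicted_genes = {(1, 100, "+"), (200, 300, "+"), (400, 500, "-"), (800, 900, "+")}
sens, spec = sensitivity_specificity(predicted_genes, reference_genes)
```

    In this toy case three of four reference genes are recovered and three of four predictions are correct, so both values are 0.75. A real comparison would also need a rule for partial overlaps, which exact tuple matching ignores.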

  10. NuChart: an R package to study gene spatial neighbourhoods with multi-omics annotations.

    Directory of Open Access Journals (Sweden)

    Ivan Merelli

    Full Text Available Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between colocalization and coregulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. Here, we describe NuChart, an R package that allows the user to annotate and statistically analyse a list of input genes with information relying on Hi-C data, integrating knowledge about genomic features that are involved in the chromosome spatial organization. NuChart works directly with sequenced reads to identify the related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. Predictions about CTCF binding sites, isochores and cryptic Recombination Signal Sequences are provided directly with the package for mapping, although other annotation data in bed format can be used (such as methylation profiles and histone patterns). Gene expression data can be automatically retrieved and processed from the Gene Expression Omnibus and ArrayExpress repositories to highlight the expression profile of genes in the identified neighbourhood. Moreover, statistical inferences about the graph structure and correlations between its topology and multi-omics features can be performed using Exponential-family Random Graph Models. The Hi-C fragment visualisation provided by NuChart allows the comparisons of cells in different conditions, thus providing the possibility of novel biomarkers identification.
NuChart is compliant with the Bioconductor standard and it is freely

  11. Transcriptome assembly, gene annotation and tissue gene expression atlas of the rainbow trout.

    Directory of Open Access Journals (Sweden)

    Mohamed Salem

    Full Text Available Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000-32,000 genes (35-71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the

  12. Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

    Directory of Open Access Journals (Sweden)

    Norihiro Maeda

    2006-04-01

    Full Text Available The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

  13. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

  14. InterPro in 2017—beyond protein family and domain annotations

    Science.gov (United States)

    Finn, Robert D.; Attwood, Teresa K.; Babbitt, Patricia C.; Bateman, Alex; Bork, Peer; Bridge, Alan J.; Chang, Hsin-Yu; Dosztányi, Zsuzsanna; El-Gebali, Sara; Fraser, Matthew; Gough, Julian; Haft, David; Holliday, Gemma L.; Huang, Hongzhan; Huang, Xiaosong; Letunic, Ivica; Lopez, Rodrigo; Lu, Shennan; Marchler-Bauer, Aron; Mi, Huaiyu; Mistry, Jaina; Natale, Darren A.; Necci, Marco; Nuka, Gift; Orengo, Christine A.; Park, Youngmi; Pesseat, Sebastien; Piovesan, Damiano; Potter, Simon C.; Rawlings, Neil D.; Redaschi, Nicole; Richardson, Lorna; Rivoire, Catherine; Sangrador-Vegas, Amaia; Sigrist, Christian; Sillitoe, Ian; Smithers, Ben; Squizzato, Silvano; Sutton, Granger; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C. E.; Wu, Cathy H.; Xenarios, Ioannis; Yeh, Lai-Su; Young, Siew-Yit; Mitchell, Alex L.

    2017-01-01

    InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences. PMID:27899635

  15. Annotation and comparative analysis of the glycoside hydrolase genes in Brachypodium distachyon

    Energy Technology Data Exchange (ETDEWEB)

    Tyler, Ludmila [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany; Bragg, Jennifer [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany; Wu, Jiajie [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany; Yang, Xiaohan [ORNL; Tuskan, Gerald A [ORNL; Vogel, John [United States Department of Agriculture (USDA), Western Regional Research Center (WRRC), Albany

    2010-01-01

    Background Glycoside hydrolases cleave the bond between a carbohydrate and another carbohydrate, a protein, lipid or other moiety. Genes encoding glycoside hydrolases are found in a wide range of organisms, from archaea to animals, and are relatively abundant in plant genomes. In plants, these enzymes are involved in diverse processes, including starch metabolism, defense, and cell-wall remodeling. Glycoside hydrolase genes have been previously cataloged for Oryza sativa (rice), the model dicotyledonous plant Arabidopsis thaliana, and the fast-growing tree Populus trichocarpa (poplar). To improve our understanding of glycoside hydrolases in plants generally and in grasses specifically, we annotated the glycoside hydrolase genes in the grasses Brachypodium distachyon (an emerging monocotyledonous model) and Sorghum bicolor (sorghum). We then compared the glycoside hydrolases across species, both at the whole-genome level and at the level of individual glycoside hydrolase families. Results We identified 356 glycoside hydrolase genes in Brachypodium and 404 in sorghum. The corresponding proteins fell into the same 34 families that are represented in rice, Arabidopsis, and poplar, helping to define a glycoside hydrolase family profile which may be common to flowering plants. Examination of individual glycoside hydrolase families (GH5, GH13, GH18, GH19, GH28, and GH51) revealed both similarities and distinctions between monocots and dicots, as well as between species. Shared evolutionary histories appear to be modified by lineage-specific expansions or deletions. Within families, the Brachypodium and sorghum proteins generally cluster with those from other monocots. Conclusions This work provides the foundation for further comparative and functional analyses of plant glycoside hydrolases. Defining the Brachypodium glycoside hydrolases sets the stage for Brachypodium to be a monocot model for investigations of these enzymes and their diverse roles in planta. Insights

  16. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

    Science.gov (United States)

    Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W

    2013-08-01

    Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.

  17. Software Suite for Gene and Protein Annotation Prediction and Similarity Search.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2015-01-01

    In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene functions based upon the available biomolecular annotations. They may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some of these prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages both inferred and available annotations to search for semantically similar genes. The suite consists of three components. BioAnnotationPredictor is a computational software module to predict new gene functions based upon Singular Value Decomposition of available annotations. SimilBio is a Web module that leverages annotations available or predicted by BioAnnotationPredictor to discover similarities between genes via LSI. The suite also includes SemSim, a new Web service built upon these modules to allow accessing them programmatically. We integrated SemSim in the Bio Search Computing framework (http://www.bioinformatics.deib.polimi.it/bio-seco/seco/), where users can exploit the Search Computing technology to run multi-topic complex queries on multiple integrated Web services. Accordingly, researchers may obtain ranked answers involving the computation of the functional similarity between genes in support of biomedical knowledge discovery.

  18. Visual presentation as a welcome alternative to textual presentation of gene annotation information.

    Science.gov (United States)

    Desai, Jairav; Flatow, Jared M; Song, Jie; Zhu, Lihua J; Du, Pan; Huang, Chiang-Ching; Lu, Hui; Lin, Simon M; Kibbe, Warren A

    2010-01-01

    The functions of a gene are traditionally annotated textually using either free text (Gene Reference Into Function or GeneRIF) or controlled vocabularies (e.g., Gene Ontology or Disease Ontology). Inspired by the latest word cloud tools developed by the Information Visualization Group at IBM Research, we have prototyped a visual system for capturing gene annotations, which we named Gene Graph Into Function or GeneGIF. Fully developing the GeneGIF system would be a significant effort. To justify the necessity and to specify the design requirements of GeneGIF, we first surveyed the end-user preferences. From 53 responses, we found that a majority (64%, p 0.05) of the users favored visual presentation of information (GeneGIF) compared to textual (GeneRIF) information. The results of this study indicate that a visual presentation tool, such as GeneGIF, can complement standard textual presentation of gene annotations. Moreover, the survey participants provided many constructive comments that will specify the development of a phase-two project (http://128.248.174.241/) to visually annotate each gene in the human genome.

  19. Evaluation of clustering algorithms for gene expression data using gene ontology annotations

    Institute of Scientific and Technical Information of China (English)

    MA Ning; ZHANG Zheng-guo

    2012-01-01

    Background Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the criterion proposed. Results The rank of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions The criterion proposed has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.

  20. Gene Cluster Statistics with Gene Families

    Science.gov (United States)

    Durand, Dannie

    2009-01-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such “gene clusters” is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data

  1. goSTAG: gene ontology subtrees to tag and annotate genes within a set.

    Science.gov (United States)

    Bennett, Brian D; Bushel, Pierre R

    2017-01-01

    Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. goSTAG converts gene lists from genomic analyses into biological themes
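    The over-representation test mentioned in the record above can be illustrated with a small sketch. The abstract does not say which significance test goSTAG uses; the one-sided hypergeometric test below is a common choice for ORA and is an assumption here, as are the function name `ora_p_value` and its parameter names.

```python
from math import comb


def ora_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     size of the gene list under study
    hits:       genes in the list carrying the GO term
    (All names are illustrative; they are not taken from goSTAG.)
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 3 annotated, list of 4, 3 hits.
    print(ora_p_value(10, 3, 4, 3))
```

    In practice the resulting p-values would still need multiple-testing correction across GO terms, which this sketch leaves out.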

  2. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, D. [UCLA; Casero, D. [UCLA; Cokus, S. J. [UCLA; Merchant, S. S. [UCLA; Pellegrini, M. [UCLA

    2012-07-01

    The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.

  3. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

    Directory of Open Access Journals (Sweden)

    Garzón-Martínez Gina A

    2012-04-01

    Full Text Available Abstract Background Physalis peruviana, commonly known as Cape gooseberry, is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembly was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the

  4. Annotation and Re-Sequencing of Genes from De Novo Transcriptome Assembly of Abies alba (Pinaceae)

    Directory of Open Access Journals (Sweden)

    Anna M. Roschanski

    2013-01-01

    Full Text Available Premise of the study: We present a protocol for the annotation of transcriptome sequence data and the identification of candidate genes therein using the example of the nonmodel conifer Abies alba. Methods and Results: A normalized cDNA library was built from an A. alba seedling. The sequencing on a 454 platform yielded more than 1.5 million reads that were de novo assembled into 25 149 contigs. Two complementary approaches were applied to annotate gene fragments that code for (1) well-known proteins and (2) proteins that are potentially adaptively relevant. Primer development and testing yielded 88 amplicons that could successfully be resequenced from genomic DNA. Conclusions: The annotation workflow offers an efficient way to identify potential adaptively relevant genes from the large quantity of transcriptome sequence data. The primer set presented should be prioritized for single-nucleotide polymorphism detection in adaptively relevant genes in A. alba.

  5. High-performance web services for querying gene and variant annotation.

    Science.gov (United States)

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
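    As a rough illustration of what querying such a web service looks like, the sketch below only builds a gene-query URL and makes no network call. The `/v3/query` path and the `q`, `species` and `fields` parameters are assumptions about the service's current public interface and are not stated in the abstract; they should be checked against the MyGene.info documentation. The helper name `build_query_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base endpoint; the abstract only gives http://mygene.info.
BASE_URL = "https://mygene.info/v3/query"


def build_query_url(query: str, species: str = "human",
                    fields: str = "symbol,name") -> str:
    """Return a gene-query URL; parameter names are assumptions (see lead-in)."""
    params = {"q": query, "species": species, "fields": fields}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_query_url("symbol:CDK2")
    print(url)
    # Decode the query string again to show the round trip.
    print(parse_qs(urlsplit(url).query))
```

    Fetching the URL (for example with `urllib.request.urlopen`) is deliberately left out so that the sketch runs offline.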

  6. SpectroGene: A Tool for Proteogenomic Annotations Using Top-Down Spectra.

    Science.gov (United States)

    Kolmogorov, Mikhail; Liu, Xiaowen; Pevzner, Pavel A

    2016-01-01

    In the past decade, proteogenomics has emerged as a valuable technique that contributes to the state-of-the-art in genome annotation; however, previous proteogenomic studies were limited to bottom-up mass spectrometry and did not take advantage of top-down approaches. We show that top-down proteogenomics allows one to address the problems that remained beyond the reach of traditional bottom-up proteogenomics. In particular, we show that top-down proteogenomics leads to the discovery of previously unannotated genes even in extensively studied bacterial genomes and present SpectroGene, a software tool for genome annotation using top-down tandem mass spectra. We further show that top-down proteogenomics searches (against the six-frame translation of a genome) identify nearly all proteoforms found in traditional top-down proteomics searches (against the annotated proteome). SpectroGene is freely available at http://github.com/fenderglass/SpectroGene.

  7. Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data

    Directory of Open Access Journals (Sweden)

    Kennedy Breandan

    2010-01-01

    Full Text Available Abstract Background The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data. Results Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation. Conclusions By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.

  8. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions.

    Science.gov (United States)

    Kuppuswamy, Usha; Ananthasubramanian, Seshan; Wang, Yanli; Balakrishnan, Narayanaswamy; Ganapathiraju, Madhavi K

    2014-04-03

    The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of the action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is designed based on the fact that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular function, and play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions respectively of proteins were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably. This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in
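    The evaluation described in the record above (precision, recall and F-score at a rank threshold) can be sketched as follows. This is a generic illustration, not the authors' evaluation code; the function name `precision_recall_f` and the toy GO-term labels are made up for the example.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f(ranked: Sequence[str], truth: Set[str],
                       k: int) -> Tuple[float, float, float]:
    """Score the top-k entries of a ranked prediction list against known terms."""
    top = ranked[:k]
    true_positives = sum(1 for term in top if term in truth)
    precision = true_positives / len(top) if top else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    # F-score is the harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    ranked_terms = ["a", "b", "c", "d"]   # hypothetical ranked predictions
    known_terms = {"a", "c", "e"}         # hypothetical known annotations
    for k in (1, 2, 4):
        print(k, precision_recall_f(ranked_terms, known_terms, k))
```

    Sweeping `k` over a range of thresholds gives the precision-versus-recall and F-score-versus-threshold curves that the abstract refers to.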

  9. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Abstract Background Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology

  10. PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.

    Science.gov (United States)

    Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A

    2015-01-01

    Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations and now we are able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. Another use case is provided describing how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.

  11. PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data

    Directory of Open Access Journals (Sweden)

    Steven N. Hart

    2015-05-01

    Full Text Available Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations and now we are able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user’s own data. Another use case is provided describing how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.

  12. CATH FunFHMMer web server: protein functional annotations using functional family assignments.

    Science.gov (United States)

    Das, Sayoni; Sillitoe, Ian; Lee, David; Lees, Jonathan G; Dawson, Natalie L; Ward, John; Orengo, Christine A

    2015-07-01

    The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced presents new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.

  13. Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates

    Directory of Open Access Journals (Sweden)

    Häkkinen Mari

    2012-10-01

    Full Text Available Abstract Background Trichoderma reesei is a soft rot Ascomycota fungus utilised for industrial production of secreted enzymes, especially lignocellulose degrading enzymes. About 30 carbohydrate active enzymes (CAZymes) of T. reesei have been biochemically characterised. Genome sequencing has revealed a large number of novel candidates for CAZymes, thus increasing the potential for identification of enzymes with novel activities and properties. Plenty of data exists on the carbon source dependent regulation of the characterised hydrolytic genes. However, information on the expression of the novel CAZyme genes, especially on complex biomass material, is very limited. Results In this study, the CAZyme gene content of the T. reesei genome was updated and the annotations of the genes refined using both computational and manual approaches. Phylogenetic analysis was done to assist the annotation and to identify functionally diversified CAZymes. The analyses identified 201 glycoside hydrolase genes, 22 carbohydrate esterase genes and five polysaccharide lyase genes. Updated or novel functional predictions were assigned to 44 genes, and the phylogenetic analysis indicated further functional diversification within enzyme families or groups of enzymes. GH3 β-glucosidases, GH27 α-galactosidases and GH18 chitinases were especially functionally diverse. The expression of the lignocellulose degrading enzyme system of T. reesei was studied by cultivating the fungus in the presence of different inducing substrates and by subjecting the cultures to transcriptional profiling. The substrates included both defined and complex lignocellulose related materials, such as pretreated bagasse, wheat straw, spruce, xylan, Avicel cellulose and sophorose. The analysis revealed co-regulated groups of CAZyme genes, such as genes induced in all the conditions studied and also genes induced preferentially by a certain set of substrates. Conclusions In this study, the CAZyme

  14. Functional annotation and identification of candidate disease genes by computational analysis of normal tissue gene expression data.

    Directory of Open Access Journals (Sweden)

    Laura Miozzi

    Full Text Available BACKGROUND: High-throughput gene expression data can predict gene function through the "guilt by association" principle: coexpressed genes are likely to be functionally associated. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCGs), small sets of tightly coexpressed genes that are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. CONCLUSIONS/SIGNIFICANCE: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations, we provide candidate genes for several genetic diseases of unknown molecular basis.
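    The majority-rule step in the record above might look roughly like the sketch below. The abstract does not define the rule precisely, so the reading used here (a group is "functionally characterized" when one term annotates more than half of its genes, and that term is then proposed for the remaining genes) is an assumption, as are the names `majority_term` and `predict_annotations` and the toy gene and term identifiers.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def majority_term(group: List[str],
                  annotations: Dict[str, Set[str]]) -> Optional[str]:
    """Return the most frequent term if it covers more than half of the group."""
    counts = Counter(
        term for gene in group for term in annotations.get(gene, set())
    )
    if not counts:
        return None
    # If several terms tie, most_common keeps the one counted first.
    term, n = counts.most_common(1)[0]
    return term if n > len(group) / 2 else None


def predict_annotations(group: List[str],
                        annotations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Propose the majority term for group members that do not carry it yet."""
    term = majority_term(group, annotations)
    if term is None:
        return {}
    return {g: term for g in group if term not in annotations.get(g, set())}


if __name__ == "__main__":
    group = ["g1", "g2", "g3", "g4"]                            # hypothetical group
    known = {"g1": {"T1"}, "g2": {"T1", "T2"}, "g3": {"T1"}}    # hypothetical terms
    print(predict_annotations(group, known))
```

    A stricter variant could count only genes that have at least one annotation when computing the majority; the abstract does not say which denominator was used.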

  15. Guidelines for the functional annotation of microRNAs using the Gene Ontology.

    Science.gov (United States)

    Huntley, Rachael P; Sitnikov, Dmitry; Orlic-Milacic, Marija; Balakrishnan, Rama; D'Eustachio, Peter; Gillespie, Marc E; Howe, Doug; Kalea, Anastasia Z; Maegdefessel, Lars; Osumi-Sutherland, David; Petri, Victoria; Smith, Jennifer R; Van Auken, Kimberly; Wood, Valerie; Zampetaki, Anna; Mayr, Manuel; Lovering, Ruth C

    2016-05-01

    MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual). © 2016 Huntley et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  16. A relation based measure of semantic similarity for Gene Ontology annotations

    Directory of Open Access Journals (Sweden)

    Gaudin Benoit

    2008-11-01

    Background: Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such measures of similarity have been used to annotate uncharacterized gene products and group gene products into functional groups. There are various ways to measure semantic similarity, either using the topological structure of the ontology, the instances (gene products) associated with terms, or a mixture of both. We focus on an instance level definition of semantic similarity while using the information contained in the ontology, both in the graphical structure of the ontology and the semantics of relations between terms, to provide constraints on our instance level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other. Results: We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework that naturally extends instance based methods of semantic similarity of terms, such as Resnik's measure, to describing annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm. The set of associated constraints also provide a set of principles that any improvement on our method should seek to satisfy. Conclusion: We derive a measure of semantic

  17. CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations.

    Science.gov (United States)

    Park, Julie; Costanzo, Maria C; Balakrishnan, Rama; Cherry, J Michael; Hong, Eurie L

    2012-01-01

    The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.
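
    The pairing step described in this abstract, which evaluates how a literature-based term and a computationally predicted term relate within GO, can be sketched on a toy ontology. This is not the CvManGO implementation. The ontology, the function names and the choice of which relationships count as a discrepancy (here: a prediction more specific than the manual term, or one in an unrelated lineage) are assumptions made for illustration.

```python
def ancestors(term, parents):
    """All terms reachable from `term` via parent links (term itself excluded)."""
    seen = set()
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def relate(manual, predicted, parents):
    """Classify how a predicted term relates to a manually curated term."""
    if manual == predicted:
        return "same"
    if manual in ancestors(predicted, parents):
        return "predicted_more_specific"
    if predicted in ancestors(manual, parents):
        return "predicted_more_general"
    return "unrelated"


def flag_for_review(pairs, parents):
    """Genes whose (manual, predicted) pair falls in an assumed discrepancy class."""
    discrepant = {"predicted_more_specific", "unrelated"}
    return sorted({g for g, m, p in pairs if relate(m, p, parents) in discrepant})


# Hypothetical toy ontology: child -> list of parents.
toy_parents = {"leaf": ["mid"], "mid": ["root"], "other": ["root"]}
toy_pairs = [
    ("gA", "mid", "leaf"),    # prediction is more specific than the manual term
    ("gB", "leaf", "mid"),    # prediction is more general
    ("gC", "leaf", "other"),  # different lineage
    ("gD", "mid", "mid"),     # identical terms
]
print(flag_for_review(toy_pairs, toy_parents))
```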

  18. Canine candidate genes for dilated cardiomyopathy: annotation of and polymorphic markers for 14 genes

    Directory of Open Access Journals (Sweden)

    van Oost Bernard A

    2007-10-01

    Background: Dilated cardiomyopathy is a myocardial disease occurring in humans and domestic animals and is characterized by dilatation of the left ventricle, reduced systolic function and increased sphericity of the left ventricle. Dilated cardiomyopathy has been observed in several, mostly large and giant, dog breeds, such as the Dobermann and the Great Dane. A number of genes have been identified, which are associated with dilated cardiomyopathy in the human, mouse and hamster. These genes mainly encode structural proteins of the cardiac myocyte. Results: We present the annotation of, and marker development for, 14 of these genes of the dog genome, i.e. α-cardiac actin, caveolin 1, cysteine-rich protein 3, desmin, lamin A/C, LIM-domain binding factor 3, myosin heavy polypeptide 7, phospholamban, sarcoglycan δ, titin cap, α-tropomyosin, troponin I, troponin T and vinculin. A total of 33 Single Nucleotide Polymorphisms were identified for these canine genes and 11 polymorphic microsatellite repeats were developed. Conclusion: The presented polymorphisms provide a tool to investigate the role of the corresponding genes in canine dilated cardiomyopathy by linkage analysis or association studies.

  19. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results......: We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of about 84% and specificity of about 97%. We additionally annotate three pairwise alignments of the more distantly related HIV1...... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  20. Lynx web services for annotations and systems analysis of multi-gene disorders.

    Science.gov (United States)

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform.

  1. SelenoDB 2.0: annotation of selenoprotein genes in animals and their genetic diversity in humans.

    Science.gov (United States)

    Romagné, Frédéric; Santesmasses, Didac; White, Louise; Sarangi, Gaurab K; Mariotti, Marco; Hübler, Ron; Weihmann, Antje; Parra, Genís; Gladyshev, Vadim N; Guigó, Roderic; Castellano, Sergi

    2014-01-01

    SelenoDB (http://www.selenodb.org) aims to provide high-quality annotations of selenoprotein genes, proteins and SECIS elements. Selenoproteins are proteins that contain the amino acid selenocysteine (Sec) and the first release of the database included annotations for eight species. Since the release of SelenoDB 1.0 many new animal genomes have been sequenced. The annotations of selenoproteins in new genomes usually contain many errors in major databases. For this reason, we have now fully annotated selenoprotein genes in 58 animal genomes. We provide manually curated annotations for human selenoproteins, whereas we use an automatic annotation pipeline to annotate selenoprotein genes in other animal genomes. In addition, we annotate the homologous genes containing cysteine (Cys) instead of Sec. Finally, we have surveyed genetic variation in the annotated genes in humans. We use exon capture and resequencing approaches to identify single-nucleotide polymorphisms in more than 50 human populations around the world. We thus present a detailed view of the genetic divergence of Sec- and Cys-containing genes in animals and their diversity in humans. The addition of these datasets into the second release of the database provides a valuable resource for addressing medical and evolutionary questions in selenium biology.

  2. Gene fusions and gene duplications: relevance to genomic annotation and functional analysis

    Directory of Open Access Journals (Sweden)

    Riley Monica

    2005-03-01

    Background: Escherichia coli, a model organism, provides information for annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct functions. Multimodular proteins have been found to complicate both annotation and generation of sequence-similar groups. Previous work overstated the number of multimodular proteins in E. coli. This work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes. Results: Multimodular E. coli K-12 proteins were identified from sequence similarities between their component modules and non-fused proteins in 50 genomes and from the literature. We found 109 multimodular proteins in E. coli containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules together with all the single (un-fused) proteins constitute the sum of all unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins generated 490 sequence-similar, paralogous groups. Groups ranged in size from 92 to 2 members and had varying degrees of relatedness among their members. Some E. coli enzyme groups were compared to homologs in other bacterial genomes. Conclusion: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation results, all multimodular proteins in an organism should be detected and, when known, each function should be connected with its location in the sequence of the protein. When transferring functions by sequence similarity, alignment locations must be noted, particularly when alignments cover only part of the sequences, in order to enable transfer of the correct function. Separating multimodular proteins into module units makes

  3. Algal functional annotation tool

    Energy Technology Data Exchange (ETDEWEB)

    2012-07-12

    Abstract BACKGROUND: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. DESCRIPTION: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes

  4. Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H

    2006-01-01

    This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci...

  5. Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval

    Directory of Open Access Journals (Sweden)

    Yuan Lei

    2012-05-01

    Background: Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords. Results: In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes. Conclusions: We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features leads to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yield better results.

  6. The Insect SNMP Gene Family

    Science.gov (United States)

    2009-01-01

    [Figure labels: insect (Holometabola) CD36 gene family; Clade 1, Clade 2, Clade 3 (SNMPs); Holometabola phylogeny (11 orders)] Tribolium castaneum...melanogaster genes (see Nichols and Vogt, 2008). Bootstrap support (1000 replicates) is indicated for the major clades. B. Phylogeny of holometabolous...A. aegypti eggs were graciously provided by Mark Brown (University of Georgia, Department of Entomology) and raised on a larval diet (pond fish food

  7. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded...... allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results...... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  8. Formal modeling of Gene Ontology annotation predictions based on factor graphs

    Science.gov (United States)

    Spetale, Flavio; Murillo, Javier; Tapia, Elizabeth; Arce, Débora; Ponce, Sergio; Bulacio, Pilar

    2016-04-01

    Gene Ontology (GO) is a hierarchical vocabulary for gene product annotation. Its synergy with machine learning classification methods has been widely used for the prediction of protein functions. Current classification methods rely on heuristic solutions to check the consistency with some aspects of the underlying GO structure. In this work we formalize the GO is-a relationship through predicate logic. Moreover, an ontology model based on Forney Factor Graph (FFG) is shown on a general fragment of Cellular Component GO.
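
    As an illustration of what a predicate-logic reading of the is-a relationship can look like, the constraint below states the familiar upward-propagation (true path) rule: a gene product annotated to a term is also annotated to every is-a parent of that term. The predicate names are assumptions chosen for this sketch, and the paper's own formalization may differ.

```latex
% Assumed predicates: isa(t_i, t_j) means term t_i is-a term t_j;
% ann(g, t) means gene product g is annotated to term t.
\forall g \;\forall t_i \;\forall t_j :\quad
  \mathrm{isa}(t_i, t_j) \,\wedge\, \mathrm{ann}(g, t_i)
  \;\Rightarrow\; \mathrm{ann}(g, t_j)
```

    Under this reading, a prediction that assigns a child term but rejects its parent violates the constraint and is inconsistent with the ontology.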

  9. The plant ADH gene family.

    Science.gov (United States)

    Strommer, Judith

    2011-04-01

    The structures, evolution and functions of alcohol dehydrogenase gene families and their products have been scrutinized for half a century. Our understanding of the enzyme structure and catalytic activity of plant alcohol dehydrogenase (ADH-P) is based on the vast amount of information available for its animal counterpart. The probable origins of the enzyme from a simple β-coil and eventual emergence from a glutathione-dependent formaldehyde dehydrogenase have been well described. There is compelling evidence that the small ADH gene families found in plants today are the survivors of multiple rounds of gene expansion and contraction. To the probable original function of their products in the terminal reaction of anaerobic fermentation have been added roles in yeast-like aerobic fermentation and the production of characteristic scents that act to attract animals that serve as pollinators or agents of seed dispersal and to protect against herbivores.

  10. Annotating novel genes by integrating synthetic lethals and genomic information

    Directory of Open Access Journals (Sweden)

    Faty Mahamadou

    2008-01-01

    Background: Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results: We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W) as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion: We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process.

  11. Assessing identity, redundancy and confounds in Gene Ontology annotations over time.

    Science.gov (United States)

    Gillis, Jesse; Pavlidis, Paul

    2013-02-15

    The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. Data available at http://chibi.ubc.ca/assessGO.
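
    The self-matching statistic reported in this abstract (the share of genes that are not most similar to themselves between two GO editions) can be sketched as follows. The abstract refers to semantic similarity; the Jaccard index over annotation sets is swapped in here purely as a simple stand-in, and the function names and toy editions are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two term sets (stand-in for a semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fraction_not_self_matching(old, new):
    """Fraction of genes present in both editions whose old annotation set is
    strictly more similar to some other gene in the new edition than to itself."""
    genes = [g for g in old if g in new]
    if not genes:
        return 0.0
    misses = 0
    for g in genes:
        self_score = jaccard(old[g], new[g])
        best_other = max(
            (jaccard(old[g], new[h]) for h in new if h != g), default=0.0
        )
        if best_other > self_score:
            misses += 1
    return misses / len(genes)


# Hypothetical editions: g2 and g3 have swapped annotation sets.
old_edition = {"g1": {"a", "b"}, "g2": {"c"}, "g3": {"d", "e"}}
new_edition = {"g1": {"a", "b"}, "g2": {"d", "e"}, "g3": {"c"}}
print(fraction_not_self_matching(old_edition, new_edition))
```

    Ties are counted as self-matches in this sketch, which is one of several reasonable conventions.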

  12. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA

  13. Proteomic detection of non-annotated protein-coding genes in Pseudomonas fluorescens Pf0-1.

    Science.gov (United States)

    Kim, Wook; Silby, Mark W; Purvine, Sam O; Nicoll, Julie S; Hixson, Kim K; Monroe, Matt; Nicora, Carrie D; Lipton, Mary S; Levy, Stuart B

    2009-12-24

    Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.

  14. Proteomic detection of non-annotated protein-coding genes in Pseudomonas fluorescens Pf0-1.

    Directory of Open Access Journals (Sweden)

    Wook Kim

    Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.

  15. Proteomic Detection of Non-Annotated Protein-Coding Genes in Pseudomonas fluorescens Pf0-1

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Wook; Silby, Mark W.; Purvine, Samuel O.; Nicoll, Julie S.; Hixson, Kim K.; Monroe, Matthew E.; Nicora, Carrie D.; Lipton, Mary S.; Levy, Stuart B.

    2009-12-24

    Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of (possible) functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations which predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes which were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologues in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.

  16. Genome, functional gene annotation, and nuclear transformation of the heterokont oleaginous alga Nannochloropsis oceanica CCMP1779.

    Directory of Open Access Journals (Sweden)

    Astrid Vieler

    Unicellular marine algae have promise for providing sustainable and scalable biofuel feedstocks, although no single species has emerged as a preferred organism. Moreover, adequate molecular and genetic resources prerequisite for the rational engineering of marine algal feedstocks are lacking for most candidate species. Heterokonts of the genus Nannochloropsis naturally have high cellular oil content and are already in use for industrial production of high-value lipid products. First success in applying reverse genetics by targeted gene replacement makes Nannochloropsis oceanica an attractive model to investigate the cell and molecular biology and biochemistry of this fascinating organism group. Here we present the assembly of the 28.7 Mb genome of N. oceanica CCMP1779. RNA sequencing data from nitrogen-replete and nitrogen-depleted growth conditions support a total of 11,973 genes, of which in addition to automatic annotation some were manually inspected to predict the biochemical repertoire for this organism. Among others, more than 100 genes putatively related to lipid metabolism, 114 predicted transcription factors, and 109 transcriptional regulators were annotated. Comparison of the N. oceanica CCMP1779 gene repertoire with the recently published N. gaditana genome identified 2,649 genes likely specific to N. oceanica CCMP1779. Many of these N. oceanica-specific genes have putative orthologs in other species or are supported by transcriptional evidence. However, because similarity-based annotations are limited, functions of most of these species-specific genes remain unknown. Aside from the genome sequence and its analysis, protocols for the transformation of N. oceanica CCMP1779 are provided. The availability of genomic and transcriptomic data for Nannochloropsis oceanica CCMP1779, along with efficient transformation protocols, provides a blueprint for future detailed gene functional analysis and genetic engineering of Nannochloropsis

  17. Gene expression and functional annotation of the human and mouse choroid plexus epithelium.

    Directory of Open Access Journals (Sweden)

    Sarah F Janssen

    Full Text Available BACKGROUND: The choroid plexus epithelium (CPE) is a lobed neuro-epithelial structure that forms the outer blood-brain barrier. The CPE protrudes into the brain ventricles and produces the cerebrospinal fluid (CSF), which is crucial for brain homeostasis. Malfunction of the CPE is possibly implicated in disorders like Alzheimer disease, hydrocephalus or glaucoma. To study human genetic diseases and potential new therapies, mouse models are widely used. This requires a detailed knowledge of similarities and differences in gene expression and functional annotation between the species. The aim of this study is to analyze and compare gene expression and functional annotation of healthy human and mouse CPE. METHODS: We performed 44k Agilent microarray hybridizations with RNA derived from laser dissected healthy human and mouse CPE cells. We functionally annotated and compared the gene expression data of human and mouse CPE using the knowledge database Ingenuity. We searched for common and species-specific gene expression patterns and functions between human and mouse CPE. We also made a comparison with previously published CPE human and mouse gene expression data. RESULTS: Overall, the human and mouse CPE transcriptomes are very similar. Their major functionalities included epithelial junctions, transport, energy production, neuro-endocrine signaling, as well as immunological, neurological and hematological functions and disorders. The mouse CPE presented two additional functions not found in the human CPE: carbohydrate metabolism and a more extensive list of (neural) developmental functions. We found three genes (ACE, PON1 and TRIM3) specifically expressed in the mouse CPE compared to human CPE, and no genes specifically expressed in the human CPE compared to mouse CPE. CONCLUSION: Human and mouse CPE transcriptomes are very similar, and display many common functionalities. Nonetheless, we also identified a few genes and pathways which suggest that the CPE

  18. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Full Text Available Abstract Background: Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results: 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion: This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
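
    The reverse (decoy) database search mentioned in this abstract is commonly used to estimate a false discovery rate. The Python sketch below illustrates one widely used estimate: the number of decoy hits divided by the number of target hits at or above a score threshold. The abstract does not give the details of Average Peptide Scoring, so the function names, the score lists and the simple ratio are all illustrative assumptions, not the authors' method.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a threshold as (#decoy hits) / (#target hits).

    Assumption: higher scores are better, and decoy hits come from
    searching a reversed copy of the protein database.
    """
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    if targets == 0:
        return 0.0  # no accepted identifications, so nothing to be false
    return decoys / targets


def lowest_threshold_at_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is
    at or below max_fdr, or None if no threshold qualifies."""
    for t in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Hypothetical peptide scores, for illustration only.
example_targets = [10, 9, 8, 7, 6, 5]
example_decoys = [6, 5, 4, 3]
example_threshold = lowest_threshold_at_fdr(example_targets, example_decoys, 0.25)
```

    With these made-up scores, a threshold of 6 accepts five target hits and one decoy hit, for an estimated FDR of 0.2.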

  19. OAHG: an integrated resource for annotating human genes with multi-level ontologies

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-01-01

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which can help further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG was consistent with existing gene interactions and the structure of the ontologies. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ² = 0.2428, p < 2.2e-16). PMID:27703231

  20. Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research.

    Science.gov (United States)

    Primmer, C R; Papakostas, S; Leder, E H; Davis, M J; Ragan, M A

    2013-06-01

    Recent advances in molecular technologies have opened up unprecedented opportunities for molecular ecologists to better understand the molecular basis of traits of ecological and evolutionary importance in almost any organism. Nevertheless, reliable and systematic inference of functionally relevant information from these masses of data remains challenging. The aim of this review is to highlight how the Gene Ontology (GO) database can be of use in resolving this challenge. The GO provides a largely species-neutral source of information on the molecular function, biological role and cellular location of tens of thousands of gene products. As it is designed to be species-neutral, the GO is well suited for cross-species use, meaning that functional annotation derived from model organisms can be transferred to inferred orthologues in newly sequenced species. In other words, the GO can provide gene annotation information for species with nonannotated genomes. In this review, we describe the GO database, how functional information is linked with genes/gene products in model organisms, and how molecular ecologists can utilize this information to annotate their own data. Then, we outline various applications of GO for enhancing the understanding of the molecular basis of traits in ecologically relevant species. We also highlight potential pitfalls, provide step-by-step recommendations for conducting a sound study in nonmodel organisms, suggest avenues for future research and outline a strategy for maximizing the benefits of a more ecological and evolutionary genomics-oriented ontology by ensuring its compatibility with the GO. © 2013 John Wiley & Sons Ltd.
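
    The cross-species transfer described in this abstract can be pictured as a lookup: each gene in the newly sequenced species inherits the GO terms of its inferred orthologue in a model organism. The Python sketch below is a minimal illustration under that assumption. The gene names and term identifiers are hypothetical placeholders, and a real study would also need to weigh evidence codes and orthology confidence, which this sketch ignores.

```python
def transfer_annotations(orthologs, model_annotations):
    """Copy GO terms from model-organism genes to their inferred orthologues.

    orthologs: dict mapping a gene in the new species to one model gene.
    model_annotations: dict mapping a model gene to a set of GO term IDs.
    Genes whose orthologue has no annotation are left out of the result.
    """
    transferred = {}
    for new_gene, model_gene in orthologs.items():
        terms = model_annotations.get(model_gene, set())
        if terms:
            transferred[new_gene] = set(terms)  # copy, do not share the set
    return transferred


# Hypothetical identifiers, for illustration only.
example_orthologs = {
    "newsp_gene1": "model_geneA",
    "newsp_gene2": "model_geneB",
    "newsp_gene3": "model_geneC",  # orthologue without any annotation
}
example_model_annotations = {
    "model_geneA": {"GO:term_1", "GO:term_2"},
    "model_geneB": {"GO:term_3"},
}
example_transferred = transfer_annotations(example_orthologs, example_model_annotations)
```

    Genes without an annotated orthologue simply remain unannotated, which is one reason annotation coverage in nonmodel species is often incomplete.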

  1. Identification of novel endogenous antisense transcripts by DNA microarray analysis targeting complementary strand of annotated genes

    Directory of Open Access Journals (Sweden)

    Kohama Chihiro

    2009-08-01

    Full Text Available Abstract Background: Recent transcriptomic analyses in mammals have uncovered the widespread occurrence of endogenous antisense transcripts, termed natural antisense transcripts (NATs). NATs are transcribed from the opposite strand of the gene locus and are thought to control sense gene expression, but the mechanism of such regulation is as yet unknown. Although several thousand potential sense-antisense pairs have been identified in mammals, examples of functionally characterized NATs remain limited. To identify NAT candidates suitable for further functional analyses, we performed DNA microarray-based NAT screening using mouse adult normal tissues and mammary tumors to target not only the sense orientation but also the complementary strand of the annotated genes. Results: First, we designed microarray probes to target the complementary strand of genes for which an antisense counterpart had been identified only in human public cDNA sources, but not in the mouse. We observed a prominent expression signal from 66.1% of 635 target genes, and 58 of these genes showed tissue-specific expression. Expression analyses of selected examples (Acaa1b and Aard) confirmed their dynamic transcription in vivo. Although interspecies conservation of NAT expression was previously investigated by the presence of cDNA sources in both species, our results suggest that there are more examples of human-mouse conserved NATs that could not be identified by cDNA sources. We also designed probes to target the complementary strand of well-characterized genes, including oncogenes, and compared the expression of these genes between mammary cancerous tissues and non-pathological tissues. We found that antisense expression of 95 of 404 well-annotated genes was markedly altered in tumor tissue compared with that in normal tissue and that 19 of these genes also exhibited changes in sense gene expression. These results highlight the importance of NAT expression in the regulation

  2. Next-Generation Sequencing and the Crustacean Immune System: The Need for Alternatives in Immune Gene Annotation.

    Science.gov (United States)

    Clark, K F; Greenwood, Spencer J

    2016-12-01

    Next-generation sequencing has been a huge benefit to investigators studying non-model species. High-throughput gene expression studies, which were once restricted to animals with extensive genomic resources, can now be applied to any species. Transcriptomic studies using RNA-Seq can discover hundreds of thousands of transcripts from any species of interest. The power and limitation of these techniques is the sheer size of the dataset that is acquired. Parsing these large datasets is becoming easier as more bioinformatic tools are available for biologists without extensive computer programming expertise. Gene annotation and physiological pathway tools such as Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology enable the application of the vast amount of information acquired from model organisms to non-model species. While noble in nature, utilization of these tools can inadvertently misrepresent transcriptomic data from non-model species via annotation omission. Annotation followed by molecular pathway analysis highlights pathways that are disproportionately affected by disease, stress, or the physiological condition being examined. Problems occur when gene annotation procedures recognize only a subset, often 50% or less, of the genes differentially expressed in a non-model organism. Annotated transcripts normally belong to highly conserved metabolic or regulatory genes that likely have a secondary or tertiary role, if any at all, in immunity. They appear to be disproportionately affected simply because conserved genes are most easily annotated. Evolutionarily induced specialization of physiological pathways is a driving force of adaptive evolution, but it results in genes that have diverged sufficiently to prevent their identification and annotation through conventional gene or protein databases. The purpose of this manuscript is to highlight some of the challenges faced when annotating crustacean immune genes by using an American lobster

  3. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation

    Science.gov (United States)

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Extensive research is ongoing to identify genes, including their gene products, and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers in the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts in the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  5. GoMapMan: integration, consolidation and visualization of plant gene annotations within the MapMan ontology.

    Science.gov (United States)

    Ramsak, Živa; Baebler, Špela; Rotter, Ana; Korbar, Matej; Mozetic, Igor; Usadel, Björn; Gruden, Kristina

    2014-01-01

    GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) providing gathered information to high-throughput analysis tools via dynamically generated export files. All of the GoMapMan functionalities are openly available, with the exception of the curation functions, which require prior registration to ensure traceability of the implemented changes.

  6. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    Directory of Open Access Journals (Sweden)

    Khan Shafiq A

    2003-06-01

    Full Text Available Abstract Background: Complete genome annotation will likely be achieved through computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells.

  7. Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture.

    Science.gov (United States)

    Rogozin, Igor B; Managadze, David; Shabalina, Svetlana A; Koonin, Eugene V

    2014-04-01

    The ortholog conjecture (OC), which is central to functional annotation of genomes, posits that orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of Gene Ontology (GO) annotations and expression profiles, among within-species paralogs compared with orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. However, several subsequent studies suggest that GO annotations and microarray data could artificially inflate functional similarity between paralogs from the same organism. We sought to test the OC using approaches distinct from those used in previous studies. Analysis of a large RNAseq data set from multiple human and mouse tissues shows that expression similarity (correlation coefficients, ranks, or Z-scores) between orthologs is substantially greater than that for between-species paralogs with the same sequence divergence, in agreement with the OC and the results of recent detailed analyses. These findings are further corroborated by a fine-grain analysis in which expression profiles of orthologs and paralogs were compared separately for individual gene families. Expression profiles of within-species paralogs are more strongly correlated than profiles of orthologs, but it is shown that this is caused by high background noise, that is, correlation between profiles of unrelated genes in the same organism. Z-scores and rank scores show a nonmonotonic dependence of expression profile similarity on sequence divergence. This complexity of gene expression evolution after duplication might be at least partially caused by selection for protein dosage rebalancing following gene duplication.
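
    The expression-similarity measures named in this abstract can be illustrated with a small Python sketch: a Pearson correlation between two expression profiles, and a Z-score that places that correlation against a background of correlations between unrelated gene pairs. The abstract does not define its Z-score, so the background-based definition here is an assumption, and all numbers are invented for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def background_z_score(r, background):
    """Z-score of correlation r relative to background correlations
    (assumed here to come from unrelated gene pairs)."""
    return (r - mean(background)) / pstdev(background)


# Invented expression values across three tissues, for illustration only.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
r_ab = pearson(gene_a, gene_b)
z_ab = background_z_score(r_ab, [0.1, 0.3, 0.5])
```

    Normalizing against a background in this way is one route to discounting the high within-organism correlation between unrelated genes that the abstract describes as background noise.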

  8. UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation.

    Directory of Open Access Journals (Sweden)

    Shaun D Jackman

    Full Text Available When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it are available at https://github.com/sjackman/uniqtag-paper.

  9. UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation

    Science.gov (United States)

    Jackman, Shaun D.; Bohlmann, Joerg; Birol, İnanç

    2015-01-01

    When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it are available at https://github.com/sjackman/uniqtag-paper. PMID:26020645
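
    The idea of a content-derived identifier can be sketched in a few lines of Python. The abstract says only that the identifier is a representative k-mer selected from the gene's sequence; the selection rule used below (the k-mer found in the fewest sequences, with ties broken by lexicographic order) is an assumption made for illustration, and the function names and toy sequences are hypothetical. This is not the Ruby or R implementation.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_tags(seqs, k):
    """Assign each named sequence one representative k-mer.

    Assumed rule: pick the k-mer occurring in the fewest sequences,
    breaking ties by taking the lexicographically smallest k-mer.
    """
    # Count, for every k-mer, how many sequences contain it.
    counts = Counter()
    for seq in seqs.values():
        counts.update(kmers(seq, k))

    tags = {}
    for name, seq in seqs.items():
        candidates = kmers(seq, k)
        if not candidates:
            tags[name] = seq  # sequence shorter than k: use it whole
        else:
            tags[name] = min(candidates, key=lambda km: (counts[km], km))
    return tags


# Toy sequences, for illustration only.
example_tags = assign_tags({"gene1": "ACGTAC", "gene2": "ACGTTT"}, k=3)
```

    Because the tag depends only on sequence content, a gene whose sequence is unchanged between builds tends to keep its tag, although a tag can still change if the same k-mer newly appears in other genes.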

  10. Annotation and classification of the bovine T cell receptor delta genes.

    Science.gov (United States)

    Herzig, Carolyn T A; Lefranc, Marie-Paule; Baldwin, Cynthia L

    2010-02-09

    gammadelta T cells differ from alphabeta T cells with regard to the types of antigen with which their T cell receptors interact; gammadelta T cell antigens are not necessarily peptides nor are they presented on MHC. Cattle are considered a "gammadelta T cell high" species, indicating they have an increased proportion of gammadelta T cells in circulation relative to that in "gammadelta T cell low" species such as humans and mice. Prior to the onset of the studies described here, there was limited information regarding the genes that code for the T cell receptor delta chains of this gammadelta T cell high species. By annotating the bovine (Bos taurus) genome Btau_3.1 assembly, 56 distinct T cell receptor delta (TRD) variable (V) genes were found, 52 of which belong to the TRDV1 subgroup and were co-mingled with the T cell receptor alpha variable (TRAV) genes. In addition, two genes belonging to the TRDV2 subgroup and single TRDV3 and TRDV4 genes were found. We confirmed the presence of five diversity (D) genes, three junctional (J) genes and a single constant (C) gene and describe the organization of the TRD locus. The TRDV4 gene is found downstream of the C gene and in an inverted orientation of transcription, consistent with its orthologs in humans and mice. cDNA evidence was assessed to validate expression of the variable genes and showed that one to five D genes could be incorporated into a single transcript. Finally, we grouped the bovine and ovine TRDV1 genes into sets based on their relatedness. The bovine genome contains a large and diverse repertoire of TRD genes when compared to the genomes of "gammadelta T cell low" species. This suggests that in cattle gammadelta T cells play a more important role in immune function since they would be predicted to bind a greater variety of antigens.

  11. ShrimpGPAT: a gene and protein annotation tool for knowledge sharing and gene discovery in shrimp.

    Science.gov (United States)

    Korshkari, Parpakron; Vaiwsri, Sirintra; Flegel, Timothy W; Ngamsuriyaroj, Sudsanguan; Sonthayanon, Burachai; Prachumwat, Anuphap

    2014-06-21

    Although captured and cultivated marine shrimp constitute highly important seafood in terms of both economic value and production quantity, biologists have little knowledge of the shrimp genome and this partly hinders their ability to improve shrimp aquaculture. To help improve this situation, the Shrimp Gene and Protein Annotation Tool (ShrimpGPAT) was conceived as a community-based annotation platform for the acquisition and updating of full-length complementary DNAs (cDNAs), Expressed Sequence Tags (ESTs), transcript contigs and protein sequences of penaeid shrimp and their decapod relatives and for in-silico functional annotation and sequence analysis. ShrimpGPAT currently holds quality-filtered molecular sequences of 14 decapod species (~500,000 records for six penaeid shrimp and eight other decapods). The database predominantly comprises transcript sequences derived by both traditional EST Sanger sequencing and, more recently, massively parallel sequencing technologies. The analysis pipeline provides putative functions in terms of sequence homologs, gene ontologies and protein-protein interactions. Data retrieval can be conducted easily either by a keyword text search or by a sequence query via BLAST, and users can save records of interest for later investigation using tools such as multiple sequence alignment and BLAST searches against pre-defined databases. In addition, ShrimpGPAT provides space for community insights by allowing functional annotation with tags and comments on sequences. Community-contributed information will allow for continuous database enrichment, for improvement of functions and for other aspects of sequence analysis. ShrimpGPAT is a new, free and easily accessed service for the shrimp research community that provides a comprehensive and up-to-date database of quality-filtered decapod gene and protein sequences together with putative functional prediction and sequence analysis tools. An important feature is its community

  12. Transcriptional dynamics of the developing sweet cherry (Prunus avium L.) fruit: sequencing, annotation and expression profiling of exocarp-associated genes.

    Science.gov (United States)

    Alkio, Merianne; Jonas, Uwe; Declercq, Myriam; Van Nocker, Steven; Knoche, Moritz

    2014-01-01

    The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed-dispersing fruit eaters, and has great economic relevance for fruit quality. Development of the exocarp involves regulated activities of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., 'Regina'), a fruit crop species with few public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit between flowering and maturity at 14 time points. Expression levels in each sample were estimated for 34 695 contigs from numbers of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome with 7760 full-length protein coding sequences and over 20 000 other, annotated cDNA sequences together with their developmental expression patterns is expected to accelerate molecular research on this important tree fruit crop.

  13. Maize microarray annotation database

    Directory of Open Access Journals (Sweden)

    Berger Dave K

    2011-10-01

    Full Text Available Abstract Background: Microarray technology has matured over the past fifteen years into a cost-effective solution with established data analysis protocols for global gene expression profiling. The Agilent-016047 maize 44 K microarray was custom-designed from EST sequences, but only reporter sequences with EST accession numbers are publicly available. The following information is lacking: (a) reporter-gene model match, (b) number of reporters per gene model, (c) potential for cross hybridization, (d) sense/antisense orientation of reporters, (e) position of reporter on B73 genome sequence (for eQTL studies), and (f) functional annotations of genes represented by reporters. To address this, we developed a strategy to annotate the Agilent-016047 maize microarray, and built a publicly accessible annotation database. Description: Genomic annotation of the 42,034 reporters on the Agilent-016047 maize microarray was based on BLASTN results of the 60-mer reporter sequences and their corresponding ESTs against the maize B73 RefGen v2 "Working Gene Set" (WGS) predicted transcripts and the genome sequence. The agreement between the EST, WGS transcript and gDNA BLASTN results was used to assign the reporters into six genomic annotation groups. These annotation groups were: (i) "annotation by sense gene model" (23,668 reporters); (ii) "annotation by antisense gene model" (4,330); (iii) "annotation by gDNA" without a WGS transcript hit (1,549); (iv) "annotation by EST", in which case the EST from which the reporter was designed, but not the reporter itself, has a WGS transcript hit (3,390); (v) "ambiguous annotation" (2,608); and (vi) "inconclusive annotation" (6,489). Functional annotations of reporters were obtained by BLASTX and Blast2GO analysis of corresponding WGS transcripts against GenBank. The annotations are available in the Maize Microarray Annotation Database http://MaizeArrayAnnot.bi.up.ac.za/, as well as through a GBrowse annotation file that can be uploaded to

  14. YeastWeb: a workset-centric web resource for gene family analysis in yeast

    Directory of Open Access Journals (Sweden)

    Bao Haihua

    2010-07-01

    Full Text Available Abstract Background Currently, a number of yeast genomes with different physiological features have been sequenced and annotated, which provides invaluable information to investigate yeast genetics, evolutionary mechanism, structure and function of gene families. Description YeastWeb is a novel database created to provide access to gene families derived from the available yeast genomes by assigning the genes into putative families. It has many useful features that complement existing databases, such as SGD, CYGD and Génolevures: (1) Detailed computational annotation was conducted for each entry with InterProScan, EMBOSS and functional/pathway databases, such as GO, COG and KEGG; (2) A well-established, user-friendly environment was created to allow users to retrieve the annotated genes and gene families using a functional classification browser, keyword search or similarity-based search; (3) The workset offers users many powerful functions to manage the retrieved data efficiently, associate the individual items easily and save the intermediate results conveniently; (4) A series of comparative genomics and molecular evolution analysis tools are implemented to allow users to view multiple sequence alignments and phylogenetic trees of gene families. At present, YeastWeb holds the gene families clustered at various MCL inflation values from a total of 13 available yeast genomes. Conclusions Given the great interest in yeast research, YeastWeb has the potential to become a useful resource for the scientific community of yeast biologists and related researchers investigating the evolutionary relationship of yeast gene families. YeastWeb is available at http://centre.bioinformatics.zj.cn/Yeast/.

  15. The evolution of mammalian gene families.

    Directory of Open Access Journals (Sweden)

    Jeffery P Demuth

    Full Text Available Gene families are groups of homologous genes that are likely to have highly similar functions. Differences in family size due to lineage-specific gene duplication and gene loss may provide clues to the evolutionary forces that have shaped mammalian genomes. Here we analyze the gene families contained within the whole genomes of human, chimpanzee, mouse, rat, and dog. In total we find that more than half of the 9,990 families present in the mammalian common ancestor have either expanded or contracted along at least one lineage. Additionally, we find that a large number of families are completely lost from one or more mammalian genomes, and a similar number of gene families have arisen subsequent to the mammalian common ancestor. Along the lineage leading to modern humans we infer the gain of 689 genes and the loss of 86 genes since the split from chimpanzees, including changes likely driven by adaptive natural selection. Our results imply that humans and chimpanzees differ by at least 6% (1,418 of 22,000 genes) in their complement of genes, which stands in stark contrast to the oft-cited 1.5% difference between orthologous nucleotide sequences. This genomic "revolving door" of gene gain and loss represents a large number of genetic differences separating humans from our closest relatives.
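    The "at least 6%" figure in the abstract above follows directly from the two counts it quotes. A minimal arithmetic sketch, using only numbers stated in the abstract (the variable names are illustrative):

```python
# Figures quoted in the abstract.
differing_genes = 1418        # genes not shared between human and chimpanzee
total_genes = 22000           # approximate size of the gene complement
human_gains = 689             # genes gained on the human lineage
human_losses = 86             # genes lost on the human lineage
nucleotide_difference = 1.5   # oft-cited % difference between orthologous sequences

# Fraction of the gene complement that differs, as a percentage.
gene_complement_difference = 100.0 * differing_genes / total_genes

# Net change in gene number along the human lineage since the split.
human_net_change = human_gains - human_losses

# How many times larger the gene-content difference is than the
# nucleotide-level difference.
fold_over_nucleotide = gene_complement_difference / nucleotide_difference

if __name__ == "__main__":
    print(f"{gene_complement_difference:.1f}% of genes differ")
    print(f"net human-lineage change: {human_net_change:+d} genes")
    print(f"{fold_over_nucleotide:.1f}x the nucleotide-level difference")
```

    The 1,418 figure is larger than the human-lineage gains and losses alone, so it evidently also counts changes on the chimpanzee lineage; the abstract does not give that breakdown.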

  16. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  17. EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

    Directory of Open Access Journals (Sweden)

    Hamilton John P

    2007-10-01

    Full Text Available Abstract Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate (1) the submission of gene annotation to an annotation project, (2) the review of the submitted models by project annotators, and (3) the incorporation of the submitted models in the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website http://rice.tigr.org, as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the

  18. Genome-wide annotation, expression profiling, and protein interaction studies of the core cell-cycle genes in Phalaenopsis aphrodite.

    Science.gov (United States)

    Lin, Hsiang-Yin; Chen, Jhun-Chen; Wei, Miao-Ju; Lien, Yi-Chen; Li, Huang-Hsien; Ko, Swee-Suak; Liu, Zin-Huang; Fang, Su-Chiung

    2014-01-01

    Orchidaceae is one of the most abundant and diverse families in the plant kingdom and its unique developmental patterns have drawn the attention of many evolutionary biologists. Particular areas of interest have included the co-evolution of pollinators and distinct floral structures, and symbiotic relationships with mycorrhizal flora. However, comprehensive studies to decipher the molecular basis of growth and development in orchids remain scarce. Cell proliferation governed by cell-cycle regulation is fundamental to growth and development of the plant body. We took advantage of recently released transcriptome information to systematically isolate and annotate the core cell-cycle regulators in the moth orchid Phalaenopsis aphrodite. Our data verified that Phalaenopsis cyclin-dependent kinase A (CDKA) is an evolutionarily conserved CDK. Expression profiling studies suggested that core cell-cycle genes functioning during the G1/S, S, and G2/M stages were preferentially enriched in the meristematic tissues that have high proliferation activity. In addition, subcellular localization and pairwise interaction analyses of various combinations of CDKs and cyclins, and of E2 promoter-binding factors and dimerization partners confirmed interactions of the functional units. Furthermore, our data showed that expression of the core cell-cycle genes was coordinately regulated during pollination-induced reproductive development. The data obtained establish a fundamental framework for study of the cell-cycle machinery in Phalaenopsis orchids.

  19. Lineage-specific expansion of IFIT gene family: an insight into coevolution with IFN gene family.

    Directory of Open Access Journals (Sweden)

    Ying Liu

    Full Text Available In mammals, IFIT (Interferon [IFN]-induced proteins with Tetratricopeptide Repeat [TPR] motifs) family genes are involved in many cellular and viral processes, which are tightly related to the mammalian IFN response. However, little is known about non-mammalian IFIT genes. In the present study, IFIT genes are identified in the genome databases from the jawed vertebrates including the cartilaginous elephant shark but not from non-vertebrates such as lancelet, sea squirt and acorn worm, suggesting that the IFIT gene family originated from a vertebrate ancestor about 450 million years ago. IFIT family genes show conserved gene structure and gene arrangements. Phylogenetic analyses reveal that this gene family has expanded through lineage-specific and species-specific gene duplication. Interestingly, the IFN gene family seems to share a common ancestor and a similar evolutionary mechanism; the functional link of IFIT genes to the IFN response has been present since the origin of both gene families, as evidenced by the finding that zebrafish IFIT genes are upregulated by fish IFNs, poly(I:C) and two transcription factors, IRF3/IRF7, likely via the IFN-stimulated response elements (ISRE) within the promoters of vertebrate IFIT family genes. These coevolution features create a functional association between the two gene families to fulfill a common biological process, which was likely selected by viral infection during the evolution of vertebrates. Our results are helpful for understanding the evolution of the vertebrate IFN system.

  20. FunnyBase: a systems level functional annotation of Fundulus ESTs for the analysis of gene expression

    Directory of Open Access Journals (Sweden)

    Kolell Kevin J

    2004-12-01

    Full Text Available Abstract Background While studies of non-model organisms are critical for many research areas, such as evolution, development, and environmental biology, they present particular challenges for both experimental and computational genomic level research. Resources such as mass-produced microarrays and the computational tools linking these data to functional annotation at the system and pathway level are rarely available for non-model species. This type of "systems-level" analysis is critical to the understanding of patterns of gene expression that underlie biological processes. Results We describe a bioinformatics pipeline known as FunnyBase that has been used to store, annotate, and analyze 40,363 expressed sequence tags (ESTs) from the heart and liver of the fish, Fundulus heteroclitus. Primary annotations based on sequence similarity are linked to networks of systematic annotation in Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) and can be queried and computationally utilized in downstream analyses. Steps are taken to ensure that the annotation is self-consistent and that the structure of GO is used to identify higher level functions that may not be annotated directly. An integrated framework for cDNA library production, sequencing, quality control, expression data generation, and systems-level analysis is presented and utilized. In a case study, a set of genes, that had statistically significant regression between gene expression levels and environmental temperature along the Atlantic Coast, shows a statistically significant (P Conclusion The methods described have application for functional genomics studies, particularly among non-model organisms. The web interface for FunnyBase can be accessed at http://genomics.rsmas.miami.edu/funnybase/super_craw4/. Data and source code are available by request at jpaschall@bioinfobase.umkc.edu.

  1. Evolution of the mammalian lysozyme gene family

    Directory of Open Access Journals (Sweden)

    Biegel Jason M

    2011-06-01

    Full Text Available Abstract Background Lysozyme c (chicken-type lysozyme) has an important role in host defense, and has been extensively studied as a model in molecular biology, enzymology, protein chemistry, and crystallography. Traditionally, lysozyme c has been considered to be part of a small family that includes genes for two other proteins, lactalbumin, which is found only in mammals, and calcium-binding lysozyme, which is found in only a few species of birds and mammals. More recently, additional testes-expressed members of this family have been identified in human and mouse, suggesting that the mammalian lysozyme gene family is larger than previously known. Results Here we characterize the extent and diversity of the lysozyme gene family in the genomes of phylogenetically diverse mammals, and show that this family contains at least eight different genes that likely duplicated prior to the diversification of extant mammals. These duplicated genes have largely been maintained, both in intron-exon structure and in genomic context, throughout mammalian evolution. Conclusions The mammalian lysozyme gene family is much larger than previously appreciated and consists of at least eight distinct genes scattered around the genome. Since the lysozyme c and lactalbumin proteins have acquired very different functions during evolution, it is likely that many of the other members of the lysozyme-like family will also have diverse and unexpected biological properties.

  2. From manual curation to visualization of gene families and networks across Solanaceae plant species

    Science.gov (United States)

    Pujar, Anuradha; Menda, Naama; Bombarely, Aureliano; Edwards, Jeremy D.; Strickler, Susan R.; Mueller, Lukas A.

    2013-01-01

    High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information of the Solanaceae family and its closely related species that incorporates a community-based gene and phenotype curation system. In this article, we describe a manual curation system for gene families aimed at facilitating curation, querying and visualization of gene interaction patterns underlying complex biological processes, including an interface for efficiently capturing information from experiments with large data sets reported in the literature. Well-annotated multigene families are useful for further exploration of genome organization and gene evolution across species. As an example, we illustrate the system with the multigene transcription factor families, WRKY and Small Auxin Up-regulated RNA (SAUR), which both play important roles in responding to abiotic stresses in plants. Database URL: http://solgenomics.net/ PMID:23681907

  3. Building Bright Futures: An Annotated Bibliography on Substance Abuse Prevention for Families with Young Children.

    Science.gov (United States)

    Oshinsky, Carole J.; Goodman, Barbara; Woods, Tryon; Rosensweig, Marjorie A.

    There is growing consensus that when substance abuse prevention efforts reach children at an early age, they hold promise for reducing abuse in later years. This 87-item annotated bibliography on substance abuse prevention, developed by the National Program Office of Free to Grow in collaboration with the National Center for Children in Poverty,…

  4. Dynamic Actin Gene Family Evolution in Primates

    Directory of Open Access Journals (Sweden)

    Liucun Zhu

    2013-01-01

    Full Text Available Actin is one of the most highly conserved proteins and plays crucial roles in many vital cellular functions. In most eukaryotes, it is encoded by a multigene family. Although the actin gene family has been studied extensively, few investigators have compared the actin gene family among related species. Here, we systematically investigate the characteristics and evolutionary pattern of the actin gene family in primates. We identified 233 actin genes in the human, chimpanzee, gorilla, orangutan, gibbon, rhesus monkey, and marmoset genomes. Phylogenetic analysis showed that actin genes in the seven species could be divided into two major types of clades: orthologous groups and complex groups. Codon usage and gene expression patterns of actin gene copies were highly consistent among the groups because of basic functions needed by the organisms, but diverged considerably within species due to functional diversification. In addition, many potential pseudogenes were found with incomplete open reading frames due to frameshifts or early stop codons. These results imply that the actin gene family in primates evolved under a “birth and death” model. Under this model, actin genes experienced strong negative selection and increased their functional complexity through duplication.

  5. FlyPhy: a phylogenomic analysis platform for Drosophila genes and gene families

    Directory of Open Access Journals (Sweden)

    Bao Qiyu

    2009-04-01

    Full Text Available Abstract Background The availability of 12 fully sequenced Drosophila species genomes provides an excellent opportunity to explore the evolutionary mechanism, structure and function of gene families in Drosophila. Currently, several important resources, such as FlyBase, FlyMine and DroSpeGe, have been devoted to integrating genetic, genomic, and functional data of Drosophila into a well-organized form. However, all of these resources are gene-centric and lack the information of the gene families in Drosophila. Description FlyPhy is a comprehensive phylogenomic analysis platform devoted to analyzing the genes and gene families in Drosophila. Genes were classified into families using a graph-based Markov Clustering algorithm and extensively annotated by a number of bioinformatic tools, such as basic sequence features, functional category, gene ontology terms, domain organization and sequence homolog to other databases. FlyPhy provides a simple and user-friendly web interface to allow users to browse and retrieve the information at multiple levels. An outstanding feature of the FlyPhy is that all the retrieved results can be added to a workset for further data manipulation. For the data stored in the workset, multiple sequence alignment, phylogenetic tree construction and visualization can be easily performed to investigate the sequence variation of each given family and to explore its evolutionary mechanism. Conclusion With the above functionalities, FlyPhy will be a useful resource and convenient platform for the Drosophila research community. The FlyPhy is available at http://bioinformatics.zj.cn/fly/.
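    The graph-based Markov Clustering (MCL) step mentioned in the abstract above can be illustrated with a toy example. The following is a minimal pure-Python sketch of the general MCL idea (alternating expansion and inflation on a column-stochastic matrix), not the FlyPhy pipeline itself. The function names, the unweighted toy graph, and the default inflation value of 2.0 are assumptions made for illustration; in practice the edge weights would come from sequence-similarity scores.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iterations=50, tol=1e-9):
    """Cluster a graph given as a square adjacency matrix (toy MCL sketch)."""
    n = len(adjacency)
    # Self-loops are added before normalizing, a common MCL convention.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iterations):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows with a non-zero diagonal are attractors; the non-zero columns of
    # an attractor row are the members of its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(tuple(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(list(c) for c in clusters)


# Toy graph: two disconnected triangles standing in for two gene families.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

if __name__ == "__main__":
    print(mcl(toy_graph))
```

    Raising the inflation parameter generally yields finer-grained clusters, which is one reason a resource might store families computed at several inflation values.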

  6. A database of annotated promoters of genes associated with common respiratory and related diseases

    KAUST Repository

    Chowdhary, Rajesh

    2012-07-01

    Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers. Copyright © 2012 by the American Thoracic Society.
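    The abstract above reports transcription factors that are "significantly enriched" in disease gene groups but does not say which statistical test was used. One common choice for this kind of over-representation question is a one-sided hypergeometric test; the sketch below shows that calculation under that assumption, with hypothetical function and parameter names and made-up counts.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of promoters in the background set
    annotated:  background promoters that carry the binding site of interest
    sample:     promoters in the disease-associated group
    hits:       disease-group promoters that carry the binding site
    """
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 2,000 background promoters carry the site,
    # and 6 of the 50 disease-associated promoters do.
    print(hypergeom_enrichment_p(2000, 40, 50, 6))
```

    When many transcription factors are tested at once, the resulting p-values would normally also need a multiple-testing correction.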

  7. Gene expression and functional annotation of the human ciliary body epithelia.

    Directory of Open Access Journals (Sweden)

    Sarah F Janssen

    Full Text Available PURPOSE: The ciliary body (CB) of the human eye consists of the non-pigmented (NPE) and pigmented (PE) neuro-epithelia. We investigated the gene expression of NPE and PE, to shed light on the molecular mechanisms underlying the most important functions of the CB. We also developed molecular signatures for the NPE and PE and studied possible new clues for glaucoma. METHODS: We isolated NPE and PE cells from seven healthy human donor eyes using laser dissection microscopy. Next, we performed RNA isolation, amplification, labeling and hybridization against 44×k Agilent microarrays. For microarray confirmations, we used a literature study, RT-PCRs, and immunohistochemical stainings. We analyzed the gene expression data with R and with the knowledge database Ingenuity. RESULTS: The gene expression profiles and functional annotations of the NPE and PE were highly similar. We found that the most important functionalities of the NPE and PE were related to developmental processes, neural nature of the tissue, endocrine and metabolic signaling, and immunological functions. In total 1576 genes differed statistically significantly between NPE and PE. From these genes, at least 3 were cell-specific for the NPE and 143 for the PE. Finally, we observed high expression in the (N)PE of 35 genes previously implicated in molecular mechanisms related to glaucoma. CONCLUSION: Our gene expression analysis suggested that the NPE and PE of the CB were quite similar. Nonetheless, cell-type specific differences were found. The molecular machineries of the human NPE and PE are involved in a range of neuro-endocrinological, developmental and immunological functions, and perhaps glaucoma.

  8. Functional annotation of the T-cell immunoglobulin mucin family in birds.

    Science.gov (United States)

    Hu, Tuanjun; Wu, Zhiguang; Vervelde, Lonneke; Rothwell, Lisa; Hume, David A; Kaiser, Pete

    2016-07-01

    T-cell immunoglobulin and mucin (TIM) family molecules are cell membrane proteins, preferentially expressed on various immune cells and implicated in recognition and clearance of apoptotic cells. Little is known of their function outside human and mouse, and nothing outside mammals. We identified only two TIM genes (chTIM) in the chicken genome, putative orthologues of mammalian TIM1 and TIM4, and cloned the respective cDNAs. Like mammalian TIM1, chTIM1 expression was restricted to lymphoid tissues and immune cells. The gene chTIM4 encodes at least five splice variants with distinct expression profiles that also varied between strains of chicken. Expression of chTIM4 was detected in myeloid antigen-presenting cells, and in γδ T cells, whereas mammalian TIM4 is not expressed in T cells. Like the mammalian proteins, chTIM1 and chTIM4 fusion proteins bind to phosphatidylserine, and are thereby implicated in recognition of apoptotic cells. The chTIM4-immunoglobulin fusion protein also had co-stimulatory activity on chicken T cells, suggesting a function in antigen presentation.

  9. IDconverter and IDClight: Conversion and annotation of gene and protein IDs

    Directory of Open Access Journals (Sweden)

    Díaz-Uriarte Ramón

    2007-01-01

    Full Text Available Abstract Background Researchers involved in the annotation of large numbers of gene, clone or protein identifiers are usually required to perform a one-by-one conversion for each identifier. When the field of research is one such as microarray experiments, this number may be around 30,000. Results To help researchers map accession numbers and identifiers among clones, genes, proteins and chromosomal positions, we have designed and developed IDconverter and IDClight. They are two user-friendly, freely available web server applications that also provide additional functional information by mapping the identifiers on to pathways, Gene Ontology terms, and literature references. Both tools are high-throughput oriented and include identifiers for the most common genomic databases. These tools have been compared to other similar tools, showing that they are among the fastest and the most up-to-date. Conclusion These tools provide a fast and intuitive way of enriching the information coming out of high-throughput experiments like microarrays. They can be valuable both to wet-lab researchers and to bioinformaticians.

  10. Multi-Trait GWAS and New Candidate Genes Annotation for Growth Curve Parameters in Brahman Cattle.

    Science.gov (United States)

    Crispim, Aline Camporez; Kelly, Matthew John; Guimarães, Simone Eliza Facioni; Fonseca e Silva, Fabyano; Fortes, Marina Rufino Salinas; Wenceslau, Raphael Rocha; Moore, Stephen

    2015-01-01

    Understanding the genetic architecture of beef cattle growth cannot be limited simply to the genome-wide association study (GWAS) for body weight at any specific ages, but should be extended to a more general purpose by considering the whole growth trajectory over time using a growth curve approach. For such an approach, the parameters that are used to describe growth curves were treated as phenotypes under a GWAS model. Data from 1,255 Brahman cattle that were weighed at birth, 6, 12, 15, 18, and 24 months of age were analyzed. Parameter estimates, such as mature weight (A) and maturity rate (K) from nonlinear models are utilized as substitutes for the original body weights for the GWAS analysis. We chose the best nonlinear model to describe the weight-age data, and the estimated parameters were used as phenotypes in a multi-trait GWAS. Our aims were to identify and characterize associated SNP markers to indicate SNP-derived candidate genes and annotate their function as related to growth processes in beef cattle. The Brody model presented the best goodness of fit, and the heritability values for the parameter estimates for mature weight (A) and maturity rate (K) were 0.23 and 0.32, respectively, proving that these traits can be a feasible alternative when the objective is to change the shape of growth curves within genetic improvement programs. The genetic correlation between A and K was -0.84, indicating that animals with lower mature body weights reached that weight at younger ages. One hundred and sixty seven (167) and two hundred and sixty two (262) significant SNPs were associated with A and K, respectively. The annotated genes closest to the most significant SNPs for A had direct biological functions related to muscle development (RAB28), myogenic induction (BTG1), fetal growth (IL2), and body weights (APEX2); K genes were functionally associated with body weight, body height, average daily gain (TMEM18), and skeletal muscle development (SMN1). Candidate
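    The Brody growth curve named in the abstract above is commonly written as W(t) = A(1 − B·e^(−Kt)), where A is the mature (asymptotic) weight and K the maturity rate. The sketch below evaluates that form; the parameter B (an integration constant tied to birth weight) is not mentioned in the abstract, and all numeric values are hypothetical, chosen only to illustrate how a higher K brings an animal to its mature weight sooner.

```python
import math


def brody_weight(age_months, A, B, K):
    """Brody growth curve: expected weight at a given age.

    A: asymptotic mature weight
    B: integration constant (not named in the abstract; assumed here)
    K: maturity rate per month
    """
    return A * (1.0 - B * math.exp(-K * age_months))


def fraction_of_mature_weight(age_months, B, K):
    """Share of mature weight reached at a given age (independent of A)."""
    return 1.0 - B * math.exp(-K * age_months)


if __name__ == "__main__":
    ages = [0, 6, 12, 15, 18, 24]  # weighing ages listed in the abstract
    # Hypothetical animals: a heavier, slower-maturing one and a lighter,
    # faster-maturing one.
    for label, A, K in [("slow", 550.0, 0.05), ("fast", 450.0, 0.09)]:
        curve = [round(brody_weight(t, A, 0.93, K), 1) for t in ages]
        print(label, curve)
```

    In this form the fraction of mature weight reached at a given age depends only on B and K, which is why K can be read as a maturity rate.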

  11. Annotation of gene function in citrus using gene expression information and co-expression networks

    OpenAIRE

    Wong, Darren CJ; Sweetman, Crystal; Ford, Christopher M.

    2014-01-01

    Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related bi...

  13. Evolution of the Vertebrate Resistin Gene Family.

    Science.gov (United States)

    Hu, Qingda; Tan, Huanran; Irwin, David M

    2015-01-01

    Resistin (encoded by Retn) was previously identified in rodents as a hormone associated with diabetes; however human resistin is instead linked to inflammation. Resistin is a member of a small gene family that includes the resistin-like peptides (encoded by Retnl genes) in mammals. Genomic searches of available genome sequences of diverse vertebrates and phylogenetic analyses were conducted to determine the size and origin of the resistin-like gene family. Genes encoding peptides similar to resistin were found in Mammalia, Sauria, Amphibia, and Actinistia (coelacanth, a lobe-finned fish), but not in Aves or fish from Actinopterygii, Chondrichthyes, or Agnatha. Retnl originated by duplication and transposition from Retn on the early mammalian lineage after divergence of the platypus, but before the placental and marsupial mammal divergence. The resistin-like gene family illustrates an instance where the locus of origin of duplicated genes can be identified, with Retn continuing to reside at this location. Mammalian species typically have a single copy Retn gene, but are much more variable in their numbers of Retnl genes, ranging from 0 to 9. Since Retn is located at the locus of origin, thus likely retained the ancestral expression pattern, largely maintained its copy number, and did not display accelerated evolution, we suggest that it is more likely to have maintained an ancestral function, while Retnl, which transposed to a new location, displays accelerated evolution, and shows greater variability in gene number, including gene loss, likely evolved new, but potentially lineage-specific, functions.

  14. Evolution of the Vertebrate Resistin Gene Family.

    Directory of Open Access Journals (Sweden)

    Qingda Hu

    Full Text Available Resistin (encoded by Retn) was previously identified in rodents as a hormone associated with diabetes; however human resistin is instead linked to inflammation. Resistin is a member of a small gene family that includes the resistin-like peptides (encoded by Retnl genes) in mammals. Genomic searches of available genome sequences of diverse vertebrates and phylogenetic analyses were conducted to determine the size and origin of the resistin-like gene family. Genes encoding peptides similar to resistin were found in Mammalia, Sauria, Amphibia, and Actinistia (coelacanth, a lobe-finned fish), but not in Aves or fish from Actinopterygii, Chondrichthyes, or Agnatha. Retnl originated by duplication and transposition from Retn on the early mammalian lineage after divergence of the platypus, but before the placental and marsupial mammal divergence. The resistin-like gene family illustrates an instance where the locus of origin of duplicated genes can be identified, with Retn continuing to reside at this location. Mammalian species typically have a single copy Retn gene, but are much more variable in their numbers of Retnl genes, ranging from 0 to 9. Since Retn is located at the locus of origin, thus likely retained the ancestral expression pattern, largely maintained its copy number, and did not display accelerated evolution, we suggest that it is more likely to have maintained an ancestral function, while Retnl, which transposed to a new location, displays accelerated evolution, and shows greater variability in gene number, including gene loss, likely evolved new, but potentially lineage-specific, functions.

  15. Annotation of gene function in citrus using gene expression information and co-expression networks.

    Science.gov (United States)

    Wong, Darren C J; Sweetman, Crystal; Ford, Christopher M

    2014-07-15

    The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world's most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a "guilt-by-association" principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. 
Integration of citrus gene co-expression networks, functional enrichment analysis and gene
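
    The "guilt-by-association" principle described in this record can be illustrated with a minimal sketch. It assumes Pearson correlation as the co-expression measure and a fixed cut-off of 0.9; the abstract does not state which measure or threshold the authors used, and the gene names, expression values and function label below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, threshold=0.9):
    """Link every gene pair whose profiles correlate at or above the threshold."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(expr[g], expr[h]) >= threshold:
                edges.append((g, h))
    return edges


def predict_functions(edges, annotations):
    """Guilt by association: an unannotated gene inherits its neighbours' labels."""
    predicted = {}
    for g, h in edges:
        for a, b in ((g, h), (h, g)):
            if a not in annotations and b in annotations:
                predicted.setdefault(a, set()).add(annotations[b])
    return predicted


# Hypothetical expression values across four conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 0.5],
}
annotations = {"geneA": "vitamin C metabolism"}  # hypothetical known label

edges = coexpression_edges(expr)
predicted = predict_functions(edges, annotations)
print(edges)
print(predicted)
```

    In this toy input only geneA and geneB are co-expressed, so geneB is the only gene that receives a predicted label.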

  16. pathDIP: an annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis

    Science.gov (United States)

    Rahmati, Sara; Abovsky, Mark; Pastrello, Chiara; Jurisica, Igor

    2017-01-01

    Molecular pathway data are essential in current computational and systems biology research. While there are many primary and integrated pathway databases, several challenges remain, including low proteome coverage (57%), low overlap across different databases, unavailability of direct information about underlying physical connectivity of pathway members, and high fraction of protein-coding genes without any pathway annotations, i.e. ‘pathway orphans’. In order to address all these challenges, we developed pathDIP, which integrates data from 20 source pathway databases, ‘core pathways’, with physical protein–protein interactions to predict biologically relevant protein–pathway associations, referred to as ‘extended pathways’. Cross-validation determined 71% recovery rate of our predictions. Data integration and predictions increase coverage of pathway annotations for protein-coding genes to 86%, and provide novel annotations for 5732 pathway orphans. PathDIP (http://ophid.utoronto.ca/pathdip) annotates 17 070 protein-coding genes with 4678 pathways, and provides multiple query, analysis and output options. PMID:27899558

  17. Actin gene family in Branchiostoma belcheri

    Institute of Scientific and Technical Information of China (English)

    2016-01-01

    Actin is a highly conserved cytoskeletal protein that is found in essentially all eukaryotic cells, where it plays a paramount role in several basic functions of the organism, such as the maintenance of cell shape, cell division, cell mobility and muscle contraction. However, little is known about the actin gene family in Chinese amphioxus (Branchiostoma belcheri). Here we systematically analyzed the actin gene family in Branchiostoma belcheri and found that amphioxus contains 33 actin genes. Phylogenetic analysis indicates that these genes have undergone extensive expansion through tandem duplications. In addition, we also provided evidence indicating that actin genes have divergent functions by specializing their EST data in both Branchiostoma belcheri and Branchiostoma floridae. Our results provided an alternative explanation for the evolution of actin genes, and gave new insights into their functional roles.

  18. Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus

    Directory of Open Access Journals (Sweden)

    Ronan K. Carroll

    2016-02-01

    Full Text Available In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions.

  19. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.

  20. An Introduction to Genome Annotation.

    Science.gov (United States)

    Campbell, Michael S; Yandell, Mark

    2015-12-17

    Genome projects have evolved from large international undertakings to tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a number of approaches and available software tools. This unit describes methods for genome annotation and a number of software tools commonly used in gene annotation.

  1. Warehousing re-annotated cancer genes for biomarker meta-analysis.

    Science.gov (United States)

    Orsini, M; Travaglione, A; Capobianco, E

    2013-07-01

    Translational research in cancer genomics assigns a fundamental role to bioinformatics in support of candidate gene prioritization with regard to both biomarker discovery and target identification for drug development. Efforts in both such directions rely on the existence and constant update of large repositories of gene expression data and omics records obtained from a variety of experiments. Users who interactively interrogate such repositories may have problems in retrieving sample fields that present limited associated information, due for instance to incomplete entries or sometimes unusable files. Cancer-specific data sources present similar problems. Given that source integration usually improves data quality, one of the objectives is keeping the computational complexity sufficiently low to allow an optimal assimilation and mining of all the information. In particular, the scope of integrating intraomics data can be to improve the exploration of gene co-expression landscapes, while the scope of integrating interomics sources can be that of establishing genotype-phenotype associations. Both integrations are relevant to cancer biomarker meta-analysis, as the proposed study demonstrates. Our approach is based on re-annotating cancer-specific data available at the EBI's ArrayExpress repository and building a data warehouse aimed at biomarker discovery and validation studies. Cancer genes are organized by tissue, with biomedical and clinical evidence combined to increase reproducibility and consistency of results. For better comparative evaluation, multiple queries have been designed to efficiently address all types of experiments and platforms, and allow for retrieval of sample-related information, such as cell line, disease state and clinical aspects.

  2. Protease gene families in Populus and Arabidopsis

    Directory of Open Access Journals (Sweden)

    Jansson Stefan

    2006-12-01

    Full Text Available Abstract Background Proteases play key roles in plants, maintaining strict protein quality control and degrading specific sets of proteins in response to diverse environmental and developmental stimuli. Similarities and differences between the proteases expressed in different species may give valuable insights into their physiological roles and evolution. Results We have performed a comparative analysis of protease genes in the two sequenced dicot genomes, Arabidopsis thaliana and Populus trichocarpa by using genes coding for proteases in the MEROPS database 1 for Arabidopsis to identify homologous sequences in Populus. A multigene-based phylogenetic analysis was performed. Most protease families were found to be larger in Populus than in Arabidopsis, reflecting recent genome duplication. Detailed studies on e.g. the DegP, Clp, FtsH, Lon, rhomboid and papain-like protease families showed the pattern of gene family expansion and gene loss was complex. We finally show that different Populus tissues express unique suites of protease genes and that the mRNA levels of different classes of proteases change along a developmental gradient. Conclusion Recent gene family expansion and contractions have made the Arabidopsis and Populus complements of proteases different and this, together with expression patterns, gives indications about the roles of the individual gene products or groups of proteases.

  3. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task of finding the biological significance of the results and validating them has so far mostly fallen to the biological expert, who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms have similar meaning) that would be encountered if terms were extracted directly from the articles, due to differing article contexts or author preferences and background. Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes

  4. SAGExplore: a web server for unambiguous tag mapping in serial analysis of gene expression oriented to gene discovery and annotation.

    Science.gov (United States)

    Norambuena, Tomás; Malig, Rodrigo; Melo, Francisco

    2007-07-01

    We describe a web server for the accurate mapping of experimental tags in serial analysis of gene expression (SAGE). The core of the server relies on a database of genomic virtual tags built by a recently described method that attempts to reduce the amount of ambiguous assignments for those tags that are not unique in the genome. The method provides a complete annotation of potential virtual SAGE tags within a genome, along with an estimation of their confidence for experimental observation that ranks tags that present multiple matches in the genome. The output of the server consists of a table in HTML format that contains links to a graphic representation of the results and to some external servers and databases, facilitating the tasks of analysis of gene expression and gene discovery. Also, a table in tab delimited text format is produced, allowing the user to export the results into custom databases and software for further analysis. The current server version provides the most accurate and complete SAGE tag mapping source that is available for the yeast organism. In the near future, this server will also allow the accurate mapping of experimental SAGE-tags from other model organisms such as human, mouse, frog and fly. The server is freely available on the web at: http://dna.bio.puc.cl/SAGExplore.html.
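
    The notion of unique versus ambiguous virtual tags can be sketched as follows. This is only an illustration: it assumes the common SAGE convention of a 10-base tag immediately downstream of a CATG (NlaIII) anchoring site and scans the forward strand only, neither of which is stated in the abstract, and it does not reproduce the server's confidence ranking. The locus names and sequences are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII)
TAG_LEN = 10     # assumed tag length


def virtual_tags(sequence):
    """Yield (position, tag) for every anchor site followed by a full-length tag."""
    seq = sequence.upper()
    pos = seq.find(ANCHOR)
    while pos != -1:
        start = pos + len(ANCHOR)
        tag = seq[start:start + TAG_LEN]
        if len(tag) == TAG_LEN:
            yield pos, tag
        pos = seq.find(ANCHOR, pos + 1)


def tag_index(sequences):
    """Map each virtual tag to the list of (locus, position) hits that produce it."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for pos, tag in virtual_tags(seq):
            index[tag].append((name, pos))
    return index


def classify(index):
    """Split tags into those with a single genomic origin and those with several."""
    unique = {t for t, hits in index.items() if len(hits) == 1}
    ambiguous = {t for t, hits in index.items() if len(hits) > 1}
    return unique, ambiguous


# Hypothetical toy loci.
sequences = {
    "locus1": "AACATGAAACCCGGGTTTAC",
    "locus2": "GGCATGAAACCCGGGTCC",
    "locus3": "TTCATGTTTTGGGGCCAA",
}
index = tag_index(sequences)
unique, ambiguous = classify(index)
print(sorted(unique), sorted(ambiguous))
```

    Here locus1 and locus2 yield the same tag, so an experimental observation of that tag cannot be assigned to a single locus without additional evidence.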

  5. The Popeye domain-containing gene family.

    Science.gov (United States)

    Brand, Thomas

    2005-01-01

    The Popeye domain-containing gene family has been isolated on the basis of a subtractive screen aiming at the identification of novel genes with a heart-restricted gene expression pattern. The gene family codes for membrane proteins containing three transmembrane domains. The carboxy-terminal part of the protein is localized to the cytoplasm and contains a protein domain with high sequence conservation named the Popeye domain. This domain is involved in protein homodimerization. The gene family is expressed in heart and skeletal muscle cells as well as smooth muscle cells. In addition, Popdc genes are expressed in other cell types such as neuronal cells in restricted areas of the brain, spinal cord, and dorsal root ganglia, and in various epithelial cells. Recently, it has been proposed that Popdc proteins may function as a novel family of adhesion proteins. That the expression pattern has been conserved during evolution and is very similar in all vertebrate classes and also in basal chordates suggests that Popdc proteins play an important role in cardiac and skeletal muscle.

  6. De novo assembly and annotation of the transcriptome of the agricultural weed Ipomoea purpurea uncovers gene expression changes associated with herbicide resistance.

    Science.gov (United States)

    Leslie, Trent; Baucom, Regina S

    2014-08-25

    Human-mediated selection can lead to rapid evolution in very short time scales, and the evolution of herbicide resistance in agricultural weeds is an excellent example of this phenomenon. The common morning glory, Ipomoea purpurea, is resistant to the herbicide glyphosate, but genetic investigations of this trait have been hampered by the lack of genomic resources for this species. Here, we present the annotated transcriptome of the common morning glory, Ipomoea purpurea, along with an examination of whole genome expression profiling to assess potential gene expression differences between three artificially selected herbicide resistant lines and three susceptible lines. The assembled Ipomoea transcriptome reported in this work contains 65,459 assembled transcripts, ~28,000 of which were functionally annotated by assignment to Gene Ontology categories. Our RNA-seq survey using this reference transcriptome identified 19 differentially expressed genes associated with resistance, one of which, a cytochrome P450, belongs to a large plant family of genes involved in xenobiotic detoxification. The differentially expressed genes also broadly implicated receptor-like kinases, which were down-regulated in the resistant lines, and other growth and defense genes, which were up-regulated in resistant lines. Interestingly, the target of glyphosate, EPSP synthase, was not overexpressed in the resistant Ipomoea lines as in other glyphosate resistant weeds. Overall, this work identifies potential candidate resistance loci for future investigations and dramatically increases genomic resources for this species. The assembled transcriptome presented herein will also provide a valuable resource to the Ipomoea community, as well as to those interested in utilizing the close relationship between the Convolvulaceae and the Solanaceae for phylogenetic and comparative genomics examinations.

  7. The human crystallin gene families

    Directory of Open Access Journals (Sweden)

    Wistow Graeme

    2012-12-01

    Full Text Available Abstract Crystallins are the abundant, long-lived proteins of the eye lens. The major human crystallins belong to two different superfamilies: the small heat-shock proteins (α-crystallins) and the βγ-crystallins. During evolution, other proteins have sometimes been recruited as crystallins to modify the properties of the lens. In the developing human lens, the enzyme betaine-homocysteine methyltransferase serves such a role. Evolutionary modification has also resulted in loss of expression of some human crystallin genes or of specific splice forms. Crystallin organization is essential for lens transparency, and mutations, even minor changes to surface residues, can cause cataract and loss of vision.

  8. Reg gene family and human diseases

    Institute of Scientific and Technical Information of China (English)

    Yu-Wei Zhang; Liu-Song Ding; Mao-De Lai

    2003-01-01

    Regenerating gene (Reg or REG) family, within the superfamily of C-type lectin, is mainly involved in the liver, pancreatic, gastric and intestinal cell proliferation or differentiation. Considerable attention has focused on Reg family and its structurally related molecules. Over the last 15 years, 17 members of the Reg family have been cloned and sequenced. They have been considered as members of a conserved protein family sharing structural and some functional properties being involved in injury, inflammation, diabetes and carcinogenesis. We previously identified Reg Ⅳ as a strong candidate for a gene that was highly expressed in colorectal adenoma when compared to normal mucosa based on suppression subtractive hybridization (SSH), reverse Northern blot, semi-quantitative reverse transcriptase PCR (RT-PCR) and Northern blot. In situ hybridization results further support that overexpression of Reg Ⅳ may be an early event in colorectal carcinogenesis. We suggest that detection of Reg Ⅳ overexpression might be useful in the early diagnosis of carcinomatous transformation of adenoma. This review summarizes the roles of Reg family in diseases in the literature as well as our recent results of Reg Ⅳ in colorectal cancer. The biological properties of Reg family and its possible roles in human diseases are discussed. We particularly focus on the roles of Reg family as sensitive reactants of tissue injury, prognostic indicators of tumor survival and early biomarkers of carcinogenesis. In addition to our current understanding of Reg gene functions, we postulate that there might be relationships between Reg family and microsatellite instability, apoptosis and cancer with a poor prognosis. Investigation of the correlation between tumor Reg expression and survival rate, and analysis of the Reg gene status in human malignancies, are required to elucidate the biologic consequences of Reg gene expression, the implications for Reg gene regulation of cell growth, tumorigenesis

  9. Genome Wide Re-Annotation of Caldicellulosiruptor saccharolyticus with New Insights into Genes Involved in Biomass Degradation and Hydrogen Production.

    Directory of Open Access Journals (Sweden)

    Nupoor Chowdhary

    Full Text Available Caldicellulosiruptor saccharolyticus has proven itself to be an excellent candidate for biological hydrogen (H2) production, but still it has major drawbacks like sensitivity to high osmotic pressure and low volumetric H2 productivity, which should be considered before it can be used industrially. A whole genome re-annotation work has been carried out as an attempt to update the incomplete genome information that causes gaps in knowledge, especially in the area of metabolic engineering, to improve the H2 producing capabilities of C. saccharolyticus. Whole genome re-annotation was performed through manual means for 2,682 Coding Sequences (CDSs). Bioinformatics tools based on sequence similarity, motif search, phylogenetic analysis and fold recognition were employed for re-annotation. Our methodology could successfully add functions for 409 hypothetical proteins (HPs) and 46 proteins previously annotated as putative, and assign more accurate functions for the known protein sequences. Homology based gene annotation has been used as a standard method for assigning function to novel proteins, but over the past few years many non-homology based methods such as genomic context approaches for protein function prediction have been developed. Using non-homology based functional prediction methods, we were able to assign cellular processes or physical complexes for 249 hypothetical sequences. Our re-annotation pipeline highlights the addition of 231 new CDSs generated from MicroScope Platform, to the original genome with functional prediction for 49 of them. The re-annotation of HPs and new CDSs is stored in the relational database that is available on the MicroScope web-based platform. In parallel, comparative genome analyses were performed among the members of genus Caldicellulosiruptor to understand the function and evolutionary processes. Further, with results from integrated re-annotation studies (homology and genomic context approach, we strongly

  10. Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

    Directory of Open Access Journals (Sweden)

    Zhi Zou

    Full Text Available WRKY proteins comprise one of the largest transcription factor families in plants and form key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceus plants.

  11. Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

    Science.gov (United States)

    Zou, Zhi; Yang, Lifu; Wang, Danhua; Huang, Qixing; Mo, Yeyong; Xie, Guishui

    2016-01-01

    WRKY proteins comprise one of the largest transcription factor families in plants and form key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceus plants.

  12. The glutamine synthetase gene family in Populus

    Directory of Open Access Journals (Sweden)

    Cánovas Francisco M

    2011-08-01

    Full Text Available Abstract Background Glutamine synthetase (GS; EC: 6.3.1.2, L-glutamate: ammonia ligase ADP-forming) is a key enzyme in ammonium assimilation and metabolism of higher plants. The current work was undertaken to develop a more comprehensive understanding of molecular and biochemical features of GS gene family in poplar, and to characterize the developmental regulation of GS expression in various tissues and at various times during the poplar perennial growth. Results The GS gene family consists of 8 different genes exhibiting all structural and regulatory elements consistent with their roles as functional genes. Our results indicate that the family members are organized in 4 groups of duplicated genes, 3 of which code for cytosolic GS isoforms (GS1) and 1 of which codes for the chloroplastic GS isoform (GS2). Our analysis shows that Populus trichocarpa is the first plant species in which duplication of the complete GS family has been observed. Detailed expression analyses have revealed specific spatial and seasonal patterns of GS expression in poplar. These data provide insights into the metabolic function of GS isoforms in poplar and pave the way for future functional studies. Conclusions Our data suggest that GS duplicates could have been retained in order to increase the amount of enzyme in a particular cell type. This possibility could contribute to the homeostasis of nitrogen metabolism in functions associated to changes in glutamine-derived metabolic products. The presence of duplicated GS genes in poplar could also contribute to diversification of the enzymatic properties for a particular GS isoform through the assembly of GS polypeptides into homo oligomeric and/or hetero oligomeric holoenzymes in specific cell types.

  13. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    Science.gov (United States)

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  14. The human protein disulfide isomerase gene family

    Directory of Open Access Journals (Sweden)

    Galligan James J

    2012-07-01

    Full Text Available Abstract Enzyme-mediated disulfide bond formation is a highly conserved process affecting over one-third of all eukaryotic proteins. The enzymes primarily responsible for facilitating thiol-disulfide exchange are members of an expanding family of proteins known as protein disulfide isomerases (PDIs). These proteins are part of a larger superfamily of proteins known as the thioredoxin protein family (TRX). As members of the PDI family of proteins, all proteins contain a TRX-like structural domain and are predominantly expressed in the endoplasmic reticulum. Subcellular localization and the presence of a TRX domain, however, comprise the short list of distinguishing features required for gene family classification. To date, the PDI gene family contains 21 members, varying in domain composition, molecular weight, tissue expression, and cellular processing. Given their vital role in protein-folding, loss of PDI activity has been associated with the pathogenesis of numerous disease states, most commonly related to the unfolded protein response (UPR). Over the past decade, UPR has become a very attractive therapeutic target for multiple pathologies including Alzheimer disease, Parkinson disease, alcoholic and non-alcoholic liver disease, and type-2 diabetes. Understanding the mechanisms of protein-folding, specifically thiol-disulfide exchange, may lead to development of a novel class of therapeutics that would help alleviate a wide range of diseases by targeting the UPR.

  15. Basic Reference Sources in Population/Family Planning: An Annotated Bibliography, Number 2.

    Science.gov (United States)

    Walker, Richard L.

    This "Bibliography Series" is a project of the Carolina Population Center Library/Technical Information Service, University of North Carolina at Chapel Hill. It is intended as a vehicle for the dissemination of quality bibliographies on topics of current interest to librarians, researchers and students in the population/family planning field.…

  16. Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2006-06-01

    Full Text Available Abstract Background Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center http://www.biovirus.org and Viral Bioinformatics – Canada http://www.virology.ca, we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task. Results GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences. Conclusion GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required for annotation of genes and mature peptides as well as helping to standardize gene names between related organisms by transferring reference genome

  17. Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome.

    Science.gov (United States)

    Tcherepanov, Vasily; Ehlers, Angelika; Upton, Chris

    2006-06-13

    Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center http://www.biovirus.org and Viral Bioinformatics - Canada http://www.virology.ca, we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task. GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences. GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required for annotation of genes and mature peptides as well as helping to standardize gene names between related organisms by transferring reference genome annotations to the target genome. The program is freely

  18. An annotated checklist and a family key to the pseudoscorpion fauna (Arachnida: Pseudoscorpiones) of Sri Lanka.

    Science.gov (United States)

    Batuwita, Sudesh; Benjamin, Suresh P

    2014-06-06

    Sri Lanka is part of the Western Ghats & Sri Lanka biodiversity hotspot. Thus, the conservation of Sri Lanka's unique biodiversity is crucial. The current study is part of an ongoing survey of pseudoscorpion fauna of Sri Lanka. We carried out an island-wide survey of pseudoscorpions using a range of collection methods to sample a diverse set of habitats around the country. This produced 32 species, four of which might be new to science, belonging to 25 genera. The family Cheiridiidae was discovered on the island for the first time. One new combination, Indogarypus ceylonicus (Beier, 1973) comb. nov., is proposed. Out of the 47 species now recorded, 20 (43 %) are potentially endemic to Sri Lanka. We provide a checklist of all known species, document their distribution and give a key to the families.

  19. A Method of Gene-Function Annotation Based on Variable Precision Rough Sets

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    In bioinformatics, it is important to use computational methods to annotate the functions of newly sequenced biological sequences. Based on the GO database and the BLAST program, a novel method for the functional annotation of new biological sequences is presented that uses variable-precision rough set theory. The proposed method is applied to real data in the GO database to examine its effectiveness. Numerical results show that the proposed method achieves better precision, recall and harmonic mean values than existing methods.

  20. Peptides encoded by short ORFs control development and define a new eukaryotic gene family.

    Directory of Open Access Journals (Sweden)

    Máximo Ibo Galindo

    2007-05-01

    Full Text Available Despite recent advances in developmental biology, and the sequencing and annotation of genomes, key questions regarding the organisation of cells into embryos remain. One possibility is that uncharacterised genes having nonstandard coding arrangements and functions could provide some of the answers. Here we present the characterisation of tarsal-less (tal), a new type of noncanonical gene that had been previously classified as a putative noncoding RNA. We show that tal controls gene expression and tissue folding in Drosophila, thus acting as a link between patterning and morphogenesis. tal function is mediated by several 33-nucleotide-long open reading frames (ORFs), which are translated into 11-amino-acid-long peptides. These are the shortest functional ORFs described to date, and therefore tal defines two novel paradigms in eukaryotic coding genes: the existence of short, unprocessed peptides with key biological functions, and their arrangement in polycistronic messengers. Our discovery of tal-related short ORFs in other species defines an ancient and noncanonical gene family in metazoans that represents a new class of eukaryotic genes. Our results open a new avenue for the annotation and functional analysis of genes and sequenced genomes, in which thousands of short ORFs are still uncharacterised.
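
The relation between ORF length and peptide length in the record above (33 nucleotides, 11 amino acids) can be illustrated with a small scan for short open reading frames. This is only a sketch: the function name `find_short_orfs`, the `max_codons` cutoff and the toy sequence are assumptions made for illustration, and the paper's own annotation procedure is not described in the abstract.

```python
# Illustrative sketch only; not the method used in the cited study.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_short_orfs(seq, max_codons=15):
    """Scan the three forward frames for ATG...stop ORFs.

    Returns a list of (start, end, n_codons) tuples, where start/end are
    0-based half-open coordinates including the stop codon, and n_codons
    counts coding codons only (ATG included, stop excluded).
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(seq):  # an in-frame stop codon was found
                    n_codons = (j - i) // 3
                    if n_codons <= max_codons:
                        hits.append((i, j + 3, n_codons))
                    i = j + 3  # resume scanning after the stop codon
                    continue
            i += 3
    return hits

# Toy sequence: 2 nt of padding, ATG + 10 GCT codons (33 nt), a stop, 2 nt padding.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
for start, end, n_codons in find_short_orfs(toy):
    print(start, end, n_codons, "coding nt:", n_codons * 3)
```

An ORF with 11 coding codons spans 33 nucleotides before its stop codon, which matches the peptide length quoted in the abstract.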

  1. Annotation of a hybrid partial genome of the coffee rust (Hemileia vastatrix) contributes to the gene repertoire catalog of the Pucciniales.

    Science.gov (United States)

    Cristancho, Marco A; Botero-Rozo, David Octavio; Giraldo, William; Tabima, Javier; Riaño-Pachón, Diego Mauricio; Escobar, Carolina; Rozo, Yomara; Rivera, Luis F; Durán, Andrés; Restrepo, Silvia; Eilam, Tamar; Anikster, Yehoshua; Gaitán, Alvaro L

    2014-01-01

    Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish races/isolates.

  2. Annotation of a hybrid partial genome of the Coffee Rust (Hemileia vastatrix) contributes to the gene repertoire catalogue of the Pucciniales

    Directory of Open Access Journals (Sweden)

    Marco Aurelio Cristancho

    2014-10-01

    Full Text Available Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3,921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish

  3. Diversification of the ant odorant receptor gene family and positive selection on candidate cuticular hydrocarbon receptors.

    Science.gov (United States)

    Engsontia, Patamarerk; Sangket, Unitsa; Robertson, Hugh M; Satasook, Chutamas

    2015-08-27

    Chemical communication plays important roles in the social behavior of ants making them one of the most successful groups of animals on earth. However, the molecular evolutionary process responsible for their chemosensory adaptation is still elusive. Recent advances in genomic studies have led to the identification of large odorant receptor (Or) gene repertoires from ant genomes providing fruitful materials for molecular evolution analysis. The aim of this study was to test the hypothesis that diversification of this gene family is involved in olfactory adaptation of each species. We annotated the Or genes from the genome sequences of two leaf-cutter ants, Acromyrmex echinatior and Atta cephalotes (385 and 376 putative functional genes, respectively). These were used, together with Or genes from Camponotus floridanus, Harpegnathos saltator, Pogonomyrmex barbatus, Linepithema humile, Cerapachys biroi, Solenopsis invicta and Apis mellifera, in molecular evolution analysis. Like the Or family in other insects, ant Or genes evolve by the birth-and-death model of gene family evolution. Large gene family expansions involving tandem gene duplications, and gene gains outnumbering losses, are observed. Codon analysis of genes in lineage-specific expansion clades revealed signatures of positive selection on the candidate cuticular hydrocarbon receptor genes (9-exon subfamily) of Cerapachys biroi, Camponotus floridanus, Acromyrmex echinatior and Atta cephalotes. Positively selected amino acid positions are primarily in transmembrane domains 3 and 6, which are hypothesized to contribute to the odor-binding pocket, presumably mediating changing ligand specificity. This study provides support for the hypothesis that some ant lineage-specific Or genes have evolved under positive selection. Newly duplicated genes particularly in the candidate cuticular hydrocarbon receptor clade that have evolved under positive selection may contribute to the highly sophisticated lineage

  4. [The Hygienist Karl Roelcke, M.D. (1907-1982). Annotations to the family biography].

    Science.gov (United States)

    Mildenberger, Florian G

    2016-01-01

    Volker Roelcke, the well-known historian of medicine, wrote a biographical sketch on his father's role in National Socialism. Karl Roelcke (1907-1982) was an important hygienist at the University of Heidelberg and assistant to Ernst Rodenwaldt (1878-1965). Attempts to discuss the Nazi issue with his father directly ended unsuccessfully in the 1970s. In his essay of 2014, Volker Roelcke portrayed his father as quite sophisticated, but did not mention all aspects of his work. The present essay therefore offers new insights into the person of Karl Roelcke which are not constrained by family interests.

  5. Epigenomic annotation of gene regulatory alterations during evolution of the primate brain

    NARCIS (Netherlands)

    Vermunt, Marit W; Tan, Sander C; Castelijns, Bas; Geeven, Geert; Reinink, Peter; de Bruijn, Ewart; Kondova, Ivanela; Persengiev, Stephan; Bontrop, Ronald; Cuppen, Edwin; de Laat, Wouter; Creyghton, Menno P

    Although genome sequencing has identified numerous noncoding alterations between primate species, which of those are regulatory and potentially relevant to the evolution of the human brain is unclear. Here we annotated cis-regulatory elements (CREs) in the human, rhesus macaque and chimpanzee

  6. Epigenomic annotation of gene regulatory alterations during evolution of the primate brain

    NARCIS (Netherlands)

    Vermunt, Marit W.; Tan, Sander C.; Castelijns, Bas; Geeven, Geert; Reinink, Peter; de Bruijn, Ewart; Kondova, Ivanela; Persengiev, Stephan; Bontrop, Ronald; Cuppen, Edwin; de Laat, Wouter; Creyghton, Menno P.

    2016-01-01

    Although genome sequencing has identified numerous noncoding alterations between primate species, which of those are regulatory and potentially relevant to the evolution of the human brain is unclear. Here we annotated cis-regulatory elements (CREs) in the human, rhesus macaque and chimpanzee genome

  7. Annotated English

    CERN Document Server

    Hernandez-Orallo, Jose

    2010-01-01

    This document presents Annotated English, a system of diacritical symbols which turns English pronunciation into a precise and unambiguous process. The annotations are defined and located in such a way that the original English text is not altered (not even a letter), thus allowing for a consistent reading and learning of the English language with and without annotations. The annotations are based on a set of general rules that make the frequency of annotations not dramatically high. This makes the reader easily associate annotations with exceptions, and makes it possible to shape, internalise and consolidate some rules for the English language which otherwise are weakened by the enormous amount of exceptions in English pronunciation. The advantages of this annotation system are manifold. Any existing text can be annotated without a significant increase in size. This means that we can get an annotated version of any document or book with the same number of pages and fontsize. Since no letter is affected, the ...

  8. Tumor suppressor genes in familial adenomatous polyposis.

    Science.gov (United States)

    Eshghifar, Nahal; Farrokhi, Naser; Naji, Tahereh; Zali, Mohammadreza

    2017-01-01

    Colorectal cancer (CRC) is mostly due to a series of genetic alterations that are strongly influenced by environmental factors. These changes, mutational or epigenetic modifications at the transcriptional level and/or post-transcriptional effects via miRNAs, include inactivation and the conversion of proto-oncogenes to oncogenes, and/or inactivation of tumor suppressor genes (TSGs). Here, a thorough review was carried out on the role of TSGs, with a focus on APC as the master regulator, mutated genes and mal-/dysfunctional pathways that lead to one hereditary form of CRC, namely familial adenomatous polyposis (FAP). This review provides a path towards defining candidate genes that can be used as new PCR-based markers for early diagnosis of FAP. In addition to diagnosis, defining the modes of genetic alteration will open the door to genome editing to either suppress the disease or slow its progression.

  9. Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: network analysis reveals rhythmic gene expression and enhances functional gene annotation.

    Science.gov (United States)

    Rund, Samuel S C; Yoo, Boyoung; Alam, Camille; Green, Taryn; Stephens, Melissa T; Zeng, Erliang; George, Gary F; Sheppard, Aaron D; Duffield, Giles E; Milenković, Tijana; Pfrender, Michael E

    2016-08-18

    Marine and freshwater zooplankton exhibit daily rhythmic patterns of behavior and physiology which may be regulated directly by the light:dark (LD) cycle and/or a molecular circadian clock. One of the best-studied zooplankton taxa, the freshwater crustacean Daphnia, has a 24 h diel vertical migration (DVM) behavior whereby the organism travels up and down through the water column daily. DVM plays a critical role in resource tracking and the behavioral avoidance of predators and damaging ultraviolet radiation. However, there is little information at the transcriptional level linking the expression patterns of genes to the rhythmic physiology/behavior of Daphnia. Here we analyzed genome-wide temporal transcriptional patterns from Daphnia pulex collected over a 44 h time period under a 12:12 LD cycle (diel) conditions using a cosine-fitting algorithm. We used a comprehensive network modeling and analysis approach to identify novel co-regulated rhythmic genes that have similar network topological properties and functional annotations as rhythmic genes identified by the cosine-fitting analyses. Furthermore, we used the network approach to predict with high accuracy novel gene-function associations, thus enhancing current functional annotations available for genes in this ecologically relevant model species. Our results reveal that genes in many functional groupings exhibit 24 h rhythms in their expression patterns under diel conditions. We highlight the rhythmic expression of immunity, oxidative detoxification, and sensory process genes. We discuss differences in the chronobiology of D. pulex from other well-characterized terrestrial arthropods. This research adds to a growing body of literature suggesting the genetic mechanisms governing rhythmicity in crustaceans may be divergent from other arthropod lineages including insects. Lastly, these results highlight the power of using a network analysis approach to identify differential gene expression and provide novel
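
The cosine-fitting step mentioned in the record above can be sketched as a fixed-period least-squares ("cosinor") fit. The abstract does not name the algorithm or the sampling interval that were actually used, so the function `cosinor_fit`, the 4 h sampling grid and the synthetic expression values below are all assumptions for illustration.

```python
import math

def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = mesor + a*cos(w*t) + b*sin(w*t) by ordinary least squares.

    Returns (mesor, amplitude, acrophase_h); acrophase_h is the fitted
    time of peak expression in hours, in the range [0, period_h).
    """
    w = 2.0 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Build the augmented normal equations [(X^T X) | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, values))]
         for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    mesor, a, b = beta
    amplitude = math.hypot(a, b)
    acrophase_h = (math.atan2(b, a) / w) % period_h
    return mesor, amplitude, acrophase_h

# Synthetic gene: baseline 10, amplitude 3, peak at hour 6, sampled every 4 h
# (the 4 h interval is an assumption; the abstract gives only the 44 h span).
times = list(range(0, 44, 4))
expr = [10 + 3 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]
print(cosinor_fit(times, expr))
```

A gene would typically be called rhythmic when the fitted amplitude is large relative to the residual error; the significance test is omitted here.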

  10. Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data.

    Science.gov (United States)

    Gupta, Ravi; Wikramasinghe, Priyankara; Bhattacharyya, Anirban; Perez, Francisco A; Pal, Sharmistha; Davuluri, Ramana V

    2010-01-18

    Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context. We trained and tested different state-of-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a bench-mark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied on seven published ChIP-seq Pol-II datasets to provide genome wide annotation of mouse gene promoters. We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The

  11. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    Directory of Open Access Journals (Sweden)

    Lawton Jennifer

    2012-03-01

    Full Text Available Abstract Background The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family

  12. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Abstract Background Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.
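
The guilt-by-association principle described in the record above can be sketched with a plain Pearson correlation between expression profiles. This is not Prosecutor's algorithm, which the abstract does not specify; the helper names `pearson` and `rank_candidate_functions`, the gene names and the expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_candidate_functions(query_profile, annotated):
    """Rank annotated genes by co-expression with an unannotated query gene.

    annotated maps gene name -> (expression profile, known function).
    Returns [(function, gene, r)] sorted from most to least correlated.
    """
    scored = [(pearson(query_profile, profile), gene, function)
              for gene, (profile, function) in annotated.items()]
    scored.sort(reverse=True)
    return [(function, gene, r) for r, gene, function in scored]

# Hypothetical data: expression across four conditions.
annotated = {
    "geneA": ([1.0, 2.0, 3.0, 4.0], "heat shock response"),
    "geneB": ([4.0, 3.0, 2.0, 1.0], "sporulation"),
}
unknown_gene = [2.0, 4.0, 6.0, 8.0]
for function, gene, r in rank_candidate_functions(unknown_gene, annotated):
    print(f"{function:20s} via {gene}: r = {r:+.2f}")
```

The top-ranked function is only a putative assignment; as the abstract notes, additional evidence such as genomic context is used to support such predictions.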

  13. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes

    Science.gov (United States)

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-...

  14. The Pax gene family: Highlights from cephalopods

    Science.gov (United States)

    Baratte, Sébastien; Andouche, Aude; Bonnaud-Ponticelli, Laure

    2017-01-01

    Pax genes play important roles in Metazoan development. Their evolution has been extensively studied but Lophotrochozoa are usually omitted. We addressed the question of Pax paralog diversity in Lophotrochozoa by a thorough review of available databases. The existence of six Pax families (Pax1/9, Pax2/5/8, Pax3/7, Pax4/6, Paxβ, PoxNeuro) was confirmed and the lophotrochozoan Paxβ subfamily was further characterized. Contrary to the pattern reported in chordates, the Pax2/5/8 family is devoid of homeodomain in Lophotrochozoa. Expression patterns of the three main pax classes (pax2/5/8, pax3/7, pax4/6) during Sepia officinalis development showed that Pax roles taken as ancestral and common in metazoans are modified in S. officinalis, most likely due to either the morphological specificities of cephalopods or to their direct development. Some expected expression patterns were missing (e.g. pax6 in the developing retina), and some expressions in unexpected tissues have been found (e.g. pax2/5/8 in dermal tissue and in gills). This study underlines the diversity and functional plasticity of Pax genes and illustrates the difficulty of using probable gene homology as strict indicator of homology between biological structures. PMID:28253300

  15. Familial Hypercholesterolemia: The Lipids or the Genes?

    Directory of Open Access Journals (Sweden)

    Nemer Georges M

    2011-04-01

    Full Text Available Abstract Familial Hypercholesterolemia (FH) is a common cause of premature cardiovascular disease and is often undiagnosed in young people. Although the disease is diagnosed clinically by high LDL cholesterol levels and family history, to date there are no single internationally accepted criteria for the diagnosis of FH. Several genes have been shown to be involved in FH; yet determining the implications of the different mutations on the phenotype remains a hard task. The polygenetic nature of FH is being enhanced by the discovery of new genes that serve as modifiers. Nevertheless, the picture is still unclear and many unknown genes contributing to the phenotype are most likely involved. Because of this evolving polygenetic nature, the diagnosis of FH by genetic testing is hampered by its cost and effectiveness. In this review, we reconsider the clinical versus genetic nomenclature of FH in the literature. After we describe each of the genetic causes of FH, we summarize the known correlation with phenotypic measures so far for each genetic defect. We then discuss studies from different populations on the genetic and clinical diagnoses of FH to draw helpful conclusions on cost-effectiveness and suggestions for diagnosis.

  16. Revisiting Vitis vinifera subtilase gene family: a possible role in grapevine resistance against Plasmopara viticola

    Directory of Open Access Journals (Sweden)

    Joana Figueiredo

    2016-11-01

    Full Text Available Subtilisin-like proteases, also known as subtilases, are a very diverse family of serine peptidases present in many organisms. In grapevine, there are hints of the involvement of subtilases in defence mechanisms, but their role is not yet understood. The first characterization of the subtilase gene family was performed in 2014. However, simultaneously, the grapevine genome was re-annotated and several sequences were re-annotated or retrieved. We have performed a re-characterization of this family in grapevine and identified 82 genes coding for 97 putative proteins, as a result of alternative splicing. All the subtilases identified present the characteristic S8 peptidase domain and the majority of them also have a pro-domain I9 inhibitor, a protease-associated (PA) domain and a signal peptide for targeting to the secretory pathway. Phylogenetic studies revealed six subtilase groups denominated VvSBT1 to VvSBT6. As several lines of evidence have highlighted the participation of plant subtilases in responses to biotic stimuli, we have investigated subtilase participation in grapevine resistance to Plasmopara viticola, the causative agent of downy mildew. Fourteen grapevine subtilases presenting either high homology to P69C from tomato, SBT3.3 from Arabidopsis thaliana or located near the Resistance to Plasmopara viticola (RPV) locus were selected. Expression studies were conducted in the grapevine-P. viticola pathosystem with resistant and susceptible cultivars. Our results may indicate that some grapevine subtilisins are potentially participating in the defence response against this biotrophic oomycete.

  17. Discovery of germline-related genes in Cephalochordate amphioxus: A genome wide survey using genome annotation and transcriptome data.

    Science.gov (United States)

    Yue, Jia-Xing; Li, Kun-Lung; Yu, Jr-Kai

    2015-12-01

    The generation of germline cells is a critical process in the reproduction of multicellular organisms. Studies in animal models have identified a common repertoire of genes that play essential roles in primordial germ cell (PGC) formation. However, comparative studies also indicate that the timing and regulation of this core genetic program vary considerably in different animals, raising the intriguing questions regarding the evolution of PGC developmental mechanisms in metazoans. Cephalochordates (commonly called amphioxus or lancelets) represent one of the invertebrate chordate groups and can provide important information about the evolution of developmental mechanisms in the chordate lineage. In this study, we used genome and transcriptome data to identify germline-related genes in two distantly related cephalochordate species, Branchiostoma floridae and Asymmetron lucayanum. Branchiostoma and Asymmetron diverged more than 120 MYA, and the most conspicuous difference between them is their gonadal morphology. We used important germline developmental genes in several model animals to search the amphioxus genome and transcriptome dataset for conserved homologs. We also annotated the assembled transcriptome data using Gene Ontology (GO) terms to facilitate the discovery of putative genes associated with germ cell development and reproductive functions in amphioxus. We further confirmed the expression of 14 genes in developing oocytes or mature eggs using whole mount in situ hybridization, suggesting their potential functions in amphioxus germ cell development. The results of this global survey provide a useful resource for testing potential functions of candidate germline-related genes in cephalochordates and for investigating differences in gonad developmental mechanisms between Branchiostoma and Asymmetron species.

  18. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.
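
    The weighting idea in the abstract above can be illustrated with a small sketch. This is not the ARGOT algorithm: the abstract does not give its formulas, so the weighting used here (each hit contributes -log10 of its BLAST E-value to every GO term it carries) is an assumption chosen for illustration, the semantic-similarity clustering step is omitted, and the function names and example hits are hypothetical.

```python
import math
from collections import defaultdict


def weight_go_terms(hits, min_evalue=1e-300):
    """Accumulate a weight per GO term from (evalue, go_terms) hits.

    Assumption for illustration: a hit's weight is -log10(E-value);
    hits with a non-positive weight (E-value >= 1) are ignored.
    """
    scores = defaultdict(float)
    for evalue, go_terms in hits:
        # Clamp very small E-values so log10 never receives zero.
        weight = -math.log10(max(evalue, min_evalue))
        if weight <= 0:
            continue
        # Count each term once per hit, even if listed twice.
        for term in set(go_terms):
            scores[term] += weight
    return dict(scores)


def rank_terms(scores):
    """Return (term, score) pairs, highest score first, ties by term id."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical BLAST hits for one query sequence.
example_hits = [
    (1e-50, ["GO:0004252", "GO:0006508"]),
    (1e-10, ["GO:0006508"]),
    (5.0, ["GO:0005576"]),  # weak hit, ignored
]
example_scores = weight_go_terms(example_hits)
example_ranking = rank_terms(example_scores)
```

    A term supported by several strong hits accumulates a larger score than a term carried by a single hit, which is the general intuition behind weighting retrieved hits before assigning terms.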

  19. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.

    Science.gov (United States)

    Köhler, Sebastian; Doelken, Sandra C; Ruef, Barbara J; Bauer, Sebastian; Washington, Nicole; Westerfield, Monte; Gkoutos, George; Schofield, Paul; Smedley, Damian; Lewis, Suzanna E; Robinson, Peter N; Mungall, Christopher J

    2013-01-01

    Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  20. Annotation of Ehux ESTs

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-06-12

    22 percent of ESTs do not align with scaffolds. The EST pipeline assembles 17,126 consensi from the non-aligned ESTs. The annotation pipeline predicts 8,564 ORFs on the consensi. Domain analysis of ORFs reveals missing genes. Cluster analysis reveals missing genes. Expression analysis reveals potential strain-specific genes.

  1. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution.
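
    Gene list enrichment of the kind mentioned above is commonly framed as an over-representation test. The sketch below is a generic one-sided hypergeometric test, not PANTHER's implementation; the abstract does not state which statistical test or multiple-testing correction the tools apply, and the function name and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set
    K: reference genes annotated with the category
    n: genes in the uploaded list
    k: uploaded genes annotated with the category
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ... annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

    For example, enrichment_p_value(5, 5, 5, 10) is 1/252: only one of the 252 possible five-gene lists drawn from ten genes consists entirely of the five annotated genes. In practice one p-value is computed per category and then corrected for multiple testing.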

  2. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D.

    2017-01-01

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new ‘hierarchical view’ of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. PMID:27899595

  3. Improving the gene structure annotation of the apicomplexan parasite Neospora caninum fulfils a vital requirement towards an in silico-derived vaccine.

    Science.gov (United States)

    Goodswen, Stephen J; Barratt, Joel L N; Kennedy, Paul J; Ellis, John T

    2015-04-01

    Neospora caninum is an apicomplexan parasite which can cause abortion in cattle, instigating major economic burden. Vaccination has been proposed as the most cost-effective control measure to alleviate this burden. Consequently the overriding aspiration for N. caninum research is the identification and subsequent evaluation of vaccine candidates in animal models. To save time, cost and effort, it is now feasible to use an in silico approach for vaccine candidate prediction. Precise protein sequences, derived from the correct open reading frame, are paramount and arguably the most important factor determining the success or failure of this approach. The challenge is that publicly available N. caninum sequences are mostly derived from gene predictions. Annotation inaccuracies can lead to erroneously predicted vaccine candidates by bioinformatics programs. This study evaluates the current N. caninum annotation for potential inaccuracies. Comparisons with annotation from a closely related pathogen, Toxoplasma gondii, are also made to distinguish patterns of inconsistency. More importantly, an mRNA sequencing (RNA-Seq) experiment is used to validate the annotation. Potential discrepancies originating from a questionable start codon context and exon boundaries were identified in 1943 protein coding sequences. We conclude, where experimental data were available, that the majority of N. caninum gene sequences were reliably predicted. Nevertheless, almost 28% of genes were identified as questionable. Given the limitations of RNA-Seq, the intention of this study was not to replace the existing annotation but to support or oppose particular aspects of it. Ideally, many studies aimed at improving the annotation are required to build a consensus. We believe this study, in providing a new resource on gene structure and annotation, is a worthy contributor to this endeavour.

  4. Annotated genetic linkage maps of Pinus pinaster Ait. from a Central Spain population using microsatellite and gene based markers

    Directory of Open Access Journals (Sweden)

    de Miguel Marina

    2012-10-01

    Background: Pinus pinaster Ait. is a major resin producing species in Spain. Genetic linkage mapping can facilitate marker-assisted selection (MAS) through the identification of Quantitative Trait Loci and selection of allelic variants of interest in breeding populations. In this study, we report annotated genetic linkage maps for two individuals (C14 and C15) belonging to a breeding program aiming to increase resin production. We use different types of DNA markers, including last-generation molecular markers. Results: We obtained 13 and 14 linkage groups for C14 and C15 maps, respectively. A total of 211 and 215 markers were positioned on each map and estimated genome length was between 1,870 and 2,166 cM respectively, which represents nearly 65% of genome coverage. Comparative mapping with previously developed genetic linkage maps for P. pinaster based on about 60 common markers enabled aligning linkage groups to this reference map. The comparison of our annotated linkage maps and linkage maps reporting QTL information revealed 11 annotated SNPs in candidate genes that co-localized with previously reported QTLs for wood properties and water use efficiency. Conclusions: This study provides genetic linkage maps from a Spanish population that shows high levels of genetic divergence with French populations from which segregating progenies have been previously mapped. These genetic maps will be of interest to construct a reliable consensus linkage map for the species. The importance of developing functional genetic linkage maps is highlighted, especially when working with breeding populations for its future application in MAS for traits of interest.

  5. Developmental gene discovery in a hemimetabolous insect: de novo assembly and annotation of a transcriptome for the cricket Gryllus bimaculatus.

    Directory of Open Access Journals (Sweden)

    Victor Zeng

    Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects), representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket), a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts) and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr) identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in

  6. The tomato cis-prenyltransferase gene family.

    Science.gov (United States)

    Akhtar, Tariq A; Matsuba, Yuki; Schauvinhold, Ines; Yu, Geng; Lees, Hazel A; Klein, Samuel E; Pichersky, Eran

    2013-02-01

    cis-prenyltransferases (CPTs) are predicted to be involved in the synthesis of long-chain polyisoprenoids, all with five or more isoprene (C5) units. Recently, we identified a short-chain CPT, neryl diphosphate synthase (NDPS1), in tomato (Solanum lycopersicum). Here, we searched the tomato genome and identified and characterized its entire CPT gene family, which comprises seven members (SlCPT1-7, with NDPS1 designated as SlCPT1). Six of the SlCPT genes encode proteins with N-terminal targeting sequences, which, when fused to GFP, mediated GFP transport to the plastids of Arabidopsis protoplasts. The SlCPT3-GFP fusion protein was localized to the cytosol. Enzymatic characterization of recombinant SlCPT proteins demonstrated that SlCPT6 produces Z,Z-FPP, and SlCPT2 catalyzes the formation of nerylneryl diphosphate while SlCPT4, SlCPT5 and SlCPT7 synthesize longer-chain products (C25-C55). Although no in vitro activity was demonstrated for SlCPT3, its expression in the Saccharomyces cerevisiae dolichol biosynthesis mutant (rer2) complemented the temperature-sensitive growth defect. Transcripts of SlCPT2, SlCPT4, SlCPT5 and SlCPT7 are present at low levels in multiple tissues, SlCPT6 is exclusively expressed in red fruit and roots, and SlCPT1, SlCPT3 and SlCPT7 are highly expressed in trichomes. RNAi-mediated suppression of NDPS1 led to a large decrease in β-phellandrene (which is produced from neryl diphosphate), with greater reductions achieved with the general 35S promoter compared to the trichome-specific MKS1 promoter. Phylogenetic analysis revealed CPT gene families in both eudicots and monocots, and showed that all the short-chain CPT genes from tomato (SlCPT1, SlCPT2 and SlCPT6) are closely linked to terpene synthase gene clusters.

  7. De novo assembly, functional annotation and comparative analysis of Withania somnifera leaf and root transcriptomes to identify putative genes involved in the withanolides biosynthesis.

    Science.gov (United States)

    Gupta, Parul; Goel, Ridhi; Pathak, Sumya; Srivastava, Apeksha; Singh, Surya Pratap; Sangwan, Rajender Singh; Asif, Mehar Hasan; Trivedi, Prabodh Kumar

    2013-01-01

    Withania somnifera is one of the most valuable medicinal plants used in Ayurvedic and other indigenous medicine systems due to bioactive molecules known as withanolides. As genomic information regarding this plant is very limited, little information is available about biosynthesis of withanolides. To facilitate the basic understanding about the withanolide biosynthesis pathways, we performed transcriptome sequencing for Withania leaf (101L) and root (101R) which specifically synthesize withaferin A and withanolide A, respectively. Pyrosequencing yielded 8,34,068 and 7,21,755 reads which got assembled into 89,548 and 1,14,814 unique sequences from 101L and 101R, respectively. A total of 47,885 (101L) and 54,123 (101R) could be annotated using TAIR10, NR, tomato and potato databases. Gene Ontology and KEGG analyses provided a detailed view of all the enzymes involved in withanolide backbone synthesis. Our analysis identified members of cytochrome P450, glycosyltransferase and methyltransferase gene families with unique presence or differential expression in leaf and root and might be involved in synthesis of tissue-specific withanolides. We also detected simple sequence repeats (SSRs) in transcriptome data for use in future genetic studies. Comprehensive sequence resource developed for Withania, in this study, will help to elucidate biosynthetic pathway for tissue-specific synthesis of secondary plant products in non-model plant organisms as well as will be helpful in developing strategies for enhanced biosynthesis of withanolides through biotechnological approaches.

  8. De novo assembly, functional annotation and comparative analysis of Withania somnifera leaf and root transcriptomes to identify putative genes involved in the withanolides biosynthesis.

    Directory of Open Access Journals (Sweden)

    Parul Gupta

    Withania somnifera is one of the most valuable medicinal plants used in Ayurvedic and other indigenous medicine systems due to bioactive molecules known as withanolides. As genomic information regarding this plant is very limited, little information is available about biosynthesis of withanolides. To facilitate the basic understanding about the withanolide biosynthesis pathways, we performed transcriptome sequencing for Withania leaf (101L) and root (101R) which specifically synthesize withaferin A and withanolide A, respectively. Pyrosequencing yielded 8,34,068 and 7,21,755 reads which got assembled into 89,548 and 1,14,814 unique sequences from 101L and 101R, respectively. A total of 47,885 (101L) and 54,123 (101R) could be annotated using TAIR10, NR, tomato and potato databases. Gene Ontology and KEGG analyses provided a detailed view of all the enzymes involved in withanolide backbone synthesis. Our analysis identified members of cytochrome P450, glycosyltransferase and methyltransferase gene families with unique presence or differential expression in leaf and root and might be involved in synthesis of tissue-specific withanolides. We also detected simple sequence repeats (SSRs) in transcriptome data for use in future genetic studies. Comprehensive sequence resource developed for Withania, in this study, will help to elucidate biosynthetic pathway for tissue-specific synthesis of secondary plant products in non-model plant organisms as well as will be helpful in developing strategies for enhanced biosynthesis of withanolides through biotechnological approaches.

  9. Annotation Of Novel And Conserved MicroRNA Genes In The Build 10 Sus scrofa Reference Genome And Determination Of Their Expression Levels In Ten Different Tissues

    DEFF Research Database (Denmark)

    Thomsen, Bo; Nielsen, Mathilde; Hedegaard, Jakob

    The DNA template used in the pig genome sequencing project was provided by a Duroc pig named TJ Tabasco. In an effort to annotate microRNA (miRNA) genes in the reference genome we have conducted deep sequencing to determine the miRNA transcriptomes in ten different tissues isolated from Pinky......, a genetically identical clone of TJ Tabasco. The purpose was to generate miRNA sequences that are highly homologous to the reference genome sequence, which along with computational prediction will improve confidence in the genomic annotation of miRNA genes. Based on homology searches of the sequence data...

  10. Annotated Gene and Proteome Data Support Recognition of Interconnections Between the Results of Different Experiments in Space Research

    Science.gov (United States)

    Bauer, Johann; Wehland, Markus; Pietsch, Jessica; Sickmann, Albert; Weber, Gerhard; Grimm, Daniela

    2016-06-01

    In a series of studies, human thyroid and endothelial cells exposed to real or simulated microgravity were analyzed in terms of changes in gene expression patterns or protein content. Due to the limitation of available cells in many space research experiments, comparative and control experiments had to be done in a serial manner. Therefore, detected genes or proteins were annotated with gene names and SwissProt numbers, in order to allow searches for interconnections between results obtained in different experiments by different methods. A crosscheck of several studies on the behavior of cytoskeletal genes and proteins suggested that clusters of cytoskeletal components change differently under the influence of microgravity and/or vibration in different cell types. The result that LOX and ISG15 gene expression were clearly altered during the Shenzhou-8 spaceflight mission could be estimated by comparison with the results of other experiments. The more than 100-fold down-regulation of LOX supports our hypothesis that the amount and stability of extracellular matrix have a great influence on the formation of three-dimensional aggregates under microgravity. The approximately 40-fold up-regulation of ISG15 cannot yet be explained in detail, but strongly suggests that ISGylation, an alternative form of posttranslational modification, plays a role in longterm cultures.

  11. WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata.

    Science.gov (United States)

    Putman, Tim E; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Diesh, Colin; Dunn, Nathan; Munoz-Torres, Monica; Stupp, Gregory S; Wu, Chunlei; Su, Andrew I; Good, Benjamin M

    2017-01-01

    With the advancement of genome-sequencing technologies, new genomes are being sequenced daily. Although these sequences are deposited in publicly available data warehouses, their functional and genomic annotations (beyond genes which are predicted automatically) mostly reside in the text of primary publications. Professional curators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don't exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner. Here, we describe WikiGenomes (wikigenomes.org), a web application that facilitates the consumption and curation of genomic data by the entire scientific community. WikiGenomes is based on Wikidata, an openly editable knowledge graph with the goal of aggregating published knowledge into a free and open database. WikiGenomes empowers the individual genomic researcher to contribute their expertise to the curation effort and integrates the knowledge into Wikidata, enabling it to be accessed by anyone without restriction. www.wikigenomes.org.

  12. Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    Energy Technology Data Exchange (ETDEWEB)

    Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O' Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O.; Barrero, Roberto A.; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A.; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; de Fatima Bonaldo, Maria; Bono Hidemasa; Bromberg, Susan K.; Brookes, Anthony J.; Bruford, Elspeth; Carninci Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; Gopinath, Gopal R.; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba Rie; et al.

    2004-01-15

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4 percent of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5 percent of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for nonprotein-coding RNA

  13. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

    Directory of Open Access Journals (Sweden)

    Tadashi Imanishi

    2004-06-01

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA

  14. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein) using transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Dan-Dan Wei

    Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular level. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and 126 other transcripts containing target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila response to thermal stresses, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primers were designed for investigating the genetic diversity in the future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying

  15. Genomic organization, annotation, and ligand-receptor inferences of chicken chemokines and chemokine receptor genes based on comparative genomics

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2005-03-01

    Full Text Available Abstract Background Chemokines and their receptors play important roles in host defense, organogenesis, hematopoiesis, and neuronal communication. Forty-two chemokines and 19 cognate receptors have been found in the human genome. Prior to this report, only 11 chicken chemokines and 7 receptors had been reported. The objectives of this study were to systematically identify chicken chemokines and their cognate receptor genes in the chicken genome and to annotate these genes and ligand-receptor binding by a comparative genomics approach. Results Twenty-three chemokine and 14 chemokine receptor genes were identified in the chicken genome. All of the chicken chemokines contained a conserved CC, CXC, CX3C, or XC motif, whereas all the chemokine receptors had seven conserved transmembrane helices, four extracellular domains with a conserved cysteine, and a conserved DRYLAIV sequence in the second intracellular domain. The number of coding exons in these genes and the syntenies are highly conserved between human, mouse, and chicken although the amino acid sequence homologies are generally low between mammalian and chicken chemokines. Chicken genes were named with the systematic nomenclature used in humans and mice based on phylogeny, synteny, and sequence homology. Conclusion The independent nomenclature of chicken chemokines and chemokine receptors suggests that the chicken may have ligand-receptor pairings similar to mammals. All identified chicken chemokines and their cognate receptors were identified in the chicken genome except CCR9, whose ligand was not identified in this study. The organization of these genes suggests that there were a substantial number of these genes present before divergence between aves and mammals and more gene duplications of CC, CXC, CCR, and CXCR subfamilies in mammals than in aves after the divergence.

  16. MADS-box gene family in rice: genome-wide identification, organization and expression profiling during reproductive development and stress.

    Science.gov (United States)

    Arora, Rita; Agarwal, Pinky; Ray, Swatismita; Singh, Ashok Kumar; Singh, Vijay Pal; Tyagi, Akhilesh K; Kapoor, Sanjay

    2007-07-18

    MADS-box transcription factors, besides being involved in floral organ specification, have also been implicated in several aspects of plant growth and development. In recent years, there have been reports on genomic localization, protein motif structure, phylogenetic relationships, gene structure and expression of the entire MADS-box family in the model plant system, Arabidopsis. Though there have been some studies in rice as well, an analysis of the complete MADS-box family along with a comprehensive expression profiling was still awaited after the completion of rice genome sequencing. Furthermore, owing to the role of MADS-box family in flower development, an analysis involving structure, expression and functional aspects of MADS-box genes in rice and Arabidopsis was required to understand the role of this gene family in reproductive development. A genome-wide molecular characterization and microarray-based expression profiling of the genes encoding MADS-box transcription factor family in rice is presented. Using a thorough annotation exercise, 75 MADS-box genes have been identified in rice and categorized into MIKCc, MIKC*, Malpha, Mbeta and Mgamma groups based on phylogeny. Chromosomal localization of these genes reveals that 16 MADS-box genes, mostly MIKCc-type, are located within the duplicated segments of the rice genome, whereas most of the M-type genes, 20 in all, seem to have resulted from tandem duplications. Nine members belonging to the Mbeta group, which was considered absent in monocots, have also been identified. The expression profiles of all the MADS-box genes have been analyzed under 11 temporal stages of panicle and seed development, three abiotic stress conditions, along with three stages of vegetative development. Transcripts for 31 genes accumulate preferentially in the reproductive phase, of which, 12 genes are specifically expressed in seeds, and six genes show expression specific to panicle development. Differential expression of seven

  17. MADS-box gene family in rice: genome-wide identification, organization and expression profiling during reproductive development and stress

    Directory of Open Access Journals (Sweden)

    Tyagi Akhilesh K

    2007-07-01

    Full Text Available Abstract Background MADS-box transcription factors, besides being involved in floral organ specification, have also been implicated in several aspects of plant growth and development. In recent years, there have been reports on genomic localization, protein motif structure, phylogenetic relationships, gene structure and expression of the entire MADS-box family in the model plant system, Arabidopsis. Though there have been some studies in rice as well, an analysis of the complete MADS-box family along with a comprehensive expression profiling was still awaited after the completion of rice genome sequencing. Furthermore, owing to the role of MADS-box family in flower development, an analysis involving structure, expression and functional aspects of MADS-box genes in rice and Arabidopsis was required to understand the role of this gene family in reproductive development. Results A genome-wide molecular characterization and microarray-based expression profiling of the genes encoding MADS-box transcription factor family in rice is presented. Using a thorough annotation exercise, 75 MADS-box genes have been identified in rice and categorized into MIKCc, MIKC*, Mα, Mβ and Mγ groups based on phylogeny. Chromosomal localization of these genes reveals that 16 MADS-box genes, mostly MIKCc-type, are located within the duplicated segments of the rice genome, whereas most of the M-type genes, 20 in all, seem to have resulted from tandem duplications. Nine members belonging to the Mβ group, which was considered absent in monocots, have also been identified. The expression profiles of all the MADS-box genes have been analyzed under 11 temporal stages of panicle and seed development, three abiotic stress conditions, along with three stages of vegetative development. Transcripts for 31 genes accumulate preferentially in the reproductive phase, of which, 12 genes are specifically expressed in seeds, and six genes show expression specific to panicle development

  18. Computer-Based Annotation of Putative AraC/XylS-Family Transcription Factors of Known Structure but Unknown Function

    Directory of Open Access Journals (Sweden)

    Andreas Schüller

    2012-01-01

    Full Text Available Currently, about 20 crystal structures per day are released and deposited in the Protein Data Bank. A significant fraction of these structures is produced by research groups associated with the structural genomics consortium. The biological function of many of these proteins is generally unknown or not validated by experiment. Therefore, a growing need for functional prediction of protein structures has emerged. Here we present an integrated bioinformatics method that combines sequence-based relationships and three-dimensional (3D) structural similarity of transcriptional regulators with computer prediction of their cognate DNA binding sequences. We applied this method to the AraC/XylS family of transcription factors, which is a large family of transcriptional regulators found in many bacteria controlling the expression of genes involved in diverse biological functions. Three putative new members of this family with known 3D structure but unknown function were identified for which a probable functional classification is provided. Our bioinformatics analyses suggest that they could be involved in plant cell wall degradation (Lin2118 protein from Listeria innocua, PDB code 3oou), symbiotic nitrogen fixation (protein from Chromobacterium violaceum, PDB code 3oio), and either metabolism of plant-derived biomass or nitrogen fixation (protein from Rhodopseudomonas palustris, PDB code 3mn2).

  19. The SPINK gene family and celiac disease susceptibility

    NARCIS (Netherlands)

    Wapenaar, Martin C.; Monsuur, Alienke J.; Poell, Jos; Slot, Ruben Van 't; Meijer, Jos W. R.; Meijer, Gerrit A.; Mulder, Chris J.; Mearin, Maria Luisa; Wijmenga, Cisca

    The gene family of serine protease inhibitors of the Kazal type (SPINK) are functional and positional candidate genes for celiac disease (CD). Our aim was to assess the gut mucosal gene expression and genetic association of SPINK1, -2, -4, and -5 in the Dutch CD population. Gene expression was

  20. Extensive gene amplification and concerted evolution within the CPR family of cuticular proteins in mosquitoes.

    Science.gov (United States)

    Cornman, R Scott; Willis, Judith H

    2008-06-01

    Annotation of the Anopheles gambiae genome has revealed a large increase in the number of genes encoding cuticular proteins with the Rebers and Riddiford Consensus (the CPR gene family) relative to Drosophila melanogaster. This increase reflects an expansion of the RR-2 group of CPR genes, particularly the amplification of sets of highly similar paralogs. Patterns of nucleotide variation indicate that extensive concerted evolution is occurring within these clusters. The pattern of concerted evolution is complex, however, as sequence similarity within clusters is uncorrelated with gene order and orientation, and no comparable clusters occur within similarly compact arrays of the RR-1 group in mosquitoes or in either group in D. melanogaster. The dearth of pseudogenes suggests that sequence clusters are maintained by selection for high gene-copy number, perhaps due to selection for high expression rates. This hypothesis is consistent with the apparently parallel evolution of compact gene architectures within sequence clusters relative to single-copy genes. We show that RR-2 proteins from sequence-cluster genes have complex repeats and extreme amino-acid compositions relative to single-copy CPR proteins in An. gambiae, and that the amino-acid composition of the N-terminal and C-terminal sequence flanking the chitin-binding consensus region evolves in a correlated fashion.

  1. Comprehensive annotation of bidirectional promoters identifies co-regulation among breast and ovarian cancer genes.

    Directory of Open Access Journals (Sweden)

    Mary Q Yang

    2007-04-01

    Full Text Available A "bidirectional gene pair" comprises two adjacent genes whose transcription start sites are neighboring and directed away from each other. The intervening regulatory region is called a "bidirectional promoter." These promoters are often associated with genes that function in DNA repair, with the potential to participate in the development of cancer. No connection between these gene pairs and cancer has been previously investigated. Using the database of spliced-expressed sequence tags (ESTs), we identified the most complete collection of human transcripts under the control of bidirectional promoters. A rigorous screen of the spliced EST data identified new bidirectional promoters, many of which functioned as alternative promoters or regulated novel transcripts. Additionally, we show a highly significant enrichment of bidirectional promoters in genes implicated in somatic cancer, including a substantial number of genes implicated in breast and ovarian cancers. The repeated use of this promoter structure in the human genome suggests it could regulate co-expression patterns among groups of genes. Using microarray expression data from 79 human tissues, we verify regulatory networks among genes controlled by bidirectional promoters. Subsets of these promoters contain similar combinations of transcription factor binding sites, including evolutionarily conserved ETS factor binding sites in ERBB2, FANCD2, and BRCA2. Interpreting the regulation of genes involved in co-expression networks, especially those involved in cancer, will be an important step toward defining molecular events that may contribute to disease.
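    The definition of a bidirectional gene pair given in this abstract can be illustrated with a small sketch. It assumes all genes lie on one chromosome and are described only by strand and transcription start site (TSS); the `Gene` type, the function name, and the `max_gap` threshold of 1000 bp are hypothetical choices for illustration and are not taken from the record.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record: one chromosome is assumed."""
    name: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def bidirectional_pairs(genes: List[Gene], max_gap: int = 1000) -> List[Tuple[str, str]]:
    """Return adjacent gene pairs transcribed away from each other.

    A pair qualifies when the gene with the smaller TSS is on the minus
    strand, its neighbour is on the plus strand, and the two TSSs are at
    most max_gap bases apart (max_gap is an assumed threshold).
    """
    ordered = sorted(genes, key=lambda g: g.tss)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        divergent = left.strand == "-" and right.strand == "+"
        if divergent and right.tss - left.tss <= max_gap:
            pairs.append((left.name, right.name))
    return pairs


example = [
    Gene("A", "-", 1000),
    Gene("B", "+", 1500),
    Gene("C", "+", 5000),
    Gene("D", "-", 5200),
]
pairs = bidirectional_pairs(example)
```

    In this toy input only A and B form a divergent pair; C and D are close together but transcribed toward each other, so they are not reported.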

  2. Array2BIO: from microarray expression data to functional annotation of co-regulated genes

    Directory of Open Access Journals (Sweden)

    Rasley Amy

    2006-06-01

    Full Text Available Abstract Background There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web based utility. Results Array2BIO converts raw intensities into probe expression values, automatically maps those to genes, and subsequently identifies groups of co-expressed genes using two complementary approaches: (1) comparative analysis of signal versus control and (2) clustering analysis of gene expression across different conditions. The identified genes are assigned to functional categories based on Gene Ontology classification and KEGG protein interaction pathways. Array2BIO reliably handles low-expressor genes and provides a set of statistical methods for quantifying expression levels, including Benjamini-Hochberg and Bonferroni multiple testing corrections. An automated interface with the ECR Browser provides evolutionary conservation analysis for the identified gene loci while the interconnection with Crème allows prediction of gene regulatory elements that underlie observed expression patterns. Conclusion We have developed Array2BIO – a web based tool for rapid comprehensive analysis of Affymetrix microarray expression data, which also allows users to link expression data to Dcode.org comparative genomics tools and integrates a system for translating co-expression data into mechanisms of gene co-regulation. Array2BIO is publicly available at http://array2bio.dcode.org.
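    The two multiple testing corrections named in this abstract can be sketched in a few lines. This is a generic textbook formulation in plain Python, not Array2BIO's own code; the function names and the example p-values are made up for illustration.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure).

    Each p-value is scaled by m / rank, then a running minimum taken from
    the largest rank downward keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

    Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, which is why its adjusted values are never larger than the Bonferroni ones.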

  3. A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer

    Directory of Open Access Journals (Sweden)

    Leckie Christopher

    2010-09-01

    Full Text Available Abstract Background In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify the associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries. Results In this paper, we develop a robust and efficient method for exploratory analysis of microarray data, which produces a number of different orderings (rankings) of both genes and samples (reflecting correlation among those genes and samples). The core algorithm is closely related to biclustering, and so we first compare its performance with several existing biclustering algorithms on two real datasets - gastric cancer and lymphoma datasets. We then show on the gastric cancer data that the sample orderings generated by our method are highly statistically significant with respect to the histological classification of samples by using the Jonckheere trend test, while the gene modules are biologically significant with respect to biological processes (from the Gene Ontology). In particular, some of the gene modules associated with biclusters are closely linked to gastric cancer tumorigenesis reported in previous literature, while others are potentially novel discoveries. Conclusion In conclusion, we have developed an effective and efficient method, Bi-Ordering Analysis, to detect informative patterns in gene expression microarrays by ranking genes and samples. In addition, a number of evaluation metrics were applied to assess both the statistical and biological significance of the resulting bi-orderings. The methodology was validated on gastric cancer and lymphoma datasets.
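    The Jonckheere trend test mentioned in this abstract is built on a simple counting statistic, sketched below. The sketch computes only the statistic for groups listed in their hypothesised order (here, hypothetical histological classes); it does not compute a p-value and is not the authors' implementation. The function name and the toy values are assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def jonckheere_statistic(groups: List[Sequence[float]]) -> float:
    """Jonckheere-Terpstra statistic for groups given in hypothesised order.

    For every pair of groups (earlier, later), count the observation pairs
    in which the earlier-group value is smaller than the later-group value;
    ties contribute one half.
    """
    total = 0.0
    for earlier, later in combinations(groups, 2):
        for x in earlier:
            for y in later:
                if x < y:
                    total += 1.0
                elif x == y:
                    total += 0.5
    return total


# Toy ranks of samples, grouped by an assumed ordered class label.
perfect_trend = jonckheere_statistic([[1, 2], [3, 4], [5, 6]])
with_tie = jonckheere_statistic([[1, 2], [2, 3]])
```

    A statistic near its maximum (the total number of cross-group pairs) indicates that values tend to increase along the group order; significance is then usually judged by a normal approximation or by permutation.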

  4. A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer.

    Science.gov (United States)

    Shi, Fan; Leckie, Christopher; MacIntyre, Geoff; Haviv, Izhak; Boussioutas, Alex; Kowalczyk, Adam

    2010-09-23

    In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify the associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries. In this paper, we develop a robust and efficient method for exploratory analysis of microarray data, which produces a number of different orderings (rankings) of both genes and samples (reflecting correlation among those genes and samples). The core algorithm is closely related to biclustering, and so we first compare its performance with several existing biclustering algorithms on two real datasets - gastric cancer and lymphoma datasets. We then show on the gastric cancer data that the sample orderings generated by our method are highly statistically significant with respect to the histological classification of samples by using the Jonckheere trend test, while the gene modules are biologically significant with respect to biological processes (from the Gene Ontology). In particular, some of the gene modules associated with biclusters are closely linked to gastric cancer tumorigenesis reported in previous literature, while others are potentially novel discoveries. In conclusion, we have developed an effective and efficient method, Bi-Ordering Analysis, to detect informative patterns in gene expression microarrays by ranking genes and samples. In addition, a number of evaluation metrics were applied to assess both the statistical and biological significance of the resulting bi-orderings. The methodology was validated on gastric cancer and lymphoma datasets.

  5. Mapping and annotating obesity-related genes in pig and human genomes.

    Science.gov (United States)

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are then necessary to further dissect the genetics of obesity. Pig has emerged as one of the most attractive models, because of the similarity with humans in the mechanisms regulating the fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on STRING protein interaction network. Conclusions. The collected set consists of 361 obesity related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only for 3 human genes there is no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity related genes are mostly involved in regulation and signaling processes/pathways and relevant connection emerges between obesity-related genes and diseases such as cancer and infectious diseases.

  6. An atlas of tissue-specific conserved coexpression for functional annotation and disease gene prediction.

    Science.gov (United States)

    Piro, Rosario Michael; Ala, Ugo; Molineris, Ivan; Grassi, Elena; Bracco, Chiara; Perego, Gian Paolo; Provero, Paolo; Di Cunto, Ferdinando

    2011-11-01

    Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have considered mostly generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single-cell types, tissues, organs, body regions or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could be, in principle, a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that, in fact, conserved coexpression as determined from tissue-specific and condition-specific data sets can predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful to prioritize the mutational candidates obtained from deep-sequencing projects, even in the case of genetic disorders as heterogeneous as XLMR.

  7. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Changwei Bi

    2016-09-01

    Full Text Available WRKY proteins are the zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which can be found in the promoter region of a large number of plant target genes, to regulate the expressions of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes have been identified and characterized in herbaceous species, but there is no large-scale study of WRKY genes in willow. With the whole genome sequencing of Salix suchowensis, we have the opportunity to conduct the genome-wide research for willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (groups I–III), with five subgroups (IIa–IIe) in group II. With the multiple sequence alignment and the manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share the similar exon–intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiling of SsWRKY genes with RNA sequencing data revealed diverse expression patterns among five tissues, including tender roots, young leaves, vegetative buds, non-lignified stems and barks. With the analyses of WRKY gene family in willow, it is not only beneficial to complete the functional and annotation information of WRKY gene family in woody plants, but also provide important references to investigate the
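    The heptapeptide search described in this abstract can be approximated by a plain string scan. The sketch below looks only for the canonical WRKYGQK motif and the three variants named in the record; the function name and the toy protein sequences are hypothetical, and the authors' own procedure (multiple sequence alignment plus manual search) is not reproduced here.

```python
import re
from typing import List, Tuple

# Canonical heptapeptide plus the three variants listed in the abstract.
HEPTAPEPTIDES = ("WRKYGQK", "WRKYGRK", "WKKYGQK", "WRKYGKK")
PATTERN = re.compile("|".join(HEPTAPEPTIDES))


def find_wrky_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based position, motif) for each heptapeptide occurrence."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(protein.upper())]


canonical_hit = find_wrky_motifs("MSDWRKYGQKAAA")   # toy sequence
variant_hit = find_wrky_motifs("aawkkygqkttt")      # toy sequence, lower case
no_hit = find_wrky_motifs("MSDWRRYGQKAAA")          # toy sequence without a listed motif
```

    Listing the four motifs explicitly, rather than using a looser pattern such as a character class at each variable position, avoids reporting combinations that the abstract does not mention.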

  8. Complexity of the MSG gene family of Pneumocystis carinii

    Directory of Open Access Journals (Sweden)

    Stringer James R

    2009-08-01

    Full Text Available Abstract Background The relationship between the parasitic fungus Pneumocystis carinii and its host, the laboratory rat, presumably involves features that allow the fungus to circumvent attacks by the immune system. It is hypothesized that the major surface glycoprotein (MSG) gene family endows Pneumocystis with the capacity to vary its surface. This gene family is comprised of approximately 80 genes, which each are approximately 3 kb long. Expression of the MSG gene family is regulated by a cis-dependent mechanism that involves a unique telomeric site in the genome called the expression site. Only the MSG gene adjacent to the expression site is represented by messenger RNA. Several P. carinii MSG genes have been sequenced, which showed that genes in the family can encode distinct isoforms of MSG. The vast majority of family members have not been characterized at the sequence level. Results The first 300 basepairs of MSG genes were subjected to analysis herein. Analysis of 581 MSG sequence reads from P. carinii genomic DNA yielded 281 different sequences. However, many of the sequence reads differed from others at only one site, a degree of variation consistent with that expected to be caused by error. Accounting for error reduced the number of truly distinct sequences observed to 158, roughly twice the number expected if the gene family contains 80 members. The size of the gene family was verified by PCR. The excess of distinct sequences appeared to be due to allelic variation. Discounting alleles, there were 73 different MSG genes observed. The 73 genes differed by 19% on average. Variable regions were rich in nucleotide differences that changed the encoded protein. The genes shared three regions in which at least 16 consecutive basepairs were invariant. There were numerous cases where two different genes were identical within a region that was variable among family members as a whole, suggesting recombination among family members.
Conclusion A
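    The idea in this abstract of discounting reads that differ from another read at only one site can be illustrated with a greedy collapse. The abstract does not say how error was accounted for, so everything below is an assumption: equal-length reads, a Hamming distance threshold of one, and abundance-ordered greedy assignment. The function names and toy reads are hypothetical.

```python
from collections import Counter
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_reads(reads: List[str], max_mismatches: int = 1) -> List[str]:
    """Greedy collapse: most abundant sequences become representatives.

    A sequence is absorbed if it has the same length as an existing
    representative and differs from it at no more than max_mismatches sites.
    """
    counts = Counter(reads)
    representatives: List[str] = []
    # Most abundant first; ties broken alphabetically for determinism.
    for seq, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        absorbed = any(
            len(seq) == len(rep) and hamming(seq, rep) <= max_mismatches
            for rep in representatives
        )
        if not absorbed:
            representatives.append(seq)
    return representatives


toy_reads = ["ACGT", "ACGT", "ACGA", "TTTT", "ACGT"]
distinct = collapse_reads(toy_reads)
```

    In the toy input, ACGA differs from the abundant ACGT at a single site and is treated as a probable error, while TTTT remains a separate sequence.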

  9. The late-annotated small ORF LSO1 is a target gene of the iron regulon of Saccharomyces cerevisiae.

    Science.gov (United States)

    An, Xiuxiang; Zhang, Caiguo; Sclafani, Robert A; Seligman, Paul; Huang, Mingxia

    2015-12-01

    We have identified a new downstream target gene of the Aft1/2-regulated iron regulon in budding yeast Saccharomyces cerevisiae, the late-annotated small open reading frame LSO1. LSO1 transcript is among the most highly induced from a transcriptome analysis of a fet3-1 mutant grown in the presence of the iron chelator bathophenanthrolinedisulfonic acid. LSO1 has a paralog, LSO2, which is constitutively expressed and not affected by iron availability. In contrast, we find that the LSO1 promoter region contains three consensus binding sites for the Aft1/2 transcription factors and that an LSO1-lacZ reporter is highly induced under low-iron conditions in an Aft1-dependent manner. The expression patterns of the Lso1 and Lso2 proteins mirror those of their mRNAs. Both proteins are localized to the nucleus and cytoplasm, but become more cytoplasmic upon iron deprivation consistent with a role in iron transport. LSO1 and LSO2 appear to play overlapping roles in the cellular response to iron starvation since single lso1 and lso2 mutants are sensitive to iron deprivation and this sensitivity is exacerbated when both genes are deleted.

  10. Meta4: a web-application for sharing and annotating metagenomic gene predictions using web-services

    Directory of Open Access Journals (Sweden)

    Emily J Richardson

    2013-09-01

    Full Text Available Whole-genome-shotgun (WGS) metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web-application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web-services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website (http://www.ark-genomics.org/bioinformatics/meta4), code is available on Github (https://github.com/mw55309/meta4), a cloud image is available, and an example implementation can be seen at http://www.ark-genomics.org/tools/meta4

  11. CRDB: database of chemosensory receptor gene families in vertebrate.

    Directory of Open Access Journals (Sweden)

    Dong Dong

    Full Text Available Chemosensory receptors (CR) are crucial for animals to sense environmental changes and survive on earth. The emergence of whole-genome sequences provides us an opportunity to identify the entire CR gene repertoires. To gain more insight into the evolution of CR genes in vertebrates, we identified nearly all CR genes in 25 vertebrates using homology-based approaches. Among these CR gene repertoires, nearly half were identified for the first time in previously uncharacterized species, such as the guinea pig, giant panda and elephant. Consistent with previous findings, we found that the numbers of CR genes vary extensively among different species, suggesting an extreme form of 'birth-and-death' evolution. For the purpose of facilitating CR gene analysis, we constructed a database with the goals to provide a resource for CR gene annotation and a web tool for exploring their evolutionary patterns. Besides a search engine for the gene extraction from a specific chromosome region, an easy-to-use phylogenetic analysis tool was also provided to facilitate online phylogeny study of CR genes. Our work can provide a rigorous platform for further study on the evolution of CR genes in vertebrates.

  12. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

    Directory of Open Access Journals (Sweden)

    Nan Fu

    Full Text Available BACKGROUND: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp) that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Large numbers of simple sequence repeats (SSRs) were identified, and then the rate of successful amplification and polymorphism were investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating a large scale of sequence information by Illumina paired-end sequencing and efficient assembling. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.
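
    The annotation percentages quoted in this abstract are all fractions of the 42,280 assembled unigenes. A short check of that arithmetic, using only the counts given above (the variable and function names are illustrative):

```python
# Counts copied from the abstract; names are illustrative only.
TOTAL_UNIGENES = 42_280

annotated_counts = {
    "COG": 10_473,
    "InterPro": 21_126,
    "GO": 15_409,
}


def percent_of_total(count, total=TOTAL_UNIGENES):
    """Share of all unigenes, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {name: percent_of_total(n) for name, n in annotated_counts.items()}
```

    The three results agree with the 24.77%, 49.97% and 36.45% reported in the abstract, which confirms that the GO figure is expressed relative to all unigenes rather than to the InterPro-annotated subset.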

  13. Differential Gene Expression in the Otic Capsule and the Middle Ear-An Annotation of Bone-Related Signaling Genes

    DEFF Research Database (Denmark)

    Nielsen, Michelle C.; Martin-Bertelsen, Tomas; Friis, Morten;

    2015-01-01

    and stria vascularis) and the lining tissues from the middle ear of the rat. Data was analyzed with statistical bioinformatics tools. Gene expression levels of selected genes were validated using quantitative polymerase chain reaction. Results: A total of 413 genes were identified when young inner bulla...

  14. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    Science.gov (United States)

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  15. Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)

    KAUST Repository

    Lawton, Jennifer

    2012-03-29

    Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required. Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages. Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein

  16. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high

  17. Functional annotation of novel lineage-specific genes using co-expression and promoter analysis

    Directory of Open Access Journals (Sweden)

    Loor Juan J

    2010-03-01

    Full Text Available Abstract Background The diversity of placental architectures within and among mammalian orders is believed to be the result of adaptive evolution. Although the genetic basis for these differences is unknown, some may arise from rapidly diverging and lineage-specific genes. Previously, we identified 91 novel lineage-specific transcripts (LSTs) from a cow term-placenta cDNA library, which are excellent candidates for adaptive placental functions acquired by the ruminant lineage. The aim of the present study was to infer functions of previously uncharacterized lineage-specific genes (LSGs) using co-expression, promoter, pathway and network analysis. Results Clusters of co-expressed genes preferentially expressed in liver, placenta and thymus were found using 49 previously uncharacterized LSTs as seeds. Over-represented composite transcription factor binding sites (TFBS) in promoters of clustered LSGs and known genes were then identified computationally. Functions were inferred for nine previously uncharacterized LSGs using co-expression analysis and pathway analysis tools. Our results predict that these LSGs may function in cell signaling, glycerophospholipid/fatty acid metabolism, protein trafficking, regulatory processes in the nucleus, and processes that initiate parturition and immune system development. Conclusions The placenta is a rich source of lineage-specific genes that function in the adaptive evolution of placental architecture and functions. We have shown that co-expression, promoter, and gene network analyses are useful methods to infer functions of LSGs with heretofore unknown functions. Our results indicate that many LSGs are involved in cellular recognition and developmental processes. Furthermore, they provide guidance for experimental approaches to validate the functions of LSGs and to study their evolution.

  18. Functional annotation of novel lineage-specific genes using co-expression and promoter analysis.

    Science.gov (United States)

    Kumar, Charu G; Everts, Robin E; Loor, Juan J; Lewin, Harris A

    2010-03-09

    The diversity of placental architectures within and among mammalian orders is believed to be the result of adaptive evolution. Although the genetic basis for these differences is unknown, some may arise from rapidly diverging and lineage-specific genes. Previously, we identified 91 novel lineage-specific transcripts (LSTs) from a cow term-placenta cDNA library, which are excellent candidates for adaptive placental functions acquired by the ruminant lineage. The aim of the present study was to infer functions of previously uncharacterized lineage-specific genes (LSGs) using co-expression, promoter, pathway and network analysis. Clusters of co-expressed genes preferentially expressed in liver, placenta and thymus were found using 49 previously uncharacterized LSTs as seeds. Over-represented composite transcription factor binding sites (TFBS) in promoters of clustered LSGs and known genes were then identified computationally. Functions were inferred for nine previously uncharacterized LSGs using co-expression analysis and pathway analysis tools. Our results predict that these LSGs may function in cell signaling, glycerophospholipid/fatty acid metabolism, protein trafficking, regulatory processes in the nucleus, and processes that initiate parturition and immune system development. The placenta is a rich source of lineage-specific genes that function in the adaptive evolution of placental architecture and functions. We have shown that co-expression, promoter, and gene network analyses are useful methods to infer functions of LSGs with heretofore unknown functions. Our results indicate that many LSGs are involved in cellular recognition and developmental processes. Furthermore, they provide guidance for experimental approaches to validate the functions of LSGs and to study their evolution.

  19. Recurrent APC gene mutations in Polish FAP families

    Directory of Open Access Journals (Sweden)

    Pławski Andrzej

    2007-12-01

    Full Text Available Abstract The molecular diagnostics of genetically conditioned disorders is based on the identification of mutations in the predisposing genes. Hereditary cancer disorders of the gastrointestinal tract are caused by mutations of the tumour suppressor genes or the DNA repair genes. The occurrence of recurrent mutations allows improvement of molecular diagnostics. The mutation spectrum in the genes causing hereditary forms of colorectal cancers in the Polish population was previously described. In the present work an estimation of the frequency of the recurrent mutations of the APC gene was performed. Eight types of mutations occurred in 19.4% of our FAP families and these constitute 43% of all Polish diagnosed families.

  20. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied on 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental...... exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results...

  1. Functional annotation of rare gene aberration drivers of pancreatic cancer | Office of Cancer Genomics

    Science.gov (United States)

    As we enter the era of precision medicine, characterization of cancer genomes will directly influence therapeutic decisions in the clinic. Here we describe a platform enabling functionalization of rare gene mutations through their high-throughput construction, molecular barcoding and delivery to cancer models for in vivo tumour driver screens. We apply these technologies to identify oncogenic drivers of pancreatic ductal adenocarcinoma (PDAC).

  2. Gene expression and functional annotation of human choroid plexus epithelium failure in Alzheimer's disease

    NARCIS (Netherlands)

    Bergen, Arthur A; Kaing, Sovann; Ten Brink, Jacoline B; Gorgels, Theo G; Janssen, Sarah F

    2015-01-01

    BACKGROUND: Alzheimer's disease (AD) is the most common form of dementia. AD has a multifactorial disease etiology and is currently untreatable. Multiple genes and molecular mechanisms have been implicated in AD, including β-amyloid deposition in the brain, neurofibrillary tangle accumulation of

  3. Gene Expression and Functional Annotation of the Human Ciliary Body Epithelia

    NARCIS (Netherlands)

    S.F. Janssen (Sarah); T.G.M.F. Gorgels (Theo); K. Bossers (Koen); J.B. ten Brink (Jacoline); A.H.W. Essing (Anke); M.H. Nagtegaal (Marleen); P.J. van der Spek (Peter); N.M. Jansonius (Nomdo); A.A.B. Bergen (Arthur)

    2012-01-01

    Purpose: The ciliary body (CB) of the human eye consists of the non-pigmented (NPE) and pigmented (PE) neuro-epithelia. We investigated the gene expression of NPE and PE, to shed light on the molecular mechanisms underlying the most important functions of the CB. We also developed molecu

  4. Gene Expression and Functional Annotation of the Human Ciliary Body Epithelia

    NARCIS (Netherlands)

    Janssen, Sarah F.; Gorgels, Theo G. M. F.; Bossers, Koen; ten Brink, Jacoline B.; Essing, Anke H. W.; Nagtegaal, Martijn; van der Spek, Peter J.; Jansonius, Nomdo M.; Bergen, Arthur A. B.

    2012-01-01

    Purpose: The ciliary body (CB) of the human eye consists of the non-pigmented (NPE) and pigmented (PE) neuro-epithelia. We investigated the gene expression of NPE and PE, to shed light on the molecular mechanisms underlying the most important functions of the CB. We also developed molecular signatur

  5. Msx homeobox gene family and craniofacial development

    Institute of Scientific and Technical Information of China (English)

    SYLVIA ALAPPAT; ZUN YI ZHANG; YI PING CHEN

    2003-01-01

    Vertebrate Msx genes are unlinked, homeobox-containing genes that bear homology to the Drosophila muscle segment homeobox gene. These genes are expressed at multiple sites of tissue-tissue interactions during vertebrate embryonic development. Inductive interactions mediated by the Msx genes are essential for normal craniofacial, limb and ectodermal organ morphogenesis, and are also essential to survival in mice, as manifested by the phenotypic abnormalities shown in knockout mice and in humans. This review summarizes studies on the expression, regulation, and functional analysis of Msx genes that bear relevance to craniofacial development in humans and mice.

  6. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL), that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures, Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence), customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications: (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
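
    The abstract does not define MOSupport or MOConfidence, so they are not reproduced here. For orientation, the sketch below shows the classical support and confidence measures from association rule mining that such measures build on, applied to toy annotation sets. The term labels are invented placeholders, not real GO identifiers.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for annotation_set in transactions if items <= annotation_set)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Classical confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Toy data: one set of placeholder term labels per gene product.
annotations = [
    {"MF:kinase_activity", "BP:phosphorylation"},
    {"MF:kinase_activity", "BP:phosphorylation", "CC:cytoplasm"},
    {"MF:kinase_activity", "CC:nucleus"},
    {"BP:transport", "CC:membrane"},
]

rule_support = support(annotations, {"MF:kinase_activity", "BP:phosphorylation"})
rule_confidence = confidence(annotations, {"MF:kinase_activity"}, {"BP:phosphorylation"})
```

    Here the rule "kinase activity implies phosphorylation" holds in two of the four gene products (support 0.5) and in two of the three that carry the antecedent (confidence 2/3). A cross-ontology rule is simply one whose antecedent and consequent come from different sub-ontologies.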

  7. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    Science.gov (United States)

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half a Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression

  8. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA sequence, such as canSNP typing or MLVA, has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
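
    The abstract does not spell out how MLST+ compares isolates. The general idea behind gene-by-gene typing can nonetheless be sketched as follows: each isolate is reduced to a profile of allele numbers per locus, and two isolates are compared by counting the loci at which their alleles differ. The isolate names, locus names, and allele numbers below are invented for illustration.

```python
def allele_distance(profile_a, profile_b):
    """Count loci present in both profiles whose allele numbers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] != profile_b[locus])


# Hypothetical allele profiles: locus name -> allele number.
profiles = {
    "isolate_1": {"locus_A": 1, "locus_B": 4, "locus_C": 2},
    "isolate_2": {"locus_A": 1, "locus_B": 4, "locus_C": 2},
    "isolate_3": {"locus_A": 1, "locus_B": 5, "locus_C": 3},
}

same_clone_distance = allele_distance(profiles["isolate_1"], profiles["isolate_2"])
other_clone_distance = allele_distance(profiles["isolate_1"], profiles["isolate_3"])
```

    Isolates with a distance of zero are indistinguishable at the loci compared. With a whole-genome set of loci rather than a handful, small distances become informative enough to separate closely related clones.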

  9. Developmental Stage Annotation of Drosophila Gene Expression Pattern Images via an Entire Solution Path for LDA.

    Science.gov (United States)

    Ye, Jieping; Chen, Jianhui; Janardan, Ravi; Kumar, Sudhir

    2008-03-01

    Gene expression in a developing embryo occurs in particular cells (spatial patterns) in a time-specific manner (temporal patterns), which leads to the differentiation of cell fates. Images of a Drosophila melanogaster embryo at a given developmental stage, showing a particular gene expression pattern revealed by a gene-specific probe, can be compared for spatial overlaps. The comparison is fundamentally important to formulating and testing gene interaction hypotheses. Expression pattern comparison is most biologically meaningful when images from a similar time point (developmental stage) are compared. In this paper, we present LdaPath, a novel formulation of Linear Discriminant Analysis (LDA) for automatic developmental stage range classification. It employs multivariate linear regression with the L1-norm penalty controlled by a regularization parameter for feature extraction and visualization. LdaPath computes an entire solution path for all values of the regularization parameter with essentially the same computational cost as fitting one LDA model. Thus, it facilitates efficient model selection. It is based on the equivalence relationship between LDA and the least squares method for multi-class classifications. This equivalence relationship is established under a mild condition, which we show empirically to hold for many high-dimensional datasets, such as expression pattern images. Our experiments on a collection of 2705 expression pattern images show the effectiveness of the proposed algorithm. Results also show that the LDA model resulting from LdaPath is sparse, and irrelevant features may be removed. Thus, LdaPath provides a general framework for simultaneous feature selection and feature extraction.

  10. Automatic annotation of organellar genomes with DOGMA

    Energy Technology Data Exchange (ETDEWEB)

    Wyman, Stacia; Jansen, Robert K.; Boore, Jeffrey L.

    2004-06-01

    Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of extra-nuclear organellar (chloroplast and animal mitochondrial) genomes. It is a web-based package that allows the use of comparative BLAST searches to identify and annotate genes in a genome. DOGMA presents a list of putative genes to the user in a graphical format for viewing and editing. Annotations are stored on our password-protected server. Complete annotations can be extracted for direct submission to GenBank. Furthermore, intergenic regions of specified length can be extracted, as well as the nucleotide sequences and amino acid sequences of the genes.
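
    Extracting intergenic regions of a specified length, as mentioned above, comes down to finding the gaps between annotated gene intervals. The sketch below is not DOGMA's code. It assumes 1-based inclusive coordinates on a linear sequence and ignores the wrap-around region of a circular organellar genome; the function name and the example coordinates are illustrative.

```python
def intergenic_regions(genes, genome_length, min_length=1):
    """Return (start, end) gaps between genes that are at least min_length long.

    genes: iterable of (start, end) pairs, 1-based and inclusive; overlapping
    genes are allowed. The sequence is treated as linear.
    """
    regions = []
    cursor = 1  # first position not yet covered by any gene
    for start, end in sorted(genes):
        if start - cursor >= min_length:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if genome_length - cursor + 1 >= min_length:
        regions.append((cursor, genome_length))
    return regions


example_genes = [(100, 200), (150, 300), (500, 600)]
gaps_50 = intergenic_regions(example_genes, genome_length=700, min_length=50)
gaps_100 = intergenic_regions(example_genes, genome_length=700, min_length=100)
```

    With a minimum length of 50 all three gaps are reported. Raising it to 100 drops the 99-base region before the first gene.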

  11. LeARN: a platform for detecting, clustering and annotating non-coding RNAs

    Directory of Open Access Journals (Sweden)

    Schiex Thomas

    2008-01-01

    Full Text Available Abstract Background In the last decade, sequencing projects have led to the development of a number of annotation systems dedicated to the structural and functional annotation of protein-coding genes. These annotation systems manage the annotation of the non-protein coding genes (ncRNAs) in a very crude way, allowing neither the editing of the secondary structures nor the clustering of ncRNA genes into families, both of which are crucial for appropriate annotation of these molecules. Results LeARN is a flexible software package which handles the complete process of ncRNA annotation by integrating the layers of automatic detection and human curation. Conclusion This software provides the infrastructure to deal properly with ncRNAs in the framework of any annotation project. It fills the gap between existing prediction software, which detects independent ncRNA occurrences, and public ncRNA repositories, which do not offer the flexibility and interactivity required for annotation projects. The software is freely available from the download section of the website http://bioinfo.genopole-toulouse.prd.fr/LeARN

  12. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other

  13. Semantic Annotation of Medical Images (ANOTACIÓN SEMÁNTICA DE IMÁGENES MÉDICAS)

    Directory of Open Access Journals (Sweden)

    OSCAR CEBALLOS

    Full Text Available The use of ontologies to facilitate semantic annotation of medical images has been a widely used approach. A particular limitation of this approach is the small number of ontologies with a high level of completeness, due in part to the difficulty of evolving them. This article proposes a method that facilitates the evolution of ontologies from the contributions made by domain experts through social tagging of medical images. The method guides the collaborative process during the discovery of change. Additionally, we present a tool built on Web Protégé to support the proposed method.

  14. UBR5 Gene Mutation Is Associated with Familial Adult Myoclonic Epilepsy in a Japanese Family

    OpenAIRE

    2012-01-01

    The causal gene(s) for familial adult myoclonic epilepsy (FAME) remains undetermined. To identify it, an exome analysis was performed for the proband in a Japanese FAME family. Of the 383 missense/nonsense variants examined, only the c.5720G>A mutation (p.Arg1907His) in the UBR5 gene was found in all of the affected individuals in the family, but not in the nonaffected members. This mutation was not found in any of the 85 healthy individuals in the same community nor in any of the 24 individuals ...

  15. Transcriptome-wide survey of mouse CNS-derived cells reveals monoallelic expression within novel gene families.

    Directory of Open Access Journals (Sweden)

    Sierra M Li

    Full Text Available Monoallelic expression is an integral component of regulation of a number of essential genes and gene families. To probe for allele-specific expression in cells of CNS origin, we used next-generation sequencing (RNA-seq) to analyze four clonal neural stem cell (NSC) lines derived from Mus musculus C57BL/6 (B6) × Mus musculus molossinus (JF1) adult female mice. We established a JF1 cSNP library, then ascertained transcriptome-wide expression from B6 vs. JF1 alleles in the NSC lines. Validating the assay, we found that 262 of 268 X-linked genes evaluable in at least one cell line showed monoallelic expression (at least 85% expression of the predominant allele, p-value<0.05). For autosomal genes, 170 of 7,198 genes (2.4% of the total) showed monoallelic expression in at least 2 evaluable cell lines. The group included eight known imprinted genes with the expected pattern of allele-specific expression. Among the other autosomal genes with monoallelic expression were five members of the glutathione transferase gene superfamily, which processes xenobiotic compounds as well as carcinogens and cancer therapeutic agents. Monoallelic expression within this superfamily thus may play a functional role in the response to diverse and potentially lethal exogenous factors, as is the case for the immunoglobulin and olfactory receptor superfamilies. Other genes and gene families showing monoallelic expression include the annexin gene family and the Thy1 gene, both linked to inflammation and cancer, as well as genes linked to alcohol dependence (Gabrg1) and epilepsy (Kcnma1). The annotated set of genes will provide a resource for investigation of mechanisms underlying certain cases of these and other major disorders.
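
    The monoallelic call described above combines an effect-size cutoff (at least 85% of expression from the predominant allele) with a significance cutoff (p < 0.05). The abstract does not say which statistical test was used. The sketch below assumes, purely for illustration, an exact two-sided binomial test against equal expression of the two alleles, applied to per-gene read counts. The function names and example counts are hypothetical.

```python
from math import comb


def predominant_fraction(b6_reads, jf1_reads):
    """Fraction of allele-informative reads that come from the predominant allele."""
    return max(b6_reads, jf1_reads) / (b6_reads + jf1_reads)


def binomial_two_sided_p(b6_reads, jf1_reads):
    """Exact two-sided binomial p-value against a 50/50 allelic ratio.

    Assumption: the original study's test is not stated in the abstract.
    """
    n = b6_reads + jf1_reads
    k = max(b6_reads, jf1_reads)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def is_monoallelic(b6_reads, jf1_reads, min_fraction=0.85, alpha=0.05):
    """Apply both cutoffs; genes without informative reads are not called."""
    if b6_reads + jf1_reads == 0:
        return False
    return (
        predominant_fraction(b6_reads, jf1_reads) >= min_fraction
        and binomial_two_sided_p(b6_reads, jf1_reads) < alpha
    )
```

    Requiring both conditions keeps genes with very few reads from being called monoallelic: three reads that all come from one allele give a fraction of 100% but a p-value of 0.25 under this test.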

  16. Identification of metalloprotease gene families in sugarcane

    Directory of Open Access Journals (Sweden)

    O.H.P. Ramos

    2001-12-01

    Metalloproteases play a key role in many physiological processes in mammals such as cell migration, tissue remodeling and processing of growth factors. They have also been identified as important factors in the pathophysiology of a number of human diseases, including cancer and hypertension. Many bacterial pathogens rely on proteases in order to infect the host. Several classes of metalloproteases have been described in humans, bacteria, snake venoms and insects. However, the presence and characterization of plant metalloproteases have rarely been described in the literature. In our research, we searched the sugarcane expressed sequence tag (SUCEST) DNA library in order to identify, by homology with sequences deposited in other databases, metalloprotease gene families expressed under different conditions. Protein sequences from Arabidopsis thaliana and Glycine max were used to search the SUCEST data bank. Conserved regions corresponding to different metalloprotease domains and sequence motifs were identified in the reads to characterize each group of enzymes. At least four classes of sugarcane metalloproteases have been identified, i.e. matrix metalloproteases, zincins, inverzincins, and ATP-dependent metalloproteases. Each enzyme class was analyzed for its expression in different conditions and tissues.

  17. BAG Family Gene and Its Relationship with Lung Adenocarcinoma Susceptibility

    Directory of Open Access Journals (Sweden)

    Ying LI

    2010-10-01

    Background and objective: BAG (Bcl-2-associated athanogene) genes belong to a recently discovered multifunctional anti-apoptosis gene family that regulates various physiological processes, including apoptosis, tumorigenesis, neural differentiation, stress response and the cell cycle. The expression status of BAG family genes is related to the incidence and prognosis of certain tumors. The aim of this study is to explore the association of BAG family gene expression status with susceptibility to lung adenocarcinoma. Methods: Gene expression data for BAG family genes from 29 cases of lung adenocarcinoma tissues and matched pericancerous lung tissues were generated by microarray chips. Cox regression was used to analyze the association between the expression of BAG family genes and susceptibility to lung adenocarcinoma, and the results were verified against the GEO database. Results: The expression levels of BAG-1, BAG-2 and BAG-5 in cancer tissues were significantly downregulated compared with matched pericancerous lung tissues and were protective factors for lung adenocarcinoma (P < 0.05, OR < 1), while the expression level of BAG-4 in cancer tissues was remarkably upregulated compared with the matched pericancerous lung tissues and was a risk factor for lung adenocarcinoma (P < 0.05, OR > 1). Conclusion: BAG-1, BAG-2 and BAG-5 might be potential protective factors, while BAG-4 is a possible risk factor for lung adenocarcinoma.

  18. The angiotensin-converting enzyme (ACE) gene family of Anopheles gambiae

    Directory of Open Access Journals (Sweden)

    Isaac R Elwyn

    2005-12-01

    Background: Members of the M2 family of peptidases, related to mammalian angiotensin converting enzyme (ACE), play important roles in regulating a number of physiological processes. As more invertebrate genomes are sequenced, there is increasing evidence of a variety of M2 peptidase genes, even within a single species. The function of these ACE-like proteins is largely unknown. Sequencing of the A. gambiae genome has revealed a number of ACE-like genes, but probable errors in the Ensembl annotation have left the number of ACE-like genes, and their structure, unclear. Results: TBLASTN and sequence analysis of cDNAs revealed that the A. gambiae genome contains nine genes (AnoACE genes) which code for proteins with similarity to mammalian ACE. Eight of these genes code for putative single domain enzymes similar to other insect ACEs described so far. AnoACE9, however, has several features in common with mammalian somatic ACE, such as a two domain structure and a hydrophobic C terminus. Four of the AnoACE genes (2, 3, 7 and 9) were shown to be expressed at a variety of developmental stages. Expression of AnoACE3, AnoACE7 and AnoACE9 is induced by a blood meal, with AnoACE7 showing the largest (approximately 10-fold) induction. Conclusion: Genes coding for two-domain ACEs have arisen several times during the course of evolution, suggesting a common selective advantage to having an ACE with two active sites in tandem in a single protein. AnoACE7 belongs to a sub-group of insect ACEs which are likely to be membrane-bound and which have an unusual, conserved gene structure.

  19. Genome organization and expression of the rat ACBP gene family

    DEFF Research Database (Denmark)

    Mandrup, S; Andreasen, P H; Knudsen, J

    1993-01-01

    pool former. We have molecularly cloned and characterized the rat ACBP gene family which comprises one expressed and four processed pseudogenes. One of these was shown to exist in two allelic forms. A comprehensive computer-aided analysis of the promoter region of the expressed ACBP gene revealed...

  20. The Odorant Binding Protein Gene Family from the Genome of Silkworm, Bombyx mori

    Directory of Open Access Journals (Sweden)

    Zhao Ping

    2009-07-01

    Background: Chemosensory systems play key roles in the survival and reproductive success of insects. Insect chemoreception is mediated by two large and diverse gene superfamilies, chemoreceptors and odorant binding proteins (OBPs). OBPs are believed to transport hydrophobic odorants from the environment to the olfactory receptors. Results: We identified a family of OBP-like genes in the silkworm genome and characterized their expression using oligonucleotide microarrays. A total of forty-four OBP genes were annotated, a number comparable to the 57 OBPs known from Anopheles gambiae and 51 from Drosophila melanogaster. As seen in other fully sequenced insect genomes, most silkworm OBP genes are present in large clusters. We defined six subfamilies of OBPs, each of which shows lineage-specific expansion and diversification. EST data and OBP expression profiles from multiple larval tissues of day three fifth instars demonstrated that many OBPs are expressed in chemosensory-specific tissues, although some OBPs are expressed ubiquitously and others exclusively in non-chemosensory tissues. Some atypical OBPs are expressed throughout development. These results reveal that, although many OBPs are chemosensory-specific, others may have more general physiological roles. Conclusion: Silkworms possess a number of OBP genes similar to that of other insects. Their expression profiles suggest that many OBPs may be involved in olfaction and gustation as well as acting as general carriers of hydrophobic molecules. The expansion of OBP gene subfamilies and sequence divergence indicate that the silkworm OBP family acquired functional diversity concurrently with functional constraints. Further investigation of the OBPs of the silkworm could give insights into the roles of OBPs in chemoreception.

  1. The tyrosinase gene family and albinism in fish

    Institute of Scientific and Technical Information of China (English)

    WANG Jiaqing; HOU Lin; ZHANG Ruifeng; ZHAO Xintao; JIANG Lijuan; SUN Wenjing; AN Jialu; LI Xiaoyan

    2007-01-01

    Tyrosinase exists universally in organisms and is a characteristic enzyme of melanocytes. Tyrosinase family genes in vertebrates consist of 3 related members: tyrosinase (TYR, Tyr), tyrosinase-related protein-1 (TRP-1, Tyrp1), and tyrosinase-related protein-2 (TRP-2, Tyrp2, Dct). These proteins catalyze melanin biosynthesis in pigment cells and play important roles in determining vertebrate coloration. Transcription of the TYR and TRP genes is useful for studying neural crest and optic vesicle cell migration and differentiation during embryogenesis and is important in pigment rescue in fish. In this paper, the gene structure, protein molecular evolution, functions and roles of the TYR family in fish are reviewed.

  2. Update of human and mouse forkhead box (FOX) gene families

    Directory of Open Access Journals (Sweden)

    Jackson Brian C

    2010-06-01

    The forkhead box (FOX) proteins are transcription factors that play complex and important roles in processes from development and organogenesis to regulation of metabolism and the immune system. There are 50 FOX genes in the human genome and 44 in the mouse, divided into 19 subfamilies. All human FOX genes have close mouse orthologues, with one exception: the mouse has a single Foxd4, whereas the human gene has undergone a recent duplication to a total of seven (FOXD4 and FOXD4L1 → FOXD4L6). Evolutionarily ancient family members can be found as far back as the fungi and metazoans. The DNA-binding domain, the forkhead domain, is an example of the winged-helix domain, and is very well conserved across the FOX family and across species, with a few notable exceptions in which divergence has created new functionality. Mutations in FOX genes have been implicated in at least four familial human diseases, and differential expression may play a role in a number of other pathologies, ranging from metabolic disorders to autoimmunity. Furthermore, FOX genes are differentially expressed in a large number of cancers; their role can be either as an oncogene or tumour suppressor, depending on the family member and cell type. Although some drugs that target FOX gene expression or activity, notably proteasome inhibitors, appear to work well, much more basic research is needed to unlock the complex interplay of upstream and downstream interactions with FOX family transcription factors.

  3. Identification and expression analysis of the LRR-RLK gene family in tomato (Solanum lycopersicum) Heinz 1706.

    Science.gov (United States)

    Wei, Zhirong; Wang, Jiehua; Yang, Shaohui; Song, Yingjin

    2015-04-01

    As the largest subfamily of receptor-like kinases (RLKs), leucine-rich repeat receptor-like kinases (LRR-RLKs) regulate the growth, development, and stress responses of plants. Through a reiterative process of sequence analysis and re-annotation, 234 LRR-RLK genes were identified in the genome of tomato (Solanum lycopersicum) 'Heinz 1706', which were further grouped into 10 major groups based on their sequence similarity. In contrast to the significant role of tandem duplication in the expansion of this gene family in other species, only approximately 12% (29 out of 234) of SlLRR-RLK genes arose from tandem duplication. Using the multiple expectation maximization for motif elicitation (MEME) method, the motif composition and arrangement were found to be variably conserved within each SlLRR-RLK group, indicating different extents of functional divergence. Expression profiling by qRT-PCR revealed that SlLRR-RLK genes were differentially expressed in various tomato organs and tissues, and some SlLRR-RLK genes exhibited preferential expression in fruits at distinct developmental stages, suggesting that SlLRR-RLKs may play important roles in fruit development and the ripening process. The results of this study provide an overview of the LRR-RLK gene family in tomato Heinz 1706, one important species of Solanaceae, and will be helpful for future functional analysis of this important protein family in fleshy fruit-bearing species.

  4. Functional annotations of diabetes nephropathy susceptibility loci through analysis of genome-wide renal gene expression in rat models of diabetes mellitus

    DEFF Research Database (Denmark)

    Hu, Yaomin; Kaisaki, Pamela J; Argoud, Karène

    2009-01-01

    of spontaneous (genetically determined) mild hyperglycaemia and insulin resistance (Goto-Kakizaki-GK) and experimentally induced severe hyperglycaemia (Wistar-Kyoto-WKY rats injected with streptozotocin [STZ]). RESULTS: Different patterns of transcription regulation in the two rat models of diabetes likely...... number of protein coding sequences of unknown function which can be considered as functional and, when they map to DN loci, positional candidates for DN. Further expression analysis of rat orthologs of human DN positional candidate genes provided functional annotations of known and novel genes...

  5. Gene turnover and differential retention in the relaxin/insulin-like gene family in primates.

    Science.gov (United States)

    Arroyo, José Ignacio; Hoffmann, Federico G; Opazo, Juan C

    2012-06-01

    The relaxin/insulin-like gene family is related to the insulin gene family, and includes two separate types of peptides: relaxins (RLNs) and insulin-like peptides (INSLs) that perform a variety of physiological roles including testicular descent, growth and differentiation of the mammary glands, trophoblast development, and cell differentiation. In vertebrates, these genes are found on three separate genomic loci, and in mammals, variation in the number and nature of genes in this family is mostly restricted to the Relaxin Family Locus B. For example, this locus contains a single copy of RLN in platypus and opossum, whereas it contains copies of the INSL6, INSL4, RLN2 and RLN1 genes in human and chimp. The main objective of this research is to characterize changes in the size and membership composition of the RLN/INSL gene family in primates, reconstruct the history of the RLN/INSL genes of primates, and test competing evolutionary scenarios regarding the origin of INSL4 and of the duplicated copies of the RLN gene of apes. Our results show that the relaxin/INSL-like gene family of primates has had a more dynamic evolutionary history than previously thought, including several examples of gene duplications and losses which are consistent with the predictions of the birth-and-death model of gene family evolution. In particular, we found that the differential retention of relatively old paralogs played a key role in shaping the gene complement of this family in primates. Two examples of this phenomenon are the origin of the INSL4 gene of catarrhines (the group that includes Old World monkeys and apes), and of the duplicate RLN1 and RLN2 paralogs of apes. In the case of INSL4, comparative genomics and phylogenetic analyses indicate that the origin of this gene, which was thought to represent a catarrhine-specific evolutionary innovation, is as old as the split between carnivores and primates, which took place approximately 97 million years ago. 
In addition, in the case

  6. DFLAT: functional annotation for human development.

    Science.gov (United States)

    Wick, Heather C; Drabkin, Harold; Ngu, Huy; Sackman, Michael; Fournier, Craig; Haggett, Jessica; Blake, Judith A; Bianchi, Diana W; Slonim, Donna K

    2014-02-07

    Recent increases in genomic studies of the developing human fetus and neonate have led to a need for widespread characterization of the functional roles of genes at different developmental stages. The Gene Ontology (GO), a valuable and widely-used resource for characterizing gene function, offers perhaps the most suitable functional annotation system for this purpose. However, due in part to the difficulty of studying molecular genetic effects in humans, even the current collection of comprehensive GO annotations for human genes and gene products often lacks adequate developmental context for scientists wishing to study gene function in the human fetus. The Developmental FunctionaL Annotation at Tufts (DFLAT) project aims to improve the quality of analyses of fetal gene expression and regulation by curating human fetal gene functions using both manual and semi-automated GO procedures. Eligible annotations are then contributed to the GO database and included in GO releases of human data. DFLAT has produced a considerable body of functional annotation that we demonstrate provides valuable information about developmental genomics. A collection of gene sets (genes implicated in the same function or biological process), made by combining existing GO annotations with the 13,344 new DFLAT annotations, is available for use in novel analyses. Gene set analyses of expression in several data sets, including amniotic fluid RNA from fetuses with trisomies 21 and 18, umbilical cord blood, and blood from newborns with bronchopulmonary dysplasia, were conducted both with and without the DFLAT annotation. Functional analysis of expression data using the DFLAT annotation increases the number of implicated gene sets, reflecting the DFLAT's improved representation of current knowledge. Blinded literature review supports the validity of newly significant findings obtained with the DFLAT annotations. Newly implicated significant gene sets also suggest specific hypotheses for future

  7. The tomato terpene synthase gene family

    NARCIS (Netherlands)

    Falara, V.; Akhtar, T.A.; Nguyen, T.T.H.; Spyropoulou, E.A.; Bleeker, P.M.; Schauvinhold, I.; Matsuba, Y.; Bonini, M.E.; Schilmiller, A.L.; Last, R.L.; Schuurink, R.C.; Pichersky, E.

    2011-01-01

    Compounds of the terpenoid class play many roles in the interactions of plants with their environment, such as attracting pollinators and defending the plant against pests. We show here that the genome of Solanum lycopersicum (cultivated tomato) contains 40 terpene synthase (TPS) genes, including 28

  8. Review: the dominant flocculation genes of Saccharomyces cerevisiae constitute a new subtelomeric gene family.

    Science.gov (United States)

    Teunissen, A W; Steensma, H Y

    1995-09-15

    The quality of brewing strains is, in large part, determined by their flocculation properties. By classical genetics, several dominant, semidominant and recessive flocculation genes have been recognized. Recent results of experiments to localize the flocculation genes FLO5 and FLO8, combined with the in silico analysis of the available sequence data of the yeast genome, have revealed that the flocculation genes belong to a family which comprises at least four genes and three pseudogenes. All members of this gene family are located near the end of chromosomes, just like the SUC, MEL and MAL genes, which are also important for good quality baking or brewing strains. Transcription of the flocculation genes is repressed by several regulatory genes. In addition, a number of genes have been found which cause cell aggregation upon disruption or overexpression in an as yet unknown manner. In total, 33 genes have been reported that are involved in flocculation or cell aggregation.

  9. Jannovar: a java library for exome annotation.

    Science.gov (United States)

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
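
    The interval-tree lookup mentioned in this abstract (finding all transcripts affected by a given variant) can be illustrated with a minimal Python sketch of a centered interval tree. This is not Jannovar's Java implementation; the class and function names, the closed-coordinate convention, and the transcript coordinates in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int, str]  # (start, end, transcript name), closed coordinates


@dataclass
class Node:
    center: int
    by_start: List[Interval]       # intervals spanning center, ascending start
    by_end: List[Interval]         # same intervals, descending end
    left: Optional["Node"] = None  # intervals entirely left of center
    right: Optional["Node"] = None # intervals entirely right of center


def build(intervals: List[Interval]) -> Optional[Node]:
    """Recursively build a centered interval tree."""
    if not intervals:
        return None
    points = sorted(p for s, e, _ in intervals for p in (s, e))
    center = points[len(points) // 2]  # an endpoint, so at least one interval spans it
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return Node(center,
                sorted(here, key=lambda iv: iv[0]),
                sorted(here, key=lambda iv: -iv[1]),
                build(left), build(right))


def query(node: Optional[Node], pos: int) -> List[str]:
    """Return names of all intervals that contain position pos."""
    hits: List[str] = []
    while node is not None:
        if pos < node.center:
            for iv in node.by_start:   # every interval here ends at or after center
                if iv[0] > pos:
                    break
                hits.append(iv[2])
            node = node.left
        elif pos > node.center:
            for iv in node.by_end:     # every interval here starts at or before center
                if iv[1] < pos:
                    break
                hits.append(iv[2])
            node = node.right
        else:
            hits.extend(iv[2] for iv in node.by_start)
            node = None                # subtrees cannot contain the center itself
    return hits


transcripts = [(100, 500, "tx1"), (300, 800, "tx2"), (900, 1200, "tx3")]
tree = build(transcripts)
```

    For these made-up transcripts, `query(tree, 400)` reports both `tx1` and `tx2`, while a position in the gap between `tx2` and `tx3`, such as 850, returns an empty list. Each query visits one root-to-leaf path plus the matching intervals, which is why this structure suits per-variant lookups over many transcripts.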

  10. Mitochondrial gene mutations and type 2 diabetes in Chinese families

    Institute of Scientific and Technical Information of China (English)

    LI Ming-zhen; YU De-min; YU Pei; LIU De-min; WANG Kun; TANG Xin-zhi

    2008-01-01

    Background: Numerous mitochondrial DNA mutations are significantly correlated with development of diabetes. This study investigated mitochondrial gene point mutations in patients with type 2 diabetes and their families. Methods: Unrelated patients with type 2 diabetes (n=826) were randomly recruited; unrelated and nondiabetic subjects (n=637) served as controls. The clinical and biochemical data of the participants were collected. Total genomic DNA was extracted from peripheral leucocytes. Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and cloning techniques were used to screen mitochondrial genes including np3316, np3394 and np3426 in the ND1 region and np3243 in the tRNALeu (UUR). Results: Mitochondrial gene point mutations (one or more) were found in 39 diabetics; the prevalence of mtDNA mutations (4.7%, 39/826) was higher than that in the controls (0.7%, 5/637). The identical mutation was found in 23 of 43 tested members from three pedigrees. Affected family members presented with variable clinical features ranging from normal glucose tolerance to impaired glucose tolerance (IGT) (n=2), impaired fasting glucose (IFG) (n=1) and type 2 diabetes (n=13), with 3 family members suffering from hearing loss. Conclusions: Type 2 diabetes in China is associated with several mitochondrial gene mutations. Aged patients with a family history of diabetes had a higher prevalence of mutation and various clinical pictures. Mitochondrial gene mutation might be one of the genetic factors contributing to diabetic familial clustering.
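
    The prevalence comparison in this abstract can be reproduced from the reported counts with a short Python sketch. The counts (39/826 and 5/637) come from the abstract; the odds ratio is an added illustrative summary that the abstract does not report, and the function names are hypothetical. Note that 5/637 works out to roughly 0.78%, which the abstract gives as 0.7%.

```python
def prevalence(carriers: int, total: int) -> float:
    """Fraction of a group carrying at least one screened mtDNA mutation."""
    return carriers / total


def odds_ratio(case_pos: int, case_total: int,
               ctrl_pos: int, ctrl_total: int) -> float:
    """Odds of carrying a mutation in cases relative to controls (2x2 table)."""
    case_neg = case_total - case_pos
    ctrl_neg = ctrl_total - ctrl_pos
    return (case_pos * ctrl_neg) / (case_neg * ctrl_pos)


# Counts as reported in the abstract.
diabetic_prev = prevalence(39, 826)
control_prev = prevalence(5, 637)
mutation_or = odds_ratio(39, 826, 5, 637)

print(f"diabetics: {diabetic_prev:.1%}, controls: {control_prev:.2%}, "
      f"odds ratio: {mutation_or:.2f}")
```

    An odds ratio computed this way carries no confidence interval; a formal comparison would need a test such as Fisher's exact test, which the abstract does not describe.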

  11. Evolution of the YABBY gene family in seed plants.

    Science.gov (United States)

    Finet, Cédric; Floyd, Sandra K; Conway, Stephanie J; Zhong, Bojian; Scutt, Charles P; Bowman, John L

    2016-01-01

    Members of the YABBY gene family of transcription factors in angiosperms have been shown to be involved in the initiation of outgrowth of the lamina, the maintenance of polarity, and establishment of the leaf margin. Although most of the dorsal-ventral polarity genes in seed plants have homologs in non-spermatophyte lineages, the presence of YABBY genes is restricted to seed plants. To gain insight into the origin and diversification of this gene family, we reconstructed the evolutionary history of YABBY gene lineages in seed plants. Our findings suggest that either one or two YABBY genes were present in the last common ancestor of extant seed plants. We also examined the expression of YABBY genes in the gymnosperms Ephedra distachya (Gnetales), Ginkgo biloba (Ginkgoales), and Pseudotsuga menziesii (Coniferales). Our data indicate that some YABBY genes are expressed in a polar (abaxial) manner in leaves and female cones in gymnosperms. We propose that YABBY genes already acted as polarity genes in the last common ancestor of extant seed plants. © 2016 Wiley Periodicals, Inc.

  12. Molecular evolution and functional divergence of the metallothionein gene family in vertebrates.

    Science.gov (United States)

    Serén, Nina; Glaberman, Scott; Carretero, Miguel A; Chiari, Ylenia

    2014-04-01

    The metallothionein (MT) gene superfamily consists of metal-binding proteins involved in various metal detoxification and storage mechanisms. The evolution of this gene family in vertebrates has mostly been studied in mammals using sparse taxon or gene sampling. Genomic databases and available data on MT protein function and expression allow a better understanding of the evolution and functional divergence of the different MT types. We recovered 77 MT coding sequences from 20 representative vertebrates with annotated complete genomes. We found multiple MT genes, also in reptiles, which were thought to have only one MT type. Phylogenetic and synteny analyses indicate the existence of a eutherian MT1 and MT2, a tetrapod MT3, an amniote MT4, and fish MT. The optimal gene-tree/species-tree reconciliation analyses identified the best root in the fish clade. Functional analyses reveal variation in hydropathic index among protein domains, likely correlated with their distinct flexibility and metal affinity. Analyses of functional divergence identified amino acid sites correlated with functional divergence among MT types. Uncovering the number of genes and sites possibly correlated with functional divergence will help to design cost-effective MT functional and gene expression studies. This will permit further understanding of the distinct roles and specificity of these proteins and to properly target specific MT for different types of functional studies. Therefore, this work presents a critical background on the molecular evolution and functional divergence of vertebrate MTs to carry out further detailed studies on the relationship between heavy metal metabolism and tolerances among vertebrates.

  13. Molecular Evolution of the Glycosyltransferase 6 Gene Family in Primates

    Science.gov (United States)

    Mendonça-Mattos, Patricia Jeanne de Souza; Harada, Maria Lúcia

    2016-01-01

    The glycosyltransferase 6 (GT6) gene family includes the ABO, Ggta1, iGb3S, and GBGT1 genes and three putative genes restricted to mammals (GT6m6, GTm6, and GT6m7), of which only the last is found in primates. GT6 genes may encode functional and nonfunctional proteins. The Ggta1 and GBGT1 genes, for instance, are pseudogenes in catarrhine primates, while the iGb3S gene is inactive only in human, bonobo, and chimpanzee. Even when inactivated, these genes tend to be conserved in primates. As some of the GT6 genes are related to susceptibility or resistance to parasites, we investigated (i) the selective pressure on the GT6 paralogous genes in primates; (ii) the basis of the conservation of iGb3S in human, chimpanzee, and bonobo; and (iii) the functional potential of GBGT1 and GT6m7 in catarrhines. We observed that purifying selection is prevalent and that these genes have low diversity, though the ABO and Ggta1 genes have some sites under positive selection. GT6m7, a putative gene associated with aggressive periodontitis, may have a regulatory function, but experimental studies are needed to assess its function. The evolutionary conservation of iGb3S in human, chimpanzee, and bonobo seems to be the result of proximity to genes with important biological functions. PMID:28044107

  14. Molecular Evolution of the Glycosyltransferase 6 Gene Family in Primates

    Directory of Open Access Journals (Sweden)

    Eliane Evanovich

    2016-01-01

    The glycosyltransferase 6 (GT6) gene family includes the ABO, Ggta1, iGb3S, and GBGT1 genes and three putative genes restricted to mammals (GT6m6, GTm6, and GT6m7), of which only the last is found in primates. GT6 genes may encode functional and nonfunctional proteins. The Ggta1 and GBGT1 genes, for instance, are pseudogenes in catarrhine primates, while the iGb3S gene is inactive only in human, bonobo, and chimpanzee. Even when inactivated, these genes tend to be conserved in primates. As some of the GT6 genes are related to susceptibility or resistance to parasites, we investigated (i) the selective pressure on the GT6 paralogous genes in primates; (ii) the basis of the conservation of iGb3S in human, chimpanzee, and bonobo; and (iii) the functional potential of GBGT1 and GT6m7 in catarrhines. We observed that purifying selection is prevalent and that these genes have low diversity, though the ABO and Ggta1 genes have some sites under positive selection. GT6m7, a putative gene associated with aggressive periodontitis, may have a regulatory function, but experimental studies are needed to assess its function. The evolutionary conservation of iGb3S in human, chimpanzee, and bonobo seems to be the result of proximity to genes with important biological functions.

  15. Identification and in silico analysis of the Citrus HSP70 molecular chaperone gene family

    Directory of Open Access Journals (Sweden)

    Luciano G. Fietto

    2007-01-01

    The completion of the genome sequencing of the Arabidopsis thaliana model system provided a powerful molecular tool for comparative analysis of gene families present in the genome of economically relevant plant species. In this investigation, we used the sequences of the Arabidopsis Hsp70 gene family to identify and annotate the Citrus Hsp70 genes represented in the CitEST database. Based on sequence comparison analysis, we identified 18 clusters that were further divided into 5 subgroups encoding four mitochondrial mtHsp70s, three plastid csHsp70s, one ER luminal Hsp70 BiP, two HSP110/SSE-related proteins and eight cytosolic Hsp/Hsc70s. We also analyzed the expression profile by digital Northern of each Hsp70 transcript in different organs and in response to stress conditions. The EST database revealed a distinct population distribution of Hsp70 ESTs among isoforms and across the organs surveyed. The Hsp70-5 isoform was highly expressed in seeds, whereas BiP, mitochondrial and plastid Hsp70 mRNAs displayed a similar expression profile in the organs analyzed, and were predominantly represented in flowers. Distinct Hsp70 mRNAs were also differentially expressed during Xylella infection and Citrus tristeza viral infection as well as during water deficit. This in silico study sets the groundwork for future investigations to fully characterize the Citrus Hsp70 family functionally and underscores the relevance of Hsp70s in response to abiotic and biotic stresses in Citrus.

  16. Functional annotation of rheumatoid arthritis and osteoarthritis associated genes by integrative genome-wide gene expression profiling analysis.

    Directory of Open Access Journals (Sweden)

    Zhan-Chun Li

    BACKGROUND: Rheumatoid arthritis (RA) and osteoarthritis (OA) are two major types of joint diseases that share multiple common symptoms. However, their pathological mechanisms remain largely unknown. The aim of our study is to identify RA- and OA-related genes and gain an insight into the underlying genetic basis of these diseases. METHODS: We collected 11 genome-wide expression profiling datasets from RA and OA cohorts and performed a meta-analysis to comprehensively investigate their expression signatures. This method can avoid some pitfalls of single dataset analyses. RESULTS AND CONCLUSION: We found that several biological pathways (i.e., the immunity, inflammation and apoptosis related pathways) are commonly involved in the development of both RA and OA, whereas several other pathways (i.e., vasopressin-related pathway, regulation of autophagy, endocytosis, calcium transport and endoplasmic reticulum stress related pathways) differ significantly between RA and OA. This study provides novel insights into the molecular mechanisms underlying these diseases, thereby aiding their diagnosis and treatment.

  17. Exploiting gene families for phylogenomic analysis of myzostomid transcriptome data.

    Directory of Open Access Journals (Sweden)

    Stefanie Hartmann

    Full Text Available BACKGROUND: In trying to understand the evolutionary relationships of organisms, the current flood of sequence data offers great opportunities, but also reveals new challenges with regard to data quality, the selection of data for subsequent analysis, and the automation of steps that were once done manually for single-gene analyses. Even though genome or transcriptome data is available for representatives of most bilaterian phyla, some enigmatic taxa still have an uncertain position in the animal tree of life. This is especially true for myzostomids, a group of symbiotic (or parasitic) protostomes that are either placed with annelids or flatworms. METHODOLOGY: Based on similarity criteria, Illumina-based transcriptome sequences of one myzostomid were compared to protein sequences of one additional myzostomid and 29 reference metazoa and clustered into gene families. These families were then used to investigate the phylogenetic position of Myzostomida using different approaches: Alignments of 989 sequence families were concatenated, and the resulting superalignment was analyzed under a Maximum Likelihood criterion. We also used all 1,878 gene trees with at least one myzostomid sequence for a supertree approach: the individual gene trees were computed and then reconciled into a species tree using gene tree parsimony. CONCLUSIONS: Superalignments require strictly orthologous genes, and both the gene selection and the widely varying amount of data available for different taxa in our dataset may cause anomalous placements and low bootstrap support. In contrast, gene tree parsimony is designed to accommodate multilocus gene families and therefore allows a much more comprehensive data set to be analyzed. Results of this supertree approach showed a well-resolved phylogeny, in which myzostomids were part of the annelid radiation, and major bilaterian taxa were found to be monophyletic.
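
    The concatenation step described above (per-family alignments joined into one superalignment) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `concatenate_alignments`, the gap-padding of taxa missing from a family, and the toy taxa and sequences are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_alignments(families: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-family alignments into one superalignment (hypothetical helper).

    Each family maps a taxon name to its aligned sequence. All sequences
    within a family must have the same length. A taxon that is absent from
    a family is padded with gap characters, so every output row has the
    same total length.
    """
    taxa = sorted({taxon for family in families for taxon in family})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for family in families:
        lengths = {len(seq) for seq in family.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a family must be equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(family.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy input (made-up sequences): two gene families with uneven taxon coverage.
toy_families = [
    {"myzostomid": "MKV", "annelid": "MRV"},
    {"annelid": "LLQA", "flatworm": "LIQA"},
]
superalignment = concatenate_alignments(toy_families)
```

    The padding makes the uneven data coverage visible as runs of gaps, which is the kind of imbalance the abstract names as a possible cause of anomalous placements.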

  18. Molecular evolution of PKD2 gene family in mammals.

    Science.gov (United States)

    Ye, Chun; Sun, Huan; Guo, Wenhu; Wei, Yuquan; Zhou, Qin

    2009-09-01

    The PKD2 gene encodes a critical cation channel protein that plays important roles in various developmental processes and is usually evolutionarily conserved. In the present study, we analyzed the evolutionary patterns of PKD2 and its homologous genes (PKD2L1, PKD2L2) from nine mammalian species. We demonstrated that the orthologs of the PKD2 gene family evolved under dominant purifying selection. Our results, in combination with reported evidence from functional studies, suggest that the entire PKD2 gene family is conserved and has performed essential biological roles during mammalian evolution. In rodents, PKD2 gene family members appear to have evolved more rapidly than in other mammalian lineages, probably as a result of relaxed purifying selection. However, positive selection imposed on synonymous sites also potentially contributed to this case. For the paralogs, our results imply that PKD2L2 genes evolved under weaker purifying selection than PKD2 and PKD2L1 genes. Interestingly, some loop regions of the transmembrane domain of PKD2L2 exhibited higher P(N)/P(S) ratios than expected, suggesting that these regions are more functionally divergent and worthy of special attention.

  19. Genetic variance in the adiponutrin gene family and childhood obesity.

    Directory of Open Access Journals (Sweden)

    Lovisa E Johansson

    Full Text Available AIM: The adiponutrin gene family consists of five genes (PNPLA1-5) coding for proteins with both lipolytic and lipogenic properties. PNPLA3 has previously been associated with adult obesity. Here we investigated the possible association between genetic variants in these genes and childhood and adolescent obesity. METHODS/RESULTS: Polymorphisms in the five genes of the adiponutrin gene family were selected and genotyped using the Sequenom platform in a childhood and adolescent obesity case-control study. Six variants in PNPLA1 showed association with obesity (rs9380559, rs12212459, rs1467912, rs4713951, rs10947600, and rs12199580, p0.05. When analyzing these SNPs in relation to phenotypes, two SNPs in the PNPLA3 gene showed association with insulin sensitivity (rs12483959: beta = -0.053, p = 0.016, and rs2072907: beta = -0.049, p = 0.024). No associations were seen for PNPLA2, PNPLA4, and PNPLA5. CONCLUSIONS: Genetic variation in the adiponutrin gene family does not seem to contribute strongly to obesity in children and adolescents. PNPLA1 exhibited a modest effect on obesity and PNPLA3 on insulin sensitivity. These data, however, require confirmation in other cohorts and ethnic groups.

  20. A Uniform System For The Annotation Of Human microRNA Genes And The Evolution Of The Human microRNAome

    Science.gov (United States)

    Fromm, Bastian; Billipp, Tyler; Peck, Liam E.; Johansen, Morten; Tarver, James E.; King, Benjamin L.; Newcomb, James M.; Sempere, Lorenzo F.; Flatmark, Kjersti; Hovig, Eivind; Peterson, Kevin J.

    2016-01-01

    Although microRNAs (miRNAs) are among the most intensively studied molecules of the past 20 years, determining what is and what is not a miRNA has not been straightforward. Here, we present a uniform system for the annotation and nomenclature of miRNA genes. We show that fewer than a third of the 1,881 human miRBase entries, and only approximately 16% of the 7,095 metazoan miRBase entries, are robustly supported as miRNA genes. Furthermore, we show that the human repertoire of miRNAs has been shaped by periods of intense miRNA innovation, and that mature gene products show a very different tempo and mode of sequence evolution than star products. We establish a new open access database -- MirGeneDB (http://mirgenedb.org) -- to catalog this set of robustly supported miRNAs, which complements the efforts of miRBase, but differs from it by annotating the mature versus star products, and by imposing an evolutionary hierarchy upon this curated and consistently named repertoire. PMID:26473382

  1. A Uniform System for the Annotation of Vertebrate microRNA Genes and the Evolution of the Human microRNAome.

    Science.gov (United States)

    Fromm, Bastian; Billipp, Tyler; Peck, Liam E; Johansen, Morten; Tarver, James E; King, Benjamin L; Newcomb, James M; Sempere, Lorenzo F; Flatmark, Kjersti; Hovig, Eivind; Peterson, Kevin J

    2015-01-01

    Although microRNAs (miRNAs) are among the most intensively studied molecules of the past 20 years, determining what is and what is not a miRNA has not been straightforward. Here, we present a uniform system for the annotation and nomenclature of miRNA genes. We show that less than a third of the 1,881 human miRBase entries, and only approximately 16% of the 7,095 metazoan miRBase entries, are robustly supported as miRNA genes. Furthermore, we show that the human repertoire of miRNAs has been shaped by periods of intense miRNA innovation and that mature gene products show a very different tempo and mode of sequence evolution than star products. We establish a new open access database--MirGeneDB ( http://mirgenedb.org )--to catalog this set of miRNAs, which complements the efforts of miRBase but differs from it by annotating the mature versus star products and by imposing an evolutionary hierarchy upon this curated and consistently named repertoire.

  2. Functional annotation of a full-length mouse cDNA collection

    Energy Technology Data Exchange (ETDEWEB)

    Kawai, J.; Shinagawa, A.; Shibata, K.; Yoshino, M.; Itoh, M.; Ishii, Y.; Arakawa, T.; Hara, A.; Fukunishi, Y.; Konno, H.; Adachi, J.; Fukuda, S.; Aizawa, K.; Izawa, M.; Nishi, K.; Kiyosawa, H.; Kondo, S.; Yamanaka, I.; Saito, T.; Okazaki, Y.; Gojobori, T.; Bono, H.; Kasukawa, T.; Saito, R.; Kadota, K.; Matsuda, H.; Ashburner, M.; Batalov, S.; Casavant, T.; Fleischmann, W.; Gaasterland, T.; Gissi, C.; King, B.; Kochiwa, H.; Kuehl, P.; Lewis, S.; Matsuo, Y.; Nikaido, I.; Pesole, G.; Quackenbush, J.; Schriml, L.M.; Staubli, F.; Suzuki, R.; Tomita, M.; Wagner, L.; Washio, T.; Sakai, K.; Okido, T.; Furuno, M.; Aono, H.; Baldarelli, R.; Barsh, G.; Blake, J.; Boffelli, D.; Bojunga, N.; Carninci, P.; de Bonaldo, M.F.; Brownstein, M.J.; Bult, C.; Fletcher, C.; Fujita, M.; Gariboldi, M.; Gustincich, S.; Hill, D.; Hofmann, M.; Hume, D.A.; Kamiya, M.; Lee, N.H.; Lyons, P.; Marchionni, L.; Mashima, J.; Mazzarelli, J.; Mombaerts, P.; Nordone, P.; Ring, B.; Ringwald, M.; Rodriguez, I.; Sakamoto, N.; Sasaki, H.; Sato, K.; Schonbach, C.; Seya, T.; Shibata, Y.; Storch, K.-F.; Suzuki, H.; Toyo-oka, K.; Wang, K.H.; Weitz, C.; Whittaker, C.; Wilming, L.; Wynshaw-Boris, A.; Yoshida, K.; Hasegawa, Y.; Kawaji, H.; Kohtsuki, S.; Hayashizaki, Y.; RIKEN Genome Exploration Research Group Phase II T; FANTOM Consortium

    2001-01-01

    The RIKEN Mouse Gene Encyclopedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analyzed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

  3. NCBI prokaryotic genome annotation pipeline.

    Science.gov (United States)

    Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

    2016-08-19

    Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, and more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

  4. Comprehensive analysis of CCCH-type zinc finger gene family in citrus (Clementine mandarin) by genome-wide characterization.

    Science.gov (United States)

    Liu, Shengrui; Khan, Muhammad Rehman Gul; Li, Yongping; Zhang, Jinzhi; Hu, Chungen

    2014-10-01

    The CCCH-type zinc finger proteins comprise a large gene family of regulatory proteins and are widely distributed in eukaryotic organisms. The CCCH proteins have been implicated in multiple biological processes and environmental responses in plants. Little information is available, however, about CCCH genes in plants, especially in woody plants such as citrus. The release of the whole-genome sequence of citrus allowed us to perform a genome-wide analysis of CCCH genes and to compare the identified proteins with their orthologs in model plants. In this study, 62 CCCH genes and a total of 132 CCCH motifs were identified, and a comprehensive analysis including the chromosomal locations, phylogenetic relationships, functional annotations, gene structures and conserved motifs was performed. Distribution mapping revealed that 54 of the 62 CCCH genes are unevenly dispersed on the nine citrus chromosomes. Based on phylogenetic analysis and gene structural features, we constructed 5 subfamilies of 62 CCCH members and integrative subfamilies from citrus, Arabidopsis, and rice, respectively. Importantly, large numbers of SNPs and InDels in 26 CCCH genes were identified from Poncirus trifoliata and Fortunella japonica using whole-genome deep re-sequencing. Furthermore, citrus CCCH genes showed distinct temporal and spatial expression patterns in different developmental processes and in response to various stress conditions. Our comprehensive analysis of CleC3Hs is a valuable resource that further elucidates the roles of CCCH family members in plant growth and development. In addition, variants and comparative genomics analyses deepen our understanding of the evolution of the CCCH gene family and will contribute to further genetics and genomics studies of citrus and other plant species.

  5. RiceDB: A Web-Based Integrated Database for Annotating Rice Microarray

    Institute of Scientific and Technical Information of China (English)

    HE Fei; SHI Qing-yun; CHEN Ming; WU Ping

    2007-01-01

    RiceDB, a web-based integrated database for annotating rice microarray data in various biological contexts, was developed. It is composed of eight modules. The RiceMap module archives the mapping of Affymetrix probe sets to different rice databases and annotates the genes represented by a microarray set by retrieving annotation information via the identifier or accession number used in each database; the RiceGO module indicates the association between a microarray set and gene ontology (GO) categories; the RiceKO module annotates a microarray set based on KEGG biochemical pathways; the RiceDO module provides the domain information associated with a microarray set; the RiceUP module retrieves promoter sequences for all genes represented by a microarray set; the RiceMR module lists potential microRNAs that regulate the genes represented by a microarray set; and RiceCD and RiceGF annotate the genes represented by a microarray set in the context of chromosome distribution and rice paralogous family distribution. The results of automatic annotation are mostly consistent with manual annotation. RiceDB speeds up biological interpretation of microarray data.

  6. Runx Family Genes in Tissue Stem Cell Dynamics.

    Science.gov (United States)

    Wang, Chelsia Qiuxia; Mok, Michelle Meng Huang; Yokomizo, Tomomasa; Tergaonkar, Vinay; Osato, Motomi

    2017-01-01

    The Runx family genes play important roles in development and cancer, largely via their regulation of tissue stem cell behavior. Their involvement in two organs, blood and skin, is well documented. This review summarizes currently known Runx functions in the stem cells of these tissues. The fundamental core mechanism(s) mediated by Runx proteins has been sought; however, it appears that there does not exist one single common machinery that governs both tissue stem cells. Instead, Runx family genes employ multiple spatiotemporal mechanisms in regulating individual tissue stem cell populations. Such specific Runx requirements have been unveiled by a series of cell type-, developmental stage- or age-specific gene targeting studies in mice. Observations from these experiments revealed that the regulation of stem cells by Runx family genes turned out to be far more complex than previously thought. For instance, although it has been reported that Runx1 is required for the endothelial-to-hematopoietic cell transition (EHT) but not thereafter, recent studies clearly demonstrated that Runx1 is also needed during the period subsequent to EHT, namely at perinatal stage. In addition, Runx1 ablation in the embryonic skin mesenchyme eventually leads to complete loss of hair follicle stem cells (HFSCs) in the adult epithelium, suggesting that Runx1 facilitates the specification of skin epithelial stem cells in a cell extrinsic manner. Further in-depth investigation into how Runx family genes are involved in stem cell regulation is warranted.

  7. Expansion of transducin subunit gene families in early vertebrate tetraploidizations.

    Science.gov (United States)

    Lagman, David; Sundström, Görel; Ocampo Daza, Daniel; Abalo, Xesús M; Larhammar, Dan

    2012-10-01

    Hundreds of gene families expanded in the early vertebrate tetraploidizations including many gene families in the phototransduction cascade. We have investigated the evolution of the heterotrimeric G-proteins of photoreceptors, the transducins, in relation to these events using both phylogenetic analyses and synteny comparisons. Three alpha subunit genes were identified in amniotes and the coelacanth, GNAT1-3; two of these were identified in amphibians and teleost fish, GNAT1 and GNAT2. Most tetrapods have four beta genes, GNB1-4, and teleosts have additional duplicates. Finally, three gamma genes were identified in mammals, GNGT1, GNG11 and GNGT2. Of these, GNGT1 and GNGT2 were found in the other vertebrates. In frog and zebrafish additional duplicates of GNGT2 were identified. Our analyses show all three transducin families expanded during the early vertebrate tetraploidizations and the beta and gamma families gained additional copies in the teleost-specific genome duplication. This suggests that the tetraploidizations contributed to visual specialisations.

  8. Diverse Roles of ERECTA Family Genes in Plant Development

    Institute of Scientific and Technical Information of China (English)

    Elena D.Shpak

    2013-01-01

    Multiple receptor-like kinases (RLKs) enable intercellular communication that coordinates growth and development of plant tissues. ERECTA family receptors (ERfs) are an ancient family of leucine-rich repeat RLKs that in Arabidopsis consists of three genes: ERECTA, ERL1, and ERL2. ERfs sense secreted cysteine-rich peptides from the EPF/EPFL family and transmit the signal through a MAP kinase cascade. This review discusses the functions of ERfs in stomata development, in regulation of longitudinal growth of aboveground organs, during reproductive development, and in the shoot apical meristem. In addition the role of ERECTA in plant responses to biotic and abiotic factors is examined.

  9. The maize PIN gene family of auxin transporters

    Directory of Open Access Journals (Sweden)

    Cristian Forestan

    2012-02-01

    Full Text Available Auxin is a key regulator of plant development and its differential distribution in plant tissues, established by a polar cell-to-cell transport, can trigger a wide range of developmental processes. A few members of the two families of auxin efflux transport proteins, PIN-formed (PIN) and P-glycoprotein (ABCB/PGP), have so far been characterized in maize. Nine new Zea mays auxin efflux carriers PIN family members and two maize PIN-like genes have now been identified. Four members of the PIN1 cluster (named ZmPIN1a–d), one gene homologous to AtPIN2 (ZmPIN2), three orthologs of PIN5 (ZmPIN5a–c), one gene paired with AtPIN8 (ZmPIN8), and three monocot-specific PINs (ZmPIN9, ZmPIN10a and b) were cloned, and the phylogenetic relationships between the PIN proteins of early land plants, monocots and eudicots were investigated, including the new maize PIN proteins. Tissue-specific expression patterns of the twelve maize PIN genes, two PIN-like genes and ZmABCB1, an ABCB auxin efflux carrier, were analyzed using semi-quantitative RT–PCR. ZmPIN gene transcripts have overlapping expression domains in the root apex, during male and female inflorescence differentiation and kernel development. However, some PIN family members have specific tissue localization: ZmPIN1d transcript marks the L1 layer of the SAM and IM during the flowering transition and the monocot-specific ZmPIN9 is expressed in the root endodermis and pericycle. The phylogenetic and gene structure analyses together with the expression pattern of the ZmPIN gene family indicate that subfunctionalization of some maize PINs can be associated with the differentiation and development of monocot-specific organs and tissues and might have occurred after the divergence between dicots and monocots.

  10. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.;

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced...

  11. Genome-Wide Comparative Analysis of Chemosensory Gene Families in Five Tsetse Fly Species.

    Directory of Open Access Journals (Sweden)

    Rosaline Macharia

    2016-02-01

    Full Text Available For decades, odour-baited traps have been used for control of tsetse flies (Diptera; Glossinidae), vectors of African trypanosomes. However, differential responses to known attractants have been reported in different Glossina species, hindering establishment of a universal vector control tool. Availability of full genome sequences of five Glossina species offers an opportunity to compare their chemosensory repertoire and enhance our understanding of their biology in relation to chemosensation. Here, we identified and annotated the major chemosensory gene families in Glossina. We identified a total of 118, 115, 124, and 123 chemosensory genes in Glossina austeni, G. brevipalpis, G. f. fuscipes, and G. pallidipes, respectively, relative to 127 reported in G. m. morsitans. Our results show that tsetse fly genomes have fewer chemosensory genes when compared to other dipterans such as Musca domestica (n>393), Drosophila melanogaster (n = 246) and Anopheles gambiae (n>247). We also found that Glossina chemosensory genes are dispersed across distantly located scaffolds in their respective genomes, in contrast to other insects like D. melanogaster whose genes occur in clusters. Further, Glossina appears to be devoid of sugar receptors and to have expanded CO2 associated receptors, potentially reflecting Glossina's obligate hematophagy and the need to detect hosts that may be out of sight. We also identified, in all species, homologs of Ir84a, a Drosophila-specific ionotropic receptor that promotes male courtship, suggesting that this is a conserved trait in tsetse flies. Notably, our selection analysis revealed that a total of four gene loci (Gr21a, GluRIIA, Gr28b, and Obp83a) were under positive selection, which confers fitness advantage to species. These findings provide a platform for studies to further define the language of communication of tsetse with their environment, and influence development of novel approaches for control.

  12. Regulatory patterns of a large family of defensin-like genes expressed in nodules of Medicago truncatula.

    Directory of Open Access Journals (Sweden)

    Sumitha Nallu

    Full Text Available Root nodules are the symbiotic organ of legumes that house nitrogen-fixing bacteria. Many genes are specifically induced in nodules during the interactions between the host plant and symbiotic rhizobia. Information regarding the regulation of expression for most of these genes is lacking. One of the largest gene families expressed in the nodules of the model legume Medicago truncatula is the nodule cysteine-rich (NCR) group of defensin-like (DEFL) genes. We used a custom Affymetrix microarray to catalog the expression changes of 566 NCRs at different stages of nodule development. Additionally, bacterial mutants were used to understand the importance of the rhizobial partners in induction of NCRs. Expression of early NCRs was detected during the initial infection of rhizobia in nodules and expression continued as nodules became mature. Late NCRs were induced concomitantly with bacteroid development in the nodules. The induction of early and late NCRs was correlated with the number and morphology of rhizobia in the nodule. Conserved 41 to 50 bp motifs identified in the upstream 1,000 bp promoter regions of NCRs were required for promoter activity. These cis-element motifs were found to be unique to the NCR family among all annotated genes in the M. truncatula genome, although they contain sub-regions with clear similarity to known regulatory motifs involved in nodule-specific expression and temporal gene regulation.

  13. Snf2 family gene distribution in higher plant genomes reveals DRD1 expansion and diversification in the tomato genome.

    Directory of Open Access Journals (Sweden)

    Joachim W Bargsten

    Full Text Available As part of large protein complexes, Snf2 family ATPases are responsible for energy supply during chromatin remodeling, but the precise mechanism of action of many of these proteins is largely unknown. They influence many processes in plants, such as the response to environmental stress. This analysis is the first comprehensive study of Snf2 family ATPases in plants. We here present a comparative analysis of 1159 candidate plant Snf2 genes in 33 complete and annotated plant genomes, including two green algae. The number of Snf2 ATPases shows considerable variation across plant genomes (17-63 genes). The DRD1, Rad5/16 and Snf2 subfamily members occur most often. Detailed analysis of the plant-specific DRD1 subfamily in related plant genomes shows the occurrence of a complex series of evolutionary events. Notably, tomato carries unexpected gene expansions of DRD1 gene members. Most of these genes are expressed in tomato, although at low levels and with distinct tissue or organ specificity. In contrast, the Snf2 subfamily genes tend to be expressed constitutively in tomato. The results underpin and extend the Snf2 subfamily classification, which could help to determine the various functional roles of Snf2 ATPases and to target environmental stress tolerance and yield in future breeding.

  14. Growth Rate of and Gene Expression in Bradyrhizobium diazoefficiens USDA110 due to a Mutation in blr7984, a TetR Family Transcriptional Regulator Gene.

    Science.gov (United States)

    Ohkama-Ohtsu, Naoko; Honma, Haruna; Nakagome, Mariko; Nagata, Maki; Yamaya-Ito, Hiroko; Sano, Yoshiaki; Hiraoka, Norina; Ikemi, Takaaki; Suzuki, Akihiro; Okazaki, Shin; Minamisawa, Kiwamu; Yokoyama, Tadashi

    2016-09-29

    Previous transcriptome analyses have suggested that a gene cluster including a transcriptional regulator (blr7984) of the tetracycline repressor family was markedly down-regulated in symbiosis. Since blr7984 is annotated to be the transcriptional repressor, we hypothesized that it is involved in the repression of genes in the genomic cluster including blr7984 in symbiotic bacteroids. In order to examine the function and involvement of the blr7984 gene in differentiation into bacteroids, we compared the free-living growth/symbiotic phenotype and gene expression between a blr7984-knockout mutant and the wild-type strain of Bradyrhizobium diazoefficiens USDA110. The mutant transiently increased the cell growth rate under free-living conditions and nodule numbers over those with the wild-type strain USDA110. The expression of three genes adjacent to the disrupted blr7984 gene was strongly up-regulated in the mutant in free-living and symbiotic cells. The mutant also induced the expression of genes for glutathione S-transferase, cytochrome c oxidases, ABC transporters, PTS sugar transport systems, and flagella synthesis under free-living conditions. bll7983 encoding glutathione S-transferase was up-regulated the most by the blr7984 disruption. Since redox regulation by glutathione is known to be involved in cell division in prokaryotes and eukaryotes, the strong expression of glutathione S-transferase encoded by the bll7983 gene may have caused redox changes in mutant cells, which resulted in higher rates of cell division.

  15. Sucrose metabolism gene families and their biological functions.

    Science.gov (United States)

    Jiang, Shu-Ye; Chi, Yun-Hua; Wang, Ji-Zhou; Zhou, Jun-Xia; Cheng, Yan-Song; Zhang, Bao-Lan; Ma, Ali; Vanitha, Jeevanandam; Ramachandran, Srinivasan

    2015-11-30

    Sucrose, as the main product of photosynthesis, plays crucial roles in plant development. Although the general metabolic pathway is well documented, less information is available on the genome-wide identification of these genes, their expansion and evolutionary history, and their biological functions. We focused on four sucrose metabolism related gene families: sucrose synthase, sucrose phosphate synthase, sucrose phosphate phosphatase and UDP-glucose pyrophosphorylase. These gene families exhibited different expansion and evolutionary histories, as their host genomes experienced different rates of whole genome duplication, tandem and segmental duplication, or mobile element mediated gene gain and loss. They were evolutionarily conserved under purifying selection among species, and expression divergence played important roles in gene survival after expansion. However, we detected recent positive selection during intra-species divergence. Overexpression of 15 sorghum genes in Arabidopsis revealed their roles in biomass accumulation, flowering time control, seed germination and response to high salinity and sugar stresses. Our studies uncovered the molecular mechanisms of gene expansion and evolution and also provided new insight into the role of positive selection in intra-species divergence. Overexpression data revealed novel biological functions of these genes in flowering time control and seed germination under normal and stress conditions.

  16. Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function.

    Science.gov (United States)

    Busk, P K; Pilgaard, B; Lezyk, M J; Meyer, A S; Lange, L

    2017-04-12

    Carbohydrate-active enzymes are found in all organisms and participate in key biological processes. These enzymes are classified in 274 families in the CAZy database but the sequence diversity within each family makes it a major task to identify new family members and to provide basis for prediction of enzyme function. A fast and reliable method for de novo annotation of genes encoding carbohydrate-active enzymes is to identify conserved peptides in the curated enzyme families followed by matching of the conserved peptides to the sequence of interest as demonstrated for the glycosyl hydrolase and the lytic polysaccharide monooxygenase families. This approach not only assigns the enzymes to families but also provides functional prediction of the enzymes with high accuracy. We identified conserved peptides for all enzyme families in the CAZy database with Peptide Pattern Recognition. The conserved peptides were matched to protein sequence for de novo annotation and functional prediction of carbohydrate-active enzymes with the Hotpep method. Annotation of protein sequences from 12 bacterial and 16 fungal genomes to families with Hotpep had an accuracy of 0.84 (measured as F1-score) compared to semiautomatic annotation by the CAZy database whereas the dbCAN HMM-based method had an accuracy of 0.77 with optimized parameters. Furthermore, Hotpep provided a functional prediction with 86% accuracy for the annotated genes. Hotpep is available as a stand-alone application for MS Windows. Hotpep is a state-of-the-art method for automatic annotation and functional prediction of carbohydrate-active enzymes.
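
    The accuracy figures above are reported as F1-scores. As a reminder of what that metric measures, here is a minimal Python sketch that scores predicted family assignments against a reference annotation. The function name `f1_score`, the representation of an annotation as (protein, family) pairs, and the protein identifiers and predicted labels are illustrative assumptions and do not reflect how Hotpep or dbCAN were actually evaluated.

```python
from typing import Set, Tuple

Assignment = Tuple[str, str]  # (protein identifier, family label)


def f1_score(predicted: Set[Assignment], reference: Set[Assignment]) -> float:
    """Harmonic mean of precision and recall over (protein, family) pairs."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are right
    recall = true_positives / len(reference)     # share of the reference recovered
    return 2 * precision * recall / (precision + recall)


# Made-up example: p1..p5 are hypothetical protein identifiers.
reference = {("p1", "GH5"), ("p2", "GH13"), ("p3", "AA9"), ("p4", "GH5")}
predicted = {("p1", "GH5"), ("p2", "GH13"), ("p3", "GH61"), ("p5", "GH5")}
score = f1_score(predicted, reference)  # 2 of 4 predictions correct, 2 of 4 reference pairs found
```

    Because F1 combines precision and recall, a method is penalised both for assigning proteins to the wrong family and for missing proteins that the reference annotation contains.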

  17. Genome-wide analysis of homeobox gene family in legumes: identification, gene duplication and expression profiling.

    Science.gov (United States)

    Bhattacharjee, Annapurna; Ghangal, Rajesh; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Homeobox genes encode transcription factors that are known to play a major role in different aspects of plant growth and development. In the present study, we identified homeobox genes belonging to 14 different classes in five legume species: chickpea, soybean, Medicago, Lotus and pigeonpea. The characteristic differences within homeodomain sequences among the various classes of the homeobox gene family were quite evident. Genome-wide expression analysis using publicly available datasets (RNA-seq and microarray) indicated that homeobox genes are differentially expressed in various tissues and developmental stages and under stress conditions in different legumes. We validated the differential expression of selected chickpea homeobox genes via quantitative reverse transcription polymerase chain reaction. Genome duplication analysis in soybean indicated that segmental duplication has contributed significantly to the expansion of the homeobox gene family. The Ka/Ks ratio of duplicated homeobox genes in soybean showed that several members of this family have undergone purifying selection. Moreover, expression profiling indicated that duplicated genes might have been retained due to sub-functionalization. The genome-wide identification and comprehensive gene expression profiling of homeobox gene family members in legumes will provide opportunities for functional analysis to unravel their exact role in plant growth and development.
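
    The Ka/Ks ratio mentioned above compares the nonsynonymous substitution rate (Ka) with the synonymous rate (Ks); a ratio well below 1 is conventionally read as purifying selection and a ratio above 1 as positive selection. The sketch below shows that reading only. The function name, the tolerance band around 1 and the example values are assumptions for illustration, not the authors' procedure.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a Ka/Ks ratio for one duplicated gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"   # nonsynonymous changes are being removed
    if ratio > 1 + tolerance:
        return "positive"    # nonsynonymous changes are being favoured
    return "neutral"         # ratio is close to 1


# Hypothetical Ka and Ks values for a duplicated pair.
print(classify_selection(0.02, 0.20))
```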

  18. Characterization of the MSMEG_2631 Gene (mmp) Encoding a Multidrug and Toxic Compound Extrusion (MATE) Family Protein in Mycobacterium smegmatis and Exploration of Its Polyspecific Nature Using Biolog Phenotype MicroArray

    OpenAIRE

    Mishra, Mukti Nath; Daniels, Lacy

    2013-01-01

    In Mycobacterium, multidrug efflux pumps can be associated with intrinsic drug resistance. Comparison of putative mycobacterial transport genes revealed a single annotated open reading frame (ORF) for a multidrug and toxic compound extrusion (MATE) family efflux pump in all sequenced mycobacteria except Mycobacterium leprae. Since MATE efflux pumps function as multidrug efflux pumps by conferring resistance to structurally diverse antibiotics and DNA-damaging chemicals, we studied this gene (...

  19. Chromosomal evolution of the PKD1 gene family in primates

    Directory of Open Access Journals (Sweden)

    Krawczak Michael

    2009-01-01

    Correction to Kirsch S, Pasantes J, Wolf A, Bogdanova N, Münch C, Pennekamp P, Krawczak M, Dworniczak B, Schempp W: Chromosomal evolution of the PKD1 gene family in primates. BMC Evolutionary Biology 2008, 8:263 (doi:10.1186/1471-2148-8-263).

  20. Phylogenetic molecular function annotation

    Science.gov (United States)

    Engelhardt, Barbara E.; Jordan, Michael I.; Repo, Susanna T.; Brenner, Steven E.

    2009-07-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic approach for predicting molecular function (sometimes called "phylogenomics") is an effective means to predict protein molecular function. These methods incorporate functional evidence from all members of a family that have functional characterizations using the evolutionary history of the protein family to make robust predictions for the uncharacterized proteins. However, they are often difficult to apply on a genome-wide scale because of the time-consuming step of reconstructing the phylogenies of each protein to be annotated. Our automated approach for function annotation using phylogeny, the SIFTER (Statistical Inference of Function Through Evolutionary Relationships) methodology, uses a statistical graphical model to compute the probabilities of molecular functions for unannotated proteins. Our benchmark tests showed that SIFTER provides accurate functional predictions on various protein families, outperforming other available methods.

  1. Origin and evolution of laminin gene family diversity.

    Science.gov (United States)

    Fahey, Bryony; Degnan, Bernard M

    2012-07-01

    Laminins are a family of multidomain glycoproteins that are important contributors to the structure of metazoan extracellular matrices. To investigate the origin and evolution of the laminin family, we characterized the full complement of laminin-related genes in the genome of the sponge, Amphimedon queenslandica. As a representative of the Demospongiae, a group consistently placed within the earliest diverging branch of animals by molecular phylogenies, Amphimedon is uniquely placed to provide insight into early steps in the evolution of metazoan gene families. Five Amphimedon laminin-related genes possess the conserved molecular features, and most of the domains found in bilaterian laminins, but all display domain architectures distinct from those of the canonical laminin chain types known from model bilaterians. This finding prompted us to perform a comparative genomic analysis of laminins and related genes from a choanoflagellate and diverse metazoans and to conduct phylogenetic analyses using the conserved Laminin N-terminal domain in order to explore the relationships between genes with distinct architectures. Laminin-like genes appear to have originated in the holozoan lineage (choanoflagellates + metazoans + several other unicellular opisthokont taxa), with several laminin domains originating later and appearing only in metazoan (animal) or eumetazoan (placozoans + ctenophores + cnidarians + bilaterians) laminins. Typical bilaterian α, β, and γ laminin chain forms arose in the eumetazoan stem and another chain type that is conserved in Amphimedon, the cnidarian, Nematostella vectensis, and the echinoderm, Strongylocentrotus purpuratus, appears to have been lost independently from the placozoan, Trichoplax adhaerens, and from multiple bilaterians. Phylogenetic analysis did not clearly reconstruct relationships between the distinct laminin chain types (with the exception of the α chains) but did reveal how several members of the netrin family were

  2. De novo cloning and annotation of genes associated with immunity, detoxification and energy metabolism from the fat body of the oriental fruit fly, Bactrocera dorsalis.

    Science.gov (United States)

    Yang, Wen-Jia; Yuan, Guo-Rui; Cong, Lin; Xie, Yi-Fei; Wang, Jin-Jun

    2014-01-01

    The oriental fruit fly, Bactrocera dorsalis, is a destructive pest in tropical and subtropical areas. In this study, we performed transcriptome-wide analysis of the fat body of B. dorsalis and obtained more than 59 million sequencing reads, which were assembled into 27,787 unigenes with an average length of 591 bp. Among them, 17,442 (62.8%) unigenes matched known proteins in the NCBI database. The assembled sequences were further annotated with gene ontology, cluster of orthologous group terms, and Kyoto encyclopedia of genes and genomes. In-depth analysis was performed to identify genes putatively involved in immunity, detoxification, and energy metabolism. Many new genes were identified, including serpins, peptidoglycan recognition proteins and defensins, which were potentially linked to immune defense. Many detoxification genes were identified, including cytochrome P450s, glutathione S-transferases and ATP-binding cassette (ABC) transporters. Many new transcripts possibly involved in energy metabolism, including fatty acid desaturases, lipases, alpha amylases, and trehalose-6-phosphate synthases, were identified. Moreover, we randomly selected some genes to examine their expression patterns in different tissues by quantitative real-time PCR, which indicated that some genes exhibited fat body-specific expression in B. dorsalis. The identification of numerous transcripts in the fat body of B. dorsalis lays the foundation for future studies on the functions of these genes.

  3. De novo cloning and annotation of genes associated with immunity, detoxification and energy metabolism from the fat body of the oriental fruit fly, Bactrocera dorsalis.

    Directory of Open Access Journals (Sweden)

    Wen-Jia Yang

    The oriental fruit fly, Bactrocera dorsalis, is a destructive pest in tropical and subtropical areas. In this study, we performed transcriptome-wide analysis of the fat body of B. dorsalis and obtained more than 59 million sequencing reads, which were assembled into 27,787 unigenes with an average length of 591 bp. Among them, 17,442 (62.8%) unigenes matched known proteins in the NCBI database. The assembled sequences were further annotated with gene ontology, cluster of orthologous group terms, and Kyoto encyclopedia of genes and genomes. In-depth analysis was performed to identify genes putatively involved in immunity, detoxification, and energy metabolism. Many new genes were identified, including serpins, peptidoglycan recognition proteins and defensins, which were potentially linked to immune defense. Many detoxification genes were identified, including cytochrome P450s, glutathione S-transferases and ATP-binding cassette (ABC) transporters. Many new transcripts possibly involved in energy metabolism, including fatty acid desaturases, lipases, alpha amylases, and trehalose-6-phosphate synthases, were identified. Moreover, we randomly selected some genes to examine their expression patterns in different tissues by quantitative real-time PCR, which indicated that some genes exhibited fat body-specific expression in B. dorsalis. The identification of numerous transcripts in the fat body of B. dorsalis lays the foundation for future studies on the functions of these genes.

  4. GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations.

    Science.gov (United States)

    Sangrador-Vegas, Amaia; Mitchell, Alex L; Chang, Hsin-Yu; Yong, Siew-Yit; Finn, Robert D

    2016-01-01

    The removal of annotation from biological databases is often perceived as an indicator of erroneous annotation. As a corollary, annotation stability is considered to be a measure of reliability. However, diverse data-driven events can affect the stability of annotations in both primary protein sequence databases and the protein family databases that are built upon the sequence databases and used to help annotate them. Here, we describe some of these events and their consequences for the InterPro database, and demonstrate that annotation removal or reassignment is not always linked to incorrect annotation by the curator. Database URL: http://www.ebi.ac.uk/interpro.

  5. Gene Expression Divergence and Evolutionary Analysis of the Drosomycin Gene Family in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Deng

    2009-01-01

    Drosomycin (Drs), encoding an inducible 44-residue antifungal peptide, is clustered with six additional genes, Dro1, Dro2, Dro3, Dro4, Dro5, and Dro6, forming a multigene family on the 3L chromosome arm in Drosophila melanogaster. To gain further insight into the regulation of each member of the drosomycin gene family, we investigated the gene expression patterns of this family after either microbe-free injury or microbial challenge using real-time RT-PCR. The results indicated that among the seven drosomycin genes, Drs, Dro2, Dro3, Dro4, and Dro5 showed constitutive expression. Three of these five, Dro2, Dro3, and Dro5, could be upregulated by simple injury. Interestingly, Drs is the only gene strongly upregulated when Drosophila was infected with microbes. In contrast to these five genes, Dro1 and Dro6 were not transcribed at all in either noninfected or infected flies. Furthermore, by 5′ rapid amplification of cDNA ends, two transcription start sites were identified in Drs and Dro2, and one in Dro3, Dro4, and Dro5. In addition, NF-κB binding sites were found in the promoter regions of Drs, Dro2, Dro3, and Dro5, indicating the importance of NF-κB binding sites for the inducibility of drosomycin genes. Based on analyses of the flanking sequences of each gene in D. melanogaster and the phylogenetic relationships of drosomycins in the D. melanogaster species group, we concluded that gene duplications were involved in the formation of the drosomycin gene family. The possible evolutionary fates of drosomycin genes are discussed on the basis of a combined analysis of the expression patterns, gene structure, and functional divergence of these genes.

  6. Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis

    OpenAIRE

    Ayele, Mulu; Haas, Brian J.; Kumar, Nikhil; Wu, Hank; Xiao, Yongli; Van Aken, Susan; Utterback, Teresa R.; Wortman, Jennifer R.; White, Owen R.; Town, Christopher D.

    2005-01-01

    Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering 283 Mb (0.44×) of the estimated 650 Mb Brassica genome were searched against the Arabidopsis genome, and conserved Arabidopsis genome sequences (CAGSs) were identified. Of these ...
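
    The 0.44× figure above follows directly from the reported totals. A small arithmetic check, assuming 1 Mb = 10^6 bp; the function names are illustrative, and the mean read length is derived here and is not stated in the abstract.

```python
def fold_coverage(sequenced_mb: float, genome_mb: float) -> float:
    """Sequenced bases divided by estimated genome size."""
    return sequenced_mb / genome_mb


def mean_read_length_bp(sequenced_mb: float, read_count: int) -> float:
    """Average read length implied by the totals (1 Mb taken as 1e6 bp)."""
    return sequenced_mb * 1_000_000 / read_count


# Figures quoted in the abstract: 283 Mb in 454,274 reads, 650 Mb genome.
print(round(fold_coverage(283, 650), 2))
print(round(mean_read_length_bp(283, 454_274)))
```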

  7. Evolution of the MAGUK protein gene family in premetazoan lineages

    Directory of Open Access Journals (Sweden)

    Ruiz-Trillo Iñaki

    2010-04-01

    Background: Cell-to-cell communication is a key process in multicellular organisms. In multicellular animals, scaffolding proteins belonging to the family of membrane-associated guanylate kinases (MAGUK) are involved in the regulation and formation of cell junctions. These MAGUK proteins were believed to be exclusive to Metazoa. However, a MAGUK gene was recently identified in an EST survey of Capsaspora owczarzaki, a unicellular organism that branches off near the metazoan clade. To further investigate the evolutionary history of MAGUK, we have undertaken a broader search for this gene family using available genomic sequences of different opisthokont taxa. Results: Our survey and phylogenetic analyses show that MAGUK proteins are present not only in Metazoa, but also in the choanoflagellate Monosiga brevicollis and in the protist Capsaspora owczarzaki. However, MAGUKs are absent from fungi, amoebozoans or any other eukaryote. The repertoire of MAGUKs in Placozoa and eumetazoan taxa (Cnidaria + Bilateria) is quite similar, except for one class that is missing in Trichoplax, while Porifera have a simpler MAGUK repertoire. However, Vertebrata have undergone several independent duplications and exhibit two exclusive MAGUK classes. Three different MAGUK types are found in both M. brevicollis and C. owczarzaki: DLG, MPP and MAGI. Furthermore, M. brevicollis has undergone a lineage-specific diversification. Conclusions: The diversification of the MAGUK protein gene family occurred, most probably, prior to the divergence between Metazoa+choanoflagellates and the Capsaspora+Ministeria clade. MAGI-like, DLG-like, and MPP-like ancestral genes were already present in the unicellular ancestor of Metazoa, and new gene members have been incorporated through metazoan evolution within two major periods, one before the sponge-eumetazoan split and another within the vertebrate lineage. Moreover, choanoflagellates have undergone an independent MAGUK

  8. Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo

    Directory of Open Access Journals (Sweden)

    Waterston Robert H

    2010-02-01

    Background: Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time-lapse microscopy. In this protocol, a program called StarryNite performs the automatic recognition of fluorescently labeled cells and traces their lineage. However, because of the amount of noise present in the data and the challenges introduced by the increasing number of cells in later stages of development, this program is not error free. In the current version, the error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which was developed specifically for this task. For a single experiment, this manual annotation task takes several hours. Results: In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct or not. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net. Conclusions: We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.

  9. PARK1 gene mutation of autosomal dominant Parkinson's disease family

    Institute of Scientific and Technical Information of China (English)

    Ligang Jiang; Guohua Hu; Qiuhui Chen; Ying Zhang; Xinyu Hu; Jia Fan; Lifeng Liu; Rui Guo; Yajuan Sun; Yixhi Zhang

    2011-01-01

    Studies have shown that the PARK1 gene is associated with the autosomal dominant inheritance of Parkinson's disease. The PARK1 gene contains two mutation sites, namely Ala30Pro and Ala53Thr, which are located on exons 3 and 4, respectively. However, the genetic loci of the pathogenic genes remain unclear. In this study, blood samples were collected from 11 members of a family with a high prevalence of Parkinson's disease, including four affected cases, five suspected cases, and two non-affected cases. Point mutation screening of common mutation sites on PARK1 gene exon 4 was conducted using PCR to determine the genetic loci of the causative gene for Parkinson's disease. Gene identification and sequencing results showed that a T base deletion mutation was observed in PARK1 gene exon 4 of all 11 collected samples. It was confirmed that the PARK1 gene exon 4 mutation is an important pathogenic mutation for Parkinson's disease.

  10. Genome Sequence and Annotation of Colletotrichum higginsianum, a Causal Agent of Crucifer Anthracnose Disease.

    Science.gov (United States)

    Zampounis, Antonios; Pigné, Sandrine; Dallery, Jean-Félix; Wittenberg, Alexander H J; Zhou, Shiguo; Schwartz, David C; Thon, Michael R; O'Connell, Richard J

    2016-08-18

    Colletotrichum higginsianum is an ascomycete fungus causing anthracnose disease on numerous cultivated plants in the family Brassicaceae, as well as the model plant Arabidopsis thaliana. We report an assembly of the nuclear genome and gene annotation of this pathogen, which was obtained using a combination of PacBio long-read sequencing and optical mapping. Copyright © 2016 Zampounis et al.

  11. Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression

    Directory of Open Access Journals (Sweden)

    Raherison Elie

    2012-08-01

    Background: Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized, although extensive cDNA resources have recently become available. This report presents a global overview of transcriptome variation in a conifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues. Results: An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents 23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison of secondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highly conserved among Picea spp. RNA-Sequencing strongly confirmed tissue-preferential expression and provided a robust validation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed from over 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on the microarray in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues than genes of known or predicted function. Diversity of expression within gene families may be rapidly assessed from PiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur in large gene families compared to angiosperms. Strong contrasts and low diversity were observed in the dehydrin family, while diverse patterns suggested a greater degree of diversification among LEAs. Conclusion: Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource of this kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate genetic studies in the large and important group comprising conifer trees.

  12. The genetics of alcoholism: identifying specific genes through family studies.

    Science.gov (United States)

    Edenberg, Howard J; Foroud, Tatiana

    2006-09-01

    Alcoholism is a complex disorder with both genetic and environmental risk factors. Studies in humans have begun to elucidate the genetic underpinnings of the risk for alcoholism. Here we briefly review strategies for identifying individual genes in which variations affect the risk for alcoholism and related phenotypes, in the context of one large study that has successfully identified such genes. The Collaborative Study on the Genetics of Alcoholism (COGA) is a family-based study that has collected detailed phenotypic data on individuals in families with multiple alcoholic members. A genome-wide linkage approach led to the identification of chromosomal regions containing genes that influenced alcoholism risk and related phenotypes. Subsequently, single nucleotide polymorphisms (SNPs) were genotyped in positional candidate genes located within the linked chromosomal regions, and analyzed for association with these phenotypes. Using this sequential approach, COGA has detected association with GABRA2, CHRM2 and ADH4; these associations have all been replicated by other researchers. COGA has detected association to additional genes including GABRG3, TAS2R16, SNCA, OPRK1 and PDYN, results that are awaiting confirmation. These successes demonstrate that genes contributing to the risk for alcoholism can be reliably identified using human subjects.

  13. Population- and Family-Based Studies Associate the "MTHFR" Gene with Idiopathic Autism in Simplex Families

    Science.gov (United States)

    Liu, Xudong; Solehdin, Fatima; Cohen, Ira L.; Gonzalez, Maripaz G.; Jenkins, Edmund C.; Lewis, M. E. Suzanne; Holden, Jeanette J. A.

    2011-01-01

    Two methylenetetrahydrofolate reductase gene ("MTHFR") functional polymorphisms were studied in 205 North American simplex (SPX) and 307 multiplex (MPX) families having one or more children with an autism spectrum disorder. Case-control comparisons revealed a significantly higher frequency of the low-activity 677T allele, higher prevalence of the…

  15. TMC and EVER genes belong to a larger novel family, the TMC gene family encoding transmembrane proteins

    Directory of Open Access Journals (Sweden)

    Mutai Hideki

    2003-06-01

    Background: Mutations in the transmembrane cochlear expressed gene 1 (TMC1) cause deafness in human and mouse. Mutations in two homologous genes, EVER1 and EVER2, increase the susceptibility to infection with certain human papillomaviruses, resulting in a high risk of skin carcinoma. Here we report that TMC1, EVER1 and EVER2 (now TMC6 and TMC8) belong to a larger novel gene family, which is named TMC for transmembrane channel-like gene family. Results: Using a combination of iterative database searches and reverse transcriptase-polymerase chain reaction (RT-PCR) experiments, we assembled contigs for cDNA encoding human, murine, puffer fish, and invertebrate TMC proteins. TMC proteins of individual species can be grouped into three subfamilies A, B, and C. Vertebrates have eight TMC genes. The majority of murine TMC transcripts are expressed in most organs; some transcripts, however, in particular the three subfamily A members, are rare and more restrictively expressed. Conclusion: The eight vertebrate TMC genes are evolutionarily conserved and encode proteins that form three subfamilies. Invertebrate TMC proteins can also be categorized into these three subfamilies. All TMC genes encode transmembrane proteins with intracellular amino- and carboxyl-termini and at least eight membrane-spanning domains. We speculate that the TMC proteins constitute a novel group of ion channels, transporters, or modifiers of such.

  16. Exclusive gene mapping of congenital microphthalmia in a Chinese family

    Institute of Scientific and Technical Information of China (English)

    YIN Yanan; LI Hui; YU Ping; ZHOU Qiang; ZHAO Luhang; ZHANG Ya-Ping

    2006-01-01

    Congenital microphthalmia is a developmental ocular disorder and might be caused by mutations in the genes involved in eye development. To uncover the genetic cause in a six-generation Chinese pedigree with autosomal dominant congenital microphthalmia, we performed genescan and linkage analysis in this family. Fourteen microsatellite markers on chromosomes 3, 11, 14 and 15 were selected as genetic markers according to the five previously reported loci associated with microphthalmia (MITF, SOX2, PAX6, MCOP and NN02). The genomic DNA of each member in the pedigree was amplified with 14 pairs of fluorescence-labeled primers. Genome screening and genotyping were conducted on an ABI377 DNA sequencer, and linkage analysis was performed with the Linkage software package. All two-point LOD scores of linkage analysis between the suggested disease genes and microsatellite markers were <-2, which indicated that none of the five genes was responsible for microphthalmia in this Chinese family. Microphthalmia in this family may be caused by mutation in a new gene that is essential in eye development.
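
    The exclusion criterion used above (two-point LOD score below -2) can be illustrated with the textbook formula for phase-known, fully informative meioses: the log10 ratio of the likelihood at recombination fraction theta to the likelihood under free recombination (theta = 0.5). This is a simplified sketch under those assumptions, with hypothetical counts; the study itself used the Linkage software package on a full pedigree.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # log10 likelihood at the tested recombination fraction
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    # log10 likelihood under no linkage (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical family: 5 recombinants in 10 meioses, tested at theta = 0.01.
lod = two_point_lod(5, 10, 0.01)
print(round(lod, 2), "excluded" if lod < -2 else "not excluded")
```

    By convention a LOD score above 3 is taken as evidence for linkage, and one below -2 as evidence against it at the tested theta.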

  17. A Candida albicans CRISPR system permits genetic engineering of essential genes and gene families.

    Science.gov (United States)

    Vyas, Valmik K; Barrasa, M Inmaculada; Fink, Gerald R

    Candida albicans is a pathogenic yeast that causes mucosal and systemic infections with high mortality. The absence of facile molecular genetics has been a major impediment to analysis of pathogenesis. The lack of meiosis coupled with the absence of plasmids makes genetic engineering cumbersome, especially for essential functions and gene families. We describe a C. albicans CRISPR system that overcomes many of the obstacles to genetic engineering in this organism. The high frequency with which CRISPR-induced mutations can be directed to target genes enables easy isolation of homozygous gene knockouts, even without selection. Moreover, the system permits the creation of strains with mutations in multiple genes, gene families, and genes that encode essential functions. This CRISPR system is also effective in a fresh clinical isolate of undetermined ploidy. Our method transforms the ability to manipulate the genome of Candida and provides a new window into the biology of this pathogen.

  18. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae.

    Science.gov (United States)

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-11-07

    Numerous gene rearrangements and transfer RNA gene absences have been reported in the mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene trnI has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of the 21 tRNA genes have the conventional cloverleaf-shaped secondary structure, and two (trnS₁ and trnS₂) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most apparently absent genes do exist in these genomes but have not been identified owing to limitations of technology and a lack of reference sequences. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains cluster together. These results are identical to the relationships inferred from gene order. We infer that gene rearrangement plays an important role in the evolution of whitefly mt genomes.

  19. Early evolution of the LIM homeobox gene family

    Energy Technology Data Exchange (ETDEWEB)

    Srivastava, Mansi; Larroux, Claire; Lu, Daniel R; Mohanty, Kareshma; Chapman, Jarrod; Degnan, Bernard M; Rokhsar, Daniel S

    2010-01-01

    LIM homeobox (Lhx) transcription factors are unique to the animal lineage and have patterning roles during embryonic development in flies, nematodes and vertebrates, with a conserved role in specifying neuronal identity. Though genes of this family have been reported in a sponge and a cnidarian, the expression patterns and functions of the Lhx family during development in non-bilaterian phyla are not known. We identified Lhx genes in two cnidarians and a placozoan and report the expression of Lhx genes during embryonic development in Nematostella and the demosponge Amphimedon. Members of the six major LIM homeobox subfamilies are represented in the genomes of the starlet sea anemone, Nematostella vectensis, and the placozoan Trichoplax adhaerens. The hydrozoan cnidarian, Hydra magnipapillata, has retained four of the six Lhx subfamilies, but apparently lost two others. Only three subfamilies are represented in the haplosclerid demosponge Amphimedon queenslandica. A tandem cluster of three Lhx genes of different subfamilies and a gene containing two LIM domains in the genome of T. adhaerens (an animal without any neurons) indicates that Lhx subfamilies were generated by tandem duplication. This tandem cluster in Trichoplax is likely a remnant of the original chromosomal context in which Lhx subfamilies first appeared. Three of the six Trichoplax Lhx genes are expressed in animals in laboratory culture, as are all Lhx genes in Hydra. Expression patterns of Nematostella Lhx genes correlate with neural territories in larval and juvenile polyp stages. In the aneural demosponge, A. queenslandica, the three Lhx genes are expressed widely during development, including in cells that are associated with the larval photosensory ring. The Lhx family expanded and diversified early in animal evolution, with all six subfamilies already diverged prior to the cnidarian-placozoan-bilaterian last common ancestor. In Nematostella, Lhx gene expression is correlated with neural

  20. Early evolution of the LIM homeobox gene family

    Directory of Open Access Journals (Sweden)

    Degnan Bernard M

    2010-01-01

    Full Text Available Abstract Background LIM homeobox (Lhx) transcription factors are unique to the animal lineage and have patterning roles during embryonic development in flies, nematodes and vertebrates, with a conserved role in specifying neuronal identity. Though genes of this family have been reported in a sponge and a cnidarian, the expression patterns and functions of the Lhx family during development in non-bilaterian phyla are not known. Results We identified Lhx genes in two cnidarians and a placozoan and report the expression of Lhx genes during embryonic development in Nematostella and the demosponge Amphimedon. Members of the six major LIM homeobox subfamilies are represented in the genomes of the starlet sea anemone, Nematostella vectensis, and the placozoan Trichoplax adhaerens. The hydrozoan cnidarian, Hydra magnipapillata, has retained four of the six Lhx subfamilies, but apparently lost two others. Only three subfamilies are represented in the haplosclerid demosponge Amphimedon queenslandica. A tandem cluster of three Lhx genes of different subfamilies and a gene containing two LIM domains in the genome of T. adhaerens (an animal without any neurons) indicates that Lhx subfamilies were generated by tandem duplication. This tandem cluster in Trichoplax is likely a remnant of the original chromosomal context in which Lhx subfamilies first appeared. Three of the six Trichoplax Lhx genes are expressed in animals in laboratory culture, as are all Lhx genes in Hydra. Expression patterns of Nematostella Lhx genes correlate with neural territories in larval and juvenile polyp stages. In the aneural demosponge, A. queenslandica, the three Lhx genes are expressed widely during development, including in cells that are associated with the larval photosensory ring. Conclusions The Lhx family expanded and diversified early in animal evolution, with all six subfamilies already diverged prior to the cnidarian-placozoan-bilaterian last common ancestor. In

  1. Improving pan-genome annotation using whole genome multiple alignment

    Directory of Open Access Journals (Sweden)

    Salzberg Steven L

    2011-06-01

    Full Text Available Abstract Background Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

  2. The WRKY Gene Family in Rice (Oryza sativa)

    Institute of Scientific and Technical Information of China (English)

    Christian A. Ross; Yue Liu; Qingxi J. Shen

    2007-01-01

    WRKY genes encode transcription factors that are involved in the regulation of various biological processes. These zinc-finger proteins, especially those members mediating stress responses, are uniquely expanded in plants. To facilitate the study of the evolutionary history and functions of this supergene family, we performed an exhaustive search for WRKY genes using HMMER and a Hidden Markov Model that was specifically trained for rice. This work resulted in a comprehensive list of WRKY gene models in Oryza sativa L. ssp. indica and L. ssp. japonica. Mapping of these genes to individual chromosomes facilitated elimination of redundant models, leading to the identification of 98 WRKY genes in japonica and 102 in indica rice. These genes were further categorized according to the number and structure of their zinc-finger domains. Based on a phylogenetic tree of the conserved WRKY domains and the graphic display of WRKY loci on corresponding indica and japonica chromosomes, we identified possible WRKY gene duplications within, and losses between, the two closely related rice subspecies. Also reviewed are the roles of WRKY genes in disease resistance and responses to salicylic acid and jasmonic acid, seed development and germination mediated by gibberellins, other developmental processes including senescence, and responses to abiotic stresses and abscisic acid in rice and other plants. The signaling pathways mediating WRKY gene expression are also discussed.

  3. The mammalian PYHIN gene family: Phylogeny, evolution and expression

    Directory of Open Access Journals (Sweden)

    Cridland Jasmyn A

    2012-08-01

    Full Text Available Abstract Background Proteins of the mammalian PYHIN (IFI200/HIN-200) family are involved in defence against infection through recognition of foreign DNA. The family member absent in melanoma 2 (AIM2) binds cytosolic DNA via its HIN domain and initiates inflammasome formation via its pyrin domain. AIM2 lies within a cluster of related genes, many of which are uncharacterised in mouse. To better understand the evolution, orthology and function of these genes, we have documented the range of PYHIN genes present in representative mammalian species, and undertaken phylogenetic and expression analyses. Results No PYHIN genes are evident in non-mammals or monotremes, with a single member found in each of three marsupial genomes. Placental mammals show variable family expansions, from one gene in cow to four in human and 14 in mouse. A single HIN domain appears to have evolved in the common ancestor of marsupials and placental mammals, and duplicated to give rise to three distinct forms (HIN-A, -B and -C) in the placental mammal ancestor. Phylogenetic analyses showed that AIM2 HIN-C and pyrin domains clearly diverge from the rest of the family, and it is the only PYHIN protein with orthology across many species. Interestingly, although AIM2 is important in defence against some bacteria and viruses in mice, AIM2 is a pseudogene in cow, sheep, llama, dolphin, dog and elephant. The other 13 mouse genes have arisen by duplication and rearrangement within the lineage, which has allowed some diversification in expression patterns. Conclusions The role of AIM2 in forming the inflammasome is relatively well understood, but molecular interactions of other PYHIN proteins involved in defence against foreign DNA remain to be defined. The non-AIM2 PYHIN protein sequences are very distinct from AIM2, suggesting they vary in effector mechanism in response to foreign DNA, and may bind different DNA structures. The PYHIN family has highly varied gene composition between

  4. Chromosomal evolution of the PKD1 gene family in primates

    Directory of Open Access Journals (Sweden)

    Krawczak Michael

    2008-09-01

    Full Text Available Abstract Background The autosomal dominant polycystic kidney disease (ADPKD) is mostly caused by mutations in the PKD1 (polycystic kidney disease 1) gene located in 16p13.3. Moreover, there are six pseudogenes of PKD1 that are located proximal to the master gene in 16p13.1. In contrast, no pseudogene could be detected in the mouse genome, only a single copy gene on chromosome 17. The question arises how the human situation originated phylogenetically. To address this question we applied comparative FISH-mapping of a human PKD1-containing genomic BAC clone and a PKD1-cDNA clone to chromosomes of a variety of primate species and the dog as a non-primate outgroup species. Results Comparative FISH with the PKD1-cDNA clone clearly shows that in all primate species studied distinct single signals map in subtelomeric chromosomal positions orthologous to the short arm of human chromosome 16 harbouring the master PKD1 gene. Only in human and African great apes, but not in orangutan, FISH with both BAC and cDNA clones reveals additional signal clusters located proximal of and clearly separated from the PKD1 master genes indicating the chromosomal position of PKD1 pseudogenes in 16p of these species, respectively. Indeed, this is in accordance with sequencing data in human, chimpanzee and orangutan. Apart from the master PKD1 gene, six pseudogenes are identified in both human and chimpanzee, while only a single-copy gene is present in the whole-genome sequence of orangutan. The phylogenetic reconstruction of the PKD1-tree reveals that all human pseudogenes are closely related to the human PKD1 gene, and all chimpanzee pseudogenes are closely related to the chimpanzee PKD1 gene. However, our statistical analyses provide strong indication that gene conversion events may have occurred within the PKD1 family members of human and chimpanzee, respectively. Conclusion PKD1 must have undergone amplification very recently in hominid evolution. Duplicative

  5. Massive expansion of the calpain gene family in unicellular eukaryotes

    Directory of Open Access Journals (Sweden)

    Zhao Sen

    2012-09-01

    Full Text Available Abstract Background Calpains are Ca2+-dependent cysteine proteases that participate in a range of crucial cellular processes. Dysfunction of these enzymes may cause, for instance, life-threatening diseases in humans, the loss of sex determination in nematodes and embryo lethality in plants. Although the calpain family is well characterized in animal and plant model organisms, there is a great lack of knowledge about these genes in unicellular eukaryote species (i.e. protists). Here, we study the distribution and evolution of calpain genes in a wide range of eukaryote genomes from major branches in the tree of life. Results Our investigations reveal 24 types of protein domains that are combined with the calpain-specific catalytic domain CysPc. In total we identify 41 different calpain domain architectures; 28 of these domain combinations have not been previously described. Based on our phylogenetic inferences, we propose that at least four calpain variants were established in the early evolution of eukaryotes, most likely before the radiation of all the major supergroups of eukaryotes. Many domains associated with eukaryotic calpain genes can be found among eubacteria or archaebacteria but never in combination with the CysPc domain. Conclusions The analyses presented here show that ancient modules present in prokaryotes, and a few de novo eukaryote domains, have been assembled into many novel domain combinations along the evolutionary history of eukaryotes. Some of the new calpain genes show a narrow distribution in a few branches in the tree of life, likely representing lineage-specific innovations. Hence, the functionally important classical calpain genes found among humans and vertebrates make up only a tiny fraction of the calpain family. In fact, a massive expansion of the calpain family occurred by domain shuffling among unicellular eukaryotes and contributed to a wealth of functionally different genes.

  6. Taxonomic precision of different hypervariable regions of 16S rRNA gene and annotation methods for functional bacterial groups in biological wastewater treatment.

    Directory of Open Access Journals (Sweden)

    Feng Guo

    Full Text Available High throughput sequencing of the 16S rRNA gene gives a deeper understanding of bacterial diversity in complex environmental samples, but introduces blurring due to the relatively low taxonomic resolution of short reads. For wastewater treatment plants, only those functional bacterial genera categorized as nutrient remediators, bulk/foaming species, and potential pathogens are significant to biological wastewater treatment and environmental impacts. Precise taxonomic assignment of these bacteria at least at genus level is important for microbial ecological research and routine wastewater treatment monitoring. Therefore, the focus of this study was to evaluate the taxonomic precision of different ribosomal RNA (rRNA) gene hypervariable regions generated from a mixed activated sludge sample. In addition, three commonly used classification methods, namely RDP Classifier, BLAST-based best-hit annotation, and the lowest common ancestor annotation by MEGAN, were evaluated by comparing their consistency. In an unsupervised analysis, the consistency among different classification methods suggests there are no hypervariable regions with good taxonomic coverage for all genera. Certain regions of the 16S rRNA gene, e.g. the V1&V2 regions, provide fairly consistent taxonomic assignment for a relatively wide range of genera. Hence, it is recommended to use these regions for studying functional groups in activated sludge. Moreover, the inconsistency among methods also demonstrated that a specific method might not be suitable for identification of some bacterial genera using certain 16S rRNA gene regions. As a general rule, drawing conclusions based only on one sequencing region and one classification method should be avoided due to the potential false negative results.

  7. Mutation Analysis of HTRA2 Gene in Chinese Familial Essential Tremor and Familial Parkinson's Disease.

    Science.gov (United States)

    He, Ya-Chao; Huang, Pei; Li, Qiong-Qiong; Sun, Qian; Li, Dun-Hui; Wang, Tian; Shen, Jun-Yi; Du, Juan-Juan; Cui, Shi-Shuang; Gao, Chao; Fu, Rao; Chen, Sheng-Di

    2017-01-01

    Background. HTRA2 has already been nominated as PARK13 which may cause Parkinson's disease, though there are still discrepancies among these results. Recently, Gulsuner et al.'s study found that HTRA2 p.G399S is responsible for hereditary essential tremor and homozygotes of this allele develop Parkinson's disease by examining a six-generation family segregating essential tremor and essential tremor coexisting with Parkinson's disease. We performed this study to validate the condition of HTRA2 gene in Chinese familial essential tremor and familial Parkinson's disease patients, especially essential tremor. Methods. We directly sequenced all eight exons, exon-intron boundaries, and part of the introns in 101 familial essential tremor patients, 105 familial Parkinson's disease patients, and 100 healthy controls. Results. No exonic variant was identified, while one exon-intron boundary variant (rs2241028) and one intron variant (rs2241027) were detected, both with no clinical significance and uncertain function. There was no difference in allele, genotype, and haplotype between groups. Conclusions. HTRA2 exonic variant might be rare among Chinese Parkinson's disease and essential tremor patients with family history, and HTRA2 may not be the cause of familial Parkinson's disease and essential tremor in China.

  8. Rice Mitogen-activated Protein Kinase Gene Family and Its Role in Biotic and Abiotic Stress Response

    Institute of Scientific and Technical Information of China (English)

    Jai S. Rohila; Yinong Yang

    2007-01-01

    The mitogen-activated protein kinase (MAPK) cascade is an important signaling module that transduces extracellular stimuli into intracellular responses in eukaryotic organisms. An increasing body of evidence has shown that the MAPK-mediated cellular signaling is crucial to plant growth and development, as well as biotic and abiotic stress responses. To date, a total of 17 MAPK genes have been identified from the rice genome. Expression profiling, biochemical characterization and/or functional analysis were carried out with many members of the rice MAPK gene family, especially those associated with biotic and abiotic stress responses. In this review, the phylogenetic relationship and classification of rice MAPK genes are discussed to facilitate a simple nomenclature and standard annotation of the rice MAPK gene family. Functional data relating to biotic and abiotic stress responses are reviewed for each MAPK group and show that despite overlapping in functionality, there is a certain level of functional specificity among different rice MAP kinases. The future challenges are to functionally characterize each MAPK, to identify their downstream substrates and upstream kinases, and to genetically manipulate the MAPK signaling pathway in rice crops for the improvement of agronomically important traits.

  9. Identification and Expression Analysis of the Barley (Hordeum vulgare L.) Aquaporin Gene Family.

    Directory of Open Access Journals (Sweden)

    Runyararo M Hove

    Full Text Available Aquaporins (AQPs) are major intrinsic proteins (MIPs) that mediate bidirectional flux of water and other substrates across cell membranes, and play critical roles in plant-water relations, dehydration stress responses and crop productivity. However, limited data are available as yet on the contributions of these proteins to the physiology of the major crop barley (Hordeum vulgare). The present work reports the identification and expression analysis of the barley MIP family. A comprehensive search of publicly available leaf mRNA-seq data, draft barley genome data, GenBank transcripts and sixteen new annotations together revealed that the barley MIP family is comprised of at least forty AQPs. Alternative splicing events were likely in two plasma membrane intrinsic protein (PIP) AQPs. Analyses of the AQP signature sequences and specificity determining positions indicated a potential of several putative AQP isoforms to transport non-aqua substrates, including physiologically important substrates, and respond to abiotic stresses. Analysis of our publicly available leaf mRNA-seq data identified notable differential expression of HvPIP1;2 and HvTIP4;1 under salt stress. Analyses of other gene expression resources also confirmed isoform-specific responses in different tissues and/or in response to salinity, as well as some potential inter-cultivar differences. The work reports systematic and comprehensive analysis of most, if not all, barley AQP genes, their sequences, expression patterns in different tissues, potential transport and stress response functions, and a strong framework for selection and/or development of stress tolerant barley varieties. In addition, the barley data would be highly valuable for genetic studies of the evolutionarily closely related wheat (Triticum aestivum L.).

  10. Variation in the RAD51 gene and familial breast cancer

    Science.gov (United States)

    Lose, Felicity; Lovelock, Paul; Chenevix-Trench, Georgia; Mann, Graham J; Pupo, Gulietta M; Spurdle, Amanda B

    2006-01-01

    Introduction Human RAD51 is a homologue of the Escherichia coli RecA protein and is known to function in recombinational repair of double-stranded DNA breaks. Mutations in the lower eukaryotic homologues of RAD51 result in a deficiency in the repair of double-stranded DNA breaks. Loss of RAD51 function would therefore be expected to result in an elevated mutation rate, leading to accumulation of DNA damage and, hence, to increased cancer risk. RAD51 interacts directly or indirectly with a number of proteins implicated in breast cancer, such as BRCA1 and BRCA2. Similar to BRCA1 mice, RAD51-/- mice are embryonic lethal. The RAD51 gene region has been shown to exhibit loss of heterozygosity in breast tumours, and deregulated RAD51 expression in breast cancer patients has also been reported. Few studies have investigated the role of coding region variation in the RAD51 gene in familial breast cancer, with only one coding region variant – exon 6 c.449G>A (p.R150Q) – reported to date. Methods All nine coding exons of the RAD51 gene were analysed for variation in 46 well-characterised, BRCA1/2-negative breast cancer families using denaturing high-performance liquid chromatography. Genotyping of the exon 6 p.R150Q variant was performed in an additional 66 families. Additionally, lymphoblastoid cell lines from breast cancer patients were subjected to single nucleotide primer extension analysis to assess RAD51 expression. Results No coding region variation was found, and all intronic variation detected was either found in unaffected controls or was unlikely to have functional consequences. Single nucleotide primer extension analysis did not reveal any allele-specific changes in RAD51 expression in all lymphoblastoid cell lines tested. Conclusion Our study indicates that RAD51 is not a major familial breast cancer predisposition gene. PMID:16762046

  11. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4).

    Science.gov (United States)

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Tennessen, Kristin; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2016-01-01

    The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.

  12. Differential gene regulation by the SRC family of coactivators

    Institute of Scientific and Technical Information of China (English)

    Hua Zhang; Xia Yi; Xiaojing Sun; Na Yin; Bin Shi; Huijian Wu; Dan Wang; Ge Wu; Yongfeng Shang

    2005-01-01

    SRCs (steroid receptor coactivators) are required for nuclear receptor-mediated transcription and are also implicated in the transcription initiation by other transcription factors, such as STATs and NF-κB. Despite phenotypic manifestations in gene knockout mice for SRC-1, GRIP1, and AIB1 of the SRC (Steroid Receptor Coactivator) family indicating their differential roles in animal physiology, there is no clear evidence, at the molecular level, to support a functional specificity for these proteins. We demonstrated in this report that two species of SRC coactivators, either as AIB1:GRIP1 or as AIB1:SRC-1, are recruited, possibly through heterodimerization, on the promoter of genes that contain a classical hormone responsive element (HRE). In contrast, on non-HRE-containing gene promoters, on which steroid receptors bind indirectly, either GRIP1 or SRC-1 is recruited as a monomer, depending on the cellular abundance of the protein. Typically, non-HRE-containing genes are early genes activated by steroid receptors, whereas HRE-containing genes are activated later. Our results also showed that SRC proteins contribute to the temporal regulation of gene transcription. In addition, our experiments revealed a positive correlation between AIB1/c-myc overexpression in ER+ breast carcinoma samples, suggesting a possible mechanism for AIB1 in breast cancer carcinogenesis.

  13. Structural and functional analysis of the GRAS gene family in grapevine indicates a role of GRAS proteins in the control of development and stress responses

    Directory of Open Access Journals (Sweden)

    Jerome Grimplet

    2016-03-01

    Full Text Available GRAS transcription factors are involved in many processes of plant growth and development (e.g. axillary shoot meristem formation, root radial patterning, nodule morphogenesis, arbuscular development) as well as in plant disease resistance and abiotic stress responses. However, little information is available concerning this gene family in grapevine (Vitis vinifera L.), an economically important woody crop. We performed a model curation of GRAS genes identified in the latest genome annotation leading to the identification of 52 genes. Gene models were improved and three new genes were identified that could be grapevine- or woody-plant specific. Phylogenetic analysis showed that GRAS genes could be classified into 13 groups that mapped on the 19 Vitis vinifera chromosomes. Five new subfamilies, previously not characterized in other species, were identified. Multiple sequence alignment showed typical GRAS domain in the proteins and new motifs were also described. As observed in other species, both segmental and tandem duplications contributed significantly to the expansion and evolution of the GRAS gene family in grapevine. Expression patterns across a variety of tissues and upon abiotic and biotic conditions revealed possible divergent functions of GRAS genes in grapevine development and stress responses. By comparing the information available for tomato and grapevine GRAS genes, we identified candidate genes that might constitute conserved transcriptional regulators of both climacteric and non-climacteric fruit ripening. Altogether this study provides valuable information and robust candidate genes for future functional analysis aiming at improving the quality of fleshy fruits.

  14. Structural and Functional Analysis of the GRAS Gene Family in Grapevine Indicates a Role of GRAS Proteins in the Control of Development and Stress Responses

    Science.gov (United States)

    Grimplet, Jérôme; Agudelo-Romero, Patricia; Teixeira, Rita T.; Martinez-Zapater, Jose M.; Fortes, Ana M.

    2016-01-01

    GRAS transcription factors are involved in many processes of plant growth and development (e.g., axillary shoot meristem formation, root radial patterning, nodule morphogenesis, arbuscular development) as well as in plant disease resistance and abiotic stress responses. However, little information is available concerning this gene family in grapevine (Vitis vinifera L.), an economically important woody crop. We performed a model curation of GRAS genes identified in the latest genome annotation leading to the identification of 52 genes. Gene models were improved and three new genes were identified that could be grapevine- or woody-plant specific. Phylogenetic analysis showed that GRAS genes could be classified into 13 groups that mapped on the 19 V. vinifera chromosomes. Five new subfamilies, previously not characterized in other species, were identified. Multiple sequence alignment showed typical GRAS domain in the proteins and new motifs were also described. As observed in other species, both segmental and tandem duplications contributed significantly to the expansion and evolution of the GRAS gene family in grapevine. Expression patterns across a variety of tissues and upon abiotic and biotic conditions revealed possible divergent functions of GRAS genes in grapevine development and stress responses. By comparing the information available for tomato and grapevine GRAS genes, we identified candidate genes that might constitute conserved transcriptional regulators of both climacteric and non-climacteric fruit ripening. Altogether this study provides valuable information and robust candidate genes for future functional analysis aiming at improving the quality of fleshy fruits. PMID:27065316

  15. Tomato ABSCISIC ACID STRESS RIPENING (ASR) gene family revisited.

    Science.gov (United States)

    Golan, Ido; Dominguez, Pia Guadalupe; Konrad, Zvia; Shkolnik-Inbar, Doron; Carrari, Fernando; Bar-Zvi, Dudy

    2014-01-01

    Tomato ABSCISIC ACID RIPENING 1 (ASR1) was the first cloned plant ASR gene. ASR orthologs were then cloned from a large number of monocot, dicot and gymnosperm plants, where they are mostly involved in response to abiotic (drought and salinity) stress and fruit ripening. The tomato genome encodes five ASR genes: ASR1, 2, 3 and 5 encode low-molecular-weight proteins (ca. 110 amino acid residues each), whereas ASR4 encodes a 297-residue polypeptide. Information on the expression of the tomato ASR gene family is scarce. We used quantitative RT-PCR to assay the expression of this gene family in plant development and in response to salt and osmotic stresses. ASR1 and ASR4 were the main expressed genes in all tested organs and conditions, whereas ASR2 and ASR3/5 expression was two to three orders of magnitude lower (with the exception of cotyledons). ASR1 is expressed in all plant tissues tested whereas ASR4 expression is limited to photosynthetic organs and stamens. Essentially, ASR1 accounted for most of ASR gene expression in roots, stems and fruits at all developmental stages, whereas ASR4 was the major gene expressed in cotyledons and young and fully developed leaves. Both ASR1 and ASR4 were expressed in flower organs, with ASR1 expression dominating in stamens and pistils, ASR4 in sepals and petals. Steady-state levels of ASR1 and ASR4 were upregulated in plant vegetative organs following exposure to salt stress, osmotic stress or the plant abiotic stress hormone abscisic acid (ABA). Tomato plants overexpressing ASR1 displayed enhanced survival rates under conditions of water stress, whereas ASR1-antisense plants displayed marginal hypersensitivity to water withholding.

  16. Tomato ABSCISIC ACID STRESS RIPENING (ASR) gene family revisited.

    Directory of Open Access Journals (Sweden)

    Ido Golan

    Full Text Available Tomato ABSCISIC ACID RIPENING 1 (ASR1) was the first cloned plant ASR gene. ASR orthologs were then cloned from a large number of monocot, dicot and gymnosperm plants, where they are mostly involved in response to abiotic (drought and salinity) stress and fruit ripening. The tomato genome encodes five ASR genes: ASR1, 2, 3 and 5 encode low-molecular-weight proteins (ca. 110 amino acid residues each), whereas ASR4 encodes a 297-residue polypeptide. Information on the expression of the tomato ASR gene family is scarce. We used quantitative RT-PCR to assay the expression of this gene family in plant development and in response to salt and osmotic stresses. ASR1 and ASR4 were the main expressed genes in all tested organs and conditions, whereas ASR2 and ASR3/5 expression was two to three orders of magnitude lower (with the exception of cotyledons). ASR1 is expressed in all plant tissues tested whereas ASR4 expression is limited to photosynthetic organs and stamens. Essentially, ASR1 accounted for most of ASR gene expression in roots, stems and fruits at all developmental stages, whereas ASR4 was the major gene expressed in cotyledons and young and fully developed leaves. Both ASR1 and ASR4 were expressed in flower organs, with ASR1 expression dominating in stamens and pistils, ASR4 in sepals and petals. Steady-state levels of ASR1 and ASR4 were upregulated in plant vegetative organs following exposure to salt stress, osmotic stress or the plant abiotic stress hormone abscisic acid (ABA). Tomato plants overexpressing ASR1 displayed enhanced survival rates under conditions of water stress, whereas ASR1-antisense plants displayed marginal hypersensitivity to water withholding.

  17. Approaches to Working with Children, Young People and Families for Traveller, Irish Traveller, Gypsy, Roma and Show People Communities. Annotated Bibliography for the Children's Workforce Development Council

    Science.gov (United States)

    Robinson, Mark; Martin, Kerry; Wilkin, Carol

    2008-01-01

    This annotated bibliography relays a range of issues and approaches to working with Travellers, Irish Travellers, Gypsies, Roma and Show People. This is an accompanying document to the literature review report, ED501860.

  18. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    Science.gov (United States)

    Li, Wei; Xu, Hanyun; Liu, Ying; Song, Lili; Guo, Changhong; Shu, Yongjun

    2016-01-01

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula. PMID:27049397

  19. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Wei Li

    2016-04-01

    Full Text Available Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula.

  20. PRODH gene is associated with executive function in schizophrenic families.

    Science.gov (United States)

    Li, Tao; Ma, Xiaohong; Hu, Xun; Wang, Yingcheng; Yan, Chengying; Meng, Huaqing; Liu, Xiehe; Toulopoulou, Timothea; Murray, Robin M; Collier, David A

    2008-07-05

    The aim of this study was to investigate the relationship between polymorphisms in the PRODH and COMT genes and selected neurocognitive functions. Six SNPs in PRODH and two SNPs in COMT were genotyped in 167 first-episode schizophrenic families who had been assessed by a set of 14 neuropsychological tests. Neuropsychological measures were selected as quantitative traits for association analysis. The haplotype of SNPs PRODH 1945T/C and PRODH 1852G/A was associated with impaired performance on the Tower of Hanoi, a problem-solving task mainly reflecting planning capacity. There was no significant evidence for association with any other neuropsychological traits for other SNPs or haplotypes of paired SNPs in the two genes. This study takes previous findings of association between PRODH and schizophrenia further by associating variation within the gene with performance on a neurocognitive trait characteristic of the illness. It fails to confirm previous reports of an association between COMT and cognitive function.

  1. Biofuel Potential of Plants Transformed Genetically With NAC Family Genes

    Directory of Open Access Journals (Sweden)

    Sadhana Singh

    2016-01-01

    Full Text Available NAC genes contribute to enhanced survivability of plants under conditions of environmental stress and to secondary growth of the plants, thereby building biomass. Thus, genetic transformation of plants using NAC genes provides a possibility of tailor-made biofuel plants. Over-expression studies have indicated that NAC family genes can provide tolerance to various biotic and abiotic stresses, either by physiological or biochemical changes at the cellular level, or by affecting visible morphological and anatomical changes, for example by development of lateral roots in a number of plants. Over-expression of these genes also works as a trigger for development of secondary cell walls. In our laboratory, we have observed a NAC gene from Lepidium latifolium contributing to both enhanced biomass and cold stress tolerance in the model plant tobacco. Thus, we have reviewed all the developments of genetic engineering using NAC genes which could enhance the traits required for biofuel plants, either by enhancing the stress tolerance or by enhancing the biomass of the plants. Keywords: NAC, Genetically engineered plants, Abiotic stress tolerance, Secondary growth, Cell wall synthesis, Biomass

  2. Gene turnover in the avian globin gene families and evolutionary changes in hemoglobin isoform expression.

    Science.gov (United States)

    Opazo, Juan C; Hoffmann, Federico G; Natarajan, Chandrasekhar; Witt, Christopher C; Berenbrink, Michael; Storz, Jay F

    2015-04-01

    The apparent stasis in the evolution of avian chromosomes suggests that birds may have experienced relatively low rates of gene gain and loss in multigene families. To investigate this possibility and to explore the phenotypic consequences of variation in gene copy number, we examined evolutionary changes in the families of genes that encode the α- and β-type subunits of hemoglobin (Hb), the tetrameric α2β2 protein responsible for blood-O2 transport. A comparative genomic analysis of 52 bird species revealed that the size and membership composition of the α- and β-globin gene families have remained remarkably constant during approximately 100 My of avian evolution. Most interspecific variation in gene content is attributable to multiple independent inactivations of the α(D)-globin gene, which encodes the α-chain subunit of a functionally distinct Hb isoform (HbD) that is expressed in both embryonic and definitive erythrocytes. Due to consistent differences in O2-binding properties between HbD and the major adult-expressed Hb isoform, HbA (which incorporates products of the α(A)-globin gene), recurrent losses of α(D)-globin contribute to among-species variation in blood-O2 affinity. Analysis of HbA/HbD expression levels in the red blood cells of 122 bird species revealed high variability among lineages and strong phylogenetic signal. In comparison with the homologous gene clusters in mammals, the low retention rate for lineage-specific gene duplicates in the avian globin gene clusters suggests that the developmental regulation of Hb synthesis in birds may be more highly conserved, with orthologous genes having similar stage-specific expression profiles and similar functional properties in disparate taxa.

  3. Evolutionary history of chordate PAX genes: dynamics of change in a complex gene family.

    Directory of Open Access Journals (Sweden)

    Vanessa Rodrigues Paixão-Côrtes

    Full Text Available Paired box (PAX) genes are transcription factors that play important roles in embryonic development. Although the PAX gene family occurs in animals only, it is widely distributed. Among the vertebrates, its 9 genes appear to be the product of complete duplication of an original set of 4 genes, followed by an additional partial duplication. Although some studies of PAX genes have been conducted, no comprehensive survey of these genes across the entire taxonomic unit has yet been attempted. In this study, we conducted a detailed comparison of PAX sequences from 188 chordates, which revealed restricted variation. The absence of PAX4 and PAX8 among some species of reptiles and birds was notable; however, all 9 genes were present in all 74 mammalian genomes investigated. A search for signatures of selection indicated that all genes are subject to purifying selection, with a possible constraint relaxation in PAX4, PAX7, and PAX8. This result indicates asymmetric evolution of PAX family genes, which can be associated with the emergence of adaptive novelties in the chordate evolutionary trajectory.

  4. Evolutionary characterization of pig interferon-inducible transmembrane gene family and member expression dynamics in tracheobronchial lymph nodes of pigs infected with swine respiratory disease viruses.

    Science.gov (United States)

    Miller, Laura C; Jiang, Zhihua; Sang, Yongming; Harhay, Gregory P; Lager, Kelly M

    2014-06-15

    Studies have found that a cluster of duplicated gene loci encoding the interferon-inducible transmembrane proteins (IFITMs) family have antiviral activity against several viruses, including influenza A virus. The gene family has 5 and 7 members in humans and mice, respectively. Here, we confirm the current annotation of pig IFITM1, IFITM2, IFITM3, IFITM5, IFITM1L1 and IFITM1L4, manually annotated IFITM1L2, IFITM1L3, IFITM5L, IFITM3L1 and IFITM3L2, and provide expressed sequence tag (EST) and/or mRNA evidence, not contained within the NCBI Reference Sequence database (RefSeq), for the existence of IFITM6, IFITM7 and a new IFITM1-like (IFITM1LN) gene in pigs. Phylogenetic analyses showed seven porcine IFITM genes with highly conserved human/mouse orthologs known to have anti-viral activity. Digital Gene Expression Tag Profiling (DGETP) of swine tracheobronchial lymph nodes (TBLN) of pigs infected with swine influenza virus (SIV), porcine pseudorabies virus, porcine reproductive and respiratory syndrome virus or porcine circovirus type 2 over 14 days post-inoculation (dpi) showed that gene expression abundance differs dramatically among pig IFITM family members, ranging from 0 to over 3000 tags per million. In particular, SIV up-regulated IFITM1 by 5.9 fold at 3 dpi. A Bayesian framework further identified pig IFITM1 and IFITM3 as differentially expressed genes in the overall transcriptome analysis. In addition to being a component of protein complexes involved in homotypic adhesion, IFITM1 is also associated with pathways related to regulation of cell proliferation, and IFITM3 is involved in immune responses.

  5. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    Science.gov (United States)

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  6. Manual annotation, transcriptional analysis, and protein expression studies reveal novel genes in the agl cluster responsible for N glycosylation in the halophilic archaeon Haloferax volcanii.

    Science.gov (United States)

    Yurist-Doutsch, Sophie; Eichler, Jerry

    2009-05-01

    While Eukarya, Bacteria, and Archaea are all capable of protein N glycosylation, the archaeal version of this posttranslational modification is the least understood. To redress this imbalance, recent studies of the halophilic archaeon Haloferax volcanii have identified a gene cluster encoding the Agl proteins involved in the assembly and attachment of a pentasaccharide to select Asn residues of the surface layer glycoprotein in this species. However, because the automated tools used for rapid annotation of genome sequences, including that of H. volcanii, are not always accurate, a reannotation of the agl cluster was undertaken in order to discover genes not previously recognized. In the present report, reanalysis of the gene cluster that includes aglB, aglE, aglF, aglG, aglI, and aglJ, which are known components of the H. volcanii protein N-glycosylation machinery, was undertaken. Using computer-based tools or visual inspection, together with transcriptional analysis and protein expression approaches, genes encoding AglP, AglQ, and AglR are now described.

  7. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Directory of Open Access Journals (Sweden)

    Bibby Kyle

    2011-03-01

    Full Text Available Abstract Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  8. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Science.gov (United States)

    2011-01-01

    Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:21401935

  9. BRCA1 Gene Mutations in Chinese Families with Breast Cancer

    Institute of Scientific and Technical Information of China (English)

    Yurong Shi; Chenbin Li; Ruifang Niu; Xishan Hao; Xiangcheng Zhi; Liansheng Ning

    2005-01-01

    OBJECTIVE To investigate the frequency of BRCA1 gene mutations in breast cancer families in China. METHODS Genomic DNA was obtained by conventional techniques from the peripheral blood mononuclear cells collected from 94 persons derived from 45 breast cancer families. All participants gave written informed consent. The mutations in the BRCA1 gene were detected by the polymerase chain reaction and single-stranded conformation polymorphism (PCR-SSCP). Then, the samples of interest were sent for direct DNA sequencing. RESULTS No mutation sites were found in exon 2 or 20 by DNA sequencing. Eight sites were found in exon 11, such as 2201C>T (Ser694Ser), 3232A>G (Glu1038Gly), 2201C>A/G (Ser694Arg), 2731C>T (Pro871Leu), 2086A>T (Asn591Ile) and three sites of 1584G>T (Glu424Stop). Three mutation sites were found in exon 16, which included 5106A>G (Met1663Val), 5208delT (Stop1639) and 4956A>G (Ser1613Gly). CONCLUSION These mutation sites may be related to breast cancer, but more investigation is needed to determine whether the mutation sites are hot spots of mutations in Chinese familial breast cancer patients.

  10. Family genetic algorithms based on gene exchange and its application

    Institute of Scientific and Technical Information of China (English)

    Li Jianhua; Ding Xiangqian; Wang Sunan; Yu Qing

    2006-01-01

    Genetic algorithms (GA) are search techniques based on the mechanics of natural selection and have already been successfully applied in many diverse areas. However, a growing number of cases show that GA performance is not as good as expected. Criticisms of the algorithm include its slow speed and premature convergence. To improve performance, this work focuses on population size and the individuals' space, and analyzes their influence on the genetic operators. A novel family genetic algorithm (FGA) is put forward based on this analysis. In this algorithm, families of candidate solutions are constructed close to high-quality individuals found by a search in the world space, and a further search is carried out in each of these microspaces. A family that finds better genes within a limited period of time wins a new life. At the same time, the best gene of each microspace is exchanged with the basic population in the world space. Finally, the FGA is applied to function optimization and image matching in several experiments. The results show that the FGA achieves high performance.
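
    The idea of searching small "families" around good individuals and feeding the best result back into the main population can be illustrated with a toy example. The abstract does not specify the operators, so what follows is a generic elitist genetic algorithm with a local neighbourhood search, not the authors' FGA; the objective `sphere`, the helper `family_search`, and all parameter values are hypothetical choices made for illustration.

```python
import random

def sphere(x):
    """Toy objective to minimise: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def family_search(parent, objective, rng, size=8, radius=0.1):
    """Search a small neighbourhood (a "family") around one individual.

    The parent is kept in the family, so the result is never worse than it.
    """
    family = [parent] + [
        [v + rng.uniform(-radius, radius) for v in parent] for _ in range(size)
    ]
    return min(family, key=objective)

def evolve(objective, dim=3, pop_size=20, generations=40, seed=0):
    """Run the toy algorithm; return (best individual, its score, initial best score)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    initial_best = min(objective(ind) for ind in population)
    for _ in range(generations):
        population.sort(key=objective)
        # Keep the best quarter and refine each survivor inside its own family.
        elite = [family_search(ind, objective, rng) for ind in population[: pop_size // 4]]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by a small Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0, 0.05) for pair in zip(a, b)])
        # The refined elite re-enters the main population alongside the children.
        population = elite + children
    best = min(population, key=objective)
    return best, objective(best), initial_best

best, score, initial = evolve(sphere)
print(score <= initial)
```

    Because each family contains its parent and the refined elite is carried forward, the best score in this sketch can never get worse from one generation to the next.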

  11. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to little functional annotation associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix is incomplete; for example, it does not show references linked to manually annotated functions. In addition, there is no tool that lets microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers time in searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their dataset. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. 
The GO annotation data generated will be available for public use via AgBase website and

  12. Multiple inter-kingdom horizontal gene transfers in the evolution of the phosphoenolpyruvate carboxylase gene family.

    Directory of Open Access Journals (Sweden)

    Yingmei Peng

    Full Text Available Pepcase is a gene encoding phosphoenolpyruvate carboxylase that exists in bacteria, archaea and plants, playing an important role in plant metabolism and development. Most plants have two or more pepcase genes belonging to two gene sub-families, while only one gene exists in other organisms. Previous research categorized one plant pepcase gene as plant-type pepcase (PTPC) while the other as bacteria-type pepcase (BTPC) because of its similarity with the pepcase gene found in bacteria. Phylogenetic reconstruction showed that PTPC is the ancestral lineage of plant pepcase, and that all bacteria, protist pepcase and BTPC in plants are derived from a lineage of pepcase closely related with PTPC in algae. However, their phylogeny contradicts the species tree and traditional chronology of organism evolution. Because the diversification of bacteria occurred much earlier than the origin of plants, presumably all bacterial pepcase derived from the ancestral PTPC of algal plants after diverging from the ancestor of vascular plant PTPC. To solve this contradiction, we reconstructed the phylogeny of the pepcase gene family. Our result showed that both PTPC and BTPC are derived from an ancestral lineage of gamma-proteobacteria pepcases, possibly via an ancient inter-kingdom horizontal gene transfer (HGT) from bacteria to the eukaryotic common ancestor of plants, protists and cellular slime mold. Our phylogenetic analysis also found 48 other pepcase genes originated from inter-kingdom HGTs. These results imply that inter-kingdom HGTs played important roles in the evolution of the pepcase gene family and furthermore that HGTs are a more frequent evolutionary event than previously thought.

  13. Annotation of Differential Gene Expression in Small Yellow Follicles of a Broiler-Type Strain of Taiwan Country Chickens in Response to Acute Heat Stress.

    Science.gov (United States)

    Cheng, Chuen-Yu; Tu, Wei-Lin; Wang, Shih-Han; Tang, Pin-Chi; Chen, Chih-Feng; Chen, Hsin-Hsin; Lee, Yen-Pai; Chen, Shuen-Ei; Huang, San-Yuan

    2015-01-01

    This study investigated global gene expression in the small yellow follicles (6-8 mm diameter) of broiler-type B strain Taiwan country chickens (TCCs) in response to acute heat stress. Twelve 30-wk-old TCC hens were divided into four groups: control hens maintained at 25°C and hens subjected to 38°C acute heat stress for 2 h without recovery (H2R0), with 2-h recovery (H2R2), and with 6-h recovery (H2R6). Small yellow follicles were collected for RNA isolation and microarray analysis at the end of each time point. Results showed that 69, 51, and 76 genes were upregulated and 58, 15, and 56 genes were downregulated after heat treatment of H2R0, H2R2, and H2R6, respectively, using a cutoff value of two-fold or higher. Gene ontology analysis revealed that these differentially expressed genes are associated with the biological processes of cell communication, developmental process, protein metabolic process, immune system process, and response to stimuli. Upregulation of heat shock protein 25, interleukin 6, metallopeptidase 1, and metalloproteinase 13, and downregulation of type II alpha 1 collagen, discoidin domain receptor tyrosine kinase 2, and Kruppel-like factor 2 suggested that acute heat stress induces proteolytic disintegration of the structural matrix and inflamed damage and adaptive responses of gene expression in the follicle cells. These suggestions were validated through gene expression, using quantitative real-time polymerase chain reaction. Functional annotation clarified that interleukin 6-related pathways play a critical role in regulating acute heat stress responses in the small yellow follicles of TCC hens.
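
    The two-fold cutoff mentioned in this abstract can be sketched as a simple ratio test. The per-group counts below are the ones reported in the abstract; the function `classify`, the assumption that intensities are on a linear (not log) scale, and the three example genes with their intensity values are hypothetical and are not taken from the study.

```python
# Counts reported in the abstract: (upregulated, downregulated) per treatment group.
REPORTED = {"H2R0": (69, 58), "H2R2": (51, 15), "H2R6": (76, 56)}

def classify(treated, control, cutoff=2.0):
    """Label a gene by its treated/control ratio against a fold-change cutoff.

    Assumes linear-scale, strictly positive intensities (an illustrative assumption).
    """
    ratio = treated / control
    if ratio >= cutoff:
        return "up"
    if ratio <= 1.0 / cutoff:
        return "down"
    return "unchanged"

def totals(reported):
    """Total number of differentially expressed genes per group."""
    return {group: up + down for group, (up, down) in reported.items()}

# Hypothetical signal intensities as (treated, control); not data from the study.
examples = {"geneA": (820.0, 400.0), "geneB": (150.0, 320.0), "geneC": (500.0, 450.0)}
labels = {name: classify(t, c) for name, (t, c) in examples.items()}

print(labels)            # {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
print(totals(REPORTED))  # {'H2R0': 127, 'H2R2': 66, 'H2R6': 132}
```

    Summing the reported counts gives 127, 66, and 132 differentially expressed genes for H2R0, H2R2, and H2R6, respectively.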

  14. A novel mutation of KCNQ3 gene in a Chinese family with benign familial neonatal convulsions.

    Science.gov (United States)

    Li, Haiyan; Li, Nan; Shen, Lu; Jiang, Hong; Yang, Qian; Song, Yanmin; Guo, Jifeng; Xia, Kun; Pan, Qian; Tang, Beisha

    2008-03-01

    Benign familial neonatal convulsions (BFNC, also named benign familial neonatal seizures, BFNS) is a rare autosomal dominant inherited epilepsy syndrome with clinical and genetic heterogeneity. Two voltage-gated potassium channel subunit genes, KCNQ2 and KCNQ3, have been identified to cause BFNC1 and BFNC2, respectively. To date, only three mutations of KCNQ3, all located within exon 5, have been reported. By limited linkage analysis and mutation analysis of KCNQ3 in a Chinese family with BFNC, we identified a novel missense mutation of KCNQ3, c.988C>T, located within exon 6. c.988C>T led to the substitution of Cys for Arg at amino acid position 330 (p.R330C) in the KCNQ3 potassium channel, which possibly impaired the neuronal M-current and altered neuronal excitability. Seizures of all BFNC patients started on day 2 to 3 after birth and remitted within 1 month, and no recurrence was found. One family member who had fever-associated seizures twice at age 5 years and was diagnosed with febrile seizures, however, did not carry this mutation, which suggests that febrile seizures and BFNC have different pathogenesis. To our knowledge, this is the first report of a KCNQ3 mutation in a Chinese family with BFNC.
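
    The correspondence between the coding position (c.988) and the amino acid position (330) reported above follows from reading the coding sequence in triplets. The short sketch below works through that arithmetic and lists what a C>T change at the first base of each CGN arginine codon would produce under the standard genetic code. The function names are hypothetical, and the actual reference codon at position 330 is not given in the abstract.

```python
# Standard genetic code entries needed for this example (DNA codons).
ARG_CGN_CODONS = ("CGA", "CGC", "CGG", "CGT")
TGN_RESIDUES = {"TGA": "Stop", "TGC": "Cys", "TGG": "Trp", "TGT": "Cys"}

def codon_position(cdna_pos):
    """Return (codon number, 1-based offset within the codon) for a 1-based coding position."""
    codon = (cdna_pos - 1) // 3 + 1
    offset = (cdna_pos - 1) % 3 + 1
    return codon, offset

def first_base_c_to_t_outcomes():
    """Map each CGN arginine codon to the residue encoded after a C>T change at its first base."""
    return {codon: TGN_RESIDUES["T" + codon[1:]] for codon in ARG_CGN_CODONS}

print(codon_position(988))           # (330, 1)
print(first_base_c_to_t_outcomes())  # {'CGA': 'Stop', 'CGC': 'Cys', 'CGG': 'Trp', 'CGT': 'Cys'}
```

    Position 988 is the first base of codon 330, and an Arg-to-Cys change at that base is what would be expected if the reference codon were CGC or CGT.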

  15. A transcriptomic analysis of striped catfish (Pangasianodon hypophthalmus) in response to salinity adaptation: De novo assembly, gene annotation and marker discovery.

    Science.gov (United States)

    Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter

    2014-06-01

    The striped catfish (Pangasianodon hypophthalmus) culture industry in the Mekong Delta in Vietnam has developed rapidly over the past decade. The culture industry now, however, faces some significant challenges, especially related to climate change impacts, notably from predicted extensive saltwater intrusion into many low topographical coastal provinces across the Mekong Delta. This problem highlights a need for development of culture stocks that can tolerate more saline culture environments as a response to expansion of saline water-intruded land. While a traditional artificial selection program can potentially address this need, understanding the genomic basis of salinity tolerance can assist development of more productive culture lines. The current study applied a transcriptomic approach using Ion PGM technology to generate expressed sequence tag (EST) resources from the intestine and swim bladder of striped catfish reared at a salinity level of 9 ppt, which showed best growth performance. Total sequence data generated was 467.8 Mbp, consisting of 4,116,424 reads with an average length of 112 bp. De novo assembly generated 51,188 contigs and allowed identification of 16,116 putative genes based on the GenBank non-redundant database. GO annotation, KEGG pathway mapping, and functional annotation of the EST sequences recovered a wide diversity of biological functions and processes. In addition, more than 11,600 simple sequence repeats were also detected. This is the first comprehensive analysis of a striped catfish transcriptome, and provides a valuable genomic resource for future selective breeding programs and functional or evolutionary studies of genes that influence salinity tolerance in this important culture species.

  16. The ANKH gene and familial calcium pyrophosphate dihydrate deposition disease.

    Science.gov (United States)

    Netter, Patrick; Bardin, Thomas; Bianchi, Arnaud; Richette, Pascal; Loeuille, Damien

    2004-09-01

    Familial calcium pyrophosphate dihydrate deposition (CPPD) disease is a chronic condition in which CPPD microcrystals deposit in the joint fluid, cartilage, and periarticular tissues. Two forms of familial CPPD disease have been identified: CCAL1 and CCAL2. The CCAL1 locus is located on the long arm of chromosome 8 and is associated with CPPD and severe osteoarthritis. The CCAL2 locus has been mapped to the short arm of chromosome 5 and identified in families from the Alsace region of France and the United Kingdom. The ANKH protein is involved in pyrophosphate metabolism and, more specifically, in pyrophosphate transport from the intracellular to the extracellular compartment. Numerous ANKH gene mutations cause familial CCAL2; they enhance ANKH protein activity, thereby elevating extracellular pyrophosphate levels and promoting the formation of pyrophosphate crystals, which produce the manifestations of the disease. Recent studies show that growth factors and cytokines can modify the expression of the normal ANKH protein. These results suggest a role for ANKH in sporadic CPPD disease and in CPPD associated with degenerative disease.

  17. Amelogenesis Imperfecta: 1 Family, 2 Phenotypes, and 2 Mutated Genes.

    Science.gov (United States)

    Prasad, M K; Laouina, S; El Alloussi, M; Dollfus, H; Bloch-Zupan, A

    2016-12-01

    Amelogenesis imperfecta (AI) is a clinically and genetically heterogeneous group of diseases characterized by enamel defects. The authors have identified a large consanguineous Moroccan family segregating different clinical subtypes of hypoplastic and hypomineralized AI in different individuals within the family. Using targeted next-generation sequencing, the authors identified a novel heterozygous nonsense mutation in COL17A1 (c.1873C>T, p.R625*) segregating with hypoplastic AI and a novel homozygous 8-bp deletion in C4orf26 (c.39_46del, p.Cys14Glyfs*18) segregating with hypomineralized-hypoplastic AI in this family. This study highlights the phenotypic and genotypic heterogeneity of AI that can exist even within a single consanguineous family. Furthermore, the identification of novel mutations in COL17A1 and C4orf26 and their correlation with distinct AI phenotypes can contribute to a better understanding of the pathophysiology of AI and the contribution of these genes to amelogenesis.

  18. Familial adenomatous polyposis associated APC gene mutation - A case study

    Directory of Open Access Journals (Sweden)

    Avinash Bardia1, Santosh K. Tiwari1, Sandeep K. Vishwakarma1, Md. Aejaz Habeeb1, Pratibha Nallari2, Aleem A. Khan1

    2013-08-01

    Full Text Available Familial adenomatous polyposis (FAP) is an autosomal dominant condition characterized by diffuse intestinal polyposis, specific gene mutation, and predisposition for developing colon cancer. Left untreated, patients with FAP will develop colorectal carcinoma during early adulthood. Hence, early detection and surgical intervention are of the utmost importance. Colectomy is required and may include an ileal pouch with ileo-anal anastomosis, which eliminates the colon and rectal disease while preserving fecal continence and avoiding a permanent ileostomy. We report a case of colorectal cancer with FAP that showed features consistent with adenomatous polyposis coli; no evidence of malignancy was seen after the surgery.

  19. Analysis of antisense expression by whole genome tiling microarrays and siRNAs suggests mis-annotation of Arabidopsis orphan protein-coding genes.

    Directory of Open Access Journals (Sweden)

    Casey R Richardson

    Full Text Available BACKGROUND: MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20-22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery. PRINCIPAL FINDINGS: We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM) was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. The SVM when trained on targets could predict the "ancient" (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and 76% for "new" rapidly-evolving MIRNA genes. 
CONCLUSIONS: Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non

  20. Management of asymptomatic gene carriers of transthyretin familial amyloid polyneuropathy.

    Science.gov (United States)

    Schmidt, Hartmut H-J; Barroso, Fabio; González-Duarte, Alejandra; Conceição, Isabel; Obici, Laura; Keohane, Denis; Amass, Leslie

    2016-09-01

    Transthyretin familial amyloid polyneuropathy (TTR-FAP) is a rare, severe, and irreversible, adult-onset, hereditary disorder caused by autosomal-dominant mutations in the TTR gene that increase the intrinsic propensity of transthyretin protein to misfold and deposit systemically as insoluble amyloid fibrils in nerve tissues, the heart, and other organs. TTR-FAP is characterized by relentless, progressively debilitating polyneuropathy, and leads to death, on average, within 10 years of symptom onset without treatment. With increased availability of disease-modifying treatment options for a wider spectrum of patients with TTR-FAP, timely detection of the disease may offer substantial clinical benefits. This review discusses mutation-specific predictive genetic testing in first-degree relatives of index patients diagnosed with TTR-FAP and the structured clinical follow-up of asymptomatic gene carriers for prompt diagnosis and early therapeutic intervention before accumulation of substantial damage. Muscle Nerve 54: 353-360, 2016.

  1. Functional genomics tools applied to plant metabolism: a survey on plant respiration, its connections and the annotation of complex gene functions

    Directory of Open Access Journals (Sweden)

    Wagner L. Araújo

    2012-09-01

    Full Text Available The application of post-genomic techniques in plant respiration studies has greatly improved our ability to assign functions to gene products. In addition, it has revealed previously unappreciated interactions between distal elements of metabolism. Such results have reinforced the need to consider plant respiratory metabolism as part of a complex network, and making sense of such interactions will ultimately require the construction of predictive and mechanistic models. Transcriptomics, proteomics, metabolomics and the quantification of metabolic flux will be of great value in creating such models, both by facilitating the annotation of complex gene function and determining their structure, and by furnishing the quantitative data required to test them. In this review we highlight how these experimental approaches have contributed to our current understanding of plant respiratory metabolism and its interplay with associated processes (e.g. photosynthesis, photorespiration and nitrogen metabolism). We also discuss how data from these techniques may be integrated, with the ultimate aim of identifying mechanisms that control and regulate plant respiration and discovering novel gene functions with potential biotechnological implications.

  2. Repeated evolution of chimeric fusion genes in the β-globin gene family of laurasiatherian mammals.

    Science.gov (United States)

    Gaudry, Michael J; Storz, Jay F; Butts, Gary Tyler; Campbell, Kevin L; Hoffmann, Federico G

    2014-05-09

    The evolutionary fate of chimeric fusion genes may be strongly influenced by their recombinational mode of origin and the nature of functional divergence between the parental genes. In the β-globin gene family of placental mammals, the two postnatally expressed δ- and β-globin genes (HBD and HBB, respectively) have a propensity for recombinational exchange via gene conversion and unequal crossing-over. In the latter case, there are good reasons to expect differences in retention rates for the reciprocal HBB/HBD and HBD/HBB fusion genes due to thalassemia pathologies associated with the HBD/HBB "Lepore" deletion mutant in humans. Here, we report a comparative genomic analysis of the mammalian β-globin gene cluster, which revealed that chimeric HBB/HBD fusion genes originated independently in four separate lineages of laurasiatherian mammals: Eulipotyphlans (shrews, moles, and hedgehogs), carnivores, microchiropteran bats, and cetaceans. In cases where an independently derived "anti-Lepore" duplication mutant has become fixed, the parental HBD and/or HBB genes have typically been inactivated or deleted, so that the newly created HBB/HBD fusion gene is primarily responsible for synthesizing the β-type subunits of adult and fetal hemoglobin (Hb). Contrary to conventional wisdom that the HBD gene is a vestigial relict that is typically inactivated or expressed at negligible levels, we show that HBD-like genes often encode a substantial fraction (20-100%) of β-chain Hbs in laurasiatherian taxa. Our results indicate that the ascendancy or resuscitation of genes with HBD-like coding sequence requires the secondary acquisition of HBB-like promoter sequence via unequal crossing-over or interparalog gene conversion.

  3. Molecular Evolution of the TET Gene Family in Mammals

    Directory of Open Access Journals (Sweden)

    Hiromichi Akahori

    2015-12-01

    Full Text Available Ten-eleven translocation (TET) proteins, a family of Fe2+- and 2-oxoglutarate-dependent dioxygenases, are involved in DNA demethylation. They also help regulate various cellular functions. Three TET paralogs have been identified (TET1, TET2, and TET3) in humans. This study focuses on the evolution of mammalian TET genes. Distinct patterns in TET1 and TET2 vs. TET3 were revealed by codon-based tests of positive selection. Results indicate that TET1 and TET2 genes have experienced positive selection more frequently than TET3 gene, and that the majority of codon sites evolved under strong negative selection. These findings imply that the selective pressure on TET3 may have been relaxed in several lineages during the course of evolution. Our analysis of convergent amino acid substitutions also supports the different evolutionary dynamics among TET gene subfamily members. All of the five amino acid sites that are inferred to have evolved under positive selection in the catalytic domain of TET2 are localized at the protein’s outer surface. The adaptive changes of these positively selected amino acid sites could be associated with dynamic interactions between other TET-interacting proteins, and positive selection thus appears to shift the regulatory scheme of TET enzyme function.

  4. An atlas of bovine gene expression reveals novel distinctive tissue characteristics and evidence for improving genome annotation

    Science.gov (United States)

    Background A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines. Results The Bovine Gene...

  5. Identification of novel genes and optimization of annotated genes in foxtail millet by RNA-Seq technology

    Institute of Scientific and Technical Information of China (English)

    穆彩琴; 张瑞娟; 屈聪玲; 韩渊怀; 王兴春; 杨致荣

    2016-01-01

    Although the reference genome of foxtail millet (Setaria italica) has been released, its gene annotation is not optimized. Therefore, we used RNA sequencing (RNA-Seq) technology to identify novel genes and optimize the structures of annotated genes in foxtail millet. Total RNA was isolated from the leaves of ‘Jingu 21’ and used for sequencing library construction. The library was paired-end sequenced on the Illumina HiSeq 2500 sequencing platform, and 37,072,949 high-quality clean reads were obtained. To identify novel genes and optimize the annotated gene structures, the clean reads were further aligned to the ‘Yugu 1’ reference genome. A total of 614 novel genes were identified, and 438 of them were annotated using the COG, GO, KEGG, Swiss-Prot and NR databases. In addition, 7,175 gene structures were optimized: the 5′ ends of 4,330 genes and the 3′ ends of 5,362 genes were extended. This study can provide useful references for future functional genomics research in foxtail millet and for optimization of the genome annotation in other related organisms.

  6. Babesia bovis expresses Bbo-6cys-E, a member of a novel gene family that is homologous to the 6-cys family of Plasmodium

    Science.gov (United States)

    A novel Babesia bovis gene family encoding proteins with similarities to the Plasmodium 6cys protein family was identified by TBLASTN searches of the Babesia bovis genome using the sequence of the P. falciparum PFS230 protein as query, and was termed Bbo-6cys gene family. The Bbo-cys6 gene family co...

  7. The roles of gene duplication, gene conversion and positive selection in rodent Esp and Mup pheromone gene families with comparison to the Abp family.

    Science.gov (United States)

    Karn, Robert C; Laukaitis, Christina M

    2012-01-01

    Three proteinaceous pheromone families, the androgen-binding proteins (ABPs), the exocrine-gland secreting peptides (ESPs) and the major urinary proteins (MUPs) are encoded by large gene families in the genomes of Mus musculus and Rattus norvegicus. We studied the evolutionary histories of the Mup and Esp genes and compared them with what is known about the Abp genes. Apparently gene conversion has played little if any role in the expansion of the mouse Class A and Class B Mup genes and pseudogenes, and the rat Mups. By contrast, we found evidence of extensive gene conversion in many Esp genes although not in all of them. Our studies of selection identified at least two amino acid sites in β-sheets as having evolved under positive selection in the mouse Class A and Class B MUPs and in rat MUPs. We show that selection may have acted on the ESPs by determining K(a)/K(s) for Exon 3 sequences with and without the converted sequence segment. While it appears that purifying selection acted on the ESP signal peptides, the secreted portions of the ESPs probably have undergone much more rapid evolution. When the inner gene converted fragment sequences were removed, eleven Esp paralogs were present in two or more pairs with K(a)/K(s) >1.0 and thus we propose that positive selection is detectable by this means in at least some mouse Esp paralogs. We compare and contrast the evolutionary histories of all three mouse pheromone gene families in light of their proposed functions in mouse communication.
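
    The K(a)/K(s) criterion used in this abstract (a ratio above 1.0 taken as a sign of positive selection) can be illustrated with a minimal Python sketch. The function name, the gene-pair labels and the numeric values are hypothetical and are not taken from the study; estimating K(a) and K(s) themselves requires a codon-aware method that is not shown here.

```python
def classify_selection(ka, ks, tolerance=0.0):
    """Label a pairwise comparison by its Ka/Ks ratio.

    ka: nonsynonymous substitutions per nonsynonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: half-width of the band around 1.0 treated as neutral
    """
    if ks <= 0:
        # The ratio is undefined when no synonymous change is observed.
        return "undefined"
    ratio = ka / ks
    if ratio > 1.0 + tolerance:
        return "positive"
    if ratio < 1.0 - tolerance:
        return "purifying"
    return "neutral"


# Hypothetical paralog pairs with made-up (Ka, Ks) values.
pairs = {
    ("paralog_A", "paralog_B"): (0.30, 0.20),
    ("paralog_A", "paralog_C"): (0.10, 0.50),
    ("paralog_B", "paralog_C"): (0.25, 0.25),
}
for (left, right), (ka, ks) in pairs.items():
    print(left, right, classify_selection(ka, ks))
```

    A ratio above 1.0 on its own is only suggestive; in practice a statistical test on the estimates would be needed before drawing a conclusion.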

  8. Differential roles of TGIF family genes in mammalian reproduction

    Directory of Open Access Journals (Sweden)

    Renfree Marilyn B

    2011-09-01

    Full Text Available Abstract Background TG-interacting factors (TGIFs) belong to a family of TALE-homeodomain proteins including TGIF1, TGIF2 and TGIFLX/Y in human. Both TGIF1 and TGIF2 act as transcription factors repressing TGF-β signalling. Human TGIFLX and its orthologue, Tex1 in the mouse, are X-linked genes that are only expressed in the adult testis. TGIF2 arose from TGIF1 by duplication, whereas TGIFLX arose by retrotransposition to the X-chromosome. These genes have not been characterised in any non-eutherian mammals. We therefore studied the TGIF family in the tammar wallaby (a marsupial mammal) to investigate their roles in reproduction and how and when these genes may have evolved their functions and chromosomal locations. Results Both TGIF1 and TGIF2 were present in the tammar genome on autosomes but TGIFLX was absent. Tammar TGIF1 shared a similar expression pattern during embryogenesis, sexual differentiation and in adult tissues to that of TGIF1 in eutherian mammals, suggesting it has been functionally conserved. Tammar TGIF2 was ubiquitously expressed throughout early development as in the human and mouse, but in the adult, it was expressed only in the gonads and spleen, more like the expression pattern of human TGIFLX and mouse Tex1. Tammar TGIF2 mRNA was specifically detected in round and elongated spermatids. There was no mRNA detected in mature spermatozoa. TGIF2 protein was specifically located in the cytoplasm of spermatids, and in the residual body and the mid-piece of the mature sperm tail. These data suggest that tammar TGIF2 may participate in spermiogenesis, like TGIFLX does in eutherians. TGIF2 was detected for the first time in the ovary with mRNA produced in the granulosa and theca cells, suggesting it may also play a role in folliculogenesis. 
Conclusions The restricted and very similar expression of tammar TGIF2 to X-linked paralogues in eutherians suggests that the evolution of TGIF1, TGIF2 and TGIFLX in eutherians was accompanied by

  9. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery.

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    Full Text Available The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae) is an economically important species in molluscan aquaculture due to its use in pearl farming. The species has been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species are concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS) technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction. The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO) terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2) were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs) were identified from 61,141 unigenes (size of >1 kb), with the most abundant being dinucleotide repeats. This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata
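
    The simple sequence repeat (SSR) screening mentioned in this abstract can be sketched in a few lines of Python. This is an illustration only: the abstract does not name the tool or the thresholds used, so the function name, the minimum repeat counts and the example sequence below are all assumptions.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length to the minimum number of tandem copies.
    The defaults are hypothetical thresholds, not those of the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Group 2 captures one motif; \2{n,} demands n further tandem copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (AAAA, ATAT, ...).
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return sorted(hits)


# Made-up test sequence: seven AT copies and five GAT copies.
example = "GGC" + "AT" * 7 + "CCG" + "GAT" * 5 + "TT"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

    Only perfect repeats are reported; compound or interrupted repeats would need additional logic.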

  10. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    Full Text Available To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed in 4 paired tumor (breast cancer) and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may have close relationships with overlapped miRNA species. Members in the miRNA group always had various expression levels, and even some showed larger expression divergence. Despite the dynamic expression as well as individual difference, these miRNAs always indicated consistent or similar deregulation patterns. The consistent deregulation expression may contribute to dynamic and coordinated interaction between different miRNAs in regulatory network. Further, we found that those clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families indicated important biological roles, and the specific distribution and expression further enrich and ensure the flexible and robust regulatory network.

  11. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal Matoq Saeed

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  12. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    Science.gov (United States)

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: a/ compare different annotations for a single genome, and b/ generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
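
    The idea of an extended annotation built by combining the outputs of several annotation methods (AMs) can be illustrated with a small Python sketch. It is not BEACON's algorithm, which the abstract does not describe in detail: the function names, the rule of taking the first informative function, the list of uninformative labels, and the gene and method identifiers are all assumptions made for illustration.

```python
# Labels treated as "no function assigned" (an assumed list).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}


def is_informative(function):
    """True when the label names a putative function."""
    return function.strip().lower() not in UNINFORMATIVE


def extend_annotation(annotations, genes):
    """Combine per-method annotations into one extended annotation.

    annotations: {method_name: {gene_id: function_label}}
    For each gene, the first informative label found across methods wins.
    """
    extended = {}
    for gene in genes:
        extended[gene] = "hypothetical protein"
        for calls in annotations.values():
            function = calls.get(gene, "")
            if is_informative(function):
                extended[gene] = function.strip()
                break
    return extended


def count_informative(calls, genes):
    """Number of genes with a putative function in one annotation."""
    return sum(is_informative(calls.get(gene, "")) for gene in genes)


# Hypothetical genes and annotation methods.
genes = ["g1", "g2", "g3", "g4"]
annotations = {
    "AM1": {"g1": "DNA gyrase subunit A", "g2": "hypothetical protein",
            "g3": "ribosomal protein L2"},
    "AM2": {"g1": "DNA gyrase subunit A", "g2": "ABC transporter permease",
            "g3": "hypothetical protein", "g4": "hypothetical protein"},
}
extended = extend_annotation(annotations, genes)
for name, calls in annotations.items():
    print(name, count_informative(calls, genes))
print("extended", count_informative(extended, genes))
```

    In this toy case each method alone assigns a function to two genes, while the combined annotation covers three, which mirrors the kind of gain the abstract reports.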

  13. BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments.

    Science.gov (United States)

    Al-Shahrour, Fátima; Minguez, Pablo; Vaquerizas, Juan M; Conde, Lucía; Dopazo, Joaquín

    2005-07-01

    We present Babelomics, a complete suite of web tools for the functional analysis of groups of genes in high-throughput experiments, which includes the use of information on Gene Ontology terms, InterPro motifs, KEGG pathways, Swiss-Prot keywords, analysis of predicted transcription factor binding sites, chromosomal positions and presence in tissues with determined histological characteristics, through five integrated modules: FatiGO (fast assignment and transference of information), FatiWise, transcription factor association test, GenomeGO and tissues mining tool, respectively. Additionally, another module, FatiScan, provides a new procedure that integrates biological information in combination with experimental results in order to find groups of genes with modest but coordinate significant differential behaviour. FatiScan is highly sensitive and is capable of finding significant asymmetries in the distribution of genes of common function across a list of ordered genes even if these asymmetries are not extreme. The strong multiple-testing nature of the contrasts made by the tools is taken into account. All the tools are integrated in the gene expression analysis package GEPAS. Babelomics is the natural evolution of our tool FatiGO (which analysed almost 22,000 experiments during the last year) to include more sources of information and new modes of using it. Babelomics can be found at http://www.babelomics.org.
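
    A functional term enrichment test with a multiple-testing adjustment, the kind of contrast this abstract refers to, can be sketched in Python as follows. The abstract does not state which test or correction the tools apply, so the one-sided hypergeometric test and the Benjamini-Hochberg adjustment used here are assumed choices, and the function names and example counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list.

    N: genes in the background, K: background genes carrying the term,
    n: genes in the list of interest, k: list genes carrying the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (k, n, K, N) for three terms tested on one gene list.
terms = {
    "term_A": (8, 50, 40, 5000),
    "term_B": (2, 50, 300, 5000),
    "term_C": (5, 50, 100, 5000),
}
raw = [hypergeom_tail(*counts) for counts in terms.values()]
for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
    print(name, p, q)
```

    Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for large gene sets a statistics library would normally be preferred.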

  14. Genome-Wide Analysis of the Expression of WRKY Family Genes in Different Developmental Stages of Wild Strawberry (Fragaria vesca) Fruit.

    Directory of Open Access Journals (Sweden)

    Heying Zhou

    Full Text Available WRKY proteins play important regulatory roles in plant developmental processes such as senescence, trichome initiation and embryo morphogenesis. In strawberry, only FaWRKY1 (Fragaria × ananassa) has been characterized, leaving numerous WRKY genes to be identified and their function characterized. The publication of the draft genome sequence of the strawberry genome allowed us to conduct a genome-wide search for WRKY proteins in Fragaria vesca, and to compare the identified proteins with their homologs in model plants. Fifty-nine FvWRKY genes were identified and annotated from the F. vesca genome. Detailed analysis, including gene classification, annotation, phylogenetic evaluation, conserved motif determination and expression profiling, based on RNA-seq data, were performed on all members of the family. Additionally, the expression patterns of the WRKY genes in different fruit developmental stages were further investigated using qRT-PCR, to provide a foundation for further comparative genomics and functional studies of this important class of transcriptional regulators in strawberry.

  15. The Keratin 6 gene family: characterization and regulation

    Energy Technology Data Exchange (ETDEWEB)

    Navarro Espinel, J.M. (Universidad Complutense de Madrid. Dept. Biologia (Spain))

    1992-01-01

    Cytokeratins are a family of ca. 30 proteins that are expressed exclusively in epithelial cells, where they constitute the intermediate filament cytoskeleton. Keratin 6 is expressed in some tissues (tongue, esophagus, foot sole epidermis, etc.), as well as in the suprabasal layers of epidermis under hyperproliferative stimuli, such as TPA, wound healing, etc. In addition, it is expressed in most cultured epidermal cell lines. We have found that there are three different genes coding for similar but not identical K6 polypeptides in the cow. We have used CAT assays, gel retardation and footprinting techniques to analyze the promoter of one of the genes in several cell lines and have found two elements implicated in the regulation of this gene. One of them is an AP1-like site and the other seems to be a retinoic acid-responsive element. Implications of these findings for the regulation of the K6 gene are discussed. (author). 250 refs, 48 figs.

  16. The SLEEPER genes: a transposase-derived angiosperm-specific gene family

    Directory of Open Access Journals (Sweden)

    Knip Marijn

    2012-10-01

    Full Text Available Abstract Background DAYSLEEPER encodes a domesticated transposase from the hAT-superfamily, which is essential for development in Arabidopsis thaliana. Little is known about the presence of DAYSLEEPER orthologs in other species, or how and when it was domesticated. We studied the presence of DAYSLEEPER orthologs in plants and propose a model for the domestication of the ancestral DAYSLEEPER gene in angiosperms. Results Using specific BLAST searches in genomic and EST libraries, we found that DAYSLEEPER-like genes (hereafter called SLEEPER genes) are unique to angiosperms. Basal angiosperms as well as grasses (Poaceae) and dicotyledonous plants possess such putative orthologous genes, but SLEEPER-family genes were not found in gymnosperms, mosses and algae. Most species contain more than one SLEEPER gene. All SLEEPERs contain a C2H2 type BED-zinc finger domain and a hATC dimerization domain. We designated 3 motifs, partly overlapping the BED-zinc finger and dimerization domain, which are hallmark features in the SLEEPER family. Although SLEEPER genes are structurally conserved between species, constructs with SLEEPER genes from grapevine and rice did not complement the daysleeper phenotype in Arabidopsis, when expressed under control of the DAYSLEEPER promoter. However these constructs did cause a dominant phenotype when expressed in Arabidopsis. Rice plant lines with an insertion in the RICESLEEPER1 or 2 locus displayed phenotypic abnormalities, indicating that these genes are functional and important for normal development in rice. We suggest a model in which we hypothesize that an ancestral hAT transposase was retrocopied and stably integrated in the genome during early angiosperm evolution. Evidence is also presented for more recent retroposition events of SLEEPER genes, such as an event in the rice genome, which gave rise to the RICESLEEPER1 and 2 genes. Conclusions We propose the ancestral SLEEPER gene was formed after a process of retro

  17. Unravelling MADS-box gene family in Eucalyptus spp.: a starting point to an understanding of their developmental role in trees

    Directory of Open Access Journals (Sweden)

    Beatriz Fonseca de Oliveira Dias

    2005-01-01

    Full Text Available MADS-box genes encode a family of transcription factors which control diverse developmental processes in flowering plants ranging from root to flower and fruit development. Members of the MADS-box gene family share a highly conserved sequence of approximately 180 nucleotides that encodes a DNA-binding domain. We used bioinformatics tools to investigate the information generated by the Eucalyptus Expressed Sequence Tag (FORESTs) genome project in order to identify and annotate MADS-box genes. The comparative phylogenetic analysis of the Eucalyptus MADS-box genes with Arabidopsis homologues allowed us to group them into one of the well-known subfamilies. Trends in gene expression of these putative Eucalyptus MADS-box genes were investigated by hierarchical clustering analysis. Among 24 MADS-box genes identified by our analysis, 12 are expressed in vegetative organs. Out of these, five are expressed predominantly in wood. Understanding of the molecular mechanisms performed by MADS-box proteins underlying Eucalyptus growth, development and stress reactions would provide important insights into tree development and could reveal means by which tree characteristics could be modified for the improvement of industrial properties.

  18. Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data.

    Science.gov (United States)

    Lohse, Marc; Nagel, Axel; Herter, Thomas; May, Patrick; Schroda, Michael; Zrenner, Rita; Tohge, Takayuki; Fernie, Alisdair R; Stitt, Mark; Usadel, Björn

    2014-05-01

    Next-generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan 'BIN' ontology, which is tailored for functional annotation of plant 'omics' data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan-to-GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator.

  19. Augmented annotation and orthologue analysis for Oryctolagus cuniculus: Better Bunny

    National Research Council Canada - National Science Library

    Craig, Douglas B; Kannan, Sujatha; Dombkowski, Alan A

    2012-01-01

    .... Using data extracted from several public bioinformatics repositories we created Better Bunny, a database and query tool that extensively augments the available functional annotation for rabbit genes...

  20. A Genome-Wide Analysis Reveals Stress and Hormone Responsive Patterns of TIFY Family Genes in Brassica rapa.

    Science.gov (United States)

    Saha, Gopal; Park, Jong-In; Kayum, Md Abdul; Nou, Ill-Sup

    2016-01-01

    The TIFY family is a plant-specific group of proteins with a diversity of functions and includes four subfamilies, viz. ZML, TIFY, PPD, and JASMONATE ZIM-domain (JAZ) proteins. TIFY family members, particularly JAZ subfamily proteins, play roles in biological processes such as development and stress and hormone responses in Arabidopsis, rice, chickpea, and grape. However, there is no information about this family in any Brassica crop. This study identifies 36 TIFY genes in Brassica rapa, an economically important crop species in the Brassicaceae. An extensive in silico analysis of phylogenetic grouping, protein motif organization and intron-exon distribution confirmed that there are four subfamilies of BrTIFY proteins. Out of 36 BrTIFY genes, we identified 21 in the JAZ subfamily, seven in the TIFY subfamily, six in ZML and two in PPD. Extensive expression profiling of 21 BrTIFY JAZs in various tissues, especially in floral organs and at different flower growth stages revealed constitutive expression patterns, which suggest that BrTIFY JAZ genes are important during growth and development of B. rapa flowers. A protein interaction network analysis also pointed to association of these proteins with fertility and defense processes of B. rapa. Using a low temperature-treated whole-genome microarray data set, most of the JAZ genes were found to have variable transcript abundance between the contrasting inbred lines Chiifu and Kenshin of B. rapa. Subsequently, the expression of all 21 BrTIFY JAZs in response to cold stress was characterized in the same two lines via qPCR, demonstrating that nine genes were up-regulated. Importantly, the BrTIFY JAZs showed strong and differential expression upon JA treatment, pointing to their probable involvement in JA-mediated growth regulatory functions, especially during flower development and stress responses. 
Additionally, BrTIFY JAZs were induced in response to salt, drought, Fusarium, ABA, and SA treatments, and six genes (BrTIFY3

  1. Molecular evolution of the polyamine oxidase gene family in Metazoa

    Directory of Open Access Journals (Sweden)

    Polticelli Fabio

    2012-06-01

    Full Text Available Abstract Background Polyamine oxidase enzymes catalyze the oxidation of polyamines and acetylpolyamines. Since polyamines are basic regulators of cell growth and proliferation, their homeostasis is crucial for cell life. Members of the polyamine oxidase gene family have been identified in a wide variety of animals, including vertebrates, arthropods, nematodes, placozoa, as well as in plants and fungi. Polyamine oxidases (PAOs) from yeast can oxidize spermine, N1-acetylspermine, and N1-acetylspermidine; in vertebrates, however, two different enzymes, namely spermine oxidase (SMO) and acetylpolyamine oxidase (APAO), specifically catalyze the oxidation of spermine and of N1-acetylspermine/N1-acetylspermidine, respectively. Little is known about the molecular evolutionary history of these enzymes. However, since the yeast PAO is able to catalyze the oxidation of both acetylated and non-acetylated polyamines, and in vertebrates these functions are addressed by two specialized polyamine oxidase subfamilies (APAO and SMO), it can be hypothesized that the former enzyme represents an ancestral reference from which the latter would have been derived. Results We analysed 36 SMO, 26 APAO, and 14 PAO homologue protein sequences from 54 taxa including various vertebrates and invertebrates. The analysis of the full-length sequences and the principal domains of vertebrate and invertebrate PAOs yielded consensus primary protein sequences for vertebrate SMOs and APAOs, and invertebrate PAOs. This analysis, coupled to molecular modeling techniques, also unveiled sequence regions that confer specific structural and functional properties, including substrate specificity, on the different PAO subfamilies. Molecular phylogenetic trees revealed a basal position of all the invertebrate PAO enzymes relative to vertebrate SMOs and APAOs. PAOs from insects constitute a monophyletic clade. Two PAO variants sampled in the amphioxus are basal to the dichotomy between two well supported

  2. Gene recruitment--a common mechanism in the evolution of transfer RNA gene families.

    Science.gov (United States)

    Wang, Xiujuan; Lavrov, Dennis V

    2011-04-01

    The evolution of alloacceptor transfer RNAs (tRNAs) has been traditionally thought to occur vertically and reflect the evolution of the genetic code. Yet there have been several indications that a tRNA gene could evolve horizontally, from a copy of an alloacceptor tRNA gene in the same genome. Earlier, we provided the first unambiguous evidence for the occurrence of such "tRNA gene recruitment" in nature--in the mitochondrial (mt) genome of the demosponge Axinella corrugata. Yet the extent and the pattern of this process in the evolution of tRNA gene families remained unclear. Here we analyzed tRNA genes from 21 mt genomes of demosponges as well as nuclear genomes of rhesus macaque, chimpanzee and human. We found four new cases of alloacceptor tRNA gene recruitment in mt genomes and eleven cases in the nuclear genomes. In most of these cases we observed a single nucleotide substitution at the middle position of the anticodon, which resulted in the change of not only the tRNA's amino-acid identity but also the class of the amino-acyl tRNA synthetases (aaRSs) involved in amino-acylation. We hypothesize that the switch to a different class of aaRSs may have prevented the conflict between anticodon and amino-acid identities of recruited tRNAs. Overall our results suggest that gene recruitment is a common phenomenon in tRNA multigene family evolution and should be taken into consideration when tRNA evolutionary history is reconstructed.
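    The effect described above, where one substitution at the middle anticodon position changes both the amino-acid identity and the synthetase class, can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes the standard genetic code (not a mitochondrial variant), ignores wobble pairing, and uses the textbook class I/class II assignment of aminoacyl-tRNA synthetases. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (assumption: NCBI table 1, bases ordered U, C, A, G).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Textbook synthetase classes (assumption: lysine counted as class II).
CLASS_I = set("RCQEILMWYV")
CLASS_II = set("ANDGHKFPST")

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def decoded_amino_acid(anticodon):
    """Amino acid read by an anticodon (5'->3'), using strict Watson-Crick pairing."""
    codon = anticodon.translate(COMPLEMENT)[::-1]  # reverse complement
    return CODON_TABLE[codon]


def synthetase_class(amino_acid):
    """Return 'I' or 'II', or None for a stop codon."""
    if amino_acid in CLASS_I:
        return "I"
    if amino_acid in CLASS_II:
        return "II"
    return None


def recruit(anticodon, new_middle):
    """Substitute the middle anticodon base; report old/new identity and class switch."""
    mutated = anticodon[0] + new_middle + anticodon[2]
    old_aa = decoded_amino_acid(anticodon)
    new_aa = decoded_amino_acid(mutated)
    return old_aa, new_aa, synthetase_class(old_aa) != synthetase_class(new_aa)


# Anticodon UGU reads ACA (Thr, class II); UCU reads AGA (Arg, class I).
print(recruit("UGU", "C"))
```

    In this illustrative case the substitution switches the tRNA from threonine to arginine and from a class II to a class I synthetase; other substitutions, such as UAG to UCG (Leu to Arg), stay within one class.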

  3. Concept annotation in the CRAFT corpus

    Directory of Open Access Journals (Sweden)

    Bada Michael

    2012-07-01

    Full Text Available Abstract Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are

  4. Multiple lineage specific expansions within the guanylyl cyclase gene family

    Directory of Open Access Journals (Sweden)

    O'Halloran Damien M

    2006-03-01

    , which have occurred within the GC gene family during metazoan evolution. Our phylogenetic analyses reveal that the rGC and sGC multi-domain proteins evolved early in eumetazoan evolution. Subsequent gene duplications, tissue specific expression patterns and lineage specific expansions resulted in the evolution of new networks of interaction and new biological functions associated with the maintenance of organismal complexity and homeostasis.

  5. Evolution of the multifaceted eukaryotic akirin gene family

    Directory of Open Access Journals (Sweden)

    Johnston Ian A

    2009-02-01

    Full Text Available Abstract Background Akirins are nuclear proteins that form part of an innate immune response pathway conserved in Drosophila and mice. This study's aim was to characterise the evolution of akirin gene structure and protein function in the eukaryotes. Results akirin genes are present throughout the metazoa and arose before the separation of animal, plant and fungi lineages. Using comprehensive phylogenetic analysis, coupled with comparisons of conserved synteny and genomic organisation, we show that the intron-exon structure of metazoan akirin genes was established prior to the bilateria and that a single proto-orthologue duplicated in the vertebrates, before the gnathostome-agnathan separation, producing akirin1 and akirin2. Phylogenetic analyses of seven vertebrate gene families with members in chromosomal proximity to both akirin1 and akirin2 were compatible with a common duplication event affecting the genomic neighbourhood of the akirin proto-orthologue. A further duplication of akirins occurred in the teleost lineage and was followed by lineage-specific patterns of paralogue loss. Remarkably, akirins have been independently characterised by five research groups under different aliases and a comparison of the available literature revealed diverse functions, generally in regulating gene expression. For example, akirin was characterised in arthropods as subolesin, an important growth factor and in Drosophila as bhringi, which has an essential myogenic role. In vertebrates, akirin1 was named mighty in mice and was shown to regulate myogenesis, whereas akirin2 was characterised as FBI1 in rats and promoted carcinogenesis, acting as a transcriptional repressor when bound to a 14-3-3 protein. Both vertebrate Akirins have evolved under comparably strict constraints of purifying selection, although a likelihood ratio test predicted that functional divergence has occurred between paralogues. Bayesian and maximum likelihood tests identified amino

  6. A Comprehensive Family-Based Replication Study of Schizophrenia Genes

    Science.gov (United States)

    Aberg, Karolina A.; Liu, Youfang; Bukszár, Jozsef; McClay, Joseph L.; Khachane, Amit N.; Andreassen, Ole A.; Blackwood, Douglas; Corvin, Aiden; Djurovic, Srdjan; Gurling, Hugh; Ophoff, Roel; Pato, Carlos N.; Pato, Michele T.; Riley, Brien; Webb, Todd; Kendler, Kenneth; O’Donovan, Mick; Craddock, Nick; Kirov, George; Owen, Mike; Rujescu, Dan; St Clair, David; Werge, Thomas; Hultman, Christina M.; Delisi, Lynn E.; Sullivan, Patrick; van den Oord, Edwin J.

    2017-01-01

    Importance Schizophrenia (SCZ) is a devastating psychiatric condition. Identifying the specific genetic variants and pathways that increase susceptibility to SCZ is critical to improve disease understanding and address the urgent need for new drug targets. Objective To identify SCZ susceptibility genes. Design We integrated results from a meta-analysis of 18 genome-wide association studies (GWAS) involving 1 085 772 single-nucleotide polymorphisms (SNPs) and 6 databases that showed significant informativeness for SCZ. The 9380 most promising SNPs were then specifically genotyped in an independent family-based replication study that, after quality control, consisted of 8107 SNPs. Setting Linkage meta-analysis, brain transcriptome meta-analysis, candidate gene database, OMIM, relevant mouse studies, and expression quantitative trait locus databases. Patients We included 11 185 cases and 10 768 control subjects from 6 databases and, after quality control, 6298 individuals (including 3286 cases) from 1811 nuclear families. Main Outcomes and Measures Case-control status for SCZ. Results Replication results showed a highly significant enrichment of SNPs with small P values. Of the SNPs with replication values of P<.01, the proportion of SNPs that had the same direction of effects as in the GWAS meta-analysis was 89% in the combined ancestry group (sign test, P<2.20×10⁻¹⁶) and 93% in subjects of European ancestry only (P<2.20×10⁻¹⁶). Our results supported the major histocompatibility complex region showing a 3.7-fold overall enrichment of replication values of P<.01 in subjects from European ancestry. We replicated SNPs in TCF4 (P=2.53×10⁻¹⁰) and NOTCH4 (P=3.16×10⁻⁷) that are among the most robust SCZ findings. More novel findings included POM121L2 (P=3.51×10⁻⁷), AS3MT (P=9.01×10⁻⁷), CNNM2 (P=6.07×10⁻⁷), and NT5C2 (P=4.09×10⁻⁷). To explore the many small effects, we performed pathway analyses. The most significant pathways involved neuronal function
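    The sign test mentioned in this abstract asks whether replicated SNPs share the GWAS direction of effect more often than the 50% expected by chance. The sketch below is a generic one-sided exact binomial sign test, not the authors' code. The abstract does not report the number of SNPs with P<.01, so the counts used here (89 of 100) are purely hypothetical and chosen only to mirror the 89% figure.

```python
from math import comb


def sign_test_p(same_direction, total):
    """One-sided exact sign test: P(X >= same_direction) for X ~ Binomial(total, 0.5)."""
    tail = sum(comb(total, k) for k in range(same_direction, total + 1))
    return tail / 2 ** total


# Hypothetical counts: 89 of 100 SNPs agree in direction with the meta-analysis.
p_value = sign_test_p(89, 100)
print(f"one-sided sign test P = {p_value:.3g}")
```

    With real data, the same function would be called with the observed number of concordant SNPs and the total number of SNPs tested.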

  7. Angiotensin converting enzyme gene polymorphism in familial hypertrophic cardiomyopathy patients

    Energy Technology Data Exchange (ETDEWEB)

    Yu, B; Peric, S.; Ross, D. [Royal Prince Alfred Hospital, Camperdown (Australia)] [and others]

    1994-09-01

    An insertion/deletion (I/D) polymorphism of the angiotensin I converting enzyme (ACE) gene is a useful predictor of human plasma ACE levels. ACE levels tend to be highest in subjects with ACE genotype DD and intermediate in subjects with ACE genotype ID. Angiotensin II (Ang II) as a product of ACE is a cardiac growth factor and produces a marked hypertrophy of the chick myocyte in cell culture. Rat experiments also suggest that a small dose of ACE inhibitor that does not affect the afterload results in prevention or regression of cardiac hypertrophy. In order to study the relationship of ACE and the severity of hypertrophy, the ACE genotype has been determined in 28 patients with a clinical diagnosis of familial hypertrophic cardiomyopathy (FHC) and 51 normal subjects. The respective frequencies of I and D alleles were: 0.52 and 0.48 (in FHC patients) and 0.44 and 0.56 (in the normal controls). There was no significant difference in the allele frequencies between FHC and normal subjects (χ² = 0.023, p > 0.05). The II, ID, and DD genotypes were present in 7, 15, and 6 FHC patients, respectively. The averages of maximal thickness of the interventricular septum measured by echocardiography or at autopsy were 18 ± 3, 19 ± 4, and 19 ± 3 mm in II, ID and DD genotypes, respectively. The ACE gene polymorphism did not correlate with the severity of left ventricular hypertrophy in FHC patients (r_s = 0.231, p > 0.05). These results do not necessarily exclude the possible effect of Ang II on the hypertrophy since the latter may be produced through the action of chymase in the human ventricles. However, ACE gene polymorphism is not a useful predictor of the severity of myocardial hypertrophy in FHC patients.
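    The allele frequencies reported for the FHC patients follow from the genotype counts given in the abstract (7 II, 15 ID, 6 DD) by simple allele counting. The short sketch below reproduces that arithmetic; the function name is hypothetical, and control-group genotype counts are not given in the abstract, so only the patient group is computed.

```python
def allele_frequencies(n_ii, n_id, n_dd):
    """Return (freq_I, freq_D) from genotype counts by allele counting."""
    total_alleles = 2 * (n_ii + n_id + n_dd)   # each subject carries two alleles
    freq_i = (2 * n_ii + n_id) / total_alleles  # II contributes two I, ID one
    return freq_i, 1 - freq_i


# Genotype counts for the 28 FHC patients, as stated in the abstract.
freq_i, freq_d = allele_frequencies(7, 15, 6)
print(f"I = {freq_i:.2f}, D = {freq_d:.2f}")  # I = 0.52, D = 0.48
```

    The result, 29/56 for I and 27/56 for D, rounds to the 0.52 and 0.48 quoted for the patient group.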

  8. MicroSyn: A user friendly tool for detection of microsynteny in a gene family

    Directory of Open Access Journals (Sweden)

    Yang Xiaohan

    2011-03-01

    Full Text Available Abstract Background The traditional phylogeny analysis within a gene family is mainly based on DNA or amino acid sequence homologies. However, these phylogenetic tree analyses are not suitable for those "non-traditional" gene families like microRNA with very short sequences. For the normal protein-coding gene families, low bootstrap values are frequently encountered in some nodes, suggesting low confidence or likely inappropriateness of placement of those members in those nodes. Results We introduce MicroSyn software as a means of detecting microsynteny in adjacent genomic regions surrounding genes in gene families. MicroSyn searches for conserved, flanking colinear homologous gene pairs between two genomic fragments to determine the relationship between two members in a gene family. The colinearity of homologous pairs is controlled by a statistical distance function. As a result, gene duplication history can be inferred from the output independent of gene sequences. MicroSyn was designed for both experienced and non-expert users with a user-friendly graphical user interface. MicroSyn is available from: http://fcsb.njau.edu.cn/microsyn/. Conclusions Case studies of the microRNA167 genes in plants and the xyloglucan endotransglycosylase/hydrolase family in Populus trichocarpa were presented to show the utility of the software. The ease of use of MicroSyn in these examples suggests that the software is an additional valuable means to address the problems intrinsic to the computational methods and sequence qualities themselves in gene family analysis.
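    The idea of searching two genomic fragments for colinear homologous gene pairs can be sketched as follows. This is not MicroSyn's algorithm: the abstract does not describe its statistical distance function, so a plain longest-common-subsequence chain over homology-group labels is swapped in here. The fragment contents and the function name are hypothetical.

```python
def collinear_pairs(frag_a, frag_b):
    """Longest chain of same-order homologous gene pairs between two fragments.

    Each fragment is a list of homology-group labels in chromosomal order.
    Returns index pairs (i, j) with frag_a[i] == frag_b[j], increasing in both.
    """
    n, m = len(frag_a), len(frag_b)
    # dp[i][j] = length of the longest chain using frag_a[i:] and frag_b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if frag_a[i] == frag_b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back one optimal chain.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if frag_a[i] == frag_b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Hypothetical flanking genes around two family members, labelled by homology group.
fragment_1 = ["A", "B", "X", "C", "D"]
fragment_2 = ["A", "Y", "B", "C", "Z", "D"]
print(collinear_pairs(fragment_1, fragment_2))
```

    A longer chain of flanking pairs suggests that the two family members lie in regions descended from one duplicated segment. A fuller treatment would also scan the reversed fragment and limit the allowed gap between consecutive pairs.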

  9. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  10. Genome-Wide Characterization and Expression Profiles of the Superoxide Dismutase Gene Family in Gossypium

    Directory of Open Access Journals (Sweden)

    Jingbo Zhang

    2016-01-01

    Full Text Available Superoxide dismutase (SOD), a group of significant and ubiquitous enzymes, plays a critical role in plant growth and development. This gene family has previously been investigated in Arabidopsis and rice, but it has not yet been characterized in cotton. In this study, we performed, for the first time, a genome-wide analysis of the SOD gene family in cotton. Ten genes of the SOD gene family were identified in Gossypium arboreum and Gossypium raimondii, including 6 Cu-Zn-SODs, 2 Fe-SODs, and 2 Mn-SODs. The chromosomal distribution analysis revealed that SOD genes are distributed across 7 chromosomes in Gossypium arboreum and 8 chromosomes in Gossypium raimondii. Segmental duplication is the predominant duplication event and a major contributor to the expansion of the SOD gene family. Gene structure and protein structure analysis showed that SOD genes have conserved exon/intron arrangement and motif composition. Microarray-based expression analysis revealed that SOD genes have important functions in abiotic stress. Moreover, the tissue-specific expression profile reveals the functional divergence of SOD genes in the development of different cotton organs. Taken together, this study has imparted new insights into the putative functions of the SOD gene family in cotton. Findings of the present investigation could help in understanding the role of the SOD gene family in various aspects of the life cycle of cotton.

  11. Dynamic multimedia annotation tool

    Science.gov (United States)

    Pfund, Thomas; Marchand-Maillet, Stephane

    2001-12-01

    Annotating image collections is crucial for different multimedia applications. Not only does this provide an alternative access to visual information, but it is also a critical step in evaluating content-based image retrieval systems. Annotation is a tedious task, so there is a real need for tools that lighten the work of annotators. The tool should be flexible and offer customization so as to make the annotator as comfortable as possible. It should also automate as many tasks as possible. In this paper, we present a still image annotation tool that has been developed with the aim of being flexible and adaptive. The principle is to create a set of dynamic web pages that are an interface to a SQL database. The keyword set is fixed and every image receives from concurrent annotators a set of keywords along with time stamps and annotator IDs. Each annotator can go back and forth within the collection and his or her previous annotations, helped by a number of search services and customization options. An administrative section allows the supervisor to control the parameters of the annotation, including the keyword set, given via an XML structure. The architecture of the tool is made flexible so as to accommodate further options through its development.

  12. A Comprehensive Catalog of Human KRAB-associated Zinc Finger Genes: Insights into the Evolutionary History of a Large Family of Transcriptional Repressors

    Energy Technology Data Exchange (ETDEWEB)

    Huntley, S; Baggott, D M; Hamilton, A T; Tran-Gyamfi, M; Yang, S; Kim, J; Gordon, L; Branscomb, E; Stubbs, L

    2005-09-30

    Krueppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotic species. In mammals, most ZNF proteins comprise a single class of transcriptional repressors in which a chromatin interaction domain, called the Krueppel-associated box (KRAB) is attached to a tandem array of DNA-binding zinc-finger motifs. KRAB-ZNF loci are specific to tetrapod vertebrates, but have expanded dramatically in numbers through repeated rounds of segmental duplication to create a gene family with hundreds of members in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the human genome for key motifs and used them to construct and manually curate gene models. The resulting KRAB-ZNF gene catalog includes 326 known genes, 243 of which were structurally corrected by manual annotation, and 97 novel KRAB-ZNF genes; this single family therefore comprises 20% of all predicted human transcription factor genes. Many of the genes are alternatively spliced, yielding a total of 743 distinct predicted proteins. Although many human KRAB-ZNF genes are conserved in mammals, at least 136 and potentially more than 200 genes of this type are primate-specific including many recent segmental duplicates. KRAB-ZNF genes are active in a wide variety of human tissues suggesting roles in many key biological processes, but most member genes remain completely uncharacterized. Because of their sheer numbers, wide-ranging tissue-specific expression patterns, and remarkable evolutionary divergence we predict that KRAB-ZNF transcription factors have played critical roles in crafting many aspects of human biology, including both deeply conserved and primate-specific traits.

  13. Annotation and retrieval in protein interaction databases

    Science.gov (United States)

    Cannataro, Mario; Hiram Guzzi, Pietro; Veltri, Pierangelo

    2014-06-01

    Biological databases have been developed with a special focus on the efficient retrieval of single records or the efficient computation of specialized bioinformatics algorithms against the overall database, such as in sequence alignment. The continuous production of biological knowledge spread over several biological databases and ontologies, such as Gene Ontology, and the availability of efficient techniques to handle such knowledge, such as annotation and semantic similarity measures, enable the development of novel bioinformatics applications that explicitly use and integrate such knowledge. After introducing the annotation process and the main semantic similarity measures, this paper shows how annotations and semantic similarity can be exploited to improve the extraction and analysis of biologically relevant data from protein interaction databases. As case studies, the paper presents two novel software tools, OntoPIN and CytoSeVis, both based on the use of Gene Ontology annotations, for the advanced querying of protein interaction databases and for the enhanced visualization of protein interaction networks.

  14. Molecular characterization of edestin gene family in Cannabis sativa L.

    Science.gov (United States)

    Docimo, Teresa; Caruso, Immacolata; Ponzoni, Elena; Mattana, Monica; Galasso, Incoronata

    2014-11-01

    Globulins are the predominant class of seed storage proteins in a wide variety of plants. In many plant species globulins are present in several isoforms encoded by gene families. The major seed storage protein of Cannabis sativa L. is the globulin edestin, widely known for its nutritional potential. In this work, we report the isolation of seven cDNAs encoding edestin from the C. sativa variety Carmagnola. Southern blot hybridization is in agreement with the number of identified edestin genes. All seven sequences showed the characteristic globulin features, but they proved to be divergent members/forms of two edestin types. According to their sequence similarity, four forms named CsEde1A, CsEde1B, CsEde1C, CsEde1D have been assigned to the edestin type 1 and the three forms CsEde2A, CsEde2B, CsEde2C to the edestin type 2. Analysis of the coding sequences revealed a high percentage of similarity (98-99%) among the different forms belonging to the same type, which decreased significantly to approximately 64% between the forms belonging to different types. Quantitative RT-PCR analysis revealed that both edestin types are expressed in developing hemp seeds and the amount of CsEde1 was 4.44 ± 0.10 higher than CsEde2. Both edestin types exhibited a high percentage of arginine (11-12%), but CsEde2 was particularly rich in methionine residues (2.36%) compared with CsEde1 (0.82%). The amino acid composition determined in CsEde1 and CsEde2 types suggests that these seed proteins can be used to improve the nutritional quality of plant food-stuffs.

  15. Ubiquitous Annotation Systems

    DEFF Research Database (Denmark)

    Hansen, Frank Allan

    2006-01-01

    Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Especially in the field of location based information systems much work has been done to implement adaptive and context-aware systems, but few efforts have focused on the general...... requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web based annotation systems, and mobile and augmented reality systems to illustrate different approaches to four central challenges ubiquitous annotation...... systems have to deal with: anchoring, structuring, presentation, and authoring. Through a number of examples each challenge is discussed and HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...

  16. Genome-wide analysis reveals diverged patterns of codon bias, gene expression, and rates of sequence evolution in Picea gene families.

    Science.gov (United States)

    De La Torre, Amanda R; Lin, Yao-Cheng; Van de Peer, Yves; Ingvarsson, Pär K

    2015-03-05

    The recent sequencing of several gymnosperm genomes has greatly facilitated studying the evolution of their genes and gene families. In this study, we examine the evidence for expression-mediated selection in the first two fully sequenced representatives of the gymnosperm plant clade (Picea abies and Picea glauca). We use genome-wide estimates of gene expression (>50,000 expressed genes) to study the relationship between gene expression, codon bias, rates of sequence divergence, protein length, and gene duplication. We found that gene expression is correlated with rates of sequence divergence and codon bias, suggesting that natural selection is acting on Picea protein-coding genes for translational efficiency. Gene expression, rates of sequence divergence, and codon bias are correlated with the size of gene families, with large multicopy gene families having, on average, a lower expression level and breadth, lower codon bias, and higher rates of sequence divergence than single-copy gene families. Tissue-specific patterns of gene expression were more common in large gene families with large gene expression divergence than in single-copy families. Recent family expansions combined with large gene expression variation in paralogs and increased rates of sequence evolution suggest that some Picea gene families are rapidly evolving to cope with biotic and abiotic stress. Our study highlights the importance of gene expression and natural selection in shaping the evolution of protein-coding genes in Picea species, and sets the ground for further studies investigating the evolution of individual gene families in gymnosperms.

  17. Mutation screening of mismatch repair gene Mlh3 in familial esophageal cancer

    Institute of Scientific and Technical Information of China (English)

    Hong-Xu Liu; Yu Li; Xue-Dong Jiang; Hong-Nian Yin; Lin Zhang; Yu Wang; Jun Yang

    2006-01-01

    AIM: To shed light on the possible role of mismatch repair gene Mlh3 in familial esophageal cancer (FEC). METHODS: A total of 66 members from 10 families suggestive of a genetic predisposition to hereditary esophageal cancer were screened for germline mutations in Mlh3 with denaturing high performance liquid chromatography (DHPLC), a newly developed method of comparative sequencing based on heteroduplex detection. For all samples exhibiting abnormal DHPLC profiles, sequence changes were evaluated by cycle sequencing. For any mutation in family members, we conducted a segregation study to compare its prevalence in sporadic esophageal cancer patients and normal controls. RESULTS: Exons of Mlh3 in all samples were successfully examined. Overall, 4 missense mutations and 3 polymorphisms were identified in 4 families. Mlh3 missense mutations in families 9 and 10 might be pathogenic, but had a reduced penetrance. In families 1 and 7, there was insufficient evidence to support a monogenic explanation of the esophageal cancers. The mutations were found in 33% of high-risk families and 50% of low-risk families. CONCLUSION: Mlh3 is a high risk gene with a reduced penetrance in some families. However, it acts as a low risk gene for esophageal cancer in most families. Mutations of Mlh3 may work together with other genes in an accumulated manner and result in an increased risk of esophageal tumor. DHPLC is a robust and sensitive technique for screening gene mutations.

  18. Automated pipeline for atlas-based annotation of gene expression patterns: application to postnatal day 7 mouse brain

    Energy Technology Data Exchange (ETDEWEB)

    Carson, James P.; Ju, Tao; Bello, Musodiq; Thaller, Christina; Warren, Joe; Kakadiaris, Ioannis; Chiu, Wah; Eichele, Gregor

    2010-02-01

    As bio-medical images and volumes are being collected at an increasing speed, there is a growing demand for efficient means to organize spatial information for comparative analysis. In many scenarios, such as determining gene expression patterns by in situ hybridization, the images are collected from multiple subjects over a common anatomical region, such as the brain. A fundamental challenge in comparing spatial data from different images is how to account for the shape variations among subjects, which makes direct image-to-image comparison meaningless. In this paper, we describe subdivision meshes as a geometric means to efficiently organize 2D images and 3D volumes collected from different subjects for comparison. The key advantages of a subdivision mesh for this purpose are its light-weight geometric structure and its explicit modeling of anatomical boundaries, which enable efficient and accurate registration. The multi-resolution structure of a subdivision mesh also allows development of fast comparison algorithms among registered images and volumes.

  20. FGF: a web tool for Fishing Gene Family in a whole genome database

    DEFF Research Database (Denmark)

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong

    2007-01-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program...... to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF...... is freely available on a web server at http://fgf.genomics.org.cn/...

  2. Prosecutor : parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    NARCIS (Netherlands)

    Blom, E.J.; Breitling, R.; Hofstede, K.J.; Roerdink, J.B.T.M.; van Hijum, S.A F T; Kuipers, O.P.

    2008-01-01

    Background: Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar

  3. Evolutionary diversification of the vertebrate transferrin multi-gene family.

    Science.gov (United States)

    Hughes, Austin L; Friedman, Robert

    2014-11-01

    In a phylogenetic analysis of vertebrate transferrins (TFs), six major clades (subfamilies) were identified: (a) S, the mammalian serotransferrins; (b) ICA, the mammalian inhibitor of carbonic anhydrase (ICA) homologs; (c) L, the mammalian lactoferrins; (d) O, the ovotransferrins of birds and reptiles; (e) M, the melanotransferrins of bony fishes, amphibians, reptiles, birds, and mammals; and (f) M-like, a newly identified TF subfamily found in bony fishes, amphibians, reptiles, and birds. A phylogenetic tree based on the joint alignment of N-lobes and C-lobes supported the hypothesis that three separate events of internal duplication occurred in vertebrate TFs: (a) in the common ancestor of the M subfamily, (b) in the common ancestor of the M-like subfamily, and (c) in the common ancestor of other vertebrate TFs. The S, ICA, and L subfamilies were found only in placental mammals, and the phylogenetic analysis supported the hypothesis that these three subfamilies arose by gene duplication after the divergence of placental mammals from marsupials. The M-like subfamily was unusual in several respects, including the presence of a uniquely high proportion of clade-specific conserved residues, including distinctive but conserved residues in the sites homologous to those functioning in carbonate binding of human serotransferrin. The M-like family also showed an unusually high proportion of cationic residues in the positively charged region corresponding to human lactoferrampin, suggesting a distinctive role of this region in the M-like subfamily, perhaps in antimicrobial defense.

  4. Redox Homeostasis via Gene Families of Ascorbate-Glutathione Pathway

    Directory of Open Access Journals (Sweden)

    Prachi ePandey

    2015-03-01

    Full Text Available The imposition of environmental stresses on plants disturbs their metabolism, thereby negatively affecting their growth and development and reducing productivity. One of the manifestations of abiotic and biotic stress conditions is the enhanced production of reactive oxygen species (ROS), which can be hazardous to cells. Therefore, in order to protect themselves against toxic ROS, plant cells employ the antioxidant defense system. The ascorbate-glutathione pathway (Halliwell-Asada cycle) is an indispensable component of the ROS homeostasis mechanism of plants. This pathway entails the antioxidant metabolites ascorbate, glutathione and NADPH, along with the enzymes linking them. The ascorbate-glutathione pathway is functional in different subcellular compartments and all the enzymes of this pathway exist as multiple isoforms. The expression of different isoforms of the enzymes of the ascorbate-glutathione pathway is developmentally as well as spatially regulated. Moreover, various abiotic and biotic stress conditions modulate the expression of the enzyme isoforms differently. It is the intricate regulation of expression of different isoforms of the ascorbate-glutathione pathway enzymes that helps in the maintenance of redox balance in plants under various abiotic and biotic stress conditions. The present review provides an insight into the gene families of the ascorbate-glutathione pathway, shedding light on their role in different abiotic and biotic stress conditions as well as in the growth and development of plants.

  5. The IQD gene family in soybean: structure, phylogeny, evolution and expression.

    Directory of Open Access Journals (Sweden)

    Lin Feng

    Full Text Available Members of the plant-specific IQ67-domain (IQD) protein family are involved in plant development and the basal defense response. Although systematic characterization of this family has been carried out in Arabidopsis, tomato (Solanum lycopersicum), Brachypodium distachyon and rice (Oryza sativa), systematic analysis and expression profiling of this gene family in soybean (Glycine max) have not previously been reported. In this study, we identified and structurally characterized IQD genes in the soybean genome. A complete set of 67 soybean IQD genes (GmIQD1-67) was identified using Blast search tools, and the genes were clustered into four subfamilies (IQD I-IV) based on phylogeny. These soybean IQD genes are distributed unevenly across all 20 chromosomes, with 30 segmental duplication events, suggesting that segmental duplication has played a major role in the expansion of the soybean IQD gene family. Analysis of the Ka/Ks ratios showed that the duplicated genes of the GmIQD family primarily underwent purifying selection. Microsynteny was detected in most pairs: genes in clade 1-3 might be present in genome regions that were inverted, expanded or contracted after the divergence; most gene pairs in clade 4 showed high conservation with little rearrangement among these gene-residing regions. Of the soybean IQD genes examined, six were most highly expressed in young leaves, six in flowers, one in roots and two in nodules. Our qRT-PCR analysis of 24 soybean IQD III genes confirmed that these genes are regulated by MeJA stress. Our findings present a comprehensive overview of the soybean IQD gene family and provide insights into the evolution of this family. In addition, this work lays a solid foundation for further experiments aimed at determining the biological functions of soybean IQD genes in growth and development.

  6. Investigation of genes encoding calcineurin B-like protein family in legumes and their expression analyses in chickpea (Cicer arietinum L.).

    Directory of Open Access Journals (Sweden)

    Mukesh Kumar Meena

    Full Text Available Calcium ion (Ca2+) is a ubiquitous second messenger that transmits various internal and external signals including stresses and, therefore, is important for plants' response process. Calcineurin B-like proteins (CBLs) are one of the plant calcium sensors, which sense and convey the changes in cytosolic Ca2+-concentration for response process. A search in four leguminous plant (soybean, Medicago truncatula, common bean and chickpea) genomes identified 9 to 15 genes in each species that encode CBL proteins. Sequence analyses of CBL peptides and coding sequences (CDS) suggested that there are nine original CBL genes in these legumes and some of them were multiplied during whole genome or local gene duplication. Coding sequences of chickpea CBL genes (CaCBL) were cloned from their cDNAs and sequenced, and their annotations in the genome assemblies were corrected accordingly. Analyses of protein sequences and gene structures of CBL family in plant kingdom indicated its diverse origin but showed a remarkable conservation in overall protein structure with appearance of complex gene structure in the course of evolution. Expression of CaCBL genes in different tissues and in response to different stress and hormone treatment were studied. Most of the CaCBL genes exhibited high expression in flowers. Expression profile of CaCBL genes in response to different abiotic stresses and hormones related to development and stresses (ABA, auxin, cytokinin, SA and JA) at different time intervals suggests their diverse roles in development and plant defence in addition to abiotic stress tolerance. These data not only contribute to a better understanding of the complex regulation of chickpea CBL gene family, but also provide valuable information for further research in chickpea functional genomics.

  7. Investigation of genes encoding calcineurin B-like protein family in legumes and their expression analyses in chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Meena, Mukesh Kumar; Ghawana, Sanjay; Sardar, Atish; Dwivedi, Vikas; Khandal, Hitaishi; Roy, Riti; Chattopadhyay, Debasis

    2015-01-01

    Calcium ion (Ca2+) is a ubiquitous second messenger that transmits various internal and external signals including stresses and, therefore, is important for plants' response process. Calcineurin B-like proteins (CBLs) are one of the plant calcium sensors, which sense and convey the changes in cytosolic Ca2+-concentration for response process. A search in four leguminous plant (soybean, Medicago truncatula, common bean and chickpea) genomes identified 9 to 15 genes in each species that encode CBL proteins. Sequence analyses of CBL peptides and coding sequences (CDS) suggested that there are nine original CBL genes in these legumes and some of them were multiplied during whole genome or local gene duplication. Coding sequences of chickpea CBL genes (CaCBL) were cloned from their cDNAs and sequenced, and their annotations in the genome assemblies were corrected accordingly. Analyses of protein sequences and gene structures of CBL family in plant kingdom indicated its diverse origin but showed a remarkable conservation in overall protein structure with appearance of complex gene structure in the course of evolution. Expression of CaCBL genes in different tissues and in response to different stress and hormone treatment were studied. Most of the CaCBL genes exhibited high expression in flowers. Expression profile of CaCBL genes in response to different abiotic stresses and hormones related to development and stresses (ABA, auxin, cytokinin, SA and JA) at different time intervals suggests their diverse roles in development and plant defence in addition to abiotic stress tolerance. These data not only contribute to a better understanding of the complex regulation of chickpea CBL gene family, but also provide valuable information for further research in chickpea functional genomics.

  8. Genetic Alterations in Familial Breast Cancer: Mapping and Cloning Genes Other Than BRCA1

    Science.gov (United States)

    1997-09-01

    predispose to breast cancer. These mutations are always in the context of Cowden’s Syndrome, and do not appear in families with breast cancer in the... AD AWARD NUMBER DAMD17-94-J-4307 TITLE: Genetic Alterations in Familial Breast Cancer: Mapping and Cloning Genes Other Than BRCA1 PRINCIPAL... Aug97-) Genetic Alterations in Familial Breast Cancer: Mapping and Cloning Genes Other than BRCA1 6. AUTHOR(S) Mary-Claire King, Ph.D. 7

  9. A novel frameshift mutation in the cylindromatosis (CYLD) gene in a Chinese family with multiple familial trichoepithelioma.

    Science.gov (United States)

    Wu, J W; Xiao, S X; Huo, J; An, J G; Ren, J W

    2014-11-01

    Multiple familial trichoepithelioma (MFT) (OMIM: 601606) is an autosomal dominantly inherited disorder characterized by numerous, skin-colored papules and nodules with pilar differentiation. Recently, several mutations in the cylindromatosis (CYLD) gene have been reported in MFT. In this study, a mutation analysis of the CYLD was conducted in a Chinese pedigree of typical MFT. Affected individuals were identified through probands from Shanxi Province, China. Lesional skin biopsy of the proband revealed the typical histopathological characteristics of trichoepithelioma. Individuals belonging to five consecutive generations were similarly affected, which indicated an autosomal dominant inheritance pattern. Genomic DNA was extracted from peripheral blood lymphocytes using standard phenol/chloroform extraction method. All the coding exons (4-20) and exon-intron boundaries of the CYLD gene were amplified by polymerase chain reaction (PCR). Direct sequencing of all PCR products amplified from the complete coding regions of the CYLD gene was performed to identify mutations. Sequencing of the CYLD gene was performed in a further 100 unrelated, unaffected control individuals to exclude the possibility of polymorphism. A novel heterozygous frameshift mutation c.1169_1170delCA (p.Thr390Argfs) was identified in exon 10 of the CYLD gene in the affected family members. This mutation was also detected in unaffected family members, but not in the unrelated, healthy individuals who were also analyzed. Our study expands the database on the CYLD gene mutations in MFT and should be useful in providing genetic counseling and prenatal diagnosis for families affected by MFT.

  10. Functional annotation of hierarchical modularity.

    Directory of Open Access Journals (Sweden)

    Kanchana Padmanabhan

    Full Text Available In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our

  11. Reconciling gene and genome duplication events: using multiple nuclear gene families to infer the phylogeny of the aquatic plant family Pontederiaceae.

    Science.gov (United States)

    Ness, Rob W; Graham, Sean W; Barrett, Spencer C H

    2011-11-01

    Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations place the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

  12. Functional annotation clustering of genes in mouse tongue myogenesis

    Institute of Scientific and Technical Information of China (English)

    丛蔚; 刘波; 蒋玉玲; 肖晶

    2015-01-01

    Objective: To gain insight into the molecular mechanisms that regulate mouse tongue myogenesis. Methods: Tongue tissue was collected from mouse embryos at embryonic day 13.25 (E13.25) and E15.5. Differentially expressed genes during embryonic tongue development were screened using the Affymetrix Mouse GeneChip, with a twofold difference taken as the significance threshold. Functional annotation and clustering analysis of the genes was performed with the DAVID web-based analysis tool. Results: Genes with higher expression at E13.25 were mainly related to the cell cycle (Exo1, Gsk3B, Kif20b, Skp2) and to cell adhesion (Neo1, lama1). Genes with higher expression at E15.5 were mainly related to the cytoskeleton, such as titin and Hspb7. Conclusions: Proliferation and determination of mouse tongue tissue are associated with cell cycle and cell adhesion gene clusters, whereas differentiation and maturation of the tongue are mainly associated with the cytoskeleton gene cluster. The results highlight potential cascades and important candidates for further investigation of the genetic mechanisms and clinical therapy of tongue-related diseases.

  13. Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case

    Directory of Open Access Journals (Sweden)

    Kuraku Shigehiro

    2011-06-01

    Full Text Available Background: In understanding the evolutionary process of vertebrates, cyclostomes (hagfishes and lampreys) occupy crucial positions. Resolving molecular phylogenetic relationships of cyclostome genes with gnathostome (jawed vertebrate) genes is indispensable in deciphering both the species tree and gene trees. However, molecular phylogenetic analyses, especially those including lamprey genes, have produced highly discordant results between gene families. To efficiently scrutinize this problem using partial genome assemblies of early vertebrates, we focused on the potassium voltage-gated channel, shaker-related (KCNA) family, whose members are mostly single-exon. Results: Seven sea lamprey KCNA genes as well as six elephant shark genes were identified, and their orthologies to bony vertebrate subgroups were assessed. In contrast to robustly supported orthology of the elephant shark genes to gnathostome subgroups, clear orthology of any sea lamprey gene could not be established. Notably, sea lamprey KCNA sequences displayed a unique codon usage pattern and amino acid composition, probably associated with exceptionally high GC-content in their coding regions. This lamprey-specific property of coding sequences was also observed generally for genes outside this gene family. Conclusions: Our results suggest that secondary modifications of sequence properties unique to the lamprey lineage may be one of the factors preventing robust orthology assessments of lamprey genes, which deserves further genome-wide validation. The lamprey lineage-specific alteration of protein-coding sequence properties needs to be taken into consideration in tackling the key questions about early vertebrate evolution.

  14. Evolutionary expansion of SPOP and associated TD/POZ gene family: impact of evolutionary route on gene expression pattern.

    Science.gov (United States)

    Choo, Kong-Bung; Chuang, Trees-Juen; Lin, Wan-Yi; Chang, Che-Ming; Tsai, Yao-Hui; Huang, Chiu-Jung

    2010-07-15

    Evolutionary expansion of a gene family may occur at both the DNA and RNA levels. The rat testis-specific Rtdpoz-T2 and -T1 (rT2 and rT1) retrogenes are members of the TD/POZ gene family which also includes the well-characterized SPOP gene. In this study, rT2/rT1 transcriptional activation in cancer cells is demonstrated; the cancer rT2/rT1 transcripts are structurally similar to the embryonic transcripts reported previously in frequent exonization of transposed elements. On database interrogation, we have identified an uncharacterized rT2/rT1-like SPOP paralog, designated as SPOP-like (SPOPL), in the human and rodent genomes. Ka/Ks analysis indicates that the SPOPL genes are under functional constraints implicating biological functions. Phylogenetic analyses further suggest that segmental duplication and retrotransposition events had occurred giving rise to new gene members or retrogenes in the human-rodent ancestors during the evolution of the TD/POZ gene family. Based on this and previous works, a model is proposed to map the routes of evolutionary expansion of the TD/POZ gene family. More importantly, different gene expression patterns of members of the family are depicted: intron-harboring members are ubiquitously expressed whereas retrogenes are expressed in tissue-specific and developmentally regulated manner, and are fortuitously re-activated in cancer cells involving exonization of transposed elements.

  15. Identification and localisation of the NB-LRR gene family within the potato genome

    Directory of Open Access Journals (Sweden)

    Jupe Florian

    2012-02-01

    Full Text Available Background: The potato genome sequence derived from the Solanum tuberosum Group Phureja clone DM1-3 516 R44 provides unparalleled insight into the genome composition and organisation of this important crop. A key class of genes that comprises the vast majority of plant resistance (R) genes contains a nucleotide-binding and leucine-rich repeat domain, and is collectively known as NB-LRRs. Results: As part of an effort to accelerate the process of functional R gene isolation, we performed an amino acid motif based search of the annotated potato genome and identified 438 NB-LRR type genes among the ~39,000 potato gene models. Of the predicted genes, 77 contain an N-terminal toll/interleukin 1 receptor (TIR)-like domain, and 107 of the remaining 361 non-TIR genes contain an N-terminal coiled-coil (CC) domain. Physical map positions were established for 370 predicted NB-LRR genes across all 12 potato chromosomes. The majority of NB-LRRs are physically organised within 63 identified clusters, of which 50 are homogeneous in that they contain NB-LRRs derived from a recent common ancestor. Conclusions: By establishing the phylogenetic and positional relationship of potato NB-LRRs, our analysis offers significant insight into the evolution of potato R genes. Furthermore, the data provide a blueprint for future efforts to identify and more rapidly clone functional NB-LRR genes from Solanum species.

  16. Multiple members of the plasminogen-apolipoprotein(a) gene family associated with thrombosis

    Energy Technology Data Exchange (ETDEWEB)

    Ichinose, Akitada (Univ. of Washington, Seattle (United States))

    1992-03-31

    Plasminogen and apolipoprotein(a) (apo(a)) are closely related plasma proteins that are associated with hereditary thrombophilia. Low plasminogen levels are found in some patients who developed venous thrombosis, while a population with high plasma concentrations of apo(a) has a higher incidence of arterial thrombosis. Two different genes coding for human apo(a) have been isolated and characterized in order to study and compare these genes with four other closely related genes in the plasminogen-apo(a) gene family. These include the gene coding for plasminogen, two unique plasminogen-related genes, and a gene coding for hepatocyte growth factor. Nucleotide sequence analysis revealed that the exons and exon boundaries of the genes for plasminogen and apo(a), and of the plasminogen-related genes, differ by only 1-5% in sequence. The types of exon/intron junctions and positions of introns in the molecules are also identical, suggesting that these genes have evolved from an ancestral plasminogen gene via duplication and exon shuffling. By utilizing these results, gene-specific probes have been designed for the analysis of each of the genes in this gene family. The plasminogen and two apo(a) genes were all localized to chromosome 6 by employing the gene-specific primers and genomic DNAs from human-hamster cell hybrids. These data also make it possible to characterize the apo(a) and plasminogen genes in individuals by in vitro amplification.

  17. Inferring hypotheses on functional relationships of genes: Analysis of the Arabidopsis thaliana subtilase gene family.

    Directory of Open Access Journals (Sweden)

    Carsten Rautengarten

    2005-09-01

    Full Text Available The gene family of subtilisin-like serine proteases (subtilases) in Arabidopsis thaliana comprises 56 members, divided into six distinct subfamilies. Whereas the members of five subfamilies are similar to pyrolysins, two genes share stronger similarity to animal kexins. Mutant screens confirmed 144 T-DNA insertion lines with knockouts for 55 out of the 56 subtilases. Apart from SDD1, none of the confirmed homozygous mutants revealed any obvious visible phenotypic alteration during growth under standard conditions. Apart from this specific case, forward genetics gave us no hints about the function of the individual 54 non-characterized subtilase genes. Therefore, the main objective of our work was to overcome the shortcomings of the forward genetic approach and to infer alternative experimental approaches by using an integrative bioinformatics and biological approach. Computational analyses based on transcriptional co-expression and co-response pattern revealed at least two expression networks, suggesting that functional redundancy may exist among subtilases with limited similarity. Furthermore, two hubs were identified, which may be involved in signalling or may represent higher-order regulatory factors involved in responses to environmental cues. A particular enrichment of co-regulated genes with metabolic functions was observed for four subtilases possibly representing late responsive elements of environmental stress. The kexin homologs show stronger associations with genes of transcriptional regulation context. Based on the analyses presented here and in accordance with previously characterized subtilases, we propose three main functions of subtilases: involvement in (i) control of development, (ii) protein turnover, and (iii) action as downstream components of signalling cascades. 
Supplemental material is available in the Plant Subtilase Database (PSDB) (http://csbdb.mpimp-golm.mpg.de/psdb.html), as well as from the CSB.DB (http://csbdb.mpimp-golm.mpg.de).
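
    The co-expression analysis summarised in this abstract can be illustrated with a minimal sketch. It assumes Pearson correlation over expression profiles and a fixed cut-off of 0.8; neither choice is stated in the record, and the gene names and values are hypothetical. Genes with the highest degree in the resulting network would be hub candidates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_degrees(profiles, threshold=0.8):
    """Count, for each gene, how many other genes correlate at or above threshold.

    profiles: dict mapping gene name -> list of expression values
    (assumed layout; one value per experiment, same order for every gene).
    """
    genes = sorted(profiles)
    degree = {gene: 0 for gene in genes}
    for i, first in enumerate(genes):
        for second in genes[i + 1:]:
            if pearson(profiles[first], profiles[second]) >= threshold:
                degree[first] += 1
                degree[second] += 1
    return degree


# Hypothetical profiles: A, B and C rise together, D falls.
example = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 2, 3, 5],
    "D": [4, 3, 2, 1],
}
degrees = coexpression_degrees(example)
```

    Only positive correlations count as edges here; a co-response analysis might also keep strongly negative ones.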

  18. Inferring Hypotheses on Functional Relationships of Genes: Analysis of the Arabidopsis thaliana Subtilase Gene Family.

    Directory of Open Access Journals (Sweden)

    2005-09-01

    Full Text Available The gene family of subtilisin-like serine proteases (subtilases) in Arabidopsis thaliana comprises 56 members, divided into six distinct subfamilies. Whereas the members of five subfamilies are similar to pyrolysins, two genes share stronger similarity to animal kexins. Mutant screens confirmed 144 T-DNA insertion lines with knockouts for 55 out of the 56 subtilases. Apart from SDD1, none of the confirmed homozygous mutants revealed any obvious visible phenotypic alteration during growth under standard conditions. Apart from this specific case, forward genetics gave us no hints about the function of the individual 54 non-characterized subtilase genes. Therefore, the main objective of our work was to overcome the shortcomings of the forward genetic approach and to infer alternative experimental approaches by using an integrative bioinformatics and biological approach. Computational analyses based on transcriptional co-expression and co-response pattern revealed at least two expression networks, suggesting that functional redundancy may exist among subtilases with limited similarity. Furthermore, two hubs were identified, which may be involved in signalling or may represent higher-order regulatory factors involved in responses to environmental cues. A particular enrichment of co-regulated genes with metabolic functions was observed for four subtilases possibly representing late responsive elements of environmental stress. The kexin homologs show stronger associations with genes of transcriptional regulation context. Based on the analyses presented here and in accordance with previously characterized subtilases, we propose three main functions of subtilases: involvement in (i) control of development, (ii) protein turnover, and (iii) action as downstream components of signalling cascades. 
Supplemental material is available in the Plant Subtilase Database (PSDB) (http://csbdb.mpimp-golm.mpg.de/psdb.html), as well as from the CSB.DB (http://csbdb.mpimp-golm.mpg.de).

  19. Cloning of the mouse BTG3 gene and definition of a new gene family (the BTG family) involved in the negative control of the cell cycle.

    Science.gov (United States)

    Guéhenneux, F; Duret, L; Callanan, M B; Bouhas, R; Hayette, S; Berthet, C; Samarut, C; Rimokh, R; Birot, A M; Wang, Q; Magaud, J P; Rouault, J P

    1997-03-01

    It is well known that loss of tumor suppressor genes, and more generally of antiproliferative genes, plays a key role in the development of most tumors. We report here the cloning of the mouse BTG3 gene and show that its human counterpart maps on chromosome 21. This evolutionarily conserved gene codes for a 30 kDa protein and is expressed in most adult murine and human tissues analyzed. However, we demonstrate that its expression is cell cycle dependent and peaks at the end of the G1 phase. This gene is homologous to the human BTG1, BTG2 and TOB genes, which were demonstrated to act as inhibitors of cell proliferation. Its description allowed us to better define this seven-gene family (the BTG gene family) at the structural level and to speculate about its physiological role in normal and tumoral cells. This family is mainly characterized by the presence of two conserved domains (BTG boxes A and B) of as yet undetermined function, which are separated by a non-conserved 20-25 amino acid sequence.

  20. Annotating Coloured Petri Nets

    DEFF Research Database (Denmark)

    Lindstrøm, Bo; Wells, Lisa Marie

    2002-01-01

    Coloured Petri nets (CP-nets) can be used for several fundamentally different purposes like functional analysis, performance analysis, and visualisation. To be able to use the corresponding tool extensions and libraries it is sometimes necessary to include extra auxiliary information in the CP-ne... a method which makes it possible to associate auxiliary information, called annotations, with tokens without modifying the colour sets of the CP-net. Annotations are pieces of information that are not essential for determining the behaviour of the system being modelled, but are rather added to support... a certain use of the CP-net. We define the semantics of annotations by describing a translation from a CP-net and the corresponding annotation layers to another CP-net where the annotations are an integrated part of the CP-net.

  1. Functionally recurrent rearrangements of the MAST kinase and Notch gene families in breast cancer.

    Science.gov (United States)

    Robinson, Dan R; Kalyana-Sundaram, Shanker; Wu, Yi-Mi; Shankar, Sunita; Cao, Xuhong; Ateeq, Bushra; Asangani, Irfan A; Iyer, Matthew; Maher, Christopher A; Grasso, Catherine S; Lonigro, Robert J; Quist, Michael; Siddiqui, Javed; Mehra, Rohit; Jing, Xiaojun; Giordano, Thomas J; Sabel, Michael S; Kleer, Celina G; Palanisamy, Nallasivam; Natrajan, Rachael; Lambros, Maryou B; Reis-Filho, Jorge S; Kumar-Sinha, Chandan; Chinnaiyan, Arul M

    2011-11-20

    Breast cancer is a heterogeneous disease that has a wide range of molecular aberrations and clinical outcomes. Here we used paired-end transcriptome sequencing to explore the landscape of gene fusions in a panel of breast cancer cell lines and tissues. We observed that individual breast cancers have a variety of expressed gene fusions. We identified two classes of recurrent gene rearrangements involving genes encoding microtubule-associated serine-threonine kinase (MAST) and members of the Notch family. Both MAST and Notch-family gene fusions have substantial phenotypic effects in breast epithelial cells. Breast cancer cell lines harboring Notch gene rearrangements are uniquely sensitive to inhibition of Notch signaling, and overexpression of MAST1 or MAST2 gene fusions has a proliferative effect both in vitro and in vivo. These findings show that recurrent gene rearrangements have key roles in subsets of carcinomas and suggest that transcriptome sequencing could identify individuals with rare, targetable gene fusions.

  2. Identification and distribution of the NBS-LRR gene family in the cassava genome

    Science.gov (United States)

    Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analyzing the genomic organization of resistance genes i...

  3. An update on the ABCC transporter family in plants: many genes, many proteins, but how many functions?

    Science.gov (United States)

    Wanke, D; Kolukisaoglu, H Uner

    2010-09-01

    The ABCC subfamily of the ATP binding cassette (ABC) transporters, which were formerly known as multidrug resistance-related proteins (MRPs), consists of closely related members found in all eukaryotic organisms. Although more than a decade of intensive research has elapsed since the first MRP protein was functionally characterised in Arabidopsis thaliana, knowledge of this particular transporter family is still limited in plants. Although ABCC proteins were originally defined as vacuolar pumps of glutathione-S (GS) conjugates, evidence, as well as speculation, on their endogenous functions inside the cell ranges from detoxification and heavy metal sequestration, to chlorophyll catabolite transport and ion channel regulation. The characterisation of knockout mutants in Arabidopsis has been pivotal for elucidation of different roles of ABCC transporters. However, a functional annotation for the majority of these transport proteins is still lacking, even in this model plant. On the one hand, this problem seems to be caused by functional redundancy between family members, which might lead to physiological complementation by a highly homologous gene in the mutant lines. On the other hand, there is growing evidence that the functional diversity of ABCC genes in Arabidopsis and other plants is far greater than previously assumed. For example, analysis of microarray expression data supports involvement of ABCC transporters in the response to biotic stress: particular changes in ABCC transcript levels are found, which are pathogen-specific and evoke distinct signalling cascades. Current knowledge about plant ABCC transporters indicates that novel and unexpected functions and substrates of these proteins are still waiting to be elucidated.

  4. Genomewide analysis of the lateral organ boundaries domain gene family in Vitis vinifera.

    Science.gov (United States)

    Cao, Hui; Liu, Cai-Yun; Liu, Chun-Xiang; Zhao, Yue-Ling; Xu, Rui-Rui

    2016-09-01

    In plants, the transcription factor families have been implicated in many important biological processes. These processes include morphogenesis, signal transduction and environmental stress responses. Proteins containing the lateral organ boundaries domain (LBD), which encodes a zinc finger-like domain, are only found in plants. This finding indicates that this unique gene family regulates only plant-specific biological processes. LBD genes play crucial roles in the growth and development of plants such as Arabidopsis, Oryza sativa, Zea mays, poplar, apple and tomato. However, relatively little is known about the LBD genes in grape (Vitis vinifera). In this study, we identified 40 LBD genes in the grape genome. A complete overview of the chromosomal locations, phylogenetic relationships, structures and expression profiles of this gene family during development in grape is presented here. Phylogenetic analysis showed that the LBD genes could be divided into classes I and II, together with LBDs from Arabidopsis. We mapped the 40 LBD genes on the grape chromosomes (chr1-chr19) and found that 37 of the predicted grape LBD genes were distributed in different densities across 12 chromosomes. Grape LBDs were found to share a similar intron/exon structure and gene length within the same class. The expression profiles of grape LBD genes at different developmental stages were analysed using microarray data. Results showed that 21 grape LBD genes may be involved in grape developmental processes, including preveraison, veraison and ripening. Finally, we analysed the expression patterns of six LBD genes through quantitative real-time polymerase chain reaction analysis. The six LBD genes showed differential expression patterns among the three representative grape tissues, and five of these genes were found to be involved in responses to mannitol, sodium chloride, heat stress and low temperature treatments. 
To our knowledge, this is the first study to analyse the LBD gene family in

  5. Genomewide analysis of the lateral organ boundaries domain gene family in Vitis vinifera

    Indian Academy of Sciences (India)

    HUI CAO; CAI-YUN LIU; CHUN-XIANG LIU; YUE-LING ZHAO; RUI-RUI XU

    2016-09-01

    In plants, the transcription factor families have been implicated in many important biological processes. These processes include morphogenesis, signal transduction and environmental stress responses. Proteins containing the lateral organ boundaries domain (LBD), which encodes a zinc finger-like domain, are only found in plants. This finding indicates that this unique gene family regulates only plant-specific biological processes. LBD genes play crucial roles in the growth and development of plants such as Arabidopsis, Oryza sativa, Zea mays, poplar, apple and tomato. However, relatively little is known about the LBD genes in grape (Vitis vinifera). In this study, we identified 40 LBD genes in the grape genome. A complete overview of the chromosomal locations, phylogenetic relationships, structures and expression profiles of this gene family during development in grape is presented here. Phylogenetic analysis showed that the LBD genes could be divided into classes I and II, together with LBDs from Arabidopsis. We mapped the 40 LBD genes on the grape chromosomes (chr1–chr19) and found that 37 of the predicted grape LBD genes were distributed in different densities across 12 chromosomes. Grape LBDs were found to share a similar intron/exon structure and gene length within the same class. The expression profiles of grape LBD genes at different developmental stages were analysed using microarray data. Results showed that 21 grape LBD genes may be involved in grape developmental processes, including preveraison, veraison and ripening. Finally, we analysed the expression patterns of six LBD genes through quantitative real-time polymerase chain reaction analysis. The six LBD genes showed differential expression patterns among the three representative grape tissues, and five of these genes were found to be involved in responses to mannitol, sodium chloride, heat stress and low temperature treatments. To our knowledge, this is the first study to analyse the LBD gene

  6. Evolution of the chitin synthase gene family correlates with fungal morphogenesis and adaption to ecological niches

    Science.gov (United States)

    Liu, Ran; Xu, Chuan; Zhang, Qiangqiang; Wang, Shiyi; Fang, Weiguo

    2017-01-01

    The fungal kingdom potentially has the most complex chitin synthase (CHS) gene family, but evolution of the fungal CHS gene family and its diversification to fulfill multiple functions remain to be elucidated. Here, we identified the full complement of CHSs from 231 fungal species. Using the largest dataset to date, we characterized the evolution of the fungal CHS gene family using phylogenetic and domain structure analysis. Gene duplication, domain recombination and accretion are major mechanisms underlying the diversification of the fungal CHS gene family, producing at least 7 CHS classes. Contraction of the CHS gene family is morphology-specific, with significant loss in unicellular fungi, whereas family expansion is lineage-specific with obvious expansion in early-diverging fungi. ClassV and ClassVII CHSs with the same domain structure were produced by the recruitment of domains PF00063 and PF08766 and subsequent duplications. Comparative analysis of their functions in multiple fungal species shows that the emergence of ClassV and ClassVII CHSs is important for the morphogenesis of filamentous fungi, development of pathogenicity in pathogenic fungi, and heat stress tolerance in Pezizomycotina fungi. This work reveals the evolution of the fungal CHS gene family, and its correlation with fungal morphogenesis and adaptation to ecological niches. PMID:28300148

  7. Cloning of novel rice blast resistance genes from two rapidly evolving NBS-LRR gene families in rice.

    Science.gov (United States)

    Guo, Changjiang; Sun, Xiaoguang; Chen, Xiao; Yang, Sihai; Li, Jing; Wang, Long; Zhang, Xiaohui

    2016-01-01

    Most rice blast resistance genes (R-genes) encode proteins with nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains. Our previous study has shown that more rice blast R-genes can be cloned in rapidly evolving NBS-LRR gene families. In the present study, two rapidly evolving R-gene families in rice were selected for cloning a subset of genes from their paralogs in three resistant rice lines. A total of eight functional blast R-genes were identified among nine NBS-LRR genes, and some of these showed resistance to three or more blast strains. Evolutionary analysis indicated that high nucleotide diversity of coding regions served as important parameters in the determination of gene resistance. We also observed that amino-acid variants (nonsynonymous mutations, insertions, or deletions) in essential motifs of the NBS domain contribute to the blast resistance capacity of NBS-LRR genes. These results suggested that the NBS regions might also play an important role in resistance specificity determination. On the other hand, different splicing patterns of introns were commonly observed in R-genes. The results of the present study contribute to improving the effectiveness of R-gene identification by using evolutionary analysis method and acquisition of novel blast resistance genes.

  8. The cloning and expression characterization of the centrosome protein genes family (centrin genes) in rat testis

    Institute of Scientific and Technical Information of China (English)

    SUN; Xiaodong(孙晓冬); GE; Yehua(葛晔华); MA; Jing(马静); YU; Zuoren(俞作仁); LI; Sai(李赛); WANG; Yongchao(王永潮); XUE; Shepu(薛社普); HAN; Daishu(韩代书)

    2002-01-01

    Centrins are members of the centrosome protein family, which is highly conserved during evolution. The homologous genes of centrin in many organisms have been cloned, but the sequences of the rat centrin genes had not yet been reported in GenBank. We cloned the cDNA fragments of centrin-1, -2 and -3 from the rat testis by RT-PCR, and analyzed the homology of the deduced amino acid sequences. The expression characterization of centrin genes in rat spermatogenesis was carried out by semi-quantitative RT-PCR. The results show that the homology of the corresponding centrin proteins in human, mouse and rat is high. The expression of centrin-1 is testis-specific, spermatogenic cell-specific and developmental stage-related. Centrin-1 begins to be transcribed when meiosis occurs, and its mRNA level reaches its peak in round spermatids. Centrin-2 and centrin-3 are highly expressed in spermatogonia and their mRNA levels decrease markedly when meiosis occurs. These results suggest that centrin-1 may play roles in meiosis and spermiogenesis, and centrin-2 and centrin-3 may be related to mitosis.

  9. Dictionary-driven protein annotation.

    Science.gov (United States)

    Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

    2002-09-01

    Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/ bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were

  10. Genome dynamics explain the evolution of flowering time CCT domain gene families in the Poaceae.

    Directory of Open Access Journals (Sweden)

    James Cockram

    Full Text Available Numerous CCT domain genes are known to control flowering in plants. They belong to the CONSTANS-like (COL) and PSEUDORESPONSE REGULATOR (PRR) gene families, which in addition to a CCT domain possess B-box or response-regulator domains, respectively. Ghd7 is the most recently identified COL gene to have a proven role in the control of flowering time in the Poaceae. However, as it lacks B-box domains, its inclusion within the COL gene family, technically, is incorrect. Here, we show Ghd7 belongs to a larger family of previously uncharacterized Poaceae genes which possess just a single CCT domain, termed here CCT MOTIF FAMILY (CMF) genes. We molecularly describe the CMF (and related COL and PRR) gene families in four sequenced Poaceae species, as well as in the draft genome assembly of barley (Hordeum vulgare). Genetic mapping of the ten barley CMF genes identified, as well as twelve previously unmapped HvCOL and HvPRR genes, finds the majority map to colinear positions relative to their Poaceae orthologues. Combined inter-/intra-species comparative and phylogenetic analysis of CMF, COL and PRR gene families indicates they evolved prior to the monocot/dicot divergence ∼200 mya, with Poaceae CMF evolution described as the interplay between whole genome duplication in the ancestral cereal, and subsequent clade-specific mutation, deletion and duplication events. Given the proven role of CMF genes in the modulation of cereal flowering, the molecular, phylogenetic and comparative analysis of the Poaceae CMF, COL and PRR gene families presented here provides the foundation from which functional investigation can be undertaken.

  11. Genome dynamics explain the evolution of flowering time CCT domain gene families in the Poaceae.

    Science.gov (United States)

    Cockram, James; Thiel, Thomas; Steuernagel, Burkhard; Stein, Nils; Taudien, Stefan; Bailey, Paul C; O'Sullivan, Donal M

    2012-01-01

    Numerous CCT domain genes are known to control flowering in plants. They belong to the CONSTANS-like (COL) and PSEUDORESPONSE REGULATOR (PRR) gene families, which in addition to a CCT domain possess B-box or response-regulator domains, respectively. Ghd7 is the most recently identified COL gene to have a proven role in the control of flowering time in the Poaceae. However, as it lacks B-box domains, its inclusion within the COL gene family, technically, is incorrect. Here, we show Ghd7 belongs to a larger family of previously uncharacterized Poaceae genes which possess just a single CCT domain, termed here CCT MOTIF FAMILY (CMF) genes. We molecularly describe the CMF (and related COL and PRR) gene families in four sequenced Poaceae species, as well as in the draft genome assembly of barley (Hordeum vulgare). Genetic mapping of the ten barley CMF genes identified, as well as twelve previously unmapped HvCOL and HvPRR genes, finds the majority map to colinear positions relative to their Poaceae orthologues. Combined inter-/intra-species comparative and phylogenetic analysis of CMF, COL and PRR gene families indicates they evolved prior to the monocot/dicot divergence ∼200 mya, with Poaceae CMF evolution described as the interplay between whole genome duplication in the ancestral cereal, and subsequent clade-specific mutation, deletion and duplication events. Given the proven role of CMF genes in the modulation of cereal flowering, the molecular, phylogenetic and comparative analysis of the Poaceae CMF, COL and PRR gene families presented here provides the foundation from which functional investigation can be undertaken.

  12. Evolution of the RH gene family in vertebrates revealed by brown hagfish (Eptatretus atami) genome sequences.

    Science.gov (United States)

    Suzuki, Akinori; Komata, Hidero; Iwashita, Shogo; Seto, Shotaro; Ikeya, Hironobu; Tabata, Mitsutoshi; Kitano, Takashi

    2017-02-01

    In vertebrates, there are four major genes in the RH (Rhesus) gene family, RH, RHAG, RHBG, and RHCG. These genes are thought to have been formed by the two rounds of whole-genome duplication (2R-WGD) in the common ancestor of all vertebrates. In our previous work, where we analyzed the details of the gene duplication process of this gene family, three nucleotide sequences belonging to this family were identified in Far Eastern brook lamprey (Lethenteron reissneri), and the phylogenetic positions of the genes were determined. Lampreys, along with hagfishes, are cyclostomata (jawless fishes), which is a sister group of gnathostomata (jawed vertebrates). Although those results suggested that one gene was orthologous to the gnathostome RHCG genes, we did not identify clear orthologues for other genes. In this study, therefore, we identified three novel cDNA sequences that belong to the RH gene family using de novo transcriptome analysis of another cyclostome: the brown hagfish (Eptatretus atami). We also determined the nucleotide sequences for the RHBG and RHCG genes in a red stingray (Dasyatis akajei), which belongs to the cartilaginous fishes. The phylogenetic tree showed that two brown hagfish genes, which were probably duplicated in the cyclostome lineage, formed a cluster with the gnathostome RHAG genes, whereas another brown hagfish gene formed a cluster with the gnathostome RHCG genes. We estimated that the RH genes had a higher evolutionary rate than the RHAG, RHBG, and RHCG genes. Interestingly, in the RHBG genes, only the bird lineage showed a higher rate of nonsynonymous substitutions. It is likely that this higher rate was caused by a state of relaxed functional constraints rather than by positive selection or pseudogenization.

  13. Diversification of the C-TERMINALLY ENCODED PEPTIDE (CEP) gene family in angiosperms, and evolution of plant-family specific CEP genes.

    Science.gov (United States)

    Ogilvie, Huw A; Imin, Nijat; Djordjevic, Michael A

    2014-10-06

    Small, secreted signaling peptides work in parallel with phytohormones to control important aspects of plant growth and development. Genes from the C-TERMINALLY ENCODED PEPTIDE (CEP) family produce such peptides which negatively regulate plant growth, especially under stress, and affect other important developmental processes. To illuminate how the CEP gene family has evolved within the plant kingdom, including its emergence, diversification and variation between lineages, a comprehensive survey was undertaken to identify and characterize CEP genes in 106 plant genomes. Using a motif-based system developed for this study to identify canonical CEP peptide domains, a total of 916 CEP genes and 1,223 CEP domains were found in angiosperms and for the first time in gymnosperms. This defines a narrow band for the emergence of CEP genes in plants, from the divergence of lycophytes to the angiosperm/gymnosperm split. Both CEP genes and domains were found to have diversified in angiosperms, particularly in the Poaceae and Solanaceae plant families. Multispecies orthologous relationships were determined for 22% of identified CEP genes, and further analysis of those groups found selective constraints upon residues within the CEP peptide and within the previously little-characterized variable region. An examination of public Oryza sativa RNA-Seq datasets revealed an expression pattern that links OsCEP5 and OsCEP6 to panicle development and flowering, and CEP gene trees reveal these emerged from a duplication event associated with the Poaceae plant family. The characterization of the plant-family specific CEP genes OsCEP5 and OsCEP6, the association of CEP genes with angiosperm-specific development processes like panicle development, and the diversification of CEP genes in angiosperms provides further support for the hypothesis that CEP genes have been integral to the evolution of novel traits within the angiosperm lineage. Beyond these findings, the comprehensive set of CEP

  14. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Full Text Available Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

  15. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Science.gov (United States)

    Ely, Bert; Scott, LaTia Etheredge

    2014-01-01

    Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.
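
    The third codon position GC check described in this abstract can be sketched roughly as follows. This is an illustrative helper, not code from Artemis or MICheck; the function name and the example sequence are hypothetical. It computes, for each of the three forward reading frames, the fraction of G or C bases at the third codon position. In a GC-rich genome, the frame with the highest value is a candidate for the true coding frame.

```python
def gc3_by_frame(sequence):
    """Return the third-codon-position GC fraction for forward frames 0, 1 and 2.

    Only complete codons are counted; an empty frame yields 0.0.
    """
    sequence = sequence.upper()
    fractions = []
    for frame in range(3):
        # Third positions of complete codons that start at this frame offset.
        third_bases = sequence[frame + 2::3]
        if not third_bases:
            fractions.append(0.0)
            continue
        gc_count = sum(1 for base in third_bases if base in "GC")
        fractions.append(gc_count / len(third_bases))
    return fractions


def likely_coding_frame(sequence):
    """Index (0, 1 or 2) of the frame with the highest GC3 fraction."""
    fractions = gc3_by_frame(sequence)
    return fractions.index(max(fractions))


# Hypothetical toy sequence: G or C only at every third base of frame 0.
toy = "AAGAACAAGAAC"
toy_fractions = gc3_by_frame(toy)
toy_frame = likely_coding_frame(toy)
```

    A real frame plot would apply the same calculation in a sliding window along the genome and compare the peaks with annotated gene boundaries.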

  16. Partitioning SNPs Identified By GBS into Genome Annotation Classes and Calculating SNP-Explained Variances for Heading Date and Disease Resistance from the Resulting Genomic Relationship Matrices - Lolium perenne

    DEFF Research Database (Denmark)

    Byrne, Stephen; Cericola, Fabio; Janss, Luc;

    2015-01-01

    , and an average protein Annotation Edit Distance (AED) of 0.14. Genotyping-By-Sequencing (GBS) data was generated after genome complexity reduction with ApeKI for 995 breeding families. Data was aligned against the annotated sequence assembly, and we identified variants at over 1.8 million positions, which were......,273 SNPs), genes with NB-ARC domains (9,056 SNPs), intron (168,023 SNPs), and inter-genic (1,420,866 SNPs). Genomic relationship matrices were created for each annotation class and SNP-explained variances for heading date and disease resistance were calculated...
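
    The record does not say how the genomic relationship matrices were computed. As a hedged illustration only, the sketch below uses the widely used VanRaden-style formulation (centred allele counts, scaled by twice the sum of p(1-p)) and assumes diploid genotypes coded 0, 1 or 2, which may differ from the family-pool data in the study. All names and the toy data are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a relationship matrix from allele counts.

    genotypes: list of individuals, each a list of 0/1/2 allele counts per SNP
    (assumed coding; not taken from the record).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each genotype by twice the allele frequency.
    centred = [[ind[j] - 2 * freqs[j] for j in range(n_snp)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale
         for k in range(n_ind)]
        for i in range(n_ind)
    ]


def matrices_by_annotation_class(genotypes, snp_classes):
    """One relationship matrix per annotation class.

    snp_classes: list with one class label per SNP, e.g. "intron" or "intergenic".
    """
    matrices = {}
    for label in sorted(set(snp_classes)):
        columns = [j for j, c in enumerate(snp_classes) if c == label]
        subset = [[ind[j] for j in columns] for ind in genotypes]
        matrices[label] = genomic_relationship_matrix(subset)
    return matrices


# Hypothetical toy data: three individuals, two SNPs.
toy_genotypes = [[0, 2], [2, 0], [1, 1]]
toy_matrix = genomic_relationship_matrix(toy_genotypes)
```

    Each class-specific matrix could then enter a mixed model as a separate random effect to estimate the variance explained by that class of SNPs.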

  17. Omics data management and annotation.

    Science.gov (United States)

    Harel, Arye; Dalah, Irina; Pietrokovski, Shmuel; Safran, Marilyn; Lancet, Doron

    2011-01-01

    Technological Omics breakthroughs, including next generation sequencing, bring avalanches of data which need to undergo effective data management to ensure integrity, security, and maximal knowledge-gleaning. Data management system requirements include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, hardware and software platform definition, as well as robustness. Relevant solutions elaborated by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols facilitating data sharing and management. In project planning, special consideration must be given to choosing relevant Omics annotation sources, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as a file-based system or a relational database, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.

  18. Detection of filaggrin gene mutation (2282del4) in Pakistani Ichthyosis vulgaris families.

    Science.gov (United States)

    Naz, Naghma; Samdani, Azam Jah

    2011-06-01

    The aim of this study was to detect an 811 bp filaggrin (FLG) gene fragment known to carry the 2282del4 mutation, which causes ichthyosis vulgaris. Seven clinically examined ichthyosis vulgaris families were included in this study. An 811 bp FLG gene fragment was targeted in the genomic DNA of all the members of the seven families by PCR amplification using the known primers RPT1P7 and RPT2P1. Successful amplification of an 811 bp FLG gene fragment in all the families suggested a possible role of the 2282del4 mutation in causing ichthyosis vulgaris in the Pakistani population.

  19. Genomewide identification, classification and analysis of NAC type gene family in maize

    Indian Academy of Sciences (India)

    Xiaojian Peng; Yang Zhao; Xiaoming Li; Min Wu; Wenbo Chai; Lei Sheng; Yu Wang; Qing Dong; Haiyang Jiang; Beijiu Cheng

    2015-09-01

    NAC transcription factors comprise a large plant-specific gene family. Increasing evidence suggests that members of this family have diverse functions in plant growth and development. In this study, we performed a genomewide survey of NAC type genes in maize (Zea mays L.). A complete set of 148 nonredundant NAC genes (ZmNAC1–ZmNAC148) was identified in the maize genome using Blast search tools and divided into 12 groups (a–l) based on phylogeny. Chromosomal location of these genes revealed that they are distributed unevenly across all 10 chromosomes. Segmental and tandem duplication contributed largely to the expansion of the maize NAC gene family. The Ka/Ks ratio suggested that the duplicated genes of the maize NAC family mainly experienced purifying selection, with limited functional divergence after duplication events. Microarray analysis indicated that most of the maize NAC genes were expressed across different developmental stages. Moreover, 19 maize NAC genes grouped with published stress-responsive genes from other plants were found to contain putative stress-responsive cis-elements in their promoter regions. All these stress-responsive genes belonged to group d (stress-related). Further, these genes showed differential expression patterns over time in response to drought treatments by quantitative real-time PCR analysis. Our results provide a comprehensive overview of the maize NAC gene family and form the foundation for future functional research to uncover their roles in maize growth and development.

  20. GoPipe: Streamlined Gene Ontology Annotation for Batch Anonymous Sequences With Statistics

    Institute of Scientific and Technical Information of China (English)

    陈作舟; 薛成海; 朱晟; 周丰丰; XUEFENG BRUCE LING; 刘国平; 陈良标

    2005-01-01

    Accelerated availability of new sequences, especially ESTs, calls for computational methods to link sequences with Gene Ontology (GO) terms in batch mode. There is currently no program for this purpose except Goblet, an online tool that uses BLAST to assign GO terms to query sequences but restricts uploaded sequence files to less than 100 kilobytes in size and relies on BLAST alone as its prediction tool. GoPipe is a standalone package that integrates BLAST and InterProScan results to obtain Gene Ontology annotation, with built-in statistical options. GoPipe takes any number of BLAST and/or InterProScan output files simultaneously and launches jobs sequentially to perform parsing, data integration, redundancy removal, GO distribution calculation and graphic display. It also provides statistical tools to compare the GO distributions of different inputs in order to extract biological meaning. In addition, in intersection mode the program takes the intersection of the InterProScan and BLAST results; on the test data set its precision reached 99.1%, far exceeding the precision of InterProScan's own GO predictions, while sensitivity decreased only slightly. Its high precision, speed and flexibility make it an ideal tool for batch Gene Ontology annotation of anonymous sequences. The package is freely available at http://gopipe.fishgenome.org/ or on request from the authors.
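
    The intersection mode described in this abstract can be sketched as a set operation on per-sequence GO terms. This is not GoPipe's code: the function names `combine_go` and `precision_sensitivity`, the `mode` parameter, and the union fallback are assumptions for illustration, and the toy annotations below are made up.

```python
def combine_go(blast, interpro, mode="intersection"):
    """Combine per-sequence GO term sets from two hypothetical parsed results.

    blast, interpro: dicts mapping sequence id -> iterable of GO ids.
    mode: "intersection" keeps terms both tools agree on; any other value
    falls back to the union (an assumption, not a documented GoPipe mode).
    """
    combined = {}
    for seq_id in set(blast) | set(interpro):
        b = set(blast.get(seq_id, ()))
        i = set(interpro.get(seq_id, ()))
        combined[seq_id] = b & i if mode == "intersection" else b | i
    return combined


def precision_sensitivity(predicted, truth):
    """Precision and sensitivity over (sequence, GO term) pairs."""
    tp = fp = fn = 0
    for seq_id in set(predicted) | set(truth):
        p = set(predicted.get(seq_id, ()))
        t = set(truth.get(seq_id, ()))
        tp += len(p & t)
        fp += len(p - t)
        fn += len(t - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    blast = {"s1": {"GO:0003677", "GO:0005634"}}
    interpro = {"s1": {"GO:0003677", "GO:0006355"}}
    truth = {"s1": {"GO:0003677", "GO:0006355"}}
    for mode in ("intersection", "union"):
        combined = combine_go(blast, interpro, mode)
        print(mode, precision_sensitivity(combined, truth))
```

    On this toy input the intersection gives higher precision and lower sensitivity than the union, the same direction of trade-off the abstract reports for the test data set.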